Description: There are two components to this project: UPGMA and Neighbor Join implementation

Computer Science Nov 18, 2022

Description: There are two components to this project: UPGMA and Neighbor Join implementation. (i) You are to implement the UPGMA clustering algorithm on distance matrix inputs to construct phylogenetic trees in Newick format.

 (ii) You are to implement the Neighbor Joining algorithm on distance matrix inputs to construct phylogenetic trees in Newick format with branch distances shown.

Specifications: Both components take their input in the same distance matrix format.

  1. UPGMA: Your inputs are distance matrices (sample files are DM-p127.txt and DM-p139.txt); the sample examples were worked through in the video lectures on UPGMA.

 You will run the centroid linkage based UPGMA algorithm on these distance matrices as described in the video lectures and textbook chapter 11.1. Calculate average inter-cluster distances efficiently, as described in videos 3 and 4 on UPGMA, without revisiting the individual pairwise distances in the original distance matrix. During execution, your program should print to the console, at each step, which two clusters are being merged and the average distance between them. Your final UPGMA output to the console should include the phylogenetic tree in the parentheses-based Newick format. An inefficient implementation of UPGMA is uploaded as pysip-DM.pl, with I/O expectations consistent with those described here.
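
The efficient averaging step can be sketched as follows. This is a minimal Python sketch, not the course's reference implementation: the function name `upgma`, its `labels` and `dist` parameters, and the wording of the console messages are all assumptions made for illustration. It uses the standard size-weighted update, where the distance from a merged cluster to any other cluster is the average of the two old distances weighted by cluster size, so the original pairwise distances are never consulted again. Whether this matches the exact procedure in videos 3 and 4 is not confirmed here.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster a symmetric distance matrix (list of lists); return (newick, merges)."""
    n = len(labels)
    newick = {i: labels[i] for i in range(n)}   # active cluster id -> Newick fragment
    size = {i: 1 for i in range(n)}             # active cluster id -> number of leaves
    # Store each unordered pair once, keyed as (smaller id, larger id).
    d = {(i, j): float(dist[i][j]) for i, j in combinations(range(n), 2)}
    merges = []
    next_id = n
    while len(newick) > 1:
        # Closest pair of clusters; on a tie the first pair found is used.
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        merges.append((newick[a], newick[b], dab))
        # Hypothetical message format; match the reference program's output in practice.
        print(f"merging {newick[a]} and {newick[b]} at average distance {dab:g}")
        for k in newick:
            if k in (a, b):
                continue
            dak = d.pop((min(a, k), max(a, k)))
            dbk = d.pop((min(b, k), max(b, k)))
            # Size-weighted average: equals the mean over all leaf pairs.
            d[(k, next_id)] = (size[a] * dak + size[b] * dbk) / (size[a] + size[b])
        del d[(a, b)]
        newick[next_id] = f"({newick[a]},{newick[b]})"
        size[next_id] = size[a] + size[b]
        for c in (a, b):
            del newick[c]
            del size[c]
        next_id += 1
    (tree,) = newick.values()
    return tree + ";", merges
```

Calling `upgma(["A", "B", "C", "D"], matrix)` prints one line per merge and returns the Newick string together with the list of merges. The dictionary keyed by cluster pairs keeps each update proportional to the number of active clusters, at the cost of a linear scan for the minimum.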

  2. Neighbor Joining: The Neighbor Joining algorithm is described on slide 20 of NeighborJoining.pptx and in the corresponding video lectures. Your inputs are distance matrices (sample files are DM-p127.txt and DM-p139.txt). You will run the Neighbor Joining algorithm on these distance matrices. From the initial distance matrix, and again after each merge step, you are to recompute the average distances r, the transition distance matrix, and the updated distance matrix, and output each to the console as it is computed. I have uploaded a Perl implementation of Neighbor Join as oyop-DM-modGE.pl, which you are encouraged to run and test. Your output should be consistent with that program’s output.

What to turn in: You must turn in a single zipped file containing your source code, a Makefile if needed for compilation, and a README file indicating how to execute your program.
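
The three quantities recomputed at each Neighbor Joining step (the average distances r, the transition distance matrix, and the updated distance matrix) can be sketched as below. This is a minimal Python sketch under stated assumptions, not a port of oyop-DM-modGE.pl: the name `neighbor_joining`, the printed layout, the zero diagonal in the transition matrix, and the way the last two nodes are joined are all illustrative choices. The formulas are the standard ones, r_i = (sum of d_ik) / (n - 2) and t_ij = d_ij - r_i - r_j; slide 20 may present them differently.

```python
def neighbor_joining(labels, dist):
    """Return (newick, steps) for a symmetric distance matrix (list of lists)."""
    nodes = list(labels)                       # Newick fragment for each active node
    d = [[float(x) for x in row] for row in dist]
    steps = []
    while len(nodes) > 2:
        n = len(nodes)
        # Average distances: row sums divided by (n - 2).
        r = [sum(row) / (n - 2) for row in d]
        # Transition distances; the diagonal is left at 0 as a placeholder.
        t = [[d[i][j] - r[i] - r[j] if i != j else 0.0 for j in range(n)]
             for i in range(n)]
        # The first minimum found is joined: no meaningful tie-breaking.
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        i, j = min(pairs, key=lambda p: t[p[0]][p[1]])
        # Branch lengths from the two joined nodes to their new parent.
        li = (d[i][j] + r[i] - r[j]) / 2
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new parent to every remaining node.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        updated = [[d[a][b] for b in keep] + [new_row[x]]
                   for x, a in enumerate(keep)]
        updated.append(new_row + [0.0])
        steps.append({"r": r, "transition": t,
                      "joined": (nodes[i], nodes[j]), "updated": updated})
        # Hypothetical console layout; the reference program's format may differ.
        print("r:", r)
        print("transition:", t)
        print("joined:", nodes[i], nodes[j])
        print("updated:", updated)
        merged = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        nodes = [nodes[k] for k in keep] + [merged]
        d = updated
    # Assumption: the last edge length is attached to the first remaining node.
    return f"({nodes[0]}:{d[0][1]:g},{nodes[1]});", steps
```

For an additive matrix built from a four-leaf tree, the branch lengths in the returned string reproduce the edge lengths of the tree that generated the distances.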

The attached Perl code is based on the (former) textbook implementation of Neighbor Join, which I modified heavily, but it still does not incorporate any meaningful tie-breakers. It therefore returns a valid Neighbor Join tree, but if you want tie-breakers you will need to add your own modifications (this is not hard; it just needs slightly more bookkeeping). When the Neighbor Join tree is unique, the Perl code returns the only valid tree. When it is not unique, the code returns a valid tree, but there may be other valid outputs too. I am not restricting your output in the case of ties, as long as you return a valid Neighbor Join tree.
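
One way to do the extra bookkeeping for ties is to collect every pair that attains the minimum, rather than only the first, and then apply whatever rule you choose. The helper below is a hypothetical sketch: the name `tied_minimum_pairs`, the `tol` parameter, and the use of an absolute tolerance for floating-point comparison are assumptions, and the assignment does not prescribe any particular tie-breaking rule.

```python
import math


def tied_minimum_pairs(matrix, tol=1e-9):
    """Return all index pairs (i, j), i < j, whose entry equals the minimum within tol."""
    n = len(matrix)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    best = min(matrix[i][j] for i, j in pairs)
    return [(i, j) for i, j in pairs
            if math.isclose(matrix[i][j], best, abs_tol=tol)]
```

If the returned list has more than one entry, the tree may not be unique, and any of the listed pairs is a valid join at that step.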
