Description: There are two components to this project: a UPGMA implementation and a Neighbor Joining implementation. (i) You are to implement the UPGMA clustering algorithm on distance matrix inputs to construct phylogenetic trees in Newick format.
(ii) You are to implement the Neighbor Joining algorithm on distance matrix inputs to construct phylogenetic trees in Newick format with branch distances shown.
Specifications: Both components, UPGMA and Neighbor Joining, take input in the same distance matrix format.
You will run the centroid linkage based UPGMA algorithm on such distance matrix inputs, as described in the video lectures and textbook chapter 11.1. Calculate average inter-cluster distances efficiently, as described in videos 3 and 4 on UPGMA: when two clusters are merged, the distances from the merged cluster to every other cluster should be computed from the current cluster-level distances, without looking again at the individual pairwise distances in the original distance matrix. During program execution, your console output should indicate at each step which two clusters are being merged, together with the average distance between them. Your final UPGMA console output should include the phylogenetic tree in the parentheses-based Newick format. An inefficient implementation of UPGMA is uploaded as pysip-DM.pl, with I/O expectations consistent with those described here.
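The efficient update described above can be sketched as follows. This is a minimal illustration, not the uploaded pysip-DM.pl and not a reference solution. The function name `upgma`, the input shape (a list of labels plus a symmetric nested-list matrix), the tie-breaking rule, and the topology-only Newick output are all assumptions made for the example; the actual input file format is not specified here. The key step is the size-weighted average, d(A∪B, C) = (|A|·d(A,C) + |B|·d(B,C)) / (|A| + |B|), which uses only the current cluster-level distances.

```python
def upgma(labels, matrix):
    """Sketch of UPGMA; returns (newick_string, merge_log).

    Assumed input: `labels` is a list of taxon names and `matrix` is a
    symmetric nested list of pairwise distances (hypothetical format).
    """
    # Active clusters: id -> (Newick subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): float(matrix[i][j])
            for i in range(len(labels))
            for j in range(i + 1, len(labels))}
    next_id = len(labels)
    log = []
    while len(clusters) > 1:
        # Closest pair; ties go to the smallest key (an arbitrary choice).
        (a, b), d = min(dist.items(), key=lambda kv: (kv[1], kv[0]))
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        print(f"Merging {tree_a} and {tree_b} at average distance {d:g}")
        log.append((tree_a, tree_b, d))
        # Size-weighted update: only cluster-level distances are read,
        # never the original leaf-to-leaf matrix.
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every entry that involved the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        clusters[next_id] = (f"({tree_a},{tree_b})", size_a + size_b)
        next_id += 1
    newick, _ = next(iter(clusters.values()))
    return newick + ";", log


# Small made-up example (not course data).
example_labels = ["A", "B", "C", "D"]
example_matrix = [[0, 2, 4, 6],
                  [2, 0, 4, 6],
                  [4, 4, 0, 6],
                  [6, 6, 6, 0]]
tree, merges = upgma(example_labels, example_matrix)
print(tree)
```

Because each merge touches only the rows of the two merged clusters, the update costs time linear in the number of remaining clusters. The closest-pair search here is still a full scan, which keeps the sketch short at the expense of speed.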
The attached Perl code is based on the (former) textbook implementation of Neighbor Joining, which I modified heavily, but it still does not incorporate any meaningful tie-breakers. It therefore returns a valid Neighbor Joining tree, but if you want tie-breakers you will need to add them through your own modifications (this is not hard; it just needs slightly more bookkeeping). When the Neighbor Joining tree is unique, the Perl code returns the only valid tree. When it is not unique, the code returns one valid tree, but there may be other valid outputs too. I am not restricting your output in the case of ties, as long as you return a valid Neighbor Joining tree.
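For orientation, here is a minimal Neighbor Joining sketch with one explicit, arbitrary tie-breaker. It is not a port of the attached Perl code. The function name `neighbor_joining`, the input shape (labels plus a symmetric nested-list matrix), the "smallest id pair wins" tie rule, and the choice to split the final edge evenly between the last two nodes are all assumptions of this example. Since the tree is unrooted, placing the Newick root in the middle of the last edge is only a display convention.

```python
def neighbor_joining(labels, matrix):
    """Sketch of Neighbor Joining; returns a Newick string with branch lengths.

    Assumed input: `labels` is a list of taxon names and `matrix` is a
    symmetric nested list of pairwise distances (hypothetical format).
    """
    n0 = len(labels)
    nodes = {i: labels[i] for i in range(n0)}  # id -> Newick subtree
    # d[i][j] holds distances between currently active nodes only.
    d = {i: {j: float(matrix[i][j]) for j in range(n0) if j != i}
         for i in range(n0)}
    next_id = n0
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i].values()) for i in nodes}  # row sums
        # Q criterion: Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        # Ties are broken by the smallest (i, j) id pair, an arbitrary rule.
        i, j = min(((a, b) for a in nodes for b in nodes if a < b),
                   key=lambda p: ((n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]], p))
        dij = d[i][j]
        # Branch lengths from i and j to the new internal node u.
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = next_id
        next_id += 1
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - dij) / 2
        # Remove the two joined nodes from the distance table.
        for x in (i, j):
            for k in d.pop(x):
                d[k].pop(x, None)
        nodes[u] = f"({nodes.pop(i)}:{li:g},{nodes.pop(j)}:{lj:g})"
    # Two nodes remain; split the last edge evenly (display choice only).
    x, y = nodes
    half = d[x][y] / 2
    return f"({nodes[x]}:{half:g},{nodes[y]}:{half:g});"


# Small made-up example with five taxa (not course data).
example_matrix = [[0, 5, 9, 9, 8],
                  [5, 0, 10, 10, 9],
                  [9, 10, 0, 8, 7],
                  [9, 10, 8, 0, 3],
                  [8, 9, 7, 3, 0]]
print(neighbor_joining(list("abcde"), example_matrix))
```

In this example the second step has two pairs with the same Q value, so a different tie rule could join a different pair first; any such choice still yields a valid Neighbor Joining tree, which is the situation described above.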