Over at my old site, I lamented the apparent death of distance-based tree-building algorithms. Just as all of life on earth can be divided into three domains, phylogenetic methods can be split into three groups: distance-based, maximum parsimony, and maximum likelihood. Distance- and parsimony-based approaches have been around for a while (and were used prior to the availability of molecular data). The combination of molecular data and more powerful computers allowed large molecular datasets to be analyzed using parsimony methods. That computing power has also made it practical to apply maximum likelihood methods to solving phylogenies. Bayesian likelihood algorithms are the tree-building methods currently in vogue, and they can be tuned to the specific parameters observed in your data. But, as I asked in the post, what about distance-based methods?
The discussion above is far from comprehensive, and I don't spend a lot of time building trees, so I'm not qualified to judge which method is best. That said, the appropriate method definitely depends on your data, and it's always good to confirm your phylogeny using multiple methods. Despite being published nearly twenty years ago, the neighbor joining method remains one of the most popular tree-building algorithms. The original article has been cited an amazing 9,820 times (according to Google Scholar). That may be an underestimate, as ISI lists it as having 13,353 citations.
The token phylogeny is shown to the left. This is the first ever neighbor joining phylogeny constructed using real data. The evolutionary distances between these frog species (from the genus Rana) were measured using allozyme loci and biochemical interactions. That's not exactly DNA sequences, but the original data were published in 1978. The numbers represent the evolutionary distance along each branch. DNA sequencing was still quite difficult in the 1980s, but technological advances made in the 1990s led to a rapid increase in the number of DNA sequences in public databases. The neighbor joining algorithm was used to construct many of the early phylogenies using molecular data (some of these may appear in Phylogeny Friday in the coming weeks).
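For anyone curious about what a distance-based method actually does with those pairwise distances, here is a minimal sketch of neighbor joining in Python. It follows the commonly presented formulation: repeatedly join the pair of nodes that minimizes a corrected distance, then replace that pair with a new internal node. The five-taxon distance matrix is a made-up toy example (it is not the Rana data), and the function and variable names are my own, so treat this as an illustration and not as a replacement for a real phylogenetics package.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Sketch of neighbor joining on a symmetric distance table.

    names: list of taxon labels.
    dist:  dict mapping (label_a, label_b) pairs to a distance.
    Returns the unrooted tree as a list of (node, node, branch_length) edges.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to every other active node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair with the smallest corrected distance (the Q criterion).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from the new internal node to each joined node.
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        edges += [(u, i, li), (u, j, lj)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # The last two nodes are connected by a single branch.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Toy distances between five hypothetical taxa (not real data).
names = ["a", "b", "c", "d", "e"]
dist = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}
edges = neighbor_joining(names, dist)
for parent, child, length in edges:
    print(parent, child, length)
```

Each printed line is one branch of the tree, with its length playing the same role as the numbers along the branches in the frog phylogeny. When several pairs tie for the smallest corrected distance, this sketch simply takes the first one it encounters, which is an arbitrary choice on my part.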
Everybody has their preferences, but since Neighbor-Joining, parsimony and UPGMA have different assumptions, it's worth running all three on a dataset. It's technically not an obstacle, so why not explore the dataset? I agree that keeping the simplest analysis is the best, but the different views afforded by parsimony and UPGMA might turn up some interesting tidbits in the dataset.
Oh, and like flossing after every meal, don't forget to bootstrap.