Remember when I said that the near future of eukaryotic genome sequencing would involve sequencing EST libraries (collections of mRNA, or transcribed genes) rather than de novo sequencing of whole genomes? Well, I did, even if you don't remember. Anyway, a new paper in PLoS ONE puts that approach to the test as a way of generating sequence data to study mammalian evolution. Here is the last paragraph from the paper, summarizing why EST sequencing projects are useful in phylogenetics:
While complete genomes are the ultimate data sets for resolving phylogenetic and evolutionary issues of different kinds, the costs of producing these data sets are still at a level that precludes a dense taxonomic sampling among higher organisms. There is therefore a need to establish methods that at reasonable costs allow the production of sequence data that can be of general interest for phylogenetic studies. Producing EST sequences is such a method that will gain more attention in the future.
The authors found that the sequences they generated were of high enough quality for phylogenetics. Additionally, the sequences evolve at a rate close enough to neutral that the trees constructed from these data should be reliable. EST sequencing is cheap and effective, and it lets you sample many more lineages than whole-genome sequencing does. Until de novo whole-genome projects drop to thousands of dollars (rather than millions), these EST projects are the way to go.
Kullberg et al. 2007. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution. PLoS ONE 2: e775. doi:10.1371/journal.pone.0000775
"Additionally, they evolve at a close enough to neutral rate, so the trees constructed from these data should be reliable."
I haven't read the article, but that claim surprises me. I would think that the transcribed portions of the genome would be subject to more selective pressure than those that are not.
But I'm just a physiologist, so maybe I have no idea what I'm talking about.
Depending on the evolutionary distance between taxa, protein-coding sequences may actually be the best type of sequence to use. Non-coding DNA evolves too fast to be used for phylogenetic reconstruction of taxa as diverged as all mammals. And recall that the original phylogenies were constructed using phenotypes, which are probably under more selection than protein-coding sequences.
"[M]aybe I have no idea what I'm talking about."
Apparently so. Thanks for the explanation.
Which is why we submitted a proposal to NSF to generate thousands of ESTs for a bunch of annelids representing phylogenetically interesting lineages. Keep your fingers crossed for us! It ain't cheap, but it's a hell of a lot cheaper than whole-genome sequencing, and the data we'll get should be more useful for the deep phylogenetic work we want to do.
Gotta get that mammal paper...