The Future of Eukaryotic Genome Sequencing Is Here


Remember when I said that the near future of eukaryotic genome sequencing would involve sequencing EST libraries (collections of short sequences read from cDNA copies of mRNA, i.e., transcribed genes) rather than de novo sequencing of whole genomes? Well, I did, even if you don't remember. Anyway, a new paper in PLoS ONE puts that approach to the test, generating sequence data to study mammalian evolution. Here is the last paragraph of the paper, summarizing why EST sequencing projects are useful in phylogenetics:

While complete genomes are the ultimate data sets for resolving phylogenetic and evolutionary issues of different kinds, the costs of producing these data sets are still at a level that precludes a dense taxonomic sampling among higher organisms. There is therefore a need to establish methods that at reasonable costs allow the production of sequence data that can be of general interest for phylogenetic studies. Producing EST sequences is such a method that will gain more attention in the future.

The authors found that the sequences they generated were of high enough quality for phylogenetics. Additionally, they evolve at a close enough to neutral rate, so the trees constructed from these data should be reliable. EST sequencing is cheap and effective, and it allows you to sample from many more lineages than whole-genome sequencing does. Until the cost of a de novo whole-genome project drops to thousands of dollars (rather than millions), these EST projects are the way to go.


Kullberg et al. 2007. Expressed sequence tags as a tool for phylogenetic analysis of placental mammal evolution. PLoS ONE 2: e775. doi:10.1371/journal.pone.0000775


"Additionally, they evolve at a close enough to neutral rate, so the trees constructed from these data should be reliable."

I haven't read the article, but that claim surprises me. I would think that the transcribed portions of the genome would be subject to more selective pressure than those that are not.

But I'm just a physiologist, so maybe I have no idea what I'm talking about.

By PhysioProf on 02 Sep 2007

Depending on the evolutionary distance between taxa, protein-coding sequences may actually be the best type of sequence to use. Non-coding DNA evolves too fast to be used for phylogenetic reconstruction among taxa as diverged as the mammals. And recall that the original phylogenies were constructed using phenotypes, which are probably under more selection than protein-coding sequences.

"[M]aybe I have no idea what I'm talking about."

Apparently so. Thanks for the explanation.

By PhysioProf on 02 Sep 2007

Which is why we submitted a proposal to NSF to generate thousands of ESTs for a bunch of annelids representing phylogenetically interesting lineages. Keep your fingers crossed for us! It ain't cheap, but it's a hell of a lot cheaper than whole-genome sequencing, and the data we'll get should be more useful for the deep phylogenetic work we want to do.

Gotta get that mammal paper...