The Fifty Most Sequenced Bacterial Genomes

Nick Loman listed the fifty most sequenced bacterial genomes according to NCBI. A reader at Nick's blog came up with an improved list--one that reflects the genomes for which we actually have data (depending on who is doing the sequencing, a project will be registered with NCBI, often months before any sequencing is done). Here's the 'improved' top twenty:

173 Escherichia coli
82 Salmonella enterica
78 Staphylococcus aureus
69 Propionibacterium acnes
56 Streptococcus pneumoniae
56 Enterococcus faecalis
45 Bacillus cereus
42 Mycobacterium tuberculosis
36 Vibrio cholerae
29 Pseudomonas syringae
28 Listeria monocytogenes
27 Neisseria meningitidis
27 Helicobacter pylori
27 Enterococcus faecium
27 Acinetobacter baumannii
25 Yersinia pestis
23 Methanobrevibacter smithii
23 Clostridium difficile
23 Burkholderia pseudomallei
22 Campylobacter jejuni

It's very cool to realize that half of the E. coli genomes are, in part, my fault (obviously lots of people are involved with the project). I've discussed that project before, but those genomes are actually commensals (bacteria that live on us and in us and typically don't cause disease): while NIAID cares about pathogens (bacteria associated with disease), they realize that we also need non-pathogens to make sense of the pathogens. In fact, when you look at the list, most of the organisms are commensals, although my sense is that most of the sequenced strains were isolated from sick patients and are thought to be associated with disease.

Also, the list contains only de novo genomes: we start from DNA and wind up with a new sequence. These are not resequenced genomes (SNP calling) where we map a strain's mutations back to a previously sequenced genome (there was some confusion about that over at Nick's place).

In the coming attractions department, I feel pretty confident that we (here I mean the larger scientific community) will be increasing these numbers massively: E. coli will probably triple, S. aureus and Enterococcus will explode many-fold, B. cereus will triple. And this could very well be a large underestimate on my part.
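To put rough numbers on that guess, here is a minimal Python sketch of the arithmetic. It assumes "triple" means a literal factor of three applied to the counts in the list above. S. aureus and Enterococcus are left out because "many-fold" doesn't give a number to work with. The function name `project` is just for illustration.

```python
# Counts copied from the top-twenty list above
# (only the species given an explicit "triple" in the text).
counts = {
    "Escherichia coli": 173,
    "Bacillus cereus": 45,
}

# Assumption: "triple" is taken literally as a factor of 3.
TRIPLE = 3


def project(current, factor):
    """Scale each species' genome count by the same factor."""
    return {species: n * factor for species, n in current.items()}


projected = project(counts, TRIPLE)
for species, n in projected.items():
    print(f"{species}: {counts[species]} -> {n}")
```

Under that assumption, E. coli would go from 173 to 519 genomes and B. cereus from 45 to 135.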

Of course, this then leads to the question of how one goes about analyzing hundreds of genomes. If people want to read a post about this, let me know (I have to give a talk this week though; you can tweet me at @mikethemadbiol or email, which is on the sidebar).

Yes, go Bacillus cereus! I find it kind of cool that we will be responsible for the tripling ;)

Please do post about the analysis part.

By Geraldine (not verified) on 05 Apr 2011

Another month or two and my lab will add five more S. aureus genomes to that list. Must do battle with the Salmonella enterica folk for 2nd place!

These numbers are not really surprising, since it is easier to sequence a whole genome than to clone a single gene. In reality, E. coli has been sequenced more than 1000-fold, including all the training runs for customers buying NGS systems. Of course these (drafts) are not published. However, it could be interesting to decipher the evolution of E. coli in the fridge?!