Are Transcription Factors Enriched in Every Dataset?

Is it just me or does every analysis that looks for over-represented gene ontology (GO) terms turn up transcription factors? It doesn't matter if the study is looking for genes under positive selection or something else. It just seems like transcription factors are enriched in every dataset.


Rich,

Every protein that I've ever worked on (be it a cytoskeletal, RNA-binding, or membrane-associated protein) has been described as a transcription factor in some crappy paper. In fact, I've been contemplating writing a post entitled "Why Does Every Freakin' Protein Have a Night Job as a Transcription Factor?"

Okay, so there are a bunch of proteins that are misannotated as transcription factors. Is it logical to assume that they are distributed randomly amongst all proteins? If so, this shouldn't lead to the over-representation of transcription factors in various datasets.

Is it just me or does every analysis that looks for over-represented gene ontology (GO) terms turn up transcription factors?

I think it might just be you :)

Which papers are you thinking of?

A lot of GO enrichment analyses are biased by gene length. For example, if you're looking for enrichment in genes that have, say, some miRNA binding site or other sequence motif, then longer genes are more likely to have such binding sites by chance. If you just use a hypergeometric distribution (treating every gene as an equivalent "ball in a bag") to look for a GO enrichment, as is very common, the apparent significance of categories made up of long genes will be inflated. I am not sure if this applies to transcription factors, but metazoan nervous system genes tend to have long UTRs and come up (questionably) in these analyses all the time. Of course, one could argue that the longer UTRs might reflect the biology of more complex regulation and shouldn't be argued away.
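
For anyone who hasn't seen the "ball in a bag" test written out, here is a minimal sketch of the hypergeometric enrichment p-value described above. The function name and the gene counts are made up for illustration; they don't come from any particular paper or tool. Note that the calculation assumes every gene is equally likely to land in the selected set, which is exactly the assumption that gene length can violate.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    Models drawing n_selected genes without replacement from n_universe
    genes, of which n_annotated carry the GO term of interest. Every gene
    is treated as equally likely to be drawn (no gene-length correction).
    """
    total = comb(n_universe, n_selected)
    lower = max(0, n_overlap)
    upper = min(n_annotated, n_selected)
    # Sum the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20000 genes in the genome, 1500 annotated as
    # transcription factors, 200 genes in the hit list, 30 of them TFs.
    p = hypergeom_enrichment_p(20000, 1500, 200, 30)
    print(f"enrichment p-value: {p:.3g}")
```

If long genes are more likely to end up in the hit list, the draw is no longer uniform, and a p-value computed this way will look smaller than it should for any category dominated by long genes.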

I love reading papers from people who are too purely computational and get excited about enrichments in "macromolecular biosynthesis" or "cytosol". Thanks for narrowing it down for us :)

Heh! The most common annotation in GO data is "unknown". Summary: truth be told, we know sod all about what most genes do.

The last GO analysis I ran showed enrichment for "unknown function", "unknown component" and "unknown process". Conclusion: we know less than sod all about my particular system...

By Peter Ellis (not verified) on 14 Nov 2007

Another possibility is that genomics folk tend to report enrichment of transcription factors as often as they possibly can, in part because it's one of the easiest types of overrepresentation to weave into a story about how your set of upregulated genes is mechanistically involved in subject X.

Also: Peter, I'm totally with you on the "unknown" genes. They're routinely the most prevalent in my own GO-type analysis. Either I'm really breaking ground or completely barking up the wrong tree... :-)
