Wired Magazine has published an article by Chris Anderson arguing that theory is dead (The End of Theory: The Data Deluge Makes the Scientific Method Obsolete). The argument: with our ability to generate vast amounts of data, there is no need for theory. Now, it's hard to parse what Anderson means by "theory" from the article. But he seems to be arguing that scientists are merely looking for correlations between various parameters, and claiming that's a sufficient analysis. Is it? Well, sometimes, yes, if it's based on a sound theoretical framework.
Deepak Singh has already called out Anderson (Chris Anderson, you are wrong), and Andrew at the Social Statistics blog has commented (The End of Theory: The Data Deluge Makes the Scientific Method Obsolete). I would like to weigh in with my perspective as an evolutionary biologist. Is theory dead in this subfield of biology?
In short, no, theory is not dead. Until a few decades back, theory was pretty much all we had in population genetics. There were some coarse experiments, but geneticists had yet to discover molecular biology. This was when the theoretical foundations of the field were developed. However, the molecular revolution changed all disciplines in biology, including those interested in answering evolutionary questions. First came allozyme analysis, then small-scale DNA sequencing, and, most recently, high-throughput technologies. Along with these technical achievements came massive increases in computational power. And the contemporary interdisciplinary approaches in biology have added new experimental approaches and expertise to the data deluge.
What happened to the theory? There's a saying in the field: first, we had all theory and no data; now, we have all data and no theory. That's not quite true, but it fits Anderson's thesis. What's actually happened is that we now have the data to test much of the theory that was developed prior to the data dump. Additionally, the theory provides a framework for the experimental design of the data collection. We don't just go out and collect data, although it may seem that way. Instead, the theory helps dictate which data to collect, how to collect it, and, finally, how to analyze it. Yes, the analysis of the data requires all that theory.
Is theoretical work dead now that we have so much data? No. In fact, I would argue that theory is as healthy as it has always been. There is merely more data and empirical work. Rather than a decrease in theory, there has been a massive increase in data, which makes it appear that the amount of theoretical work has decreased. Sure, the relative contribution of theory has decreased, but that's only because the amount of empirical work has increased.
We've been talking about this over at Nature networks too (including a link to my contribution).
I interpret Anderson as only thinking about prediction, where formally we don't need to understand the systems. But even then I still think using theory will help us to improve predictions.
And if we want to understand a system, I can't see how we can do it without a theory.
Anderson should try removing every theory-based calculation from his GPS device and take a walk in the desert.
To paraphrase Kant, "Theory without data is empty, data without theory is blind."
As somebody who sits on the data analysis end of theory, let me say that Chris Anderson is completely off his rocker. Data is completely and utterly meaningless without theory to interpret it.
Like most things Wired, Anderson's article is full of hype and exaggeration, but there is an interesting idea or two hiding there. I don't think there is an "End of Theory", but the idea that one must gather data with a hypothesis or theory *already* in mind (as per the grade school version of the "scientific method") is quite thankfully coming to an end. There is nothing wrong with collecting data and *then* looking for interesting trends in it which could eventually lead to new hypotheses or theories. This is what genomicists (and now metagenomicists) do now. For what it's worth, it can even be argued that this method is truer to the original inductive scientific method proposed by Francis Bacon than the grade school version.
I would second Jason Dick's comment--data is only data in relation to theory, otherwise it's noise.
My father had this long-term friend who would always ask me, "Are you positive about that?"
I'd reply, "Yes."
He'd say, "Well, only fools are positive."
With that in mind, I'd say everything is a theory, along the lines of "you can't prove anything, only disprove it"; by that logic, everything we know as fact is actually a theory. Granted, I can't disprove 1+1=2, but I can't prove it either (kind of a stretch, but I'm not a mathematician).
J Badger: There is nothing wrong with collecting data and *then* looking for interesting trends in it which could eventually lead to new hypotheses or theories. This is what genomicists (and now metagenomicists) do now. For what it's worth, it can even be argued that this method is truer to the original inductive scientific method proposed by Francis Bacon than the grade school version.
No, no, no! You can't gather data without having some form of theory in your head --- theory is the structure of meaning which allows you to collect data, to separate what you measure from what you don't measure. What you must mean is that you needn't have a fully worked out, explicit theory to collect data (which is very true). But data and theory are not independent -- they are two faces of the same coin.
You can't find trends or correlations without, in the first place, having decided what you're looking at, what is within the realm of relevance. Otherwise, you're going to spend your entire career correlating cell number in your dish to eyelid ticks of the birds outside your window.
I'm a genomicist, and I've always joked that my "hypothesis" when starting to analyze a new genome is that interesting new genes are going to be found there. Seriously, I don't know *what* will be interesting in a new genome until I analyze it. I haven't been disappointed yet, though. Typically, to satisfy the pedantic lovers of "hypothesis driven research" we claim in the paper that we sequenced the genome in order to find whatever cool things we found, but in all honesty, the most interesting things are invariably serendipitous and could not have been predicted by any hypothesis or theory.
Jonathan, I understand what you're saying. However, which genome you pick (ie, which taxon you sample, which species, and which individual within that species) or which sample you meta-sequence is a somewhat informed decision. You're not just collecting genomic data willy-nilly, right?
You can't find trends or correlations without, in the first place, having decided what you're looking at, what is within the realm of relevance. Otherwise, you're going to spend your entire career correlating cell number in your dish to eyelid ticks of the birds outside your window.
Hear, hear! That sums it up.
However, which genome you pick (ie, which taxon you sample, which species, and which individual within that species) or which sample you meta-sequence is a somewhat informed decision. You're not just collecting genomic data willy-nilly, right?
Obviously, every genome sequenced has been done so for a reason. Important crop plants have had their genomes sequenced out of a hope that the sequence will help crop breeders. Pathogens have been sequenced in the hope that this will aid medical research. Model organisms have huge research communities that likewise hope to benefit from the sequence. Evolutionary biologists clamor for more sequences of organisms in parts of the "Tree of Life" that have been neglected so far. Sometimes in this last case people *are* trying to test a pre-defined hypothesis like "Is organism X more closely related to Y or Z?", but in the other cases typically people are just hoping that the genome will yield *something* interesting and open up new paths for future research. As an evolutionary genomicist, I've been involved in both expressly evolutionary projects and served as the "tree guy" on other genome projects.
Interesting read (as well as the comments). Thanks for sharing.
It seems to me that genome papers are becoming repetitive and stale. There are exceptions, like the Pinot Noir genome paper, but most just read like statistics on the back of a baseball card. This is not to say that having a genome is not immensely useful. I am just wondering whether genomics/bioinformatics is now entering a phase where it will transform into a more predictive and hypothesis-driven science.