Why most discovered true associations are inflated: Type M errors are all over the place

Jimmy points me to this article, "Why most discovered true associations are inflated," by J. P. Ioannidis. As Jimmy pointed out, this is exactly what we call type M (for magnitude) errors. I completely agree with Ioannidis's point, which he seems to be making more systematically than David Weakliem and I did in our recent article on the topic.

My only suggestion beyond what Ioannidis wrote has to do with potential solutions to the problem. His ideas include: "being cautious about newly discovered effect sizes, considering some rational down-adjustment, using analytical methods that correct for the anticipated inflation, ignoring the magnitude of the effect (if not necessary), conducting large studies in the discovery phase, using strict protocols for analyses, pursuing complete and transparent reporting of all results, placing emphasis on replication, and being fair with interpretation of results."

These are all good ideas. Here are two more suggestions:

1. Retrospective power calculations. See page 312 of our article for the classical version or page 313 for the Bayesian version; a sketch of the basic calculation appears after this list. I think these can be considered implementations of Ioannidis's ideas of caution, adjustment, and correction.

2. Hierarchical modeling, which partially pools estimated effects and reduces Type M errors while also handling many multiple-comparisons issues; see the second sketch below. Fuller discussion here (or see here for the soon-to-go-viral video version).
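
To make the first suggestion concrete, here is a minimal sketch of the sort of design calculation I have in mind, not the exact procedure from the article: given a hypothesized true effect and the standard error of the estimate, it computes how often the estimate comes out statistically significant, how often a significant estimate has the wrong sign (Type S), and by what factor significant estimates exaggerate the true effect (Type M). The function name and the numbers in the example are made up for illustration.

```python
import numpy as np
from scipy import stats

def retrodesign(true_effect, se, alpha=0.05, n_sims=100_000, seed=0):
    """Power, Type S rate, and Type M exaggeration for a given design."""
    rng = np.random.default_rng(seed)
    z = stats.norm.ppf(1 - alpha / 2)
    # Power: chance the estimate is statistically significant at level alpha.
    power = stats.norm.cdf(-z - true_effect / se) + stats.norm.sf(z - true_effect / se)
    # Simulate repeated noisy estimates around the true effect.
    estimates = rng.normal(true_effect, se, n_sims)
    significant = np.abs(estimates) > z * se
    # Type S: among significant results, how often is the sign wrong?
    type_s = np.mean(np.sign(estimates[significant]) != np.sign(true_effect))
    # Type M: average |estimate| / |true effect| among significant results.
    type_m = np.mean(np.abs(estimates[significant])) / abs(true_effect)
    return power, type_s, type_m

# A small true effect (0.1) measured imprecisely (se = 0.3): significant
# estimates will exaggerate the truth by a large factor on average.
power, type_s, type_m = retrodesign(0.1, 0.3)
print(f"power = {power:.2f}, Type S = {type_s:.3f}, exaggeration = {type_m:.1f}x")
```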

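And a toy version of the second suggestion, showing only the shrinkage mechanism: the normal-normal model, the moment estimator of the between-group variance, and the five invented "study" estimates are all simplifications, not any particular published analysis. The point is that the most extreme raw estimate gets pulled toward the common mean, which is exactly what cuts down Type M errors.

```python
import numpy as np

def partial_pool(estimates, std_errors):
    """Empirical-Bayes shrinkage under a normal-normal hierarchical model."""
    y = np.asarray(estimates, dtype=float)
    se2 = np.asarray(std_errors, dtype=float) ** 2
    grand_mean = np.average(y, weights=1 / se2)
    # Moment estimate of the between-group variance tau^2 (floored at zero).
    tau2 = max(np.var(y, ddof=1) - se2.mean(), 0.0)
    # Shrinkage weight: 0 pools completely, 1 keeps the raw estimate.
    w = tau2 / (tau2 + se2)
    return w * y + (1 - w) * grand_mean

# Five hypothetical study effects, each with standard error 5: the outlying
# 28 is pulled in toward the common mean.
print(partial_pool([28.0, 8.0, -3.0, 7.0, -1.0], [5.0] * 5))
```
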

Hi Andrew,

I read Russ Lenth's article

(Lenth, R. V. (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55, 187-193),

where he describes retrospective power calculations with the observed effect and sample size as answering an "empty question," because had the study been powerful enough, the result would have been significant (though not necessarily scientifically/clinically significant).

Are you talking about retrospective power calculations with the observed effect and sample size? If so, how is this useful?

I think Lenth means something else than I do. Basically all I know about retrospective power calculations is in my article with Weakliem.
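
For what it's worth, here is a quick sketch of Lenth's "empty question" point as I understand it: under the usual normal approximation, "observed power" (power computed by plugging in the observed effect) is a deterministic function of the p-value, so it adds nothing beyond the p-value itself. The z-values below are arbitrary.

```python
from scipy import stats

def observed_power(z_obs, alpha=0.05):
    """Power of a two-sided z-test, evaluated at the observed z statistic."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.cdf(-z_crit - z_obs) + stats.norm.sf(z_crit - z_obs)

# Each p-value maps to exactly one "observed power" value; e.g. p = 0.05
# always corresponds to observed power of about 0.50.
for z_obs in (1.0, 1.96, 2.5):
    p = 2 * stats.norm.sf(abs(z_obs))
    print(f"p = {p:.3f} -> observed power = {observed_power(z_obs):.2f}")
```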

Retrospective power calculations are conceptually (and computationally) very straightforward. Counterfactually, suppose everything about the study were exactly the same except the sample size... So, if I had had twice the sample size, would the result have been statistically significant? OK: copy the data twice and redo the analysis (or just change the recorded "n" appropriately in the formulas). Getting some sense of the fraction of the sample size that would have given statistical significance might be a useful metric for some.
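
Something like the following, say. This is a rough sketch of the "change the recorded n" version of the calculation, using a one-sample t-test for concreteness; the observed mean and standard deviation are invented.

```python
import numpy as np
from scipy import stats

def significant_at_n(mean, sd, n, alpha=0.05):
    """Two-sided one-sample t-test of zero, with mean and sd held fixed at a hypothetical n."""
    t = mean / (sd / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    return p < alpha

# Invented data: mean 1.2, sd 4.0, observed n = 20 (not significant).
# Scan hypothetical sample sizes to see where significance would kick in.
mean, sd = 1.2, 4.0
for n in (20, 40, 60, 80, 100):
    print(n, significant_at_n(mean, sd, n))
```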

And with a study in hand, it's a quick-and-dirty way to do a power calculation for a future study (especially when there are multiple covariates that need to be adjusted for). Stephen Senn has published something on finessing the calculations, and someone else has a paper on refining power calculations for a future study.

To me it's of interest in that people's differential reaction to copying the data twice versus changing the "n" in the formulas demonstrates the difficulty of understanding random versus systematic approximation (i.e., simulation versus quadrature).

Keith

By Keith O'Rourke on 28 Nov 2009