The Lott calling the kettle black

Posts by d-squared and John Quiggin on data mining and Lott reminded me that Lott accused his critics of data mining in a response to Webster:

The Black and Nagin paper excludes Florida after they have already excluded the 86 percent of the counties with populations fewer than 100,000. Eliminating Florida as well as counties with fewer than 100,0000 does eliminate the significance in the one particular type of specification that they report for a couple of crimes, but the vast majority of estimates were unaffected from this extreme data mining and they ignore that doing this actually strengthens some of the results.

and in a Reason interview:

I wanted all the data that were available....I didn't pick and choose, and when somebody drops out 86 percent of the counties along with Florida, you know they must have tried all sorts of combinations. This wasn't the first obvious combination that sprang to mind. And it's the only combination they report....If, after doing all these gymnastics, and recording only one type of specification, dealing with before-and-after averages that are biased against finding a benefit, they still find only benefits, and no cost, to me that strengthens the results.

So, how accurate is Lott's claim that Black and Nagin were doing "extreme data mining"?

Well, Lott's comment about dropping 86% of the counties is a red herring. Black and Nagin also got similar results if they included the small counties and just dropped Florida. They wrote:

"Nor is this result a function of our use of the large-county sample. Without Florida in the sample, the estimation of Lott and Mustard's model, which is given by equation (1), for all counties provides no evidence of an impact of RTC laws on homicide and rape."

Even if Lott failed to notice this sentence, he must have known that dropping the small counties didn't matter, since he said he reran all the regressions without Florida. Lott's accusation of "extreme data mining" was deliberately misleading.

Lott's 86% figure is also misleading. Maltz and Targonski noted problems in the county crime data that Lott used, with about 13% of counties having significant under-reporting. In his reply, Lott argued that because the regressions were weighted by population, the size of the problem was best measured by the percentage of the population in the problem counties, which was only 6.8%. And yet when he criticized Black and Nagin he used the percentage of counties that they dropped (86%), rather than the percentage of the population in those counties (about 30%).
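To see why the two measures can differ so much, here is a small sketch of the arithmetic. The populations are invented for illustration and are not taken from the county data set; `dropped_shares` is a hypothetical helper name.

```python
def dropped_shares(populations, threshold):
    """Return (share of counties dropped, share of population dropped)
    when every county below `threshold` is excluded."""
    dropped = [p for p in populations if p < threshold]
    county_share = len(dropped) / len(populations)
    population_share = sum(dropped) / sum(populations)
    return county_share, population_share

# Invented example: many small counties and a few large ones.
populations = [20_000] * 8 + [500_000] * 2
county_share, population_share = dropped_shares(populations, 100_000)
print(f"counties dropped: {county_share:.0%}")
print(f"population dropped: {population_share:.0%}")
```

Because small counties are numerous but hold relatively few people, the share of counties dropped will always look more dramatic than the share of population dropped.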

Is it data mining to check to see if the results depend on the inclusion of a particular state? No, that is a legitimate test of the robustness of the result. It does not prove that the carry laws did not reduce crime, but it strongly suggests that something is wrong with the model.
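A check of this kind can be sketched in a few lines. This is only an illustration under loud assumptions: the data are invented, the `County` record and function names are hypothetical, and the "effect" is a simple population-weighted difference in means, not the regression model that Lott and Mustard or Black and Nagin estimated.

```python
from collections import namedtuple

# Hypothetical record: one row per county, all numbers invented.
County = namedtuple("County", "state population has_law crime_change")

def weighted_mean(rows):
    """Population-weighted mean of crime_change."""
    total = sum(r.population for r in rows)
    return sum(r.population * r.crime_change for r in rows) / total

def law_effect(rows):
    """Weighted mean crime change in law counties minus no-law counties."""
    with_law = [r for r in rows if r.has_law]
    without_law = [r for r in rows if not r.has_law]
    return weighted_mean(with_law) - weighted_mean(without_law)

def leave_one_state_out(rows):
    """Re-estimate the effect with each state dropped in turn."""
    results = {}
    for state in sorted({r.state for r in rows}):
        kept = [r for r in rows if r.state != state]
        results[state] = law_effect(kept)
    return results

# Invented data in which a single state (A) drives the whole result.
counties = [
    County("A", 100, True, -10.0),
    County("B", 100, True, 0.0),
    County("C", 100, False, 0.0),
    County("D", 100, False, 0.0),
]

full_estimate = law_effect(counties)
by_dropped_state = leave_one_state_out(counties)
print("all states:", full_estimate)
for state, estimate in by_dropped_state.items():
    print(f"without {state}:", estimate)
```

If the estimate vanishes when one state is removed, as it does for state A in this made-up example, the result rests on that state alone. Running the check for every state, and not just one hand-picked state, is what separates a robustness test from a search for a convenient subsample.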
