"Finding signal from noise": Dr. Bancel responds

The other day I commented on an article by Peter Bancel and Roger Nelson that reported evidence that "the coherent attention or emotional response of large populations" can affect the output of quantum-mechanical random number generators.

I was pretty dismissive of the article; in fact elsewhere I gave my post the title, "Some ESP-bashing red meat for you ScienceBlogs readers out there."

Dr. Bancel was pointed to my blog and felt I wasn't giving the full story. I'll give his comments and then at the end add some thoughts of my own. Bancel wrote:

I find it disappointing that a Columbia faculty member should, in his public blog, be content to substitute facile derision for informed argument in criticizing a research article. It is an unfortunate choice, as it merely adds to today's wearisome environment of ad hominem public discourse, while missing an opportunity to educate.

I won't bother to explain here the errors in your post. Such explanations are all in the article - you would only need to spend more than "a few minutes" to appreciate that. There we explain why the RNGs are shielded, and we emphasize that the effect size is very small, which is essential to understanding why the experiment is run and analyzed the way it is. We do verify our results (as you suggest) with a re-sampling analysis over the full database of 4,000 days. All of this, again, is detailed in the paper.

You say these issues are incidental to your main critique. But it is not clear just what your main objection is. You indicate that the article is "very professional", but flawed, because we propose no theoretical framework (my interpretation of your second paragraph). This might be the entry point for an interesting discussion. But then - after tangential remarks - you pick this up at the end by suggesting (if I correctly read past the polemics) that we blindly manipulate our data, which is grossly wrong, and inconsistent with your opening comments.

It is regrettable that you have used a public forum to misrepresent work which you have, as you state, spent but a few minutes reviewing. It is also unfortunate if you have passed these misrepresentations on to a journalist. In your American Scientist article this summer, you warn journalists not to be misled by brash statements and to seek the best advice of scientists when writing about science. That works only if the scientists go the whole nine yards and make the honest effort to give good advice.

Lastly, I would say that I am doubly disappointed in your post since your own expertise is complementary to our own, and we benefit from any valid criticism based on a careful reading of our paper. Without hesitation, I'd say we welcome it.
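
(An aside for readers who haven't seen this kind of check before: the re-sampling verification Bancel mentions is a standard idea. The sketch below is my own illustration of the logic, with made-up numbers; it is not the GCP's actual data or code, and the one-z-score-per-day format is an assumption.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the database: one network-wide z-score per day
# over roughly 4,000 days. (The real data are far more granular; this only
# illustrates the resampling logic, not the paper's actual analysis.)
n_days = 4000
daily_z = rng.standard_normal(n_days)   # pure noise in this toy version

# Hypothetical set of pre-registered "event" days.
event_days = rng.choice(n_days, size=300, replace=False)
observed = daily_z[event_days].mean()

# Resampling null: draw many random day-sets of the same size and ask how
# often their mean z-score is at least as large as the observed one.
null_means = np.array([
    daily_z[rng.choice(n_days, size=event_days.size, replace=False)].mean()
    for _ in range(5_000)
])
p_value = (null_means >= observed).mean()

# With pure noise and randomly chosen "events", p should hover near 0.5.
print(f"observed mean z = {observed:.3f}, resampling p = {p_value:.3f}")
```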

I replied:

Thanks for the response. My blog represented my opinion based on a quick look, but I agree this is not my area of expertise. I would like to run your response (without comment from me) on both of my blogs. Would that be ok with you? I would like the readers to get both sides of the story.

To which Bancel replied:

If you feel my personal email to you is appropriate to post, please do so, but a brief response of explanation might be more interesting.

Perhaps you could suggest a couple of issues that I could address as it is still not clear to me just what your objections are.

I did appreciate two points you indicate in your post.

One is that you distinguish between the analysis and the topic itself. Most researchers conflate the admittedly questionable GCP hypothesis and the quality of analysis. These are, of course, separate issues.

The other is the difference in style between the social sciences and physics. This leads to unnecessary misunderstandings. I benefit enormously from interacting with scientists in different disciplines but the challenge is always to understand the mindset, since it determines how people frame the questions they ask.

As far as experimental physics (and the hard experimental sciences in general) and statistics are concerned, there are really three worlds here. The first is laboratory research, where one works hard to achieve huge effect sizes; in this world one usually doesn't need much statistical sophistication. The second is modeling and simulation, which is highly coupled to theory. The last is the experimental study of "natural records"; this includes astronomy, geology, climate science, etc. Here you often take what you get, and the data can be noisy, heteroscedastic, etc., so statistical sophistication is key. This is a caricature and of course these all overlap and interact. My point is that physics obviously isn't a monolith and good physicists may need some skills and not others. Presumably their training allows them to acquire new skills as needed, often with the helpful guidance of colleagues in other fields.

I don't think it's appropriate for me to give a long reply here, so I'll just make a couple of general comments.

1. I think the biggest issue is that ESP is something that Bancel and Nelson are particularly interested in, but it's not something that I care about much at all. I don't want to go around claiming that ESP isn't real, or anything like that--I think it's enough to say that whatever effects are there, are very small, so small that they don't particularly interest me.

In contrast, I get much more irritated when people do bad science on topics that are potentially important (for example, the crappy studies I've mentioned on the blog on occasion, on topics such as political effects of the number of cabinet ministers in a country, or the alleged irrationality of voting, or the purported liberal voting tendencies of rich people, or, hmmm, was there something once about engineers having beautiful babies, or something like that . . . I can't quite remember . . .). Some statisticians get particularly outraged about shaky medical claims, but I don't know enough about medicine to get involved in such fights.

2. ESP statistics is pretty sophisticated. When you have large effects, you don't necessarily need sophisticated methods. But when effects are very weak, you might need very large sample sizes, sophisticated corrections for nonsampling errors, multiple comparisons adjustments, and so forth. I respect the statistical methods that have been developed in ESP research (and in psychometrics more generally), but I think they're still in a tough spot because of the small magnitudes of the effects they're studying.
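
To give a sense of scale, here is a rough back-of-the-envelope power calculation for a one-sample z-test. The assumed effect of 0.003 standard deviations is an arbitrary stand-in for "very weak"; it is not a number taken from the paper.

```python
from scipy.stats import norm

# Rough power calculation for a one-sample, two-sided z-test.
# The effect size (0.003 standard deviations) is an arbitrary stand-in
# for "very weak"; it is not a figure from the paper.
alpha, power, effect_sd = 0.05, 0.80, 0.003

z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
z_beta = norm.ppf(power)            # quantile giving the desired power
n_required = ((z_alpha + z_beta) / effect_sd) ** 2

print(f"observations needed for 80% power: about {n_required:,.0f}")
# -> roughly 870,000 observations for a 0.003-sd effect
```

With an effect that small you need on the order of a million observations just to reach conventional power, which is part of why such studies lean so heavily on huge automated datasets and careful corrections.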

As I've always said, what makes a statistician look good is not teasing out a small effect, but finding a huge effect that hasn't been noticed before. Sometimes fancy methods can help us find big effects (as in Red State, Blue State), but then we should be able to go back to the raw data and find these as well (again, as in Red State, Blue State). In that sense, the fancy methods are helping us do a more effective job of exploratory data analysis.

I don't have anything more to say about the Bancel and Nelson article in particular. It's out there, and youall can make your own judgments of it. (Please be polite in any comments. I appreciate that Bancel responded to my blog, and I don't want to reward him with a bunch of rude replies. Thanks.)

I like the idea of using the network of random number generators, but I am very sceptical of two things: the need to aggregate responses to events, and the assumed durations of reactions to events. If psi phenomena exist, you should be able to see significant outliers corresponding to single events. As Andrew said, "what makes a statistician look good is not teasing out a small effect, but finding a huge effect that hasn't been noticed before." Effect size is particularly important for hotly debated phenomena like psi.

Secondly, what was the methodology for determining events? They are all over the place! If you use the second anniversary of 9/11, then use *all* anniversaries of 9/11. If you use the death and funeral of John Paul II, then use the death and funeral of Benazir Bhutto. If you use the nomination of one set of US presidential and vice-presidential candidates, use *all* such nominations. There is one event per convention in 2004, but one event for Obama, one for McCain, and one for Palin. None for Joe Biden. If you use one Miss World, use all within the time frame you're investigating. At least the list of major terrorist attacks is halfway comprehensive. And what is the methodology for determining event durations?

OK, I'll stop here or I'll get rude and snarky.
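
On the commenter's first point, about aggregation: here is a quick sketch of why researchers combine across events when the presumed effect is tiny. The per-event shift of 0.15 standard deviations and the Stouffer-type combination below are my own stand-ins, not the statistic actually used in the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Purely illustrative: 500 registered events, each summarized by one
# z-score, with an assumed tiny true shift of 0.15 standard deviations.
n_events, true_shift = 500, 0.15
event_z = rng.standard_normal(n_events) + true_shift

# The largest single-event z is about what you would expect by chance
# among 500 draws, so no individual event looks like an outlier.
print(f"largest single-event z: {event_z.max():.2f}")

# A Stouffer-type combination across all events can still be sizeable.
stouffer_z = event_z.sum() / np.sqrt(n_events)
print(f"combined z: {stouffer_z:.2f}, one-sided p: {norm.sf(stouffer_z):.4f}")
```

In this toy example no single event stands out, yet the combined statistic is nominally significant; that, of course, is exactly the situation the commenters find unsatisfying.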

There is a basic epistemic problem with the combination of a) no proposed mechanism that provides a wedge for other kinds of testing and b) the claim of a small effect that can be teased out only with very sophisticated analysis. The experimenters are using this as evidence that people's thoughts can affect quantum RNGs. But that combination is just as much evidence that something else has somehow slipped past the experimental setup. It need not be much, and it need not be frequent, given that the signal is so small. The lack of a proposed mechanism makes it difficult to investigate the proposed phenomenon by other means, which means that all there is is the raw, small result.

Three comments, which I will state succinctly. First, if the effect is so very, very small, even if the statistics are describing a valid phenomenon, what meaning, i.e., what use, might the phenomenon have? That is a rhetorical question, obviously. If the phenomenon has no viable use, so what?

My second comment has to do with adjusting sample size to obtain a desired result. There is a statistical hazard in very large samples that most non-statisticians don't realize: as the sample size increases, even a tiny systematic bias or departure from the model becomes statistically significant, producing what is effectively a false positive.

Finally, in the world of the very, very small, all effects are fluid anyway, so an attempt to "quantize" such an elusive psychic effect truly is meaningless in any concrete way.
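
The second comment above can be made concrete. In the sketch below, a hypothetical RNG has a minuscule systematic bias (one part in ten thousand in the hit probability) that has nothing to do with psi; with enough bits, a standard test declares it decisively significant. The bias size and bit count are arbitrary assumptions, not figures from the study.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Hypothetical RNG with a minuscule systematic bias: P(bit = 1) = 0.5001
# rather than 0.5. The bias has nothing to do with any psi effect.
p_biased, n_bits = 0.5001, 1_000_000_000
ones = rng.binomial(n_bits, p_biased)

# Standard z-test of the null hypothesis that the RNG is fair (p = 0.5).
z = (ones - 0.5 * n_bits) / np.sqrt(0.25 * n_bits)
print(f"z = {z:.2f}, two-sided p = {2 * norm.sf(abs(z)):.2e}")
# With a billion bits, even this tiny bias is decisively "significant",
# though the test says nothing about what caused it.
```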