David Kane on Lancet confidence intervals

David Kane has asked me to post his argument that

Roberts et al. (2004) claim that the risk of death increased by
2.5-fold (95% CI 1.6-4.2) in Iraq after the US-led invasion. I provide
evidence that, given the other data presented in their paper, this confidence
interval must be wrong. Comments and corrections are welcome.

Let me kick things off with my comment:

His argument turns on the CI for the post-invasion mortality rate (including Falluja) of 1.4-23.2. I would suggest that he has proven that this CI is wrong (as it obviously is, since there is no way the mortality rate could be below 4) rather than that the risk ratio CI is wrong.

I thought that the CI for the post-invasion mortality rate with Fallujah included was bizarre and said so in the previous round of posts. You have 32 clusters and your post-invasion mortality CI is 5.6 to 10.2. Throw in Fallujah with its huge excess death toll and your post-invasion CI is 1.4-23.2? So one cluster with a huge death toll increases the probability of a lower mortality rate? It's the relative risk CIs that make intuitive sense: 1.1 to 2.3 without Fallujah and 1.6 to 4.2 with it. Robert's graphs were like the relative risk CIs--I mean, they behaved in the intuitively expected way. I thought of emailing the Lancet 1 authors to ask if there's an error in their CIs, but they probably get enough email from nuts without me adding to their spam list.

By Donald Johnson (not verified) on 24 Jul 2007 #permalink

David Kane seems to be asking how the variance of a ratio can be small, when both the variance of the numerator and the denominator are big. Say we have r = X(post)/X(pre). Can r be estimated precisely, even if X(pre) and X(post) are estimated with considerable error?

Kane assumes that the answer is no, but this assumption is wrong. When X(pre) and X(post) are highly correlated, the ratio of their means can be estimated precisely, even if the individual means can't be.
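To make that concrete, here is a minimal R sketch of the point. The numbers are invented purely for illustration and are not taken from the study: roughly 32 cluster-level rates where the post-war rate is about double the pre-war rate within each cluster.

set.seed(1)
n <- 32
pre  <- rgamma(n, shape = 2, rate = 0.4)      # noisy cluster-level pre-war rates
post <- 2 * pre * exp(rnorm(n, 0, 0.1))       # post-war rates, highly correlated with pre
boot <- replicate(10000, {
  i <- sample(n, replace = TRUE)              # resample clusters
  c(pre = mean(pre[i]), post = mean(post[i]), rr = mean(post[i]) / mean(pre[i]))
})
apply(boot, 1, quantile, c(0.025, 0.975))
# the intervals for the pre and post means are fairly wide, but the interval for
# rr sits tightly around 2, because numerator and denominator move together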

1) Thanks to Tim for posting this and for providing such a useful forum for Lancet discussion. If it were not for Deltoid, I would never have gotten involved in this dispute.

2) Tim is wrong to suspect that the confidence interval for post-war mortality is incorrect. As my paper highlights, Falluja is such an outlier that the range given in the Lancet is almost certainly correct. Moreover, I have sent a copy of the paper to the authors and Les Roberts insists that they stand by all their results. If Tim has reason to believe that the post-war mortality confidence interval is wrong, he should mention it to Roberts.

3) Ragout is confused on three issues. First, as the paper demonstrates, the correlation between estimates of pre-war and post-war mortality is irrelevant (assuming normal distribution). The simulation proves this. You can assume a correlation of anything from 1 to -1 and you get, more or less, the same answer. Second, even if it were true that a high positive correlation affected the results, that would not matter in this case since there is no evidence of this in the data. If anything (although this is somewhat a side point), there is a slight negative correlation (I think) between pre/post mortality estimates among clusters. Third, there is no point in talking about whether or not we can estimate the relative risk "precisely." What is the definition of "precisely?" Maybe the correct RR confidence interval (probably something like 0.6 to 5) is "precise." Maybe it isn't.

The point is that, I think, it is a mathematical fact that, if the other information in the paper is correct, the central result (RR 2.5 with confidence interval 1.6 -- 4.2) must be wrong.

2 + 2 != 5, however much the defenders of Roberts et al (2004) might wish it to be the case.

By David Kane (not verified) on 24 Jul 2007 #permalink

Kane,

It seems that you are mistaken regarding some basic points of statistics. Before we clarify those points it would be hard to get into the specifics of your argument.

The original analysis was done in a frequentist framework. When carrying out such an analysis, there is no distribution of an unknown parameter (such as CMRpost) to talk about, since the parameter is not assumed to be a random variable.

Therefore, statements such as P(CMRpost < 3.7) = this-or-that, which are liberally strewn about your paper, are completely meaningless.

1) Donald Johnson is a knowledgeable writer on this topic, so I don't want to be rude. But, really, look up the definition of variance. A single outlier data point will significantly increase the estimated variance and, therefore, the confidence intervals. That is simply the formula.

2) As much fun as it is to argue with anonymoids like Sortition, I don't see his point. Although I prefer a Bayesian framework (and goodness knows that virtually every public appearance by any of the Lancet authors features a Bayesian version of their results), the exact same problem arises if you are a Frequentist. If the confidence intervals for pre- and post-war mortality given in the paper are correct, then the confidence interval for the relative risk must be wrong. It does not matter if you are Bayesian or Frequentist. The math is the same. 2 + 2 != 5

By David Kane (not verified) on 24 Jul 2007 #permalink

David,

Your simulations prove nothing because they assume the data are normally distributed. Given the inclusion of Falluja, the data are wildly nonnormal (and not even unimodal).

The annual mortality rate per 1000 in post-war Fallujah was 100-200 above the Iraq-wide mean of 12.6. If the data were normal, mortality rates of -100 and below should be equally likely. But of course, negative mortality rates are not possible, and hugely negative mortality rates are a serious problem with your method.

To say this another way, if the data were really distributed normally with mean 12.3 and SD (12.3-1.4)/1.96, then the odds of observing a Fallujah in a sample of 33 clusters should be infinitesimal. Since we do observe Fallujah, and we think that this is a typical sample (as you argue), it follows that the assumption of normality is wildly wrong.
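To put a number on "infinitesimal", here is a one-line check in R using the figures above. The SD is the one implied by the reported CI, and treating a Fallujah-like rate as roughly 100 above the mean is my own rounding of the range quoted above:

sd.post <- (12.3 - 1.4) / 1.96                            # about 5.6
1 - pnorm(12.3 + 100, mean = 12.3, sd = sd.post)          # an ~18-sigma event: effectively zero
33 * (1 - pnorm(12.3 + 100, mean = 12.3, sd = sd.post))   # still effectively zero across 33 clusters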

It is hard to take seriously a paper that purports to prove "mathematical facts" and yet is based on arguments that are mathematically false.

Your "it doesn't matter much" is not a credible response. If it doesn't matter, let's see you make the correct argument and then we can address it.

1) Sortition, I don't know how to say this any more clearly: The exact same proof applies if you are a Frequentist. Since the authors themselves never describe their results this way (see Burnham's quote), I don't see much point in using the mind game of repeated samples. If there is a particular part of the proof that you don't understand, be specific. I am eager for feedback.

2) Ragout. It is not I who assume that the confidence intervals for pre- and post-war mortality are normally distributed. It is the authors. Now, they may be right to make this assumption. They may be wrong. But it is their assumption or, rather, the assumption of the statistical software that they use.

3) But even if you wanted to use the data from Falluja to reject a normal distribution (a reasonable thing to do), any fatter tailed distribution (say, t), makes the problem worse not better. The intuition should be obvious. The fatter the tails of the distribution for post-war mortality, the more likely it is that post-war mortality is below 3.7.

4) None of this is to say that there isn't a way to change the model to get the answers that the authors want to get. If you assume that the distribution of post-war mortality is right-skewed (i.e., that only big increases in death are possible, not big decreases), then you ought to be able to somehow get the 1.6 -- 4.2 confidence interval for the relative risk. But that is not the model that the authors used. (See their paper for the exact description.) Therefore, they must publish a correction, or withdraw the paper.

5) Note that I would not spend too much time on that exercise. If you try to create a new model, you would need the raw data to estimate it. The authors now report that the data is no longer "available." I am trying to get clarification on just what they mean by that, but my sense is that the data is gone. (The individual level data, not the aggregate data that Tim kindly posted.)

6) By the way, that thread from Tim's posting of the data brings back lots of fond memories, eh? I miss Seixon! And hope that dsquared chimes in on this topic. But note that BrendanH was the first (?) to suggest that the relative risk confidence interval must be wrong if Falluja is included. See his full results here. I am not sure how to translate between what he has done and what I did, but it seems like we agree that, if you include Falluja, you can't reject the null hypothesis that mortality in Iraq is unchanged.

By David Kane (not verified) on 25 Jul 2007 #permalink

David, it looks from a cursory scan of your paper that you're assuming that the ratio of two normally distributed variables is itself normally distributed (or at least, this seems to be the implication of your multiplying the confidence interval by the risk ratio). This is not the case for anything other than independent variables, and even small departures from independence can mean that the true distribution of the ratio is very very different from normal (this is why the weak instruments problem in IV estimation is so serious). I'm busy today so I can't be sure but could you confirm whether I'm right in identifying this assumption?

Erratum to the above after checking my own posts on IV estimation - the ratio of two normal distributions is Cauchy distributed, and there aren't any non-trivial cases where the ratio is exactly normal.

David says he misses comments from Seixon. For me that's like missing a bad hangover or worse still, a serious case of Montezuma's revenge.

Given all of your hand waving, David, I just wonder how many Iraqi civilians you think have died in what was and is an illegal, catastrophic war conducted on the basis of lies and deceit. Moreover, since the U.S. State Department called the mineral resources of the region "One of the greatest material prizes in history" and a source of "Stupendous strategic power" (in 1950), and senior planners like Kennan and Brzezinski said that any country controlling this 'material prize' had "Veto power over the global economy", I'd like to know your rationale for the efforts you appear to be making to legitimize the invasion. How many corpses is enough in defense of naked aggression?

By Jeff Harvey (not verified) on 25 Jul 2007 #permalink

Just to restate what others have said (basically, so if it comes to vote counting, I've expressed a vote):
if we start with a population of Iraqi districts post-invasion, we can estimate death rates (although almost nobody seems interested in doing so). We can, if we choose, say they are all in the same population (located inside Iraq), and note there is a huge variation in death rates. Or, we can note that the data correlate with political fact, and that a single district, involved in high-intensity war, has a higher death rate. If we restrict our population to districts that are not involved in high-intensity war, we get a more homogeneous population, covering most of the country at that point, and have sampled one site which is politically and statistically distinct.
I come across the same work in my field. I think it's much more useful to have estimates of a specified population (say, weights of adults) than to have data that's not so easily categorized (weights of humans, including fetuses, 3 week-olds, microcars filled with midgets, etc.).
Kane's argument is made. Personally, I think it's a lousy one, with a poorly specified universe.

also to note the standard textbook caveats about extrapolation of statistical estimates a long way outside the range of data. To use the Fallujah datapoint as evidence for the existence of significant probability mass below RR=1 is to postulate the existence of an unobserved "anti-Fallujah", where the crude mortality rate fell as much as it rose in Fallujah. The fact that CMR is bounded below by zero (unless the Rapture occurred during the sample period, which it didn't) means that an "anti-Fallujah" is actually impossible. To get the same effect on the confidence interval as an "anti-Fallujah" without resurrecting the dead would require lots and lots of unobserved clusters where the mortality rate fell substantially, which raises the question of why more of them weren't sampled.

Thanks to dsquared for his comments. High quality discussion like this is precisely why I asked Tim to post the paper at Deltoid. To substance:

1) I am not assuming anything about the distribution of the ratio of two normal variables. Although the paper has details, the basic trick is that, if pre- and post-invasion CMR is normally distributed, then the difference between the two is normally distributed. I show that the distribution of this difference overlaps significantly with zero. If that is true, then the lower bound of the relative risk confidence interval is too high.
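For concreteness, here is that difference-of-normals calculation sketched in R. It assumes normality and zero correlation between the two estimates and uses only the point estimates and confidence bounds quoted in this thread, so it is an illustration of the argument rather than the paper's (or the formal proof's) exact procedure:

se.post <- (12.3 - 1.4) / 1.96        # SE implied by the reported post-invasion CI
se.pre  <- (5.0 - 3.7) / 1.96         # SE implied by the reported pre-invasion CI
diff.mean <- 12.3 - 5.0
diff.se   <- sqrt(se.post^2 + se.pre^2)
pnorm(0, mean = diff.mean, sd = diff.se)   # P(CMR_post - CMR_pre < 0), roughly 0.1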

2) Yet dsquared's comment raises another possible approach. Assuming zero correlation between the estimates, we can easily simulate the RR using the given CMRs. For R users, this would be:

> quantile(rnorm(mean = 12.3, sd = (12.3 - 1.4)/1.96, n = 10000) / rnorm(mean = 5, sd = (5 - 3.7)/1.96, n = 10000), c(0.025, 0.5, 0.975))
2.5% 50% 97.5%
0.3032679 2.4624131 5.0236085
>

Note that the mean is what the paper gives us. The upper bound is high, but not by much. The inconsistency is, as in my paper, with the lower bound.

Now, obviously, this approach is not how the original paper estimates the RR. But it is further circumstantial evidence that, if the CMRs are correct (and I am pretty sure that they are), the RR must be wrong.

3) As to Jeff Harvey's charming question, the more that I study this topic, the more convinced I am to trust Jon Pedersen's judgment: 100,000 excess violent deaths.

By David Kane (not verified) on 25 Jul 2007 #permalink

David, you're just window-dressing a problem of post-hoc covariate analysis. There are clearly two types of town in Iraq - those being blown to shit by Americans, and those being only slightly blown to shit by Americans. This was only discovered after the fact in the Lancet study, so a post-hoc analysis of deaths by high- and low-risk areas was necessary. Unfortunately, there was only one observation of a high risk area. Had the cluster survey been 10 times the size, maybe there would have been 10 clusters from "towns being blown to shit by Americans" and we'd be able to do a post-hoc high- vs. low-risk analysis.

Given we have only one such cluster, i.e. insufficient data, the natural thing to do is to exclude the high-risk set and do only a very qualitative discussion of how it might affect the data.

This happens all the time in Epidemiology. For example you do a study of sexual behaviour and HIV incidence in gay men, but discover at the end of the study that 5 of your 100 subjects are also injecting drug users, and they all got HIV. Obviously you know these people are at higher risk of HIV, but you don't have enough data to compare them with the rest sensibly. Had you 10 times as many of these people you could do a sensible covariate analysis but you can't, so you just put them aside so as not to bias your main findings. I don't really see the difference.

Also David, consider this thought experiment: randomized cluster survey of civilian mortality in England in 1940. You accidentally get a single borough in London (i.e. focus of the blitz), and are forced to conclude no change in mortality because your sample variance is too high. Do you really believe that in 1940 in England such a study would be correct? Of course, had you sampled 10 times as many clusters you would have got a borough from Plymouth, Yarmouth, ... and you would be able to conclude that in the countryside mortality was unchanged, but in the Southern cities it was much higher. Can you see any difference here?

David, thanks for your answer. In other words, mass carnage and slaughter. A vast crime against humanity, for which the occupiers are obliged to pay reparations. Moreover, I suppose this also means that you support the idea of convening war crimes trials for the various leaders and their cronies involved in this debacle e.g. Bush, Cheney, Rumsfeld, Blair, Straw, Hoon, Berlusconi, Aznar etc.

By Jeff Harvey (not verified) on 25 Jul 2007 #permalink

Of course, had you sampled 10 times as many clusters you would have got a borough from Plymouth, Yarmouth, ... and you would be able to conclude that in the countryside mortality was unchanged, but in the Southern cities it was much higher. Can you see any difference here?

I disagree.

Had you taken a larger sample in the area of focus, you may have been blown to bits and there'd be no number, hence it would not have happened because we wouldn't know about it.

This is a common problem, collecting data in a war zone. Quibblers should travel to Fallujah and do their own study, just like the Intrepid Auditors are traveling to the surface temperature stations to overturn the very fundament of science.

Best,

D

David, you don't explain why you think the CMRs are correct. Yeah, outliers increase variance (I'm so well read I actually knew that). Robert (who I hope shows up here) did his own analysis of the data and adding Fallujah increased the CI, but it also shifted the whole thing to the right, as intuition would suggest, and the probability distribution he got for the excess deaths has several peaks, which I assume has something to do with how often the Fallujah cluster pops up in resamplings. Speaking as the ignorant layperson here, if I saw the result the L1 team got with the CMR including Fallujah, I'd suspect the software package was employing an inappropriate mathematical model. Like dsquared, it's not clear to me that finding a bombed-out place like Fallujah suggests the existence of other places where mass resurrections occurred.

SG, I tried that argument myself, in modified form. What if Bush had dropped a nuke on Fallujah, killing 200 in that cluster? Think of the variance and just imagine how much this would increase the probability that the invasion of Iraq lowered death rates. It didn't seem to strike David as a reductio ad absurdum like I hoped.

Anyway, in the interest of furthering my statistical education, can someone concoct a nice simple example we could do by hand which would show how adding a very high mortality cluster to a sample of low mortality clusters would widen the probability distribution so much that there is now a greater chance of lower death rates than before? I'd like to see this if it can be done. I hate computer statistical packages which give wildly nonintuitive results if you can't follow the details. Please type slowly, with lots of explanation, so I can follow. Merely telling me that outliers increase variance is unsatisfying. Or ignore me--I'm just trying to get free tutoring lessons.

I'll start. Suppose you have two clusters with mortality rates of 0 and 1. The average mortality rate is 0.5 and if you resample you get 1 run of 0,0, two runs of 0,1 and 1,0 and one run of 1,1. One out of the four runs gives an average mortality less than 0.5. Now suppose you add a third cluster with a mortality rate of 100. If I understand bootstrapping and how you'd use it to get average mortality rates, you resample with replacement 3 times because you have 3 clusters, average the mortality rate for the three clusters, then do it again over and over. In this case there's only 27 possible resamplings. And only 4 of the 27 give you an average mortality rate below 0.5. Only 1/27 gives you an average mortality rate of 0. The distribution of average mortality rates has shifted to the right. It's a lot wider, but it behaves the way I intuitively expect.
I don't find an increased chance of getting an average mortality rate less than 0.5 just because I added an enormous outlier.
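For anyone who wants to check Donald's enumeration mechanically, here it is in R (same toy numbers, nothing from the actual study):

clusters  <- c(0, 1, 100)
resamples <- expand.grid(clusters, clusters, clusters)   # all 27 ordered resamples
m <- rowMeans(resamples)
sum(m < 0.5)    # 4 of the 27 resample means fall below 0.5
sum(m == 0)     # only 1 of the 27 is exactly 0
mean(m)         # the resampling distribution is centred well to the right, near 34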

So can someone concoct a simple example that behaves in a surprising way, so that I could understand how adding a Fallujah-like outlier would make a rational person think "Huh, I guess the invasion might have lowered death rates after all."

By Donald Johnson (not verified) on 25 Jul 2007 #permalink

I used "average mortality rate" in two different senses in my example above, if anyone cares. That was confusing. What I meant except in its first usage was the average mortality rate in a given sample--so if you picked 0, 1, and 1 out of the 3 clusters with mortality rates of 0, 1, and 100, that's an average mortality rate for that particular resampling is 2/3.

Sorry for the clutter--I'm looking for simple examples to clarify things in my own mind, not intending to supply confusing examples with my own idiosyncratic terminology.

By Donald Johnson (not verified) on 25 Jul 2007 #permalink

Ragout's point is the correct one by the way - once you have assumed that a massively bimodal dataset has a unimodal distribution, you are never going to get a meaningful answer. The Figure 2 diagram actually has 1.34% of its probability mass represented by states in which Iraqis were being resurrected from the dead, and the idea that you can correct for this problem by simply truncating the distribution at zero is clearly wrong; the very large outlier on the positive side isn't a reason to believe in a lot of non-outlier cases on the negative side.

Hang on ...

David, what are you going on about here talking about the authors assuming normal distributions? A quick read back of Roberts et al (2004) reveals, at the bottom of p3, that

"As a check, we also used
bootstrapping to obtain a non-parametric confidence
interval under the assumption that the clusters were
exchangeable. The confidence intervals reported are
those obtained by bootstrapping. The numbers of excess deaths (attributable rates) were estimated by the same
method, using linear rather than log-linear regression."

So basically your analysis is completely tangential. Roberts et al got their confidence intervals by bootstrapping from the empirical distribution of their data. The empirical distribution of the data wasn't normal and it wasn't nearly normal.

David, what's your source for your remark to Ragout above that:

It is not I who assume that the confidence intervals for pre- and post-war mortality are normally distributed. It is the authors. Now, they may be right to make this assumption. They may be wrong. But it is their assumption or, rather, the assumption of the statistical software that they use.

It seems to be flatly contradicted by the passage from p3 quoted above and I really can't find any other places in the text where they talk about a normal distribution of the noise (in fact, the terms "Normal" and "Gaussian" don't appear in the paper at all). Is this from private correspondence?

At present it seems to me that when you say:

"If you assume that the distribution of post-war mortality is right-skewed (i.e., that only big increases in death are possible, not big decreases), then you ought to be able to somehow get the 1.6 -- 4.2 confidence interval for the relative risk. But that is not the model that the authors used. (See their paper for the exact description.) Therefore, they must publish a correction, or withdraw the paper."

then you're wrong - they used a model which only had big increases, because the "model" was a resampling of the actual observations, which of course contained Fallujah but did not contain "anti-Fallujah".

Donald - you're not confused. There is no way that you can get the sort of result you describe simply by simple bootstrapping from the empirical distribution. You need further assumptions to get there - either fitting a parametric distribution (like David's normal assumption) or by messing around with the dataset - sometimes for financial applications it can be sensible to double the size of the dataset by appending a copy with the signs changed, but you need to be very careful when you're doing this about what your underlying model is, which is usually something parametric.

Thanks to dsquared for these comments. Initial thoughts:

1) It seems clear to me that the authors use two totally different methods for calculating confidence intervals. The passage that dsquared quotes from applies only to calculations for the relative risk. The paper is not completely clear on this point, but the full paragraph implies this.

2) It seems certain to me that the confidence intervals for the estimates of crude mortality rates assume a normal distribution. I quote the full paragraph in the paper. Besides the software clues, the key fact is that the authors provide an estimate of the design effect. This is a natural product of the usual normal distribution approach. How would you get a design effect if you were calculating the confidence intervals for CMR using a bootstrap? I don't think you can, but counter-examples are welcome.

3) Goodness knows that this discussion would be a lot more productive if the authors were to release their code. Elizabeth Johnson, the graduate student who actually did the calculations, does not respond to my e-mails or phone calls.

4) The other way that I can be certain (?) that dsquared is wrong in his inference that the authors used a bootstrap or other empirical method to calculate the confidence intervals for post-invasion CMR is that, if they had, the confidence interval would not have been symmetric. But it is. So, almost certainly, we are dealing with a normal distribution.

5) All of which does not prevent SG (or anyone else) from using a skewed distribution for modeling post-war mortality. Go ahead. But that is not what the authors did.

Further comments sought.

By David Kane (not verified) on 25 Jul 2007 #permalink

How would you get a design effect if you were calculating the confidence intervals for CMR using a bootstrap

the design effect is just the ratio of between-cluster variance to within-cluster variance, isn't it?

Kane,

You made a fundamental mistake - you are using meaningless terms, symbols and arguments (namely, all those that refer to distributions of CMRpre, CMRpost and RR).

You do not attempt to dispute that, yet you still expect your "proofs" to be accepted as valid. This is absurd. Either come up with valid proofs or abandon your position.

Donald and Tim are concerned that the confidence intervals for the CMRs are incorrect. Without access to the full data, this is tough to know for sure, but the cluster level data is enough to provide plenty of consistent evidence on this point. R users will find my R package handy.

If anyone wants, I can paste in the whole R session, but the main points are easy. Looking at just the post-war CMR, we can take simple cluster averages. This is probably not what the authors did. Clusters with more people should get more weight. But it is close enough. With Falluja, the mean is 14 and the standard deviation 32. Without Falluja, the mean is 8.2 and the standard deviation is 6.6. Those means are pretty close to what the paper reports. The key is that the standard deviation is almost 5 times bigger with Falluja than without.

We can't translate these standard deviations directly into standard errors since it depends on the clumping of the data. (Actually, given that we know the design effects, this might be possible, but I haven't done it.) But a rough guess might be to just divide the with-Falluja standard deviation by the square root of 32 (for the number of clusters). This gives about 5.5. Double that, and you have a confidence-interval half-width of 11, just about spot on to the 10.9 reported by the paper.

Again, there is a lot of hand-waving going on and we ought to divide by a bigger number since the sample size is bigger than 32 once you consider all the people in the clusters, but, big picture, there is no reason to doubt the CMR estimates and confidence intervals which the authors present. If anyone has reasons for doubting them, please present them.
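For the record, that back-of-the-envelope check in R, using only the summary numbers quoted above; cluster weights, clumping and the exact divisor are all glossed over, so treat the output as rough:

sd.with    <- 32;  n.with    <- 33
sd.without <- 6.6; n.without <- 32
2 * sd.with / sqrt(n.with)         # about 11, vs the reported with-Falluja half-width of 10.9
2 * sd.without / sqrt(n.without)   # about 2.3, vs the reported without-Falluja interval of 5.6-10.2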

By David Kane (not verified) on 25 Jul 2007 #permalink

Let me summarize again. Roberts et al estimate pre- and post- CMR using the mean, and calculate confidence intervals under the assumption that the means are approximately normally distributed. David notes that the confidence interval of the difference in these estimators includes zero, assuming normality. Then, Roberts et al estimate the relative risk, *using some other method*, and get a confidence interval that does not include one.

David claims that these findings are inconsistent, but this is true only in a trivial sense. Roberts et al calculated the CIs for the means under one assumption, and the CIs for the RRs under another assumption. In particular, in calculating the RRs, they modeled the log of the data, while when calculating the mean CMRs, they used the levels of the data. In addition, they took into account the correlation of the data when calculating the RRs. Finally, they reported bootstrapped SEs for the RRs, rather than assuming normality. So it is not at all surprising that CIs for the RRs and the CIs for the difference in means would give different answers.

By the way, one problem here might be that the description of RR estimator in the Roberts et al paper is totally opaque. They mention overdispersion, so I assume they used a negative binomial regression. It does seem clear, though, that they modeled log deaths, took into account the correlation over time in deaths, and reported bootstrapped SEs for the RR.

It is perhaps the original authors' fault, not Kane's, but normality is a crazy assumption here. Re "any fatter tailed distribution (say, t), makes the problem worse not better": well, that's assuming we're talking about a symmetric distribution. But death rates spike up not down. So the issue is more one of skew than kurtosis, i.e. the right tail needs to be fatter, not both tails (indeed the left tail needs to be thinner than a normal since death rates can't be negative). What happens if you instead use, say, the Poisson distribution?

David, I see you're sort of acknowledging this in part 5 of comment 22, but I'd be interested to hear: Independent of whether the authors used it or not, do you agree that normality is a really bad assumption (especially when you include the Fallujah cluster)? And don't you agree that intuitively a more reasonable model would increase -- not decrease -- the lower confidence bound relative to a normal?

By David Kane's friend (not verified) on 25 Jul 2007 #permalink

Thanks for this very helpful feedback.

1) Ragout is exactly correct when he writes:

Let me summarize again. Roberts et al estimate pre- and post- CMR using the mean, and calculate confidence intervals under the assumption that the means are approximately normally distributed. David notes that the confidence interval of the difference in these estimators includes zero, assuming normality. Then, Roberts et al estimate the relative risk, using some other method, and get a confidence interval that does not include one.

Correct! Part of what we are arguing about above (like the normal assumption) would refute this summary, but I believe that ragout has it right. This is certainly what I mean to argue. Ragout goes on:

By the way, one problem here might be that the description of RR estimator in the Roberts et al paper is totally opaque.

Correct! Although I find ragout's guess about what they did reasonable, neither he nor I (nor any of you) know the exact procedure used for estimating the RR, although it seems reasonable to assume that a bootstrap was used.

All of which raises an interesting point. You don't need to use a bootstrap to calculate an RR in this (very standard) cluster set-up. So, why did they? What would the result have looked like if they just used the usual approach, as BrendanH did? My guess is that they would get the same (not statistically significant) answers that Brendan got. Since they didn't like those answers, they went searching for a model that would produce the answers that they wanted. A bootstrap did. Once they found that, they did not bother to check that the RR results were inconsistent with the CMR results.

Of course, this is all pure speculation, but there is no reason to use a bootstrap for the RR and not for the CMR.

ragout claims that my claims are true only in a "trivial sense." Perhaps! No author can fairly judge the importance of his own work. My preferred analogy is to scales and weighing. Imagine that the Lancet authors had reported that using scale A, each of two bags of apples weighs 2 pounds. Using scale B, those same two bags of apples weigh five pounds together. I then assert that, since 2 + 2 != 5, the conclusion is wrong. ragout says, "Well, they told you that they used two different scales. Your claim is trivial." Perhaps. But a scientific paper needs some minimal amount of internal consistency. You can't assert that 2 + 2 = 5 and then blame your scales. The scales are your responsibility. Why do these scales give mathematically inconsistent answers?

By David Kane (not verified) on 25 Jul 2007 #permalink

My friend asks:

Independent of whether the authors used it or not, do you agree that normality is a really bad assumption (especially when you include the Fallujah cluster)?

No. Normality was not a "really bad" assumption. Now, anytime you are estimating something, like crude mortality, that must be non-negative, a normal assumption will be "wrong" since it allows for negative numbers. But as long as most of the mass is greater than zero (as here), it probably doesn't matter much at all. Some simulations that we did with a truncated normal did not affect our results much if at all. So, normal is fine in this case. If the CMR estimates were screwy, you can be sure that I would critique them. In fact, my sense of the literature is that this approach is totally standard.

There is nothing wrong with assuming normality when estimating CMR as long as almost all of the posterior distribution is greater than zero.

By David Kane (not verified) on 25 Jul 2007 #permalink

David, there are two problems with a normality assumption. One is the mass at negative mortality as you say.

The other -- and bigger -- problem I see with normality is the Falluja outlier which is wildly improbable under a normal assumption. And this outlier doesn't come out of left field. You expect to see positive spikes in violent mortality in a war zone. (See also SG's comments about WWII above.)

So it's good to hear you did work with truncated normals to address the thin left tail, but I would expect the thick right tail to be the far bigger issue.

By David Kane's friend (not verified) on 25 Jul 2007 #permalink

Couple points:

1) Note that the debate does not even turn on the normality assumption for the post-invasion CMR. Even if this is a nice skewed distribution, as long as its lower tail is at 1.4 (and it is unimodal), the last proof in the paper still holds. So, I do not need for it to be normal (although it almost certainly is). Even if the authors are using something else, as long as they are reporting the lower confidence intervals correctly, the last version of my proof (really, Mike Spagat's proof) works. The normal distribution stuff is a bit of a red herring.

2) Perhaps someone more patient than I can explain to Sortition what is going on in the paper. I have tried to make it as clear as possible. Also, Sortition, can you at least see that other commentators here seem to understand the math?

3) This issue of what distribution one ought to use to model post-invasion CMR is a tricky one. If not normal, then what? The tricky part is how much of a skew you assume/require. If you assume that CMR's below 4 (or whatever) are impossible, then you have essentially assumed your conclusion (that mortality has increased). My guess is that any distribution which gave reasonable prior mass to the possibility that CMR has gone down would produce similar results to mine. But, again, that is a non-trivial technical problem, independent of my point.

4) dsquared asks:

the design effect is just the ratio of between-cluster variance to within-cluster variance, isn't it?

Correct. But the reason, I think, that this is an interesting number, the reason that people report it in papers (like L1) is that it is almost always used in conjunction with normal models. I have never seen a paper which reported something like:

"The crude mortality rate during the period of war and occupation was 12·3 per 1000 people per year (95% CI 1·4-23·2; design effect=29·3)"

in which the estimation procedure was not based on a normal model. Has anyone?

Now, if I were really clever, I would use the information about the design effect to re-engineer what some of the other key numbers are. But, for now, I will leave that as an exercise for the reader.
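Part of that exercise is a one-liner if you take the textbook definition of the design effect (the variance under the cluster design divided by the variance under simple random sampling); whether that is exactly the quantity the authors report is an assumption here:

se.cluster <- (23.2 - 12.3) / 1.96   # SE implied by the with-Falluja post-war CI
deff <- 29.3                         # design effect quoted in the paper
se.srs <- se.cluster / sqrt(deff)    # about 1.0: the SE implied for a simple random sample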

By David Kane (not verified) on 25 Jul 2007 #permalink

My friend notes:

So it's good to hear you did work with truncated normals to address the thin left tail, but I would expect the thick right tail to be the far bigger issue.

Not for my purposes. It does not matter to me what the mean post-war CMR is, nor how fat the tail to the right. All that I need to show concerns the lower confidence interval. I am only calculating the probability that CMR_post < CMR_pre. For that calculation, all that (really) matters is the bottom percentiles for CMR_post (its 2.5th percentile and its mass below CMR = 3.7). The right hand tail (whether short or long or somewhere in between) does not matter for my proof. It is irrelevant.

Now, of course, the bigger a tail you place on the right hand side, the less mass there is to go elsewhere, including the left. I do require that the 1.4 lower CI is correct, as presented in the paper. Again, no one has shown any reason (meaning a specific alternate model) to doubt that. Roberts et al made a mistake, but not in estimating CMR.

By David Kane (not verified) on 25 Jul 2007 #permalink

David, I'm not sure I understand your comment #32. Isn't the point of your analysis that under normal assumptions adding the Falluja cluster dramatically changes the lower CI? If instead of a normal, you use a distribution with a fat right tail such that the Falluja cluster is no longer that improbable, its inclusion or exclusion will have much less effect on the CI. The point is -- because of the Falluja data point -- the far right tail is very important in determining the lower CI.

By David Kane's friend (not verified) on 25 Jul 2007 #permalink

My entire paper assumes Falluja is included and only addresses the claims made by Roberts et al which also include Falluja. Now, the reason that the CI for post-war (versus pre-war) is so much larger is because the estimate for Falluja post-war is so different (bigger) than those for other clusters. Falluja pre-war, on the other hand, looks a lot like other clusters. You write:

The point is -- because of the Falluja data point -- the far right tail is very important in determining the lower CI.

I agree that this is possible in theory, but I have to see a concrete demonstration. Pick any right-skewed non-negative distribution that you like which a) has a mean around 12 (which will be needed to match the mean of the data), b) has lots of mass between 5 and 12 (because lots of clusters are not nearly that violent), c) has a tail which stretches out far enough to easily include Falluja, and d) does not assume the conclusion by having zero mass below 5.

I do not think that there is a distribution which does this very well, or any better (in aggregate) than the normal distribution that the authors use. Suggestions welcome.
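Purely as an illustration of what one candidate might look like (the parameter choices are guesses, and whether it meets the criteria "well" is for readers to judge), here is a lognormal with mean about 12, checked against a)-d) in R:

meanlog <- log(12) - 0.5; sdlog <- 1       # chosen so that E[X] = exp(meanlog + sdlog^2/2) = 12
plnorm(12, meanlog, sdlog) - plnorm(5, meanlog, sdlog)   # mass between 5 and 12: about 0.34
plnorm(5, meanlog, sdlog)                  # mass below 5: about 0.35, so (d) is not assumed away
1 - plnorm(112, meanlog, sdlog)            # mass at Falluja-like rates: about 0.003 per cluster
qlnorm(0.025, meanlog, sdlog)              # 2.5th percentile: about 1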

Once we have such a distribution, we can estimate pre and post CMRs along with their confidence intervals. We can then check to see if these are consistent with the reported RR. Perhaps it will all make sense! I doubt it. And, anyway, my point is just to demand a correction/retraction of L1 as published. It is wrong. The authors are welcome (if they can find their data) to create a new model and then publish that one instead. But they first need to admit their mistake. Or someone needs to demonstrate mine.

By David Kane (not verified) on 25 Jul 2007 #permalink

You guys are driving me nuts. I'm over here on vacation trying to enjoy the Tour de France (the results of which, btw, appear to have more credibility than David's results) and you guys can't do a simple mortality calculation. Months (years?) ago I posted a figure showing the [bootstrap distribution for excess mortality](http://anonymous.coward.free.fr/misc/roberts-iraq-bootstrap.png). Here's the [bootstrap distribution for the odds ratio of post- to pre-invasion mortality](http://anonymous.coward.free.fr/misc/roberts-iraq-bootstrap-relrisk.png) .

As before, I don't know what their random seed was, I don't know how many replicates they used, I don't know which bootstrap CI they used, I don't use the same software--but even with all of those caveats I come pretty darn close to Roberts' (or probably, Garfield's) results: they claim a RR of 2.5 (1.6 - 4.2) with Falluja and 1.5 (1.1 - 2.3) without it. I get 2.5 (1.2 - 5.2) with Falluja and 1.5 (1.1 - 2.2) without it.

Dang it, Robert, I was about to post a link to your old excess mortality graph and claim I'd done it.

By Donald Johnson (not verified) on 25 Jul 2007 #permalink

David, this is an attempt at a constructive suggestion but it will probably come out poorly.

Why don't you try redoing your analysis of P(CMRPost) etc. using the assumption of an unobserved covariate? This unobserved covariate is a categorical variable, takes values 0 or 1, and its regression coefficient is about - what - 7? So in any cluster where it takes the value 1, it shifts the mean of the mortality to the right by (estimated fallujah CMR - CMR in remainder of iraq). The unobserved covariate is, obviously, measuring the presence or absence of a major military conflict. You can model various assumptions about the distribution of this variable, but the best assumptions are obvious: before the war, it has a value of 0 with constant probability 1. After the war, it has a binomial distribution with p=(number of clusters in fallujah)/(number of clusters in the country). I bet this will make p=1/32, approx.

Further, assume homogeneity, so the clusters with this unobserved covariate value of 1 have the same variance as the clusters with the value of 0. You'll see that these clusters have essentially 0 probability mass in regions of mortality which lie below the pre-war estimate. You can even, I'm sure, do a calculation of exactly the variance which would be needed (assuming non-homogeneity) for these observations to have any significant overlap of probability mass with the pre-war estimates. You can rejig your figures accordingly, and you'll see visually what is happening.

Obviously, if p=1/32, the chance of observing a large number of these clusters is very small, so unless you do a sample in the order of 320 clusters you are unlikely to get enough high-risk clusters to get a sensible estimate of excess mortality in the different levels of the covariate. But you don't want to - your interest is in showing how the low-risk clusters have changed, since everyone knows that shitloads of people die in high-risk clusters, and we don't need to throw away the lives of Iraqi doctors to find out.

I think while you are claiming that Roberts et al have to assume the answer of "higher deaths", it's pretty obvious that you are assuming the answer that there is no unobserved covariate corresponding to a massive bombing campaign and military intervention in one town. Given what we know of war, Iraq, and the sample, which answer do you think it is better to assume?
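A rough R sketch of the mixture SG is describing; p = 1/32 comes from his comment above, and every other number is an assumption chosen only to be in the right ballpark, not a fitted value:

set.seed(2)
n.sim  <- 10000
p.high <- 1/32                                   # chance a post-war cluster is a high-intensity-war cluster
cmr.low  <- rnorm(n.sim, mean = 8,   sd = 3)     # ordinary post-war clusters (assumed)
cmr.high <- rnorm(n.sim, mean = 110, sd = 30)    # Falluja-like clusters (assumed)
is.high  <- rbinom(n.sim, 1, p.high)
cmr.post <- ifelse(is.high == 1, cmr.high, cmr.low)
quantile(cmr.post, c(0.025, 0.5, 0.975))   # long upper tail from the rare high clusters; the lower tail is just an ordinary cluster
mean(cmr.post < 5)                         # essentially all of the mass at low rates comes from the ordinary clusters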

Robert makes a fair point which I hope to address soon. SG suggests an interesting model which I encourage him to work out himself.

For now, I want to step back and try to claim why this arcane dispute matters. Recall that the most quoted result from the article was the 98,000 excess death estimates (with confidence interval 8,000 to 194,000). Now we all realize that this estimate excluded Falluja. Now, here is a question for Lancet defenders, especially serious folks like Tim, dsquared, Robert, BrendanH and others:

Take the exact same computer code that produces the 98,000 (CI 8,000 - 194,000) estimate. Now, don't exclude Falluja. What does the code produce?

Now, without the exact code, it is tough to know the answer to this. I argue that the authors do a lot to try to imply that the mean estimate would rise to 300,000 or so and that the confidence interval would get wider but still safely exclude zero, something like 150,000 to 600,000.

But this is, I am fairly certain, false. Using the exact same code, I bet that the lower bound would be well below -100,000. See the paper for details. Moreover, it is almost certain that the authors knew this and that they purposely organized and wrote the paper in such a way as to hide this fact, to purposely mislead readers (including smart readers like Tim and dsquared) into thinking that including Falluja would move the lower bound of the confidence interval for excess deaths up. After all, excluding Falluja is "conservative!"

Now, before diving into the weeds on this topic, I would first love to establish what smart Lancet defenders believe. So, what is the answer to my question above?

By David Kane (not verified) on 25 Jul 2007 #permalink

Tossing in a big outlier can drive the lower CI limit down. Imagine 5 points; 3, 3, 3, 3, and 3. Now toss in one more point, 4. The upper confidence limit will now be higher, but the lower confidence limit will be lower.

Very cool discussion. I had classmates over there, although most of my year group is out of the field by now. Dave, you seem smart...for a Marine. ;)

David, it's not an "interesting model" which I should go away and work out - it's the answer to your conundrum. What you are doing is claiming that a very very unusual event observed in a sample of 32 observations is a true random draw from the same distribution as the other 31 observations, when we have STRONG evidence to suspect that it is a random draw from a different distribution. Fallujah was a planned event, in which the US government moved a lot of resources to ensure that the mean of the observations drawn from that area would be shifted significantly to the right. You can't claim that it is simply another observation from the same sample as the other 31.

Let me give you another example. You do a study of fitness in boxers, so you sample 32 boxers randomly from a list provided you by the Nevada state boxing association. 31 of these yield a body fat percentage of 12, CI 11-13. One has a body fat percentage of 16. Study author DK takes this as evidence of a wide CI, and concludes that boxers in Nevada commonly have a body fat percentage as low as 8, since his revised CI ranges from 5 - 20. Study author SG, on the other hand, investigates the records and finds that last year the Nevada boxing association admitted women for the first time, and didn't tell you when they gave you the list. The 32nd observation is a woman. You cannot claim that she is just an unlikely observation from the same distribution as the rest of the sample, because she is a structurally different observation... there is a missing covariate.

In order to argue Fallujah is an unusual observation from the same distribution, you need to give a reason. The authors gave a very valid reason - a big fat war - for thinking it wasn't. The onus lies with you to explain otherwise.

SG writes:

What you are doing is claiming that a very very unusual event observed in a sample of 32 observations is a true random draw from the same distribution as the other 31 observations, when we have STRONG evidence to suspect that it is a random draw from a different distribution.

No. This is what the authors do. See their paper. They measure the post-invasion CMR using a normal distribution with the Falluja cluster treated identically to all the others. The clusters are exchangeable. If you think that this is nuts, you should bring it up with Roberts et al, or with the Lancet's peer reviewers.

By David Kane (not verified) on 25 Jul 2007 #permalink

With regard to Robert's excellent post (and graphics) above, I have some comments.

1) Although Robert gets close, this is still not a replication. Is anyone else annoyed that the authors refuse to provide the details of their methodology?

2) Assume that Robert has replicated their results. Does that invalidate my claims? No! (I think.) Go back to the two scale analogy. Robert has demonstrated that there is a scale B which acts just as the Lancet authors claim it acts. I don't deny this. I just think that you can't simultaneously believe both the post-invasion CMR and the RR numbers. Robert may be correct that such-and-such procedure produces the 1.6 -- 4.2. But that result is inconsistent with a CMR_post of 1.4 -- 23.2. You can believe one. You can believe the other. But you can't, mathematically, simultaneously believe both.

3) Now, the potentially suspect part (Warning: Speculation alert!) is why the authors insisted on using the bootstrap. They only had 33 observations. The bootstrap is rarely used with so few observations. Why didn't they report the non-bootstrap results? They report standard confidence intervals in the 2006 paper (although they check them with a bootstrap).

So, another question for Lancet defenders. Why report bootstrap RR confidence intervals in 2004 but not in 2006?

I think that standard results for the RR would have yielded a lower bound for the confidence intervals well below 1. The authors did not like that result, so they hid it. They looked around for a method which would give the result that they wanted, found one, and then reported it (without noticing that it contradicted their other results).

Fortunately, it is easy for the authors to prove me wrong. Just tell us what the standard calculation for the relative risk produces when Falluja is included. As long as it is something similar to 1.6 -- 4.2, there is no problem. But if it is significantly different (as the quotes from Burnham and Roberts suggest), I think that they are guilty of purposely misleading their readers.

By David Kane (not verified) on 25 Jul 2007 #permalink

"Tossing in a big outlier can drive the lower CI limit down. Imagine 5 points; 3, 3, 3, 3, and 3. Now toss in one more point, 4. The upper confidence limit will now be higher, but the lower confidence limit will be lower."

Oh, goody, someone at my level maybe. From what I glean in this thread, it seems to depend on what sort of distribution you fit to the data. If you assume a normal distribution, then previously you had no variance and your bell curve was a delta-function spike at 3. Add the 4 outlier and there's a nonzero variance, and so there's a chance of a result less than 3.

But if you do a simple bootstrap (as I understand it anyway), that won't happen. Finding a 4 doesn't give you any reason to think there's a 2 in the vicinity. You resample from 3,3,3,3,3, and 4 and there's no way any collection of clusters you get will have an average value of less than 3. Because, you know, they are all 3 or bigger.

This whole business seems vaguely arbitrary to me, probably from not understanding it enough. But the bootstrap approach (as I understand it) seems to stick closer to the data you actually have.
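Here is the 3,3,3,3,3,4 example worked both ways in R, as a toy sketch rather than anything resembling the Lancet method:

x <- c(3, 3, 3, 3, 3, 4)
# normal-theory 95% CI for the mean: the lower limit drops below 3
mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
# simple bootstrap of the mean: no resample can average below 3
boot.means <- replicate(10000, mean(sample(x, replace = TRUE)))
quantile(boot.means, c(0.025, 0.975))
min(boot.means)   # 3; the bootstrap never produces anything below the smallest data value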

And David Kane, yeah, when you brought this up a few weeks ago and I looked at the CMR with Fallujah I thought the result was nuts. Maybe someone should have objected. BTW, I thought it was 32 non-Fallujah, with Fallujah being 33.

And that brings up something I once thought of. The prewar and postwar time periods are of different lengths, which complicates things, but there was 1 violent death prewar, which means one violent cluster. There were 15 clusters with violent deaths postwar, or 14 violent clusters and 1 superviolent cluster if one wants to treat Fallujah differently but I'm lazy and don't. I don't know how to correct for the time duration problem, but this alone kinda suggests an increase in violent deaths due to the invasion.

Maybe you could do simple binomial statistics on clusters, classifying them as violent or nonviolent. This is about my speed. Lumping Fallujah in I get a variance of Npq = 33*(15/33)*(18/33) = 8.2. Sqrt 8.2 is 2.8 and I'll double that for my CI half width and you have 5.6, so my informal CI for violent cluster number out of 33 would be 9.4 to 20.6. I'm not sure what to do with the prewar number since the time duration is different, but say it counts as 2 violent clusters instead of 1, which seems generous to me. The variance would be about the same, so if that's 2 violent prewar clusters with a variance of 2, then that's 2 plus or minus 2.8 and with a maximum value of 4.8 there's no overlap with the postwar CI.

I did a bad thing there--spreading my CI for the prewar violent cluster number into the negative range, when really the distribution skews right and could be approximated by the Poisson distribution, but hell, it's not going to do too much harm in this case. I think the upper end of the CI goes up slightly if you do it right. Someone who took this seriously could do it more carefully, but it seems clear the war caused an increase in the number of violent clusters. What a surprise.
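And Donald's binomial back-of-the-envelope, redone in R with the same rough numbers (plus one stock interval from base R for comparison):

n <- 33; k <- 15
npq <- n * (k / n) * ((n - k) / n)   # 8.2, as above
k + c(-2, 2) * sqrt(npq)             # roughly 9 to 21 violent clusters, close to Donald's 9.4 to 20.6
prop.test(k, n)$conf.int * n         # a Wilson-type interval on the same scale, for comparison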

By Donald Johnson (not verified) on 25 Jul 2007 #permalink

As I read it, David, that's not what they do. They present the RR and CI with Fallujah excluded as their main result, and then do a bootstrap estimate of the confidence interval with Fallujah included.

The bootstrapping simply assumes that the probability of the "high-risk town" covariate occurring in the sample reflects its true population proportion. So in any sample of 32 clusters the probability of more than one such cluster occurring is very low, but a much more likely outcome is that in most clusters it won't occur, biasing the confidence interval calculation towards a higher lower end. You can do the same thing with any data set where an unexpected covariate has introduced a highly unusual data point - in the absence of enough data to treat the unexpected covariate as a confounder, bootstrapping is a way to get an estimated confidence interval without throwing away any data. There are only 32 points, after all.

For example, David:

We estimate that there were 98000 extra deaths (95% CI 8000-194 000) during the post-war period in the 97% of Iraq represented by all the clusters except Falluja.

So they have restricted their report of extra deaths to the low-risk towns outside of Fallujah, and have explicitly given a figure for the death rate in 97% of Iraq. Does this sound like it might possibly represent a covariate analysis to you? With insufficient data for the remaining 3%?

I got carried away with my sloppy statistics for dummies approach, but there's a serious point underneath all this. David, do you really think there's the slightest chance the invasion of Iraq lowered death rates? I can still remember those days of late 2004 when it was still considered quite daring in mainstream political circles in the US to suggest that the Iraqis might be suffering more violence and death under occupation than they had during the late years of Saddam's rule, but nowadays I think you'd have to be blind not to realize this is exactly the case, and it was the case in 2003-2004 as well, though the violence has gotten worse since. Looking at L1 in the simplest most naive way possible, one sees one violent death in the (somewhat shorter) period before the war and 21 violent deaths (excluding Fallujah) after the war. 1 violent cluster (in a shorter period of time) prewar and 15 postwar. Any sophisticated statistical analysis which takes this data and somehow "shows" that there is a chance the death rate went down following the war is clearly insane. (I'm ignoring the nonviolent deaths--in theory those could have decreased once the war started, but it's not an effect anyone would expect, given the destruction of infrastructure, and the data shows an increase there too.)

By Donald Johnson (not verified) on 25 Jul 2007 #permalink

David says:

>If you assume that CMR's below 4 (or whatever) are impossible, then you have essentially assumed your conclusion (that mortality has increased).

But CMR's can't be below 4. You accept that they can't be below 0, but the rates can't be below 4 either. It follows that the low end of the CI 1.4-23.2 contains no useful information and analysis based on that low end will not tell us anything useful.

Nor is it true that accepting minimum values for CMRs will inevitably lead to the conclusion that mortality has increased. Mortality could have decreased if there had been a high CMR pre-war.

David, do you really think that there is a non-zero probability that the post-war CMR in Iraq was less than 2.2?

[Why do these scales give mathematically inconsistent answers?]

No, David, and this is getting exasperating. The answers aren't "inconsistent" - they're inconsistent with the assumption of a unimodal distribution which you make and the authors don't. The standard errors reported for the with-Fallujah data come from the bootstrap. They don't come from a normal distribution. You keep saying they do, based on your belief that "they must have done", but they don't, and the report says that they don't.

What is your reference for claims like this:

[No. This is what the authors do. See their paper. They measure the post-invasion CMR using a normal distribution with the Falluja cluster treated identically to all the others]

The words "normal", "lognormal" and "Gaussian" don't appear in the paper. The phrase "measure the crude mortality rate using a normal distribution" does not appear, probably because it is gibberish. A simple declarative sentence stating that the reported confidence intervals come from a bootstrap, does appear.

I pointed this out in #20, and your answer in #22 was entirely based on "it seems". That isn't good enough. (It's also pretty murky that in #22 you were claiming that you didn't think a design effect could possibly be calculated in the context of a bootstrapped CI, which is also gibberish, but by #31 you've gone back to "it seems to me".)

[Robert may be correct that such-and-such procedure produces the 1.6 -- 4.2. But that result is inconsistent with a CMR_post of 1.4 -- 23.2. You can believe one. You can believe the other. But you can't, mathematically, simultaneously believe both. ]

It is not inconsistent with the CMR confidence interval. It is inconsistent with the combination of the CMR confidence interval and the assumption that the CMR confidence interval is a parametric estimate based on a unimodal distribution. The second part of that conjunction is untrue.

[So, another question for Lancet defenders. Why report bootstrap RR confidence intervals in 2004 but not in 2006?]

Because the 2006 dataset did not contain any Fallujah-like clusters, and therefore the data was distributed roughly unimodally. As even a minute of thought would have revealed.

David, at present your entire paper rests on your assumption that the reported confidence intervals are based on parametric estimates from a unimodal distribution. The paper says pretty specifically that they were calculated by bootstrapping from a bimodal empirical distribution. This is a big problem for your paper. The nature of the dataset makes it very clear that the bootstrap was the correct way to calculate the confidence interval for the risk ratio. That is another big problem for your paper. Finally, your paper has the implication that it would be sensible for a statistician to conclude that the discovery of mass deaths in Fallujah is evidence in favour of the proposition that the death rate had fallen. That's the really, really big problem for your paper.

David, in the midst of one of the most savage wars in recent memory, from which over 1 million people have fled abroad and another 1 million or so have been driven from their homes within Iraq, are you seriously asking the public to believe that Iraq may be a SAFER place to live than before the 2003 invasion?

If that is the conclusion, then it so defies common sense that there must be something wrong with the model or logic that led to that conclusion.

In fairness to David, he is discussing the 2004 paper here, so the possibility that things had got better during the sample period wasn't quite as crazy as it would be to assert the same for the period 2003-present.

David Kane bet:

Take the exact same computer code that produces the 98,000 (CI 8,000 - 194,000) estimate. Now, don't exclude Falluja. What does the code produce? Now, without the exact code, it is tough to know the answer to this. I argue that the authors do a lot to try to imply that the mean estimate would rise to 300,000 or so and that the confidence interval would get wider but still safely exclude zero, something like 150,000 to 600,000. But this is, I am fairly certain, false. Using the exact same code, I bet that the lower bound would be well below -100,000.

[You lose that bet](http://anonymous.coward.free.fr/misc/roberts-iraq-bootstrap.png).

Many thanks for the helpful comments. I want to devote my next couple of replies to Tim and dsquared, not because I don't value the other participants, but because they are clearly the experts on this topic.

Tim writes:

But CMR's can't be below 4. You accept that they can't be below 0, but the rates can't be below 4 either. It follows that the low end of the CI 1.4-23.2 contains no useful information and analysis based on that low end will not tell us anything useful.

Maybe I am missing something, but what basis is there for claiming that it is impossible that post-war CMR < 4? Whatever formula the authors used allows for that possibility. I tried to show above that there is every reason to believe, given the cluster-level data that we have, that the 1.4 - 23.2 range is reasonable. Should I add a section to the paper with this argument?

Perhaps dsquared could referee this one. Surely he agrees that it is possible, given the data we have and the authors' reasonable choice of models, for post-war CMR to be below 4 . . .

Nor is it true that accepting minimum values for CMRs will inevitably lead to the conclusion that mortality has increased. Mortality could have decreased if there had been a high CMR pre-war.

I am not sure I understand this comment. My point is that there is a non-trivial probability that "mortality could have decreased." So we agree? I also agree with the first sentence here. Am I missing something?

David, do you really think that there is a non-zero probability that the post-war CMR in Iraq was less than 2.2?

Yes! I believe Roberts et al's data and models when it comes to estimating the post-war CMR! Their modeling choices are not the only reasonable ones. I would be eager to see other people try other models (and get access to the data). But their choices are fine, even standard. I am no expert on this literature, but I think that the approach they use is the most common. They tell us that the confidence interval is 1.4 -- 23.2 and (it is virtually certain) this interval is based on the normal distribution. So, according to them, the probability that post-war CMR is below 2.2 is 3.5%, as you can see for yourself from this R code.

> pnorm(2.2, mean = 12.3, sd = (12.3 - 1.4)/1.96)
[1] 0.035

If I am missing something here, please let me know. I am not asserting that this is the world's best estimate. I am just claiming that this is what the paper says. Am I mistaken?

By David Kane (not verified) on 26 Jul 2007 #permalink

David, this is where Sortition's comments become entirely relevant. You can't move back and forth between confidence intervals of an estimate and posterior probability distributions of a parameter in the way that you're doing here. And also, the paper does in fact say that the reported CIs come from the bootstrapped estimates.

dsquared writes:

No, David, and this is getting exasperating. The answers aren't "inconsistent" - they're inconsistent with the assumption of a unimodal distribution which you make and the authors don't. The standard errors reported for the with-Fallujah data come from the bootstrap. They don't come from a normal distribution. You keep saying they do, based on your belief that "they must have done", but they don't, and the report says that they don't.

What is your reference for claims like this:

[No. This is what the authors do. See their paper. They measure the post-invasion CMR using a normal distribution with the Falluja cluster treated identically to all the others]

Clearly, I need a section in my paper which addresses this point. Fortunately, it is an easy one to make! The confidence interval (1.4 -- 23.2) reported by the authors for post-war mortality is symmetric about the point estimate (12.3). That is, 12.3 - 1.4 = 10.9 and 23.2 - 12.3 = 10.9. It is perfectly symmetrical. So, given that the Falluja data is such an outlier, it is impossible that any sort of bootstrap procedure was used to construct this confidence interval. Any procedure which did not, from the start, assume a symmetric parametrization would have produced a highly skewed interval because of Falluja.

Now, it is true that symmetric does not guarantee unimodal, much less normal. I was, perhaps, going too quickly in assuming that it does. But the first step here is to agree that it must be symmetric. There is no way that a bootstrap procedure of any type could have produced a symmetric confidence interval. Once we agree on that, we can move on to discuss whether this symmetric parametric form must also be unimodal and, then, normal.

I apologize if dsquared finds this exasperating but I am doing my best to make progress in good faith on this issue. I realize that he and Tim have spent countless hours arguing against stupid and malicious people on this topic. I thank them for their time.

By David Kane (not verified) on 26 Jul 2007 #permalink

David Kane responded:

David, do you really think that there is a non-zero probability that the post-war CMR in Iraq was less than 2.2?

Yes!

Hmmm. I do not know of any nation-sized population that has ever had a CMR below 2.2. There may be one or two exceptional cases from small countries experiencing high in-migration, but certainly nothing sustainable. In long-run equilibrium, a CMR of 2.2 implies a life expectancy of around 450, which is kind of high.

dsquared and I are commenting here in real time, so the back and forth may not make sense. But we do have a disagreement about his claim that "the paper does in fact say that the reported CIs come from the bootstrapped estimates." For those too lazy to check the paper itself, here is the key section.

For every period of analysis, crude mortality, expressed as deaths per 1000 people per year, was defined as: (number of deaths recorded/number of person-months lived in the interviewed households) × 12 × 1000. We estimated the infant mortality rate as the ratio of infant deaths to livebirths in each study period and presented this rate as deaths per 1000 livebirths. Mortality rates from survey data were analysed by software designed for Save the Children by Mark Myatt (Institute of Ophthalmology, UCL, London, UK), which takes into account the design effect associated with cluster surveys, and reconfirmed with EpiInfo 6.0. We estimated relative and attributable rates with generalised linear models in STATA (release 8.0). To estimate the relative risk, we assumed a log-linear regression in which every cluster was allowed to have a separate baseline rate of mortality that was increased by a cluster-specific relative risk after the war.[15] We estimated the average relative rate with a conditional maximum likelihood method that conditions on the total number of events over the pre-war and post-war periods, the sufficient statistic for the baseline rate.[16] We accounted for the variation in relative rates by allowing for overdispersion in the regression.[15] As a check, we also used bootstrapping to obtain a non-parametric confidence interval under the assumption that the clusters were exchangeable.[17] The confidence intervals reported are those obtained by bootstrapping.

dsquared's interpretation of this (all confidence intervals in the paper are from bootstrapping) is reasonable, but I do not think that this is what the authors intended. They meant that just the confidence intervals for the relative risk are from a bootstrap. I think that my interpretation is correct because, given what an outlier Falluja is, there is no way to bootstrap your way (at the cluster level) to a symmetric confidence interval for post-war mortality. (Counter-examples to this claim are welcome.)

By the way, it sure would be nice if we had access to the code so that we didn't have to waste time on this sort of easily-settled dispute. Perhaps dsquared could seek clarification from Roberts on this point. We could then move on to more substantive topics.

By David Kane (not verified) on 26 Jul 2007 #permalink

But David, if the crude mortality rate confidence intervals were calculated from a unimodal parametric distribution and the relative risk ratio confidence intervals were bootstrapped from a bimodal empirical distribution, then doesn't that totally undermine the point of your paper? The underlying assumptions are so different that the calculations you're carrying out are not valid, and all your point boils down to is a call to publish the bootstrapped confidence intervals for the crude mortality rates, which are (for obvious reasons) not going to show any material possibility that the death rate fell.

But David, if the crude mortality rate confidence intervals were calculated from a unimodal parametric distribution and the relative risk ratio confidence intervals were bootstrapped from a bimodal empirical distribution, then doesn't that totally undermine the point of your paper?

No. At least I don't think so. But before we decide if my paper has a point, we need to agree as to whether or not my description of Roberts et al (2004) is accurate. (If it isn't accurate, there isn't much point in going on.)

So, are we in agreement that "the crude mortality rate confidence intervals were calculated from a unimodal parametric distribution"? (Actually, all I have proven above is that it must be a symmetric distribution. It could be tri-modal for all we know. But it seems to me that, once we agree that it is symmetric, it is pretty obvious that it must be normal.)

By David Kane (not verified) on 26 Jul 2007 #permalink

Robert,

1) If you think that it is impossible for CMR in Iraq to be 2.2, you should let Roberts and the Lancet know. If you're right, their published results must be wrong!

2) With regard to your graphics, you are using a bootstrap, which is fine for calculating the relative risk. But that is not how Roberts et al calculated excess mortality. (See my paper or the original for relevant quotes/calculations.) Follow their formula and you will get, I am fairly certain, a confidence interval for excess mortality more like the one which I present.

By David Kane (not verified) on 26 Jul 2007 #permalink

[1) If you think that it is impossible for CMR in Iraq to be 2.2, you should let Roberts and the Lancet know. If you're right, their published results must be wrong!]

NO! (and no about what the paper says, too). You can't go back and forth from a confidence interval to a posterior distribution like this, unless you're prepared to share your prior. If Robert had a prior distribution for CMR which put zero or negligible probability on 2.2 (as he very well might, since this is half the nonviolent death rate pre-war), then the statements are completely consistent. Get it right, please.

[But that is not how Roberts et al calculated excess mortality. (See my paper or the original for relevant quotes/calculations.) ]

DAVID! They aren't there! There is no quote in your paper that says or implies that the confidence intervals for the risk ratio or excess deaths were calculated any way other than by bootstrapping from the empirical distribution. That's because "The confidence intervals reported [for the risk ratio -dd] are those obtained by bootstrapping." We are agreed (I thought) that the relative risk CIs were bootstrapped. Your paper contains two different estimates of parametric CIs and doesn't mention the word "bootstrap" once.

1) Can we leave aside for now any complaints about going back and forth between confidence intervals and posteriors. I like to think that, with a diffuse prior, all of this is the same, but it is not critical to my point.

2) So, are we in agreement that the confidence intervals for post-war mortality were not calculated from a bootstrap? As you point out correctly, this is a critical issue.

3) Assuming we are, it is easy to see that the same thing must be true for the excess deaths. The paper reports (and I quote).

We estimated the death toll associated with the conflict by subtracting preinvasion mortality from post-invasion mortality, and multiplying that rate by the estimated population of Iraq (assumed 24·4 million at the onset of the conflict) and by 17·8 months, the average period between the invasion and the survey.

Nothing about the bootstrap here. So, if CMR is not estimated from a bootstrap, then excess deaths are not estimated from a bootstrap.

You write:

We are agreed (I thought) that the relative risk CIs were bootstrapped. Your paper contains two different estimates of parametric CIs and doesn't mention the word "bootstrap" once.

You are correct: we are agreed that the relative risk CIs were bootstrapped. I have always understood that to be the case and will add something to my paper to make that clearer. Thanks for the suggestion.

By David Kane (not verified) on 26 Jul 2007 #permalink

[1) Can we leave aside for now any complaints about going back and forth between confidence intervals and posteriors. I like to think that, with a diffuse prior, all of this is the same, but it is not critical to my point]

No we can't. The assumption of a diffuse prior is something that would need to be defended here; it appears to be important to your argument in a number of key places, and it looks totally wrong to me. In particular, your claim that the two estimates are inconsistent (and your argument against Tim Lambert's point that the post-war CI is wrong) appears to depend quite strongly on it.

[2) So, are we in agreement that the confidence intervals for post-war mortality were not calculated from a bootstrap? As you point out correctly, this is a critical issue.]

No we aren't. I am prepared to join in with the arguments of other people (like Tim and Robert) who appear to agree with you on this, but I don't think it's supported by the actual text of the article.

[3) Assuming we are, it is easy to see that the same thing must be true for the excess deaths. The paper reports (and I quote).

We estimated the death toll associated with the conflict by subtracting preinvasion mortality from post-invasion mortality, and multiplying that rate by the estimated population of Iraq (assumed 24·4 million at the onset of the conflict) and by 17·8 months, the average period between the invasion and the survey.

Nothing about the bootstrap here.]

Nothing about the confidence intervals either! In general, there's a couple of points in this thread ("measure the crude mortality rate using a normal distribution" being a prime example) where you seem to be quite confused about the difference between estimating a quantity of interest and estimating the confidence interval of the estimate.

[ So, if CMR is not estimated from a bootstrap, then excess deaths are not estimated from a bootstrap.]

But if the confidence intervals of the CMR are estimated from a bootstrap, then the confidence intervals of the excess deaths would normally also be estimated using a bootstrap. In fact, even if the confidence intervals of the CMRs were calculated parametrically, the confidence interval for the difference of the two rates might be estimated using a bootstrap - you would do this, for example, if you were worried about an unknown correlation between pre and post invasion mortality rates.

I think that this is a case of a mistake made because of the confusing form of words you use, "estimating the CMR using a bootstrap". The CMR is not estimated using a bootstrap - it's estimated by dividing the number of deaths by the population. The confidence interval of the CMR can then be estimated either parametrically or by bootstrap.
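A small sketch of that two-step distinction, with made-up inputs rather than the study's numbers, using the rate definition quoted from the paper earlier in the thread:

deaths <- 90                      # hypothetical deaths recorded in the sample
person_months <- 90000            # hypothetical person-months lived in the households
cmr <- deaths / person_months * 12 * 1000    # step 1: the point estimate itself
# step 2 is a separate choice of method for the uncertainty, e.g. a parametric CI
# (a crude Poisson approximation on the log scale, ignoring the design effect) ...
exp(log(cmr) + c(-1.96, 1.96) * sqrt(1 / deaths))
# ... or a bootstrap CI, obtained by resampling clusters and recomputing cmr each time.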

By the way, a better analogy than this ...

[ Imagine that the Lancet authors had reported that using scale A, each of two bags of apples weighs 2 pounds. Using scale B, those same two bags of apples weigh five pounds together. I then assert that, since 2 + 2 != 5, the conclusion is wrong. ragout says, "Well, they told you that they used two different scales. Your claim is trivial." Perhaps. But a scientific paper needs some minimal amount of internal consistency. You can't assert that 2 + 2 = 5 and then blame your scales. The scales are your responsibility. Why do these scales give mathematically inconsistent answers?]

would be ...

The Lancet authors weigh a bag of oranges and a bag of apples on some dodgy scales. They get the result that the oranges weigh 40 kg +/- 5 kg and the apples weigh 35 kg +/- 5 kg. They then put one bag on each end of a seesaw and the end with the apples on it goes down. David Kane claims that since they reported the weight of the apples as being less than that of the oranges, they must be wrong and the apples actually went up.

David Kane wrote:

1) If you think that it is impossible for CMR in Iraq to be 2.2, you should let Roberts and the Lancet know. If you're right, their published results must be wrong!

Actually, I do think the published result was wrong; I think the reported 95% CI for post-invasion CMR is too wide. However, as a general rule, I don't particularly sweat when CI's are too wide--the greater error is when one thinks the CI's are too narrow.

2) With regard to your graphics, you are using a bootstrap, which is fine for calculating the relative risk. But that is not how Roberts et al calculated excess mortality.

Well, maybe it was, maybe it wasn't, but they give you the same answer (as they should) and it certainly appears to be the way they calculated the CI's on the estimates.

D-squared wrote:

[2) So, are we in agreement that the confidence intervals for post-war mortality were not calculated from a bootstrap? As you point out correctly, this is a critical issue.]

No we aren't. I am prepared to join in with the arguments of other people (like Tim and Robert) who appear to agree with you on this,

Actually, I don't necessarily agree with David about this. One of the basic bootstrap CI's that I calculated agrees with this but because the bootstrap distribution is so skewed I prefer to use one of the bias-resistant estimators of CI (e.g., either the percentile or the BCa CI (or Tibshirani's abc variant of the BCa CI)). This is sort of angels dancing on the head of a pin, anyway.

1) Thanks to dsquared for these thoughtful comments. The very best peer-review that I could hope for my paper to receive is from people like him, Tim, Robert and others. Hooray for Deltoid!

2) I want to focus for now on this key point.

[2) So, are we in agreement that the confidence intervals for post-war mortality were not calculated from a bootstrap? As you point out correctly, this is a critical issue.]

No we aren't. I am prepared to join in with the arguments of other people (like Tim and Robert) who appear to agree with you on this, but I don't think it's supported by the actual text of the article.

a) How is it possible for post-war mortality to be both symmetric and calculated using a bootstrap given that Falluja is such an outlier? I think that this is impossible. Could you please explain? I agree that this is central to my paper.

b) I have e-mailed the authors (cc'ing Tim and dsquared) on this point. I hope they respond. Imagine how much easier this conversation would be if we had access to the code!

c) dsquared writes:

But if the confidence intervals of the CMR are estimated from a bootstrap, then the confidence intervals of the excess deaths would normally also be estimated using a bootstrap.

Agreed! Again, this is why whether or not a bootstrap was used is so important. Indeed, it is central to my argument. Also, would you agree that the contrapositive holds? If a bootstrap was not used to calculate the CMR, can we conclude that a bootstrap was not used in calculating excess mortality?

3) You are correct to note that my terminology in this thread is a bit sloppy. Apologies. Do you see similar mistakes in the paper? I hope not! If so, please point them out. I do want every detail in the version that I distribute at JSM to be correct.

Again, the central point to figure out now is whether or not a bootstrap was used to determine the confidence intervals for the post-invasion crude mortality. If it was, I am wrong and my entire paper needs to be revised. If it was not, then even a careful reader like dsquared can be confused on a fundamental point because of the paper's opaque presentation and the authors' refusal to share their computer code.

By David Kane (not verified) on 26 Jul 2007 #permalink

dsquared writes:
> In general, there's a couple of points in this thread ("measure the crude mortality rate using a normal distribution" being a prime example) where you seem to be quite confused about the difference between estimating a quantity of interest and estimating the confidence interval of the estimate [should be "of the quantity of interest" - Sortition].

Kane answers:
> You are correct to note that my terminology in this thread is a bit sloppy. Apologies. Do you see similar mistakes in the paper? I hope not! If so, please point them out.

As I pointed out several times, you make that mistake repeatedly throughout your paper. You keep attributing distributions to your parameters. This is simply wrong.

As dsquared mentions: CIs have distributions since they are random; parameters do not, since they are not.

You do realise that you're arguing the following: "In measuring the performance of a typical football team, X decided to leave out Manchester United because they were freakishly good. I say that, had they included United, the average football team would be rather worse!"

And that this is why nobody agrees with you?

Although I don't want to cut off dsquared before he explains how the estimate for post-war mortality could be calculated using a bootstrap (including an extreme outlier like Falluja) and still have a perfectly symmetric confidence interval, I do want to make more progress.

Assume for a second that I am right that the confidence interval for the post-invasion CMR comes from a unimodal distribution and that the numbers given by Roberts et al are correct. Does the rest of my argument follow? Or are there mistakes beyond this point? I have not seen any pointed out above, but perhaps I have missed something. (Sortition makes a point about terminology. The best way to make progress on that is for him to quote a specific sentence and explain why (given that I am using a Bayesian interpretation) I am wrong.)

By David Kane (not verified) on 26 Jul 2007 #permalink

[If a bootstrap was not used to calculate the CMR, can we conclude that a bootstrap was not used in calculating excess mortality?]

No. Definitely not.

Can we also agree that if the CMR CIs were calculated under the assumption of normality while the RR CI was calculated by bootstrapping some version of a negative binomial regression, that this would be good scientific practice? That it would be a completely standard way of analyzing this data?

David Kane asked:

If a bootstrap was not used to calculate the CMR, can we conclude that a bootstrap was not used in calculating excess mortality?

No.

And further to the above, the phrase "If a bootstrap was not used to calculate the CMR" is meaningless. Do you mean "If a bootstrap was not used to calculate the confidence interval for the CMR"? The answer is still "no, definitely not", because I can think of dozens of situations where you might be content with a parametric estimate of the CIs of two quantities but want to have a robust estimate of the CI for their difference or ratio.

[Can we also agree that if the CMR CIs were calculated under the assumption of normality while the RR CI was calculated by bootstrapping some version of a negative binomial regression, that this would be good scientific practice? That it would be a completely standard way of analyzing this data?]

I actually agree with Ragout on this point, so I am prepared to place some probability on this being what actually happened, although I still think that the paper doesn't say this (although the drafting is pretty opaque on a number of points here).

[given that I am using a Bayesian interpretation]

I don't see any evidence that you are actually using a Bayesian interpretation btw. You keep saying you are, but beyond talking about frequentist confidence intervals as if they were probability distributions, I don't see any Bayesian statistics at all, and you're saying things (for example, your assertion that the quoted CIs commit Robert to assigning positive probability weight to CMR<2.2) that don't really make sense in a Bayesian context at all. To be honest there is a quite distressing trend in statistics these days toward using the word "Bayesian" as a licence to talk about CIs incorrectly (falling back on an unconsidered invocation of a diffuse prior when challenged) and it really really looks like that's what you're doing. And ignoring the bootstrapped estimates is a really, really odd thing for a self-styled Bayesian to be doing.

Ragout asked:

Can we also agree that if the CMR CIs were calculated under the assumption of normality while the RR CI was calculated by bootstrapping some version of a negative binomial regression, that this would be good scientific practice? That it would be a completely standard way of analyzing this data?

Hmmm. It would be a completely standard way of analyzing data like these. Whether the completely standard way of analyzing data is good scientific practice is a different question and its answer would depend on the data.

This is interesting. I almost feel like I understand chunks of it.

I understand if you want to stick to talking to the experts, but if you have time I want to repeat my earlier question. David, do you really think the L1 data with Fallujah included supports the notion that the mortality rate in Iraq decreased in 2003-2004? As I understand your position, you do think this because of the L1 CMR CI dipping all the way down to 1.4--you think this CI was calculated using a normal distribution (apologies if I'm screwing up the terminology) and that this is a sensible way to analyze the data, as though a huge outlier caused by bombing in one town somehow implies the existence of other towns where the Coalition succeeded in bringing about a capitalist democratic utopia with virtually zero death rates, if not outright resurrections. IIRC, you come close to saying this in your paper (leaving out the capitalist utopia part and the resurrections.)

Or are you just trying to show inconsistencies between the various CI's in the L1 paper? As best I can tell you've convinced everyone of this, but then it's a question of which CI is the right one to use, and you seem to favor the CMR one with Fallujah included, the 1.4 to 23.2.

By Donald Johnson (not verified) on 26 Jul 2007 #permalink

Just a passing request - could someone whose browser is more resistant to crashing when faced with fugly widget-laden websites than mine please check out what Shannon Love is pushing under the headline "Vindication Is So Sweet"? I don't think anyone is going to gain if David's paper gets widely distributed in its current state, and Love's post has already been picked up by Michelle Malkin.

I think that we are making progress! Ragout asks:

Can we also agree that if the CMR CIs were calculated under the assumption of normality while the RR CI was calculated by bootstrapping some version of a negative binomial regression, that this would be good scientific practice?

dsquared replies:

I actually agree with Ragout on this point, so I am prepared to place some probability on this being what actually happened, although I still think that the paper doesn't say this (although the drafting is pretty opaque on a number of points here).

Agreed! But one proviso (and perhaps Robert is hinting at this as well). There is nothing wrong with calculating CIs analytically (as I believe they did with CMR). There is nothing wrong with doing so using a bootstrap (as we all believe was done for RR). But what happens in those cases in which the bootstrap gives substantively different answers than the analytic result? I think that "good scientific practice" requires that you report this fact to your readers. You can't hide it.

I suspect (but do not know!) that this is what happened here. (See BrendanH for circumstantial evidence on this point.) They did the first draft just using standard analytic methods (just plug the data into STATA and write down what comes out). The results for things like CMRs seemed fine. But the result for RR was not fine. It was too broad. It included 1. It failed to reject the null hypothesis of unchanged mortality in Iraq. Then, they went looking for an approach which would give "better" answers. And the bootstrap did that. They did not notice (or care) that the analytic CI for CMR was inconsistent with the bootstrap CI for RR.

Again, I am not asserting that it is wrong to use analytic and bootstrap methods in the same paper. That's fine. I am claiming that it would be bad practice to not clearly report when the answers you get from these procedures are substantively different. Can we all agree on that as well?

By David Kane (not verified) on 26 Jul 2007 #permalink

I am not sure what the heck you are on about here, David, but if you're trying to ask whether there was something dishonest about what the authors did, then no, we can't agree about that. If you're asking whether this would be a bad way to present the results even if your suppositions are true (which we don't agree about either), then no, we don't agree on that. And if you're asking whether it's sensible or even ethical to accuse the authors of model-mining, then we don't agree about that either.

dsquared wrote:
"in fairness to David he is discussing the 2004 paper here, so the possibility that things had got better during the sample period wasn't quite as crazy as to make the same assertion for the period 2003-present."

In fairness to everyone else, it should be pointed out that this discussion therefore is practically irrelevant and has nothing significant or even true to say about the lives of people in Iraq in this terrible war.

Interesting, if abstruse, discussion though.

> Sortition makes a point about terminology.

Completely untrue. The point is that your entire analysis is relying on meaningless quantities (again, any reference to the _distribution_ of the CMR is meaningless - see, for example, Figure 2, but really the entire paper). Saying that you are "using a Bayesian interpretation" is not an easy fix for the situation because:

1. It is not clear what the phrase "Bayesian interpretation" means. If you want to carry out a Bayesian analysis, you have to be explicit and consistent about it. State and justify your priors. Avoid talking about confidence intervals, which are a frequentist notion.

2. The paper you are criticizing does not use the Bayesian framework. Even if you do carry out a coherent Bayesian analysis, you would not be able to prove inconsistencies in the original work.

Any chance that, to help convince those of us less adept at suffering the pain of statistical analysis that this stuff isn't just pure bunk, you could demonstrate the methods used on a sample with KNOWN answers?

E.g., without cherry-picking, just pick a region of the USA, go and conduct a survey in the same manner (population density, number of samples, etc.) about death rates, and then compare the answers to the known values? That might give us an idea of how worthwhile the results can be.

Sortition,

Well, I do not think that I am the only one having trouble following your point. Are there any results that would be mathematically inconsistent if we assume a frequentist framework? For example, imagine that instead of CMR_post having a mean of 12.3 (CI 1.4 -- 23.2) the authors reported mean 4.3 (CI 3.0 -- 5.6), i.e., mean mortality going down. Would I then be able to conclude that, mathematically, an RR estimate of 2.5 (CI 1.6 -- 4.2) was impossible?

If not, then, obviously, I can never prove that any set of estimates within a frequentist framework is inconsistent. If yes, then please tell me the steps for that proof and, whatever the steps are, I will be able to show that they also work for the data actually in the paper.

I am not being flip. It is quite possible that there is a proper way to present the proof that I have in mind but that I don't know the way to do it. Tell me the way and I can do it for myself.

By David Kane (not verified) on 26 Jul 2007 #permalink

Donald asks:

I understand if you want to stick to talking to the experts, but if you have time I want to repeat my earlier question. David, do you really think the L1 data with Fallujah included supports the notion that the mortality rate in Iraq decreased in 2003-2004?

Yes. As the paper shows, if you believe the CMRs provided by the Lancet (and I do), there is a 10% chance that mortality decreased post-war. Now, there is nothing wrong with having doubts about this estimate, especially if you have doubts about the quality of the underlying data (maybe families with deaths were much more likely to emigrate and therefore not show up in the survey), but that is what a standard approach gives you.

By David Kane (not verified) on 26 Jul 2007 #permalink
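For what it is worth, Kane's 10% figure is straightforward to reproduce under his reading, i.e. taking the reported CIs as symmetric normal intervals and treating the two periods as independent. The pre-war numbers below are my recollection of the paper's estimate, roughly 5.0 (95% CI 3.7 - 6.3); they are not quoted in this thread, so treat them as an assumption.

post_sd <- (23.2 - 1.4) / (2 * 1.96)   # about 5.6, from the reported 1.4 - 23.2 interval
pre_sd  <- (6.3 - 3.7) / (2 * 1.96)    # from the assumed pre-war interval, see note above
pnorm(0, mean = 12.3 - 5.0, sd = sqrt(post_sd^2 + pre_sd^2))
# roughly 0.10, i.e. the "10% chance that mortality decreased" under those assumptions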

David, you really don't understand the "Bayesian" paradigm you claim to be working in, and ironically this is one of the few cases where it apparently makes a difference.

Taken in frequentist terms, the confidence interval describes what one would expect to see as the empirical distribution of the estimator CMR^, asymptotically as the number of repeated trials tends to infinity.

Taken in Bayesian terms, it reflects the (subjective) probability distribution of the random variable CMR, of which CMR^ is the estimator.

There is a difference here! In an infinitely long sequence of trials, you would expect to see CMR^ < 2.2 quite often, because in a very long series of repetitions of the experiment, some of them would be really unlucky and produce a wildly misleading CMR^.

However, this doesn't mean that the subjective probability distribution of CMR ought to have any material probability weight at all on the true value being <2.2, as this would be wildly implausible even for a very young country like Iraq.

It is exactly this confusion which has led you to assume that you can carry out the calculation that you have done - you have specifically made the mistake of *not* thinking like a Bayesian.

The answer to your question is that there is not enough information given in the paper to be able to say anything about the correlation structure of the clusters. Therefore we cannot work out the implied distribution of RR from the marginal distributions of CMR-pre and CMR-post, and so no, the calculation you want to carry out is impossible on the basis of the information in the published study. You could prove that these estimates were "inconsistent" if you had more information (and if that information did in fact say the right things about the correlation structure), but you don't, so you can't.

And by the way, I am really, really, not pleased that while ostensibly asking us for comments aimed at improving the paper before distribution, behind our backs you allowed it to be distributed on the Michelle Malkin site in unchanged form. I don't appreciate having my time wasted or having been played for a sucker. And I hereby put you on notice that I do not want my name mentioned, even in passing conversation, in connection with this paper. If I find out that you have done anything to imply that I checked it, reviewed it or anything of the sort, I will be very furious indeed.

David, thanks for the response. Wouldn't this suggest that the "standard approach" isn't the right one to take? Doesn't it widen the CI to include ridiculously low values because the mathematics assumes (if that's the right word) that a very high mortality outlier implies the existence of very low outliers, when in reality there's no good reason to believe in them? People keep pointing out to you that one should expect high mortality outliers like Fallujah in a war zone and there's no reason to expect their mirror images to exist (which might even involve resurrections.)

While I'm thanking people, thanks to dsquared for the reply to my question about the effects of large positive outliers way above.

By Donald Johnson (not verified) on 26 Jul 2007 #permalink

I see Michelle Malkin has already picked this up (in a post yesterday morning). Apparently, among her many talents, she is an expert in statistics. No doubt after a careful review, she has concluded that "none of [the prior] dissections comes close to [this] damning new statistical analysis of the 2004 study". By the way, David, is it true that you already submitted (Malkin says "sent") your paper to Lancet?

By David Kane's friend (not verified) on 26 Jul 2007 #permalink

Donald asked:

Wouldn't this suggest that the "standard approach" isn't the right one to take?

Yup.

1) Surely we can all agree with Malkin that my critique is better than previous ones! :-) Or do others prefer Kaplan . . .

2) The paper is under review at The Lancet.

3) Donald asks:

Wouldn't this suggest that the "standard approach" isn't the right one to take? Doesn't it widen the CI to include ridiculously low values because the mathematics assumes (if that's the right word) that a very high mortality outlier implies the existence of very low outliers, when in reality there's no good reason to believe in them?

There is no doubt that the standard approach widens the CI. But so would virtually any approach. That's what happens when you have an outlier. The CI widens. Now, perhaps a different model would widen it much more on one side than the other in the presence of a single outlier. But, as I argue above, it is very hard to come up with such a model that does this and has other desirable properties. Try it.

Now, one might use this as a reason to just bootstrap everything. No models needed! Perhaps the Lancet authors should have taken that approach. Perhaps the peer reviewers should have required them to do so. But that is not what happened. They used an analytical model. (Assuming I am correct and dsquared is wrong on this point.) Until the paper is withdrawn or a correction is made, they are stuck with that assumption. I personally think that this is standard in the field and reasonable in this case. You may disagree. But the fact that you don't like their model is not a point against my deriving the ineluctable implications of that model. Their results for CMR mean that their results for RR are wrong.

Again, this all assumes that I am correct that a bootstrap was not used in estimating CMR. I look forward to dsquared's description of how this could possibly be the case. For the record, here is the e-mail that I sent to the authors on the topic.

Les,

Hope all is well. Daniel, Tim and I are having an interesting discussion about your paper over at Deltoid. Could you clarify two points?

1) First, Daniel believes that your estimate of the post-war crude mortality (12.3 with CI (1.4 -- 23.2)) was estimated using a bootstrap. I think it was not. (No way that it could be, given that it is symmetric.) Is Daniel correct?

2) Excluding Falluja, the excess deaths were 98,000 with CI (8,000 -- 194,000). Could you tell us what these numbers would be if Falluja were included? That is, run the exact same code which produces these numbers, but just don't exclude Falluja.

Thanks for your time and I look forward to meeting you at JSM.

Dave

I have been asking Les the second question for months if not years. Perhaps involving Tim and dsquared (and others) in the conversation will cause him to answer.

Can we all agree that, if the answer to the second question is something like 264,000 (CI -130,000 -- 659,000), as I argue in my paper, the vast majority of readers of Roberts et al (2004) were misled (perhaps unintentionally)? Most (all?) readers assume that excluding Falluja was "conservative" and that including it would increase (not decrease) the lower confidence bound.

By David Kane (not verified) on 26 Jul 2007 #permalink

I see Michelle Malkin has already picked this up (in a post yesterday morning).

Not only that, but we picked it up from there. Although in confidence, I'm not sure our Shorter of DD was the best possible Shorter, given the breadth of argument and so forth.

David Kane asked:

Can we all agree that, if the answer to the second question is something like 264,000 (CI -130,000 -- 659,000), as I argue in my paper, the vast majority of readers of Roberts et al (2004) were misled (perhaps unintentionally)?

Well, it isn't, so that's a pretty weird hypothetical -- but as long as we're going there the answer would be "no."

"But if you do a simple bootstrap (as I understand it anyway), that won't happen. Finding a 4 doesn't give you any reason to think there's a 2 in the vicinity. You resample from 3,3,3,3,3, and 4 and there's no way any collection of clusters you get will have an average value of less than 3. Because, you know, they are all 3 or bigger"

Ah, but your resampling will normally give you samples of two types: all 3s, where the mean = 3, or 3s and a 4, the mean of which depends on how many you pull in each resample. So now you have a roughly normal distribution of a bunch of 3s and a few values slightly larger than 3; it will still give you a CI extending below 3.

"are you seriously asking the public to believe that Iraq may be a SAFER place to live than before the 2003 invasion? "

I think we are getting somewhat circular here. A priori, you have no way to absolutely rule out an RR < 1; the fact that the results of a study which does not exclude it explicitly nonetheless exclude it from the CI so thoroughly indicates that a single-tailed test would probably give you a more accurate estimate. But it's still not 100% guaranteed; the FDA would not approve it in a new drug application, for instance.

If I may bring up a mostly but not totally (I think) OT example: in another publication I read a letter by somebody who is analyzing digit-pair frequencies (e.g. 24, 18, 34) within the digit string of pi to some large number of decimal places. Of the 100 such combinations, he finds 92 whose frequency is within the theoretical CI95 and 8 whose frequencies are outside the CI95, and he felt that this was sort of interesting and might bear further investigation.

To me it's a similar argument to the one here, but inverted; while it's not 100% mathematically impossible that the estimated RR is freakishly high due to random chance and not a true measure of the "true" mortality rate, the chances of that actually being true on the one trial are, well, statistically insignificant, literally.

"Ah; but your resampling will give you a normally set of samples of two types: all 3s, where mean = 3, or 3s and a 4, the mean of which depends on how many you pull for each resample. So now you have a normal distribution of a bunch of 3s and a few slightly larger than 3; it will still give you a ci extending below 3."

Are you curve-fitting your bootstrapped results into a normal distribution? I was picturing an actual histogram of the results and of course there'd be no sample with an average less than 3. Maybe your way is standard practice. I'll let you have the last word on this hypothetical if you want it.

By Donald Johnson (not verified) on 26 Jul 2007 #permalink

I take it back. One more thing about these samples of two types--all 3's, or 3's and a 4: you'd also have a few with two 4's or more. But whatever--I agree that after you compile a histogram you could fit some continuous curve to it and have it go forever in both directions.

By Donald Johnson (not verified) on 26 Jul 2007 #permalink
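The 3,3,3,3,3,4 disagreement is easy to settle by just running it (this has nothing to do with the Lancet data; it is only the toy example above):

set.seed(2)
x <- c(3, 3, 3, 3, 3, 4)
b <- replicate(1e4, mean(sample(x, replace = TRUE)))
min(b)                              # never below 3, as Donald says
quantile(b, c(0.025, 0.975))        # the raw percentile interval stays at 3 or above
mean(b) + c(-1.96, 1.96) * sd(b)    # but a normal curve fitted to the same bootstrap
                                    # does poke below 3, as the reply says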

"There is no doubt that the standard approach widens the CI. But so would virtually any approach. That's what happens when you have an outlier. The CI widens. Now, perhaps a different model would widen it much more on one side than the other in the presence of a single outlier. But, as I argue above, it is very hard to come up with such a model that does this and has other desirable properties. Try it."

I'll pass. I'd rather try to get you to step outside the models for a second and try to explain the logical connection between discovering a bombed-out city with very high casualties and the conclusion that there must be other areas with very low death rates (possibly even resurrections). I know about outliers increasing variance. I want a real-world explanation of the logic. Say you're on the Lancet I survey team and you've done 32 clusters and now you do Fallujah. You go through bombed-out neighborhoods, pick one, and start tallying the deaths. 52 by violence. With all this destruction around you and with this huge number of deaths in this one neighborhood, you sigh with relief and think "Whew. I guess there's a better chance now the mortality rate in Iraq went down because of the war. Now I know there may be other places where children are playing and sewage treatment plants are working and hospitals are well-equipped and there are bike paths and options for low-fat dishes on restaurant menus. Alternatively, maybe the excess death toll is 500,000."

Maybe the fact that you find it difficult to treat the Fallujah outlier within some set of mathematical models without coming to this schizoid conclusion gives us some reason to suspect the mathematical models aren't appropriate here.

By Donald Johnson (not verified) on 26 Jul 2007 #permalink

Kane,

First, please note that in the very phrasing of your question

> [...] imagine that [...] the authors reported mean 4.3 (CI 3.0 -- 5.6), i.e., mean mortality going down

you keep using the same wrong notions that you have in your paper: the midpoint of the CI is not "mean mortality". The midpoint of the CI may be the mean mortality in the sample - but it certainly does not have to be.

As for proving inconsistencies:

> Are there any results that would be mathematically inconsistent if we assume a frequentist framework?

I don't want to pull a "go read the textbook" on you, but there is some complexity here that is really somewhat difficult to explain in the comments section of a blog. I'll attempt a short partial explanation.

What you are trying to do may not be impossible, but I believe it would be difficult. The reason is that a CI is not really a single specific interval at all - it is a mapping from the sample space into the set of intervals. That is, every point in the sample space (i.e., every data set collected) maps to a specific interval, but it is the entire mapping, rather than any specific interval, that constitutes the CI. The condition imposed on the mapping in order for it to be a CI is a constraint on the _distribution_ of the intervals but puts very few restrictions on any specific interval, since the probability of any specific data set is very low.

For example, let CI1 and CI2 both be level 5% confidence intervals for some parameter T(P) of a distribution P. Then CI3 = intersection(CI1,CI2) is a level 10% confidence interval of T(P), and yet there could be specific samples x for which CI1(x) and CI2(x) are disjoint and therefore CI3(x) is empty.

I hope that demonstrates that the kind of derivation you hope to carry out will not be straightforward, if it is possible at all.

As for a Lancet submission (or reaching "wider blog readership" through Malkin), in my opinion you should consider putting these on hold until you consult with a professional statistician.

I appreciate Sortition making an honest effort to explain his position to me. Comments:

1) I am, truth be told, a "professional statistician" in my day job, although much more applied than theoretical.

2) Although I appreciate Sortition not pulling a "read the textbook," I am not against good references. What textbook would have examples of the sort of proof that Sortition is describing? A specific example would be most helpful.

3) Let me restate my objection to Sortition. Imagine that every single word in the paper is exactly the same except for the 3 numbers associated with CMR_post. Instead of 12.3 (CI 1.4 -- 23.2) we have 4.3 (CI 3.0 -- 5.6). Now, I think that every other commentator on this thread would agree that, in that case, the reported result for RR of 2.5 (CI 1.6 -- 4.2) is mathematically impossible. Sortition seems to disagree, seems to argue that there is no way to "prove" that this result is impossible. If that is so, then I have a different notion (not necessarily a better one) of what it means to "prove" something in this context. I think that all of the rest of us can agree to disagree with Sortition on this.

4) Looking at Sortition's explanation, my mind drifts back to the hazy mists of graduate school and proofs in the limit and whatnot. plim, anyone? My claim continues to be that whatever proof is constructed to deal with the case that all the rest of us agree is impossible would work for me as well.

By David Kane (not verified) on 26 Jul 2007 #permalink

The three most recent of dsquared's comments were caught in moderation, so if you missed them you might want to scroll up and read them. Malkin says that she read this post and emailed David for permission to post it, rather than David contacting her.

Shannon Love is claiming "vindication" not because David Kane proved his criticisms right (because he didn't) but because he thinks that David's argument proves that the Lancet is wrong.

David Kane:

>Can we all agree that, if the answer to the second question is something like 264,000 (CI -130,000 -- 659,000), as I argue in my paper, the vast majority of readers of Roberts et al (2004) were misled (perhaps unintentionally)?

No.

>Most (all?) readers assume that excluding Falluja was "conservative" and that including it would increase (not decrease) the lower confidence bound.

No. Conservative means that excluding it reduces the point estimate.

In the words of Enrico Fermi, if there are places in Iraq SAFER than before the invasion, WHERE ARE THEY? Having made a prediction that there might be such places, one is compelled to find them if one wishes to be taken seriously. On the other hand, it is sadly easy to point to places where there has been destruction comparable to Fallujah's. Otherwise this just degenerates into a stupid finger exercise.

David Kane argues: *They used an analytical model [to calculate the CMR CIs] ... Until the paper is withdrawn or a correction is made, they are stuck with that assumption ... Their results for CMR mean that their results for RR are wrong.*

But this just makes no sense. Roberts et al used "bad" (simple but nonrobust) methods to calculate the CMR CIs but good methods to calculate the RR CIs. Now one could reasonably argue that Roberts et al should have used better methods (bootstrap) throughout. But David argues that because they used simple ("bad") methods at one point in the paper (for less important results) they are committed to using the same method throughout (for the more important results too). WTF?

And by the way, if Roberts et al had bootstrapped the CMR CIs, it would only have strengthened the paper's conclusions. It would have increased the lower bound to something more plausible. When I try replicating their results I get CMR(Post) = 12.3, CI= (2.06, 22.56) assuming a simple random sample of 33 clusters (not that different from the Lancet paper's results). But I get a much more plausible CI = (6.40, 23.36) using a bootstrap (the 2.5th and 97.5th percentiles).

David, I think there is a simpler answer to your problem. I am reading your problem as the one described in the introduction of the paper, namely that the CI for post-invasion CMR overlaps the point estimate for the pre-invasion CMR (indicating a non-significant difference in deaths), and that an RR whose CI doesn't overlap 1 (indicating increased risk of death) contradicts these findings. For example:

> By itself, the L1 result of a RR of 2.5 (95% CI 1.6 - 4.2) seems plausible. However, this confidence interval is not consistent with the estimates presented for CMRpre and CMRpost.

(I presume as discussed by dsquared that you mean here to say it is not consistent with the confidence intervals of the estimates presented for CMRpre and CMRpost).

You then use a discussion of probabilities of CMR pre and post-invasion to show how this can't be.

The problem with this argument is that the standard (analytic) calculation of confidence intervals for a RR is completely independent of the confidence intervals for the contributing CMRs.

For a standard contingency table:

a | b | a+b
-----------
c | d | c+d

If a=post-invasion CMR, c=pre-invasion CMR, and the populations at risk {(a+b) and (c+d)} are equal, then the RR is given by

RR=a/c

and the standard error of the log of RR by (approximately)

se=sqrt(1/a+1/c)

{this approximation is good if a is very much less than a+b, and c very much less than c+d}. i.e. it behaves like an odds ratio. {Here also I am assuming a+b=c+d=1000, since the death rate is given as deaths per 1000}.

As you can see, the confidence intervals of the pre- and post-invasion CMRs have no bearing on the calculation of the CI for the RR. You can even calculate, by rearrangement, the value of post-invasion deaths below which the confidence interval for the relative risk no longer excludes 1. But that threshold does not depend on the confidence interval for our estimate of that death number.
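In R the whole calculation is only a few lines (the counts below are invented round numbers, just to show the mechanics):

deaths.post <- 12    # "a" in the table above (deaths per 1000 person-units), made up
deaths.pre  <- 5     # "c" in the table above, made up
rr     <- deaths.post / deaths.pre
se.log <- sqrt(1 / deaths.post + 1 / deaths.pre)    # approximate se of log(RR)
exp(log(rr) + c(-1.96, 1.96) * se.log)              # 95% CI for the RR

Note that nothing in that calculation touches the confidence intervals of the CMRs; only the counts themselves enter.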

The point is, it is very common in standard frequentist statistics for the confidence intervals of death rates to overlap, but for their RR to be significantly different from 1. This is because it is the smaller of the two counts which dominates the calculation of the standard error - we don't care that we have observed lots of deaths in one group if we have large uncertainty about the number of deaths in the other group.

If you have a problem with this fact, you need to write a paper to a biostats journal arguing against the basics of modern biostatistics.

Of course, the point is moot as discussed, because in the end the CIs weren't calculated using the analytic formula, but a bootstrap, for reasons discussed ad nauseam.

Hi All --

Interesting discussion... but I can't help but feel a critical point is in danger of being lost in the shuffle. David Kane is focusing on confidence intervals in the Lancet study, aka "Trees." Many people reacting to Kane's article, however, are focusing on the Iraqi death toll, aka "The Forest."

In the interest of accuracy, I hope that Messrs. DSquared and DKane can determine whether the tree in question is an oak or a maple. In the interest of humanity, however, I hope that Kane's readers will understand that uncertainty about one particular tree does not translate to uncertainty about the existence of a forest.

About two years ago, I invested a great deal of time in evaluating estimates of the deaths caused by the Khmer Rouge regime in Cambodia. I believe that the lesson of that research is applicable here: an accurate estimate requires an interdisciplinary approach.

Like Iraq, Cambodia presents a daunting challenge because of the incomplete, uncertain nature of the data; and, as in Iraq, estimates derived from one set of data may appear to be at odds with estimates derived from a different set of data. Moreover, ideologues may be quite happy to have the actual numbers remain open to question: as long as "x" is unknown, they can assign whatever value is politically expedient.

My article on the Khmer Rouge death toll was written specifically for Cambodia scholars, but I believe that a number of the issues discussed -- in particular, the implications of clustering of mortality on random surveys -- would be relevant in other contexts. The article ("Counting Hell") is online here: http://www.mekong.net/cambodia/deaths.htm

With regard to the Lancet study, if the inclusion of Fallujah drastically skews the results of the survey, it seems reasonable to argue that the size of the survey was too small. Precisely because I believe in the necessity of an interdisciplinary approach, I would agree that the Lancet studies should not be construed as authoritative. The lesson of Cambodia, however, is that even imperfect data can ultimately help us see the larger picture. The proper way to address this shortcoming is: gather more data, via different methodologies, and correlate the results.

Kane's article states that "The Lancet authors cannot reject the null hypothesis that mortality in Iraq is unchanged." Perhaps this would be an accurate statement if the Lancet study were the only evidence available to us. But it isn't; thus, Kane's statement seems disingenuous. Michelle Malkin believes that this is "Kane's bottom line." I would encourage Mr. Kane to clarify his position: if he genuinely believes that mortality in Iraq is unchanged, he should say precisely that.

Regards,
Bruce
cambodia@aol.com
www.mekong.net

My apologies to David Kane for failing to see his earlier comment (#14), in which he remarked that he believes violent excess mortality is probably on the order of 100,000.

If excess *violent* mortality is around 100,000, total excess mortality (that is, increased mortality due to disease, etc.) will be substantially higher.

While I'm skeptical of the 655,000 figure suggested by the 2006 Lancet study, I wouldn't rule it out as a total impossibility. The idea that "mortality in Iraq is unchanged" since the invasion, however, is an absurdity.

I agree with what Ragout said in #105.

David Kane, since you think a normal approximation for the errors on the CMR estimates is appropriate, what is your calculation for the probability that the post-invasion CMR is < 0?

> Imagine that [... i]nstead of 12.3 (CI 1.4 -- 23.2) we have 4.3 (CI 3.0 -- 5.6). Now, I think that every other commentator on this thread would agree that, in that case, the reported result for RR of 2.5 (CI 1.6 -- 4.2) is mathematically impossible. Sortition seems to disagree.

I do. I can easily construct a set of three 5%-level CIs such that they have 5% chance of having exactly the arrangement that you state.

Here is a positive statement that you can make that should match your intuition that the CIs must be consistent in some way: if you have three level alpha CIs for three parameters, a, b, c, such that c(P) = a(P) / b(P), then the probability that CIc overlaps with CIa / CIb is no less than 1 - 3 alpha (division interpreted in interval arithmetic).

So, P(the upper point of the CI of CMRpost < the lower point of the CI of CMRpre and, at the same time, the lower point of the CI of RR > 1.0) is less than 15%.

I doubt that more than that can be said without knowing how the CIs are constructed.

Bruce makes an important point, but I would like to get this statistical thing squared away too because it looks to me like a direct attack on the integrity of Les Roberts, and I care about that sort of thing.

[3) Let me restate my objection to Sortition. Imagine that every single word in the paper is exactly the same except for the 3 numbers associated with CMR_post. Instead of 12.3 (CI 1.4 -- 23.2) we have 4.3 (CI 3.0 -- 5.6). Now, I think that every other commentator on this thread would agree that, in that case, the reported result for RR of 2.5 (CI 1.6 -- 4.2) is mathematically impossible. ]

No. Every other commentator on the thread would agree that it is not possible to make categorical statements about the distribution of the risk ratio based on only the marginal distributions of the two CMR estimates. (Or similarly, what SG said about the calculation of the CIs, but I am trying to address your own folk-Bayesianism here).

[Sortition seems to disagree, seems to argue that there is no way to "prove" that this is result is impossible.]

There is indeed "no way to prove that this result is impossible" on the basis of the information available. If you had more information about the full bivariate distribution, you could show that the CI for the risk rate was incorrect, but it is actually *not* mathematically impossible - in that there exists a bivariate distribution consistent with those two marginal distributions and the distribution of the ratio - and so you can't prove it is impossible.

[ If that is so, then I have a different notion (not necessarily a better one) of what it means to "prove" something in this context. I think that all of the rest of us can agree to disagree with Sortition on this. ]

As noted above, no.

David Kane wrote:

I am, truth be told, a "professional statistician" in my day job

I just can't stop reading this.

dsquared wrote in #78

And by the way, I am really, really, not pleased that while ostensibly asking us for comments aimed at improving the paper before distribution, behind our backs you allowed it to be distributed on the Michelle Malkin site in unchanged form. I don't appreciate having my time wasted or having been played for a sucker. And I hereby put you on notice that I do not want my name mentioned, even in passing conversation, in connection with this paper. If I find out that you have done anything to imply that I checked it, reviewed it or anything of the sort, I will be very furious indeed.

1) I had changed the paper to thank you for your comments (in the same way that I thank Shannon Doocy and Stephen Soldz, two others who do not agree with my conclusions), but, since you do not seem to want thanks, I have removed your name. This seems silly, but I would always abide by someone's wishes on something like this.

2) The only other person from this thread that I have thanked in the next draft is Tim, since he is the only one whose real name I know for a fact, although perhaps I should thank commentators at Deltoid in the same way that people thank seminar participants at MIT or whatever. If anyone else would like to be thanked by name (or Tim would like to be removed), please let me know.

3) To be clear, I did not seek Malkin out. She is a Deltoid reader (and who isn't?) and contacted me. Should I have refused her permission to reprint? If someone contacts me from DailyKos, should I refuse him permission? If, say, Andrew Gelman asked for permission to reprint, should I give it to him? That seems positively unscientific to me. Once you decide that a paper, even in draft form, is ready for wider distribution (not necessarily ready for publication, but ready for comment and criticism from smart people), it seems silly to me to restrict its distribution. I have no problem with people who do restrict distribution, but doing so on the basis of political views seems stupid.

4) I am not even sure what it would mean to ask "for comments aimed at improving the paper before distribution." I can't get comments from smart people like the commentators at Deltoid without distributing the paper to them, and I can't distribute it to them (unless I find out all their mailing addresses) without asking Tim to post it. And, given that it is a free world wide web, I can't prevent Malkin from reading Deltoid. In fact, she could have written the exact same post that she did and simply linked to the copy of the paper at Deltoid instead of recopying the whole thing. Nothing I could do would have prevented that. So, my only option (once the paper is public at Deltoid) is to not give her permission to reprint it. And what would be the point of that?

By David Kane (not verified) on 27 Jul 2007 #permalink

I thank Sortition and dsquared for taking the time to educate me on proofs and Frequentists. Let me see if I can restate their point (correct me if I am wrong) in a simpler framework.

Imagine a paper which reports, in one section, a weight for a brick of 2 pounds (95% CI 1-3) and, in another section, a weight for the same brick of 10 pounds (95% CI 9-11). Assume that those confidence intervals are normally distributed and the paper is Frequentist.

Sortition/dsquared's point is that, even though these two claims are wildly inconsistent, it would be wrong for me to claim to "prove" that. There is some state of the world in which the true brick weighs 6 pounds but the first weight just happened to be very low and the second very high. Since the confidence limits are normal, there is always a chance, however small, of such a result.

Fine. I think that I get this. But, we can all agree, there is surely something suspect in the brick results above. From a Bayesian standpoint, the result is absurd. You can believe that the brick weighs around 2 or around 10, but you can't believe both results simultaneously.

So, how does one demonstrate that absurdity within the frequentist paradigm? I assume that there is a way to do this but graduate school was a long time ago. dsquared writes:

The answer to your question is that there is not enough information given in the paper to be able to say anything about the correlation structure of the clusters. Therefore we cannot calculate the implied shape of the bivariate distribution of RR from the marginal distributions CMR-pre and CMR-post, and so no, the calculation you want to carry out is impossible on the basis of the information in the published study. You could prove that these estimates were "inconsistent" if you had more information (and if that information did in fact say the right things about the correlation structure) but you don't, so you can't.

How would one "prove" inconsistency in the simply brick example? If Sortition/dsquared would outline the procedure (or point me to a text book example), I think that I could rework that example to apply to L1. (dsquared and I disagree on just what information I have about the paper. I will address that point soon.)

By David Kane (not verified) on 27 Jul 2007 #permalink

David Kane wrote:

perhaps I should thank commentators at Deltoid in the same way that people thank seminar participants at MIT or whatever. If anyone else would like to be thanked by name (or Tim would like to be removed), please let me know.

Well, I try never to presume that my comments will be helpful to anyone but if, through inadvertence, that should have happened it wouldn't bother me if you thanked me by name as long as you add that my comments were disapproving.

Having said that, I don't think there's a problem if you didn't contact Malkin but rather she contacted you--unless, of course, you contacted her seeking technical comments in the same way you sought technical comments from us. In that case, I wouldn't think there's an ethical problem on your part at all. There'd be a disquieting cognitive problem in thinking that Malkin could provide you with technical advice, but not an ethical one.

I think that the actual consequences of the Malkin posting (that a million and one wingnuts are running around pretending that Roberts et al 2004 has been conclusively refuted and that therefore the entire Iraq War is "vindicated") were so obviously predictable, and the possibility of any worthwhile comments so remote that it was at best mindlessly irresponsible to agree to the paper being posted. I don't want my name anywhere near that paper as I think that in its current form it is many miles away from being anywhere near distributable; if I'd written it I would have asked for comments via email rather than posting it on Deltoid. I also maintain that it was quite rude to not even inform us that the paper had been widely circulated in its original form.

Your latest attempt at a hypothetical example smuggles in a piece of prior information (that the brick is the same brick in both cases) which is exactly what you don't have in the current case. If you didn't know it was the same brick, you wouldn't be able to prove the estimates were inconsistent there either. However you ask this question, the answer's going to be the same.

David, the trouble is that libertarian folk on the far end of the political right are trying to find any fragment of evidence they can to legitimize their defense of the criminal enterprise that was and is the Iraq war and occupation. Your critique of the Roberts et al. Lancet papers - which is clearly contentious, given the soundness and robustness of the rebuttals here by Robert, dsquared et al. - is still going to be one of the fragments that the rightists use to defend the indefensible.

How do you feel about this? Given the humanitarian catastrophe in Iraq (let's ignore the carnage for just a second and focus on the massive outflow of refugees and the clear descent into barbarism as every thread of the civilian infrastructure has been unwound by this disaster), what is your opinion on the war and its aftermath? Are you one of those who ignore a historical legacy of western atrocities? Do you think that there was an alternate agenda (that I outlined in my first post on this thread) that was camouflaged initially by rhetoric about WMD and links with al Qaeda, and, when those proved to be bogus, by propaganda about bringing the great gift of democracy to the Iraqi people? Forget your statistical critique of Roberts et al. for just a second and tell me what you think of the whole Iraq conflict.

BTW, my name is very real (at least it was the last time I checked) and I am a population ecologist working in The Netherlands.

By Jeff Harvey (not verified) on 27 Jul 2007 #permalink

Here is an update on the issue of whether or not the confidence intervals for CMR are, as presented in the paper, normally distributed. dsquared takes issue above with that claim and points out, correctly, that it is critical to my paper.

1) If we had access to the code, this would be trivial to figure out. But the authors won't let us see the code, so we get to waste time arguing about it.

2) The authors have not responded to my e-mail (#91) on this point, at least to me. Perhaps dsquared or Tim got an answer . . .

3) I challenged dsquared and others above (e.g., #22, #55 and #57) to demonstrate how the estimates for CMR could both a) come from a bootstrap and b) result in a symmetric CI, given that Falluja is such an outlier. No one has met that challenge. I conclude that a parametric method was used and that, given this, it is almost certain that the CI is normally distributed.

If anyone disagrees, please explain how you can bootstrap your way to a symmetric CI given the data in this study. Future posts by me will assume that we have settled this. The confidence intervals for crude mortality rates are normally distributed. (But disagreements with this claim are still welcome!)

By David Kane (not verified) on 27 Jul 2007 #permalink

Assuming that the confidence intervals for CMR are normally distributed, I think that these two claims by dsquared are false. (Comments, as always, welcome.)

The answer to your question is that there is not enough information given in the paper to be able to say anything about the correlation structure of the clusters. Therefore we cannot calculate the implied shape of the bivariate distribution of RR from the marginal distributions CMR-pre and CMR-post, and so no, the calculation you want to carry out is impossible on the basis of the information in the published study. You could prove that these estimates were "inconsistent" if you had more information (and if that information did in fact say the right things about the correlation structure) but you don't, so you can't.

and

There is indeed "no way to prove that this result is impossible" on the basis of the information available. If you had more information about the full bivariate distribution, you could show that the CI for the risk rate was incorrect, but it is actually not mathematically impossible - in that there exists a bivariate distribution consistent with those two marginal distributions and the distribution of the ratio - and so you can't prove it is impossible.

I think that I understand this point. But, if the CIs for CMR are normally distributed, then, as the paper shows, I can prove that it is impossible (or very, very, unlikely) by simulation. Choose any correlation for the estimates of post and pre-invasion CMR. Whatever number you choose, you still get a result for the RR that is inconsistent with the paper. If anyone is interested, I can provide a couple lines of R code in the comments (and the next version of my R package will include a function to play with at home).

In other words, if CMR CIs are normal, then we can examine the space of all possible bivariate normal distributions (consistent with their individual marginal distributions) and see that all of these are inconsistent with L1. I have done this, as my paper reports.
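Something along these lines gives the flavour (a throwaway sketch, not the code from the paper or the package function; it treats both CMRs as normal, with standard deviations backed out of the published 95% CIs, and uses the usual L1 pre-invasion figures of 5.0 with CI 3.7 -- 6.3):

library(MASS)    # for mvrnorm
set.seed(42)
mu  <- c(5.0, 12.3)                           # CMR_pre and CMR_post point estimates
sds <- c(6.3 - 3.7, 23.2 - 1.4) / (2 * 1.96)  # sds implied by the reported 95% CIs
for (rho in seq(-0.9, 0.9, by = 0.3)) {
  Sigma <- diag(sds^2)
  Sigma[1, 2] <- Sigma[2, 1] <- rho * sds[1] * sds[2]
  x <- mvrnorm(100000, mu, Sigma)
  cat(sprintf("rho = %4.1f  P(CMR_post < CMR_pre) = %.3f\n", rho, mean(x[, 2] < x[, 1])))
}

For these inputs the probability that CMR_post comes out below CMR_pre stays in the rough neighbourhood of ten percent whichever correlation you pick.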

By David Kane (not verified) on 27 Jul 2007 #permalink

No you can't. The continuum of correlation coefficients does not exhaust the space of "all possible bivariate normal distributions". There can be a wide variety of bivariate normal distributions which have the same correlation coefficient. If I recall correctly, David, your background is in empirical finance so you really ought to be aware of what a copula is.

I am, truth be told, a "professional statistician" in my day job.

As you know, David, we have pretty similar jobs; we're both portfolio managers. I would describe myself (and, frankly, you) as a user of statistics, but not a professional statistician.

By David Kane's friend (not verified) on 27 Jul 2007 #permalink

Re Malkin, I have no problem at all with you allowing her to reproduce your paper and answering her questions (though people here probably would also be interested to know it was already submitted to The Lancet).

Re your light-hearted comment:

Surely we can all agree with Malkin that my critique is better than previous ones! :-)

Fair enough, though the issue is more whether Malkin is competent to judge. Also, I sense some dissent here to her conclusion that this is a "damning new statistical analysis of the 2004 study".

By David Kane's friend (not verified) on 27 Jul 2007 #permalink

dsquared:

The continuum of correlation coefficients does not exhaust the space of "all possible bivariate normal distributions".

A bivariate normal, as the term is customarily used, is determined by a single correlation coefficient (and the two means and variances). There are however many bivariate distributions in which both marginals are normal, but the joint distribution is not bivariate normal; I assume that's the point you were making.

By David Kane's friend (not verified) on 27 Jul 2007 #permalink

yes you are right of course. these kind of errors are my trademark (cf comments #9 and #10 above), which is why if Michelle Malkin asked me if she could republish a paper of mine before I had thoroughly finished checking it I would answer "no".

David Kane insisted:

Assuming that the confidence intervals for CMR are normally distributed

If so, then there is a non-zero probability that the CMR is arbitrarily far from the mean (say, three SDs). So, David, do you truly believe that the probability of a negative CMR is non-zero?

David, once again: in studies of rare events, the confidence interval for the RR is unrelated to the confidence intervals for the point estimates of the cells in a 2x2 contingency table. Should you have reason (such as a few extreme outliers in your estimate of the values in the cells of that 2x2 contingency table) to think the data is not drawn from an approximately normal distribution then you use robust estimation (such as a trimmed mean or bootstrap sampling or wls regression) to estimate the confidence interval for the RR. This does not mean you have to go back and recalculate the confidence intervals for the point estimates of the cells in the 2x2 table - the symmetric confidence intervals are instructive as to the problem, and can be kept in to show this problem.

For example, the cells in the model 2x2 table used to define the RR should be drawn from a binomial distribution. This means the confidence interval for an observation of value 12 should be approximately 4 to 20. In this case it is 1.4 to 23. Therefore we think the cells aren't being drawn from a binomial distribution. Maybe therefore the standard analytical formula doesn't apply? We show this by retaining the analytically calculated CIs for the CMR. Then we calculate a better estimate of the CI for the relative risk.

If outliers are the reason for this problem, and you use robust estimation to handle them, you will by design reduce the width of confidence intervals for your RR relative to an analytic calculation. This may look like cheating to you but it is the right thing to do.

For example, my epidemiology textbook (Armitage and Berry, 4th edition) says that in situations where you "expect occasional statistical outliers" you should use robust estimation methods, for example the trimmed mean. Note the phrase "expect occasional statistical outliers". Note that a trimmed mean in this case is equivalent to excluding Fallujah. For CIs, you can use methods like bootstrapping to calculate robust estimates.
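In R terms the textbook prescription is about as short as it gets (invented numbers, purely to show the mechanics):

x <- c(rep(3, 32), 150)    # 32 well-behaved clusters plus one extreme one, made up
mean(x)                    # ordinary mean, dragged upward by the extreme cluster
mean(x, trim = 0.05)       # 5% trimmed mean: drops the most extreme observation at each end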

I can't understand how any of this can be difficult for you.

Progress!

Elizabeth Johnson (who did the statistical work on the paper) has confirmed that I am correct and dsquared is wrong. Crude mortality rates were not estimated via bootstrap. Those confidence intervals are normally distributed.

As I said before, if dsquared had been right about this and I wrong, much/most of the paper would have been invalidated. Now that my assumption has been confirmed, I would be interested in reading further comments.

Although it was a bit of a bother to try to "prove" over and over again what, to me, is obvious (that the CIs are normal), I am pleased to have correctly understood something from L1 that dsquared did not, and to have tried my best to explain it to him. I have learned much from his writings on the topic, so this was a small attempt by me to return the favor.

By David Kane (not verified) on 27 Jul 2007 #permalink

happy to take your word for it. Did she confirm that the relative risk rate CIs were bootstrapped? If so, then you still don't really have a paper - as Ragout notes above, your assertion that the authors should have presented non-bootstrapped CIs for this quantity doesn't really rate even a one-paragraph letter to the editor.

I looked at the Lancet paper, and it seems reasonably clear to me what they did. For the relative rate they use a generalized linear model. They estimate errors using bootstrap. As far as I know, this is basically the state of the art in estimating this type of model.

For some summary statistics that they mention in passing, they apparently report the confidence interval you get from assuming that the statistic (not necessarily the underlying sample!) is approximately normal. They could have reported the standard error or the t statistic just as well. As expected, the inclusion of Fallujah makes the interval wider than if you excluded Fallujah. But these are summary statistics, statistics that serve to give you a feel for the data. The relative rate calculation is the point of the paper, and it does not assume normality.

dsquared asks

Did she confirm that the relative risk rate CIs were bootstrapped?

Yes. Note that it has always been clear (to me) that the RR estimate comes from a bootstrap. Nowhere in my paper do I assume that RR CIs are normal. You mistakenly assert that I do this in your CT post. A correction would be most welcome.

By David Kane (not verified) on 27 Jul 2007 #permalink

Thanks again for all these helpful comments and apologies for taking so long to respond. Several people above point out, correctly, that just because CMR_pre and CMR_post are normally distributed, I do not get to assume that their joint distribution is bivariate normal. Correct. But my understanding (corrections welcome) is that this does not matter to the point that I am making.

The paper shows the (easy!) math of how the difference between two normal variables is itself normally distributed with a variance of v(x) + v(y) + 2*cov(xy). Fine. Now, the correlation between x and y (whatever their joint distribution may be) must be between -1 and 1, and cov(x, y) is a direct function of the correlation and of sd(x) and sd(y). We know sd(x) and sd(y), so we can calculate the range of all possible values for cov(x, y). Given that range, as long as all those values produce results consistent with my contention that there is around a 10% chance that CMR_post is less than CMR_pre, aren't I done?

That is, I do not need to assume that I know anything about the joint distribution of CMR_pre and CMR_post, since I know their variances and I am willing to try all possible values for their correlation.

This all seems trivially true to me, but perhaps I am missing something. I especially seek comments from dsquared, my "friend", Sortition, Robert and SG on this topic.

By David Kane (not verified) on 27 Jul 2007 #permalink

David, if you're not arguing about the use of bootstrap, then what are you asserting?

1) Note that there is a mistake in the paper (and post) above (but not in the code, so all the figures are correct) about the variance of a difference between two normal variables. You need to subtract the covariance, not add it.

2) I am asserting the same thing as when we started! Given the distributions of the CMRs, it is impossible (or highly, highly unlikely) for the confidence interval for RR reported in the Lancet to be correct. On this point, dsquared does not like my brick example.

Your latest attempt at a hypothetical example smuggles in a piece of prior information (that the brick is the same brick in both cases) which is exactly what you don't have in the current case. If you didn't know it was the same brick, you wouldn't be able to prove the estimates were inconsistent there either. However you ask this question, the answer's going to be the same.

This misses my point. There is some specific number of deaths in Iraq before and after the invasion (and some population). We don't know these numbers, just as we don't know the weight of the brick. But the methods that we use to estimate these numbers (or functions of them) must be consistent or there is a problem. Whatever formal "proof" one would use in the brick example is just the same formal proof that I could use. I don't think such a formal proof is necessary or even useful, but if someone shows me an example, it will be easy for me to fit it into the paper.

By David Kane (not verified) on 27 Jul 2007 #permalink

David,

The problem is actually less whether your math is right than the fact that the point you are using it to make is wrong, or at least wrongheaded.

Your argument, so far as I understand it, boils down to three parts:

1) L1 uses different distributional assumptions to calculate the CMR CIs from what it uses to calculate the RR CI.

2) If one uses the assumption used to calculate the CMR CIs to calculate the RR CI (underlying distribution is normal), then the RR CI includes one.

3) The assumption underlying the calculation of the CMR CIs is appropriate, and _should_ be applied to calculate the RR CI.

On #1, you seem to be correct. On #2, I actually think your math looks ok, but this is irrelevant, because:

#3 is so wrong. 180 degrees off. Normality is obviously a bad assumption here; nonparametric methods, e.g., the bootstrap, will be better. If you want to reconcile the discrepancy you found in #1, it would be much better to do completely the opposite of what you have done, and fix the CMR CIs. (Which, I should note, is exactly what Tim said in the post that kicked this off.)

Kane says:

> I am asserting the same thing as when we started! Given the distributions of the CMRs, it is impossible (or highly, highly unlikely) for the confidence interval for RR reported in the Lancet to be correct.

As I have written several times - if you think your paper proves this assertion then you are utterly mistaken. Your entire paper is based on a severe misunderstanding of the statistical analysis. (BTW, the very phrase "Given the distributions of the CMRs" is highly unclear - again, parameters have no distributions in the frequentist setting of the Lancet paper.)

It seems that despite seeing that your grasp of the analysis is tenuous, you are unwilling to re-examine your original claims.

David Kane asked:

This all seems trivially true to me, but perhaps I am missing something. I especially seek comments from [...] Robert [...] on this topic.

What you're missing is that hypothesis testing isn't automatic. This is why I kept prompting you to think about the probability of a negative CMR: I was hoping you'd realize that proper statistical inference always depends on a good estimate of the underlying sampling distribution, and that your method presumed a sampling distribution that is wrong. Basically, the Roberts paper presents the post-invasion CMR's mean and CI as summary statistics -- but you're using the summary statistics as if they were sufficient statistics for the entire sampling distribution. The mean and variance (or equivalently, the sd or 95% CI) are sufficient statistics when the distribution is normal, but not when it's not. That's why I keep pointing at the bootstrap distributions of the various parameters: when Falluja is included they're clearly not well-behaved, so the mean and variance are clearly not sufficient, so your statistical inference is clearly off.
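(A quick way to see that last point, with made-up distributions that have nothing to do with the Iraq data: force two very different samples to share a mean and sd, then watch the tail probabilities disagree.)

set.seed(7)
a <- rnorm(5000)                                  # symmetric, roughly normal sample
b <- rexp(5000) - 1                               # strongly right-skewed sample
b <- (b - mean(b)) / sd(b) * sd(a) + mean(a)      # rescale b to match a's mean and sd exactly
c(mean(a), sd(a)); c(mean(b), sd(b))              # identical summary statistics
c(mean(a < -2), mean(b < -2))                     # very different lower-tail probabilities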

But I encourage you to present the paper in its current state. As I said earlier, I'm supposed to be on vacation and I need to concentrate on whether Cadel Evans can pick up 1:51 in tomorrow's TT stage (he picked up 3 seconds today), so if you present it in its current form I won't have to look at your stuff again and I can just dismiss it as is.

There are two conversations going on here, statistical and political. The statistical dominates, but it is pretty clear that for some participants Mr. Kane's paper must be false because, if true, it would help destroy a mythic narrative that they need to advance their political cause; for them, science should bend to the needs of the political (#118 seems to be the most explicit of this group). This implies both that scientists have a shared political world view and that that world view should come first in setting direction no matter where little inconveniences like facts would lead. Both are false for the vast majority of scientists, but it is important for the long-term health of enlightenment values that such assaults be repulsed even-handedly.

No matter what the statistical merit of the paper (and I'm not competent to judge those), I hope that the vast majority of the participants would note and reject the idea of politics first, science second no matter what stripe of politics is being pimped. If not, I would suggest that Mr. Kane go elsewhere for his critique as he will not get an honest hearing here, just better or worse disguised political attacks. Talk about Michelle Malkin drew multiple critical remarks because many wanted to ensure her political myth telling was not enhanced but go back and look at the other side in this very thread. There seems to be no even-handedness so far about trying to keep politics out of the debate and after 130 comments, it really should have shown up by now.

On the statistical front I note a few disparaging remarks (rapture) about negative death rates and how this would only be possible in high in-migration areas. There were quite a few refugees from Saddam's Iraq (my understanding is that a million fled). Many of those people went home after the US overthrew his regime, and those camps were emptied during the time frame of the 2004 study. So far as I have been able to divine, nobody has taken into account the effects the returning refugees should have had on the 2004 study. As best I can tell, there were approximately 300,000 returnees, many of them single male military deserters with low death rates. Depending on how *they* clustered, a local zone's death rate might very well have dropped.

Eli Rabett in #104 asks where the zones of lower mortality are post-invasion. I would suggest starting to look in the Kurdish areas, for example, which are generally peaceful and doing quite well. The US military posts a quarterly report to Congress, and one of its features is a graph showing violence by province. The top 4 or 5 provinces generate about 80% of the violence. At the other end of the graph you might find your zones of lower mortality, especially Shia areas that were heavily under the thumb of Saddam's goons before and are now receiving their fair share of medical and other supplies. It isn't directly relevant for this study time period, but the 4 Shia provinces that have reverted to local control are probably better off today and might have been better off in the study time period too. Good politicians and good administrators would have started their work long before the formal provincial handover took place.

TMLutas: The Lancet people explicitly look only at mortality, not changes in population. A negative death rate is literally impossible, unless you grant the possibility of resurrection.

TMLutas admitted:

No matter what the statistical merit of the paper (and I'm not competent to judge those)

yet then wrote:

On the statistical front I note a few disparaging remarks (rapture) about negative death rates and how this would only be possible in high in-migration areas.

Sigh. No, low death rates are only possible in populations with unusual age structures, such as ones which are experiencing high in-migration -- and since we're talking all of Iraq, you'd need high in-migration of the right age structure for the entire country, not just some areas within it -- but negative death rates are never possible. Unless your parenthetical "rapture" was really intended as the Rapture; in that case, the dead will arise and the death rate will be negative.

David, I feel like I must be misunderstanding what you are arguing here, because if you are arguing what I think you're arguing, then you are in for a rough ride when you present the paper (unless no one in the audience has looked at Roberts et al, which I suppose is possible).

It's pretty clear what model the authors use to estimate the relative death rate. It's a complicated generalized linear model, not a ratio of two normals. If you really want to make the case that you can't get the result they find given what they report for the summary statistics, you would have to simulate that model, and show that there is no way for the two sets of confidence intervals to match up.

As for your consistency argument, this is essentially the opposite of the frequentist philosophy. In frequentist statistics, you should report several different estimates as a robustness check, to show that your result is not too sensitive to your modelling assumptions. I would have liked to see more of that in Roberts et al, but I have noticed that medical papers seem to have very tight space constraints. The authors report the naive sampling average, with or without an obvious outlier. Doing it the naive way, you get the pathological result that the confidence interval gets bigger, which is pretty clear from what they report. But for their findings, they don't do it the naive way. They do it the right way.

Robert - Uncharitable much? "this" in my prior comment should be expanded out to 'mortality rates lower than 4' and not negative rates, which I took as a given to be hyperbole not worth addressing. Looking back, I should not have phrased it the way I did.

The point stands. Iraq had a large population influx, both settled and unsettled refugees (I do remember local stories about well-established Chicago Iraqis going back to Iraq). How much this drove the rate down I've no idea, but there seems to be a widespread conviction in this thread that there are zero, none, nada factors going on in Iraq that could have possibly driven the mortality rate down, and that any theory positing that the death rate even might have gone down is thus suspect. You don't have to know very much math to identify some factors that would tend, at least on a local basis, to drive rates down.

Walt - If there are tight space requirements, all the more reason not to toss your data, wouldn't you say?

[Basically, the Roberts paper presents the post-invasion CMR's mean and CI as summary statistics -- but you're using the summary statistics as if they were sufficient statistics for the entire sampling distribution]

beautifully put, Robert. This is the heart of the whole "Bayesian paradigm" stuff that has Sortition tearing his hair out.

TMLutas - your argument depends on wildly counterfactual assertions about the relative flow of refugees in and out of Iraq.

TMLutas: That has precisely nothing to do with my point. Les Roberts may be the most sinister figure in the history of statistics, for all I know. David asked for feedback on his paper, and I am providing it.

TMLutas wrote:

"Iraq had a large population influx, both settled and unsettled refugees (I do remember local stories about well established Chicago Iraqis going back to Iraq)."

Nonsense. No matter how many Iraqis went from Chicago, it is hardly equal to the 2 million who are causing crises in Syria and Jordan. Not to mention the million more or so who have been displaced within the country. See more at http://news.bbc.co.uk/2/hi/middle_east/6916791.stm

In fact, the vast outflow of refugees is one of the independent corroborations of the Lancet results - only a high level of violence could have caused so many people to flee their homes.

"Are you curve-fitting your bootstrapped results into a normal distribution? "

In case nobody noticed, I'm not the bootstrap expert, but I had assumed that it relied on the central limit theorem: that if you repeatedly resampled a non-Gaussian distribution, the distribution of the means thus produced would be Gaussian. In the case of the bag of threes with one four, that gives two points on a Gaussian distribution, from which the lower CL would be calculated. If I'm wrong, please somebody stop me before I post again.

This is the kind of reason why the FDA doesn't accept treatments like resampling, which are not proved to be 100% conservative always.

"I'd rather try to get you to step outside the models for a second and try to explain the logical connection between discovering a bombed-out city with very high casualties and the conclusion that there must be other areas with very low death rates (possibly even resurrections). I know about outliers increasing variance. I want a real-world explanation of the logic."

Oh, that's easy; there is none. It's an artifact of making implicit assumptions which are, in reality, not valid. If Falluja were an honest-to-goodness purely random statistical outlier, where the fighting was no different from that in any other Iraqi town but the death rate ended up that high, then obviously there would have to be a huge variance to accommodate it. But it's pretty much universally accepted that that was hardly the case, no? So like I said in the other thread, it's somewhat similar to calculating the average weight of everybody in town, and the Moon; you get an answer, but the significance is "so what?" You're better off averaging the humans by themselves, and treating the Moon on its own, if you're in search of meaningful estimates.

"I think that the actual consequences of the Malkin posting (that a million and one wingnuts are running around pretending that Roberts et al 2004 has been conclusively refuted and that therefore the entire Iraq War is "vindicated") "

Reminds me of a somewhat OT joke:
"I see Hamid Karzai gave the President an amazingly talented parrot. It's really startling to hear the bird repeat all the President's talking points"
"Well, don't be too impressed, he doesn't understand the words, he's just repeating the sounds"
"True, but then again so is the parrot".

"Surely, it seems relevant that David Kane has done this before?"

One would think so. But now he has statistics that someone made up for him.

I am just a bush-league statistician, but I do understand something about study design. I do not see any explanation in this thread for why one would even suspect that the raw data did not have a symmetrical uncertainty distribution.

There appears to be no reason that one could expect, during the design phase of the study, that the data would exhibit an asymmetric distribution. The study purports to be able to detect whether the ratio of deaths to living person-hours increased, stayed constant, or decreased pre- and post-invasion (n>1, n=1, n<1). To accomplish this, the methodology must have an equal chance of detecting each type of change if it exists in a sample. If it cannot do this, then it is structurally biased to over- or under-report deaths and births.

Assumptions of asymmetrical uncertainty are usually only made when a large amount of previous empirical evidence exists that such a distribution does exist. No such previous knowledge exists in this case. In such cases, researchers make a point of noting that they didn't use a symmetrical distribution.

In short, just because this particular instance of a study produces a highly asymmetrical result does not mean that the methodology or the phenomenon inherently exhibits an asymmetry. We would need a large number of identical studies with very similar results to make that leap.

Not to be rude, but the average quality of comments was much higher before dsquared linked to here from Crooked Timber. Just saying! :-)

Anyway, let me respond to Robert.

This is why I kept prompting you to think about the probability of a negative CMR: I was hoping you'd realize that proper statistical inference always depends on a good estimate of the underlying sampling distribution, and that your method presumed a sampling distribution that is wrong. Basically, the Roberts paper presents the post-invasion CMR's mean and CI as summary statistics -- but you're using the summary statistics as if they were sufficient statistics for the entire sampling distribution. The mean and variance (or equivalently, the sd or 95% CI) are sufficient statistics when the distribution is normal, but not when it's not. That's why I keep pointing at the bootstrap distributions of the various parameters: when Falluja is included they're clearly not well-behaved, so the mean and variance are clearly not sufficient, so your statistical inference is clearly off.

1) You keep claiming that the CMR estimates were just "summary statistics." Not important. Pay no attention. Just move along. But the paper states (and I quote!) that the CMR estimates served as the basis for the headline 98,000 figure (8,000 -- 194,000). So, if you consider that number important, you need to take their model seriously.

2) How many times do I have to make this clear? It is not my fault that Roberts et al use a normal model for CMR. Don't shoot the messenger! You may think that this is stupid, stupid, stupid; that any model which allows for negative CMR (as a normal model invariably will) is unacceptable. Tell Roberts and the Lancet and the peer-reviewers. I do not have to defend this choice. I merely derive the implications of it. (I also think that the choice is a reasonable one, mainly because other choices (like truncated normal) would give similar answers.)

By David Kane (not verified) on 27 Jul 2007 #permalink

Could I ask some of the Deltoid regulars for comments on a dispute that dsquared and I are having about his post at Crooked Timber. dsquared claims:

The paper is a disaster. As the comments thread at Deltoid gradually teases out, it's full of silly mistakes (the author constantly fails to make a distinction between an estimate and its confidence interval) and is based on a fundamental misreading of the paper (in that it assumes that the relative risk rate was estimated parametrically using a normal distribution when it wasn't).

1) disaster, silly and full are in the eyes of the beholder, but no one can accuse dsquared of being overly generous.

2) Although I confess to sloppiness in the comment thread, I see no place in the paper where I actually confuse "an estimate and its confidence interval." If there is, please have mercy and point it out so I can correct the mistake.

3) I challenged dsquared to remove the claim about how the relative risk was estimated. First, I know this! It has always been obvious to me that the bootstrap was used to estimate the relative risks, just as it was not used to estimate the CMRs. dsquared was confused about this, not me. Second, nowhere do I assume this in the paper. Show me the line where this mistake is made. Third, there is no reason why I would need to assume it. I only manipulate 3 points with regard to the estimate of RR: the mean and the two bounds of the confidence interval. I do not care how the authors estimate those points. I just take them as given.

I think that even those who disagree with other aspects of the paper will admit that I did not make the "fundamental misreading" that dsquared accuses me of. (I may have made many other mistakes, perhaps more fundamental than that. But accusations must be accurate.)

Surely, even Robert, Sortition, SG and -- dare I hope? -- Tim will back me up on this . . . I may be guilty of many sins, but a failure to realize that the RR estimates come from a bootstrap is not one of them.

By David Kane (not verified) on 27 Jul 2007 #permalink

UPDATE.

For those following at home, here is an update on what I think that I have learned from this useful discussion.

1) We have confirmed, despite hundreds of words from dsquared and others, that the confidence intervals for CMR are normally distributed. No bootstrap is involved. Progress! If Tim had not posted my paper, I do not think that we would have learned this fact, so progress is being made. This means that the next version of the paper can just drop the entire second proof (which did not assume a normal). It is still correct, but tedious and unnecessary.

2) For a moment, I was truly worried about comments like 121, 124, 125. Who would not be intimidated when someone as smart as dsquared writes:

The continuum of correlation coefficients does not exhaust the space of "all possible bivariate normal distributions". There can be a wide variety of bivariate normal distributions which have the same correlation coefficient. If I recall correctly, David, your background is in empirical finance so you really ought to be aware of what a copula is.

Space exhaust, indeed! To be honest, I had forgotten that just because two quantities are normally distributed, it does not mean that their joint distribution is bivariate normal. Fortunately, as I demonstrate in #132, this does not matter. Because I know the means and variances of both marginal distributions, I can just check all possible values for the correlation and, Presto!, I am home free. But this does provide me with an excuse to add this trivial proof to the paper. You will see it in the next draft.

3) As far as I can tell, the most important criticism that I have yet to rebut comes from Sortition and (to some extent) dsquared. The basic claim is that the entire approach of the paper is wrong, given that I am dealing with a Frequentist framework. This may be true! I am no expert in dealing with Frequentists in their own language. I provide a challenge in 115 to Sortition/dsquared to either give an example of a simple proof using their preferred approach or to point to a specific example in a textbook. Neither has risen to the challenge.

Again, my point is not that this critique has failed (as the other two clearly have). My point is that, without a concrete example of what sort of proof would pass muster, there is nothing that I can do.

By David Kane (not verified) on 27 Jul 2007 #permalink

David, I don't believe that you are answering our criticisms in good faith anymore. Several people have explained to you in several ways the reasons why the CI for the RR should be calculated using the bootstrap, and the information from the (symmetric) CIs of the CMR ignored in considering the results of this calculation. You have ignored all of them, even though the essence of your argument is that the confidence interval of the RR is inconsistent with the confidence intervals for the CMRs. Your whole paper rests on this conflict, everyone is telling you it's irrelevant, and you won't answer them. Not a good look.

Since you don't seem interested in our comments, I'm reluctant to add this, but just in case you do pay attention for a moment, here goes:

Section 1.0.2, your proof, contains an instructive contradiction which should point you in the direction of the problem you face. You have started from P(CMRpre>3.7)=0.975, and then shown that P(CMRpost<3.7)>0.025. Sounds fair, right? Now, start from P(CMRpost<23.2)=0.975, and then calculate P(CMRpre>12). It's almost 0, isn't it? 12 is more than 2 standard deviations away from 5.5, so the probability of this occurring is negligible. From this you can show that a RR>1 is more than 97.5% likely. The result you get from your proof depends on your starting point.
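To put rough numbers on that (treating both CMRs as normal with sds backed out of the reported 95% CIs, and taking the published L1 pre-invasion figures of 5.0 with CI 3.7 -- 6.3; this is only to display the asymmetry, not an endorsement of the normal assumption):

sd.pre  <- (6.3 - 3.7) / (2 * 1.96)
sd.post <- (23.2 - 1.4) / (2 * 1.96)
pnorm(3.7, mean = 12.3, sd = sd.post)     # P(CMR_post < 3.7): about 6%, comfortably above 0.025
1 - pnorm(12, mean = 5.0, sd = sd.pre)    # P(CMR_pre > 12): essentially zero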

Why is this? Because you are in a situation of unequal variances. If there is one thing anyone who knows regression models can tell you, it is that this kind of heteroscedasticity leads to problems in simple arguments of the sort you present. Here, due to Fallujah, we have extreme heteroscedasticity.

You have to answer this problem, David Kane. You have to tell us: given standard conditions for standard calculations have completely broken down in this data set, why do you expect them to be used?

(Oh, and why do you suggest the quality of comments has declined due to dsquared's post on crooked timber? It could just as likely be Michelle Malkin's odious influence)

I especially seek comments from dsquared, my "friend", Sortition, Robert and SG on this topic.

Hey, what's up with the scare quotes on "friend"? ;)

By David Kane's friend (not verified) on 27 Jul 2007 #permalink

SG writes:

David, I don't believe that you are answering our criticisms in good faith anymore. Several people have explained to you in several ways the reasons why the CI for the RR should be calculated using the bootstrap, and the information from the (symmetric) CIs of the CMR ignored in considering the results of this calculation

1) I am arguing in good faith.

2) I have no problem with the use of a bootstrap in calculating an RR (although I would like to see what the non-bootstrap result looked like).

3) You want the "information from the (symmetric) CIs of the CMR ignored?" I have made this point several times before and I am happy to make it again. The 98,000 excess death estimate derives from the CMR, not the relative risk. See the quote in my paper. See the article itself. Quoting is fun so I'll do it again.

"We estimated the death toll associated with the conflict by subtracting pre-invasion mortality from post-invasion mortality, and multiplying that rate by the estimated population of Iraq (assumed 24.4 million at the onset of the conflict) and by 17.8 months, the average period between the invasion and the survey."

If you say that we should ignore the CMR, then you must ignore the excess death estimate, the single most quoted figure in the whole affair. Is that your claim?

PS. To my "friend", the quotes are just to refer to you as opposed to all (?) my generic friends at Deltoid. By the way, if you could confirm that my proof about space exhaust above is correct, I would appreciate it.

By David Kane (not verified) on 27 Jul 2007 #permalink

David, I gave you a specific textbook (I can give you the page number) which outlines the problem and describes the solution - which is exactly the solution used by the Lancet authors. You ignored it.

David, point 3) is a deliberate misreading of my words. I said this information should be ignored in considering the results of a bootstrap calculation of the confidence interval for the RR. Please don't do that to me again.

You know full well that the 98,000 figure is calculated with the Fallujah cluster removed, for all the reasons given in their paper and here, ad nauseam.

SG,

1) My textbook question has nothing to do with you or any comments that you have made. Sortition (and, perhaps, dsquared) have argued that a proof written for a Frequentist needs some special phrasing that I do not use. They need to provide an example.

2) The thread is long so mentioning the specific post (I assume you mean 127) is helpful. Again, you seem to imply that Roberts et al were stupid to not use robust methods to calculate their CMR before using that CMR to estimate excess deaths. Perhaps! I am just taking their methods as is.

3) Malkin's post was up for a day (?) before the quality in comments went down.

4) Your comment 106 is interesting but makes absolutely no sense to me. You claim:

RR=a/c

and the standard error of the log of RR by (approximately)

se=sqrt(1/a+1/c)

So, the standard error of the RR has nothing to do with sample size? This seems implausible to me.

5) Again, many people (not just SG) seem to argue that method A is the correct thing to do. Since I don't do A, I am an idiot. But I am not making any choices about methods. I am just using the authors' stated results to derive a contradiction.

By David Kane (not verified) on 27 Jul 2007 #permalink

I have just realized that I have been challenged, and, by mindless reflex driven by sheer vanity, I am about to do my best to rise to the challenge.

> Imagine a paper which reports, in one section, a weight for a brick of 2 pounds (95% CI 1-3) and, in another section, a weight for the same brick of 10 pounds (95% CI 9-11). Assume that those confidence intervals are normally distributed and the paper is Frequentist.

> [... this is absurd ...]

> So, how does one demonstrate that absurdity within the frequentist paradigm?

I do suspect that this setup is different enough from the situation in the Lancet paper that proving an inconsistency here would not enable you to prove an inconsistency in the paper, but I will go along, in the hope that my small contribution will induce Malkin to mention me approvingly in her upcoming correction post as one of the "statisticians and math geeks" who turn the blogosphere into the "open-source intelligence-gathering medium" it is.

Before I address your question, however, let's make the setup clear, since you are being quite vague in your description.

Would you say, David Kane, that the weight of the brick is being measured using a scale that introduces normally distributed noise, with a known variance (say var=1lb.^2)?

Would the two CIs reported be based on the same measurement, or two separate measurements?

Would the two CIs be known to have been constructed using the same formula, or two different formulas?

Would you say that you know what formula is (or what formulas are) being used to construct the CIs from the measurement?

Nothing wrong with being driven by vanity! That, at least, is my motto.

I think the best setup is that there is one brick but two different scales. The brick is weighed once on each scale. Feel free to answer all your other questions in whatever way makes the resulting proof as simple as possible.

Good luck!

By David Kane (not verified) on 27 Jul 2007 #permalink

David Kane writes *So, the standard error of the RR has nothing to do with sample size? This seems implausible to me.*

So if I reanalyzed the Lancet data using person-days instead of person-months, increasing the sample size by a factor of 30, the standard errors would be a lot smaller? What if I used person-minutes, or person-seconds! Wow! I hope David's right about this one!

David, fair enough re:1.

re: 2), I am saying that Roberts et al did the right thing to use analytic calculations of confidence intervals for the CMR, and then a robust estimation method for the confidence interval of the RR: the opposite of your inference.

re: 3) well there's a turn up

re: 4) in the standard calculation of an RR (using a 2x2 contingency table as the basis for your data), the confidence interval is very closely related to the sample size but is not dependent upon the confidence intervals of the point estimates of the cell contents. This is an important point with bearing on your topic. I will elaborate further here, and if you like I have a graph to illustrate it.

The standard method for calculating RRs assumes they are calculated from a 2x2 contingency table with cells a,b,c,d as here (I shall use pre- and post- as unexposed/exposed, okay?)

| exposure | dead | alive | total |
|----------|------|-------|-------|
| post     | a    | b     | a+b   |
| pre      | c    | d     | c+d   |

We calculate the RR as {a/(a+b)}/{c/(c+d)}.

Under the assumption that the cell counts are approximately binomially distributed (as counts are likely to be), the standard formula for the standard error of the log of the RR is:

se = sqrt{ b/(a(a+b)) + d/(c(c+d)) }

So the total size of the exposed and non-exposed groups enters each formula and counts as sample size. But as you can see there is no information from the confidence intervals for these values.

Now, for a+b=c+d and small death numbers relative to sample size these formulae reduce to those in post 106.

Note the assumption of a binomial distribution. This means we can calculate the variance of the dead-cell count as (e.g.)

var = a*b/(a+b)

(its square root gives the standard deviation), and therefore the confidence intervals for the cell counts are easily determined.

In fact, from these formulae you can calculate the analytic form of the RR for the case pre-=5, post-=12. The RR is 2.4 and the confidence interval is (0.85, 6.8). So it includes 1! BUT, you can also calculate the confidence intervals for the point estimate of the post-invasion death rate: it is 5.25 to 18.7. So the relative risk is non-significant but the confidence intervals don't overlap!!! So if you use the confidence intervals for the point estimates to infer changes in deaths, you determine that they increased; but if you use the RR, they didn't increase.

(It's the opposite case to that you worry about in your paper, occurring for the same figures).

This, David, is a fundamental property of Relative Risks. Your paper is an attempt to prove that this fundamental property exists, but we already know this and we don't think it's a problem. Further, the cells in the 2x2 table for the Lancet paper don't follow a binomial distribution, so the whole issue is moot in any case - we need to use robust estimation. (Plus of course this is a more complex model because of the role of cluster sampling in the variance, and correlation between pre and post values, which is why a regression model has been used).
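
For anyone who wants to check that arithmetic, here is a minimal R sketch of the worked example above. It is only an illustration under the stated assumptions (equal person-time in the two periods, deaths rare relative to the group sizes); it is not the Lancet authors' code.

```r
# Worked example from above: 5 pre-invasion deaths, 12 post-invasion deaths,
# equal person-time, deaths rare relative to the group sizes.
pre  <- 5
post <- 12

rr     <- post / pre                # relative risk, post relative to pre
se_log <- sqrt(1 / pre + 1 / post)  # approximate SE of log(RR)
rr_ci  <- exp(log(rr) + c(-1.96, 1.96) * se_log)

# Naive normal CI for the post-invasion death count itself
# (binomial variance is roughly the count when deaths are rare)
post_ci <- post + c(-1.96, 1.96) * sqrt(post)

round(rr, 2)      # 2.4
round(rr_ci, 2)   # roughly (0.85, 6.8): includes 1
round(post_ci, 2) # roughly (5.2, 18.8): excludes the pre-invasion count of 5
```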

Ok. I'll consider two cases:

1. Assume you know that the CIs are generated in the standard way for the mean of a normal, i.e., the midpoints of the CIs are equal to the observations. In that case H0 (the hypothesis that the model is as described - same brick weighed on two identical scales, with N(0,1) noise) would imply that the difference between the two midpoints is distributed normally with mean 0 and variance 2. The p-value (two sided) for a difference of 10 - 2 = 8 is then about 1.5e-8, so H0 can be confidently rejected.

2. Assume you know nothing about the CIs except for the fact that they are both 95% confidence intervals. In that case, the probability that they both contain the unknown mean (and therefore must overlap) is no less than 100% - 5% - 5% = 90%. Therefore, under H0, the event that the CIs do not overlap has a probability of at most 10% - not clearly impossible, but unlikely. H0 can be rejected at the 10% level.
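
A quick numerical check of both cases, as an R sketch of the calculations just described (nothing here is specific to the Lancet data):

```r
# Case 1: under H0 the two CI midpoints differ by a N(0, 2) amount;
# the observed difference is 10 - 2 = 8.
diff_obs <- 10 - 2
2 * pnorm(-abs(diff_obs) / sqrt(2))  # about 1.5e-08, so H0 is rejected

# Case 2: using only the 95% coverage of each interval, the chance that
# the two CIs fail to overlap under H0 is at most
0.05 + 0.05  # 0.10
```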

Now, as far as I can tell, neither of those cases is analogous to the Lancet situation. The Lancet situation is more like having two CIs, both based on the same measurement, with one of them (the one generated by dividing the intervals for the CMRs) much larger than the other (the one stated for the RR), and you claiming that the second one can't be that small if the first one is so large.

1) Ragout is funny!

2) I appreciate the most recent comments from SG and Sortition. Indeed, I sought a thread here at Deltoid for precisely these sorts of comments. However, I am off to Salt Lake City tomorrow! So, I won't have the time to give these comments the responses they deserve.

3) Perhaps Tim will be so kind as to post the next version of the paper once I incorporate this feedback and that from ASA. It seems many people at Deltoid are interested in the topic and the resulting discussion is high quality. Also, I feel that we have made a lot of progress (see #153), so the discussion can move forward.

By David Kane (not verified) on 27 Jul 2007 #permalink

SG,

What confidence intervals don't overlap? The CI around the pre-invasion death rate is (0.7, 9.3), and around the post-invasion death rate is (5.6, 18.4). So these CIs overlap. Is your point that the post-invasion lower bound, 5.6, is greater than 5.0, the pre-invasion death rate?

I should have added that I was using the values from SG's example, not from the Lancet paper.

In #151, David Kane wrote:

You keep claiming the the CMR estimate were just "summary statistics." Not important. Pay no attention. Just move along. But as the paper states (and I quote!) that the CMR estimates served as the basis for the headline 98,000 figure (8,000 -- 194,000). So, if you consider that number important, you need to take their model seriously.

David, David, David. David. Of course I take their model seriously. You need the estimated CMRs to do the excess mortality calculation but you don't need the CIs on those estimated CMRs either for the calculation of excess mortality or for the calculation of the CI on the estimate of excess mortality. So the way they calculated or reported the CI on the CMRs is irrelevant. It's just a summary statistic and that, too, was reported in a completely standard way. It's not used in any further calculation. They could've made a huge error in estimating those CIs and it wouldn't change any other conclusion. Neither the 98000 number, nor the 264000 number, nor the (8000--194000) interval, nor the (70000--975000) interval, depends on the CIs of the CMRs, with or without Falluja, parametric or nonparametric, symmetric or asymmetric. That's what they did, that's what they reported they did, that's what anyone who is well-trained in the field would do.

David, your entire thesis is based on a calculation that wasn't even reported -- and it wasn't reported not to hide it but because it was so unimportant to the central conclusion of the article that no one cared. I would think this is a problem for your paper but then I live in the reality-based world.

Are you really supposed to present this on Monday? Ouch. Anyway, on to more important stuff: think Evans can take 1:51 out of Contador?

David Kane- the conclusion of public interest from the 2004 Roberts study was that, more likely than not, there were at least 100,000 excess deaths resulting from the invasion.

You conclude that, more likely than not, there were at least 264,000 excess deaths.

Therefore, don't you agree with Roberts that, more likely than not, the invasion led to at least 100,000 excess deaths?

Based on your analysis, what is the probability that there were at least 100,000 excess deaths?

Robert,

I think David is right that the 98,000 (CI = 8,000, 194,000) figure was calculated by differencing the CMRs, calculating the CI of the difference in the CMRs, and multiplying these numbers by the population of Iraq and the follow-up time. That seems like the natural way to do the calculation, and it seems to be what Roberts et al say they are doing.

Roberts et al wrote

We estimated the death toll associated with the conflict by subtracting preinvasion mortality from post-invasion mortality, and multiplying that rate by the estimated population of Iraq (assumed 24·4 million at the onset of the conflict) and by 17·8 months, the average period between the invasion and the survey.
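
For concreteness, that recipe is just arithmetic. Here is a back-of-the-envelope R version, treating the published ex-Falluja summary rates as exact. That is an approximation on my part: the authors' own point estimate was 98,000, so a crude plug-in of the rounded rates will not reproduce it exactly.

```r
# Back-of-the-envelope version of the recipe quoted above (not the authors' code):
# excess deaths = (post CMR - pre CMR) x population x follow-up time.
pre_cmr  <- 5.0        # deaths per 1000 per year (published pre-invasion rate)
post_cmr <- 7.9        # deaths per 1000 per year (published post-invasion rate, ex Falluja)
pop      <- 24.4e6     # assumed population of Iraq
years    <- 17.8 / 12  # average follow-up, in years

(post_cmr - pre_cmr) / 1000 * pop * years  # roughly 105,000, in the ballpark of the published 98,000
```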

Ragout:

I agree that's how they calculated the 98000, and I agree that it's the natural way to do the calculation. However, they say: "[t]he confidence intervals reported are those obtained by bootstrapping." You don't need the CIs on the CMRs in order to get the bootstrap CI.

Ragout, sorry, what I meant is that the CI for the post-invasion figure doesn't overlap the pre-invasion point estimate. Too many sentences with the words relative risk and confidence interval in them.

note also that the paper doesn't report a CI or a point estimate (other than a vague comment that it would be about 200k higher) for the cum-Fallujah data, because the specific Fallujah cluster studied is probably an outlier even for the town of Fallujah. David's assertion is not only that they should have included this cluster but that they should have done so in an utterly brainless way. (He specifically says in this thread that they failed to include a normal-distribution CI for with-Fallujah data in the paper on purpose in order to mislead the reader).

We have a factual disagreement about how the 8,000 -- 194,000 CI was calculated. I agree with Ragout that the CMR was used without any bootstrapping. Robert disagrees and quotes the same sentence that caused dsquared (and others) to believe that bootstrapping was used to calculate the CMR CI. This is the exact same mistake that dsquared made. The bootstrap sentence applies only to the preceding couple of sentences about relative risk. Only the relative risk CIs were calculated with the bootstrap. Everything else is standard, whatever STATA spits out.

Now, do I know this? No! I can't see the code! But the burden of proof is clearly on Robert. He can't just quote the bootstrap sentence, a sentence that we know does not apply everywhere in the paper, and then claim that it applies to an estimation on a different page of the paper. (Perhaps he could ask Les Roberts for clarification.)

On a side note, I had never thought to check Ragout's description of the precise methodology. In retrospect, it seems obvious, much better than the total hack that I use. But can anyone point to a way to doublecheck his claim using the data that we have? That is, I think that the paper reports post-war CMR without Falluja but does not report pre-war without Falluja. So, how can we check if Ragout is correct? Suggestions welcome!

By David Kane (not verified) on 28 Jul 2007 #permalink

David Kane challenged:

The bootstrap sentence applies only to the preceding couple of sentences about relative risk. Only the relative risk CIs were calculated with the bootstrap. Everything else is standard, whatever STATA spits out.

David, can you explain why the quoted excess mortality CI is not symmetric around 98000?

In more important news, David Millar's rear wheel disintegrated just out of the start house.

Robert asks (reasonably!) if I can "explain why the quoted excess mortality CI is not symmetric around 98000?" No! I do not know the formula that they used to get that CI. I provide a guess in the paper which comes close, but is not spot on. Can anyone? The authors won't answer my question.

dsquared claims:

He specifically says in this thread that they failed to include a normal-distribution CI for with-Fallujah data in the paper on purpose in order to mislead the reader

The biggest puzzle for me is: Why do they not report an excess death estimate and CI for all the data, including Falluja? They do this for RR, so why not for excess deaths? My guess, as described in the paper, is that, if they did (using whatever methodology they use), the answer would have a lower bound of -150,000 or so. They knew this but specifically failed to report the number then (and continue to obfuscate now) because of their larger political goals.

So, a question for dsquared (and others): What would the lower bound of the confidence interval for excess deaths be if the authors used exactly the same code as they used to get the 8,000 number but now included Falluja?

I think that almost all readers assumed that it would be much greater than 8,000. After all, look at the findings! The RR lower bound goes from 1.1 to 1.5 with Falluja. If the RR lower bound went up then, obviously, the excess death lower bound went up.

I think all the readers who made that assumption were wrong. This is the most important implication of my paper.

By David Kane (not verified) on 28 Jul 2007 #permalink

The lower bound would be 70 000, as Robert said in post #169.

Obviously this isn't exactly the same code, but it's close enough to replicate the (8 000, 194 000) confidence interval.

btw, can anyone tell me why the original data and code can't be released? I heard it was something to do with the IRB, but not what their specific reasons were. I notice that there was enough data released for Robert to replicate their results.

David Kane writes:

They knew this but specifically failed to report the number then (and continue to obfuscate now) because of their larger political goals.

David Kane, this strikes me as a fairly outrageous accusation, particularly since it is based on your guesswork. Also, given the concerns expressed here by a few others about your motives in making your paper in its present form available to a "wider audience," perhaps it would be wiser to keep your accusations of improper behavior by others within the bounds of those things that you can establish on the basis of fact rather than speculation.

On the other hand, nothing about your paper, its presentation in this forum, or your comments here and at Crooked Timber indicates caution. On that basis, perhaps you'd care to fill in the blanks in your theory of scientific misconduct. Would you care to give us a fuller description of the political motives and extent of scientific misconduct of these authors? I'm eager to see how you apply facts and logic in clarifying your position on this matter.

By Cyrus Pinkerton (not verified) on 28 Jul 2007 #permalink

Kane writes at p. 13, "It is impossible to be 95% confident that there was an increase in mortality."

Assuming Kane is correct, it appears that his calculations show that it is possible to be 83% confident that there was an increase in mortality.

I understand that the conventions of statistics require, as an arbitrary standard, that something be established with a probability of at least 19 out of 20. But as a matter of public policy no one requires that level of confidence when making decisions.

Mr Kane, is it your conclusion that the probability is five out of six that there were excess deaths?

dsquared,

note also that the paper doesn't report a CI or a point estimate (other than a vague comment that it would be about 200k higher) for the cum-Fallujah data, because the specific Fallujah cluster studied is probably an outlier even for the town of Fallujah. David's assertion is not only that they should have included this cluster but that they should have done so in an utterly brainless way. (He specifically says in this thread that they failed to include a normal-distribution CI for with-Fallujah data in the paper on purpose in order to mislead the reader).

The paper relies heavily but very selectively on the Fallujah cluster. The main finding/Interpretation states:

Making conservative assumptions, we think that about 100000 excess deaths, or more have happened since the 2003 invasion of Iraq. Violence accounted for most of the excess deaths and air strikes from coalition forces accounted for most violent deaths.

The first sentence is based on a data set without the Fallujah cluster. The second sentence is based on a data set with the Fallujah cluster. The authors appear to cherry pick their data in order to support whatever point they want to make at the moment.

If Fallujah is a significant outlier, then it should be excluded from the main finding. If it is not, then it should be included in all calculations. Instead the authors very selectively use or exclude the Fallujah cluster, often without any notice they are doing so.

Kane is correct. There is no particular reason to exclude Fallujah. The study established no criteria a priori for eliminating any particular cluster. The only reason for doing so is that it makes the results of the study nonsensical. However, this indicates that the study's methodology is inherently flawed. A well-designed study would never have produced such a massive outlier.

I feel like I'm in one of those nostalgia shows, like "I Love 2004" now. You presented all of the arguments in that post on your own blog two years ago, Shannon, and it is not as if the news in the intervening period has been good for the case that things were actually OK in Iraq, is it? Why not just link to the Chicago Boyz blog so that nostalgia enthusiasts can wallow, and give up? The idea that "a well designed study would not have a big outlier" is as daft now as it was then, and the study's treatment of the outlier that was found in the dataset is as sensible and consistent now as it was then.

Hey, whatever happened to the "I Kiss You!" guy?

Ah- Shannon Love responds to me, although obliquely. Love says that Kane has shown, not merely that the analysis performed by Roberts is incorrect, but that the methodology is inherently flawed because it produced a massive outlier. Kane himself does not claim that the methodology is inherently flawed, merely that the analysis was performed incorrectly. Unless one claims that the methodology is flawed, one can draw conclusions from Kane that are even more disturbing than the conclusions one can draw from Roberts. Love now says that no conclusions at all can be drawn. Kane doesn't say this.

But Love's complaint seems to have nothing to do with Kane's analysis. Love says that the presence of the outlier demonstrates a methodological flaw. The presence of the outlier has been known since the paper was published. Kane didn't reveal it.

And why does the presence of an outlier demonstrate a methodological flaw? I can see how that would be the case if you were sampling for a phenomenon that you knew had a random distribution, but where you know that you have a unique situation - like a city targeted for destruction - why does the presence of the outlier indicate a problem with the methodology?

What am I missing?

Bloix asked:

What am I missing?

Well, one of the things you're missing is that David Kane already charged that the methodology was inherently flawed, and was shown to be wrong. Now he's switched to claiming that the analysis is inherently flawed.

David quite publicly charged that the data collection was fraudulent, based on an incorrect understanding of response rates. He got shot down in about 10 minutes when google searches immediately found counter-examples that he never bothered to seek. Nonetheless, David has staying power, and concocted a set of conditions that weren't as easily google-disprovable. And he's tried another tack: that the paper hasn't enough information in it to be directly and exactly replicable according to David's standard of replication, and he was trying to get the ASA to pass a resolution that papers that aren't directly and exactly replicable should be ignored. And he tried putting together a statistical vignette a year or so ago showing (in a less sophisticated fashion) that one could not reject the null hypothesis of no change in mortality pre- and post-invasion. I don't know about you, but when someone comes up with unrelated sequential criticisms, each of which always comes up the same way, each of which appears after the previous criticism is shown to be flawed, I start to wonder whether the criticisms are sincere. But maybe I'm just a suspicious kind of guy.

The bottom line is this: you're having difficulty understanding why a positive outlier can be used to discredit the entire estimate? The reason why you're having difficulty is because it doesn't make sense.

Shannon Love wrote...

"... There is no particular reason to exclude Fallujah. The study established no criteria a priori for eliminating any particular cluster. The only reason for doing so is that it makes the results of the study nonsensical. However, this indicates that the study's methodology is inherently flawed. A well designed study would have never have produced such a massive outlier."

...and Shannon Love is an expert on designing studies for war zones, I suppose?

Besides, most analyses begin with a preliminary exploratory analysis or Initial Data Analysis (IDA) consisting of simple charts and plots, and perhaps some preliminary model fitting. Based on the initial analysis, the deeper business of fitting a more sophisticated model can begin. There is ample scope, I believe, to decide to exclude the Falluja cluster between the IDA and the full analysis. Just look at Figure 1 in David Kane's paper.

Let us hit the bricks. If you want a good analogy, anyone who makes a statistical argument that the weight of one of the bricks could be negative should stand there while we throw bricks at him.

As to the Kurdish north, death rates were already low there as Saddam's writ did not run under the no-fly zone.

Why do they not report an excess death estimate and CI for all the data, including Falluja?

Because you would have instead jumped on the paper for presenting such a stupid model?

Basically, David, if your argument is 'when someone rejects a model that's prima facie dumb, we must presume bad faith and ill intent, rather than an ability to identify dumb models', I'd suggest making that claim when subject to English libel laws.

You've already come pretty close, as Robert notes. It's taken a hundred comments for it to become clear that your aim is to sling precisely the same insinuations, albeit wrapped in a different cake of mud.

In #177, David wrote:

The biggest puzzle for me is: Why do they not report an excess death estimate and CI for all the data, including Falluja? They do this for RR, so why not for excess deaths? My guess, as described in the paper, is that, if they did (using whatever methodology they use), the answer would have a lower bound of -150,000 or so. They knew this but specifically failed to report the number then (and continue to obfuscate now) because of their larger political goals.

So, a question for dsquared (and others): What would the lower bound of the confidence interval for excess deaths be if the authors used exactly the same code as they used to get the 8,000 number but now included Falluja?

I think that almost all readers assumed that it would be much greater than 8,000. After all, look at the findings! The RR lower bound goes from 1.1 to 1.5 with Falluja. If the RR lower bound went up then, obviously, the excess death lower bound went up.

I think all the readers who made that assumption were wrong. This is the most important implication of my paper.

I think the implications of your paper are quite different. It is now clear that your argument is predicated on an unseen result (the CI for estimated excess mortality when Falluja is included) that you contend was purposefully suppressed.

However, you reject along the way every suggestion that the CIs on the excess mortality estimate were bootstrapped. OK. But the estimates of excess mortality and their CIs that I produced both with and without Falluja via the bootstrap are completely independent of any assumption of the CI of the CMRs. I didn't use that information at all. All I used were the data on deaths and person-months of exposure.

Thus, I believe we can now all agree that an independent estimate (viz., mine) using a method that you say is different from the one in the paper (viz., bootstrapping) verifies the seen results. But, more than that, using the exact same code, when Falluja is included the CI for excess mortality still doesn't include zero. So a completely independent analysis using a completely different method verifies the published result and provides a reasonable value for the unpublished result that undercuts your claim that the CI straddles zero.

So: parallel analysis, independently performed, using different approach gives same answers. This, I would think, is a problem for your argument.

Kane,

The most important implication of your paper is that you have some fundamental misunderstandings of statistics, and that you are unwilling to admit (to yourself and to others) that these misunderstandings have caused you to stake an untenable position.

A much less important implication of your paper (mainly because it should be obvious) is that if you use statistical analysis that is based on completely inappropriate assumptions, you may very well get completely absurd results.

Robert makes an excellent point in 188. It is precisely because I seek out well-informed thoughtful criticism that I asked Tim to post my paper.

And, for the record, Robert could easily be right! If the Lancet authors would just show us the code, we could settle things quite easily.

Robert writes:

It is now clear that your argument is predicated on an unseen result (the CI for estimated excess mortality when Falluja is included) that you contend was purposefully suppressed.

Well, it isn't so much "predicated" on that --- nowhere do I assume that --- as only interesting if that is true. If I am wrong that the lower CI for excess deaths is below zero with Falluja, then the paper is a waste of time.

However, you reject along the way every suggestion that the CIs on the excess mortality estimate were bootstrapped. OK.

Correct! And with, I think, good reason! We spent a bunch of time above with very smart people (dsquared and others) arguing that the CMR was bootstrapped. It was not and they were wrong. I "reject[ed]" every suggestion for good reason on that one. Surely, you agree that it is *possible* that I am right to do so again. Moreover, let me quote again the only information the paper gives us on how excess deaths were estimated.

"We estimated the death toll associated with the conflict by subtracting pre-invasion mortality from post-invasion mortality, and multiplying that rate by the estimated population of Iraq (assumed 24.4 million at the onset of the conflict) and by 17.8 months, the average period between the invasion and the survey."

We know that elsewhere in the paper the CMR was estimated without the bootstrap. I am assuming that the same CMR estimate is used here. Surely that is the most natural assumption to make. If the authors are going to use a totally different estimate of CMR, derived in some different fashion, then they have an obligation to tell us.

Robert, am I unreasonable to make that assumption? (I agree that the best evidence against that assumption from the paper is that the CI for excess deaths is not symmetric.)

Robert goes on.

But the estimates of excess mortality and their CIs that I produced both with and without Falluja via the bootstrap are completely independent of any assumption of the CI of the CMRs. I didn't use that information at all. All I used were the data on deaths and person-months of exposure.

True. Again, my claim is based solely on (my reading of) the description provided by the paper.

Thus, I believe we can now all agree that an independent estimate (viz., mine) using a method that you say is different from the one in the paper (viz., bootstrapping) verifies the seen results.

It depends on what you mean by verify. You get an answer that is similar but not identical to theirs. I also verify their reported results (see Table 2) by getting an answer that is similar but not identical to theirs. Our methods, when including Falluja, then give vastly different answers.

The question is: Which of us has guessed correctly as to what the authors actually did? Could you post (either in the comments or elsewhere) the actual code that you used and the results (numbers, not just graphics). I would like to play around with your simulation myself.

I agree that this critique against my paper (that the excess deaths were bootstrapped) is far and away the most powerful one presented in this thread. I still maintain that what I did (assume CMR was estimated the same way throughout the paper without seeing evidence to the contrary) was sensible, but I could be wrong in my conclusions.

By the way, Salt Lake City is wonderful. If there are any Deltoid readers in the area, please drop me an e-mail. Coffee is on me!

Also, further evidence (untested as yet by me, honestly!) as to which of us, Robert or I, is correct comes from Gilbert Burnham's presentation at MIT, which I quote. Burnham claims that:

"Now this is what the confidence intervals would look like. There is a 10% probability that it [the number of excess deaths] was less than 44,000 and only a 2.5% chance that it was less than 8,000."

So, now we have four numbers to try to match on, the 2.5th, 10th, 50th and 97.5th percentiles. I swear that I have not yet checked to see if my guess at the methodology comes close to the 44,000 number.

By David Kane (not verified) on 29 Jul 2007 #permalink

A minor point for clarification. I use in the paper this estimate for post-war mortality excluding Falluja.

If the Falluja cluster is excluded, the post-attack mortality is 7·9 per 1000 people per year (95% CI 5·6-10·2; design effect=2·0).

Because no estimate for pre-war mortality excluding Falluja is provided in the paper (correct me if that is untrue), I just subtract 5.0 from this. So, the only uncertainty in this calculation comes from the estimate of post-war mortality.

Now, this is obviously suspect. One ought to include the uncertainty associated with the pre-war mortality estimate as well. But my approach seemed to get close to what the authors reported, so that is what I went with in this version of the paper.

If anyone can reproduce the 8,000 -- 98,000 -- 194,000 estimates more closely, please tell me how you did it.

By David Kane (not verified) on 29 Jul 2007 #permalink

David,

Re #190:

You're still confusing the CMR estimates with the CIs for the CMR estimates.

The CIs were (wrongly) calculated using a normal distribution. The CIs are only summary statistics, so this mistake does not affect the paper's conclusions.

The CMR estimates were calculated using division. You take the number of deaths, and divide by population*time.

It is the CMR estimates (calculated using division) that are used later to calculate the excess deaths.
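
In code, with invented numbers (these are not the study's figures), that division looks like this:

```r
# A crude mortality rate is just deaths divided by person-time, rescaled to
# deaths per 1000 people per year. The inputs here are invented.
deaths        <- 21
person_months <- 53000

deaths / person_months * 12 * 1000  # about 4.8 deaths per 1000 per year
```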

David Kane asked:

Could you post (either in the comments or elsewhere) the actual code that you used and the results (numbers, not just graphics). I would like to play around with your simulation myself.

Sigh. Exactly 13 months ago I pointed to my R script right [here](http://scienceblogs.com/deltoid/2006/06/ibc_vs_les_roberts.php#comment-…) . Anyway, I've updated it with a [new script](http://anonymous.coward.free.fr/misc/iraq-bootstrap.r) that includes a RR calculation. It will show you how to calculate the pre-invasion CMR. Use it to examine the actual numbers in addition to the graphs. I think one of the bootstrapped CIs for the number of excess deaths excluding Falluja was consistent with a 10% probability under 43000.
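
In outline, a cluster bootstrap of the RR looks something like the sketch below. This is a generic illustration with invented numbers; it is not the linked script and not the study's actual code.

```r
# Generic cluster bootstrap for a relative risk (illustrative data only).
# Each row is one cluster, with deaths and person-months pre and post.
set.seed(1)
clusters <- data.frame(
  pre_deaths  = rpois(32, 1.5), pre_pm  = rep(5000, 32),
  post_deaths = rpois(32, 3.0), post_pm = rep(7500, 32)
)

rr_from <- function(d) {
  (sum(d$post_deaths) / sum(d$post_pm)) / (sum(d$pre_deaths) / sum(d$pre_pm))
}

# Resample whole clusters with replacement and recompute the RR each time.
boot_rr <- replicate(10000, rr_from(clusters[sample(nrow(clusters), replace = TRUE), ]))

rr_from(clusters)                   # point estimate
quantile(boot_rr, c(0.025, 0.975))  # percentile bootstrap 95% CI
```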

David:

Two more things.

1. If you don't know how to calculate a CMR, is this really the pool you should be swimming in?

2. That was a pretty exciting TT. Cadel Evans made up 1:27 of the 1:51 he needed to take the Maillot Jaune and still nearly got squoze out of 2nd by Leipheimer. The 31 seconds separating the top three spots makes it the most competitive Tour ever.

Did I hear my name? :D

Anyways, the real fatal flaw of this study is still the fact that Roberts et al paired up provinces based upon a belief that they were similar in violence (providing absolutely no evidence for this belief). They then went on to oversample one of the provinces, and not sample the other in each pair.

So unless y'all are fans of faith-based science...

All this mumbo jumbo about CIs and variance is meaningless when it is likely that the sample they used was not representative of Iraq in the first place.

The UNDP and their study internals will tell us the answer - if only Roberts et al cared about vindicating their study... Except for the fact that the UNDP internals will probably blow their study out of the water. Maybe I should write to the Norwegian entity that did the data collection for the UNDP again...

David, can you please answer these two questions:

1) Is your proof in section 1.0.2 valid, given that if you use the same methods and start from the other end of the confidence interval you get opposite results? (See post 155, around the middle).

2) Are you aware that the contradiction between CIs of point estimates and CIs of relative risks of those point estimates is not actually unusual, and given this how does this change your findings? (See post 164; you can find additional examples of the problem yourself easily enough).

David, re space exhaust (157,154,132):

the difference between two normal variables is itself normally distributed with a variance of v(x) + v(y) + 2*cov(xy).

The formula for the variance of the difference has a minor sign error; it should be v(x)+v(y)-2*cov(x,y). That formula is valid for arbitrary distributions x,y with v(x),v(y) finite. However, the difference of normals need not be normal. E.g. take x standard normal and y = x if |x| < 1 and y = -x if |x| >= 1. Then x,y are normal, but y-x is not.
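
A small simulation sketch of that counterexample, in case it helps (my own illustration):

```r
# x is standard normal; y agrees with x near the centre and is reflected in the
# tails. Each margin is standard normal, but the difference y - x is not.
set.seed(1)
x <- rnorm(1e5)
y <- ifelse(abs(x) < 1, x, -x)

c(mean(y), sd(y))                   # close to 0 and 1: y is marginally N(0,1)
mean(y - x == 0)                    # about 0.68: a point mass at zero
mean(y - x == 0 | abs(y - x) >= 2)  # 1: all remaining mass lies beyond +/- 2
```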

By David Kane's friend (not verified) on 30 Jul 2007 #permalink

Thanks again to all for the excellent comments. Here is my wrap-up (still in Salt Lake City and looking forward to Roberts' talk on Wednesday).

1) I am finishing up another draft of the article. This will get included in an updated R package. Perhaps Tim will be kind enough to link to it so that we might start the conversation afresh. If any of the commentators would like to e-mail me his real name, I would be happy to acknowledge him in the paper. For now, I have added a blanket thanks to Deltoid commentators.

2) SG's point number one in 196 is interesting. I do not have a good answer, at first glance. Fortunately, I can get away with ignoring this issue: I have cut that version of the proof (the one that did not assume a normal distribution) because, now that the Lancet authors have told us that the estimates of CMR are normal, I don't need it anymore. I agree with SG that, were I to keep the proof, I would need to deal with his objection.

3)SG's point 2) is also interesting. This still matters to the paper because I still center the paper on the contradiction between the CMR and RR estimates, just as SG describes. I can't grok SG's argument fully, but I propose postponing this discussion until the next draft is posted, either at Deltoid or elsewhere. SG is the only one to have raised this objection (which doesn't make it right or wrong).

4) Thanks to my friend for pointing out the typo. Fortunately, I caught that and fixed the version distributed this morning at ASA. My friend is also correct, of course, that the difference of two random normals does not, itself, need to be normal. The version of the paper that I just handed out handles this a bit better than the one posted here, but it still needs a major clean up.

By David Kane (not verified) on 30 Jul 2007 #permalink

197: y = -abs(x) is _not_ normal. The difference of two independent normal random variables is in fact normal.

198: there is no contradiction between the CMR and RR estimates.

The contradiction is between the confidence intervals of the CMR and RR estimates.

While the CMR estimates were used to calculate the RR, the CIs for the CMR estimates were not used.

Tim's very first comment is correct - all you have proved is that the CMRpost (w/ Fallujah) CI is incorrect.

Pete, I think you misread 197, which is presenting a classic textbook case, I seem to recall. The random variable in question is not y=-abs(x).

David Kane are you going to alert Malkin and her odious ilk of the changes to your paper, and tell her that she should hold off the triumphalism until you have finished the discussion here? After all, you have removed one proof in support of your claim, and have yet to come up with a convincing rebuttal of another problem. Perhaps in the interests of honest debate you should alert Michelle to the possibility your final findings are, in fact, going to be trivial?

1) I am confused. dsquared (and others) claim I am a bad person for allowing Malkin to reprint. You seem to suggest that I should provide her with a new draft (with the same conclusion) so that she can trumpet the result all over again. Which is it?

2) While it is true that I have cut the second proof, I did so because it was now superfluous (since we now have confirmation from the authors that the estimate of CMR is normally distributed). The proof is still correct -- at least no one has pointed out an error.

3) Which problem have I not come up with a "convincing rebuttal of?" Specifics, please. Everything still works as long as I assume independence between estimates of CMR pre and CMR post. That may or may not be a reasonable assumption from your point of view, but no one has had a problem with it at JSM.

4) I agree that it would still be nice to not assume independence and that, before, I was wrong to think that the difference between two normals must be normally distributed, regardless of their correlation. Fortunately, I may have a workaround for that.

5) I do not think that my final findings are going to be trivial, but let's wait for the next draft. I still maintain that if the CMR confidence intervals are true, the RR confidence intervals must be false.

As long as at least some members of the Deltoid community are interested, I am hopeful that Tim will agree to post the next version. (Sound off if you want to see it so Tim knows that you care.)

I think that posting it at Deltoid is wonderful for at least two reasons. First, many of the most Lancet-knowledgeable people in the world hang out here, so the audience is perfect. Second, since I give a copy to Tim to post, readers can be sure that I am not doing nasty things like changing the text while the discussion is underway.

By the way, I am pleasantly surprised at the number of people at JSM who care about the Lancet surveys *and* are highly skeptical. Mine is not the last criticism that will appear. Just wait till the demographers chime in . . .

By David Kane (not verified) on 31 Jul 2007 #permalink

Thanks SG, I had misread #197 (x,y are both normal, but not independent, so their difference is not normal)

David:

"if the CMR confidence intervals are true, the RR confidence intervals must be false"

But the CMR CIs are false! So your paper says nothing about the RR estimate or its confidence intervals.

> The proof is still correct -- at least no one has pointed out an error.

> Which problem have I not come up with a "convincing rebuttal of?"

Are you kidding? As I have pointed out at least a half dozen times, your _entire_ paper (and in particular both of your "proofs") is based on a complete misunderstanding of the statistics of the Lancet paper, i.e., of basic frequentist statistics.

[I still center the paper on the contradiction between the CMR and RR estimates]

no such contradiction, as Pete correctly says.

[ I am confused. dsquared (and others) claim I am a bad person for allowing Malkin to reprint. You seem to suggest that I should provide her with a new draft (with the same conclusion) so that she can trumpet the result all over again. Which is it?]

"First you say you don't want the pony, then you tell me to get rid of the pony! Make up your mind!" - Homer Simpson.

I entirely agree that you should try to repair some of the damage you caused with the Malkin debacle, although this should be in the form of telling her that the paper has been withdrawn and contains serious errors rather than circulating version 2.0 (particularly if, as I forecast, that one will also contain serious errors).

[While it is true that I have cut the second proof, I did so because it was now superfluous (since we now have confirmation from the authors that the estimate of CMR is normally distributed). The proof is still correct -- at least no one has pointed out an error]

the error is in the implicit assumption that it is relevant to the conclusions of Roberts et al (2004) when it is not. This has only been pointed out to you like a zillion times.

I am confused. dsquared (and others) claim I am a bad person for allowing Malkin to reprint. You seem to suggest that I should provide her with a new draft (with the same conclusion) so that she can trumpet the result all over again. Which is it?

The honest thing to do - if honesty matters to you, which I doubt, having tracked this entire thread - would be to tell her that a group of professional statisticians have pointed out in painful detail why they believe your paper is worthless.

There's really no reason for you not to do so. I'm sure she and the rest of the right-wing blogosphere will gleefully point out that this simply proves that professional statisticians - like climatologists and evolutionary biologists - are part of a vast left-wing conspiracy against truth and reason. And that their efforts to educate you are simply an attempt to suppress the truth.

You're in good company, David. Hope you enjoy it.

David, the arguments against your paper which you have failed to so far rebut convincingly, are:

1) The Zombie Problem: your contention that Fallujah is a reasonable type of outlier to expect in this normal distribution has been disputed, with the absence of zombie militia as the main point against it

2) The assault on standards: it has been repeatedly observed that the three steps of: data description (normal distribution) - basic results (log linear regression) - improved results (robust estimation) are a standard approach to this type of data, followed in a standard way in this paper. You have argued against this standard approach by arguing against robust estimation AND the presentation of basic results

3) Frequent Frequentist Failures: By standing a proof (section 1.0.2) on a data set with unequal variances, you have opened yourself to the obvious claim of a contradictory proof using the other variance (a first year stats error, surely). Also it has been pointed out to you that the "contradiction" you observe between confidence intervals of the RR and the CMR is a normal problem of RRs.

2a) Treating the Normal Distribution as God: even when you know it's not, and we god-fearing liberals think this breaks a commandment somewhere

4) Irreproducible claims of irreproducibility: you claim repeatedly that you can't reproduce the results of the article, yet commenters have subsequently contradicted you and provided data/plots to support their claim

5) Insidious Insinuations: particularly, without making any attempt to present your own CIs, you have claimed that the original authors knew their RR didn't fit a pre-ordained, politically motivated model, and therefore excluded its CIs for non-experimental reasons. You insinuate this is the reason for their use of robust estimation methods, when in fact 2) is the reason for this.

6) Experimental Ignorance: you have refused to accept (or at least discuss) the possibility of covariates, or an experimental design which allowed for occasional outliers or post-hoc analysis of unexpected covariates, even though these are both standard practice in modern Epidemiology

and worst of all:

7) Sheer colossal craziness: in that you insist the model should conclude there may have been a reduction in deaths as a direct consequence of finding a town where everybody was slaughtered

(I'm sure I missed some points too - I haven't even touched on dsquared's hated faux-Bayesianism, because I don't understand it).

So yeah, David, you may feel no-one has pointed out an error, but there do seem to be a few small niggling problems.

David Kane,

I've noted that you failed to respond to my request in comment 179 to provide evidence to support your assertion of scientific misconduct:

They knew this but specifically failed to report the number then (and continue to obfuscate now) because of their larger political goals.

David Kane, will you please either retract this accusation or provide clear supporting evidence?

By Cyrus Pinkerton (not verified) on 01 Aug 2007 #permalink

David Kane, I am perplexed that you asked for comments here and plan to ask for more when you don't seem to be paying any attention to them.

At this stage I think you should retract your paper. It's wrong for the reasons given in excruciating detail in this thread.

David Kane wrote:

Just wait till the demographers chime in ...

Hmmm. Well, you've ignored everyone else's comments and criticisms. Why should the demographers get left out?

David, exactly what do you hope to accomplish with this paper? Let's assume for a moment that your models and assumptions are all entirely correct. In this case, your analysis seems to suggest both of the following:

1. The 2004 Lancet study indicates that there is a 10% chance that mortality decreased in the wake of the invasion, and the number of lives saved (the lower bound) may be as high as 130,000.

2. Although it is possible that mortality declined, it is more likely that it increased. The mean excess death estimate is roughly 264,000, and the upper bound is 659,000 excess deaths.

You stated that "the most important implication" of your paper is that by including Fallujah, the excess death lower bound went down. Your paper also implies, however, that the excess death upper bound went up. Why is that not the most important implication of your paper? Is there some reason that a decrease in the lowest plausible figure is more important than an increase in the highest plausible figure?

"If I am wrong that the lower CI for excess deaths is below zero with Falluja, then the paper is a waste of time." I hope one of the statisticians here will clarify something for me: why would it matter whether or not the lower CI is below zero? In the paper, you state that "Any empirical researcher is vaguely suspicious of a result which just barely rejects the primary null hypothesis..." But why do I care about the hypothesis? I don't care what the people conducting the survey expected to find. I care about what they did find.

Maybe your perspective is that of a portfolio manager: Positive, or negative. Buy, or sell. (You're not still trying to sell us on the idea of invading, are you??)

As far as I can tell, even if your argument is mathematically correct, you are simply arguing in favor of a less precise interpretation of the results of the survey... and while you seem determined to focus on the decreasing lower bound, you could just as easily focus on the increasing upper bound, or the higher mean estimate.

I don't believe, however, that you think your method of interpreting the data gives you a more accurate result. Do you think your mean estimate is closer to reality than their mean estimate? And if the answer is no, why shouldn't we conclude that your method of interpreting the data is worse than theirs? I have to come back to something Donald Johnson said earlier in the thread: "Maybe the fact that you find it difficult to treat the Fallujah outlier within some set of mathematical models without coming to this schizoid conclusion gives us some reason to suspect the mathematical models aren't appropriate here."

I assume that you would respond to this by saying that the problem isn't with your analysis: the problem is with their data. If that is the case, then you should be focusing your arguments on the data, and not on the analysis. The only reason I can think of for focusing on the analysis is that mathematics offers that wonderful illusion of certainty: "Look! I have mathematical proof!"

For what it's worth, I have doubts about the accuracy of the Lancet's estimates; but I believe that they need to be taken seriously, and the right-wing posturing ("I knew all along that it was garbage!") is not getting us any closer to the truth.

Your first footnote states that your paper is "part of a larger project" critiquing the Lancet studies. I hope the goal of the "larger project" is to provide a more accurate accounting of Iraqi deaths. If the goal is simply to seek the Holy Grail of the Low, Lower, Lowest Lower Bound... then alas, I don't believe you will find many to join your quest at this particular castle. You might, however, find some useful algorithms at http://www.style.org/unladenswallow/.

Just back from Salt Lake City and Les Roberts's presentation yesterday. I will post something on this and, if Tim thinks that it would be of interest to the Deltoid community, I am sure that he will bring it to your attention.

Comments:

1) This thread has grown a bit unwieldy for me to process, but I hope to continue the conversation if Tim is kind enough to post the next version of my paper.

2) I appreciate the comments from all. This is the way that science should be done.

3) Tim writes:

David Kane, I am perplexed that you asked for comments here and plan to ask for more when you don't seem to be paying any attention to them.

But I am! I will send you the next draft of the paper so that you (and others) can see for yourself. That draft will be non-trivially different from this one and the major cause of the differences will be these comments. (Some will, no doubt, still think that the paper is useless, but that is a different issue.)

4) Robert writes:

Well, you've ignored everyone else's comments and criticisms. Why should the demographers get left out?

I don't think that that is fair. I am reading the comments. I have written hundreds (thousands?) of words in reply. Just because I don't agree with the comments does not mean that I am ignoring them. To cite just one example, I have a new simulation in the next draft which deals with (I hope) the (correct!) criticism that I may not assume that the difference between two normals is normal.

Again, just because I have not responded to each of the points above does not mean that I don't welcome the comments and take them seriously. I hope to handle many of these issues in the next draft.

I look forward to continuing the conversation.

By David Kane (not verified) on 02 Aug 2007 #permalink

Kane writes: "This is how science should be done." But his approach is precisely how science should Not be done. He has circulated a error-riddled, false argument which has been trumpeted in the right-wing blogosphere as "proof" (where is Popper when you need him?) that the Lancet Study has been categorically de-bunked. That paper has now, in effect, been withdrawn but the popular impression of it remains. If Kane were truly concerned about science, he would work hard to correct the mistaken impression that the original paper succeed in its goal. It did not; it failed. Kane is desperately trying to salvage a shred of credibility by revising the paper. But until that time when a successful analysis is presented, one that can stand professional scrutiny, Kane's "larger project" has failed. He should take responsibility for that failure and report it to Malkin, et al. and urge her to report the truth of the matter. Lacking that commitment, the only conclusion left is that this is all a matter of ideology, not science.

"I have a new simulation in the next draft which deals with (I hope) the (correct!) criticism that I may not assume that the difference between two normals is normal."

Which means you've ignored the criticism that you don't have two normals to difference in the first place.

Pete,

I have confirmed with Elizabeth Johnson that the estimates for CMR post and pre are normally distributed. This is a fact. Now, perhaps Johnson (and the authors and the Lancet peer-reviewers) were stupid to use a normal distribution, but that's what they used.

By David Kane (not verified) on 03 Aug 2007 #permalink

I thought SG's objection number 7 is by far the most important one. It's the point I've been trying to stress--

"and worst of all:
7) Sheer colossal craziness: in that you insist the model should conclude there may have been a reduction in deaths as a direct consequence of finding a town where everybody was slaughtered"

Of course, my opinion is partly biased by the fact that I can't follow some of the more technical points, but David has yet to give any rational reason why the discovery of one cluster with a huge number of deaths increases the probability that the overall death rate might have lowered because of the war. When asked, he talks about the variance increasing and then uses a mathematical model which implies clusters with resurrections. There's no actual logical argument given for why Fallujah's partial destruction should be correlated with amazingly happy outcomes elsewhere. It all seems to come from a superstitious belief in normal distributions.

By Donald Johnson (not verified) on 03 Aug 2007 #permalink

David, how would you respond to SG's 2) at #206?

The assault on standards: it has been repeatedly observed that the three steps of: data description (normal distribution) - basic results (log linear regression) - improved results (robust estimation) are a standard approach to this type of data, followed in a standard way in this paper. You have argued against this standard approach by arguing against robust estimation AND the presentation of basic results.

Do you dispute that this is a standard approach? Or do you dispute that L1 follows this approach? On the other hand, if you agree with SG's assertions, analogous to your comment in #214, it may be that this standard approach is in your view "stupid". But if so your quarrel is with that standard not (at least not in particular) L1.

(Hopefully it goes without saying that what is an imperfect but not unreasonable model in one context may be wholly unreasonable in another context. So it's not a priori obvious that the standard approach making inconsistent distributional assumptions for different purposes is "stupid".)

By David Kane's friend (not verified) on 03 Aug 2007 #permalink

I dispute SG's description as it applies to L1. It is true that most empirical papers begin with a data description: this many observations, this is the mean, this is the variance, this number missing and so on. But L1's discussion of the crude mortality rate post-invasion is not merely a "description." It is a model. They assume (reasonably!) a specific parametric model. There is no avoiding this fact.

Now, if all they did with the CMR was just to estimate it, that would be one thing. But they then use this CMR (at least they say they do) to estimate the excess deaths, the most quoted part of the paper. So, how they estimated the CMR (i.e., what "model" they used) matters. There was a lot of confusion above about just how they did this. Now, we know. (Actually, I am still trying to replicate the exact results. Suggestions welcome!)

The unknown is how this estimate of the CMR led to the 8,000 lower bound. Again, no one knows. Robert reports (correctly) that he gets close to this estimate with a bootstrap. I also (in my paper) get close without using a bootstrap. Who is right? Me? Robert? Neither? No one knows.

Let me try to give Donald some intuition on this. Let us say we have a town with 100 people in it. We want to estimate the mean weight. We sample 5 people with weights (140, 150, 160, 150, 150). The mean is 150. The CIs are whatever (exercise for the reader). We use a normal model. Totally standard.

We come back a year later. Same town and (we think!) same people. We take a sample of 5. The first 4 are (145, 145, 180, 155). Looks like the average weight may have gone up a bit, although probably by not enough to be statistically significant.

At this point, calculate some CI for the average weight. They will be whatever. You can use those CIs to estimate the probability that the average weight is below 130. This will be positive (after all, you are using a normal model and you have only sampled a few people). Call that probability 5%.

Now, I show you the 5th person. Weight is 300! What is the probability that the average is less than 130? For almost all reasonable models, it will be greater than 5%! In other words, seeing an outlier on one side "spreads out" the posterior distribution and makes it more likely that you really don't understand this town you are visiting. Who knows how many babies are in the town?

Now, of course, we can be sure that the average weight > 0. By definition, no one can have a negative weight. But outliers increase the resulting uncertainty about the mean on both sides (perhaps more on one side than the other, but almost always on both sides).

Anytime you get an "outlier" it means, sort of by definition, that you did not understand the data as well as you thought you did. Otherwise, you wouldn't perceive this data as an outlier.

The above is just my attempt to help Donald with his intuition. Others could no doubt do better. Give it a shot!
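For concreteness, here is a minimal R sketch of the arithmetic behind this intuition, using the made-up weights above; since no particular interval was specified, it uses a textbook t-based one:

    # Kane's made-up second-visit weights, with and without the 300 lb outlier
    w4 <- c(145, 145, 180, 155)
    w5 <- c(w4, 300)

    ci <- function(x) {                      # textbook t-based 95% CI for the mean
      m  <- mean(x)
      se <- sd(x) / sqrt(length(x))
      m + c(-1, 1) * qt(0.975, df = length(x) - 1) * se
    }

    ci(w4)   # roughly (130, 183)
    ci(w5)   # the outlier raises the mean but pushes the lower bound well below 130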

By David Kane (not verified) on 03 Aug 2007 #permalink

David asks:

The unknown is how this estimate of the CMR led to the 8,000 lower bound. Again, no one knows. Robert reports (correctly) that he gets close to this estimate with a bootstrap. I also (in my paper) get close without using a bootstrap. Who is right? Me? Robert?

Me. But that's beside the point. The real issue is, is there enough information to reproduce and extend the reported results? I can do that using a completely independent method, to within the precision allowed by the software. When a completely independent method can reproduce the results, it means the results don't depend on any single method. So arguing that no one knows exactly how they did it is irrelevant, and a red herring. It no longer matters exactly how they did it.

Nice try, David, but it didn't work. Finding that there is a 300 lb person in the second sample tells me absolutely nothing about the possible existence of 7-8 lb people known as babies. You're tacitly importing external knowledge there--you know there are such things as babies, children, and small adults. Even knowing that, what on earth does the presence of a 300 lb person in your second sample tell you about the number of babies, or children, or small adults? Nothing. You can now suspect there may be a significant population of very large people in the town when you find one such person in a sample of 5, but it tells you nothing at all about the number of small people. You haven't found any and so you have no idea how common they may be. Now of course you know there are such things as small people, so you realize that in a tiny sample of only 5 you could easily miss them, but that's a bit of a priori knowledge that you had before sampling started, and the presence of the 300 lb person adds nothing to it.

Dsquared gave a better example either in this thread or the Crooked Timber one. He talked about sampling boxes of chocolates put out by a chocolate factory. If you happen to find a few with twice as many chocolates as there should be, then it leads you to suspect that quality control in that factory is shot to hell. People are perhaps being very sloppy and so there may be some boxes with only half the correct number.

You see, in dsquared's example you can come up with a plausible reason why an outlier on the high side might also imply the existence of outliers on the low side. There's a logical connection. Your body weight example doesn't really do this. And you haven't done it with Fallujah either. You keep invoking mathematical models which assume that the presence of outliers on one side proves the existence of outliers on the other, but you've not supplied any reason for believing this in the case of Fallujah.

By Donald Johnson (not verified) on 03 Aug 2007 #permalink

BTW, David, you're acting as though this objection is just my statistically untutored view, but it's my impression that many or most people here share it.

By Donald Johnson (not verified) on 03 Aug 2007 #permalink

re #214

You're still confusing estimates with confidence intervals for those estimates.

In #217 you demonstrated that you know how to calculate a sample mean: sum the observations and divide by the number of observations. This is how the CMR estimates were calculated! No need for a model here.

The sample standard deviations were then reported as "confidence intervals" of +-1.96*ssd. This is probably a bad idea, but it's not uncommon.

The estimates were used to calculate RR. The "confidence intervals" were only used to summarise the data, then discarded (along with any normality assumption).

There are three numbers for CMRpost(w/ Fallujah):

1.4: lower bound, assumes normal, not used to find RR
12.3: estimate, does not assume normal, used to find RR
23.2: upper bound, assumes normal, not used to find RR

David, are you claiming

a) that 12.3 was calculated assuming a normal distribution

or b) that 1.4 and 23.2 were used to calculate RR?
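A tiny R sketch of that distinction, with invented cluster rates rather than the L1 data, and taking the +-1.96*ssd description above at face value:

    # Invented cluster-level crude mortality rates -- NOT the Lancet data
    cmr <- c(4, 6, 7, 9, 12, 80)            # the 80 plays the Falluja-like outlier role

    est <- mean(cmr)                        # the estimate: a plain average, no distribution assumed
    c(lower    = est - 1.96 * sd(cmr),      # only these two bounds lean on
      estimate = est,                       # the normal assumption
      upper    = est + 1.96 * sd(cmr))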

Actually, on second thought, I'll concede that finding a 300 lb adult would increase the chance that some people might weigh 130 lbs. That's just a little less than the smallest person found anyway. I'm conceding this because we know something about the variability of human sizes: if you found a population with the weights you suggest and then the 300-pounder, it would seem reasonable to think that if there are people that big, there might also be more people at the smaller end. But I'm not at all sure that this increases the chance that the average adult in the population weighs 130 lbs. That is, maybe there are more 130 lb people than you originally would have guessed (though none that small have been found), but there are clearly more 300 lb people than you first knew about. So though the CI would widen, it's not clear that the lower bound would drop. Someone would have to do the calculation.

Also, I don't think human adult weights are normally distributed, so you shouldn't do the calculation with that assumption. Height is, from what I've read, but then that means weight wouldn't be, if weight varies with the square of the height (according to the BMI if you assume everyone had the same index) or cube (if people were geometrically similar).

And none of this has anything to do with Iraq and Fallujah.

By donald Johnson (not verified) on 03 Aug 2007 #permalink

Donald,

David Kane's analogy fails even more if you assume that there is some prior knowledge about the town which can inform you about the 300 lb weight vs. the probability of babies. For example, suppose it's an oil rig: you know that the weight distribution will be different from that of a young town (no babies, very few women, etc.).

We have information about Iraq: it's a war zone. Therefore we expect that the probability of the 300 lb-equivalent is higher than the probability of the baby-equivalent.

Also, I would like to mention that using David Kane's Lancet data it is very easy to construct a few linear models. I just ran one in R, your bog-standard generalised linear model with person-months as offset and a poisson family, and the final estimate of the RR is 2.46, with extremely high significance - p<0.0001. Even taking into account the estimated design effect of the clusters (29 including Fallujah, I seem to recall) doesn't lower the significance much.

I didn't take into account the sample design, because I only know how to do that in SPSS and I don't have a license (isn't that embarrassing... the former I mean, not the latter... but I'm willing to defend SPSS in this regard). But I would imagine that a large part of the sample design variance-inflation is offset by the high correlation between pre- and post-samples in death rates, so it's probably not that important.

So, using David Kane's preferred method (report parametric confidence intervals and never do any robust estimation), the relative risk including Fallujah is highly significantly different from 1. I think this puts a bit of a hole in his claim that the authors didn't report the basic estimates because they weren't significant.
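A minimal sketch of that kind of model, with invented cluster counts rather than the actual data (this is not the code referred to above):

    # Invented cluster-level data -- NOT the Lancet dataset
    df <- data.frame(
      deaths = c(3, 2, 4, 5, 2,  6, 5, 8, 9, 45),   # last post-war cluster plays the Falluja role
      pm     = rep(2000, 10),                        # person-months of exposure per cluster
      period = factor(rep(c("pre", "post"), each = 5), levels = c("pre", "post"))
    )

    fit <- glm(deaths ~ period + offset(log(pm)), family = poisson, data = df)
    summary(fit)                                # Wald test for the period effect
    exp(coef(fit)["periodpost"])                # point estimate of the post/pre rate ratio
    exp(confint.default(fit)["periodpost", ])   # parametric (Wald) CI on the RR scale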

With regard to 300-pounders: if you just found one, you just know you found one. If you found one and he is wearing a fat club jacket, that tells you there are likely more. We know that there were more cases of towns being leveled than just Fallujah. We do not know where the zombie armies are being assembled.

SG,

1) Please post the R code (in the comments or elsewhere) so that we can all learn something.

2) And what confidence interval did you find? I'll bet that the lower bound was way less than the 1.6 that L1 reports . . .

I suspect that that p-value does not mean what you think it means.

Pete, I am claiming a) but not b). Not sure what that matters.

Donald, I tried to help you by giving you some intuition because you asked for help. In most/all parametric statistical models, an outlier on one side of the mean flattens out the distribution on both sides, at least a little. The proper reaction from you is thanks, even if you don't find my analogy helpful. I agree that dsquared's analogy is better.

Robert, you obviously know better than to use a phrase like "within the precision allowed by the software" in such a twisted fashion. Do I really need to call you out on that? The precision of R and Stata extends to many decimals. You can't even reproduce their result to within a thousand.

By David Kane (not verified) on 04 Aug 2007 #permalink

It matters because you don't need to make any assumption about the distribution to calculate the sample mean.

Only 1.4 and 23.2 assumed a normal distribution, and we seem to have agreed that those numbers are not used to calculate RR.

David Kane charged:

Robert, you obviously know better than to use a phrase like "within the precision allowed by the software" in such a twisted fashion. Do I really need to call you out on that?

Yup, you do. I used a bootstrap. Anyone who actually knows anything about the bootstrap would have understood the point, but in your case I'll make it explicit: as I've said many times, I did this independently of Roberts. I did this without knowing their random seed, the number of replications, or which bootstrap CI was being reported, and we know they used Stata and not R, which may or may not use the same algorithms for calculating the various CIs. (As I'm sure you were never aware, the original "bootstrap" package contributed by Efron and Tibshirani, the guys who wrote the book, used a different algorithm, and even they now recommend using something else. I've never used Stata, so I don't know which algorithms it uses.) Plus, if you had bothered looking at the code that I posted, you'd see that despite all of these known differences I can bracket all of the values reported by Roberts, including the "10% under 44000" that you say was reported by Burnham and that never appeared in the paper.

Once again, here's the bottom line, which you still haven't addressed: different researcher, different method, different software, different assumptions, and yet I produce results that are entirely consistent with the reported results and can extend the reported results in ways that don't require contorted intuition and appeals to parametric models that obviously don't apply. This is a much stronger result than replication by running the exact same code on the exact same software. It means the reported results don't depend much at all on the method or software or assumptions used. You keep whining, "nobody knows." To put it mildly, that's a red herring. Anybody who actually understands the estimation methods involved would have been able to do all of this from first principles without knowing exactly what anyone else did.

Though I guess it's true that "first principles" does mean that one must know how to calculate a CMR. Do I really need to call you out on that? Oops. I guess I just did.
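A minimal sketch of the general approach, with invented cluster counts (not the actual L1 data, and not the code referred to above):

    # Percentile cluster bootstrap for the post/pre rate ratio -- illustrative only
    set.seed(1)
    clusters <- data.frame(
      deaths_pre  = c(2, 3, 1, 4, 2, 3, 2, 1),
      deaths_post = c(5, 4, 3, 6, 2, 7, 5, 40),   # last cluster is the Falluja-like outlier
      pm_pre      = rep(2000, 8),
      pm_post     = rep(2000, 8)
    )

    rr <- function(d) (sum(d$deaths_post) / sum(d$pm_post)) /
                      (sum(d$deaths_pre)  / sum(d$pm_pre))

    boot_rr <- replicate(10000, {
      idx <- sample(nrow(clusters), replace = TRUE)   # resample whole clusters
      rr(clusters[idx, ])
    })

    quantile(boot_rr, c(0.025, 0.975))                # percentile CI for the rate ratio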

David Kane asked of SG:

Please post the R code (in the comments or elsewhere) so that we can all learn something.

David, sufficient information has been supplied so that anyone who knew what he was doing would be able to analyze the data in this way. Why don't you show you know what you're doing by posting some code and discussing the results? If you need help figuring out how to calculate a CMR, just ask.

"the proper reaction from you is thanks, even if you don't find my analogy helpful. I agree that dsquared's analogy is better."

Good lord. I asked for an explanation as to why finding a lot of dead people in Fallujah would lead one to think that some clusters have extremely low death rates. What's the causal connection? I think the proper response to your example was to point out that it didn't answer my question. I'll give you points for trying, if you want. I'll give you more points for acknowledging that dsquared's example was better, because it was better precisely because he gave a logical explanation in his case why positive outliers might suggest the existence of negative outliers.

And it isn't just my question--most if not all of your critics here seem to have the same question.

By Donald Johnson (not verified) on 05 Aug 2007 #permalink

David, a standard glm in R is 2 lines of code (one for the model, one for the summary), or 4 lines if, like me, you're an inefficient programmer and need a couple more to construct the correct data set. I don't think I need to post something so simple.

Also for the record, the lower bound of the 95% confidence interval was 1.8. So in fact I think the p-value means exactly what it is meant to mean, i.e. that we can reject the null hypothesis of equal death rates before and after the war.

I think you don't know how to run this simple statistical model, David, and I wonder what that says about your paper...

Also David, after 226 comments, saying "I'm not sure what that matters" about the CI/point estimate distinction really shows you haven't been listening.

A suggestion for David:

I have been involved in industrial/marketing studies where we often find that the lower CI bound on the number of products failed or returned by the customer is less than 0 because the sample is small.

To get around this (and avoid having to report negative lower bounds to mystified project managers) we work with the proportion failed or returned (= q), which is usually assumed to follow a normal distribution (the normal approximation to the binomial):

- Transform q to logit(q)= log(q/(1-q)).
- Approximate the variance of logit(q) via the Delta Method: var(logit(q)) ≈ var(q)/(q(1-q))^2
- Assume logit(q) normally distributed and find upper and lower 95% CI.
- Transform back to find the 95% CI on q, which cannot now be less than 0.

You can find more detail in Chapter 2 of Meeker & Escobar, Statistical Methods for Reliability Data (Wiley).

I am wondering whether, if you applied some such transformation to the raw numbers or to the differences in proportions before and after, you would also get a lower bound greater than 0. This would get you around the embarrassment of the "zombie army" objection.

However, based on my own crude calculations, you would probably not get a very different CI for Relative Risk than that published in the original paper.
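A minimal R sketch of the transformation just described, with invented numbers unrelated to the Iraq data:

    # Logit-transform confidence interval -- illustrative values only
    q     <- 0.01                           # observed proportion failed/returned
    n     <- 100                            # sample size
    var_q <- q * (1 - q) / n                # binomial variance of the proportion

    q + c(-1, 1) * 1.96 * sqrt(var_q)       # naive normal CI: lower bound falls below 0

    lq     <- log(q / (1 - q))                          # logit(q)
    var_lq <- var_q / (q * (1 - q))^2                   # Delta Method variance of logit(q)
    plogis(lq + c(-1, 1) * 1.96 * sqrt(var_lq))         # back-transformed CI: stays above 0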

Instead of the body weight example, here's an alternative simple analogy that I think is closer to what we're talking about. Let's say you've got a big bag of bills and you're trying to estimate the average value. You pull out the first couple of dozen and they're all ones and fives. Then you pull out another bill and it's a hundred-dollar bill. Should your lower confidence bound on the mean value of a bill in the bag go up or down? It's not impossible to imagine strained statistical assumptions under which the lower bound goes down, but intuitively I think it should go up.

By David Kane's friend (not verified) on 06 Aug 2007 #permalink

Re SG's #206 part 2 (see also my #216), it seems that we have a simple dispute about what is and is not usually done in the literature and, relatedly, whether L1 is modeling the CMR as a normal or simply giving descriptive data in a standard way.

David Kane (#217):

I dispute SG's description as it applies to L1. It is true that most empirical papers begin with a data description: this many observations, this is the mean, this is the variance, this number missing and so on. But L1's discussion of the crude mortality rate post-invasion is not merely a "description." It is a model.

Compare to pete (#221):

In #217 you demonstrated that you know how to calculate a sample mean: sum the observations and divide by the number of observations. This is how the CMR estimates were calculated! No need for a model here.
The sample standard deviations were then reported as "confidence intervals" of +-1.96*ssd. This is probably a bad idea, but it's not uncommon.
The estimates were used to calculate RR. The "confidence intervals" were only used to summarise the data, then discarded (along with any normality assumption).

So it seems like we have a simple dispute about what is or is not a common convention in the literature. According to pete, the "confidence interval" on the CMR should not be taken literally, but really should be read as (what in finance would be called) a quotation convention for stating the mean and (implicitly the) standard deviation. Alternatively, according to David, it should be taken as a serious model.

I don't know the epidemiology literature at all, so I don't pretend to know whether David or pete and SG are right.

By David Kane's friend (not verified) on 06 Aug 2007 #permalink

David Kane's friend, if our dispute with David boils down to just this, then it really is rather a trivial matter to rest a "critique" of the Lancet on, eh? I think this is just part of it, though.

However, I think my understanding of the epidemiology literature is fairly good, and I would say that there is a quotation convention for presenting these point estimates, and it doesn't mean anyone really believes they reflect reality. As a hint, we almost always present symmetric CIs for death figures where the point estimate is large enough to be approximately normal, even though we know this means that there is a small chance of deaths <0. Usually the CI lower bound is so far from 0 that it's irrelevant, and nobody takes the matter seriously.

Except, it would seem, David Kane, who thinks we should never use parametric confidence intervals for data presentation and should only use parametric models for data analysis. This, I think, is flouting convention.

I suspect many scholars learned a bit about the original code they have developed from watching the vulgar assaults of M&M a few years back. They will have learned from MH and Co. to stiff creeps who are quite clearly intent only on destruction, have not the slightest intention of contributing to development in a field, and lack the qualifications to do so. That seems to be what Mann did, eventually, and I noticed that a number of the scholars McIntyre gave his summons to after that (give me your code) did not even answer the mail. Since McIntyre, in my opinion, operates like a ratchet, only relaxing his grip on the exterior for a moment to get a better grip, that seems a good course. The real scholars can always work out a way to exchange information away from the hostile eyes of completely destructive and malicious predators. I can still recall his atrocious attempt to egg on his tiny band of fellow rotters to play on the perfectly innocent work of the tree scientist in Arizona as though it were secretive wrongdoing.

David Kane writes:

I am reading the comments. I have written hundreds (thousands?) of words in reply. Just because I don't agree with the comments does not mean that I am ignoring them.

So you say, David Kane, but the evidence indicates otherwise. Please see my comment from July 28 (number 179) which I repeated on August 1 (number 207). I am still waiting for a response.

By Cyrus Pinkerton (not verified) on 08 Aug 2007 #permalink

As someone who has no dog in this hunt, but has read all the comments and followed as well as I can with limited knowledge (one graduate-level course in probability and statistics), I would like to follow up on Cyrus Pinkerton's comment #179, as follows.

I can see three possibilities (maybe there are more):

1. The Lancet authors (a) used a model based on the normal distribution to determine whether mortality in Iraq in 2004 was higher or lower than it was pre-invasion; (b) didn't like their results because of their political bias; and (c) tried other models until they got the answer they wanted.

2. (a) as before, but they didn't like their results because they didn't make intuitive sense, so (b) they decided the initial model was flawed (this happens to me when I make computer models of mechanical structures; sometimes I'm right and sometimes the computer is); and (c) they decided to use a more robust model. They didn't mention their earlier, flawed model in their paper - nor do I mention my original fumblings in my final reports.

3. They were smart enough to use the robust model right away, but mentioned "normal" statistics for the two data sets as a matter of standard practice.

It seems to me that either 2) or 3) is at least as consistent with the facts as 1). If so, the gentlemanly/womanly thing to do would be to give them the benefit of the doubt.

I find it interesting how Kane's original argument has morphed.

Whatever happened to his "disproof by incredulity" argument (no survey done by Iraqi surveyors standing on their heads in a bathtub while juggling 6 chain saws has ever had such a high response rate, yada yada yada), including his speculations about possible fraud? As far as I can tell, he never retracted it.

I love how some people think they can say whatever they please without backing it up with evidence -- and, then, after they are called to the mat on their claims, simply skip along to something else (like Skippy the Bush Kangaroo) as if they never made the original claim to begin with.