Before I get to the meat of the post, I want to remind you that our
DonorsChoose drive is ending in just a couple of days! A small number of readers have made extremely generous contributions, which
is very gratifying. (One person has even taken me up on my offer
of letting donors choose topics.) But the number of contributions has been very small. Please, follow the link in my sidebar, go to DonorsChoose, and make a donation. Even a few dollars can make a
big difference. And remember - if you donate one hundred dollars or more, email me a math topic that you'd like me to write about, and I'll
write you a blog article on that topic.
This post repeats a bunch of stuff that I mentioned in one of my basics posts last year on the margin of error. But given some of the awful rubbish I've heard in coverage of the coming election, I thought it was worth discussing a bit.
As the election nears, it seems like every other minute, we
hear predictions of the outcome of the election, based on polling. The
thing is, pretty much every one of those reports is
utter rubbish.
What happens is that they look at polls, and they talk about the results and what they mean. But they, like almost everyone else, use the margin of error as if it means something very different from what it really means.
What you hear is that, for example, Barack Obama is leading Florida by 4 points, but the margin of error is +/- 4%, so it's really not a significant lead. What the journalists seem to think
is that the margin of error is a total measure of the accuracy of the poll - that the poll result is within the margin of error of the "true" result that the poll measures. So, by that interpretation,
the poll is predicting an outcome of 52/48, and the margin of error means that the actual voter preferences lie somewhere between
48/52 and 56/44.
The thing is, that's not what the margin of error means. The margin
of error is a statistical measure of the probable size of the error introduced by unintentional sampling - that is, by the random chance of drawing an unrepresentative sample.
Polls - and much of statistics in general - are based on the idea of sampling. Given a large, relatively uniform population, you can
get an amazingly accurate measure of that population by looking at a small subset of it, called a representative sample. A sample
is a randomly selected group that is intended to be a microcosm of the
entire population. In an ideal representative sample, the
distribution of characteristics within the sample matches the distribution within the population as a whole.
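To make that concrete, here's a tiny simulation (all numbers invented purely for illustration) of polling a random sample from a large population:

```python
import random

# Invented numbers: imagine a very large population of voters,
# 52% of whom favor candidate A. For a population much larger than
# the sample, polling a random sample is statistically equivalent
# to drawing independent yes/no answers with probability 0.52, so
# we don't need to materialize the whole population.
TRUE_SUPPORT = 0.52
SAMPLE_SIZE = 1000

random.seed(42)  # fixed seed so the run is reproducible

sample = [random.random() < TRUE_SUPPORT for _ in range(SAMPLE_SIZE)]
estimate = sum(sample) / SAMPLE_SIZE

print(f"True support:    {TRUE_SUPPORT:.1%}")
print(f"Sample estimate: {estimate:.1%}")
```

Run it a few times with different seeds, and the estimate lands within a couple of points of the true value almost every time. That reliability is exactly what the margin of error quantifies.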
There's a big problem there: how can you be sure that your sample is representative? The answer is, you can't! The only way to know for certain that a sample is representative is to measure the entire population, and compare the results of doing that to the sample. But once you've measured the entire population, what's the point of looking at a sample?
Fortunately, we can assess how likely it is that our sample is a good representation of the population. That's what the margin of error does - it measures the likelihood of the sample being representative of the population. It's computed by combining a
bunch of factors, the primary ones being the size of the sample and the size of the population. Given those, you can assess how
certain you can be that your measurement is pretty close to accurate. Typically, we describe that certainty by stating how large an interval
you need to define on either side of the measured statistic to be
95% certain that the "actual" value is within that interval. The size of that interval is the margin of error.
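For a simple random sample, that interval comes out of the normal approximation to the binomial distribution. Here's a minimal sketch of the standard calculation; the population size only enters through a finite-population correction, which is negligible when the population is much larger than the sample, so it's omitted here:

```python
import math

def margin_of_error(sample_size: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion p measured from a simple
    random sample of the given size (normal approximation).
    p = 0.5 is the worst case, where the margin is widest."""
    return z * math.sqrt(p * (1 - p) / sample_size)

# Typical poll sizes:
print(f"{margin_of_error(600):.1%}")   # about +/- 4.0%
print(f"{margin_of_error(1000):.1%}")  # about +/- 3.1%
```

This is why so many polls of roughly 600 to 1000 respondents report a margin of error of three or four points.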
So when you hear a pollster talking about a "poll of likely voters showing that Obama is ahead by 8 points with a margin of error of +/-4%", the first thing you should do is realize what they're measuring. In that case, the population isn't "the set of people who are going to vote next Tuesday" (even though that's what the journalists try to make you think); the population is "the set of people who the poll believes are likely to vote next Tuesday". So the margin of error
is a measure of how well their poll matches the population of people
who they believe are likely to vote - which is quite a different thing
from the population of people who actually do vote. In fact, it measures even less than that: it measures how much
sampling error is contained in their poll due to unintentionally
selecting a non-representative sample. That's not really saying very much in an election poll.
The population being sampled by polls is likely to be quite different from the actual population of voters for a number of reasons,
and this difference produces measurement errors that almost certainly outweigh, by a wide margin, the unintentional sampling errors measured by the margin of error (there's a small simulation after the list illustrating this). For example:
- Intentional Sample Bias
- Intentional sample bias covers a variety of techniques that
pollsters use when they select people for the sample. For an extreme
example, some polls (like, I think, Zogby) try to get an equal number of
people who self-identify as Republicans and Democrats. But in most
states, the numbers of members of the two major parties are not
equal. They are, in fact, often pretty dramatically uneven. A less
dramatic but still significant one is that many polls do their polling
through phone calls, and only call land-lines. Many younger people no
longer have land-lines; the exclusion of cell-phone numbers therefore
excludes some portion of the population from the sample. These kinds
of sample bias produce a significant mismatch between the population
of real voters, and the population being sampled.
- Unknown Population
- The biggest source of polling error leading up to an election is the fact
that the real population is unknown. No one is sure who's going
to vote - which means that no one is certain of what the correct
population to sample is. Pollsters try to identify a sample of people
who are likely to vote. But since the population is unknown,
they don't know if they're including people in the sample who aren't in
the actual population of voters, and they don't know if they're
excluding people from their samples who are going to vote. In
this election, this is likely to be a significant effect, because huge
numbers of people registered to vote for the first time, but no one
knows how many of those newly registered voters are likely to show
up and vote. Once again, there's a problem related to the fact that the
population that they're sampling isn't the same as the population that
the poll is trying to measure - so that error factor is outside the
margin of error.
- Phrasing Bias
- You can get significant differences in polls based on how the
question is phrased. "Who are you going to vote for?" will
likely generate different results from "Are you going to vote for Obama or McCain?", which will likely generate different results from
"Do you plan to vote for McCain or Obama?", which will generate different results from "Do you plan to vote for a Democrat or a Republican in the presidential election?". This is a well-known problem,
but it still has a significant effect.
- Dishonest Answers
- People aren't entirely trustworthy. They don't necessarily answer
questions honestly. A frequently discussed version of this is called
the Bradley effect. The Bradley effect is a phenomenon where people
are reluctant to admit to being racist. So when a pollster asks them
if they're going to vote for a black man, they'll say "yes", but when
it actually comes to voting, they'll vote for the white guy. I've heard
some people speculate about a reverse Bradley effect this year in some Southern states, where people are reluctant to admit that they're going to vote for a black man, so they lie and say they're voting for McCain. But the truth of the matter is, we don't know if the people answering the
polls are answering honestly. If they're not, that skews the poll results, and once again, it's not covered by the margin of error.
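Here's the simulation promised above - a sketch of why these systematic errors matter more than the margin of error. It models a poll that can only reach part of the electorate (say, land-line owners) whose preferences lean differently from the electorate as a whole. All the numbers are invented; the point is the pattern:

```python
import math
import random

random.seed(1)

# Invented numbers: candidate A's true support is 52%, but among the
# people the poll can actually reach, it's only 49%.
TRUE_SUPPORT = 0.52
REACHABLE_SUPPORT = 0.49

def poll(sample_size: int) -> float:
    """Simulate a poll that samples only the reachable population."""
    hits = sum(random.random() < REACHABLE_SUPPORT for _ in range(sample_size))
    return hits / sample_size

for n in (100, 1_000, 10_000, 100_000):
    moe = 1.96 * math.sqrt(0.25 / n)  # worst-case 95% margin of error
    print(f"n={n:>6}: estimate={poll(n):.3f} +/- {moe:.3f} (true value: {TRUE_SUPPORT})")
```

No matter how large the sample gets, the estimate converges on the reachable population's 49%, not the true 52%: the margin of error shrinks toward zero while the three-point systematic error stays put, entirely outside it.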
"95% certain"? How Bayesian of you.
Here's a frequentist/sampling theory rephrasing: "...how large an interval you need to define on either side of the measured statistic such that 95% of intervals in hypothetical repetitions of the poll would contain the "actual" value."
It turns out Bayesian intervals more-or-less coincide with the confidence intervals for polls with large sample sizes, so both statements are correct.
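That frequentist reading is easy to check by simulation - roughly 95% of hypothetical repetitions of the poll should produce an interval that contains the true value. A sketch, with an invented true proportion:

```python
import math
import random

random.seed(0)
TRUE_P, N, REPS = 0.52, 1000, 2_000  # invented true support; 2000 repeated "polls"

covered = 0
for _ in range(REPS):
    est = sum(random.random() < TRUE_P for _ in range(N)) / N
    moe = 1.96 * math.sqrt(est * (1 - est) / N)
    if est - moe <= TRUE_P <= est + moe:
        covered += 1

print(f"Coverage: {covered / REPS:.1%}")  # comes out close to 95%
```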
A great explanation, but I always thought Yes, Prime Minister did it the best: http://www.youtube.com/watch?v=2yhN1IDLQjo
Can someone please comment on the "Probability of Leading" concept... that is, if Candidate A leads Candidate B in a poll by 52% to 48% with a margin of error of, say, 2%, then what is the actual probability that Candidate A is really ahead?
That was very interesting.
However, for the severely math-deficient like myself, would you apply your comments to the example? Does an eight-point lead with a +/-4% margin of error mean anything much about what is going to happen? Does it mean Obama really is likely to win, or not, or nobody knows?
I'm curious as to your take on Nate Silver's methods - see http://www.fivethirtyeight.com
In case you haven't seen it, Peter Norvig's US Election FAQ discusses polls in detail, among other pertinent topics: http://norvig.com/election-faq.html#polls
He includes links to a few sites that give averages of various polls.
Re: #3 & #4
There's no mathematical way to know what the real election result is going to be. The best we can do is look at the polls, understand the limitations of the sampling methods, and from that make an educated guess about how well the polls reflect reality.
Another commenter pointed to fivethirtyeight.com, which is a site where a guy who is very clued in about statistics tries to combine results from all of the available polling, including factoring in weights based on the sampling methods used by the various polls. I think his method looks very good, and that he's likely the most accurate predictor. But we won't know for certain until his method is put to the test, by seeing how well it really matches the final results next week.
Another source of error is that these polls are self-selecting: they do not include the responses of people who refuse to take the poll. I have always thought that they should include that number to judge the magnitude of the error. After all, if they call 2000 people but only get 1000 responses, that has to add to the margin of error.
Dewey vs. Truman, 1948. Dewey's predicted victory arose from telephone polling that selectively excluded Truman's lower class support. A vast number of young adults with little hope of economic ascension are Obama's natural constituency by choice and by counter-reaction. His campaign makes a fetish of collecting cell phone contacts (be first to know his VP choice!). McCain's caterwauling is trench warfare compared to blitzkrieg. Your point about land line polling may be prescient.
Minor comment: You're missing tags around "Unknown Population".
Mark,
In addition to 538, you might be interested in www.electoral-vote.com. It's run by A. Tanenbaum, and in previous elections it has proved to be very accurate. He also posts a daily, very interesting news summary.
Uncle Al (haven't seen him in a while) mentioned Dewey vs. Truman. I seem to recall hearing that that problem was what made George Gallup set up shop to minimise this problem of unrepresentative sampling. I just thought that happened a lot earlier.
Quite a few pollsters are sampling cell phones. Nate at 538 has looked specifically at this effect.
By the way, one major phrasing bias is "McCain, Obama, Barr, Nader". There will be some 3rd party votes, and some might be conservatives rather than liberals. Then there is the phrasing bias when the ballot itself has a very poor user interface. One example is one where voting a straight ticket does not result in a vote for President, and another is where the candidates for one office are spread over two columns (leading to a multiple selection mis-vote).
Re 1948 polls
Although it is true that there was a sampling bias in Gallup's polling that year, the biggest factor in the poll's failure was that his organization stopped polling several weeks before the election, under the impression that Truman was too far behind to catch up. He thus missed the late surge of Democratic voters. He did not make the same mistake in 1968, when polls up through the third week of October showed Nixon with close to a double-digit lead. However, in his final poll, taken the day before the election, he detected the surge toward Humphrey and accurately predicted that the race would be close. In fact, the race was decided in Nixon's favor by some 50,000 votes in his home state of California.
There's another error often made when talking about polls. When the results are "within the margin of error", then people assume that the actual number has no meaning. For instance, a poll result asking whether a particular ballot measure will pass: 51% in favor (2% MOE) is treated the same as 49% in favor (2% MOE); but actually, if there are no systematic errors in the poll (of the sort described in Mark's post), the former indicates a much greater chance that the measure will pass.
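That claim (and the "probability of leading" question above) is easy to quantify under a normal approximation, treating the 2% as a 95% margin of error, i.e. 1.96 standard errors - an assumption about how the pollster computed it:

```python
import math

def prob_above_half(observed: float, moe: float) -> float:
    """P(true proportion > 50%) given an observed proportion and a
    95% margin of error, under a normal approximation."""
    se = moe / 1.96                                 # back out the standard error
    z = (observed - 0.5) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF

print(f"{prob_above_half(0.51, 0.02):.0%}")  # roughly 84%
print(f"{prob_above_half(0.49, 0.02):.0%}")  # roughly 16%
```

So 51% with a 2% MOE suggests about a 5-in-6 chance the measure is really ahead, while 49% suggests about 1-in-6 - quite different, even though both results are "within the margin of error". (This says nothing about the systematic errors from the post, of course.)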
I believe the poll which was distorted due to reliance on the telephone was actually the Literary Digest poll of 1936, which predicted a victory by Alf Landon.
"...if there are no systematic errors in the poll (of the sort described in Mark's post), the former indicates a much greater chance that the measure will pass."
How do you KNOW???? If I flip an unbiased coin or use an unbiased random number generator set to 50%, and flip it enough times so that my MOE is 2%, then there will be MANY instances where I will get heads 51% of the time. (Let's say between 50.5% and 51.5%).
So if you get a result that shows a 51% result with a 2% margin of error, HOW DO YOU KNOW the true result?
Let's look at it this way. I create a program that generates TWO trials of 1000 coin flips each. I've set the random number generator to land on heads 50% of the time in the first trial, and 52% of the time in the second trial. I run the program, and BOTH trials spit out results that show that heads came up 51% of the time. If I do not label the results, then HOW are you going to identify which result was generated by which trial?
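That thought experiment is easy to actually run. A sketch that counts how often each generator produces a ~51%-heads result:

```python
import random

random.seed(7)

def trial(p_heads: float, flips: int = 1000) -> float:
    """Fraction of heads in one run of coin flips with the given bias."""
    return sum(random.random() < p_heads for _ in range(flips)) / flips

RUNS = 2_000
# Count how often each generator lands in the 50.5%-51.5% window.
for p in (0.50, 0.52):
    hits = sum(0.505 <= trial(p) <= 0.515 for _ in range(RUNS))
    print(f"p={p}: {hits / RUNS:.1%} of runs show ~51% heads")
```

On a typical run, each generator lands in that window around a fifth of the time, so a single 51% observation genuinely cannot tell you which generator produced it. The commenter is right that you can't know the true value from one poll; what the margin of error gives you is a probability statement, never certainty - and with more data, or a larger sample, the two possibilities do separate.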