Why the Polls Were Wrong
A myriad of possible explanations.
David Leonhardt notes that, while the polling aggregates got 48 of 50 states right in the Presidential race, they were sufficiently off in other contests that we should rightly question their utility.
Indeed, even with the Presidential contest, the polls were wildly off by degree if not direction. Here’s his compilation based on the NYT aggregates:
While many of these are nearly spot-on, some—notably Wisconsin, Ohio, and Iowa—were massively wrong. And all in one direction. That’s a systematic error.
And, yes, it matters. The press are misleading the public with false expectations:
In response, polling firms are asking whether they need to accelerate their shift to new research methods, such as surveying people by text message. And media organizations including The New York Times, which financially support and promote polls, are re-evaluating how they portray polls in future coverage. Some editors believe the best approach may be to give them less prominent coverage, despite intense interest from readers and despite the dominant role polls play in shaping campaign strategies.
And, because the same errors affected the internal polls conducted by the campaigns and parties, they shaped strategy:
This year’s misleading polls had real-world effects, for both political parties. The Trump campaign pulled back from campaigning in Michigan and Wisconsin, reducing visits and advertising, and lost both only narrowly. In Arizona, a Republican strategist who worked on Senator Martha McSally’s re-election campaign said that public polling showing her far behind “probably cost us $4 or $5 million” in donations. Ms. McSally lost to Mark Kelly by less than three percentage points.
Mr. Biden spent valuable time visiting Iowa and Ohio in the campaign’s final days, only to lose both soundly. Democrats also poured money into races that may never have been winnable, like the South Carolina Senate race, while paying less attention to some of their House incumbents who party leaders wrongly thought were safe. The party ended up losing seats.
“District-level polling has rarely led us — or the parties and groups investing in House races — so astray,” David Wasserman of the Cook Political Report, a nonpartisan publication that analyzes races, wrote last week.
Of course, at least in terms of the Presidential race, this is exacerbated by the anachronism of the Electoral College. The polls led us to believe for months that Biden would win, and he did. But, once again, the state-by-state allocation meant that shifting a handful of votes in three states could have handed the election to a candidate who lost the popular vote by a wide margin, and that magnified the polling errors.
So, what went wrong? Well, lots of things.
- People’s decreasing willingness to respond to polls — thanks partly to caller ID — has reduced average polling response to only 6 percent in recent years, according to the Pew Research Center, from above 50 percent in many polls during the 1980s. At today’s level, pollsters cannot easily construct a sample of respondents who resemble the population.
- Some types of voters seem less willing to respond to polls than others, perhaps because they are less trusting of institutions, and these voters seem to lean Republican.
- The polling industry tried to fix this problem after 2016, by ensuring that polling samples included enough white working-class voters in 2020. But that is not enough if response rates also vary within groups — for instance, if the white or Hispanic working-class voters who respond to polls have a different political profile than those who do not respond.
- This year’s polls may have suffered from pandemic-related problems that will not repeat in the future, including a potential turnout decline among Democratic voters who feared contracting the coronavirus at a polling place.
- A much-hyped theory that Trump supporters lie to pollsters appears to be wrong or insignificant. Polls did not underestimate his support more in liberal areas, where supporting Mr. Trump can be less socially acceptable, than in conservative areas.
- In what may be the most complex pattern, polls underestimated the support of multiple Senate Republican candidates even more than Mr. Trump. This means the polls missed a disproportionate number of Americans who voted for both Mr. Biden and a Republican Senate candidate — and that the problems do not simply involve Mr. Trump’s base.
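The point about response rates varying within groups is easy to underestimate, so here is a toy calculation. It is a minimal sketch with made-up numbers, not any pollster's actual method: two hypothetical groups of voters, each weighted back to its true share of the electorate (post-stratification), but with Republicans in one group answering less often than everyone else. The `Group` fields and all the inputs are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    pop_share: float   # share of the electorate in this group (hypothetical)
    r_support: float   # true share of the group voting Republican (hypothetical)
    r_response: float  # response rate among the group's Republican voters
    d_response: float  # response rate among the group's other voters


def true_share(groups):
    """Actual Republican share of the whole electorate."""
    return sum(g.pop_share * g.r_support for g in groups)


def respondent_share(g):
    """Republican share among this group's respondents (expected value, no sampling noise)."""
    r = g.r_support * g.r_response
    d = (1 - g.r_support) * g.d_response
    return r / (r + d)


def weighted_estimate(groups):
    """Post-stratify: scale each group's respondents back to its population share."""
    return sum(g.pop_share * respondent_share(g) for g in groups)


def unweighted_estimate(groups):
    """Pool all respondents together with no weighting at all."""
    r = sum(g.pop_share * g.r_support * g.r_response for g in groups)
    total = sum(
        g.pop_share * (g.r_support * g.r_response + (1 - g.r_support) * g.d_response)
        for g in groups
    )
    return r / total


# Invented example: Republicans in the second group answer 4% of calls, everyone else 6%.
groups = [
    Group(pop_share=0.4, r_support=0.40, r_response=0.06, d_response=0.06),
    Group(pop_share=0.6, r_support=0.60, r_response=0.04, d_response=0.06),
]

print(f"true:       {true_share(groups):.3f}")
print(f"weighted:   {weighted_estimate(groups):.3f}")
print(f"unweighted: {unweighted_estimate(groups):.3f}")
```

With these invented inputs the true Republican share is 52 percent, but the weighted estimate comes out at 46 percent, barely better than the unweighted figure of about 45 percent. Getting the group quotas right does nothing about who, within a group, picks up the phone.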
In the 2016 cycle, I still had a landline phone because it was included in my Internet bundle and I had small children and liked the assurance of 9-1-1 service. Because Virginia was still considered a swing state then, I got so many calls from pollsters a day that I finally just turned off the ringer.
Now that I’m mobile-only, I answer zero calls from people not in my contacts list. Indeed, once the iPhone allowed automatic rejection of such calls, they all go straight to voicemail. I would expect that this is the norm for people younger than me.
This means we now have a statistical no-no: the self-selected sample. Rather than a randomized group of people weighted to ensure demographic representativeness, we have people who either can’t figure out how to avoid unwanted calls or just really, really want to share their opinions with pollsters. They’re unlikely to match the attitudes of the population as a whole.
Interestingly, I would have expected this to lean in the other direction: older, less technologically savvy people who would be more pro-Trump. But, of course, this is offset by the “shy Trump voter” phenomenon.
While I had long understood that to be some sort of conspiracy by Trump supporters to fool the pollsters and “prank the libs,” it’s actually something more basic: in certain swaths of the country, people fear being socially ostracized if they admit to voting for Trump.
Nothing under the post except the picture.
@MarkedMan: Not sure why it published early!
You include the following quote:
Then at the end you say:
You just endorsed the shy-Trumper theory right after including data that debunks it.
@Kylopod:
I noticed that as well. It appears to be a self-contradiction.
I’ve been saying for a while that the pandemic was going to cause problems for polling this year. I wasn’t sure which direction the problems would go or which party stood to benefit more, but it seemed obvious to me it was a major wild card.
As mentioned, Dems are likelier to take the pandemic seriously. This probably affected the race in multiple ways. First of all, it affected the campaigns themselves: Biden did a lot less on-the-ground campaigning than Trump, such as the door-to-door canvassing that is usually such a traditional part of getting the vote out.
I also believed that the massive increase in vote-by-mail was going to be a big problem for pollsters trying to build reliable turnout models. And on top of that, there was the active Republican sabotage of VBM: both DeJoy’s engineered slowdown of the mail and all the restrictive, Simon-says technicalities that almost certainly caused many ballots to be rejected for stupid, trivial reasons. It’s amazing to me: Dems were sounding the alarm about these matters when they were first reported, but then it was like everyone forgot about them. I simply do not trust the results that we have. I believe it was part of Trump’s attempted coup. Just because it wasn’t sufficient to keep Trump in power doesn’t mean it didn’t succeed in suppressing the Democratic vote.
A queer conclusion, insofar as I understand from reading articles that the data indicate that in very Democratic areas, where one would expect a shy-voter effect, there is no such effect. Meanwhile, the polls are quite significantly off in very Trumpy areas, where fear of “social ostracism” seems a rather far-fetched explanation for the miss.
@Kylopod: @Lounsbury: I actually misread what Leonhardt wrote. But his argument makes no sense: both of his links are to pieces written before the election. There’s definitely some support for the theory in post-election surveys.
@James Joyner:
It makes plenty of sense. Among other things, Republicans as a whole overperformed, not just Trump. Are you going to say there were shy Collins supporters and shy Graham supporters and shy (generic R House candidate) supporters as well?
The article you link to presents no such evidence. It simply reports a survey of Trump supporters claiming they’re afraid to express their views to friends. It never mentions anything about their lying to pollsters. Indeed, you’re using evidence from a poll to prove that they’re afraid to divulge their opinion to pollsters? Do you even realize what you’re saying?
One thing that seems almost universally misunderstood is “margin of error.”
Margin of error is about the repeatability of the poll; it has nothing to do with the poll’s accuracy.
Margin of error merely says if the poll were repeated with the exact same methodology (i.e., impossible unless done simultaneously) the results would be 95% likely to be within the “margin of error.”
I.e., margin of error is basically just about the sample size.
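That point can be made concrete. For a simple random sample, the textbook 95 percent margin of error on a proportion p with n respondents is 1.96·√(p(1−p)/n), so it depends on n and, only weakly near 50 percent, on p. A minimal sketch, assuming simple random sampling; published margins often run somewhat larger because they fold in a "design effect" for weighting.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    z = 1.96 is the conventional 95% level. Assumes a simple random sample,
    so the result reflects sampling noise only, not who chose to respond.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return z * math.sqrt(p * (1.0 - p) / n)


# Margin of error, in percentage points, for a 50/50 race at several sample sizes.
for n in (400, 1000, 1600):
    print(f"n={n}: +/-{100 * margin_of_error(0.5, n):.1f} points")
```

At p = 0.5 this gives about ±4.9 points for 400 respondents and about ±3.1 for 1,000; quadrupling the sample halves the margin. Nothing in the formula knows whether the people who answered resemble the people who vote.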
The polling for Dewey v. Truman was conducted by telephone, at a time when a significant number of voters did not have telephones – voters who skewed Democratic.
One thing that I’m wondering about is the degree to which polling models are adequately accounting for the incumbency factor. We know that in 2012, polls also appeared to underrepresent Obama’s support (see for example: https://www.theatlantic.com/politics/archive/2013/06/gallup-explains-how-it-messed-2012-presidential-polling/314613/). I also
To some degree, the same thing seemed to happen here as well. So perhaps these types of presidential elections are a different beast from those in which neither candidate is an incumbent.
If that is the case, then things become even more difficult to figure out, as your opportunities to tune your models only come every 8 years.
I’ve seen it suggested that Americans are more poll-savvy than they used to be, and that may be affecting polling.
I think it’s also hard to poll a cult. One of the dangers that we fall into is the “shy Trump” voter, and by the same token, they won’t come out to vote if Trump is on the ticket. We can’t keep telling ourselves that – GOTV is more important than ever.
I would opine, though, if the USPS hadn’t been systematically dismantled and voter suppression hadn’t been systematically implemented, the results might have been closer to what some polling suggested.
Why should I care? I understand how the polls can be useful for the campaigns, but they don’t matter to me. Before the game there are lots of people willing to explain that Big U should beat State, but we know that those opinions are for entertainment only. Polls are the same.
I haven’t been fixated on this…but I never watched independent polls, only aggregators like 538.
If memory serves, 538 was pretty close…within 3%, I think. And they regularly included a “what if there is 3% error” forecast.
They also predicted the so-called “red mirage.”
+/- 3% seems like a pretty good score when you are predicting the future and humans are involved.
In Florida you can bet on jai-alai…it never makes sense to me to bet on anything that can read and understand the tote board.
I wonder what relevant, non-polling data people like Google and Amazon might have.
It also seemed to me that the Biden campaign knew exactly what was going on, and where they could win. If you look at their focus the last few days that is fairly obvious.
@Slugger:
Maybe for you, but they affect people’s behavior. People like to support winners: look at who supports which sports teams, attends which sporting events, and watches which events on the TV machine.
@Daryl and his brother Darryl:
I noticed that as well. I think their internal polls were better than the aggregate of the public ones. And it wasn’t just the last few days, either. Back in July one of his advisers told the press what sounds pretty prescient now:
“Texas is 22 fucking media markets. That is never going to happen. It’s just not going to happen. Everyone knows that. I don’t know why people are still even talking about it…. Georgia is real and that’s a decision this campaign will have to eventually make but not until we feel really comfortable about the six core states — and we are going up in Nevada tomorrow just to make sure since it’s a state that gets squirrelly in a recession.”
I’m going to call BS on that. For one thing, look at the turnout for Biden.
Yes, the trump pandemic has caused myriad problems and disruptions, but we cannot blame all failures on it.
@Slugger:
Oh, it matters a great deal. Systemic problems tend to point to systemic issues (obvious, yes?). In this case, if there’s a reliability problem with data gathering, this affects political polling, yes, but also other surveys used to shape policies and determine where resources are employed and to what end. I mean things like welfare, medical assistance, etc.
Now, as the polls are eventually tested against elections, much of the other data can be judged against the census (but not all the data). The problem with this is the census takes place every ten years, not every four, and in the meantime resources and effort may be misdirected.
Another consequence of bad polling: misallocation of resources.
https://www.nytimes.com/2020/11/13/nyregion/election-2020-nyc-donations.html
…
I actually predicted the pollsters would overcompensate for 2016 and that Trump and the GOP would be slightly over-represented in polls. NOPE!
I don’t think there is a “shy” Trump voter phenomenon in the sense of people being embarrassed about Trump support. In my experience Trump supporters are not at all afraid. It may appear that way in elite bubbles but I don’t think it extends to the general population.
If anything, I think open political support this year was much more muted generally, a consequence of our toxic political climate.
I do think people on the right are much less likely to trust elite institutions which include news organizations and pollsters, so my guess is that is how the “shy Trump” voter is actually manifesting.
Of course, apropos of the other thread about the election confirming everyone’s priors, I would think this because I’ve been harping on the importance of the elite-pleb divide in America as a more important and relevant axis than the left-right divide.
But anyway, the failure of polling isn’t just going to affect elections and election strategy. It calls into question the accuracy of polling on everything.
I think the reality is that political organizations will have to go back to their roots and quit relying so heavily on big data. Grassroots organizing and getting out and actually meeting people will have to make a comeback.
@Michael Reynolds:
This comment needs to be in neon, with blinking added. Big data analysis is the only real answer to the polling issue.
It’s going to be tough to figure out what data points will track to voting intent, but if we’re hoping for any kind of campaign analysis in the future, this is the way to do it.
I responded to exactly one poll this year. It was an Associated Press poll, and I received a card in the mail, with a scratch-off panel (like a lottery scratch ticket) that revealed an ID number. I went to the designated website, plugged in the ID number, and answered the questions.
It got around the “ignore/block” phone-number issue, but I couldn’t help but think of all of the other factors involved: I a) am willing to participate, b) have internet access to respond, and c) value polling and see the benefit in having my opinion registered. This is unlikely to reflect the vast majority of people, who will chuck the piece in the recycling without really looking at it.
I expect, this being a census year, that there was a massive GOTV operation conducted by PACs such as the ones the Kochs run. I expect they flew under the radar, and used social media to find the most persuadable people – who aren’t necessarily the people we would ordinarily expect to vote – and push arguments at them that most of us would find a bit ridiculous.
All of this is made possible by the way Facebook worked. Republicans love to complain about Facebook in public, but it’s the best thing that ever happened to them. It was critical in Trump’s win, and in the Brexit vote. Why on earth would they walk away from it?
I think it matters quite a bit which states were big misses and how they missed. Geography can definitely be targeted in Facebook advertising. And by doing so, you can avoid having people snoop on you just by pretending to be conservative. If you don’t live in the right place, you don’t see the message.
I did a little phone banking before the election; I’m surprised they get even that much. At that point, any effort to “skew” the polls, i.e., “make them more accurate” by adjusting sampling or weighting, means extrapolating way past the available data. With COVID and expanded early and mail-in voting, any history-based likely-voter screen is guesswork. As @charon: notes, margin of error is a function only of sample size and measures repeatability, not accuracy. It says nothing about systematic errors or even whether the question asked actually elicits the data sought.
With a 6% response rate it’s a miracle they’re as accurate as they were. As to systematic error, at that low rate it’s probably a coin flip whether they erred left, right, or backwards.
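The gap between repeatability and accuracy is easy to simulate. A rough sketch with invented inputs: the electorate is 52 percent Republican, but Republicans answer 5 percent of calls and everyone else 6 percent, so the pool of people who answer leans the other way. The function name `simulate_coverage` and every number here are hypothetical, chosen only for illustration.

```python
import math
import random


def simulate_coverage(true_share, r_response, d_response, n, trials, seed=0):
    """Fraction of simulated polls whose 95% interval contains the true share.

    Each poll draws n respondents from the people who actually answer.
    Unequal response rates make that pool differ from the electorate,
    which is a systematic error that no sample size removes.
    """
    rng = random.Random(seed)
    # Republican share among people who pick up the phone.
    r = true_share * r_response
    d = (1 - true_share) * d_response
    q = r / (r + d)
    covered = 0
    for _trial in range(trials):
        hits = sum(1 for _ in range(n) if rng.random() < q)
        p_hat = hits / n
        moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
        if abs(p_hat - true_share) <= moe:
            covered += 1
    return covered / trials


for n in (1000, 4000):
    print(n, simulate_coverage(0.52, 0.05, 0.06, n, trials=300))
```

Under these assumptions only a minority of the simulated 1,000-person polls have a 95 percent interval that contains the true value, and quadrupling the sample makes coverage worse, not better: the interval just tightens around the wrong number. Setting both response rates equal brings coverage back to roughly 95 percent.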
@Jen:
There could be privacy concerns, but I suspect Google could use 2020 election data in combination with their own data and come up with some interesting information. Whether it would be predictive is a different question. But polling clearly has issues, and I don’t see that getting any better.
I have a Chapel Hill, NC phone, area code 917. Marketing firms try me using 917 numbers, so I just blocked all 917 numbers. The defensive tech is far outstripping the ability of pollsters to use phones.
@Jen: That’s interesting. I read his comment and thought
I’ll stick with reading whatever people are saying and replying “hmm… that’s interesting.” The whole idea of shaping your image to public opinion (or, more importantly, to what you imagine public opinion is) seems dishonest to me. Obviously, your mileage will vary.
@Just nutha ignint cracker:
I’m not talking about shaping public opinion, I’m more talking about predictive analytics.
I’ve read two books in recent years that make me think that something like this is possible in the near future. One was Big Data; the other was Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are.
Predictive analytics using big data sets don’t have to be individualized, in many cases they are more useful if they are anonymized. In the Everybody Lies book, Stephens-Davidowitz shows how he could pinpoint racism based on anonymized Google searches in different regions of the country. He also showed how searches revealed increases in child abuse and neglect during the 2008 economic crash, even though reported cases went down.
I’m guessing that Google has enough search data it could crunch and layer over voter turnout and election results that would provide a better understanding of what is going on than polling can.
Google searches are a goldmine.
@Jen: Hmm… That’s interesting. 😉
On a more serious note, it will be interesting to watch how big data and its use evolve. I’m more skeptical about humans being able to interpret what they’re seeing in the data when they have a stake in the outcome. We tend to use data to construct the reality we believe we see, at least that’s my take.
@Jen:
The problem I have with Google is they are not a transparent company and their data is proprietary. Who is going to compete with them when it comes to big data analysis? Pollsters can at least compete even though it’s now apparent the industry has systemic problems. Who competes with Google when it comes to data? Can they actually be trusted to put the public interest before their monetary and corporate interest? Right now the answers to those questions are not positive IMO.
@Andy:
If Google’s analytics proves itself we can just cut out the middle man – the voter – and ask Mountain View who ‘won.’
@Andy:
Agreed, 100%.
@Jen: We are already there, and I suspect many of the Trump campaign’s decisions were based on big-data forecasts. IIRC, the Cambridge Analytica team only needed six data points from their FB datasets to predict how susceptible a user was to liberal, conservative, libertarian, etc., viewpoints. I’d imagine it’s not that far of a stretch to predict likely voters from that data set.
That might explain the Biden campaign’s moves in the days leading up to the election. I thought Georgia was as much of a pipe dream as Texas.