The Elusive ‘Likely Voter’
Where the science and art of polling intersect.
Adding to our long-running conversation about how public opinion researchers have adapted to changing realities (see “Is Political Polling Still Useful?” and “Political Polling in the Trump Era” for recent examples), the NYT weighs in with “Accurate Polls Hinge on a Tricky Question: Who’s Actually Going to Vote?”
In the flood of election polls you’ll see over the next few weeks, most polling groups will include responses from “likely voters.” And often from nobody else.
In theory, these poll numbers should yield more accurate results, since the people who actually vote are the ones who dictate the outcome on Election Day. But creating a precise picture of who will vote in November is a complicated endeavor.
After all, how exactly can a pollster know who is “likely” to vote, and who therefore will be the focus of their results? There’s no one right answer, and every polling firm has its own strategy.
After a long discursion into the obvious, they turn back to said strategies.
There are a few different ways to identify likely voters, and one is simply to ask poll respondents whether they plan to vote. But that question by itself has limitations; people are not very good at predicting their future behavior when asked about it in surveys. And because many people feel as though they ought to vote, they will often say they plan on doing so, even if they don’t.
Brian Schaffner, a political science professor at Tufts University, is a co-director of the Cooperative Election Study, an annual survey from Harvard and YouGov. When Professor Schaffner and his team compared poll responses from 2020 to voter records for that year, they found that 27 percent of those who had said they would “definitely” vote had not been recorded voting in those elections.
Polling groups have developed a few different ways of dealing with this problem. Some, like Ipsos, which often conducts polls with media partners like ABC News, The Washington Post and Reuters, use a formula that analyzes responses to a series of questions that go beyond a simple “Will you vote?”
Employing demographic information, asking respondents if they have voted before and inquiring as to the location of their polling place can help create a profile of a likely voter. (Younger voters tend to be less likely to vote, for example; people who know the location of their polling place are more likely.)
Amusingly, I’m not sure I know my polling place’s location because I’ve voted at three different places in the five years I’ve lived here. But I know how to Google.
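The multi-question screen the NYT describes can be sketched as a simple additive score: each answer contributes points, and respondents above a threshold count as "likely voters." The questions, point values, and threshold below are illustrative assumptions, not any actual firm's formula.

```python
# A minimal sketch of a multi-question likely-voter screen.
# All keys, weights, and the threshold are hypothetical.

def likely_voter_score(respondent: dict) -> int:
    """Sum points from screen questions about intention, history, and engagement."""
    score = 0
    if respondent.get("says_will_vote"):       # stated intention (weak signal by itself)
        score += 1
    if respondent.get("voted_last_election"):  # past behavior (stronger signal)
        score += 2
    if respondent.get("knows_polling_place"):  # engagement proxy
        score += 1
    if respondent.get("age", 0) >= 45:         # older respondents turn out at higher rates
        score += 1
    return score

def is_likely_voter(respondent: dict, threshold: int = 3) -> bool:
    """Classify a respondent as a likely voter if their score clears the threshold."""
    return likely_voter_score(respondent) >= threshold

# Example: a young respondent who voted before and knows their polling place
r = {"says_will_vote": True, "voted_last_election": True,
     "knows_polling_place": True, "age": 29}
print(is_likely_voter(r))  # True: score is 4, above the threshold of 3
```

The point of scoring rather than asking a single "Will you vote?" question is exactly the problem Schaffner identifies: stated intention alone overpredicts turnout, so it gets a small weight relative to demonstrated past behavior.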
Another approach is to examine voter file data derived from government records, which can tell pollsters if a respondent has actually voted in the past (though not whom they voted for). Since people who have voted before are likelier to vote again, this can help improve pollsters’ accuracy in identifying those who will cast ballots.
But this approach relies heavily on the accuracy of voter files. “The voter file itself essentially starts to immediately decay because people move, people die, new people age into the electorate who weren’t in it,” said Chris Jackson, a senior vice president at Ipsos Public Affairs. Some firms blend the two methods, both asking voters questions and examining their voter files.
Granting that computerization has made this sort of thing easier than ever, it’s certainly labor-intensive. And a bit concerning. Obviously, voter registration and turnout records are public information. But pollsters tying responses to particular individuals undermines the secret ballot. Who knows what they’re doing with that information. And, even if they’re not sharing that information, they’re subject to hacking.
Once a subset of likely voters has been identified, what is to be done with it? Some pollsters use a “cutoff” model, simply removing voters they judge least likely to vote from the total sample.
Other polls, including The Times/Siena Poll, use what’s known as a probabilistic model. Rather than eliminating low-likelihood voters from the sample entirely, pollsters combine the available data to estimate how likely each respondent is to vote, and their responses are weighted accordingly.
“We feel that, because we have the information from the voter file, we should be using it in our data to contribute what it can,” said Jennifer Agiesta, the polling director at CNN, which this year switched from a cutoff model to a probabilistic model for the first time.
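The difference between the two approaches can be shown with a toy example. The respondents and turnout probabilities below are made up, and the arithmetic is only an illustration of the general technique, not any pollster's actual model.

```python
# Illustrative contrast between a "cutoff" model and a probabilistic
# (weighted) model, using fabricated respondents and turnout probabilities.

respondents = [
    {"candidate": "A", "p_vote": 0.9},
    {"candidate": "B", "p_vote": 0.8},
    {"candidate": "A", "p_vote": 0.3},
    {"candidate": "B", "p_vote": 0.2},
]

def cutoff_share(data, candidate, threshold=0.5):
    """Cutoff model: drop low-likelihood respondents, count the rest equally."""
    kept = [r for r in data if r["p_vote"] >= threshold]
    return sum(r["candidate"] == candidate for r in kept) / len(kept)

def probabilistic_share(data, candidate):
    """Probabilistic model: weight every respondent by their turnout probability."""
    total = sum(r["p_vote"] for r in data)
    return sum(r["p_vote"] for r in data if r["candidate"] == candidate) / total

print(cutoff_share(respondents, "A"))        # 0.5 -- the two low-likelihood respondents are dropped
print(probabilistic_share(respondents, "A")) # ~0.545 -- their leanings still count, just less
```

The design trade-off: the cutoff model makes a hard yes/no turnout call per respondent, while the probabilistic model lets uncertain voters contribute in proportion to their estimated likelihood, which is why CNN's switch keeps "what the voter file can contribute" in the estimate.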
This, ultimately, is guesswork. It’s all we have, of course, but largely useless in tight elections. And, in elections that aren’t tight, we don’t really need polling.
President Biden, while he was running for re-election, tended to perform slightly better in national polls among likely voters than among registered voters. But in the tight race between Vice President Kamala Harris and former President Donald J. Trump, pollsters are somewhat split on which candidate benefits among likely voters.
In a Times/Siena national poll published Thursday, Ms. Harris and Mr. Trump were even among likely voters; among registered voters, Mr. Trump led by one percentage point.
In recent state-level Times/Siena polls, it was a mixed bag. In some states, such as North Carolina, Mr. Trump did better among likely voters than among registered voters, while in others, such as Michigan, it was Ms. Harris who had the advantage among this key group.
This reverses longstanding trends but isn’t surprising, as there’s been a modest partisan realignment over the past decade or so, with college-educated whites shifting toward the Democrats and blue-collar whites toward the MAGA Republicans.
What do these mixed results tell us about November? In part, they tell the same story that all polling has been telling lately: This is an exceptionally close race. When considering a specific state, some of the differences can be attributed to a candidate’s relative strength among segments of the state’s population that are more or less likely to turn out on Election Day.
For example, in the Michigan poll from August, registered voters under the age of 45 favored Mr. Trump overall. But the people in that age group who are most likely to cast a ballot, according to historical turnout records, are overwhelmingly college educated, politically engaged liberals. So the likely voter model will, by definition, discount many of the younger Trump-leaning voters.
Again, that college-educated folks are more likely to vote is not new—but a greater percentage of them are now voting Democrat.
In the weeks to come, not all polls will report both registered voter and likely voter results, but when they do, these numbers can offer context in terms of how pollsters are viewing the electorate.
Just how accurate they will be will depend on who shows up to vote. That’s still one of the biggest unknowns for pollsters, Professor Schaffner said.
“The reason you should care is because it does reflect that turnout matters,” he said.
This is all a long way of saying—or not saying—“Who the fuck knows?”
I’m with you on this feeling invasive, but I’m also curious as to how it could even happen. Most polling I’m familiar with randomly dials from a pool of numbers. Particularly with cell phones, there’s not always a tie back to the actual location (I have friends who have had their cell numbers for years, and after having moved extensively, it’s not at all uncommon for a friend to live in NY but have a cell number from NC, for instance).
Voter files, at least in our area, do not contain phone numbers. So, these polling organizations must be doing match backs based on purchased lists.
@Jen:
That would be my guess. And I suspect at least some of them are in turn selling lists that match people with names, addresses, phone numbers, and voting and issue preferences. That would have to be more lucrative than peddling yet another poll.
These things remind me of the Robot Devil in the first Futurama series finale. To paraphrase: I will definitely probably vote.
Ann Selzer uses the method of simply asking respondents whether they intend to vote, and her results turned out more accurate than those of many other pollsters in 2016 and 2020. That’s probably because the turnout models most polling organizations used were insufficient to predict voter behavior in those cycles: in 2016, with the many infrequent voters showing up to vote for Trump, and in 2020, with the unexpected asymmetry between Republicans and Democrats in their willingness to leave their homes and knock on doors during the pandemic.
One polling miss that isn’t talked about much anymore, but which is still one of the biggest polling errors of the modern age, is the Michigan Democratic presidential primary in 2016, which Bernie Sanders won despite having trailed Hillary in most polls by around 20 points. Part of the reason for the failure is that Michigan hadn’t held a competitive Democratic primary in over a decade–since 2004. (In 2008 the DNC stripped the state of delegates due to some rule violations, the candidates all agreed not to campaign, and Obama wasn’t even on the ballot. In 2012, as Obama was running for reelection, there wasn’t a serious primary there.) This polling blunder has been practically forgotten, but after Trump unexpectedly won the election, many analysts at the time looked back on it as having been a warning sign of the anti-establishment populism that helped put Trump in the White House. Whether that was the reason, or the more mundane explanation of outdated turnout models for the Michigan Democratic primary electorate (which is what I think was the root cause), is up for debate.
All of which is a very long way of saying that in a close race, turnout matters.
And only one side has extensive investment in GOTV efforts…
@James Joyner:
As the rule goes, “if you’re not paying for a service, you’re not really the customer; you’re actually the product.”
Part of the reason I don’t respond to polls is that I’m not interested in collaborating in an attempt by some big corporation to exploit and monetize my personal data.
@SKI!:
I give Kamala 2 points for GOTV and a point for youth registration. Unfortunately I take away a point for embarrassed MAGAs and two points for unacknowledged sexism and racism.
@Michael Reynolds:
When have you seen MAGA be embarrassed by anything, no matter how outrageous or contradictory?
Since 2018, if you look at polling averages versus results, the averages in most races have consistently underrated Democratic performance by 2-5 points.
This is a big reason why the “red wave” never manifested, particularly in the Senate, where you had absurdities like Fetterman trailing Oz in PA polling averages even though he won by FIVE POINTS. But even in races where Republicans won, Democrats outperformed the averages, and where Democrats won, they also outperformed (see e.g. PA in 2020, where polling averages had Biden at 51 but the actual result was 54).
A big part of this, if you dig into the crosstabs, is that in a big flip from most of my lifetime, consistent voters, the kind who vote in every election, even dogcatcher primaries, now lean Democratic whereas they used to lean Republican.
Democrats used to overcome this in presidential election years by boosting turnout with big GOTV efforts but now it’s Republicans who need to.