Will One Small Shift Fix the Polls in 2022? (Wonkiness: 8/10)

https://www.nytimes.com/2022/11/02/upshot/polls-2022-midterms-fix.html

The great polling misfire of the 2020 election wasn’t just about how Trump supporters were less likely to respond to political surveys.

It was also about the failure of pollsters’ usual statistical adjustments to fix the problem.

After all, some demographic groups — like Hispanic voters — invariably respond to surveys at lower rates than others. Usually, pollsters just adjust for it, most often by “weighting” respondents from underrepresented groups to represent their share of the population. In 2020, the problem was that weighting didn’t do the trick. Even if a poll had the right number of Republicans or working-class whites, it still understated Donald J. Trump’s support against Joe Biden.

But this cycle, one weighting technique that didn’t do the job in 2020 might just be a little more powerful this time around.

That technique is called weighting by recalled vote choice. The term is a fancy way of saying that a poll is adjusted to have the right number of people who say they voted for each candidate, like Mr. Trump or Mr. Biden, in the last election.

Not every pollster weights on recalled vote. The Times/Siena poll doesn’t. But based on Times/Siena data, weighting on recalled vote seems a lot likelier to shift a poll toward Republicans than it was two years ago, even if Trump supporters are no more likely to take surveys.

What’s changed? In 2020, Times/Siena respondents were likelier to report having voted for Mr. Trump four years earlier than the actual 2016 result would suggest. Now, our respondents are likelier to report having voted for Mr. Biden than the actual 2020 result would suggest. As a consequence, weighting on recalled vote would now shift Times/Siena polls toward the right, since we would need to give additional weight to Mr. Trump’s 2020 supporters to match the 2020 tally.

If that’s a little confusing, here’s a concrete example:

In our final poll of Pennsylvania in 2020, voters who recalled backing Mr. Trump in 2016 outnumbered those who recalled backing Hillary Clinton by four percentage points, even though Mr. Trump won Pennsylvania by less than one point in 2016. If we had adjusted our poll to match the 2016 result, we would have needed to give more weight to Mrs. Clinton’s former supporters, shifting our already-too-Democratic poll result further to the left.

This year, the pattern is reversed: In our recent Pennsylvania poll, voters who said they recalled voting for Mr. Biden outnumbered those who backed Mr. Trump by four points, compared with Mr. Biden’s actual one-point victory. If we had adjusted our poll to match the 2020 results, we would have given more weight to the voters who said they backed Mr. Trump, shifting our results to the right (if you’re curious, John Fetterman would have led by three points in our recent Senate poll of Pennsylvania, 48 percent to 45 percent, rather than by 5.5 points).
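For readers who want to see the mechanics, here is a minimal sketch of recalled vote weighting in Python. Every number in it is hypothetical: the recalled vote shares are chosen only to mirror the four-point recalled edge and one-point actual margin described above, and the Senate preferences within each group are invented for illustration. It is not the Times/Siena weighting procedure; real polls typically adjust on several variables at once.

```python
# Hypothetical share of the sample by recalled 2020 vote (Biden +4 on recall).
sample_share = {"Biden": 0.50, "Trump": 0.46, "Other": 0.04}

# Hypothetical weighting target, roughly a one-point Biden win.
target_share = {"Biden": 0.485, "Trump": 0.475, "Other": 0.04}

# Hypothetical current Senate preference within each recall group:
# (share backing the Democrat, share backing the Republican).
senate_pref = {
    "Biden": (0.92, 0.04),
    "Trump": (0.05, 0.91),
    "Other": (0.40, 0.30),
}


def recall_weights(sample, target):
    """Weight for each group: its target share divided by its sample share."""
    return {group: target[group] / sample[group] for group in sample}


def dem_margin(shares, pref):
    """Democratic lead (as a fraction) given group shares and preferences."""
    dem = sum(shares[g] * pref[g][0] for g in shares)
    rep = sum(shares[g] * pref[g][1] for g in shares)
    return dem - rep


weights = recall_weights(sample_share, target_share)

# Applying the weights moves each group's share to its target.
weighted_share = {g: sample_share[g] * weights[g] for g in sample_share}

unweighted = dem_margin(sample_share, senate_pref)
weighted = dem_margin(weighted_share, senate_pref)

print(f"Weights by recalled vote: {weights}")
print(f"Democratic margin, unweighted: {unweighted * 100:.1f} points")
print(f"Democratic margin, weighted:   {weighted * 100:.1f} points")
```

In this toy example, respondents who recall backing Mr. Trump get a weight above 1 and those who recall backing Mr. Biden get a weight below 1, so the Democratic lead shrinks. The size of the shift depends entirely on the made-up inputs.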

Nowadays, many pollsters weight on recalled vote choice. If their underlying data looks similar to ours, the decision to weight on past vote might do a lot to shift those polls to the right compared with the last cycle. This effect of recalled vote weighting might wind up improving the accuracy of the polling averages, even as the underlying data quality remains unchanged.

If you’re a reader of this newsletter, you probably know that I remain concerned about the possibility that the polls — including our own polls — are still biased toward the Democrats.

So you might be surprised that we’re not weighting on recalled vote. It’s certainly tempting: In fact, weighting on recalled vote would have moved seven of our eight recent House and Senate polls to the right, by an average of around two percentage points:

Senate races:

Pennsylvania: D+6 (our result) → D+3 (result if we had weighted by recalled vote)

Nevada: Even → R+3

Georgia: D+3 → D+2

Arizona: D+6 → D+9 (notice the exception here)

House races:

Kansas 3: D+14 → D+12

Pa. 8: D+8 → D+6

Nevada 1: Even → R+3

N.M. 2: D+1 → R+1
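The arithmetic behind that summary can be checked with a short Python sketch. It uses only the rounded margins listed above, so the result is approximate: the Pennsylvania Senate lead, for instance, is 5.5 points before rounding. The sign convention (positive for a Democratic lead, negative for a Republican one) is an assumption made for the example.

```python
# (reported margin, recall-weighted margin) in points.
# Positive = Democratic lead, negative = Republican lead, 0 = even.
polls = {
    "Pennsylvania Senate": (6, 3),
    "Nevada Senate": (0, -3),
    "Georgia Senate": (3, 2),
    "Arizona Senate": (6, 9),
    "Kansas 3": (14, 12),
    "Pa. 8": (8, 6),
    "Nevada 1": (0, -3),
    "N.M. 2": (1, -1),
}

# Shift toward Republicans: reported margin minus recall-weighted margin.
shifts = {name: reported - recalled for name, (reported, recalled) in polls.items()}

moved_right = sum(1 for shift in shifts.values() if shift > 0)
average_shift = sum(shifts.values()) / len(shifts)

print(f"Polls that moved right: {moved_right} of {len(shifts)}")
print(f"Average shift toward Republicans: {average_shift:.1f} points")
```

With these rounded inputs, seven of the eight polls move right, and the average across all eight, including Arizona's move in the other direction, comes out a bit under two points.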

If I had to bet, I’d guess those recall-weighted numbers come closer to the final results than our reported numbers. I imagine a lot of you are thinking the same thing.

Nonetheless, I’m not convinced that this is a good practice — at least for us.

The biggest reason: There’s longstanding evidence that voters are less likely to recall voting for the losing candidate, and more likely to recall voting for the winner (this is one of my earliest polling-nerd memories).

The shift in our data is consistent with this pattern: Mr. Trump won the 2016 election, so in our 2020 polling he did better on recalled 2016 vote than he had in the actual result; Mr. Biden won the 2020 election, so he’s the one now outperforming. This suggests weighting on recalled vote will bias a poll against the party that won the last election, all else being equal.

This isn’t just a theoretical proposition: The partisanship of the people who refuse to tell us whom they supported last time around offers evidence that this is playing out in the Times/Siena poll. In our recent wave of congressional polling, nearly 10 percent of validated 2020 voters didn’t tell us whom they supported last time around. As a group, these voters are registered Republicans by a two-to-one margin, 48 percent to 25 percent. They disapprove of Mr. Biden by an even greater margin, 61 percent to 26 percent. This is certainly consistent with the possibility that an important and disproportionate sliver of Trump 2020 voters would prefer not to recall or divulge their vote.

I would find it hard to embrace something that would have unequivocally made our results even worse in 2020, no matter what our 2022 data showed. This evidence makes it very hard for me to justify weighting on recalled vote — even if I think the results look better that way.

There’s also an important practical challenge: What’s the right target? It’s easy enough to say that it should match the 2020 election, but that’s really not quite so clear. It’s entirely possible that Mr. Trump ought to lead on recalled vote with the likely electorate, if Republicans enjoy the usual midterm turnout advantage. Or maybe it’s the other way around, if Democrats benefit from demographic change or an influx of new registrants. And what about the voters who don’t seem to provide accurate information, like the folks who won’t tell us whom they supported or those who say they voted even though they don’t have a track record of doing so? It’s messy.

Nonetheless, pollsters have been using recalled vote more and more over the last few years, and it’s easy to see why.

Initially, these pollsters tended to have less-than-industry-standard data collection practices. For these pollsters, recalled vote choice was a bludgeon: a way to hammer even the worst data into the ballpark of the actual results. You could probably get a plausible poll result for West Virginia using a sample of New York City residents this way.

But increasingly, it’s not just lower-quality pollsters weighting on recalled vote. Many reputable ones are also employing the measure.

Part of the justification: Some pollsters believe the measure is less biased than it used to be. They think partisan polarization and even Mr. Trump’s election denial campaign mean that voters are much likelier to say whom they supported the last time around. You can literally see the signs of this if you drive around rural America right now (the Trump signs are still everywhere).

But the bigger factor is that many pollsters remain deeply concerned about overstating Democratic support. The Democratic-leaning recalled vote results look like a sign of a recurring problem; weighting looks like a solution.

Or put differently: As reputable pollsters have grown to doubt their data, they’ve taken up the crutches used by the pollsters who never had much cause to trust their data in the first place.

I think this is understandable, especially if a pollster is finding a larger gap on recalled vote than what we found here. It’s easy for me to stand on principle when the effect is two points, but I might be singing a different tune if we found the Democratic Senate candidate Tim Ryan up three points in Ohio with Mr. Biden leading the recalled vote (Mr. Trump won the state by eight points).

Interestingly, our friends at Siena College are among those higher-quality pollsters using recalled vote in 2022. They’ve been using it in their state polling with Spectrum News in states like Ohio, Wisconsin, Florida and Texas (they would like us to weight the Times/Siena polls this way, too, even if I’m stubborn about it). And they’re not alone. The CNN/SSRS polls in Wisconsin and Pennsylvania do it. There are Ipsos polls weighting by it. Just about every campaign pollster I talk to nowadays seems to be doing it as well. And that’s on top of the pollsters who have been doing it for a while, like Emerson College.

Unfortunately, this methodological debate is hard to resolve. It’s entirely possible that recalled vote will help cancel out a Democratic bias. It just won’t be clear whether that choice yielded a representative sample — whether a hypothetical perfect poll of America would show no bias on recalled vote — or whether it created a new, rightward bias that canceled out other biases.

And even if it is unbiased this time, there will be no way to know whether it will be unbiased in the future. After all, it would have hurt the Times/Siena polls in 2020.

What is fairly clear, though, is that many pollsters using recalled vote weighting might show more favorable results for Republicans in 2022 than they did in 2020, even if their underlying data remains just as biased toward Democrats. That shift tends to reduce the risk of another 2020-like polling error.