
The Devil in the Polling Data

The same problem that caused the 2007 financial crisis also tripped up the polling data ahead of this year’s presidential election.

The devil in the data that left election forecasters with egg on their face [...] an amazingly prescient article in which he predicted exactly how Trump would win in excruciating detail. Both of these reasons are ultimately related to the well-documented enthusiasm gap [...] put it back in August, “If people could vote from their sofa via their Xbox or remote control, Hillary would win in a landslide.”


Other factors like the inability to contact rural voters have been proposed, but it seems to me that good pollsters should have been able to overcome those kinds of problems.

So even the best of the pollsters have a lot to learn. How about the modelers?

I think modelers need to make some changes too.


Consider a hypothetical state that had numbers similar to Michigan this year. The raw polls showed about a 3.5 percent edge for Clinton. I’ve tried to reverse engineer two simple models with predictions and behavior similar to the FiveThirtyEight and PEC models using the same kinds of tools they used. Imagine that Model 1 predicted a 70 percent probability of Clinton winning and Model 2 predicted a 99 percent probability. Here is how these predictions would have to be modified in the presence of systematic correlated polling error:

Probability of Clinton win, by size of correlated error:

Correlated error:   0%    1%    2%    3%    4%
Model 1:            70%   65%   59%   53%   47%
Model 2:            99%   95%   84%   63%   37%


The actual correlated error for Michigan turned out to be four percentage points. If Model 1 had known and taken into account this magnitude of correlated error, its prediction of Clinton winning would have changed from 70 percent to just 47 percent, and Model 2’s prediction would have changed from 99 percent to 37 percent. Both models would have predicted a Trump win in this hypothetical scenario. What’s interesting is how large the swings in the probabilities are with very small changes in the correlated error.
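For readers who want to experiment with this sensitivity, here is a minimal sketch of one way such numbers can arise. It is an assumption on my part for illustration, not the actual FiveThirtyEight or PEC method: each model is treated as a normal distribution centered on the 3.5-point polling edge, the spread is backed out from the headline probability, and a correlated error simply shifts the center toward Trump. The function names (implied_sigma, win_probability, shifted_forecasts) are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal: mean 0, standard deviation 1
POLL_MARGIN = 3.5           # Clinton's raw polling edge in points (from the article)


def implied_sigma(p_win, margin=POLL_MARGIN):
    """Back out the spread (in points) a normal model would need in order
    to turn the given polling margin into the stated win probability.
    ASSUMPTION: the forecast behaves like a single normal distribution."""
    return margin / STD_NORMAL.inv_cdf(p_win)


def win_probability(sigma, correlated_error, margin=POLL_MARGIN):
    """Win probability once a correlated error (in points, favoring the
    opponent) is subtracted from the polling margin."""
    return STD_NORMAL.cdf((margin - correlated_error) / sigma)


def shifted_forecasts(p_win, errors=(0, 1, 2, 3, 4)):
    """Whole-number percentages for each assumed correlated error."""
    sigma = implied_sigma(p_win)
    return [round(100 * win_probability(sigma, e)) for e in errors]


if __name__ == "__main__":
    for name, headline in (("Model 1", 0.70), ("Model 2", 0.99)):
        print(name, shifted_forecasts(headline))
```

The point of the sketch is the design choice it exposes: a model that claims 99 percent must have a very narrow spread, so a shift of a few points moves it across the 50 percent line much faster than a model that claims 70 percent.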

Some readers here defended Nate Silver’s forecast, which had the probability of Clinton winning at 71.4 percent, on the grounds that it should not surprise anyone that about a one in three chance materialized. Technically, that is correct. I also agree that Silver’s model had some built-in defense against correlated error, while the other models had much less or none. But remember how large the swings in probabilities were in the models above. The modelers knew about the Brexit fiasco, which had a correlated error of four points, in an election with a similar “enthusiasm gap.” As I argued last month, it is extremely misleading to state such a potentially fragile probability to one decimal place: It implies that you are confident about the accuracy of the prediction to the precision stated. Most people are not deeply familiar with the technical details of a probability and tend to think of it as a “score” of the race. They are easily misled by the falsely stated precision. As I recommended then, probabilistic election forecasts should be dispensed with altogether and replaced with the seven-point qualitative scale already in wide use. If probabilities have to be stated, they should include a hedging statement that shows how much they would change in the presence of, say, a two or four percent correlated error as “margins of error.” If forecasters had done this, the potentially large error swings would have discouraged people from taking the forecasts as gospel truth. It would have saved the entire field of election forecasting from public embarrassment.

Hopefully, further research will identify the causes of correlated polling errors and find ways to detect them, and the modelers will build on the lessons learned from this humbling experience.


Lead image: Lucy Reading-Ikkanda for Quanta Magazine
