'How did Nate Silver predict the US election?' diff viewer (1/2)

This article is from the source 'guardian' and was first published or seen on November 08, 2012 12:54 (UTC). It last changed over 40 days ago and won't be checked again for changes.

You can find the current article at its original source at http://www.guardian.co.uk/science/grrlscientist/2012/nov/08/nate-sliver-predict-us-election

The article has changed 4 times. There is an RSS feed of changes available.

How did Nate Silver predict the US election?
Version 2: 2012-11-08 14:45:08 UTC (35 minutes after version 1, 2012-11-08 14:10:14 UTC)
One of the surprises of the American presidential election was the attacks from the Republican side. Not that they were attacking Obama (hey, unless the airwaves were full of attack ads from both sides, how would we know there was an election on?), but rather that they were attacking a statistician, Nate Silver. But Mr Silver is having the last laugh now, having predicted every state correctly even as most media were saying that the race was tied (or that it might be drifting ever so slightly in Obama's favour). So how did Mr Silver predict the presidential race so accurately? What was this dark magic that he used?
For the Nate-haters, here's the 538 prediction and actual results side by side: twitter.com/cosentino/stat…
— Michael Cosentino (@cosentino) November 7, 2012
Now, I don't have any inside knowledge about Nate Silver's method, but an outline of the approach is fairly easy to guess at, since it is similar to the methods used by votamatic. It is also an approach that has become widely used in statistics over the last 20 years: I have used similar ideas to look at scientific problems like divergent natural selection and cycling voles. So, although some aspects of my outline are probably wrong (and I've simplified some of the process for clarity's sake), I hope my discussion gives you a feel for the types of statistical models used and how they work.
The problem – choosing the US president – is a national one, but it involves voting at the state level (residents in each state vote for the candidate they support, and in all but two states the winning candidate gets all of the state's electoral votes). The polls are also conducted at both state and national level, so both need to be taken into account. This makes the problem inherently hierarchical, and rather conveniently there is an area of statistics called hierarchical modelling.
It is also worth splitting the model into two parts: the process (i.e. the percentage of the population who intend to vote for Obama), and the sampling (how the polls are affected by the actual voting intention, and by other factors). The mathematics (which I will not discuss in detail) allows us to do this, and the separation of the model nicely reflects the separation of the processes that create the data we see.
Basically, we are trying to model an unobserved variable: the intended voting behaviour in each state. This unobserved variable is then used to predict the actual vote, which we do observe.
It's also worth noting that although we are ultimately interested in how people will vote on election day, the data we get are based on how people think they will vote at the time they are asked, which may be months before the election. What people think changes over time, so the model has to allow for this. This means we have to include a temporal component: in short, we must model a time series.
To make things simple for this discussion, I am ignoring third-party candidates, so in this model only Obama and Romney are in the race, and whoever gets more than 50% of the vote in a state wins it.
Modelling voting behaviour
First, we start with a mathematical model of how people in the US states would vote if the election were held on any particular day – this is what Mr Silver calls a "nowcast". There are many variables that affect this, so we identify these variables and use them to help predict voting behaviour. For example, race is a reasonable predictor of voting behaviour: Democrats tend to do better than Republicans among black voters.
So, let's start with a baseline for voting behaviour, for example, the percentage of the vote Obama would get on 1 January 2011. We then incorporate a variable for national voter behaviour (the overall mean percentage), plus another variable for each individual state. The latter includes the effects of race, wealth, etc. So, for example, we can predict that a state with a larger proportion of black voters would have a higher Obama vote. The strength of this relationship has to be estimated; I'll explain how this is done below.
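To make the additive structure concrete, here is a minimal Python sketch of the baseline. A state's expected Obama share is the national mean plus a state-specific effect. Everything numeric here is a hypothetical placeholder chosen for illustration, not Mr Silver's numbers:

- the single covariate (share of black voters);
- the coefficient `BETA_RACE`;
- the centring value `NATIONAL_BLACK_PCT`;
- the 50% national mean.

In a real model the coefficient would be estimated from the polls.

```python
# Hypothetical values for illustration only; a real model would estimate them.
NATIONAL_MEAN = 50.0        # national Obama share (%) on the baseline date
BETA_RACE = 0.5             # extra points per point of black voters above the national share
NATIONAL_BLACK_PCT = 13.0   # rough national share of black voters, used for centring


def state_baseline(black_pct, residual=0.0):
    """Baseline Obama share (%) = national mean + state-specific effect."""
    state_effect = BETA_RACE * (black_pct - NATIONAL_BLACK_PCT) + residual
    return NATIONAL_MEAN + state_effect


# Two hypothetical states, one with a larger share of black voters.
print(state_baseline(black_pct=30.0))  # 58.5
print(state_baseline(black_pct=5.0))   # 46.0
```

The `residual` argument stands in for everything else about a state (wealth, history and so on) that the single covariate does not capture.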
Once we have the baseline, we start the clock. As time goes on, voting intention will change. This might be because of something measurable, such as a change in employment. For example, if the unemployment rate in a particular state falls, the incumbent tends to become more popular, and his share of the expected vote in that state goes up.
National changes can also happen. For example, if the president doubled federal income tax, he would be unlikely to remain popular.
But there are also changes that we can't measure, such as less tangible economic effects or a particularly successful campaign ad. We can include these as extra random terms (technically they are called "shocks", which seems appropriate).
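The day-to-day evolution described above can be sketched as a random walk with a measured component. This is a simplified illustration that assumes:

- a single state;
- a hypothetical coefficient (`BETA_UNEMPLOYMENT`) linking unemployment to Obama's share;
- normally distributed shocks with a made-up standard deviation.

A fuller version would add a national term shared by all states.

```python
import random

# Hypothetical: Obama's share rises 0.5 points for each 1-point fall in unemployment.
BETA_UNEMPLOYMENT = -0.5


def simulate_intention(start, unemployment_changes, shock_sd=0.3, seed=None):
    """Return a daily path of Obama's share (%) in one state.

    Each day: new share = yesterday's share
                          + measured effect (change in unemployment)
                          + unmeasured random "shock".
    """
    rng = random.Random(seed)
    path = [start]
    for change in unemployment_changes:
        measured = BETA_UNEMPLOYMENT * change  # unemployment falls -> incumbent gains
        shock = rng.gauss(0.0, shock_sd)       # e.g. a successful campaign ad
        path.append(path[-1] + measured + shock)
    return path


# With shocks switched off, unemployment falls by 1 point on each of two days:
print(simulate_intention(50.0, [-1.0, -1.0], shock_sd=0.0))  # [50.0, 50.5, 51.0]
```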
Based on this, we get graphs like this one that describe the national electoral vote, and something similar for each individual state. The overall percentage of the vote for each state is the sum of national and state-specific effects. This number evolves through time, and if Obama receives more than 50% of the popular vote in a state, he wins that state's electoral votes. We can thus add up the electoral votes of the states that Obama wins at any particular time, and if the total is 270 or more (a majority of the 538), he wins the presidency.
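Turning state percentages into a result is simple bookkeeping. The sketch below makes three assumptions:

- Each state's share is the national share plus a state effect.
- A state's electoral votes all go to whoever tops 50%. This ignores the district-level split in Maine and Nebraska.
- The electoral-vote counts are the 2012 allocations for these three states. The shares are hypothetical.

```python
def electoral_votes_won(shares, electoral_votes):
    """Sum the electoral votes of states where Obama's share exceeds 50%."""
    return sum(ev for state, ev in electoral_votes.items() if shares[state] > 50.0)


def wins_presidency(shares, electoral_votes):
    """270 of the 538 electoral votes is a majority."""
    return electoral_votes_won(shares, electoral_votes) >= 270


EV = {"Ohio": 18, "Florida": 29, "Virginia": 13}  # 2012 allocations
national = 50.5                                   # hypothetical national share (%)
state_effects = {"Ohio": 0.8, "Florida": -0.7, "Virginia": 0.4}  # hypothetical
shares = {state: national + effect for state, effect in state_effects.items()}

print(electoral_votes_won(shares, EV))  # 31: Ohio (51.3%) and Virginia (50.9%)
```

A full tally would loop over all 50 states plus DC in the same way.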
But the election is held on the Tuesday after the first Monday in November. In August, say, there is still plenty of time for events (like debates and superstorms) to occur. How do we deal with this? Since we have a model, we can simulate it forward in time from the present date. All of that uncertainty is treated as random, so we use our model to generate a spread of possible percentages of the popular vote in each state on election day, which might look like this graph.
We can convert this into a probability that Obama will win each state by asking what proportion of the possible percentages is greater than 50%, i.e. how many of the possible lines end above 50%.
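Here is a minimal Monte Carlo sketch of simulating forward. Each run adds random daily shocks until election day, and the fraction of runs ending above 50% is the win probability. The daily shock size and number of runs are hypothetical choices. A real forecast would simulate all states jointly with a shared national component, rather than one state in isolation.

```python
import random


def win_probability(current_share, days_left, daily_sd=0.25, n_sims=10_000, seed=1):
    """Estimate P(Obama's election-day share > 50%) by simulating forward.

    Every future event is treated as a random daily shock with SD `daily_sd`
    percentage points (a hypothetical value); each simulation is one "line".
    """
    rng = random.Random(seed)
    above_50 = 0
    for _ in range(n_sims):
        share = current_share
        for _ in range(days_left):
            share += rng.gauss(0.0, daily_sd)
        if share > 50.0:
            above_50 += 1
    return above_50 / n_sims


print(win_probability(52.0, days_left=90))  # roughly 0.8
print(win_probability(52.0, days_left=7))   # close to 1
```

In this toy model, a 2-point lead with 90 days to go translates into roughly an 80% chance of winning. A week out, the same lead is close to a certainty, which is why the probabilities firm up as the election approaches.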
So we use these simulations to calculate the probability that Obama will win at each point in time, and to track how that probability changes as the campaign goes on, which gives us a graph like this one.
That's a great model to describe voters, but how do we connect the model to the data – the polls and (eventually) the actual election?
Modelling the polls
Let's start by examining the polls. There are two sorts of poll: national polls, which are the headline polls that a lot of the media were using to claim that the presidential race was tight, and state polls (e.g. a poll for Ohio). This neatly ties in with the model for voter intention, so the simplest way of dealing with the polls is to say that the poll result equals the corresponding voter intention. Thus, if a national poll on 23 October says that Obama would win 53% of the vote, then that is the national voting intention.
In theory, this means that all polls for the same state (or for the nation) would agree on any given day. But in reality they differ, so we need to factor this into our model.
Variation between polling results comes from a number of factors. The first is sampling error. Pollsters cannot ask everyone for their intentions, so instead they take a random sample. This leads to sampling error: they may, by chance, poll more Obama supporters than are represented in the general population. The good news is that this sort of variation can be estimated, which is why pollsters give a margin of error: typically about 3% for a sample size of 1,000 voters.
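The roughly 3% figure follows from the standard formula for the margin of error of a proportion under simple random sampling. This sketch assumes 95% confidence (z = 1.96) and the worst case p = 0.5. It captures sampling error only, not the non-response and house effects discussed next.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, in percentage points, for a simple random sample of n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


print(round(margin_of_error(1000), 1))  # 3.1
print(round(margin_of_error(4000), 1))  # 1.5: four times the sample halves the error
```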
There are also other reasons the polls may vary. Although we hope that our samples are random and that voters answer truthfully, this isn't always the case. For example, pollsters using phone surveys find that a lot of people don't answer the phone, for whatever reason, and supporters of one of the candidates may be more likely to do this. Or supporters may lie about who they will vote for (maybe they are embarrassed to admit whom they support: the "shy Tory effect"), or they may lie about whether they will vote at all.
Pollsters are aware of these problems and try to correct for them, but their corrections vary. This is the "house effect". Some pollsters, such as Rasmussen Reports, tended to give Romney a couple of percentage points more of the popular vote than other pollsters did. This doesn't necessarily mean that Rasmussen were wrong: it might have been that the other pollsters were all making the wrong assumption in their corrections.
So the mathematical model (so far) looks like this: the proportion of people saying they will vote for Obama is the actual percentage + the house effect + sampling variation.
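The three-part poll model can also be run forwards to generate synthetic polls, which is a useful check on any estimation method. This sketch assumes a normal approximation for sampling variation. The pollster names and house effects are hypothetical.

```python
import math
import random


def simulate_poll(true_share, house_effect, n, rng):
    """One poll result (%) = actual percentage + house effect + sampling variation.

    Sampling variation uses the normal approximation to random sampling of n people.
    """
    p = true_share / 100
    sampling_sd = 100 * math.sqrt(p * (1 - p) / n)
    return true_share + house_effect + rng.gauss(0.0, sampling_sd)


rng = random.Random(2012)
house = {"Pollster A": 0.0, "Pollster B": -2.0, "Pollster C": 1.0}  # hypothetical
for name, effect in house.items():
    print(name, round(simulate_poll(51.0, effect, 1000, rng), 1))
```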
Estimating everything
The mathematical model I've described above is all very well, but we only have poll results: we know neither the house effects, nor the effects of the economy on the polls, nor the actual percentage of the public who will vote for Obama (until the election, anyway). Thus, we must estimate these numbers. This is the point where we stop being mathematicians and become statisticians. The problem statisticians face is estimating knowable unknowns.
Suppose several pollsters poll a state, and they all find that 53% of the public plan to vote for Obama. In this situation, we can be fairly sure that there is little variation in the house effect, so if the pollsters are unbiased, then the actual percentage of voters supporting Obama is 53%. But what if the same pollsters all find that Obama's support is 56% just one week later? If the pollsters are unbiased, then this is Obama's new support. But if the polls are all biased upwards by, say, two percentage points, then actual support would have risen from 51% to 54%. Note that the change in Obama's support (three points) is the same either way, even though the level of support differs. So, even if there is an overall house bias shared by all pollsters, we can still track changes in support.
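The arithmetic in this example fits in a few lines. The sketch assumes the bias is identical in both weeks. Subtracting a shared bias shifts both levels but leaves the week-on-week change untouched.

```python
def implied_true(polls, bias):
    """Actual support implied by poll results if every poll is off by `bias` points."""
    return [p - bias for p in polls]


polls = [53.0, 56.0]  # week 1, week 2
for bias in (0.0, 2.0):
    true = implied_true(polls, bias)
    print(bias, true, true[1] - true[0])
# 0.0 [53.0, 56.0] 3.0
# 2.0 [51.0, 54.0] 3.0
```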
What about variation in the house effect? If one pollster consistently polls Obama one percentage point higher than all the others, then this is their house effect. With enough polling data, we can estimate each pollster's house effect as the average of its deviations from the overall polling average.
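One simple way to estimate house effects is to average each pollster's deviation from the daily polling average. This assumes every pollster reports on the same dates. Real polls are scattered in time, so a fuller model would compare each poll with the model's smoothed estimate for that day. The data below are made up.

```python
from collections import defaultdict
from statistics import mean


def house_effects(polls):
    """polls: list of (pollster, date, share).

    Returns each pollster's mean deviation from the average of all polls
    taken on the same date.
    """
    by_date = defaultdict(list)
    for _pollster, date, share in polls:
        by_date[date].append(share)
    daily_avg = {date: mean(shares) for date, shares in by_date.items()}

    deviations = defaultdict(list)
    for pollster, date, share in polls:
        deviations[pollster].append(share - daily_avg[date])
    return {pollster: mean(devs) for pollster, devs in deviations.items()}


example = [
    ("A", "day1", 51.0), ("B", "day1", 52.0), ("C", "day1", 50.0),
    ("A", "day2", 52.0), ("B", "day2", 53.0), ("C", "day2", 51.0),
]
print(house_effects(example))  # {'A': 0.0, 'B': 1.0, 'C': -1.0}
```

Because deviations are measured against the average of all pollsters, this recovers only relative house effects. A bias shared by everyone cancels out, just as in the 53%/56% example.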
In reality, of course, things aren't this simple: there is random variation. Rather than simply plugging in the estimates to get the support for Obama, we also talk about the likelihood of this support. So, for example, suppose that:
actual percentage + house effect + sampling variation = 53%
and (for the moment) assume there is no house effect. Then we can say that the sampling variation is random and zero on average, and that larger variations are less likely than smaller ones. Once we assume that "more variation is less likely", we can put a probability on each possible amount of variation, and then ask which actual percentage has the greatest probability.
If we look at several polls, each contributes differently to this probability: those from better surveys (for example, ones with larger samples) count for more, so the most likely actual percentage is closer to their poll values. We can use this most likely number as our estimate of the actual percentage; this approach is called maximum likelihood estimation. If we have several numbers to estimate, such as support at different times and house effects, we can do the same calculation using the data from all of the surveys: we end up maximising over more variables, but the basic principle is the same.
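Here is a sketch of that maximisation for the simplest case. It assumes no house effects and normally distributed sampling errors whose spread depends on each poll's sample size. It finds the most likely actual percentage by brute-force grid search, then checks the result against the closed-form answer under these assumptions: an inverse-variance weighted average. The two polls are hypothetical.

```python
import math


def sampling_sd(share, n):
    """Approximate sampling SD (percentage points) of a poll of n people."""
    p = share / 100
    return 100 * math.sqrt(p * (1 - p) / n)


def log_likelihood(true_share, polls):
    """Log-likelihood (up to a constant) of a true share given polls [(share, n), ...].

    Assumes no house effect and normal sampling errors centred on zero.
    """
    return sum(-0.5 * ((s - true_share) / sampling_sd(s, n)) ** 2 for s, n in polls)


def most_likely_share(polls, lo=40.0, hi=60.0, step=0.01):
    """Grid search for the true share with the greatest likelihood."""
    steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, polls))


def inverse_variance_mean(polls):
    """Closed form for the same maximum: weight each poll by 1 / variance."""
    weights = [1 / sampling_sd(s, n) ** 2 for s, n in polls]
    return sum(w * s for w, (s, _) in zip(weights, polls)) / sum(weights)


polls = [(53.0, 1000), (50.0, 3000)]  # hypothetical: a small poll and a larger one
print(round(most_likely_share(polls), 2), round(inverse_variance_mean(polls), 2))
```

Both estimates come out at about 50.75%. That is much closer to the 3,000-person poll than to the 1,000-person one, because the larger poll has the smaller sampling variation and so gets more weight.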
But really, Silver doesn't quite do this. Instead of looking only at the maximum probability, he looks at the full range of probabilities. So, for each date, he could calculate the probability that support is 51%, 52%, 53%, and so on. Then, for the following day, he can use the model for the actual support to ask how probable it is that support has shifted from, say, 53% to 56%. If the polling data suggest 58% support for Obama, then it is more likely that the new support is 56% than 51%.
The maths behind this uses Bayes' theorem, plus a healthy dose of graph theory. From this, Silver can then calculate the new probabilities of each level of support (in practice, I think Silver does something slightly different, but using the same underlying idea).
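The day-to-day updating can be sketched as a discrete (grid-based) Bayesian filter:

1. Keep a probability for each whole-percentage level of support.
2. Spread those probabilities out to allow for overnight drift.
3. Reweight them by how well each level explains the new poll.

This is a standard textbook technique, not necessarily what Silver does. The grid, the drift SD of 1.5 points and the poll SD of 1.6 points are hypothetical.

```python
import math

GRID = list(range(40, 61))  # candidate true shares in whole points (hypothetical range)


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def normalise(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def predict(belief, drift_sd=1.5):
    """Move one day forward: support may drift from any old level to any new one."""
    return normalise({new: sum(belief[old] * normal_pdf(new, old, drift_sd) for old in GRID)
                      for new in GRID})


def update(belief, poll_share, poll_sd):
    """Bayes' theorem: posterior is proportional to prior x likelihood of the poll."""
    return normalise({s: belief[s] * normal_pdf(poll_share, s, poll_sd) for s in GRID})


# Today: support is probably around 53% (hypothetical starting belief).
today = normalise({s: normal_pdf(s, 53, 1.0) for s in GRID})
# Tomorrow: allow for drift, then take in a poll showing 58%.
tomorrow = update(predict(today), poll_share=58.0, poll_sd=1.6)
print(tomorrow[56] > tomorrow[51])  # True: 56% is now more likely than 51%
```

In this toy run, the 58% poll pulls the belief from around 53% up to around 56%, but not all the way to 58%. That is because the model also expects support to move only gradually from day to day.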
If you're still with me, you'll remember that I mentioned hierarchical modelling above, so this is a good point to demonstrate its usefulness. If we look at a national poll, it will affect the national voting preference, which in turn affects voting preference in each state. But what about state polls? This is where hierarchical modelling comes in.