Saturday, December 23, 2023

A menace to democracy

 

 


[Figure: probability curves for a candidate’s share of the sample vote at several sample sizes. Source: Wikipedia]

 

As the Presidential race comes down to the wire, the wire heats up. This week, The New York Times reported on its poll with Siena College. “Mr. Trump leads Mr. Biden 46 percent to 44 percent among registered voters. Among those deemed likeliest to vote, however, Mr. Biden actually edges Mr. Trump, 47 percent to 45 percent.”

Both statements are lies.  The Times itself reports at the bottom of the story that the margin of error in the poll for registered voters is plus or minus 3.5%.  We don’t know whether former President Donald Trump leads President Joe Biden or if Biden leads Trump. It’s basically a dead heat.

As for those likeliest to vote, they form a subsample of the registered voters. Because the margin of error shrinks as the sample grows (in proportion to one over the square root of the sample size), it is larger than 3.5% for this smaller group. (It’s 3.7%. Why didn’t the Times story note this?) So the race among likely voters is also basically a dead heat.
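For readers who want to check the arithmetic, here is a minimal Python sketch of the textbook margin of error for a simple random sample, z·√(p(1−p)/n), with z = 1.96 for 95% confidence. The function name is my own. Under this bare formula, a poll of 1,016 at p = 0.5 has a margin of about 3.1%. The Times’s published 3.5% is larger, which I assume reflects an allowance for weighting that the simple formula leaves out. What carries over is the inverse square root: cutting the sample to a quarter doubles the margin.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a 95% interval for a sample share p from a simple
    random sample of n respondents (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

full = margin_of_error(0.5, 1016)     # whole sample: about 3.1%
quarter = margin_of_error(0.5, 254)   # a quarter as many: twice the margin
print(f"n = 1016: +/- {full:.1%}")
print(f"n = 254:  +/- {quarter:.1%}")
```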

But: What is the margin of error?

Any poll is just a sample of the population that interests us. In this case, we want to know about likely voters in next November’s election. We can’t question them all, so we will question a thousand or so. We should choose them at random to get an accurate picture of all likely voters, and of course we should note their responses correctly. But even if we do, and these conditions are by no means givens, the sample will, to some extent, mislead us. No sample perfectly mirrors its population. In the case at hand, many deviations can creep in. We probably want to know the average intentions of the likely voters over a week or so. The opinions they state on a given day may vary from their average for a week. I saw a Fox News attack on Biden this morning, so if you poll me now, I will say that I favor Trump. Had you polled me at any other time this week, I would have named Biden. All kinds of chance events may lead me away from my “true” opinion.

How to read a sample

Suppose that Trump and Biden are in a dead heat. It is extremely unlikely that the poll, being just a sample, will register 50.0% for Biden and 50.0% for Trump. Instead, the results will be influenced by a few random factors, a few errors. How large are these accumulated errors likely to be? A difference of 1% between the candidates? Or 10%?
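How unlikely is an exact 50.0% split? This minimal sketch assumes a simple random sample of a hypothetical 1,000 voters from a population split exactly in half, so that each respondent is like a coin toss. It then counts the ways to get exactly 500 Biden answers. The chance is only about 2.5%.

```python
from math import comb

def prob_exact_tie(n: int) -> float:
    """Chance that exactly half of n randomly sampled voters name Biden
    when the population is split exactly 50/50 (binomial model)."""
    if n % 2:
        return 0.0  # an odd-sized sample cannot split exactly in half
    return comb(n, n // 2) / 2**n

print(f"P(exactly 50.0% in a sample of 1,000) = {prob_exact_tie(1000):.3f}")
```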

To answer this question, we assume that each of the two candidates receives 50% of the vote. If we took many samples, the average of their vote shares for a given candidate would also be about 50%. But in reality, we take only one sample, so its average is not likely to be exactly 50%. What range of values is likely?
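A quick simulation makes the same point. This sketch assumes a tied race and a hypothetical sample of 1,000 voters, and it draws 2,000 independent samples. Their average lands very close to 50%, but any single sample can miss by a point or more. The seed is fixed only so that the run repeats; other seeds give slightly different numbers.

```python
import random
import statistics

def sample_share(rng: random.Random, n: int, p: float = 0.5) -> float:
    """Biden's share in one random sample of n voters drawn from a
    population in which a fraction p supports him."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(2023)  # fixed seed so the run is reproducible
shares = [sample_share(rng, 1000) for _ in range(2000)]

print(f"average over 2,000 samples: {statistics.mean(shares):.3f}")
print(f"first sample: {shares[0]:.3f}")
print(f"range: {min(shares):.3f} to {max(shares):.3f}")
```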

We can visualize the range by looking at the distribution of probabilities associated with certain outcomes. Define the outcome as Biden’s share of votes in a race against Trump. In the figure above, concentrate on the green curve. I’ll explain the others later. If there really is a tied vote, what is the probability that in the sample, Biden receives no more than, say, 47% of the vote? This probability is the area beneath the green curve from 47% leftward to 0% (not shown in the graph). You can see that there is virtually no area beneath the green curve in this range; it is almost on top of the horizontal axis. Since there is virtually no area beneath the curve, the probability that Biden receives no more than 47% of the sample vote when the actual race is a tie is virtually zero. You can see why this is useful to know: If Biden did receive 47% of the sample vote, then our assumption that the real vote was tied is probably wrong.
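The green curve’s sample size is not stated here, so this sketch assumes about 2,401 respondents. That is the size for which the simple formula yields the 48% to 52% interval described below for the green curve; the actual figure may use a different size. The sketch uses the normal approximation from Python’s standard `statistics.NormalDist` and asks for the area to the left of 47%. The result is well under one percent.

```python
from math import sqrt
from statistics import NormalDist

def share_distribution(n: int, p: float = 0.5) -> NormalDist:
    """Normal approximation to the sampling distribution of Biden's share
    in a simple random sample of n voters when his true share is p."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

green = share_distribution(2401)  # assumed size for the green curve
print(f"P(Biden's sample share <= 47% | tie) = {green.cdf(0.47):.4f}")
```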

A fling on the green

Next, move to the right on the green curve. What is the probability that Biden receives no more than 50% of the sample vote in a tied race? You can see that at 50%, half of the area beneath the curve is to the left, and the other half to the right. By definition, the probabilities of all possible outcomes sum to 100%. So, the area to the left of the 50% point is 50%. That is, given an actual tie in the population, the probability that Biden receives up to 50% of the sample vote is 50%. This cumulative probability exceeds that of a sample vote up to 47% (remember that this probability was close to zero!), because we have added outcomes: those in which Biden receives 48%, 48.5%, 49%, and so forth, up to 50%. We could keep adding outcomes by moving right on the curve, until we have accounted for all possible outcomes in the sample for Biden’s share of the vote, from 0% to 100%. (The figure shows neither extreme.) The cumulative probability of all outcomes is 100%. This is the total area beneath the green curve. If we are measuring in fractions rather than percentages, then the total area sums to 1.
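Adding up area outcome by outcome is what a numerical integral does. This sketch assumes a tied race and a hypothetical sample of 2,401 voters, and it approximates the green curve with a normal density. It then sums thin vertical slices with the midpoint rule. The area from 0% to 50% comes out to one half, and the area from 0% to 100% comes out to one.

```python
from math import sqrt
from statistics import NormalDist

# Assumed green curve: a tied race and a hypothetical sample of 2,401 voters.
curve = NormalDist(mu=0.5, sigma=sqrt(0.25 / 2401))

def area(dist: NormalDist, lo: float, hi: float, steps: int = 100_000) -> float:
    """Approximate the area under dist's density between lo and hi by
    adding up thin vertical slices (midpoint rule)."""
    width = (hi - lo) / steps
    return width * sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps))

print(f"0% to 47%:  {area(curve, 0.0, 0.47):.4f}")
print(f"0% to 50%:  {area(curve, 0.0, 0.50):.4f}")
print(f"0% to 100%: {area(curve, 0.0, 1.00):.4f}")
```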

A simple example is a coin toss. Only two outcomes are possible:  Heads, which has a 50% chance; or tails, which also has a 50% chance. The probabilities of the two outcomes sum to 100%. Expressed as fractions, the probability of a head is .5 and of a tail .5. These sum to .5 + .5 = 1.

With this background, we can talk about the likelihood of random errors in the poll sample. First, we assume a tied race. Biden would receive 50%. This corresponds to the middle point in the graph. Because we take only a sample of likely voters, we probably won’t observe an average of exactly 50%.  But the sample average should not be too far from 50%. So, to infer whether the candidates are tied, we look at whether our sample average is close to 50%.

The 95% confidence interval is the range of sample averages that have a 95% chance of occurring if the race is tied. In the figure, the confidence interval is the green horizontal line beneath the figure, from 48% to 52%. The area beneath the green curve that corresponds to this line is 95%: 47.5% to the left, from 48% to 50%, plus 47.5% to the right, from 50% to 52%.
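The interval can be found by asking where the curve leaves 2.5% of its area in each tail. This sketch assumes a tied race, simple random sampling, and the normal approximation. The sample size of 2,401 is my assumption: 1.96 × 0.5 / 0.02 = 49, and 49² = 2,401 is the size at which the margin is 2 points. The last line also shows the margin as half the interval’s width.

```python
from math import sqrt
from statistics import NormalDist

def tie_interval(n: int, level: float = 0.95) -> tuple[float, float]:
    """Central range of sample shares that occurs with probability `level`
    when the race is truly tied (simple random sample, normal approximation)."""
    dist = NormalDist(mu=0.5, sigma=sqrt(0.25 / n))
    tail = (1 - level) / 2          # 2.5% cut off on each side for 95%
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

lo, hi = tie_interval(2401)         # assumed sample size
print(f"95% interval if tied: {lo:.1%} to {hi:.1%}")
print(f"margin of error: +/- {(hi - lo) / 2:.1%}")
```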

If the sample average lies in the confidence interval, then by convention, we accept that the race may be a dead heat. If the sample average lies outside of the confidence interval, then we reject the notion of a dead heat. For example, suppose that Biden’s share is 47%. Since this is outside of the confidence interval, we reject the possibility that Trump and Biden are tied. Instead, we conclude that Trump is leading.

The margin of error is one half of the width of the confidence interval. That’s why, in our example, we usually express it as plus or minus 2% around the mean of 50%.

For instance, suppose that Biden receives 49% in the sample. The confidence interval ranges from 48% to 52%. Since 49% is in this range, we consider the race tied. Another way to say this is that the confidence interval consists of Biden vote shares that are within 2% of the mean of 50%, on either side. The margin of error is therefore plus or minus 2%. Now, 49% is only 1% away from 50%, so it is within the margin of error. Thus, if Biden receives 49% of the sample vote, we accept that the race is tied.

Suppose instead that Biden receives 53% of the sample vote. Since 53% is outside of the confidence interval, we don’t consider the race tied. We conclude that Biden is winning. Another way to say this is that 53% is 3% away from 50%, and 3% is greater than the margin of error of 2%, so we accept that Biden is ahead.
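The decision rule in the last few paragraphs fits in a few lines. This hypothetical helper works in percentage points and hard-codes the example’s 50% center and 2-point margin. It reproduces the three cases above: 49%, 53%, and 47%.

```python
def read_poll(biden_share: float, center: float = 50.0, moe: float = 2.0) -> str:
    """The rule described above, in percentage points: a share within the
    margin of error of the center is a dead heat; otherwise, whoever is
    above the center is ahead."""
    if abs(biden_share - center) <= moe:
        return "dead heat"
    return "Biden ahead" if biden_share > center else "Trump ahead"

for share in (49, 53, 47):
    print(f"Biden at {share}%: {read_poll(share)}")
```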

Throwing a curve

Now look at the other curves in the figure.  You will see that the margin of error increases as the sample size decreases. This should make sense. There is less information in a smaller sample, so the chances for error are greater.
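To see the pattern in numbers, this sketch tabulates the simple-random-sample margin at p = 0.5 for a few sample sizes. The sizes are illustrative, not necessarily those used in the figure. Quadrupling the sample halves the margin.

```python
from math import sqrt

Z_95 = 1.96  # multiplier for 95% confidence under the normal approximation

def margin(n: int, p: float = 0.5) -> float:
    """Simple-random-sample margin of error for a share p with n respondents."""
    return Z_95 * sqrt(p * (1 - p) / n)

for n in (100, 400, 1016, 1600, 2401):   # illustrative sample sizes
    print(f"n = {n:>5}: +/- {margin(n):.1%}")
```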

In the Times/Siena poll of 1,016 respondents, the margin of error was plus or minus 3.5% for registered voters (and 3.7% for likely voters). Since each candidate’s 2% lead is less than the margin of error, we should accept that the race is tied for both registered voters and likely voters. To say instead that Trump is leading is simply wrong. But that’s what The Times did.
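Applying the same rule of thumb to the Times/Siena figures quoted above takes two comparisons. This sketch simply compares each reported gap with the margin of error, as this post does. The function name is hypothetical. Both checks print True.

```python
def lead_within_margin(a: float, b: float, moe: float) -> bool:
    """This post's rule of thumb: a gap between two candidates that is no
    bigger than the margin of error is not evidence of a real lead."""
    return abs(a - b) <= moe

# Percentages as quoted above; 3.7 is the likely-voter margin given earlier.
registered = lead_within_margin(46, 44, 3.5)  # Trump 46%, Biden 44%
likely = lead_within_margin(47, 45, 3.7)      # Biden 47%, Trump 45%
print(registered, likely)                      # True True: too close to call
```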

Using the 95% confidence interval is a conservative approach. It treats the race as too close to call unless chances are better than 95% that it is not that close. The reason for this caution is to avoid the sort of costly mistake that The Times made. We don't want to say that Trump is leading, or that Biden is leading, without good evidence.   

The margin of error is calculated on the assumption that the pollsters computed the sample average correctly. But in reality, pollsters err in noting, inputting, and tabulating responses. They also often do not take a random sample. For example, minor polls increasingly gather responses through online invitations, because this is easy and cheap. But this technique can enable a respondent to bias the results by organizing his friends to submit responses.
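A toy calculation shows why the margin of error offers no protection here. The numbers are hypothetical: an honest sample of 1,000 that is exactly tied, plus 50 Trump responses from one respondent’s organized friends. Biden’s share drops from 50.0% to about 47.6%. That is beyond a 2-point margin of error, even though nothing about sampling chance has changed.

```python
def share_with_stuffing(honest_biden: int, honest_total: int, extra_trump: int) -> float:
    """Biden's share after `extra_trump` organized Trump submissions are added
    to an otherwise honest sample (all counts are hypothetical)."""
    return honest_biden / (honest_total + extra_trump)

clean = share_with_stuffing(500, 1000, 0)     # a true tie, measured perfectly
stuffed = share_with_stuffing(500, 1000, 50)  # 50 organized friends join in
print(f"{clean:.1%} -> {stuffed:.1%}")        # 50.0% -> 47.6%
```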

The upshot: Pollsters should estimate how much errors in selecting the sample and processing the data could shift the expected value of the reported sample average. They should then widen the conventional margin of error accordingly. But they rarely do. Siena College didn’t. Indeed, one may question whether its stratification of the sample was truly random selection.

However, things are bad enough as they are. The Times reporters knew about the margin of error. It is at the bottom of their news story. Indeed, one of them claims 15 years of experience in polling, although only God knows how this is possible without stumbling once across a confidence interval. Yet they and their editors chose to ignore the margin of error, probably in hot pursuit of a headline saying Trump led Biden.  

This is truly fake news, and it may reshape the Presidential election. All news media follow The New York Times, unfortunately, and arrogant lies like this one propagate ad infinitum, especially in an era so divisive that small margins in both Houses of Congress have become common, suggesting that Presidential margins may become small, too. In such circumstances, ignoring the margin of error may commonly lead to errors in publication.

“Democracy dies in darkness,” says a rival of The Times that has also descended into mediocrity. Isn’t it time that reporters learned how to turn on a statistical flashlight?—Leon Taylor, Baltimore tayloralmaty@gmail.com

 

 Notes

For helpful comments, I thank but do not implicate Annabel Benson, Paul Higgins, Mark Kennet, and David Schatz.

  

References

Shane Goldmacher, Ruth Igielnik, and Camille Baker. “Trump’s Legal Jeopardy Hasn’t Hurt His G.O.P. Support, Times/Siena Poll Finds.” The New York Times, December 20, 2023.

“Margin of error.” Wikipedia.
