Source: Wikipedia
As the Presidential race comes down to the wire, the
wire heats up. This week, The New York Times reported on its poll with
Siena College. “Mr. Trump leads Mr. Biden 46 percent to 44 percent among
registered voters. Among those deemed likeliest to vote, however, Mr. Biden
actually edges Mr. Trump, 47 percent to 45 percent.”
Both statements are lies. The Times itself reports at the bottom
of the story that the margin of error in the poll for registered voters is plus
or minus 3.5%. We don’t know whether former
President Donald Trump leads President Joe Biden or if Biden leads Trump. It’s
basically a dead heat.
As for those likeliest to vote, they comprise a subsample
of registered voters. Since the margin of error grows as the sample shrinks (it
is inversely proportional to the square root of the sample size), it
is even larger in absolute value than 3.5%. (It’s 3.7%. Why didn’t the Times
story note this?) So the race among likely voters is also basically a dead
heat.
But: What is the margin of error?
Any poll is just a sample of the population that
interests us. In this case, we want to know about likely voters in next November’s
election. We can’t question them all, so we will question a thousand or so. We
should choose them at random to get an accurate picture of all likely voters,
and of course we should note their responses correctly. But even if we do, and
these conditions are by no means givens, the sample will, to some extent,
mislead us. No sample perfectly mirrors its population. In the case at hand,
many deviations can creep in. We probably want to know the average intentions of
the likely voters over a week or so. The opinions they state on a given day may
vary from their average for a week. I saw a Fox News attack on Biden this
morning, so if you poll me now, I will say that I favor Trump. Had you polled
me at any other time this week, I would have named Biden. All kinds of chance
events may lead me away from my “true” opinion.
How to read a sample
Suppose that Trump and Biden are in a dead heat. It is
extremely unlikely that the poll, being just a sample, will register 50.0% for
Biden and 50.0% for Trump. Instead, the results will be influenced by a few
random factors, a few errors. How large are these accumulated errors likely to be?
A difference of 1% between the candidates? Or 10%?
To answer this question, we assume that each of the
two candidates receives 50% of the vote. If we take a lot of samples, their
average vote share for a given candidate will also be 50%. But in reality, we
take only one sample, so its average is not likely to be exactly 50%. What
range of values is likely?
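One way to get a feel for the answer is to simulate it. The sketch below is an illustration, not the method used by any pollster: it assumes a truly tied race, draws 2,000 hypothetical polls of 1,000 voters each (both numbers are arbitrary choices), and records Biden's share in each poll.

```python
import random

def simulate_sample_shares(n_voters=1000, n_polls=2000, true_share=0.5, seed=0):
    """Simulate many polls of the same race; return Biden's share in each poll."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    shares = []
    for _ in range(n_polls):
        # Each respondent independently names Biden with probability true_share.
        votes = sum(1 for _ in range(n_voters) if rng.random() < true_share)
        shares.append(votes / n_voters)
    return shares

shares = simulate_sample_shares()
average = sum(shares) / len(shares)

# Fraction of polls landing within 3 points of the true 50%.
# The tiny tolerance guards against floating-point rounding at the boundary.
within_3_points = sum(1 for s in shares if abs(s - 0.5) <= 0.03 + 1e-9) / len(shares)

print(f"Average share across all polls: {average:.3f}")
print(f"Lowest and highest single poll: {min(shares):.3f}, {max(shares):.3f}")
print(f"Fraction of polls within 3 points of 50%: {within_3_points:.3f}")
```

The average across all the simulated polls should sit very close to 50%, while individual polls scatter a few points to either side. That scatter is what the margin of error tries to summarize.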
We can visualize the range by looking at the distribution
of probabilities associated with certain outcomes. Define the outcome as Biden’s
share of votes in a race against Trump. In the figure above, concentrate on the
green curve. I’ll explain the others later. If there really is a tied vote, what
is the probability that in the sample, Biden receives no more than, say, 47% of
the vote? This probability is the area beneath the green curve from 47%
leftward to 0% (not shown in the graph). You can see that there is virtually no
area beneath the green curve in this range; it is almost on top of the
horizontal axis. Since there is virtually no area beneath the curve, the
probability that Biden receives no more than 47% of the sample vote when the actual
race is a tie is virtually zero. You can see
why this is useful to know: If Biden did receive 47% of the sample vote, then
our assumption that the real vote was tied is probably wrong.
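The area in that left tail can be approximated numerically. The sketch below rests on assumptions that are mine rather than the figure's: it treats the sampling distribution as a normal curve and uses a hypothetical sample of 2,401 voters, the size at which the 95% range under a tie comes out at roughly 48% to 52%.

```python
from statistics import NormalDist

n = 2401           # hypothetical sample size (an assumption, not from the poll)
true_share = 0.5   # the assumed tie

# Standard error of a sample proportion: sqrt(p * (1 - p) / n).
standard_error = (true_share * (1 - true_share) / n) ** 0.5

# Normal approximation to the curve of possible sample shares.
sampling_dist = NormalDist(mu=true_share, sigma=standard_error)

# Area under the curve from 47% leftward.
prob_47_or_less = sampling_dist.cdf(0.47)
print(f"P(sample share <= 47% | true tie) = {prob_47_or_less:.4f}")
```

Under these assumptions the probability is a small fraction of one percent, which matches the picture of a curve lying almost on top of the horizontal axis.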
A fling on the green
Next, move to the right on the green curve. What is
the probability that Biden receives no more than 50% of the sample vote in a
tied race? You can see that at 50%, half of the area beneath the curve is to
the left, and the other half to the right.
By convention, all probabilities sum to 100%. So, the area to the left
of the 50% point is 50%. That is, given an actual tie in the population, the
probability that Biden receives up to 50% of the sample vote is 50%. This cumulative probability exceeds that of a
sample vote up to 47% (remember that this probability was close to zero!), because
we have added outcomes: those in which Biden receives 48%, 48.5%, 49%, and so
forth, up to 50%. We could keep adding outcomes by moving right on the curve,
until we have accounted for all possible outcomes in the sample for Biden’s
share of the vote, from 0% to 100%. (The figure shows neither extreme.) The cumulative probability of all outcomes is 100%.
This is the total area beneath the green curve. If we are measuring in
fractions rather than percentages, then the total area would sum to 1.
A simple example is a coin toss. Only two outcomes are
possible: Heads, which has a 50% chance;
or tails, which also has a 50% chance. The probabilities of the two outcomes
sum to 100%. Expressed as fractions, the probability of a head is .5 and of a
tail .5. These sum to .5 + .5 = 1.
With this background, we can talk about the likelihood
of random errors in the poll sample. First, we assume a tied race. Biden would receive
50%. This corresponds to the middle point in the graph. Because we take only a
sample of likely voters, we probably won’t observe an average of exactly 50%. But the sample average should not be too far
from 50%. So, to infer whether the candidates are tied, we look at whether our
sample average is close to 50%.
The 95% confidence interval is the range
of sample averages that have a 95% chance of occurring if the race is tied. In
the figure, the confidence interval is the green horizontal line beneath the
figure, from 48% to 52%. The area beneath
the green curve that corresponds to this line is 95%: 47.5% to the left, from
48% to 50%, plus 47.5% to the right, from 50% to 52%.
If the sample average lies in the confidence interval,
then by convention, we accept that the race may be a dead heat. If the sample
average lies outside of the confidence interval, then we reject the notion of a
dead heat. For example, suppose that Biden’s share is 47%. Since this is outside
of the confidence interval, we reject the possibility that Trump and Biden are
tied. In particular, Trump is leading.
The margin of error is one half of the
confidence interval. That’s why we usually express it as, in our example, plus
or minus 2% as compared to the mean, which is 50%.
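The interval and its half-width can be worked out in a few lines. This is a sketch under the usual textbook assumptions (a simple random sample and a normal approximation). The sample size of 2,401 is hypothetical, chosen because it yields a margin of about 2 points.

```python
from statistics import NormalDist

n = 2401    # hypothetical sample size (an assumption)
tie = 0.5   # Biden's share if the race is tied

# Standard error of the sample share under a tie.
standard_error = (tie * (1 - tie) / n) ** 0.5

# For a 95% interval, 2.5% of the area is cut off in each tail,
# so we need the 97.5th percentile of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

margin = z * standard_error            # the margin of error
lower, upper = tie - margin, tie + margin

# Check that the area under the curve between the limits is 95%.
curve = NormalDist(mu=tie, sigma=standard_error)
area = curve.cdf(upper) - curve.cdf(lower)

print(f"Margin of error: plus or minus {margin:.1%}")
print(f"95% confidence interval: {lower:.1%} to {upper:.1%}")
print(f"Area under the curve inside the interval: {area:.1%}")
```

The margin is half the width of the interval by construction, since the interval runs the same distance to either side of 50%.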
For instance, suppose that Biden receives 49% in the
sample. The confidence interval ranges from 48% to 52%. Since 49% is in this
range, we consider the race tied. Another way to say this is that the confidence
interval consists of Biden vote shares that are within 2% of the mean of 50%, on
either side. The margin of error is therefore plus or minus 2%. Now, 49% is only
1% away from 50%, so it is in the margin of error. Thus, if Biden receives 49%
of the sample vote, we accept that the race is tied.
Suppose instead that Biden receives 53% of the sample
vote. Since 53% is outside of the confidence interval, we don’t consider the
race tied. We conclude that Biden is winning. Another way to say this is that
3% is greater than the margin of error of 2%, so we accept that Biden is ahead.
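The rule used in these examples is simple enough to write down as a function. The name `read_poll` and its return strings are made up for illustration. Shares are given in whole percentage points to keep the comparisons exact.

```python
def read_poll(biden_pct, margin_pct=2.0, tie_pct=50.0):
    """Apply the article's rule: within the margin of error means a dead heat."""
    distance = abs(biden_pct - tie_pct)
    if distance <= margin_pct:
        return "too close to call"
    # Outside the margin: whoever is on the winning side of 50% is ahead.
    return "Biden ahead" if biden_pct > tie_pct else "Trump ahead"

for pct in (47, 49, 53):
    print(f"Biden at {pct}%: {read_poll(pct)}")
```

With the 2-point margin from the example, 49% reads as a dead heat, 53% as a Biden lead, and 47% as a Trump lead.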
Throwing a curve
Now look at the other curves in the figure. You will see that the margin of error
increases as the sample size decreases. This should make sense. There is less information
in a smaller sample, so the chances for error are greater.
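To see the relationship in numbers, the sketch below computes the textbook margin of error for a few sample sizes, assuming a simple random sample and a share near 50%. The function name `margin_of_error` is mine. Published margins are often somewhat larger than this textbook figure because pollsters may adjust for weighting, and no such adjustment is modeled here.

```python
from statistics import NormalDist

def margin_of_error(n, share=0.5, confidence=0.95):
    """Textbook margin of error for a sample proportion (simple random sample)."""
    # Critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * (share * (1 - share) / n) ** 0.5

for n in (250, 500, 1016, 2401, 10000):
    print(f"n = {n:>6}: plus or minus {margin_of_error(n):.1%}")
```

Because the sample size sits under a square root, quadrupling the sample only halves the margin of error.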
In the Times/Siena poll of 1,016 respondents, the margin of error was plus or minus 3.5%. Since the 2-point gap between the candidates is less than 3.5% (and less than the 3.7% that applies to likely voters), we should accept that the race is tied for both registered voters and likely voters. To say instead that Trump is leading is simply wrong. But that’s what The Times did.
Using the 95% confidence interval is a conservative approach. It treats the race as too close to call unless chances are better than 95% that it is not that close. The reason for this caution is to avoid the sort of costly mistake that The Times made. We don't want to say that Trump is leading, or that Biden is leading, without good evidence.
The margin of error is calculated on the assumption
that the pollsters computed the sample average correctly. But in reality, pollsters
err in noting, inputting, and tabulating responses. They also often do not take
a random sample. For example, minor polls with increasing frequency these days
gather responses via online invitations, because this is easy and cheap. But this
technique can enable a respondent to bias the results by organizing his friends
to submit responses.
The upshot: Pollsters should calculate the change in
the expected value of the reported sample average due to errors in selecting
the sample and processing the data. They
should thus expand the conventional margin of error. But they rarely do. Siena
College didn’t. Indeed, one may question whether its stratification of the
sample was truly random selection.
However, things are bad enough as they are. The Times
reporters knew about the margin of error. It is at the bottom of their news
story. Indeed, one of them claims 15 years of experience in polling, although only
God knows how this is possible without stumbling once across a confidence interval.
Yet they and their editors chose to ignore the margin of error, probably in hot
pursuit of a headline saying Trump led Biden.
This is truly fake news, and it may reshape the
Presidential election. All news media follow The New York Times,
unfortunately, and arrogant lies like this one propagate ad infinitum…especially
in an era so divisive that small margins in both Houses of Congress have become
common, suggesting that Presidential margins may become small, too. In such
circumstances, ignoring the margin of error may often lead to errors in
publication.
“Democracy dies in darkness,” says a rival of The Times
that has also descended into mediocrity. Isn’t it time that reporters learned
how to turn on a statistical flashlight?—Leon Taylor, Baltimore tayloralmaty@gmail.com
Notes
For helpful comments, I thank but do not implicate Annabel
Benson, Paul Higgins, Mark Kennet, and David Schatz.
References
Shane Goldmacher, Ruth Igielnik, and Camille Baker. Trump’s Legal Jeopardy Hasn’t Hurt His G.O.P. Support, Times/Siena Poll Finds. The New York Times (nytimes.com). December 20, 2023.