Tuesday, November 1, 2022

Smoke and mirrors at The New York Times

Sometimes the critical detail in a news story is invisible.

Consider polling, one of the few areas where the news media still have the edge over other purveyors of information. Since anyone can text a message or post a cell-phone picture, the newspapers and newscasts no longer have a monopoly on even foreign news. But John Q. Public cannot afford to run a legitimate nationwide poll. (He can post an online survey and invite responses, of course. But this won't produce a random sample, since only the most motivated will reply, which skews the sample.) Only a few newspapers and TV networks can afford good polling.

Traditionally, articles and newscasts about polls in political races gave the sampling error. This measures how far off you may be when you draw conclusions about a whole population from a sample. For example, a newspaper may survey a few hundred likely voters in a Senate race to infer what all likely voters will do in that election. The reason for publishing the sampling error is to enable the reader to check the reporter's claims, although I am not sure that today's journalists realize this. One double-checks because reporters are not exactly experts on statistics.
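For readers who want to see where a sampling error comes from, here is a rough sketch in Python. It assumes a simple random sample and uses the textbook normal approximation for one candidate's share of the vote. The function name and the sample sizes are my own illustrations, not figures from any Times/Siena poll, and real polls that weight their respondents will usually report somewhat larger errors.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sample_size, share=0.5, confidence=0.95):
    """Approximate sampling error for one candidate's share of a simple random sample."""
    # Two-sided critical value from the normal curve, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(share * (1 - share) / sample_size)


# Hypothetical sample sizes, not taken from any Times/Siena poll.
for n in (400, 800, 1600):
    print(f"n = {n}: plus or minus {100 * margin_of_error(n):.1f} points")
```

With 400 respondents the error is about 4.9 percentage points. Quadrupling the sample only halves the error, which is one reason that good polling is expensive.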

Anyway, The New York Times, which sets the pace for US news media, has stopped publishing sampling errors in the main text of its news stories.

In the past week, it failed to mention sampling errors in two stories about its polls concerning United States elections.  The co-author of both stories is Ruth Igielnik, a polling editor for The Times. Previous stories had given the sampling error at the end of the article.

The reader and the leader

In one of the stories, The Times declared a slight lead for one candidate when in fact the lead was less than the sampling error.  This was in the Georgia Senate race (a lead of 3%; the sampling error was plus or minus 4.8%).  In the other story, on four gubernatorial races, the comparison of the lead to the sampling error is less clear, as I'll explain.

The reported leads in the gubernatorial races varied:  0% in Arizona (Democrat Katie Hobbs versus Republican Kari Lake); 4% in Nevada (Republican Joe Lombardo); 6% in Georgia (Republican incumbent Brian Kemp); and 13% in Pennsylvania (Democrat Josh Shapiro).  The polling leads were rounded. I think we can agree that the Democrats are winning in Pennsylvania and that the Arizona race is anybody’s guess.  The question concerns Nevada and Georgia.  The Times finds a “narrow” preference for Kemp and a “slim lead” for Lombardo.   

The story on the gubernatorial races is linked to The Times/Siena College poll for the midterm elections. For example, one question is: “Thinking ahead to the November midterm election, are you almost certain that you will vote, very likely to vote, somewhat likely to vote, not very likely to vote or not at all likely to vote?” The Times treats the Senate and gubernatorial races as part of the same election. But a respondent may well interpret "the November midterm election" as applying only to national races. In any event, the distribution of likely voters in Senate elections is likely to differ from that for state offices.

If we accept The Times's assumption of the same distribution for state and national races, then the sampling error in Nevada, plus or minus 4.2%, slightly exceeds Lombardo's edge.    
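The comparison that I am making can be written as a small check. This is only a sketch of the simple rule used in this post (call a lead only if it exceeds the reported sampling error); the function name is hypothetical, and the figures are the ones quoted above. The rule is, if anything, lenient: the error on the gap between two candidates is usually larger than the error on a single candidate's share.

```python
# Figures quoted in this post, in percentage points: (reported lead, sampling error).
RACES = {
    "Georgia Senate": (3.0, 4.8),
    "Nevada governor": (4.0, 4.2),
}


def lead_exceeds_error(lead, sampling_error):
    """Return True only if the reported lead is larger than the sampling error."""
    return lead > sampling_error


for race, (lead, error) in RACES.items():
    verdict = "lead" if lead_exceeds_error(lead, error) else "too close to call"
    print(f"{race}: {lead:.0f}-point lead, plus or minus {error} points -> {verdict}")
```

Both races come out as too close to call under this rule.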

Why does the comparison of the reported winner’s edge to the sampling error matter?  One reason is that it affects campaign finances.  When the newspapers report a toss-up in the final weeks of a campaign, donors judge a fifty-fifty chance that the favored candidate will win.  When the papers report a slight edge for that candidate, the donors perceive a better-than-even chance of a successful donation and hence are more likely to pony up. When the newspapers ignore the sampling error, they overstate the candidate’s perceived chances of winning and thus can cost the donors money.

Covering up

Why then would the newspaper understate, or ignore, the sampling error? Perhaps it thinks that the traditional criterion for the error is too rigorous. The rule of thumb is that you don’t declare the candidate as winning unless there is less than a 5% chance that you will be wrong. (I’m not certain that The Times applied this rule, because its polling link nowhere specifies the confidence level, another violation of basic rules for statistical reporting. Basically, the confidence level states the chances that you will be right to declare the candidate a winner. The usual confidence level is 95%. In other words, you won't declare the candidate to be winning unless chances are at least 95% that you are right.) There is no particular reason for the 5% rule. The newspaper may be willing to tolerate a larger chance of being wrong, because the news value of a clear conclusion from the poll (“Smith is winning!”) exceeds the likely cost to the newspaper of a mistake, since people will probably never discover its error (“Suddenly, the dynamics of the race changed!”).

An alternative to this Machiavellian reasoning is that the newspaper regards political races as a sport. It is entertainment, and an error about its outcome won’t matter much.  Finally, of course, maybe the newspaper doesn't have a clue.  

Whatever.  If the newspaper doesn't know what a sampling error is, it should find out.  It should explain how it is handling the error and why.

To see why, suppose that it adopts a lower confidence level than the traditional 95% (that is, it permits a larger chance of an error than 5%) but does not tell the reader. Donors will think that the usual 5% rule still holds, and they will overestimate the candidate’s chances of winning.
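To see how much the choice matters, here is a sketch that rescales a sampling error from one confidence level to another, using the normal approximation. It assumes that the 4.8-point error in the Georgia Senate poll was computed at 95%, which The Times does not state; the function name is hypothetical.

```python
from statistics import NormalDist


def rescale_margin(margin, from_level=0.95, to_level=0.90):
    """Convert a sampling error quoted at one confidence level to another level."""
    z_from = NormalDist().inv_cdf(0.5 + from_level / 2)
    z_to = NormalDist().inv_cdf(0.5 + to_level / 2)
    return margin * z_to / z_from


# Assumption: the 4.8-point error in the Georgia Senate poll was computed at 95%.
for level in (0.95, 0.90, 0.80):
    print(f"{level:.0%} confidence: plus or minus {rescale_margin(4.8, 0.95, level):.1f} points")
```

Under that assumption, the error shrinks to about 4.0 points at 90% and about 3.1 points at 80%. A reader who is not told which level applies cannot tell how much weight a 3-point lead deserves.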

Trumptistics

A larger question looms. The historical purpose of the newspapers has been to promote democracy by enabling readers to think for themselves. They reported both sides of an issue, even when one side struck the reporter as much weaker than the other, because they thought the reader should decide for herself. But since the 2016 election of former President Donald Trump, and especially since his accusations of fraud in the 2020 election that he lost, the newspapers have increasingly discounted the ostensibly weaker side of an issue, often dismissing it as “baseless.”

Of course, the 2020 election was legitimate.  My point is that the newspapers should enable the reader to decide. They don’t have to repeat the evidence against Trump’s charges in every story; just provide a link.  Ritually characterizing the claims as groundless makes the characterization a shibboleth. Eventually, people forget the reason for calling them groundless, and they may then suspect that the charges were correct.  John Stuart Mill pointed to this danger in his 1859 essay “On Liberty.”

The same reasoning applies to statistical reporting.  Times readers are highly educated, and they would take an interest in the newspaper’s statistical analysis. The Times should provide a sidebar or link explaining sampling errors, with examples that the general reader can follow.

I wrote The Times yesterday morning asking why its story failed to report the sampling errors. I received no reply.

--Leon Taylor (gingerly climbing down from his soapbox), Baltimore, tayloralmaty@gmail.com

November 6: This update corrects a mistake about The Times's omission of sampling errors. In its link to detailed results, The Times assumes the same sampling error for gubernatorial races as for Senate races.

References

Lisa Lerer and Ruth Igielnik.  2022.  Senate control hinges on neck-and-neck races, Times/Siena poll finds.  The New York Times, October 31.  Retrieved from nytimes.com

Reid J. Epstein and Ruth Igielnik.  2022.  In close, crucial governor’s races, poll finds sharp split on elections.  The New York Times, November 1.  Retrieved from nytimes.com

  
