Friday, September 13, 2024

More games newspeople play

 

                                           Do the voices add up?  Photo credit: Unsplash

As always, The Washington Post offers, er, interesting political math.

It tells us that in the seven battleground states that would probably determine the electoral vote in the Presidential election, Vice President Kamala Harris leads in three, former President Donald Trump leads in two, and the other two are ties. A tie is defined as a margin of no more than a quarter of a percentage point. The Post offers no explanation for that threshold.

But digging into the story, we read that "every state is within a normal-sized polling error of 3.5 points and could go either way." In other words, all seven states are too close to call. Neither Harris nor Trump truly leads in a state merely because he or she leads in the sample there. The sample is never a perfect reflection of all voters, and one must consider how imperfect the reflection may be before basing conclusions on it.

For example, suppose that in Pennsylvania, Harris led in the sample by one vote. Would we conclude that she is winning the Pennsylvania race? Surely not. A one-vote margin is tiny and could easily result from an error, such as a voter who misunderstood the question. So we would not put much faith in the conclusion that Harris is truly ahead.

How large must the margin be, then, for us to conclude that it gives us good information? The answer to that question is a statistic called the "margin of error."

The Post tells us that the margin of error is 3.5%. The usual interpretation is that if the margin exceeds 3.5%, then the probability is 95% that the leader in the sample is truly winning.

But here is where the Post math gets really interesting. The explanatory notes say that the 3.5% estimate is based on a calculation that in "the last few presidential cycles...the average modeled polling error in competitive states was 3.5 percentage points." Which presidential cycles? Which competitive states? Were they the same as this year's seven battleground states? Who knows?

Well, OK, 3.5% is the "average" polling error. I presume that this means that chances are 50% that the candidate ahead by 3.5% in the sample is actually winning the race. I presume wrong. Reading on: "...To account for this [3.5% polling error], our averages factor in the 90th percentile possible error (i.e., how bad would the error be in the worst 10% of cases)." In other words, chances are 90% -- not 95%, not 50% -- that the candidate ahead by 3.5% in the sample is truly winning. Feel free to scratch your head.
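
For readers who like to see the arithmetic, here is a minimal sketch of how the tolerated error changes a margin of error in the textbook case. The sample size is hypothetical; The Post does not report one for its averages.

```python
# How the chosen confidence level changes the margin of error for a
# single poll, assuming simple random sampling. The sample size n is
# hypothetical.
from scipy.stats import norm

n = 1000                          # hypothetical sample size
se = (0.5 * 0.5 / n) ** 0.5       # standard error in the worst case (50-50)

for conf in (0.90, 0.95):
    z = norm.ppf(1 - (1 - conf) / 2)          # two-sided critical value
    print(f"{conf:.0%} confidence: margin of error = {z * se:.1%}")

# 90% confidence: margin of error = 2.6%
# 95% confidence: margin of error = 3.1%
```

The looser the standard, the smaller the margin of error looks. That is the heart of the complaint above.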

Freaky fractions

Sports fans, here's the score. Usually, the margin of error is based on a probability distribution, which assigns a probability to each possible outcome. For example, the probability distribution for one coin toss is 50% for no head (that is, a tail) and 50% for one head (no tail). The distribution for 100 coin tosses gives the probability of zero heads (that is, 100 tails), of one head (99 tails), and so forth. The probabilities of all distinct outcomes sum to 100%: on a single toss, the 50% chance of no head and the 50% chance of one head add up to 100%.
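
To make the coin-toss distribution concrete, here is a small sketch in Python; the numbers come from the binomial distribution, not from any poll:

```python
# The probability distribution for 100 tosses of a fair coin.
from scipy.stats import binom

n, p = 100, 0.5
for k in (0, 1, 50, 60):
    print(f"P({k} heads) = {binom.pmf(k, n, p):.2e}")

# The probabilities of all 101 possible outcomes sum to 100%:
total = sum(binom.pmf(k, n, p) for k in range(n + 1))
print(f"Total probability: {total:.6f}")   # 1.000000, within rounding
```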

The distribution of a Harris margin would assign a probability to a margin of minus 100% (that is, she got no votes), to a margin of minus 98% (she got 1% of the vote and Trump got 99%), and so forth. We could also look at fractional margins like minus 99.9%.

The most common distribution used is the normal. It has a bell shape: small probabilities at the extremes (like a margin of minus 100% or plus 100% for Harris) and large probabilities in the middle (like a zero margin, that is, both candidates get the same share of the vote).

The probability distribution is a theory. But it leads to accurate conclusions when correctly handled. For example, if Harris received no votes in a large, well-executed poll sample, we could confidently conclude that she is not winning the race. To calculate the precise margin of error, one fits the probability distribution using information from the poll sample, such as its size and the share backing each candidate.
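
Here is a minimal sketch of that textbook calculation, in Python. The sample numbers are hypothetical, chosen only to show the mechanics:

```python
# Fit a normal distribution to a poll sample and read off the 95% margin
# of error. Both inputs below are hypothetical.
import math

n = 1000            # hypothetical number of respondents
p_harris = 0.49     # hypothetical Harris share of the sample

se = math.sqrt(p_harris * (1 - p_harris) / n)   # standard error of the share
moe = 1.96 * se                                 # 1.96 = 95% two-sided z-value
print(f"95% margin of error: plus or minus {moe:.1%}")   # about 3.1%
```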

But The Post derives its margin of error not from a probability distribution but from recent actual errors. Its information came not from the current sample but from past performance. How it gets from this estimate based on past empirics to the present theoretical one is beyond me. Perhaps it assumes the same probability distribution for past election cycles as for the present poll samples, but The Post says nothing about this. It looks to me as if it arrived at its estimates essentially by playing pin-the-tail-on-the-donkey.

To recap: Harris is winning in three states! No, wait a minute. We're not sure. It could be an error. No, wait. We're not sure how to calculate the possible error. No, wait....

This matter is serious, and not just for nerds like me. The Post is wrong about how close the race is. It's too close to call not only across the battleground states on average, but in every battleground state. The Post's nonchalance would lead campaigns to understate the need for staff, volunteers, ads, and money in most battleground states.

The Post's FAQ asks: "Are you going to release the code of your model?" The newspaper replies: "We really want to and are working on that." Outstanding. Shouldn't The Post have released the code when it published the results? Presumably one delays publishing code in order to clean up confusion and error. In that case, why didn't The Post clean up the code before publishing results based on it?

Continuing: "When we release the code, we're also hoping to publish a more technical explanation." In other words, The Post did not think through its assumptions, since one does so by writing out their justification. The Post winged it.

Democracy dies in darkness. And The Post is smashing the lamps. -- Leon Taylor, Seymour, Indiana, tayloralmaty@gmail.com


References

Lenny Bronner, Diane Napolitano, Kati Perry, and Luis Melgar. Harris vs. Trump 2024 presidential polls: Who is ahead? The Washington Post. September 13, 2024.

Thursday, September 12, 2024

The misshape of things to come

 

                                           Going south?  Photo credit: NBC News.

The New York Times writes: "With the Kursk incursion, Mr. [Rustem] Umerov [Ukraine's defense minister] argued, Ukraine has demonstrated it can invade, and even occupy, Russian territory without igniting World War III, according to two officials.

"But American officials say it is too early to reach that conclusion, because there are many ways for Mr. [Vladimir] Putin [Russia's president] to retaliate."  

November 5, for example -- the day of the Presidential election in the United States. If Putin seeks to win his war with Ukraine, his cheapest means may be to ensure, by hook or crook, the election of former President Donald Trump. In the Republican candidate's debate Tuesday with the Democratic candidate, Vice President Kamala Harris, Trump refused repeatedly to say he wanted Ukraine to win the war. Instead, he said he wanted to end the war and, if elected, would do so in 24 hours by phoning Putin and Ukrainian President Volodymyr Zelensky. The implication, as Harris said, was that Trump would force Zelensky to concede the war by threatening to cut off military aid to Ukraine.

However, even Harris did not seem to understand how a Russian victory would affect Central Asia. Harris said Putin would next target Poland. This, I think, is ludicrous. Poland has belonged to NATO since 1999. An invasion of Warsaw would activate the NATO requirement that all members defend the one under attack. That would mean World War III, and Putin is not so stupid as to risk it. More likely he would target a nation that does not belong to NATO and that has relatively little strategic interest for the US and Europe -- Kazakhstan. -- Leon Taylor, Seymour, Indiana, tayloralmaty@gmail.com


References

David Sanger, Helene Cooper, and Eric Schmitt. Biden Poised to Approve Ukraine's Use of Long-Range Western Weapons in Russia. The New York Times. September 12, 2024.

Sunday, August 18, 2024

The games newspeople play

 

                                            A winner or just a statistic? Photo credit: Britannica


Is Kamala Harris winning? The Washington Post and The New York Times would love to tell you. What they won't tell you is that they are either hopelessly confused or lying. 

Let's start with today's Post. "Vice President Kamala Harris holds a narrow lead over former President Donald Trump in the presidential election, a notable improvement for Democrats in a contest that a little more than a month ago showed President Joe Biden and Trump in a dead heat, according to a Washington Post-ABC News-Ipsos poll....Given the margin of error in this poll, which tests only national support, Harris's lead among registered voters is not considered statistically significant."

Big news, sports fans! Harris is winning! Our poll says so! But...wait a minute...it's not statistically significant, which means...um...hmm.

Friends and neighbors, you can't have it both ways. Either Harris is winning, or she isn't. The rule of thumb is this: If the poll margin is within the margin of error, you cannot deduce with 95% confidence that either Harris or Trump is winning. The race is a dead heat.

But.

Digging into the story, we learn from a graph that the poll has a margin of error of plus or minus 2.5%. Harris's lead in the poll is 4%, or 49% to 45%. So Harris is winning. The poll margin, 4%, is larger than the margin of error, 2.5%. It is statistically significant: In other words, the result is very likely to hold as well outside of the sample, for the country in general. The Post is breathtakingly mucked up.
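
The excerpt does not report the poll's sample size, but the reported margin of error implies one, under the textbook assumption of simple random sampling. A back-of-the-envelope sketch:

```python
# The sample size implied by a reported margin of error, assuming simple
# random sampling and the conservative 50-50 case.
moe = 0.025        # reported: plus or minus 2.5%
z = 1.96           # 95% two-sided critical value
n = z**2 * 0.25 / moe**2
print(round(n))    # about 1537 respondents
```

Real polls apply design weights that push the effective sample size around, so treat this as a rough order of magnitude.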

Its confusion probably arises from its misunderstanding of the plus-or-minus designation. The idea is this: We want to test the hypothesis that the race is a tie, which would be the case if Harris were neither winning nor losing. Our usual criterion is that we will accept that the race is a dead heat unless we are 95% confident that Harris is either winning or losing. Well, if Harris's lead exceeds 2.5%, it is not a dead heat. And had Trump's lead exceeded 2.5% -- that is, had Harris's lead fallen below minus 2.5% -- it would not have been a dead heat either. It is not the case that Harris's margin must exceed 2 times 2.5% for us to conclude that the poll result is statistically significant, that is, that the race is not a dead heat.

The further conclusion, that Harris is winning, is easy to confirm once we see that the race cannot be a tie. The intuition is this: If Harris is so far in the lead that we can reject the possibility that the race is a tie, then we can also reject the possibility that Trump is winning.

In this case, The Post lucked out. Its headline correctly said the poll indicated that Harris was winning. But there is a more important point: The Post doesn't know what the hell it is doing.

 The error about errors

Newspapers goof when reporting polls because they do not understand how polling errors occur. Ask someone at a newspaper why they don't take the margin of error seriously, and you will get an answer like this: Well, the pollsters told us they were careful about polling, so we didn't worry about potential errors; they had to be small. We reported the margin of error only because everyone says we should.

Grrr. Now hear this: The margin of error is calculated under the assumption that the polling was perfect. Even when the sample is drawn at random from likely voters, the outcome of the poll may not be accurate. There is still a good chance that too many Harris supporters were interviewed -- and an equally good chance that too many Trump supporters were.

A simple example will show what I mean. Suppose that we have a class of 100 students: half receive As, and half receive Bs. (Welcome to grade inflation.) We take a sample of 10 students. Even if the sampling is utterly fair, there is still a chance that at least 6 of the 10 students sampled received As. Based on the sample, we wrongly conclude that the majority of students received As. In reality, only half did.
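
The classroom example is easy to check. A short sketch, drawing 10 students from a class of 100 that contains exactly 50 As:

```python
# Chance that a perfectly fair sample still misleads: at least 6 of 10
# students drawn from a class of 100 (half As, half Bs) received As.
from scipy.stats import hypergeom

# hypergeom.sf(k, M, n, N) gives P(X > k), with population size M = 100,
# "successes" n = 50 (the As), and N = 10 draws.
p_misleading = hypergeom.sf(5, 100, 50, 10)
print(f"P(at least 6 As in the sample) = {p_misleading:.1%}")   # about 37%
```

More than a third of the time, a perfectly fair sample of 10 would wrongly suggest that a majority of the class received As.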

Because of such possibilities, statisticians test the idea that of all likely voters, half favor Trump and half favor Harris. We can dismiss this hypothesis of a dead heat if a large-enough share of the sample favors either Harris or Trump. The "margin of error" reflects how large the share of respondents backing one candidate must be if we are to dismiss the possibility of a dead heat, assuming perfect polling. If the pollsters made mistakes, and they usually do, the actual margin of error is even larger than the one usually reported.

Moving right along…

For years, The New York Times has reported poll results as if they held for likely voters in general. Yesterday's story is an example. Yes, we can conclude that Harris is leading in Arizona, because her margin (barely) exceeds the margin of error (see the figure below). But contrary to The Times, we cannot conclude with 95% confidence that either Harris or Trump is leading in North Carolina, Nevada, or Georgia. Those survey margins are within the margins of error shown in the graph in the article. By the usual definitions in political polling, those races are dead heats.

Remarks later in the story indicate what The Times is really up to:  "As the Harris and Trump campaigns rush to define each other in the remade race, voters see a choice between strength and compassion. Voters were about equally likely to see each candidate as qualified and change-makers, but significantly more voters viewed Mr. Trump as a strong leader....When voters were asked who 'cares about people like me,' Ms. Harris had a slight edge over Mr. Trump: 52 percent compared to 48 percent."

The Times seems to think that it can report results from a sample as if they held for the population, as long as it labels results that hold with 95% confidence as "significant." (The poll margin for the question about strong leaders was 8%, much larger than the margin of error.) The Old Gray Lady is playing word games. Most readers are not statisticians. When The Times says Harris has "a slight edge," they think that The Times is talking about the population, that is, the world outside of the sample. They view the word "significantly" as redundant.

So The Times has it both ways. It tricks readers into believing that it is breaking news by declaring a winner. And, if challenged, it can always point out that, after all, it did refer to significance, sort of.

A simple example will show why this is a cardinal sin. Suppose that the president of General Motors misstated earnings for years. The Securities and Exchange Commission would investigate and probably force the president to resign. He might even face charges of fraud. Shouldn't we hold the nation's leading newspapers to standards at least as high? In a close race like this, newspaper reports of the polls affect donors. The money is sloshing towards Harris because of her perceived momentum. How real is that perception?

Another common dodge of newspapers: Well, do we really need 95% confidence that Candidate X is winning? Surely 90% is good enough. In that case, the margin of error is smaller, and the poll margin may now exceed it.  We can then conclude that X is winning.

It is true that 95% is an arbitrary standard. But it is also a universal one, especially in political polling. To apply 95% in some cases and (quietly) 90% to others, which The Post is fond of doing, amounts to moving the goalposts at halftime for some games but not for others. It makes it hard to compare poll results. Is Harris ahead in certain polls because she is winning the race, or because those polls adjusted the margin of error until she was "winning"?       

The Times and The Post say their polling staffs have graduate training in statistics. I do not see how a well-trained statistician can unknowingly make errors this basic. I will leave it at that.  

Noncardinal sins....

There's more, much more. The Times writes: "The polls show some risk for Ms. Harris as she rallies Democrats to her cause, including that more registered voters view her as too liberal (43 percent) than those who say Mr. Trump is too conservative (33 percent)." 

It is not clear why The Times says this shows "some risk" for Harris. Perhaps it means that voters are significantly more likely to view Harris as liberal than to view Trump as conservative. If that's what it means, it should provide evidence. The margin of error applies here, too. But even if it is true that voters are more likely to judge Harris as liberal than to judge Trump as conservative, why would this be a risk for Harris?

The Times continues, "For now, [Harris] is edging ahead of [Trump] among critical independent voters." Evidence?

The Times also writes, "Mr. Trump and Ms. Harris are tied at 48 percent across an average of the four Sun Belt states in surveys conducted Aug. 8 to 15. That marks a significant improvement for Democrats compared with May, when Mr. Trump led Mr. Biden 50 percent to 41 percent across Arizona, Georgia, and Nevada in the previous set of Times/Siena Sun Belt polls, which did not include North Carolina." 

The Times is comparing apples to oranges. One sample includes North Carolina, the other doesn't. And it is not clear why it bothers to make this top-level comparison. The polls differ from state to state, and The Times has specific information about each. Why not stick to reporting the results for each state poll rather than aggregate them into an inferior measure? 

If The Times simply must have a nation-level conclusion, it can view each state poll as an observation in a sample of all state polls and compute the probability that Harris is leading nationwide, given the sample of state polls. Do the same for May.
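
For what it's worth, here is a hedged sketch of the calculation suggested above. The four margins are invented placeholders, not the Times/Siena numbers; the method, not the data, is the point.

```python
# Treat each state poll's Harris-minus-Trump margin as one observation
# and ask how surprising the mean margin would be if the race were tied.
from scipy.stats import ttest_1samp

state_margins = [0.05, 0.04, -0.01, 0.00]   # hypothetical AZ, NV, GA, NC
result = ttest_1samp(state_margins, popmean=0.0)
print(f"t = {result.statistic:.2f}, two-sided p = {result.pvalue:.2f}")
# t = 1.36, p = 0.27: these four numbers alone would not support a
# nationwide call
```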

Finally, a puzzle. In Arizona, The Times/Siena poll reports Harris ahead in the sample by 5% among likely voters and 4% among registered voters. But the poll by the Competitiveness Coalition/Public Opinion Strategies about two weeks earlier reported Trump ahead by 5% among likely voters. And The Hill/Emerson College poll about three weeks before The Times/Siena poll reported Trump ahead among registered voters by 5%. Were these differences due to Harris's momentum? Or were they due to differences in how the polls were conducted?

 


                                           Source: The New York Times

 The key question: Are The Washington Post and The New York Times statistically significant? 

Anyway, these conundrums pertain to Central Asia because the political polling in the region, or at least the reporting of it, is anything but perfect. The margin of error is correspondingly higher.  In short, politics pervade political statistics. --Leon Taylor, Seymour, Indiana, tayloralmaty@gmail.com 

 

Notes             

For helpful comments, I thank but do not implicate Annabel Benson, Richard Green, and Mark Kennet. Parts of this post draw upon my earlier articles on my Facebook page.

 

References

Dan Balz, Scott Clement, and Emily Guskin. Kamala Harris holds slight national lead over Donald Trump, Post-ABC-Ipsos poll finds. The Washington Post. August 18, 2024.

Shane Goldmacher and Ruth Igielnik. Kamala Harris Puts Four Sun Belt States Back in Play, Times/Siena Polls Find. The New York Times. August 17, 2024.

Lisa Lerer and Ruth Igielnik. Harris Leads Trump in Three Key States, Times/Siena Polls Find. The New York Times. August 10, 2024.


Thursday, July 4, 2024

Stabbing Caesar

 

                                           The last exit?  Photo credit: Al Drago, The New York Times

Should Joe Biden quit the Presidential race? Two questions are in play.
The first is whether he is too sick to be President. Of the three branches of government, the Executive is unique for being just one person, and an extremely powerful one. He doesn’t just manage the agencies. He commands the armed forces, makes foreign policy, issues pardons, and advises the nation in speeches that are the most influential of any American’s. The Supreme Court notes that he cannot do his job if he decides feebly. Indeed, the Court’s grant this week of absolute immunity from criminal prosecution for official Presidential acts assumes that fending off criminal charges would leave him little time for big decisions. The Court wrote that, as with civil suits, criminal prosecution might “chill” the President “from taking the ‘bold and unhesitating action’ required of an independent Executive.”
But surely feeble reasoning also leads to feeble decisions. In a debate last week for which he himself proposed the rules, Biden struggled for thought and sometimes ended in non sequiturs (“We finally beat Medicare”). This was not just one bad night. In an interview with David Muir of ABC News just before his D-Day speech, Biden again spoke barely above a whisper and was virtually impossible to understand. In his first two years, Biden gave fewer press conferences than any President since Reagan, reported The New York Times. He met with journalists 54 times; Trump, in his first two years, 202 times. On Monday night, Biden passed up the opportunity to show that he could think on his feet. Instead, he read from a teleprompter his statement about the Court’s immunity ruling and refused to answer questions. His recent faux pas include saying in February that he had just met François Mitterrand, a French President who died in 1996, rather than President Emmanuel Macron, and saying that he had met with Chancellor Helmut Kohl of Germany, who died in 2017, rather than with former Chancellor Angela Merkel.
On the other hand, Trump in a January speech confused former House Speaker Nancy Pelosi with his own United Nations Ambassador and chief rival for the GOP nomination, Nikki Haley. And in 2020 he proposed on TV to cure Covid-19 by injecting disinfectant into the body.
Is Biden mentally incompetent as President? The Constitution has a way to answer this question – the 25th Amendment. The President can be removed if the Vice President and a majority of Cabinet officers agree that he is not fit to serve. There is no evidence that Biden is anywhere near this point. By Constitutional standards, he is competent. At the moment.
But we have faced this problem before. When President Woodrow Wilson fell to a neurological illness in 1919 while campaigning for the Versailles Treaty ending World War I, his wife Edith was said to have secretly made decisions for him. Aides to President Ronald Reagan considered the 25th Amendment when his attention flagged in his second term. Yet the Constitution recognizes only the Vice President as the successor to an incapacitated President.
What about informal standards for Presidential competence? Every person will answer for herself. I think that most of Biden’s decisions have been reasonable, although, as a fiscal conservative, I disagree with most of them. I have not seen an increasing trend toward ridiculous decisions, aside from his threat in May to withhold smart bombs from Israel if it invaded Rafah by land, a threat with no obvious connection to American national interests.
Trump, on the other hand, has proposed to use the military as mercenaries, which demonstrates feeble reasoning. He brought this idea up in a 2017 meeting in “the Tank” of the Pentagon, where the Joint Chiefs of Staff met, wrote Leonnig and Rucker. And he returned to it in last week’s debate, when he said he would consider pulling out of the North Atlantic Treaty Organization if NATO members did not pony up to their funding pledges. That amounts to using the armed forces to make, or save, money.
The second question is whether Biden’s poor speaking will cost him the election. In national polls, Biden and Trump are in a dead heat. In a few battleground states, Trump leads, according to the newspapers. But they have overstated the number of states in which he leads, because they apply too small a margin of error to state polls.
For example, The Washington Post declared on June 26 that Trump was winning in four of seven battleground states (North Carolina, Nevada, Arizona, and Georgia), assuming a 3.5% “normal-sized polling error.” The other three battleground states—Pennsylvania, Wisconsin, and Michigan—were too close to call.
How did The Post decide that 3.5% was a “normal” error? Its technical notes say: “We found in the last few presidential cycles that the average modeled polling error in competitive states was 3.5 percentage points, so to account for this, our averages factor in the 90th percentile possible error (i.e., how bad would the error be in the worst 10 percent of cases).”
The point to note is that The Post is allowing for a potential error of 10% in identifying the winning candidate, not the usual 5% maximum. Because The Post is unusually tolerant of error, it uses an unusually small margin of error in determining whether the candidate leading in the poll is actually winning the race. This means that Trump may not be winning in all four states by the usual criteria. For a state poll held to the usual 5% tolerance of error, a margin of error of 5% would not be surprising. Using this margin of error, Trump led in only one of the seven battleground states: Georgia, where his margin over Biden was 6%.
The Post did not explain why it had doubled the conventional tolerance of error. Nor did it tell the general reader that it had done so.
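
If 3.5 points really is the 90th-percentile error, and polling errors are roughly normal with mean zero, then the conventional 95th-percentile threshold follows from the ratio of the normal critical values. A minimal sketch under those assumptions:

```python
# Convert a 90th-percentile polling error into the 95th-percentile error
# it implies, assuming polling errors are normal with mean zero.
from scipy.stats import norm

err_90 = 3.5                                     # points, per The Post
err_95 = err_90 * norm.ppf(0.975) / norm.ppf(0.95)
print(f"Implied conventional threshold: {err_95:.1f} points")   # about 4.2
```

That is consistent with the suggestion above that a 5% margin of error for a state poll would not be surprising.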

To the Post's credit, it did try to go beyond the standard margin-of-error analysis, which, as noted above, assumes that the polled sample is randomly drawn from the population of likely voters and is therefore a reasonably accurate reflection of it. The Post mentions measurement error and frame error.

(A note for readers who have better things to do with their time than pore over statistics: Measurement error arises from one's attempt to measure a process. For example, if I use a ruler marked off in centimeters to estimate the length of a cockroach -- hey, I've got to have some kind of hobby -- my answer will probably be off by several millimeters. In political polling, measurement error may occur because the interviewers are poorly trained, and so forth. 

(Frame error occurs because the makeup of the sample differs from the makeup of the population studied.  If my sample is 70% women and the population is 50% women, then the sample might not accurately depict the population.) 

But The Post doesn't make clear exactly what it did about particular errors. 

I worry especially about nonresponse error in Trump-Biden polling. Nonresponse error may be growing: The Pew Research Center said that its response rate in telephone surveys had dropped from 36% in 1997 to 9% in 2012. It is reasonable to think that potential respondents whose favored candidate is losing are less likely to take part in a survey. That would bias the sample in favor of the winning candidate. I suspect that many Biden fans will be too depressed to talk to interviewers.
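
A toy simulation shows how strongly such nonresponse can tilt a poll. The response rates below are my invented assumptions, not Pew's figures.

```python
# Nonresponse bias in a true 50-50 race, assuming (hypothetically) that
# the trailing candidate's supporters answer the phone less often.
import random

random.seed(1)
responses = []
while len(responses) < 1000:
    voter = random.choice(["leader", "trailer"])        # true 50-50 split
    answer_rate = 0.09 if voter == "leader" else 0.07   # invented rates
    if random.random() < answer_rate:
        responses.append(voter)

share = responses.count("leader") / len(responses)
print(f"Leader's share among respondents: {share:.1%}")  # expect about 56%
```

Even a two-point gap in willingness to respond turns a dead heat into an apparent lead of roughly a dozen points.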



But Biden is certainly not winning the race. In a post-debate CBS poll, 72% of the respondents doubted that he could think clearly enough to be President. In a flash poll after the debate, CNN found that two-thirds of the respondents thought that Trump had won the debate. Roper and Gallup, which have been polling reactions to Presidential debates since the 1940s, have not yet released their results.
Biden has committed to the second and last debate, on September 10, hosted by ABC News. I assume that it would follow the same rules as the first. Given the visible deterioration in Biden's speaking over the past month, I do not see why he would do better in two months. And losing a second debate two months before the election cannot help him.
In sum, I worry less about his fitness for office than about his chances of winning, although I worry about both.

This question bears upon Central Asia because Biden strongly supports Ukraine in its war with Russia. The American support of Ukraine has led to sanctions against Russia, and these indirectly create a black market for military exports from Central Asia to Russia. If Trump returns to the White House, he will probably end sanctions against Russia, and the black market will collapse. -- Leon Taylor, Seymour, Indiana tayloralmaty@gmail.com

Notes
For helpful comments, I thank but do not implicate Annabel Benson, Paul Higgins, and Mark Kennet.

References
Peter Baker and Susan Glasser. James Baker’s 7 Rules for Running Washington. Politico. September 28, 2020.

Peter Baker, David E. Sanger, Zolan Kanno-Youngs, and Katie Rogers. Biden’s Lapses Are Said to Be Increasingly Common and Worrisome. The New York Times.

Lenny Bronner and Diane Napolitano. How The Washington Post’s presidential polling averages work. The Washington Post. June 26, 2024.

Lenny Bronner, Diane Napolitano, Kati Perry, and Luis Melgar. Who is ahead in 2024 presidential polls right now? The Washington Post. June 26, 2024.

Dartunorro Clark. Trump suggests 'injection' of disinfectant to beat coronavirus and 'clean' the lungs. NBC News. April 23, 2020.

Jim Dunnelly. Watch David Muir's Exclusive Interview with President Biden on the D-Day Anniversary. ABC News. June 5, 2024.

Ariel Edwards-Levy and Jennifer Agiesta. CNN Flash Poll: Majority of debate watchers say Trump outperformed Biden. CNN. June 28, 2024.

Michael Gold. Trump Confuses Haley and Pelosi, Accusing Rival of Jan. 6 Lapse. The New York Times. January 24, 2024.

Carol D. Leonnig and Philip Rucker. 'A Very Stable Genius' book excerpt: Inside Trump's stunning tirade against generals. The Washington Post. January 17, 2020.

Becky Little. Reagan Aides Once Raised the Possibility of Invoking the 25th Amendment. History. November 6, 2023.

Iain Marlow. Biden Threatens to Withhold Weapons to Israel Over Rafah Invasion. Time. May 9, 2024.

National Constitution Center. 25th Amendment: Presidential Disability and Succession.

Zoë Richards. In his second mix-up this week, Biden talks about meeting with dead European leaders. NBC News. February 7, 2024.

Michael D. Shear. Biden Holds Fewest News Conferences Since Reagan. The New York Times. April 21, 2023.

Andrea Vacchiano. Doubt in Biden's cognitive abilities jumped after debate against Trump: poll. Fox News. June 30, 2024.

Woodrow Wilson. Wikipedia.

Caitlin Yilek. When the next presidential debate of 2024 takes place and who will moderate it. CBS News. June 28, 2024.

Monday, May 27, 2024

Why do people die in the military?

 


                                          Oops. Photo credit: US General Accounting Office

For Memorial Day, I looked at the annual causes of death among active-duty personnel in the US Army from 1980 through 2022. More recent data were not available.

According to my calculations, the death rate has fallen by more than half since 2007, the deadliest year of the war with Iraq that stemmed from President George W. Bush’s invasion in March 2003. The Iraq War continued through 2011, and about 2,500 US troops are still stationed there. Since 2008, the military death rate has fallen from 14.3 deaths per 10,000 on active duty to 6.5 (Figure 1).


                                          Figure 1: Total US military deaths per 10,000 on active duty

For decades, the leading cause of military deaths was accidents. But since 1980, the death rate due to accidents has fallen by nearly three-fourths, from 7.6 to 2.0 per 10,000 active-duty troops. Meanwhile, the death rate due to suicides has more than doubled, from 1.1 to 2.6. As of 2022, suicides were the leading cause of Army deaths (Figure 2). 

Illness is not a major cause of military death. The spike in deaths due to illness around 2020-2021 is probably due to Covid-19. Even in those years, the number of deaths due to illness was smaller than the number due to either suicides or accidents. The relatively small toll was probably due to the Army’s vaccination mandate in 2021. RFK Jr., take note. But the Army no longer requires vaccination.


                                  Figure 2: Military deaths by cause per 10,000 on active duty

Terrorism was a significant cause of military deaths only in October 1983, when two trucks loaded with explosives blew up American and French barracks in Beirut, killing 220 Marines, 18 sailors, and three soldiers. Three months later, President Ronald Reagan withdrew troops from Beirut. One might have expected this withdrawal to encourage more terrorist attacks on soldiers, but in fact it had no significant effect. The number of active-duty deaths due to terrorism is random over time (Figure 3). After 1983, the largest terrorist attack claimed 46 active-duty deaths, on September 11, 2001. Since then, there has been only one military death attributed to terrorism, in 2008.


                                         Figure 3: Number of active-duty deaths due to terrorism

Nor were deaths due to terrorism obviously tied to hostile action. The correlation between deaths due to action and deaths due to terrorism is -.12, which is essentially random. Interpreted literally, the number of deaths due to terrorism fell when the number due to action rose, for those on active duty. But the correlation is so small that it might well have occurred by chance.
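
For the curious, a correlation like that -.12 is just the Pearson coefficient of two annual series. A sketch with invented stand-ins for the Pentagon data:

```python
# Pearson correlation of two annual death series. The numbers below are
# hypothetical stand-ins, not the Defense Casualty Analysis System data.
import numpy as np

deaths_action    = np.array([12, 3, 0, 55, 40, 2, 0, 1])
deaths_terrorism = np.array([0, 241, 0, 0, 1, 0, 46, 0])

r = np.corrcoef(deaths_action, deaths_terrorism)[0, 1]
print(f"r = {r:.2f}")   # the Pearson coefficient, between -1 and 1
```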

Like deaths due to terrorism, the number of deaths in action has also fallen to virtually zero, from 847 in 2007. Until then, the deaths in action due to the invasion of Iraq were the leading cause of active-duty deaths (Figure 4). 


                               Figure 4: Active-duty deaths due to hostile action

I thought that fatal accidents might have been more likely in action. But no. The correlation between deaths due to action and those due to accidents, again for those on active duty, was -.17, again essentially random. The Army says its leading cause of deaths is vehicle accidents during training, especially in the summer, when new platoon leaders arrive and may not know how to avoid mishaps. Also, urban recruits are used to taking the bus or subway rather than driving. And many Army and Marine recruits drive poorly because they lack sleep, according to the Government Accountability Office, the watchdog of Congress.

The Army says the vehicle death rate has fallen because of its emphasis on safety, which means exactly nothing. Airplane accidents seem to occur in August, when daylight hours are longer, permitting more flight training. Anticipating this timing may have improved training. But in general, the statistical work on military accidents is too poor to permit many firm conclusions, in my opinion. The dataset is not rich, and the statistical techniques are primitive (sort of like the ones in this post!).

I conclude: The military has reduced its death rate steadily, outside of times of war, especially by avoiding fatal accidents. But mental illness has become much more important relative to other causes of death, perhaps partly because most soldiers are too young to die of old age. Nevertheless, the military may be in a position to provide more mental health care than do the public high schools, where mental illness is a growing diagnosis. In its 2021 survey of high school students, the Centers for Disease Control and Prevention found that more than two in five “felt persistently sad or hopeless.” Nearly three in ten reported “poor mental health.” More than one in five “seriously considered” suicide, and one in 10 attempted it. Distress, often manifest in thoughts of, or attempts at, suicide, was more common among students who were LGBQ+, female, and black. -- Leon Taylor, Seymour, Indiana, tayloralmaty@gmail.com

 

Notes

For valuable comments, I thank but do not implicate Sergeant Annabel Benson of the US Army. All data are from the Pentagon’s Defense Casualty Analysis System.


References

Steve Beynon. The Top Killer of Soldiers, Army Vehicle Deaths Are Tied to Poor Training, Though Numbers Down. Military.com. November 12, 2021.

Centers for Disease Control and Prevention. Youth Risk Behavior Survey Data Summary & Trends Report: 2011-2021. 2021.

United States Department of Defense, Defense Manpower Data Center. Defense Casualty Analysis System.

United States Government Accountability Office. Military Vehicles: Army and Marine Corps Should Take Additional Actions to Mitigate and Prevent Training Accidents. July 7, 2021.


 


Tuesday, April 30, 2024

Leading the future of social sciences (and tripping)

 

                                         Meet KIMEP's new dean.  Credit: Getty Images

A few months ago, in one of his periodic blowups, as predictable as thunderstorms in summer, the President of KIMEP University, Chang Young Bang, fired the longtime dean of the College of Social Sciences, Gerald Pech. Since then, the university has scrambled for Pech’s successor. It seems to have found him, and I must say that Dr. Bang richly deserves him.

KIMEP flew in Jason Gainous, last seen at Duke Kunshan University (search me), to present a PowerPoint seminar entitled “Interconnectedness and impact: Leading the future of social sciences.” It went downhill from there. Dr. Gainous is passionately concerned with “the web of interconnectedness,” in which “social sciences are increasingly interconnected,” “human behaviors and societal structures are interwoven, influencing global phenomena” (hey, they’re interconnected, too!), and “digital technologies are reshaping disciplinary interconnectedness.” And that was just the first slide. But listen, we’ve got to get our priorities straight, because “interconnectedness fuels impactful changes beyond social sciences.” Dr. Gainous’s inspiring vision embraces “Unified Impact: Merging Liberal Arts with Social Sciences.” This breathtaking dream focuses on “philosophical interconnectedness” with “holistic insights” that “deepens [sic] understanding of human and societal dynamics.” I don’t know about you, but my pulse is already racing, especially since Dr. Gainous is dedicated to “Building Interconnectedness Through a New Political Science Department.”

But enough of the noble nightmare, er, dream. Let’s get down to brass tacks! Dr. Gainous proposes the KIMEP Center for Social Science Research Methods. This would develop a “comprehensive Ph.D. program focused on advanced research methods (I guess we wouldn’t want a doctorate focused on backward methods), including applied statistics, modeling, artificial intelligence, big data, and machine coding, and qualitative methods.” Research Methods Training Seminars would “foster[] a community of learning and sharing, interconnectedness (of course!) and impact.”

Most of Dr. Gainous’s gas production can be safely dispersed.  But his new Ph.D. program is another matter, because it must be approved by the Ministry of Education and Sciences. So let’s talk about it.

Dr. Gainous is a connoisseur of ritzy-sounding terms that he doesn’t understand. “Applied statistics” is “modeling.” On the other hand, AI, big data and coding, and qualitative methods are very separate concepts.

Qualitative methods are as old as statistics themselves. We can measure data in two ways. One is on a numerical scale that can be divided into arbitrarily small chunks. For example, we can measure income in dollars and cents. But not all data are suitable for continuous numbers. An example is gender: male or female, it cannot be measured with a number. Another example is racial origin. Such data are “qualitative.” Data that can be measured numerically are “quantitative.”

Nothing about qualitative data requires a new doctorate degree. It differs from quantitative data only in its units of measurement. We can gauge the impact of a one-dollar increase in income on spending. Those data are quantitative. But if we’d like to know how gender affects spending, we will have to compare spending by a male to spending by a female. The usual way to do this in a statistical model is with a variable that has the value of 1 for females and 0 for males. For example, suppose that we estimate this model: Spending = 40 + 2* Female. The typical female spends 40 + 2*1 = 42. The typical male spends 40 + 2*0 = 40. That’s all there is to it.

Well, almost all. There is a temptation to introduce two gender variables, Male and Female. Male would equal 1 for males and 0 for females. Female would equal 1 for females and 0 for males. But a moment’s thought will show that this is saying the same thing twice. If we have a Female variable, we already have a distinctive value for males; it’s Female = 0. We do not need a Male variable.

In fact, if we include both Male and Female variables, the statistical software may go berserk. The two variables are perfectly collinear with the intercept: Male is just 1 minus Female. The Male variable is useless, because it duplicates what we already know from the Female variable, and the model cannot be estimated. But as long as we avoid such duplicate variables, we will have no problem with qualitative data. That part of Dr. Gainous’s “advanced” doctorate can be explained in five minutes, as the sketch below shows.
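
A minimal sketch of both points, using invented data that match the example above:

```python
# Fit Spending = a + b*Female by least squares, then show why adding a
# Male dummy as well breaks the design matrix. All data are invented.
import numpy as np

female = np.array([1, 0, 1, 0, 1, 0])
rng = np.random.default_rng(0)
spending = 40 + 2 * female + rng.normal(0, 0.1, 6)   # matches the example

# Correct design: an intercept plus the single Female dummy.
X = np.column_stack([np.ones(6), female])
coef, *_ = np.linalg.lstsq(X, spending, rcond=None)
print(coef)                            # roughly [40, 2]

# The trap: Male = 1 - Female, so intercept, Female, and Male are
# perfectly collinear and the design matrix loses full rank.
male = 1 - female
X_trap = np.column_stack([np.ones(6), female, male])
print(np.linalg.matrix_rank(X_trap))   # 2, not 3
```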

“Big data” is another term for which Dr. Gainous needs to go get a clue. It just refers to a lot of data.  If you gather data on gross domestic product (the value of what an economy produces) for each year of independent Kazakhstan, you will have only about 32 observations. That’s not big data, although it is very valuable. If instead you gather survey responses by every person in Kazakhstan, you will have more than 19 million observations. That’s big data.

It requires a shift in the way that we model statistics. Our usual problem is to reach conclusions about the real world when we know only a little bit about it. For example, we may want to know whether the typical Kazakhstani approves of President Kassym Jomart Tokayev. But we have enough money only to survey 100 people. Can we extrapolate our results to the nation? Maybe, if we pick a sample so impartially that it is like a microcosm of the nation. “Inferential testing” determines when this is the case, and it usually eats up half of a course on econometrics (the economist’s term for applied statistics). But if we can survey every Kazakhstani, our usual problem disappears. Even if we can survey “only” a million Kazakhstanis, inferential testing will go the way of the dodo. If randomly sampled, a million observations can give us an accurate picture of the nation.  
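
The disappearance of inferential testing is visible in one line of arithmetic: the textbook 95% margin of error for a 50-50 question shrinks with the square root of the sample size.

```python
# The 95% margin of error at several sample sizes, for a 50-50 question,
# assuming simple random sampling.
import math

for n in (100, 1_000, 1_000_000):
    moe = 1.96 * math.sqrt(0.25 / n)
    print(f"n = {n:>9,}: margin of error = plus or minus {moe:.2%}")

# n =       100: plus or minus 9.80%
# n =     1,000: plus or minus 3.10%
# n = 1,000,000: plus or minus 0.10%
```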

But big data are not a panacea.  They cannot substitute for clear thought. For example, even if we survey every Kazakhstani, we will get nonsense from regressing the respondent's opinion of Tokayev upon the respondent's blood type. 

A more subtle problem is the failure to control for factors that relate to the explanatory variable that interests us. For example, in the Mincer equation, we may regress the natural log of the wage on education across workers: log Wage(i) = a + b*Years_of_schooling(i) + ..., where i indexes the worker. Our estimate of the coefficient b gives the rate of return to another year of education. But the model is not as perfect as it may seem. Talent also affects the wage: Talented people are highly productive. And talent correlates with education: Talented people earn advanced degrees. But there is no obvious way to measure talent; IQ is but a crude gauge. So we cannot control for talent in the Mincer equation. Since talent rises with education, part of the rate of return that we ascribe to schooling is really due to talent. That is, we will overestimate the rate of return to education. This problem will persist even if we have a zillion observations.
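
A simulation makes the point that no quantity of data cures the bias. All numbers below are invented; the true return to schooling is set at 8%, and the estimate comes out near 12%:

```python
# Omitted-variable bias in the Mincer equation: talent raises both
# schooling and the wage, so omitting it overstates the return to
# schooling, even with 100,000 observations. All data are simulated.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
talent = rng.normal(0, 1, n)
schooling = 12 + 2 * talent + rng.normal(0, 1, n)
log_wage = 1.0 + 0.08 * schooling + 0.10 * talent + rng.normal(0, 0.2, n)

# Short regression: log wage on schooling alone, talent omitted.
X = np.column_stack([np.ones(n), schooling])
b, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
print(f"Estimated return to schooling: {b[1]:.3f}")   # about 0.12, not 0.08
```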

Finally, big data are a headache to manage. We need a way to find particular facts quickly in a dataset of millions of observations, and to identify fake or misleading data. This may require new theories of computer science and statistics. Whether these can be developed in a doctoral program aimed at political science students still struggling with the multiplication tables is, well, food for thought.

I don’t mean to suggest that Dr. Gainous would be wholly useless.  Academic deans are often appendages anyway, and one like Dr. Gainous may have entertainment value. The problem is to determine his fair salary. May I recommend consulting the pay schedule at Barnum & Bailey? – Leon Taylor, Seymour, Indiana tayloralmaty@gmail.com

Notes

For useful comments, I thank but do not implicate Mark Kennet.



Friday, April 26, 2024

Keystone clowns

                                                 Indy Eleven stadium: Boon or boondoggle?

The Keystone Group developers are miffed that Indianapolis Mayor Joe Hogsett proposes to offer a 20-acre site near the White River, with 80% of the development to be financed by a special taxing district, to the highest bidder. Keystone wants to use the site for a 20,000-seat stadium for the Indy Eleven, a soccer team, as well as for 600 apartments, nearly 200,000 square feet of shops and restaurants, and so forth. The total cost of developing the site would exceed $1 billion.

It is unusual for the taxpayer to foot 80% of the cost of building a sports stadium. In Nevada, the legislature last year approved $380 million for a proposed $2.2 billion stadium for the Athletics in Las Vegas, and even that relatively modest amount has the teachers up in arms. They’d rather spend the money on schools. The $2 billion Allegiant stadium, where the Las Vegas Raiders football team plays, consumed $750 million of public funding. Not pocket change, but again only a little more than a third of the total cost of building the stadium, which opened in 2020. Eighty percent is out of line. Why should the state subsidize sports fans?

Keystone accused Hogsett of “shopping state legislation championed by Indy Eleven, working behind closed doors to offer publicly-owned real estate and public financing to the highest bidder, with assurances that neither the redevelopment of his riverfront parcel [on the White River] nor the continuation of the Indy Eleven would be requirements for city support.” Keystone was founded by the owner of Indy Eleven, Ersal Ozdemir.

Why shouldn’t Indianapolis demand the best return on its land? That’s not likely to come from a sports stadium. For the most part, the city would want the use of the land that provides the greatest tax revenues net of what the city spends on the site. That use is unlikely to be a stadium for a minor sport, because it won’t attract many out-of-towners, whose spending would increase the city’s income. The fans are most likely to be Indianapolis residents. In that case, there may be little net gain in spending and therefore little net gain in sales tax revenues. The Indy family may spend $100 more on soccer games and pay for it by spending $100 less on meals at local restaurants.

In general, there is little evidence that building stadiums benefits a city economically.  A stadium has "an extremely small" effect on the local economy, wrote Andrew Zimbalist and Roger G. Noll of the Brookings Institution in 1997. 

The reason is simple: Economic growth depends mainly on knowledge (“human capital,” if you want  the lingo). When workers know how to produce cars faster, their productivity rises. When programmers figure out how to speed up a robot, their productivity rises, too. Nothing about a sports stadium need increase labor productivity.  Perhaps by providing needed recreation, yes.  But the empirical evidence for that is weak, perhaps because today there are so many forms of recreation already to choose from.

The true cost of the stadium to Indianapolis is not the hundreds of millions of dollars spent on it but the amount of education that the money could have provided. Suppose conservatively that Indiana assumes $500 million of the cost of developing the site. In Nevada, a pricey state, the total cost of providing a schoolteacher is $81,000 to $86,000 per year. At that cost, spending $500 million on education rather than the stadium could hire nearly 6,000 teachers for a year. Indeed, the number is likely to be greater than 6,000 for Indiana, because Indiana is much cheaper than Nevada. Teachers here don’t command such high salaries.

Keystone boasts that building the stadium would create 1,000 construction jobs. Well, building schools creates construction jobs, too.  Hogsett has his priorities straight. --Leon Taylor, Seymour, Indiana  tayloralmaty@gmail.com


Notes

For useful comments, I thank but do not implicate Forest Weld. 


 References

Alexandria Burris. Keystone accuses Hogsett administration of trying to walk away from Eleven Park deal. Indianapolis Star. April 25, 2024.

Andrew Zimbalist and Roger G. Noll. Sports, Jobs, & Taxes: Are New Stadiums Worth the Cost? Brookings Institution. June 1, 1997.