Thursday, August 28, 2014

Out of control

Do Central Asian researchers use outdated methods?

Kazakhstan’s ballyhooed campaign to become one of the 30 most competitive economies by 2050 is mystifying.  “Competitive” is a fashionably vague word.  The government can declare victory whenever it wants, by defining competitiveness in a way that puts the country in the top 30.  In fact, Kazakhstan might be there already if it just used, as its measure of competitiveness, the corruption index of Transparency International.       

Sarcasm aside, the government’s strategic plan is an old-fashioned industrial policy.  The idea is clear: The more factories we have, the more we can produce.  To support new factories, let’s create a market for their output by discouraging imports.  Import substitution was an abject failure in Latin America in the Fifties and Sixties, while Asian tigers like Hong Kong, Singapore and South Korea have demonstrated the potential of export-led growth, a potential that Kazakhstan already has.  All of this seems lost on Astana.

Most disheartening of all, the government is ignoring a half-century of statistical research suggesting that the most powerful cause of economic growth may be knowledge.  Just building more factories won’t accomplish much if we don’t have workers to run them.  If they already have their hands full, then the new factories will be excessive.  To use them, we must empower our workers to produce more.  That requires know-how.  A strategic plan focusing on human capital would emphasize education and research, particularly research about research itself.

Along these lines, the government could take a long step toward competitiveness (defined as potential economic growth per capita, and measured by the potential value of output per worker) by jettisoning clumsy methods of research.  Exhibit A is the control group.

To be precise

The idea is familiar.  To test a new treatment for cancer, divide up the patients into two identical groups.  One takes the treatment; the other, a placebo.  The treatment passes the test if the share of patients surviving the cancer is larger for the first group than for the second (the control group).  Statistical tests can detect whether the difference in survival rates may well be just a matter of chance.   

The method is simple but flawed.  First, use of a control group may destroy more information than it provides.  In principle, the two groups should be identical; in reality, they vary, even if slightly, in such key traits as age, sex, and record of health.  Pooling the two groups, by permitting all subjects to undergo the experimental treatment, tells us more about how the success of the treatment (perhaps measured as the number of days of survival) varies with age (say), given all other factors.  Enlarging the dataset also reduces the risk of concluding that the treatment failed when in fact it worked.  (In statistical parlance, a small sample gives the test low “power.”)
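To make the point about power concrete, here is a minimal Python sketch.  It assumes a one-sided z-test with a known standard deviation, which is a simplification and not necessarily the test that a real trial would use.  The numbers (a true gain of 30 days of survival, a standard deviation of 150 days) are invented for illustration, and the function name power_one_sided_z is hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test that the mean effect exceeds zero.

    effect: assumed true gain in days of survival (illustrative)
    sigma:  assumed known standard deviation of survival in days
    n:      number of patients in the sample
    alpha:  significance level of the test
    """
    std_normal = NormalDist()
    # Critical value: we reject "no effect" when the z-statistic exceeds it.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the assumed true effect, the z-statistic is centered here.
    shift = effect * (n ** 0.5) / sigma
    # Probability that the z-statistic lands above the critical value.
    return 1 - std_normal.cdf(z_crit - shift)


# Power for several sample sizes, holding the assumed effect fixed.
for n in (25, 50, 100, 200):
    print(n, round(power_one_sided_z(effect=30, sigma=150, n=n), 3))
```

Under these assumptions, the computed power rises as the sample grows.  That is the sense in which a larger dataset makes it less likely that we will miss a treatment that works.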

Yes, comparing the success rates of the two groups tells us a little, but only a little.  In the pooled approach, we could extract the same information from the constant, known as the intercept, in the estimated statistical equation.  The intercept captures the amount of success in the treatment that is not related to the explanatory variables like age and sex.  (For the gory details, see the Notes.) 

Another possibility is to use the equation from the pooled group to determine the sub-samples for which success of the treatment is smallest – say, men over the age of 80.  This would be more informative than just determining whether the treatment is a success in a group with the same average age as the control group.

One argument for the control group might be this:  Patients who are willing to try an experimental treatment are unusually pro-active about recovery, so they are likely to take other steps to get well, some of which may succeed.  If we pool all patients, then we might wind up crediting the experimental treatment with success that was really due to the other steps.  We could avoid this problem by randomly assigning patients to one group or the other regardless of their enthusiasm for the studied treatment.

The problem with this argument is that pro-active patients are likely to tell the doctor about the other steps that they have taken if they are asked.  With this information, we can introduce explanatory variables into the pooled equation – variables that control for the steps.  If this is not possible, we can still control for the bias due to the self-selection of patients into the experiment; we would use statistical methods that control for the omitted information.  The best-known such model in economics is “heckit,” named after the Nobel-laureate econometrician James Heckman.
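Here is a minimal Python sketch of the first remedy only: adding an explanatory variable for the other steps.  It is not Heckman’s method.  The data are invented and contain no random noise, so that the arithmetic is easy to follow: survival is assumed to equal 100 days, plus 20 for the treatment, plus 40 for the other steps.  The helper ols and the variable names are hypothetical.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Each patient is (took_treatment, took_other_steps).  Treated patients are
# more likely to have taken other steps, mimicking pro-active self-selection.
patients = [(0, 0)] * 4 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 4

# Assumed true relationship (illustrative, noise-free).
days = [100 + 20 * t + 40 * s for t, s in patients]

# Regression that omits the other steps: intercept and treatment only.
naive = ols([[1, t] for t, _ in patients], days)

# Regression that controls for the other steps.
full = ols([[1, t, s] for t, s in patients], days)

print("treatment effect, steps omitted:   ", round(naive[1], 2))
print("treatment effect, steps controlled:", round(full[1], 2))
```

In this made-up sample, the regression that omits the other steps credits the treatment with more than its assumed 20 days, while the regression that includes them recovers the assumed figure.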

The control-group method is becoming obsolete.  Once upon a time, it was valuable because researchers couldn’t gather much data.  If we can’t control for many possible factors influencing cancer, then we might as well just compare the performance of two somewhat similar groups of patients; it’s better than nothing.  Today, however, computers and the Web enable us to analyze millions of observations.

--Leon Taylor, tayloralmaty@gmail.com


Notes

Consider the regression equation

Y(i) = a + b*X1(i) + c*X2(i) + … + z*Xk(i),

where Y(i) measures the number of days of survival despite cancer for patient i, i = 1, 2, …, n; Xk(i) is an explanatory variable for patient i; and the intercept is a.


Denote X1 as the independent variable identifying patients who took the experimental treatment.  Set X1(i) equal to one if patient i has taken the studied treatment; otherwise, set X1(i) = 0.  For patients who have not taken the treatment, the intercept is simply a.  This is the average number of days of survival for patients who didn’t take the treatment, when we set all explanatory variables equal to zero.  For patients who have taken the treatment, the intercept is a + b, since X1(i) = 1 for all of these patients.  If the treatment is successful, then b should be significantly greater than zero.          
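A minimal Python sketch of the simplest case, in which X1 is the only explanatory variable, may help.  The survival figures are invented for illustration, and the variable names are hypothetical.  With only the treatment dummy in the equation, the least-squares intercept a equals the average survival of untreated patients, and a + b equals the average survival of treated patients.

```python
from statistics import mean

# Hypothetical data: X1 = 1 marks patients who took the treatment.
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical days of survival for the same eight patients.
y = [200, 220, 180, 240, 260, 300, 250, 310]

# Closed-form least squares for one regressor plus an intercept.
x_bar, y_bar = mean(x1), mean(y)
b = (sum((x - x_bar) * (v - y_bar) for x, v in zip(x1, y))
     / sum((x - x_bar) ** 2 for x in x1))
a = y_bar - b * x_bar

# Group averages, computed directly for comparison.
untreated_mean = mean(v for x, v in zip(x1, y) if x == 0)
treated_mean = mean(v for x, v in zip(x1, y) if x == 1)

print("a     =", a, " untreated mean =", untreated_mean)
print("a + b =", a + b, " treated mean   =", treated_mean)
```

Adding further explanatory variables such as age would change the interpretation of a to the case in which those variables equal zero, as described above.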
