Do Central Asian researchers use outdated methods?
Sarcasm aside, the government’s strategic
plan is an old-fashioned industrial policy.
The idea is clear: The more factories we have, the more we can
produce. To support new factories, let’s
create a market for their output by discouraging imports. The fact that import substitution was an
abject failure in Latin America in the Fifties and Sixties, and that Asian
tigers like Hong Kong, Singapore and South
Korea have demonstrated the potential of export-led
growth, which Kazakhstan
already has, seems lost on Astana.
Most disheartening of all, the government
is ignoring a half-century of statistical research suggesting that the most
powerful cause of economic growth may be knowledge. Just building more factories won’t accomplish
much if we don’t have workers to run them.
If they already have their hands full, then the new factories will be excessive. To use them, we must empower our workers to
produce more. That requires
know-how. A strategic plan focusing on
human capital would emphasize education and research, particularly research
about research itself.
Along these lines, the government could
take a long step toward competitiveness (defined as potential economic growth
per capita, and measured by the potential value of output per worker) by
jettisoning clumsy methods of research.
Exhibit A is the control group.
The idea is familiar. To test a new treatment for cancer, divide up
the patients into two identical groups.
One takes the treatment; the other, a placebo. The treatment passes the test if the share of
patients surviving the cancer is larger for the first group than for the second
(the control group). Statistical tests can
detect whether the difference in survival rates may well be just a matter of
chance.
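To make that last step concrete, here is a minimal Python sketch of one such test, a one-sided two-proportion z-test. The function name and the patient counts are invented for illustration; they come from no actual study, and a real trial might well call for a different test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(survived_treat, n_treat, survived_ctrl, n_ctrl):
    """One-sided z-test of H0: equal survival rates, against
    H1: the treated group's survival rate is higher."""
    p_treat = survived_treat / n_treat
    p_ctrl = survived_ctrl / n_ctrl
    # Pooled survival rate under the null hypothesis of no difference.
    p_pool = (survived_treat + survived_ctrl) / (n_treat + n_ctrl)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_treat + 1 / n_ctrl))
    z = (p_treat - p_ctrl) / se
    # Probability of a gap at least this large arising by chance alone.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts: 60 of 100 treated patients survive, 45 of 100 controls.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.3f}")
```

A small p-value suggests that the gap in survival rates is unlikely to be just a matter of chance.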
The method is simple but flawed. First, use of a control group may destroy
more information than it provides. In
principle, the two groups should be identical; in reality, they vary, even if
slightly, in such key traits as age, sex, and record of health. Pooling the two groups, by permitting all
subjects to undergo the experimental treatment, tells us more about how the
success of the treatment (perhaps measured as the number of days of survival)
varies with age (say), given all other factors.
Enlarging the dataset also reduces the risk of concluding that the treatment
failed when it actually worked. (In statistical parlance, the difficulty with a small sample is
that the “power of the test” is low.)
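How much the sample size matters for power can be sketched in a few lines of Python. This is a rough approximation, assuming a one-sided two-proportion z-test and the normal approximation; the survival rates (60 percent and 45 percent) and the group sizes are hypothetical. It illustrates only the link between sample size and power, not the pooled design itself.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_treat, p_ctrl, n_per_group, alpha=0.05):
    """Approximate probability of detecting a true difference in survival
    rates with a one-sided two-proportion z-test."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha)  # critical value of the test
    diff = p_treat - p_ctrl
    p_bar = (p_treat + p_ctrl) / 2
    # Standard error of the difference if the treatment had no effect.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error of the difference if the assumed rates are true.
    se_alt = sqrt((p_treat * (1 - p_treat) + p_ctrl * (1 - p_ctrl)) / n_per_group)
    return 1 - norm.cdf((z_alpha * se_null - diff) / se_alt)


# Hypothetical survival rates; only the group size changes.
for n in (50, 200, 500):
    print(n, round(approx_power(0.60, 0.45, n), 3))
```

Under these assumed rates, the chance of detecting a treatment that really works rises steadily as each group grows.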
Yes, comparing the success rates of the two
groups tells us a little, but only a little.
In the pooled approach, we could extract the same information from the constant,
known as the intercept, in the estimated statistical equation. The intercept captures the amount of success
in the treatment that is not related to the explanatory variables like age and
sex. (For the gory details, see the
Notes.)
Another possibility is to use the equation
from the pooled group to determine the sub-samples for which success of the
treatment is smallest – say, men over the age of 80. This would be more informative than just
determining whether the treatment is a success in a group with the same average
age as the control group.
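As a sketch of that search, suppose the pooled equation has already been estimated. The coefficients below are hypothetical, invented purely for illustration (including the age-by-sex interaction term, which the discussion above does not specify); the code simply predicts days of survival for each sub-sample and picks the weakest.

```python
from itertools import product

# Hypothetical fitted coefficients for days of survival (illustration only).
COEF = {"intercept": 900.0, "age": -6.0, "male": -40.0, "age_x_male": -1.5}


def predicted_days(age, male):
    """Predicted days of survival for a patient of the given age and sex
    (male = 1 for men, 0 for women), using the assumed coefficients."""
    return (COEF["intercept"]
            + COEF["age"] * age
            + COEF["male"] * male
            + COEF["age_x_male"] * age * male)


# Sub-samples: ages 40, 50, ..., 90 crossed with sex.
groups = list(product(range(40, 91, 10), (0, 1)))
worst = min(groups, key=lambda g: predicted_days(*g))
print("Weakest predicted response (age, male):", worst)
```

With real data, the same loop would run over whatever traits the estimated equation includes.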
One argument for the control group might be
this: Patients who are willing to try an
experimental treatment are unusually pro-active about recovery, so they are
likely to take other steps to get well, some of which may succeed. If we pool all patients, then we might wind
up attributing to the experimental treatment success that was really due to
the other steps. We could avoid this
problem by randomly assigning patients to one group or the other regardless of
their enthusiasm for the studied treatment.
The problem with this argument is that
pro-active patients are likely to tell the doctor about the other steps that
they have taken if they are asked. With
this information, we can introduce explanatory variables into the pooled equation
– variables that control for the steps. If
this is not possible, we can still control for the bias due to the
self-selection of patients into the experiment; we would use statistical
methods that control for the omitted information. The best-known such model in economics is
“heckit,” named after the Nobel-laureate econometrician James Heckman.
The control-group method is becoming
obsolete. Once upon a time, it was
valuable because researchers couldn’t gather much data. If we can’t control for many possible factors
influencing cancer, then we might as well just compare the performance of two
somewhat similar groups of patients; it’s better than nothing. Today, however, computers and the Web enable
us to analyze millions of observations.
--Leon Taylor, tayloralmaty@gmail.com
Notes
Consider the regression equation
Y(i) = a + b*X1(i) + c*X2(i) + … + z*Xk(i),
where Y(i)
measures the number of days of survival despite cancer for patient i, i = 1, 2, …, n; X1(i), …, Xk(i) are the explanatory variables for patient i; and a is the intercept.
Denote X1 as the independent variable
identifying patients who took the experimental treatment. Set X1(i)
equal to one if patient i has taken
the studied treatment; otherwise, set X1(i)
= 0. For patients who have not taken the
treatment, the intercept is simply a. This is the average number of days of
survival for patients who didn’t take the treatment, when we set all
explanatory variables equal to zero. For
patients who have taken the treatment, the intercept is a + b, since X1(i) = 1
for all of these patients. If the
treatment is successful, then b
should be significantly greater than zero.
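A minimal Python sketch of this regression follows, using simulated patients. Everything numerical here is an assumption made for illustration: the "true" values of a, b and c, the single extra regressor (age), and the random noise term, which the equation above leaves out. The helper ols is a plain normal-equations solver written for this example, not a library routine.

```python
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(0)
TRUE_A, TRUE_B, TRUE_C = 400.0, 90.0, -3.0  # hypothetical values

X, y = [], []
for _ in range(2000):
    treated = random.randint(0, 1)    # X1(i): 1 if patient took the treatment
    age = random.uniform(30, 85)      # X2(i): age in years
    noise = random.gauss(0, 40)       # assumed random variation in survival
    X.append([1.0, float(treated), age])
    y.append(TRUE_A + TRUE_B * treated + TRUE_C * age + noise)

a_hat, b_hat, c_hat = ols(X, y)
print(f"a = {a_hat:.1f}, b = {b_hat:.1f}, c = {c_hat:.2f}")
```

The estimate of b is the shift in the intercept for treated patients; judging whether it is significantly greater than zero would also require its standard error, which this sketch does not compute.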