Paul Christian is an economist in the Development Impact Evaluation Unit of the World Bank Development Research Group and Chris Barrett is the Stephen B. & Janice G. Ashley Professor of Applied Economics and Management and an International Professor of Agriculture in the Charles H. Dyson School of Applied Economics and Management, a Professor of Economics in the Department of Economics, and a Fellow in the David R. Atkinson Center for a Sustainable Future at Cornell University.
In a recent blog post on this site, we wrote about the evidence for a connection between food aid and conflict. The methods described in that post relate to the recent attention given in the applied econometrics literature to instrumental variables approaches sometimes called Bartik or shift-share instruments. The food aid debate is an important context in which to consider the implications of our findings, both for this policy question and for applied econometrics in general.
The entry point to using instrumental variables is usually a research or policy question where it’s impossible or unethical to run a randomized controlled trial. It would be deeply unethical to randomize food aid. Instrumental variables can be a way to get something like experimental results without the experiment. Unfortunately, the “perfect” instruments often don’t vary as much as we would like. In the food aid example, US wheat production could plausibly be related to the quantity of aid shipped per year if food aid volumes fluctuate with domestic production surpluses. And if production is driven by US weather, it might be random enough to be uncorrelated with conflict except through food aid, especially if one controls (directly or indirectly) for global market prices that might be affected by US agricultural output and might affect conflict propensity. The problem is that US wheat production is realized only once per year and does not vary among potential recipient countries.
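To see the problem concretely, here is a minimal simulation sketch (in Python, with made-up numbers rather than actual data). An instrument that varies only by year is perfectly absorbed by year fixed effects, so within-year comparisons have nothing left to work with:

```python
import numpy as np
import pandas as pd

# Hypothetical panel: 50 recipient countries observed for 30 years.
rng = np.random.default_rng(0)
countries, years = 50, 30
panel = pd.DataFrame({
    "country": np.repeat(np.arange(countries), years),
    "year": np.tile(np.arange(years), countries),
})
# The instrument (think US wheat production) varies only by year.
wheat = rng.normal(size=years)
panel["z"] = wheat[panel["year"]]

# Year dummies absorb the instrument completely: after partialling out
# year fixed effects, no variation remains for a first stage.
year_dummies = pd.get_dummies(panel["year"], dtype=float).to_numpy()
coef, *_ = np.linalg.lstsq(year_dummies, panel["z"].to_numpy(), rcond=None)
resid = panel["z"].to_numpy() - year_dummies @ coef
print(resid.std())  # ~0: only cross-country variation could rescue identification
```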
Examples of instruments with limited variation are common for important policy questions. Economic shocks in migrant-sending countries might affect migration and not be otherwise correlated with wages in migrant-receiving countries, but they might not happen frequently enough to justify treating the data covering these events as large samples. The timing of an industrial plant opening might be random relative to household health and correlated with pollution, but unless lots of plants open or close, everyone is either affected or not at the same time. Limited variation in the instrument creates more room for omitted variables or spurious correlation.
Several recent working papers shed light on the problem. Goldsmith-Pinkham et al. (2018) helpfully clarify the relationships underlying models where the effects of economic shocks are measured by comparing groups that are more or less exposed to the shocks. They show that these results are free of omitted variables bias only if one is willing to assume that the factors that determine exposure to the shock are unrelated to the outcome variable. In technical terms, interacting the instrument with an exposure variable only buys power, not identification.
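A stylized simulation illustrates their point (all numbers hypothetical; this is a sketch, not their estimator or data). Suppose the true effect of the endogenous variable is zero, but the shock also affects the outcome directly in proportion to exposure, so the shares are not excludable:

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 200, 20
exposure = rng.uniform(size=n_units)     # cross-sectional "shares"
shock = rng.normal(size=n_periods)       # common time-series "shift"
E = np.repeat(exposure, n_periods)       # long-panel layout
S = np.tile(shock, n_units)
z = E * S                                # the shift-share instrument

x = z + rng.normal(size=n_units * n_periods)  # strong first stage by design
# True effect of x on y is ZERO, but the shock also reaches y directly,
# in proportion to exposure: the shares are correlated with the error.
y = 0.0 * x + 0.5 * E * S + rng.normal(size=n_units * n_periods)

# Just-identified 2SLS collapses to a ratio of covariances
# (constants and fixed effects omitted to keep the sketch short).
beta_iv = (z @ y) / (z @ x)
print(beta_iv)  # ~0.5, not 0: the estimate is identified off the shares
```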
Jaeger et al. (2018) show that another problem for these models is that, for many variables, responses to shocks persist over months or years, so outcomes are correlated over time. Treating this year’s wheat production and conflict (or, in their case, migration and wages) as a fresh random draw, independent of last year’s, will lead to bias, because what happened last year is still showing up today. They show that in the migration literature they study, differencing the outcome variable over time can help. Unfortunately, first differencing the conflict variable in the food aid case does not seem to work well. When the trends are nonlinear, the differencing correction doesn’t eliminate the bias we worry about.
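A short calculation shows why differencing handles linear trends but not nonlinear ones (a toy sketch, not their specification):

```python
import numpy as np

# Why first differencing helps with linear trends but not nonlinear ones.
t = np.arange(40, dtype=float)
linear = 2.0 * t                      # linear confounding trend
inv_u = -((t - 20.0) ** 2)            # inverse-U trend, like the conflict series

print(np.diff(linear).std())          # 0.0: differencing removes a linear trend
print(np.corrcoef(np.diff(inv_u), t[1:])[0, 1])
# -1.0: the differenced inverse-U is itself a linear trend, so it still
# correlates with any other trending variable and the bias survives.
```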
In our paper about the food aid-conflict case, we worry that nonlinearly trending variables increase the risk of spurious correlation. We showed that any variable with a trend similar to conflict’s would “work” as an instrument, even though the interaction allows both the reduced form and first stage regressions to include time fixed effects. It turns out that if you instrument for food aid with global sales of audio cassette tapes, both stages of the IV procedure “work.” Audio cassette sales predict aid flows to regular aid recipients, and conflict in frequent recipients of food aid is correlated with audio cassette sales, even after controlling for country and year fixed effects. Since audio cassette tape sales are clearly not causally associated with either aid or conflict, this example shows how year fixed effects don’t solve the spurious trend problem that creates the identification concern. The problem is that time fixed effects only control for the average conflict experience across all types of countries, and countries are clearly heterogeneous in their risk of conflict along a dimension that is correlated with the amount of aid they receive.
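A small simulation, again with made-up numbers rather than our actual data, reproduces the mechanism. Countries differ in their propensity to receive aid, both aid and conflict follow the same inverse-U time path in proportion to that propensity, the true effect of aid is zero, and a “cassette sales” series with the same shape serves as the shift:

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 60, 30
p = rng.uniform(size=n)                       # propensity to receive aid
trend = -((np.arange(T) - T / 2) ** 2)        # shared inverse-U time path
trend = (trend - trend.mean()) / trend.std()
cassette = trend + 0.1 * rng.normal(size=T)   # "cassette sales": same shape

P = np.repeat(p, T); W = np.tile(trend, n); Z = P * np.tile(cassette, n)
aid = P * W + rng.normal(size=n * T)
conflict = 0.4 * P * W + rng.normal(size=n * T)  # NO causal effect of aid

def within(v):  # two-way (country and year) fixed-effects transformation
    m = v.reshape(n, T)
    return (m - m.mean(1, keepdims=True) - m.mean(0) + m.mean()).ravel()

z, a, c = within(Z), within(aid), within(conflict)
print("first stage:", (z @ a) / (z @ z))      # clearly nonzero
print("IV estimate:", (z @ c) / (z @ a))      # ~0.4, despite a true effect of 0
```

The interaction of country heterogeneity with the shared trend survives the two-way demeaning, which is exactly why the fixed effects offer no protection here.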
We underscore this risk of mistaken inference with a simple Monte Carlo exercise in which we can replicate the Nunn and Qian (NQ) results even if US food aid agencies prefer to send food aid to conflict-affected countries, as is the stated policy of the US food aid program (and the opposite of how NQ explain the sign shift between their OLS and 2SLS estimates), and even if food aid has no causal effect on civil conflict in recipient countries, or actually prevents rather than prolongs it.
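The sketch below gives the flavor of that exercise (stylized, not our actual Monte Carlo design or the NQ data). Aid is targeted at conflict-prone places, its true effect is negative, and yet the trend-driven IV estimate comes out reliably positive:

```python
import numpy as np

# Stylized Monte Carlo: aid is TARGETED at conflict-prone places, its true
# effect is NEGATIVE (-0.2), yet the trend-driven IV is positive on average.
rng = np.random.default_rng(4)
n, T, reps = 60, 30, 500
trend = -((np.arange(T) - T / 2) ** 2)
trend = (trend - trend.mean()) / trend.std()

def within(v):  # two-way fixed-effects transformation, balanced panel
    m = v.reshape(n, T)
    return (m - m.mean(1, keepdims=True) - m.mean(0) + m.mean()).ravel()

est = []
for _ in range(reps):
    p = rng.uniform(size=n)                    # aid-recipient propensity
    P, W = np.repeat(p, T), np.tile(trend, n)
    prop = P * W + rng.normal(size=n * T)      # trending conflict propensity
    aid = 0.5 * prop + rng.normal(size=n * T)  # agencies target conflict
    conflict = -0.2 * aid + prop               # aid PREVENTS conflict
    wheat = np.tile(trend + 0.1 * rng.normal(size=T), n)
    z, a, c = within(P * wheat), within(aid), within(conflict)
    est.append((z @ c) / (z @ a))
print(np.mean(est))  # positive: sign-flipped relative to the truth
```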
Another working paper by Jaeger, Joyce, and Kaestner makes a similar diagnosis of the problem of identification from trends in the context of an earlier paper claiming to show that an MTV reality show changed pregnancy rates. They show that the only instrument available is the start date of the show. Interacting with the ratings share of MTV before the show airs introduces bias from demographic trends. Because there is a clear pre-treatment period, they are able to show that these trends would identify a treatment effect even before the show aired. This is a useful placebo test in cases where an un-treated period is available, but in cases like the food aid one, this is not always the case.
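Here is what such a placebo test looks like in a toy setup (hypothetical names and numbers). The outcome trends more steeply where exposure is high before anyone is treated, so the pre-period “reduced form” picks up a sizable effect even though no treatment has occurred:

```python
import numpy as np

# Placebo test sketch: run the reduced form on pre-treatment data only.
rng = np.random.default_rng(5)
n, T0 = 500, 10                       # units, pre-treatment periods
exposure = rng.uniform(size=n)        # e.g., pre-show MTV ratings share
trend = np.linspace(0, 1, T0)         # demographic trend in the outcome
# Outcome trends more steeply where exposure is high, before treatment:
y_pre = exposure[:, None] * trend[None, :] + 0.1 * rng.normal(size=(n, T0))

# "Reduced form" in the pre-period: change in y regressed on exposure.
dy = y_pre[:, -1] - y_pre[:, 0]
e = exposure - exposure.mean()
b = e @ (dy - dy.mean()) / (e @ e)
print(b)  # ~1: a nonzero placebo "effect" with no treatment at all
```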
The risk of spurious correlation may sound hypothetical, because readers and referees can use their judgment about whether correlations are absurd. Wheat production is plausibly related to wheat aid, while cassette sales are not. But the risk comes from natural experiments that seem just plausible enough. While researching variables that could be spuriously correlated with conflict, we searched for variables with an inverse-U shape. On the World Development Indicators page, the real interest rate for lending shows the same pattern in time series. This led us to believe that one could easily construct an argument that conflict was associated with GDP growth by instrumenting GDP growth with real interest rates. We knew it would “work,” because the IV strategy would pick up the coinciding trends in the conflict and interest rate variables. Remarkably, we discovered that this result, which we had reverse-engineered from knowing the risks in the strategy, was already published. In a 2013 Journal of Development Economics paper, real interest rates were used to instrument for GDP growth in a regression of conflict on growth. When real interest rates are interacted with ethnic fractionalization, the association becomes stronger, and the usual diagnostic tests for IV pass.
We tell this story not because we believe the authors were up to something nefarious, cooking up IV results from a spurious correlation as we did to demonstrate the problem. But it raises an alarming possibility nonetheless. Researchers interested in the drivers of conflict will try many instruments that seem plausible. The ones that show no association will be abandoned and never reported, much less published. But the published results will include two types of variables: those that reflect truly good natural experiments and those that simply have a U- or inverse-U-shaped trend over time. The usual diagnostic tests do not help to distinguish which is which.
The takeaway from all of these papers is that finding variation that can be used to establish causality is likely even harder than we thought. That does not mean we should not use instrumental variables. But we should treat them with the same skepticism we apply to OLS regressions. If year fixed effects don’t eliminate endogeneity in an OLS regression, one needs a clear model for why they would solve the problem in the IV regression.
Paul and Chris,
You guys write
“The timing of an industrial plant opening might be random relative to household health and correlated with pollution, but unless lots of plants open or close, everyone is either affected or not at the same time. Limited variation in the instrument creates more room for omitted variables or spurious correlation.”
Isn’t that more of a problem of instrument strength, given that the IV affects units in a highly clustered way?
Marc, your comment is on point. The limited strength of the time-varying (but almost surely exogenous) instrument is precisely what motivates interacting it with the cross-sectional exposure variable to form the Bartik/shift-share instrument. But endogeneity can inadvertently creep back in through that interaction, boosting the apparent strength of a spurious instrument in the weak instrument tests too.
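To illustrate with the same kind of stylized simulation as in the post (made-up numbers): the shared trend does not just produce a first stage, it produces one that sails past conventional weak instrument thresholds.

```python
import numpy as np

# First-stage F for the spurious "cassette sales" instrument from the post.
rng = np.random.default_rng(6)
n, T = 60, 30
p = rng.uniform(size=n)
trend = -((np.arange(T) - T / 2) ** 2)
trend = (trend - trend.mean()) / trend.std()
P, W = np.repeat(p, T), np.tile(trend, n)
Z = P * np.tile(trend + 0.1 * rng.normal(size=T), n)
aid = P * W + rng.normal(size=n * T)

def within(v):  # two-way fixed-effects transformation
    m = v.reshape(n, T)
    return (m - m.mean(1, keepdims=True) - m.mean(0) + m.mean()).ravel()

z, a = within(Z), within(aid)
b = (z @ a) / (z @ z)                 # first-stage coefficient
resid = a - b * z
se = np.sqrt(resid @ resid / (len(a) - 1) / (z @ z))
print((b / se) ** 2)  # first-stage F: far past conventional thresholds
```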