Instrumental variable

What Is an Instrumental Variable?

An instrumental variable (IV) is a variable used in econometrics and causal inference to estimate the causal effect of one variable on another when a direct comparison in observational data would be biased. This bias often arises from endogeneity, a situation where an explanatory variable in a regression analysis is correlated with the error term. An instrument isolates the exogenous part of the variation in the problematic explanatory variable, so that its effect on the outcome can be estimated consistently.

History and Origin

The concept of instrumental variables has roots in early 20th-century econometrics, particularly within efforts to identify economic relationships such as supply and demand curves. The method was first explicitly proposed by American economist Philip G. Wright in 1928, appearing in an appendix to his book, The Tariff on Animal and Vegetable Oils. Wright needed to estimate the elasticity of demand for flaxseed but faced the challenge that observed prices and quantities were simultaneously determined by both supply and demand, leading to the identification problem. His work introduced a formal econometric method to address this, using variables that influenced supply but not demand directly as "instruments." The term "instrumental variables" itself was later coined by Olav Reiersøl in 1945.

Key Takeaways

Instrumental variables are a method for estimating causal effects when direct observation is complicated by endogeneity.
They are crucial for addressing issues like omitted variable bias, measurement error, and reverse causality.
A valid instrumental variable must satisfy two key conditions: relevance (correlated with the endogenous variable) and exogeneity (uncorrelated with the error term and affecting the outcome only through the endogenous variable).
The technique is widely applied in fields like economics, public health, and social sciences for policy evaluation.

Formula and Calculation

The most common method for estimating instrumental variable models is Two-Stage Least Squares (2SLS). Consider a scenario where you want to estimate the causal effect of an endogenous variable (D) on an outcome (Y), but (D) is correlated with the error term (\epsilon) of the outcome equation. You have one or more exogenous instruments (Z) that are excluded from that equation.

The two stages are:

First Stage: Regress the endogenous variable (D) on the instrumental variables (Z) and any other exogenous covariates (X). This stage estimates the part of (D) that is explained by the instruments and is uncorrelated with the error term.

D_i = \gamma_0 + \gamma_1 Z_i + \gamma_2 X_i + u_i

Here, (D_i) is the endogenous variable for individual (i), (Z_i) is the instrumental variable(s), (X_i) are other exogenous controls, and (u_i) is the error term for the first stage. From this regression, we obtain the predicted values of (D), denoted as (\hat{D}_i).

Second Stage: Regress the outcome variable (Y) on the predicted values of the endogenous variable (\hat{D}_i) and the exogenous covariates (X).

Y_i = \beta_0 + \beta_1 \hat{D}_i + \beta_2 X_i + v_i

Here, (Y_i) is the outcome variable, (\hat{D}_i) is the predicted value from the first stage, (X_i) are the same exogenous controls used in the first stage, and (v_i) is the error term for the second stage. The estimate of (\beta_1) from this regression is the 2SLS estimate of the causal effect of (D) on (Y).
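The two stages above can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not a reference implementation: the function name `two_stage_least_squares`, the coefficients in the data-generating process, and the variable `confounder` are all assumptions made up for the example.

```python
import numpy as np

def two_stage_least_squares(y, d, z, x=None):
    """Manual 2SLS for one endogenous regressor d, instrument(s) z, controls x.

    Returns the second-stage coefficients ordered as
    [intercept, effect of d, controls...].
    """
    n = len(y)
    const = np.ones((n, 1))
    z = np.asarray(z).reshape(n, -1)
    x = np.empty((n, 0)) if x is None else np.asarray(x).reshape(n, -1)

    # First stage: regress D on a constant, the instruments Z and the controls X.
    first_stage_design = np.hstack([const, z, x])
    gamma, *_ = np.linalg.lstsq(first_stage_design, d, rcond=None)
    d_hat = first_stage_design @ gamma

    # Second stage: regress Y on a constant, the fitted values D-hat and X.
    second_stage_design = np.hstack([const, d_hat.reshape(n, 1), x])
    beta, *_ = np.linalg.lstsq(second_stage_design, y, rcond=None)
    return beta

# Simulated data (hypothetical): an unobserved confounder makes D endogenous.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)            # observed exogenous control
z = rng.normal(size=n)            # instrument: shifts D, excluded from Y
confounder = rng.normal(size=n)   # unobserved, affects both D and Y
d = 0.8 * z + 0.5 * x + confounder + rng.normal(size=n)
y = 1.0 + 2.0 * d + 0.3 * x + 1.5 * confounder + rng.normal(size=n)  # true beta_1 = 2.0

beta_iv = two_stage_least_squares(y, d, z, x)
ols_design = np.column_stack([np.ones(n), d, x])
beta_ols, *_ = np.linalg.lstsq(ols_design, y, rcond=None)

print("2SLS estimate of beta_1:", beta_iv[1])
print("OLS estimate of beta_1: ", beta_ols[1])
```

In this setup the 2SLS estimate should land close to the true value of 2.0, while OLS is pulled upward by the confounder. Running the second stage by hand reproduces the point estimate only: the standard errors reported by an ordinary second-stage regression are not valid, because they ignore that (\hat{D}_i) is itself estimated, so dedicated IV routines are preferable for inference.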

Interpreting the Instrumental Variable

Interpreting the results from an instrumental variable analysis requires careful consideration of the assumptions. The estimated coefficient (\beta_1) from the second stage represents the causal effect of the endogenous variable on the outcome, purged of the endogeneity bias. For this interpretation to hold, the chosen instrumental variable(s) must be both relevant and exogenous.

Relevance: The instrument must be sufficiently correlated with the endogenous variable. A weak correlation means the instrument does not provide enough variation to accurately identify the causal effect, leading to unreliable estimates.
Exogeneity: The instrument must not directly influence the outcome variable except through the endogenous variable, nor should it be correlated with unobserved factors affecting the outcome. This is a critical and often untestable assumption that relies heavily on theoretical justification and subject-matter knowledge.

If these conditions are met, the instrumental variable estimate is a consistent estimate of the causal effect, which is a significant advantage over simple ordinary least squares (OLS) regression when endogeneity is present.
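To see why both conditions matter, it can help to write out the simplest case. The expression below is a sketch under simplifying assumptions that are not part of the general model above: a single instrument (Z), no additional controls, and an outcome equation Y_i = \beta_0 + \beta_1 D_i + \epsilon_i. In that case the IV estimator is a ratio of sample covariances, and substituting the outcome equation gives its large-sample limit:

```latex
\[
\hat{\beta}_1^{\mathrm{IV}}
  = \frac{\widehat{\operatorname{Cov}}(Z, Y)}{\widehat{\operatorname{Cov}}(Z, D)}
  \;\xrightarrow{\,p\,}\;
  \beta_1 + \frac{\operatorname{Cov}(Z, \epsilon)}{\operatorname{Cov}(Z, D)}
\]
```

Exogeneity sets the numerator of the second term to zero, so the estimator converges to (\beta_1). Relevance keeps the denominator away from zero; when the denominator is small, even a slight violation of exogeneity is magnified into a large bias.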

Hypothetical Example

Imagine a researcher wants to determine the causal effect of attending an intensive financial literacy program on an individual's savings rate. A simple regression of savings rate on program attendance might suffer from endogeneity because individuals who choose to attend the program might already be more financially savvy or motivated, leading to an omitted variable bias.

To address this, an instrumental variable could be used. Suppose the program randomly offered free slots to a subset of eligible individuals via a lottery system. Being offered a free slot (regardless of whether they attended) could serve as an instrumental variable:

Relevance: Being offered a free slot is likely to increase the probability of attending the financial literacy program.
Exogeneity: The random offer of a free slot, by itself, should not directly influence an individual's savings rate other than through its effect on program attendance. It is uncorrelated with unobserved factors like inherent financial discipline.

First Stage: Regress "Program Attendance" (endogenous variable) on "Offer of Free Slot" (instrumental variable) and other demographic controls. This yields the predicted probability of attendance based on the random offer and the controls, stripped of the self-selected component.

Second Stage: Regress "Savings Rate" (outcome variable) on the predicted probability of attendance from the first stage and the same demographic controls. The coefficient on this predicted probability then estimates the causal effect of attending the financial literacy program on the savings rate, free from the bias of self-selection.
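The lottery example can be simulated to see the logic at work. The sketch below is purely illustrative: every number in it (the offer's effect on attendance, the 3-point true effect on the savings rate, the role of `motivation`) is invented, and the demographic controls are dropped for brevity. With a single binary instrument and no controls, 2SLS reduces to a ratio of two differences in means, often called the Wald estimator.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

motivation = rng.normal(size=n)       # unobserved financial motivation
offer = rng.integers(0, 2, size=n)    # random lottery offer (the instrument)

# Attendance depends on the offer and on motivation (self-selection).
attend_prob = np.clip(0.15 + 0.50 * offer + 0.10 * motivation, 0.0, 1.0)
attend = (rng.random(n) < attend_prob).astype(float)

true_effect = 3.0  # hypothetical gain in savings rate, in percentage points
savings_rate = 10.0 + true_effect * attend + 2.0 * motivation + rng.normal(size=n)

# Naive comparison of attendees and non-attendees (contaminated by motivation).
naive = savings_rate[attend == 1].mean() - savings_rate[attend == 0].mean()

# First stage: effect of the offer on attendance.
first_stage = attend[offer == 1].mean() - attend[offer == 0].mean()
# Reduced form: effect of the offer on the savings rate.
reduced_form = savings_rate[offer == 1].mean() - savings_rate[offer == 0].mean()
# IV (Wald) estimate: reduced form scaled by the first stage.
iv_estimate = reduced_form / first_stage

print("naive difference:", naive)
print("first stage:     ", first_stage)
print("IV estimate:     ", iv_estimate)
```

Because motivated people both attend more and save more, the naive difference overstates the effect, while the ratio of the reduced form to the first stage should recover something close to the assumed true effect.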

Practical Applications

Instrumental variables are widely applied across various fields to establish causal relationships in observational data where controlled experiments are not feasible.

Policy Evaluation: Governments and research institutions use instrumental variables to assess the true impact of policies and interventions. For instance, evaluating the effect of education policies on student outcomes or health reforms on public health indicators often uses instrumental variables to account for confounding factors. The United Nations Development Programme (UNDP) highlights IVs as a quasi-experimental impact evaluation method when randomized controlled trials are not possible, helping to determine the net change caused by an intervention and how much of observed outcomes can be attributed to it.⁷
Economics: Beyond demand and supply estimation, IVs are used to study the effect of schooling on earnings, the impact of minimum wage on employment, or the causal link between financial regulations and market stability.
Public Health: Researchers use instrumental variables to estimate the impact of medical treatments or health behaviors (e.g., smoking) on health outcomes, controlling for unobserved patient characteristics or self-selection bias.
Social Sciences: In sociology and political science, instrumental variables can help determine the causal effects of social programs, voting laws, or community interventions on various social indicators. An online guide from BetterEvaluation, a global community for evaluation, provides further examples of how instrumental variables can be applied to evaluate outcomes in areas such as high school employment and extracurricular activities.⁶

Limitations and Criticisms

While powerful, instrumental variable analysis is not without its limitations and criticisms.

One of the most significant challenges is the "weak instruments" problem. This occurs when the instrumental variables are only weakly correlated with the endogenous explanatory variable. When instruments are weak, the instrumental variable estimator can be severely biased towards the ordinary least squares estimate, leading to unreliable results and inflated standard errors.⁵ This means that statistical significance and confidence intervals can be misleading, potentially leading to incorrect conclusions in hypothesis testing.⁴ Researchers advocate for robust tests and a higher standard of instrument strength to mitigate these issues, as conventional thresholds for instrument strength may still lead to problematic results.²,³
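A small Monte Carlo experiment makes the weak-instrument bias visible. The sketch below is an assumption-laden illustration: the sample size, the first-stage coefficients (1.0 for the "strong" case, 0.05 for the "weak" case), and the helper name `simulate_iv_estimates` are arbitrary choices for the example. The true effect is set to 1, and the confounder pushes OLS above that value.

```python
import numpy as np

def simulate_iv_estimates(first_stage_coef, n=200, reps=2000, seed=0):
    """Return IV and OLS slope estimates across simulated samples (true effect = 1)."""
    rng = np.random.default_rng(seed)
    iv, ols = np.empty(reps), np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=n)
        confounder = rng.normal(size=n)
        d = first_stage_coef * z + confounder + rng.normal(size=n)
        y = 1.0 * d + confounder + rng.normal(size=n)
        # Single instrument, no controls: IV slope is Cov(Z, Y) / Cov(Z, D).
        iv[r] = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]
        ols[r] = np.cov(d, y)[0, 1] / np.var(d, ddof=1)
    return iv, ols

strong_iv, strong_ols = simulate_iv_estimates(first_stage_coef=1.0)
weak_iv, weak_ols = simulate_iv_estimates(first_stage_coef=0.05)

# Medians are reported because this IV estimator has very heavy tails.
print("strong instrument: IV median", np.median(strong_iv),
      "| OLS median", np.median(strong_ols))
print("weak instrument:   IV median", np.median(weak_iv),
      "| OLS median", np.median(weak_ols))
```

With the strong instrument the typical IV estimate sits near the true effect, while with the weak instrument it drifts toward the biased OLS value and becomes far more dispersed.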
Another criticism revolves around the untestable nature of the exogeneity assumption. While relevance can often be statistically tested (e.g., via a first-stage F-statistic), proving that an instrument is truly uncorrelated with the error term and affects the outcome only through the endogenous variable is often a theoretical argument rather than an empirical one.¹ If this core assumption is violated, the instrumental variable estimates will be biased and inconsistent, undermining the validity of the causal inference. The selection of appropriate instruments thus requires deep subject-matter knowledge and careful consideration of potential alternative pathways or confounding factors that might invalidate the chosen instrument.
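The first-stage F-statistic mentioned above compares the first-stage regression with and without the instruments. The following is a minimal sketch that assumes homoskedastic errors; the function name `first_stage_f` and the simulated coefficients are hypothetical. A commonly cited rule of thumb treats values below about 10 as a warning sign, although, as noted earlier, conventional thresholds may not be strict enough.

```python
import numpy as np

def first_stage_f(d, z, x=None):
    """F-statistic for the joint null that all instrument coefficients are zero."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    z = np.asarray(z).reshape(n, -1)
    x = np.empty((n, 0)) if x is None else np.asarray(x).reshape(n, -1)
    restricted = np.hstack([np.ones((n, 1)), x])   # constant and controls only
    unrestricted = np.hstack([restricted, z])      # plus the instruments

    def rss(design):
        # Residual sum of squares from regressing d on the given design matrix.
        coef, *_ = np.linalg.lstsq(design, d, rcond=None)
        resid = d - design @ coef
        return resid @ resid

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    q = z.shape[1]                      # number of excluded instruments
    dof = n - unrestricted.shape[1]     # residual degrees of freedom
    return ((rss_r - rss_u) / q) / (rss_u / dof)

# Hypothetical data: the same instrument with a strong and a weak first stage.
rng = np.random.default_rng(1)
n = 1_000
z = rng.normal(size=n)
noise = rng.normal(size=n)
f_strong = first_stage_f(0.5 * z + noise, z)
f_weak = first_stage_f(0.02 * z + noise, z)
print("F (strong first stage):", f_strong)
print("F (weak first stage):  ", f_weak)
```

No comparable statistic exists for exogeneity with a single instrument, which is why that assumption has to be defended by argument.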

Instrumental Variable vs. Ordinary Least Squares

The fundamental difference between an instrumental variable (IV) approach and ordinary least squares (OLS) lies in how they handle endogeneity. OLS regression provides unbiased and consistent estimates when the explanatory variables are uncorrelated with the error term (i.e., they are exogenous). However, in many real-world scenarios, particularly in social sciences and economics, this assumption is violated due to issues like omitted variable bias, reverse causality, or measurement error.

When endogeneity is present, OLS estimates are biased and inconsistent, meaning they will not converge to the true causal effect even with a very large sample size. Instrumental variables, conversely, are specifically designed to address this problem. By using an instrument that is correlated with the endogenous variable but uncorrelated with the error term, IV methods can "cleanse" the endogenous variable of its problematic correlation with the error term, allowing for consistent estimation of the true causal effect. The trade-off is that IV estimates can be less precise (have larger standard errors) than OLS estimates, especially with weak instruments, and the validity of the IV estimates critically depends on the untestable exogeneity assumption of the instrument. The choice between IV and OLS thus depends on the presence and severity of endogeneity in the model specification.
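The precision trade-off can be illustrated by simulating a case where (D) is in fact exogenous, so that both estimators are centered on the truth. This is a sketch with invented parameters (sample size, first-stage coefficient of 0.5, true effect of 1.0), not a general result about any particular dataset.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 500, 1000
ols, iv = np.empty(reps), np.empty(reps)

for r in range(reps):
    z = rng.normal(size=n)
    d = 0.5 * z + rng.normal(size=n)   # D is exogenous here: no confounder
    y = 1.0 * d + rng.normal(size=n)   # true effect = 1.0
    ols[r] = np.cov(d, y)[0, 1] / np.var(d, ddof=1)
    iv[r] = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print("OLS: mean", ols.mean(), "std", ols.std())
print("IV:  median", np.median(iv), "std", iv.std())
```

Both estimators recover the true effect on average in this setting, but the IV estimates are noticeably more spread out, because IV uses only the part of the variation in (D) that is driven by the instrument. When there is no endogeneity to correct, that loss of precision buys nothing.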

FAQs

What are the two main conditions for a valid instrumental variable?

A valid instrumental variable must satisfy two conditions: relevance and exogeneity. Relevance means the instrument must be correlated with the endogenous explanatory variable. Exogeneity means the instrument must not be correlated with the error term in the outcome equation, and it should not affect the outcome variable through any channel other than the endogenous variable.

Why is instrumental variable analysis used?

Instrumental variable analysis is used to estimate the causal effect of a variable on an outcome when direct ordinary least squares regression would produce biased results due to endogeneity. This endogeneity can arise from factors like unmeasured confounding variables, simultaneity (where variables influence each other), or errors in measuring variables.

Can an instrumental variable be chosen arbitrarily?

No, an instrumental variable cannot be chosen arbitrarily. Its validity depends critically on satisfying the relevance and exogeneity conditions, especially the latter. Choosing a poor or invalid instrument can lead to more biased results than if ordinary least squares were used directly. Researchers often spend considerable effort justifying their choice of instruments based on economic theory, institutional knowledge, or natural experiments.

What is the "weak instruments" problem?

The "weak instruments" problem occurs when the instrumental variable is only slightly correlated with the endogenous variable it is meant to instrument for. This weak correlation can lead to instrumental variable estimates that are biased toward the ordinary least squares estimate and highly imprecise, with confidence intervals that can be misleading, making it difficult to draw reliable conclusions about statistical significance.