Lagged variables

What Are Lagged Variables?

In the field of econometrics, a lagged variable is a past observation of a variable used as a predictor or explanatory factor in a current statistical model. These variables are fundamental to understanding dynamic relationships in time series data, where the present state of a system is influenced by its past states. For instance, today's stock price might be affected by yesterday's price, or current inflation by previous periods' money supply. The inclusion of lagged variables allows researchers to capture the inertia, momentum, or delayed effects inherent in many financial and economic processes, providing a more comprehensive view than models that only consider contemporaneous variables.

History and Origin

The concept of using past values to explain current phenomena has roots in early statistical and economic thought. The formal application of lagged variables became prominent with the development of time series analysis and econometrics in the early to mid-20th century. Pioneers like Udny Yule introduced autoregression models in the 1920s to predict future values based on past observations, notably in his analysis of sunspots.¹⁴ Further significant advancements in the use of lagged variables came with the work of George Box and Gwilym Jenkins in the 1970s, who formalized procedures for building models that heavily rely on lagged terms, such as Autoregressive Moving Average (ARMA) models.¹³ This development provided a structured approach for incorporating temporal dependencies, moving beyond static analyses to capture dynamic behavior.

Key Takeaways

Lagged variables are past observations of a variable used in a current statistical model to account for temporal dependence.
They are crucial in econometrics and forecasting to capture dynamic behavior, delayed effects, and persistence in data.
The inclusion of lagged variables can help mitigate issues like autocorrelation and, under certain conditions, endogeneity.
Determining the optimal number of lags to include is a critical step in model specification, often guided by statistical criteria.
Common applications include macroeconomic forecasting, policy impact analysis, and financial market modeling.

Formula and Calculation

Lagged variables are typically incorporated into regression models. In a simple autoregressive model, where a variable's current value is explained by its own past values, the formula takes this form:

$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \epsilon_t$

Where:

\( Y_t \) is the dependent variable at time \( t \).
\( Y_{t-1}, Y_{t-2}, \dots, Y_{t-p} \) are the lagged values of the dependent variable at periods \( t-1, t-2, \dots, t-p \), respectively.
\( \beta_0 \) is the intercept.
\( \beta_1, \beta_2, \dots, \beta_p \) are the coefficients representing the impact of each lagged term.
\( p \) is the number of lags.
\( \epsilon_t \) is the error term at time \( t \).
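Before a model like this can be estimated, each observation has to be paired with its own lags. The following is a minimal sketch of that step in plain Python; the helper name `make_lagged_rows` and the sample series are hypothetical and chosen only for illustration.

```python
def make_lagged_rows(y, p):
    """Pair each y[t] with its p most recent lags [y[t-1], ..., y[t-p]].

    The first p observations are dropped because their lags are unavailable.
    """
    if p < 1 or p >= len(y):
        raise ValueError("p must satisfy 1 <= p < len(y)")
    return [(y[t], [y[t - k] for k in range(1, p + 1)]) for t in range(p, len(y))]


# Hypothetical series, oldest observation first.
series = [5.0, 6.0, 8.0, 7.0, 9.0]
for target, lags in make_lagged_rows(series, p=2):
    print(target, lags)
# 8.0 [6.0, 5.0]
# 7.0 [8.0, 6.0]
# 9.0 [7.0, 8.0]
```

Note that with two lags, five observations yield only three usable rows: every added lag costs one observation at the start of the sample.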

In distributed lag models, the dependent variable is explained by the current and lagged values of one or more independent variables:

$Y_t = \beta_0 + \alpha_0 X_t + \alpha_1 X_{t-1} + \dots + \alpha_q X_{t-q} + \epsilon_t$

Where:

\( X_t, X_{t-1}, \dots, X_{t-q} \) are the current and lagged values of the independent variable \( X \).
\( \alpha_0, \alpha_1, \dots, \alpha_q \) are the coefficients for the current and lagged independent variables.
\( q \) is the number of lags for the independent variable.

Interpreting the Lagged Variable

Interpreting lagged variables involves understanding the temporal dynamics of a relationship. A statistically significant coefficient on a lagged variable suggests that the past value of that variable has a measurable association with the current outcome. For example, if a model predicts consumer spending (\( Y_t \)) based on past income (\( X_{t-1} \)), a positive coefficient on \( X_{t-1} \) indicates that an increase in income in the previous period is associated with higher consumer spending in the current period.

The magnitude of the coefficient indicates the strength of this delayed impact. Analysts also look at the cumulative effect by summing coefficients across multiple lags to understand the total impact over time. This interpretation is crucial for policymaking, as it helps determine how long it takes for interventions, such as changes in monetary policy or fiscal policy, to affect economic outcomes. Understanding the appropriate lag length and the persistence of effects is vital for accurate analysis and prediction.
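As a sketch in the notation of the distributed lag model above (the symbol \( h \) for the horizon is introduced here only for illustration), the effect of a sustained one-unit increase in \( X \) accumulated over the current period and the following \( h \) periods is the partial sum of the lag coefficients:

$\text{Cumulative effect after } h \text{ periods} = \sum_{j=0}^{h} \alpha_j, \quad 0 \le h \le q$

Setting \( h = q \) gives the total, or long-run, effect \( \alpha_0 + \alpha_1 + \dots + \alpha_q \). This assumes a finite distributed lag model with no lagged dependent variable on the right-hand side; when lags of \( Y \) are also included, the long-run effect has a different form.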

Hypothetical Example

Consider a hypothetical scenario for a small business that wants to understand how its monthly advertising spending impacts its sales. They collect data over several months and build a simple regression model.

Without lagged variables, a model might look at current advertising spend and current sales. However, the business suspects that advertising efforts might not show an immediate effect, but rather a delayed one. They hypothesize that advertising from the previous month (lag 1) and two months prior (lag 2) also influences current sales.

Let:

\( S_t \) = Sales in month \( t \) (in thousands of dollars)
\( A_t \) = Advertising spend in month \( t \) (in thousands of dollars)

A model incorporating lagged variables could be:
$S_t = \beta_0 + \beta_1 A_t + \beta_2 A_{t-1} + \beta_3 A_{t-2} + \epsilon_t$

Suppose, after running the regression analysis, the estimated coefficients are:

\( \beta_0 = 100 \)
\( \beta_1 = 0.5 \)
\( \beta_2 = 0.8 \)
\( \beta_3 = 0.3 \)

If advertising spend for the past three months was:

Month 1: \( A_1 = 10 \) ($10,000)
Month 2: \( A_2 = 12 \) ($12,000)
Month 3: \( A_3 = 15 \) ($15,000)

To predict sales in Month 3 (\( S_3 \)), the model uses current advertising spend (\( A_3 \)) and lagged advertising spend (\( A_2 \) and \( A_1 \)):

$S_3 = 100 + (0.5 \times A_3) + (0.8 \times A_2) + (0.3 \times A_1)$
$S_3 = 100 + (0.5 \times 15) + (0.8 \times 12) + (0.3 \times 10)$
$S_3 = 100 + 7.5 + 9.6 + 3$
$S_3 = 120.1$

This puts projected sales in Month 3 at $120,100. In this illustration, the coefficients \( \beta_2 \) (0.8) and \( \beta_3 \) (0.3) indicate that the effects of advertising carry over for at least two months, and that the previous month's spend has a larger estimated effect on current sales than the current month's spend (0.5). This insight can help the business time its marketing budget and planning.
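The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch: the function name `predict_sales` is hypothetical, and the coefficients are the illustrative values assumed above, not estimates from real data.

```python
def predict_sales(intercept, coefs, ad_spend):
    """Distributed lag prediction.

    coefs[j] multiplies advertising spend from j months ago;
    ad_spend is ordered oldest first, so the last entry is the current month.
    """
    recent_first = ad_spend[::-1]
    return intercept + sum(c * a for c, a in zip(coefs, recent_first))


coefs = [0.5, 0.8, 0.3]   # current month, lag 1, lag 2
ad_spend = [10, 12, 15]   # months 1, 2, 3 (thousands of dollars)

s3 = predict_sales(100, coefs, ad_spend)
print(round(s3, 1))        # 120.1

# Sum of the coefficients: the total effect on sales (in thousands) of
# raising advertising by 1 (thousand) in every month.
cumulative = sum(coefs)
print(round(cumulative, 1))  # 1.6
```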

Practical Applications

Lagged variables are widely applied across various domains in finance and economics due to the inherent time-dependent nature of many processes.

Economic Forecasting: In macroeconomics, lagged variables are indispensable for predicting future economic indicators like Gross Domestic Product (GDP) growth, inflation, or unemployment rates. Models often include past values of these indicators, as well as other relevant factors like interest rates or consumer confidence, to capture trends and cycles. For example, current GDP growth might be modeled as a function of previous quarters' GDP growth and investment.¹²
Policy Analysis: Central banks and governments extensively use lagged variables to assess the impact of their policies. Understanding the delayed effects of monetary policy, such as interest rate adjustments, on inflation and economic activity is critical for effective policy implementation. Economist Milton Friedman famously coined the phrase "long and variable lags" to describe the unpredictable time it takes for monetary policy to affect the macroeconomy.¹¹,¹⁰ Similarly, fiscal policy analysis utilizes lagged variables to evaluate the delayed effects of government spending or tax changes on consumer spending and investment.⁹ The Federal Reserve also frequently discusses these lags when making policy decisions, recognizing that the full effects of their actions may not be observed immediately.⁸
Financial Markets: In financial econometrics, lagged variables are used to analyze asset pricing, market momentum, and volatility. For instance, past stock returns or trading volumes can be incorporated into models to predict future price movements or assess risk.
Risk Management: Credit scoring models and default prediction systems often employ historical default rates as lagged predictors to enhance the accuracy of risk assessments for loans and other financial instruments.⁷
Business Planning: Companies use lagged sales data, advertising spend, or inventory levels to optimize production schedules, marketing strategies, and resource allocation.

Limitations and Criticisms

While lagged variables are powerful tools in regression analysis and time series modeling, their use comes with several limitations and potential pitfalls:

Optimal Lag Length Determination: A significant challenge is determining the correct number of lags to include in a model. Too few lags can lead to underfitting and misspecified models, failing to capture true dynamic relationships. Conversely, including too many lags can result in overfitting, where the model becomes overly complex, captures noise, and performs poorly on new data.⁶ Information criteria like AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) are often used to select the optimal lag length.
Multicollinearity: Lagged variables, especially closely related ones, can be highly correlated with each other. This can introduce multicollinearity into the model, making it difficult to precisely estimate individual coefficients and leading to inflated standard errors.⁵ ¹ ² ³ ⁴
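The lag-length selection step mentioned above can be sketched in plain Python as follows. Everything here is an illustrative assumption: the series is simulated from an AR(2) process with made-up coefficients, the helper names are hypothetical, and BIC is computed in one common form, \( n \ln(\text{RSS}/n) + k \ln(n) \). In practice a statistics library would normally handle the estimation.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar_rss(y, p, start):
    """OLS fit of an AR(p) with intercept on y[start:]; returns the RSS."""
    rows = [[1.0] + [y[t - k] for k in range(1, p + 1)]
            for t in range(start, len(y))]
    targets = y[start:]
    k = p + 1
    # Normal equations (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def select_lag_by_bic(y, max_p):
    """Fit AR(1)..AR(max_p) on a common sample and return the BIC-minimizing p."""
    n = len(y) - max_p  # every candidate uses the same n observations
    bic = {}
    for p in range(1, max_p + 1):
        rss = fit_ar_rss(y, p, start=max_p)
        bic[p] = n * math.log(rss / n) + (p + 1) * math.log(n)
    return min(bic, key=bic.get), bic


# Simulated AR(2) series; the coefficients 0.6 and 0.25 are assumptions.
random.seed(0)
y = [0.0, 0.0]
for _ in range(400):
    y.append(0.6 * y[-1] + 0.25 * y[-2] + random.gauss(0.0, 1.0))

best_p, bic = select_lag_by_bic(y, max_p=6)
print(best_p, {p: round(v, 2) for p, v in bic.items()})
```

All candidates are fitted on the same observations (those from index `max_p` onward) so that their criteria are comparable. Adding a lag never increases the residual sum of squares, so the penalty term \( k \ln(n) \) is what keeps the criterion from always favoring the longest lag.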