Underidentified Model

What Is an Underidentified Model?

An underidentified model is a statistical or econometric model in which there is insufficient information to uniquely determine the values of all its unknown parameters. This critical issue, central to the field of Econometrics and Statistical Inference, arises when the number of unknown parameters to be estimated exceeds the number of independent pieces of information that the data and the model's restrictions provide. Consequently, multiple sets of parameter values can explain the observed data equally well, making unique Parameter Estimation impossible.

When an underidentified model is encountered, it implies that the relationships specified within the model are not distinct enough to isolate the effect of each individual variable or coefficient. This typically occurs in complex setups, such as a System of Equations, where variables are interdependent, and the data alone cannot untangle these interconnected relationships. Without unique parameter estimates, the model cannot be used reliably for prediction, policy analysis, or understanding causal links.

History and Origin

The concept of identification, including the problem of underidentification, became a focal point in econometrics during the mid-20th century, particularly with the rise of Simultaneous Equations Models. Pioneering work by economists and statisticians at the Cowles Commission for Research in Economics, starting in the 1930s and 1940s, significantly advanced the understanding of these challenges. The "identification problem" was a core concern of the Cowles Commission, which sought to establish rigorous methods for estimating complex economic models.⁶

Key figures like Trygve Haavelmo, a Nobel laureate, laid theoretical foundations for how structural economic models could be identified from observed data. His seminal work addressed the conditions under which the parameters of a System of Equations could be uniquely determined, distinguishing between statistical relationships and underlying causal structures. This intellectual movement aimed to move beyond mere correlations in data to robust causal inference, necessitating a clear understanding of when a model's parameters were, in fact, identifiable. The work at institutions like the Cowles Commission was instrumental in formalizing the criteria for identifying parameters in linear systems.⁵

Key Takeaways

An underidentified model lacks enough independent information to uniquely estimate all its parameters.
This problem typically arises in Simultaneous Equations Models, making it impossible to determine a single correct set of coefficients.
Underidentification prevents reliable Parameter Estimation and compromises the model's utility for prediction or policy analysis.
The concept is foundational in Econometrics and Statistical Inference, and its formal treatment was developed largely by researchers at the Cowles Commission.
Addressing an underidentified model often requires incorporating additional, valid information or imposing credible theoretical restrictions.

Formula and Calculation

The identification status of a model, including whether it is an underidentified model, is typically assessed using "order conditions" and "rank conditions." These conditions apply particularly to Simultaneous Equations Models.

Order Condition:
For a specific equation within a System of Equations to be identified (meaning its parameters can be uniquely estimated), the number of Exogenous Variables excluded from that particular equation but included in the other equations of the system must be greater than or equal to the number of Endogenous Variables included in that specific equation minus one.

Let:

\(M\) = Total number of exogenous variables in the entire system.
\(k\) = Number of endogenous variables in the specific equation.
\(m\) = Number of exogenous variables included in the specific equation.

The number of excluded exogenous variables is \(M - m\).
The order condition states that for an equation to be identified:
\[
M - m \ge k - 1
\]
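If \(M - m < k - 1\), the equation is underidentified, and so is any analysis that depends on recovering that equation's parameters. The order condition is necessary but not sufficient for identification: an equation can satisfy it and still be underidentified if the rank condition fails.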
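Because the order condition is pure counting, it can be sketched as a small helper. The function name `order_condition` and its output labels are illustrative assumptions, not part of any library. Only an "underidentified" result is conclusive; the other two labels still depend on the rank condition.

```python
def order_condition(M: int, m: int, k: int) -> str:
    """Classify one equation with the order condition M - m >= k - 1.

    M: exogenous variables in the whole system
    m: exogenous variables included in this equation
    k: endogenous variables included in this equation (incl. the dependent one)

    Only "underidentified" is conclusive; the other labels still
    require the rank condition to hold.
    """
    if not (0 <= m <= M) or k < 1:
        raise ValueError("need 0 <= m <= M and k >= 1")
    excluded = M - m   # exogenous variables excluded from this equation
    needed = k - 1     # endogenous regressors on the right-hand side
    if excluded < needed:
        return "underidentified"
    if excluded == needed:
        return "exactly identified"
    return "overidentified"


# Consumption function in the hypothetical example: M = 1, m = 0, k = 2
print(order_condition(M=1, m=0, k=2))  # -> exactly identified
```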

Rank Condition:
The rank condition is both necessary and sufficient. Consider a system of \(G\) equations (and hence \(G\) endogenous variables). For a given equation, form the matrix of coefficients that the other \(G - 1\) equations place on the variables excluded from that equation. The equation is identified if and only if this matrix has rank \(G - 1\); if the rank is lower, the equation is underidentified. Intuitively, the rank condition ensures that the excluded variables shift the other equations in enough independent ways to trace out the structural parameters of the equation of interest.

The order condition provides a quick preliminary check, while the rank condition settles the question, given the exclusion restrictions the model imposes.
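Since the rank condition is a statement about a matrix, it can be checked mechanically once the structural coefficients are written out. This is a minimal sketch, assuming every equation is written as "coefficients times variables equals error" and that a zero in an equation's row marks a variable excluded from it. The helpers `matrix_rank` and `rank_condition`, and the numeric values used for \(\alpha_0\) and \(\alpha_1\), are illustrative assumptions. In practice the coefficients are unknown, so the check is usually done symbolically or for generic parameter values.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix by exact Gaussian elimination (Fractions avoid rounding)."""
    mat = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(mat[0]) if mat else 0
    rank = 0
    for col in range(n_cols):
        # Look for a nonzero pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        # Eliminate this column from every other row.
        for r in range(len(mat)):
            if r != rank and mat[r][col] != 0:
                factor = mat[r][col] / mat[rank][col]
                mat[r] = [a - factor * b for a, b in zip(mat[r], mat[rank])]
        rank += 1
    return rank


def rank_condition(structural, i):
    """Rank condition for equation i of a G-equation system.

    structural: G rows (equations) x V columns (all variables, incl. a constant).
    A zero in row i is treated as an exclusion restriction for equation i.
    """
    G = len(structural)
    excluded = [j for j, coef in enumerate(structural[i]) if coef == 0]
    # Coefficients that the *other* equations place on the excluded variables.
    sub = [[row[j] for j in excluded] for r, row in enumerate(structural) if r != i]
    return matrix_rank(sub) == G - 1


# Columns: C, Y, constant, I. Illustrative values alpha0 = 10, alpha1 = 3/4.
consumption_model = [
    [1, Fraction(-3, 4), -10, 0],  # C - alpha1*Y - alpha0 = e  (I excluded)
    [-1, 1, 0, -1],                # Y - C - I = 0              (identity)
]
print(rank_condition(consumption_model, 0))  # True: identified
```

If investment were added to the consumption function with a nonzero coefficient, no variable would be excluded from it, the submatrix would have no columns, its rank would be 0 rather than 1, and the check would return `False`.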

Interpreting the Underidentified Model

An underidentified model indicates a fundamental structural ambiguity. In essence, the data you have collected do not contain enough unique variation to distinguish between different possible explanations of the underlying economic or statistical process. Imagine trying to solve a puzzle where multiple pieces fit into the same spot, making it impossible to complete the picture uniquely. This is the essence of an underidentified model.

When a model is underidentified, any attempt at Parameter Estimation will yield estimates that are not unique. Different estimation algorithms or even slight variations in initial conditions might produce different sets of coefficients, all of which fit the observed data equally well. This means the model's coefficients cannot be reliably interpreted as reflecting true economic or statistical relationships. For instance, in a supply and demand model, if it's underidentified, you might not be able to discern whether an observed change in price and quantity is due to a shift in supply, demand, or a combination that cannot be uniquely separated.⁴
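The same indeterminacy can appear in a single regression equation. As a hedged illustration (the data and the helper `ssr` are made up for this sketch), when one regressor is an exact multiple of another, only a combination of the two coefficients is pinned down, and different coefficient pairs fit the sample equally well.

```python
def ssr(b1, b2, x1, x2, y):
    """Sum of squared residuals for the no-intercept model y = b1*x1 + b2*x2."""
    return sum((yi - (b1 * a + b2 * b)) ** 2 for a, b, yi in zip(x1, x2, y))


x1 = [1, 2, 3, 4, 5]
x2 = [2 * v for v in x1]         # perfectly collinear: x2 is always 2 * x1
y = [3.1, 5.9, 9.2, 11.8, 15.1]  # made-up illustrative data

# Every pair with b1 + 2*b2 == 3 implies the same fitted values, hence the same fit.
for b1, b2 in [(3, 0), (1, 1), (-1, 2)]:
    print(b1, b2, ssr(b1, b2, x1, x2, y))
```

Only the combination \(b_1 + 2b_2\) is identified here, so the minimizers of the sum of squares form a whole line of equally good solutions rather than a single point.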

Common signs in statistical software that a model is underidentified are missing or extremely large standard errors for some parameters, a warning that a matrix is singular (not invertible), or a failure to converge during estimation. These symptoms arise because the objective function being optimized is flat along some direction in parameter space, so no unique minimum exists. Researchers must resolve underidentification before drawing any meaningful conclusions or making policy recommendations based on the model. Degrees of Freedom also offer a quick counting check in some settings: in covariance-based structural equation modeling, the model's degrees of freedom equal the number of distinct variances and covariances of the observed variables minus the number of free parameters, and a negative value guarantees that the model is underidentified (although a non-negative value does not guarantee identification).
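The degrees-of-freedom check is again simple counting. This is a minimal sketch, assuming a covariance-based structural equation model whose only input is the observed covariance matrix. The function name is hypothetical, and the one-factor parameter counts in the usage lines assume the common convention of fixing one loading to 1 to set the latent factor's scale.

```python
def sem_degrees_of_freedom(n_observed: int, n_free_params: int) -> int:
    """Counting rule (t-rule) for covariance-based SEM.

    Distinct variances and covariances among p observed variables: p*(p+1)/2.
    A negative result guarantees underidentification; zero or a positive
    value is necessary, not sufficient, for identification.
    """
    moments = n_observed * (n_observed + 1) // 2
    return moments - n_free_params


# One latent factor, first loading fixed to 1 to set the factor's scale.
# Two indicators: 1 free loading + 1 factor variance + 2 error variances = 4.
print(sem_degrees_of_freedom(2, 4))  # -1 -> underidentified
# Three indicators: 2 free loadings + 1 factor variance + 3 error variances = 6.
print(sem_degrees_of_freedom(3, 6))  # 0 -> passes the counting rule
```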

Hypothetical Example

Consider a simple macroeconomic model with two equations aiming to explain the relationship between consumption (C) and income (Y).

Equation 1 (Consumption Function):
\[
C_t = \alpha_0 + \alpha_1 Y_t + \epsilon_{1t}
\]
Equation 2 (Income Identity):
\[
Y_t = C_t + I_t
\]

Here, \(C_t\) (consumption) and \(Y_t\) (income) are Endogenous Variables, meaning they are determined within the system. \(I_t\) (investment) is an exogenous variable, determined outside the system. \(\alpha_0\) and \(\alpha_1\) are the parameters to be estimated.

Let's check the identification of the Consumption Function (Equation 1) using the order condition:

Total exogenous variables in the system: \(M = 1\) (only \(I_t\)).
Endogenous variables in Equation 1: \(k = 2\) (\(C_t\) and \(Y_t\)).
Exogenous variables included in Equation 1: \(m = 0\) (no exogenous variable appears in the consumption function, only the endogenous \(Y_t\)).

According to the order condition, \(M - m \ge k - 1\).
Substituting the values gives \(1 - 0 \ge 2 - 1\), that is:
\[
1 \ge 1
\]
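This condition is met with equality, so the consumption function is exactly identified provided the rank condition also holds. It does: investment, the only variable excluded from Equation 1, appears in Equation 2 with a nonzero coefficient.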

Equation 2 needs no such check. It is an identity: its coefficients are all known to equal one, so it has no unknown parameters to identify. Mechanically applying the order condition to it (\(M - m = 1 - 1 = 0\), which is less than \(k - 1 = 1\)) would wrongly suggest a problem; identities are left out of identification analysis.

To see underidentification in this system, suppose instead that the consumption function also included investment directly:
\[
C_t = \alpha_0 + \alpha_1 Y_t + \alpha_2 I_t + \epsilon_{1t}
\]
Now \(m = 1\), so the order condition gives
\[
M - m = 1 - 1 = 0 < 1 = k - 1
\]
and the consumption function is underidentified. Investment, the only exogenous variable in the system, is no longer excluded from the consumption function, so it can no longer supply the independent variation needed to separate the effect of income. The data determine only two reduced-form quantities (the intercept and slope from regressing \(C_t\) on \(I_t\)), which is not enough to pin down the three structural parameters \(\alpha_0\), \(\alpha_1\), and \(\alpha_2\). Resolving this requires additional information, such as another exogenous variable that affects income but is excluded from the consumption function (which could then serve as an instrument in Instrumental Variables estimation), or a credible restriction such as \(\alpha_2 = 0\), which returns the model to Equation 1.
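Identification in this consumption-income system can be made concrete by solving the structural equations for the reduced form. This is a minimal sketch, assuming a variant consumption function that may also include investment directly, \(C_t = \alpha_0 + \alpha_1 Y_t + \alpha_2 I_t + \epsilon_{1t}\), together with the identity \(Y_t = C_t + I_t\); with \(\alpha_2 = 0\) it is Equation 1. The helpers `reduced_form` and `recover_structural` and all numeric values are hypothetical. In practice `p0` and `p1` would be estimated by regressing \(C_t\) on \(I_t\); here they are plugged in directly, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def reduced_form(a0, a1, a2=0):
    """Structural -> reduced form for the consumption equation.

    Structural: C = a0 + a1*Y + a2*I + e,  Y = C + I.
    Substituting Y gives C = p0 + p1*I + v with
        p0 = a0 / (1 - a1),  p1 = (a1 + a2) / (1 - a1).
    """
    if a1 == 1:
        raise ValueError("a1 must differ from 1")
    return a0 / (1 - a1), (a1 + a2) / (1 - a1)


def recover_structural(p0, p1):
    """Indirect least squares: invert the mapping when a2 = 0 is imposed."""
    if p1 == -1:
        raise ValueError("p1 must differ from -1")
    a1 = p1 / (1 + p1)
    a0 = p0 * (1 - a1)
    return a0, a1


# Two different structural parameter sets (a2 != 0) imply identical reduced forms,
# so data on C and I cannot tell them apart: the equation is underidentified.
print(reduced_form(Fraction(20), Fraction(1, 2), Fraction(1, 4)))
print(reduced_form(Fraction(32), Fraction(1, 5), Fraction(1)))

# Imposing a2 = 0 makes the mapping invertible: one structural answer remains.
print(recover_structural(Fraction(40), Fraction(3, 2)))
```

Both parameter sets map to \(p_0 = 40\), \(p_1 = 3/2\). Under the restriction \(\alpha_2 = 0\), the same reduced form yields the single solution \(\alpha_0 = 16\), \(\alpha_1 = 3/5\), which illustrates how a credible restriction turns an underidentified equation into an exactly identified one.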

Practical Applications

The concept of an underidentified model is critical across various quantitative fields, particularly in areas involving complex interdependencies such as Financial Modeling, Structural Equation Modeling (SEM), and advanced Regression Analysis.

In practice, an underidentified model can manifest in several ways:

Econometric Policy Analysis: When economists build large-scale macroeconomic models to forecast economic indicators or evaluate the impact of policy changes (e.g., tax cuts, interest rate adjustments), ensuring that all equations are properly identified is paramount. An underidentified model in this context would render policy simulations unreliable, as the estimated impacts would not be uniquely attributable to specific policy levers.
Market Microstructure Models: Models attempting to explain bid-ask spreads, order flow, and price discovery in financial markets often involve simultaneous interactions between different market participants. If the theoretical restrictions imposed on these interactions are insufficient, the model could be an underidentified model, preventing accurate measurement of, for example, the impact of liquidity on volatility.
Causal Inference in Social Sciences: Researchers aiming to establish causal links between variables (e.g., education and income, advertising and sales) using observational data often rely on Structural Equation Modeling. If the paths and relationships are not sufficiently constrained by theory or external information, the model can become underidentified, making it impossible to isolate the true causal effects.
Empirical Research: Empirical underidentification can occur in data analysis, even if the theoretical model appears identified. This can happen if certain variables are perfectly correlated in the collected sample, or if the sample provides insufficient variation for unique estimation.³

Recognizing an underidentified model is the first step toward correcting it, which might involve collecting more data, imposing additional theoretical constraints, or employing alternative estimation methods.

Limitations and Criticisms

Despite its theoretical importance, the concept of an underidentified model, and identification in general, faces several limitations and criticisms in practical application:

Reliance on Theoretical Assumptions: Identifying a model, especially avoiding an underidentified model, heavily relies on strong theoretical assumptions about the underlying data generating process. If these assumptions are incorrect (e.g., misidentifying Exogenous Variables or incorrectly specifying functional forms), a seemingly identified model might still yield biased or inconsistent estimates. Critics argue that real-world economic phenomena are rarely as neatly structured as theoretical models suggest.
Empirical vs. Theoretical Identification: A model might be theoretically identified (satisfying the order and rank conditions), but still be "empirically underidentified" if the specific data set lacks sufficient variation or exhibits high multicollinearity. In such cases, while the parameters are technically unique, they cannot be precisely estimated from the given data, leading to large standard errors and unreliable inference.²
Complexity and Intractability: For a very large and complex System of Equations, formally checking the rank condition can be laborious, because it involves the coefficients of every other equation in the system. Researchers might then rely solely on the necessary (but not sufficient) order condition, which can lead to false confidence in a model's identification status.
Model Misspecification: The identification problem is closely linked to model misspecification. An underidentified model might be a symptom of a fundamentally flawed model structure, where the chosen variables and relationships simply cannot explain the observed phenomena in a unique way. Addressing identification alone without revisiting the underlying theory can be an incomplete solution. The assumption of a "true data generating process" is itself a conceptual challenge in econometric model selection.¹
Over-identification Bias: An Overidentified Model, the opposite case, has more identifying information than strictly necessary, and it can also pose challenges. If the "extra" restrictions rest on invalid assumptions (for example, an instrument that is not truly exogenous), the estimates can be inconsistent. Tests of overidentifying restrictions, such as the Sargan-Hansen J test, can detect some of these problems.
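Empirical underidentification and near-collinearity can be diagnosed from the sample itself. This is a minimal sketch, assuming an OLS regression with an intercept and exactly two regressors, where the variance inflation factor \(1/(1 - r^2)\) measures how much the sample correlation \(r\) between the regressors inflates each slope's sampling variance. The helper names and the data are made up for illustration; perfect correlation yields an infinite factor, the sample-level counterpart of underidentification.

```python
from math import inf, sqrt


def pearson_r(x, y):
    """Sample correlation; raises if either series has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series is constant in this sample")
    return sxy / sqrt(sxx * syy)


def vif_two_regressors(x1, x2):
    """Variance inflation factor 1 / (1 - r^2) for a model with two regressors."""
    r2 = pearson_r(x1, x2) ** 2
    return inf if r2 >= 1 else 1 / (1 - r2)


x1 = [1, 2, 3, 4, 5]
print(vif_two_regressors(x1, [2, 4, 6, 8, 10]))  # inf (exact collinearity)
print(vif_two_regressors(x1, [2, 4, 6, 8, 11]))  # near collinearity: large
print(vif_two_regressors(x1, [3, 1, 4, 1, 5]))   # weak correlation: close to 1
```

A constant regressor raises an error instead of returning a number, reflecting the case where the sample provides no variation at all for that variable.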

In essence, while essential for model validity, ensuring identification is not a panacea for all modeling challenges and requires careful consideration of both theoretical grounding and data characteristics.

Underidentified Model vs. Non-Identifiable Model

The terms "underidentified model" and "non-identifiable model" are often used interchangeably in econometrics and statistics, and for most practical purposes, they refer to the same fundamental problem: the inability to uniquely estimate all model parameters from the available data. However, a subtle distinction can sometimes be made in academic discussions:

Underidentified Model: This term specifically refers to the situation where a model, typically a Simultaneous Equations Model, fails the order condition or the rank condition for unique Parameter Estimation. It implies that there are multiple sets of parameter values that are observationally equivalent, meaning they generate the same observed data. The problem arises from the structure of the equations and the available Exogenous Variables or restrictions.
Non-Identifiable Model: This is a broader term that encompasses underidentification. A model is non-identifiable if its parameters cannot be uniquely determined from the population distribution of the observed variables. This can occur due to underidentification, but also due to other issues, such as inherent redundancies in the model's parametrization or conceptual ambiguities. For example, a model might include two perfectly collinear predictor variables, making their individual effects non-identifiable, even if the model structure itself isn't a simultaneous equation system.

In practice, when a statistician or econometrician says a model is "underidentified," they are typically pointing to the specific issue of insufficient information within a System of Equations that leads to non-uniqueness. When they say "non-identifiable," they might be referring to this, or a more general case where parameters simply cannot be disentangled for any reason. For the purposes of Financial Modeling and Regression Analysis, these terms largely describe the same core problem of parameter indeterminacy.

FAQs

What causes an underidentified model?

An underidentified model is caused by having too many unknown parameters relative to the independent pieces of information available from your data. In Simultaneous Equations Models, this often means that not enough Exogenous Variables are excluded from each equation to provide the necessary unique variation for Parameter Estimation.

How can you detect an underidentified model?

In practice, an underidentified model is often detected during the estimation process. Statistical software may fail to converge, produce extremely large standard errors for some coefficients, or issue warnings about identification issues. Examining the model's structure against the order and rank conditions (if applicable) can also reveal underidentification before estimation.

What are the consequences of using an underidentified model?

Using an underidentified model means that the estimated parameters are not unique and therefore unreliable. You cannot trust the coefficients to represent the true relationships, so the model cannot support hypothesis tests about those coefficients, policy analysis, or forecasts that depend on them. Any conclusions drawn would be arbitrary and dependent on the specific algorithm or random starting points used in estimation.

How can an underidentified model be fixed?

To fix an underidentified model, you typically need to add more information or impose credible restrictions. This might involve:

Adding more Exogenous Variables: Introduce new variables that influence some endogenous variables but not others, thereby helping to distinguish relationships.
Imposing theoretical restrictions: Fix certain parameters to a known value (e.g., zero if a variable is theoretically believed to have no effect) or impose relationships between parameters based on economic theory.
Collecting more data: While more data doesn't fix a fundamentally underidentified structure, it can help with "empirical underidentification" by providing more variation.
Respecifying the model: Sometimes, the entire structure of the model needs to be re-evaluated and simplified, or a different modeling approach (e.g., Regression Analysis instead of a System of Equations) might be necessary.