
Akaike information criterion

What Is the Akaike Information Criterion?

The Akaike information criterion (AIC) is a statistical measure used to evaluate the relative quality of statistical models for a given set of data, and it sits within the broader field of statistical modeling. It supports model selection by estimating the prediction error and the relative information lost by a model. AIC balances the trade-off between a model's goodness of fit and its complexity, helping to prevent both overfitting and underfitting. A model that fits the observed data perfectly by way of excessive complexity may not generalize well to new data; the Akaike information criterion helps identify a parsimonious model that retains strong explanatory power.

History and Origin

The Akaike information criterion was developed by the Japanese statistician Hirotugu Akaike in the early 1970s. It was first introduced informally in English at a 1971 symposium, with its formal publication following in a 1974 paper. Akaike's pioneering work extended the maximum likelihood principle by combining parameter estimation with the determination of a model's structure and dimension. The criterion established a new paradigm, bridging the gap between raw data and statistical modeling, and contributed significantly to the fields of information theory and statistics. For his groundbreaking contributions, including the development of AIC, Hirotugu Akaike was recognized with the Kyoto Prize in 2006 (https://www.kyotoprize.org/en/laureates/hirotugu-akaike/).

Key Takeaways

  • The Akaike information criterion (AIC) is a tool for selecting the best statistical model from a set of candidates for a given dataset.
  • It quantifies the trade-off between a model's goodness of fit and its complexity, penalizing models with more parameters to mitigate overfitting.
  • A lower AIC value indicates a better-quality model that loses less information, suggesting it is more suitable for explaining the data.
  • AIC is relative; it does not provide an absolute measure of model quality but rather ranks models against each other.
  • It is particularly useful when comparing models that are not nested, meaning one model is not a simpler version of another.

Formula and Calculation

The formula for the Akaike information criterion is expressed as:

AIC = -2 \ln(L) + 2k

Where:

  • \(\ln(L)\) is the natural logarithm of the maximized likelihood function for the model. This term reflects how well the model fits the data.
  • \(k\) is the number of estimated parameters in the model. This term represents the complexity of the model.
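As a minimal illustration of this form, the criterion can be computed directly from a maximized log-likelihood. This is a sketch in Python; the log-likelihood and parameter count are hypothetical values, not outputs of any particular library:

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion from a model's maximized
    log-likelihood and its number of estimated parameters k."""
    return -2.0 * log_likelihood + 2.0 * k

# Hypothetical values: a model with log-likelihood -72.5 and 3 parameters
print(aic(-72.5, 3))  # -2 * (-72.5) + 2 * 3 = 151.0
```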

For least-squares regression models with normally distributed errors, an equivalent form of the AIC formula (up to an additive constant) is often used:

AIC = n \ln\left(\frac{SSE}{n}\right) + 2(k+1)

Where:

  • \(n\) is the sample size.
  • \(SSE\) is the sum of squared errors, a measure of the discrepancy between the observed data and the model's predictions.
  • \(k\) is the number of predictor terms in the model, so \(k+1\) is the total number of parameters including the intercept.
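A brief sketch of this regression form, using NumPy's polynomial fitting as a stand-in for any least-squares model. The data here are synthetic and purely illustrative:

```python
import numpy as np

def aic_sse(sse: float, n: int, k: int) -> float:
    """AIC for a least-squares fit, from the sum of squared errors,
    sample size n, and k predictor terms (k + 1 parameters in total)."""
    return n * np.log(sse / n) + 2 * (k + 1)

# Synthetic data: a noisy quadratic relationship
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=50)

# Compare polynomial fits of increasing degree; the quadratic
# (degree 2) typically attains the lowest AIC on this data, since
# degree 3 barely reduces the SSE but adds a parameter.
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, degree)
    sse = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    print(degree, round(aic_sse(sse, len(x), degree), 1))
```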

Interpreting the Akaike Information Criterion

When using the Akaike information criterion for model selection, the model with the lowest AIC value is generally preferred among a set of candidates. A lower AIC signifies a better balance between how well the model explains the observed data (captured by the likelihood) and its complexity (penalized through the number of parameters). AIC estimates the relative amount of information a model loses when representing the data-generating process, so a model with a lower AIC is considered to have lost less information and to be of higher quality. However, AIC provides a relative ranking, not an absolute measure of a model's goodness. Differences in AIC values should be interpreted in context, guiding the selection of the most suitable predictive modeling approach for the specific data analysis task.

Hypothetical Example

Imagine a financial analyst at "InvestCorp" is tasked with predicting the quarterly earnings per share (EPS) for a company. The analyst has three different statistical models built using historical financial data:

  • Model A: A simple linear regression analysis using only the company's past revenue as a predictor.
  • Model B: A multiple linear regression model incorporating past revenue, industry growth rate, and a market sentiment index.
  • Model C: A more complex time series analysis model that includes lagged values of EPS, revenue, and an autoregressive component, resulting in many parameters.

After fitting each model to the same dataset, the analyst calculates the AIC for each:

  • Model A: AIC = 150
  • Model B: AIC = 120
  • Model C: AIC = 135

Based on these AIC values, Model B (AIC = 120) would be chosen as the best approximating model. While Model C is more complex, its higher AIC suggests that the added complexity does not sufficiently improve its fit to justify the increase in parameters, potentially leading to overfitting. Model A, despite its simplicity, has a much higher AIC, indicating it likely underfits the data.
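The selection step itself is mechanical, as this short sketch shows. The AIC values are the hypothetical ones from the example above, and the deltas express each model's distance from the best candidate:

```python
# Hypothetical AIC values from the InvestCorp example above
aic_values = {"Model A": 150.0, "Model B": 120.0, "Model C": 135.0}

# The preferred model is simply the one with the minimum AIC
best = min(aic_values, key=aic_values.get)
print("selected:", best)  # Model B

# Differences from the best model indicate relative support;
# larger deltas mean substantially less support for that model.
for name, value in aic_values.items():
    print(name, "delta AIC =", value - aic_values[best])
```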

Practical Applications

The Akaike information criterion finds widespread application in quantitative fields, including finance and economics, where model selection is crucial. In financial modeling, AIC can be used to compare different forecasting models for stock prices, commodity trends, or macroeconomic indicators. For instance, an analyst might use AIC to select the optimal lag order in autoregressive integrated moving average (ARIMA) models when predicting stock returns, balancing accuracy against model simplicity.
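A sketch of that lag-order search, assuming the statsmodels library is available; the series is simulated here, whereas in practice it would be something like a daily return series:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an AR(2) series as a stand-in for real return data:
# y_t = 0.5*y_{t-1} - 0.3*y_{t-2} + noise
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

# Fit AR(p) candidates and record each fitted model's AIC
aics = {}
for p in range(1, 5):
    aics[p] = ARIMA(y, order=(p, 0, 0)).fit().aic

best_p = min(aics, key=aics.get)
print(aics)
print("selected lag order:", best_p)  # usually 2 for this series
```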

Beyond forecasting, AIC is employed in risk management to select models that best explain market volatility or credit risk. In algorithmic trading, it can help in choosing the most efficient model for generating trading signals, minimizing the risk of model error. The criterion is also valuable in economic modeling for determining the appropriate structure of econometric models, such as those analyzing the relationship between interest rates and inflation, or comparing different models of asymmetric price transmission (https://www.academicjournals.org/journal/JDAE/article-abstract/5BF5AE319983).

Limitations and Criticisms

While widely used, the Akaike information criterion has certain limitations and criticisms. A primary point is that AIC does not provide an absolute assessment of model quality; it only indicates which model is superior relative to the others under consideration. Consequently, if all models in the candidate set are poor, AIC will still select the "best" among them, even though none may be truly adequate for the data.

Another criticism is that AIC tends to favor more complex models than other criteria, such as the Bayesian information criterion (BIC), especially with larger datasets. This preference for complexity can sometimes lead to the selection of models that remain prone to overfitting, even with the penalty term. The reliability of AIC also depends on the underlying assumptions of the models being compared, such as the normality of errors; violations of these assumptions can lead to misleading conclusions. Furthermore, in some advanced statistical contexts, such as comparing mixture models, AIC has been found to underestimate the expected Kullback-Leibler divergence, potentially leading to the selection of overly complex models. Researchers are often advised to use AIC in conjunction with other statistical tests and cross-validation for robust model evaluation.

Akaike Information Criterion vs. Bayesian Information Criterion

The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are both widely used for model selection by penalizing model complexity while rewarding goodness of fit. However, they differ significantly in their theoretical foundations and the magnitude of their complexity penalties.

| Feature | Akaike Information Criterion (AIC) | Bayesian Information Criterion (BIC) |
| --- | --- | --- |
| Penalty term | \(2k\) (a constant penalty per parameter) | \(k \ln(n)\) (penalty increases with sample size) |
| Objective | Aims to select the model that best approximates the unknown true data-generating process, focusing on prediction accuracy. | Aims to select the "true model" from the candidate set, focusing on consistency. |
| Preference | Tends to select more complex models. | Tends to select simpler, more parsimonious models, especially with large datasets. |
| Consistency | Not asymptotically consistent (does not guarantee selection of the true model even with infinite data). | Asymptotically consistent (will select the true model, if it is in the candidate set, as the sample size approaches infinity). |
| Application | Often preferred for forecasting and for situations where the true model is unlikely to be in the candidate set. | Often preferred for hypothesis testing and when a true model is believed to exist within the candidate set. |

Much of the confusion between the two arises from their similar formulas, but the penalty term's dependence on sample size leads to differing behavior. AIC's constant per-parameter penalty means it can be more prone to selecting models with more parameters as the dataset grows, while BIC's increasing penalty for complexity makes it favor simpler models in larger samples.
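To see why the penalties diverge, compare \(2k\) with \(k \ln(n)\) as the sample size grows. A quick numeric sketch, with an arbitrary parameter count chosen for illustration:

```python
import math

k = 5  # number of estimated parameters in a candidate model
for n in (10, 100, 1_000, 10_000):
    aic_penalty = 2 * k
    bic_penalty = k * math.log(n)
    print(f"n={n}: AIC penalty = {aic_penalty}, BIC penalty = {bic_penalty:.1f}")
# BIC's penalty overtakes AIC's once ln(n) > 2, i.e. for n > e^2 ≈ 7.4,
# so BIC punishes extra parameters harder in all but tiny samples.
```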

FAQs

What does a low AIC value mean?

A low Akaike information criterion (AIC) value indicates that a statistical model is of higher quality relative to other models being considered. It suggests that the model loses less information when approximating the true data-generating process and provides a good balance between fitting the data and maintaining simplicity.

Can AIC be used to compare any two models?

AIC can be used to compare different statistical models as long as they are fitted to the exact same dataset and the likelihood function is computed on that same data. It is particularly valuable because it can compare models that are not "nested," meaning one model is not simply a reduced version of another.

Is a lower AIC always better?

Yes, generally, a lower AIC value indicates a better model. The goal when using the Akaike information criterion for model selection is to find the candidate model that minimizes this value, as it suggests the least amount of information loss.

Does AIC account for overfitting?

Yes, the Akaike information criterion is designed to help prevent overfitting. It includes a penalty term that increases with the number of parameters in the model. This penalty discourages overly complex models that might fit the training data perfectly but would not generalize well to new, unseen data.

How does AIC relate to model accuracy?

AIC is an estimator of prediction error and, thus, of the relative quality of statistical models. Although it is computed on the training data, minimizing AIC is asymptotically equivalent to minimizing the out-of-sample one-step forecast mean squared error for time series models, making it a useful criterion for selecting forecasting models.