Bayesian information criterion

The Bayesian information criterion (BIC) is a statistical measure used to select the best model among a finite set of candidate models. It is a key tool in econometrics and other quantitative fields, balancing a model's goodness of fit against its complexity to avoid overfitting. The BIC penalizes models with more parameters, favoring simpler, more parsimonious explanations that are more likely to generalize to new data.

History and Origin

The Bayesian information criterion, also known as the Schwarz Information Criterion (SIC), was developed by Gideon E. Schwarz and published in a 1978 paper titled "Estimating the Dimension of a Model." Schwarz derived the BIC as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. This development provided a formal framework for model selection based on Bayesian principles, offering an alternative to the earlier Akaike Information Criterion (AIC).

Key Takeaways

  • The Bayesian information criterion (BIC) is a statistical tool used for comparing and selecting models.
  • It penalizes models for complexity (number of parameters) and rewards them for goodness of fit (likelihood).
  • A lower BIC value indicates a preferred model, striking a better balance between fit and complexity.
  • The BIC is particularly useful for large samples and aims to identify the "true" model among the candidates.
  • It is widely applied in various fields, including finance, for building robust predictive models.

Formula and Calculation

The formula for the Bayesian information criterion (BIC) is:

\text{BIC} = k \ln(n) - 2 \ln(\hat{L})

Where:

  • ( k ) = The number of parameters estimated by the model. This represents the model's complexity.
  • ( n ) = The number of data points or observations in the dataset (i.e., the sample size).
  • ( \hat{L} ) = The maximized value of the model's likelihood function. This term reflects how well the model fits the data.

The first term, ( k \ln(n) ), is the penalty for model complexity, which increases with the number of parameters and the sample size. The second term, ( -2 \ln(\hat{L}) ), measures the model's lack of fit to the data; a smaller negative log-likelihood (meaning a larger likelihood) indicates a better fit.
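
The formula is simple to compute once a model has been fitted. The snippet below is a minimal sketch, not tied to any particular library: the helper name `bic` is our own, and it takes the maximized log-likelihood ( \ln(\hat{L}) ) directly, since that is what most estimation routines report.

```python
import math

def bic(k, n, log_likelihood):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L-hat).

    k              -- number of estimated parameters
    n              -- number of observations (sample size)
    log_likelihood -- maximized log-likelihood, ln(L-hat)
    """
    return k * math.log(n) - 2.0 * log_likelihood
```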

Interpreting the Bayesian Information Criterion

When comparing multiple models, the model with the lowest Bayesian information criterion value is generally preferred. A lower BIC suggests that the model offers an optimal balance between explaining the observed data well and maintaining simplicity. The criterion's penalty for complexity, which is influenced by the sample size, means that as more observations become available, the BIC increasingly favors more parsimonious models. This characteristic helps in avoiding overfitting, ensuring that the selected model is not just fitting random noise but capturing the underlying patterns in the data.

Hypothetical Example

Imagine a financial analyst comparing two different predictive models for stock prices over 200 daily observations (( n = 200 )).

  • Model A: A simpler model with 3 parameters and a maximized likelihood function (( \hat{L}_A )) of ( 0.005 ).
  • Model B: A more complex model with 8 parameters and a maximized likelihood function (( \hat{L}_B )) of ( 0.008 ).

Let's calculate the BIC for each:

For Model A:

\begin{aligned}
\text{BIC}_A &= 3 \ln(200) - 2 \ln(0.005) \\
&= 3 \times 5.298 - 2 \times (-5.298) \\
&= 15.894 + 10.596 \\
&= 26.49
\end{aligned}

For Model B:

\begin{aligned}
\text{BIC}_B &= 8 \ln(200) - 2 \ln(0.008) \\
&= 8 \times 5.298 - 2 \times (-4.828) \\
&= 42.384 + 9.656 \\
&= 52.04
\end{aligned}

In this hypothetical example, Model A has a lower BIC (26.49) compared to Model B (52.04). Therefore, despite Model B having a slightly higher likelihood (better fit), the BIC suggests that Model A is the preferred choice due to its greater simplicity and a more favorable balance between fit and complexity. This indicates that Model A is less likely to suffer from overfitting.
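
The arithmetic above can be checked in a few lines of Python; this is purely a numerical verification of the hypothetical figures, not output from any real model.

```python
import math

n = 200  # number of daily observations
# Model A: 3 parameters, likelihood 0.005; Model B: 8 parameters, likelihood 0.008
bic_a = 3 * math.log(n) - 2 * math.log(0.005)
bic_b = 8 * math.log(n) - 2 * math.log(0.008)
print(f"BIC_A = {bic_a:.2f}, BIC_B = {bic_b:.2f}")  # BIC_A ≈ 26.49, BIC_B ≈ 52.04
```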

Practical Applications

The Bayesian information criterion is widely applied in quantitative finance and econometrics for model selection. It helps analysts and researchers choose the most appropriate statistical or predictive models when faced with multiple options.

Common applications include:

  • Regression analysis: Selecting the optimal set of independent variables in a regression model to explain financial phenomena, such as asset returns or macroeconomic indicators.
  • Time series analysis and forecasting: Determining the appropriate order of autoregressive (AR) or moving average (MA) components in models used for predicting stock market indices or other financial time series. Using BIC helps ensure that forecasting methods do not become overly tailored to past market behavior.
  • Risk models and asset pricing models: In evaluating different risk models, BIC assists in selecting models that balance explanatory power with complexity, which is crucial for robust risk management and portfolio construction.
  • High-frequency trading: BIC can be applied to complex datasets in high-frequency trading to improve predictive performance by selecting models with an optimal balance of explanatory variables.

The BIC's emphasis on parsimony makes it valuable in situations where the goal is to identify a "true" underlying model of data generation, providing a more robust framework for developing sound investment strategies.
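
As an illustration of the time-series use case, the sketch below selects an autoregressive order by minimizing BIC. It assumes the statsmodels package is available; the simulated `returns` series and the candidate range of orders are hypothetical stand-ins for a real return series.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=500)  # stand-in for a daily return series

best_order, best_bic = None, float("inf")
for p in range(1, 5):                          # candidate AR(p) specifications
    result = ARIMA(returns, order=(p, 0, 0)).fit()
    if result.bic < best_bic:                  # fitted results expose a BIC value
        best_order, best_bic = p, result.bic

print(f"Selected AR order: {best_order} (BIC = {best_bic:.1f})")
```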

Limitations and Criticisms

Despite its utility, the Bayesian information criterion has several limitations and criticisms that practitioners should consider. One common critique is that BIC can be overly conservative, particularly with small samples. Its penalty term, which increases with the logarithm of the sample size, can lead to the selection of overly simplistic models that underfit the data.

Another limitation is the assumption that the "true" model is among the set of candidate models being considered. If the actual underlying data-generating process is not represented within the evaluated models, BIC will still select the "best" among the incorrect options. Furthermore, while BIC is consistent (meaning it will select the true model as the sample size grows infinitely large, assuming the true model is in the candidate set), it is not necessarily efficient in terms of predictive accuracy. A model chosen by BIC may not yield the best forecasting performance in scenarios where predictive ability, rather than identifying the precise data-generating process, is the primary objective.

Bayesian Information Criterion vs. Akaike Information Criterion

The Bayesian information criterion (BIC) and the Akaike Information Criterion (AIC) are both widely used criteria for model selection, but they operate under different statistical philosophies and have distinct properties. Both criteria balance the goodness of fit of a model with a penalty for its complexity, using the maximized likelihood function and the number of parameters.

The primary difference lies in their penalty terms. BIC imposes a stricter penalty on the number of parameters, especially as the sample size increases. The BIC penalty is ( k \ln(n) ), whereas the AIC penalty is ( 2k ). This means that for samples of roughly eight or more observations (( \ln(n) > 2 )), BIC's penalty is larger than AIC's, leading BIC to favor simpler models. AIC is often preferred when the goal is to achieve the best predictive performance or minimize prediction error, as it asymptotically selects the model that minimizes the mean squared error of prediction. In contrast, BIC is preferred when the objective is to identify the "true" underlying model from a set of candidates, as it is consistent: it is guaranteed to select the true model with increasing sample size if the true model is among the options. In financial econometrics, the choice between AIC and BIC therefore often depends on whether the primary aim is prediction (favoring AIC) or theoretical interpretation and identifying a parsimonious model (favoring BIC).
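
To make the difference in penalties concrete, the short sketch below (our own illustration, not from any specific library) prints both penalty terms for a model with five parameters at several sample sizes; BIC's penalty overtakes AIC's once ( \ln(n) > 2 ), i.e. from about eight observations onward.

```python
import math

k = 5  # hypothetical number of estimated parameters
for n in (5, 8, 200, 10_000):
    aic_penalty = 2 * k
    bic_penalty = k * math.log(n)
    print(f"n={n:>6}: AIC penalty = {aic_penalty:5.1f}, BIC penalty = {bic_penalty:5.1f}")
# BIC penalizes complexity more heavily than AIC for all but the smallest samples.
```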

FAQs

What does a lower BIC value mean?

A lower BIC value indicates a preferred model. It suggests that the model provides a better balance between fitting the data accurately and keeping the number of parameters (and thus complexity) low.

When should I use BIC instead of AIC?

You should generally consider using BIC when your goal is to identify the "true" underlying model from a set of candidates, especially with a large sample size. BIC's stricter penalty for complexity makes it more likely to select simpler, more parsimonious models. If your primary goal is to achieve the best out-of-sample prediction, the Akaike Information Criterion (AIC) might be more appropriate.

Can BIC be used for any type of model?

The Bayesian information criterion is broadly applicable for model selection among a finite set of statistical models. It is commonly used in regression analysis, time series analysis, and other contexts where models are fitted using maximum likelihood estimation. While its original derivation had specific assumptions, its use has generalized beyond those initial constraints.

Does BIC always select the "best" model?

BIC aims to select the "true" model when the true model is among the candidates and the sample size is large enough. However, like any statistical criterion, it has limitations. If the true model is not in the set of models being considered, or if the sample size is small, BIC might select a model that is overly simplistic or not the best for predictive performance.[21](https://pmc.ncbi.nlm.nih.gov/articles/PMC3366160/)