What Is a Likelihood Function?
The likelihood function is a fundamental concept in statistical modeling and a cornerstone of parameter estimation, particularly within Quantitative Finance. It quantifies how well a given statistical model explains observed data by expressing the probability of observing that data given different values for the model's parameters. While related to probability, the likelihood function is specifically viewed as a function of the parameters, with the observed data treated as fixed.
When working with a statistical model that contains unknown parameters, the likelihood function allows analysts to evaluate the "plausibility" or "support" that the observed data provides for different parameter values. The higher the likelihood, the more probable the observed data are under those specific parameter settings. The most common application of the likelihood function is in Maximum Likelihood Estimation (MLE), a powerful method used across many analytical fields.
History and Origin
The formal concept of the likelihood function and its pivotal role in statistical inference were largely established by Sir Ronald A. Fisher in the early 20th century. While rudimentary forms of the idea existed earlier, Fisher was instrumental in developing the methodology and nomenclature that became central to modern statistics.
Fisher, a British statistician and geneticist, introduced the term "likelihood" in papers published in 1921 and 1922, distinguishing it from the traditional concept of probability. His work aimed to provide a more rigorous framework for statistical inference, moving away from methods like "inverse probability" that he found problematic. The method of maximum likelihood, which relies on maximizing the likelihood function, was a direct outcome of his pioneering efforts. Fisher's contributions revolutionized the field, laying much of the groundwork for modern hypothesis testing and estimation techniques. For a comprehensive historical perspective on this concept, "The Epic Story of Maximum Likelihood" provides significant detail on Fisher's development of the theory.
Key Takeaways
- The likelihood function measures the plausibility of different parameter values given observed data.
- It is fundamental to Maximum Likelihood Estimation, where the goal is to find parameter values that maximize this function.
- Unlike a probability distribution, the likelihood function treats parameters as variables and data as fixed.
- The concept was formally introduced and developed by Ronald A. Fisher in the early 20th century.
- Working with the logarithm of the likelihood function, known as the log-likelihood, is often preferred for computational simplicity.
Formula and Calculation
For a set of observed data (x = (x_1, x_2, \ldots, x_n)) drawn from a probability distribution with an unknown parameter (\theta), the likelihood function, denoted as (L(\theta | x)), is defined as follows.

For discrete random variables, it is the joint probability mass function of the observed data, viewed as a function of (\theta):

(L(\theta | x) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n | \theta))

For continuous random variables, it is the joint probability density function, again viewed as a function of (\theta):

(L(\theta | x) = f(x_1, x_2, \ldots, x_n | \theta))

If the observations are independent and identically distributed (i.i.d.), the likelihood function can be expressed as the product of the individual probability density (or mass) functions:

(L(\theta | x) = \prod_{i=1}^{n} f(x_i | \theta))

In practice, it is often more convenient to work with the log-likelihood function, denoted as (\ell(\theta | x)) or (l(\theta)), which is the natural logarithm of the likelihood function:

(\ell(\theta | x) = \ln L(\theta | x) = \sum_{i=1}^{n} \ln f(x_i | \theta))
Maximizing the log-likelihood function is equivalent to maximizing the likelihood function because the logarithm is a monotonically increasing transformation. This transformation converts products into sums, simplifying differentiation and computation.
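As a brief illustration of the product and log-sum forms above, the following minimal Python sketch evaluates both for a small i.i.d. sample under an assumed normal model; the data values and the parameter choices (mu, sigma) are illustrative, not taken from any particular dataset.

```python
import numpy as np
from scipy.stats import norm

# Illustrative i.i.d. sample and candidate parameter values (assumed, not from the text)
data = np.array([0.4, -1.2, 0.7, 0.1, -0.5])
mu, sigma = 0.0, 1.0

# Likelihood: product of the individual normal densities, L(mu, sigma | data)
likelihood = np.prod(norm.pdf(data, loc=mu, scale=sigma))

# Log-likelihood: sum of log densities -- the product becomes a sum
log_likelihood = np.sum(norm.logpdf(data, loc=mu, scale=sigma))

print(likelihood, log_likelihood)  # np.log(likelihood) equals log_likelihood up to rounding
```

Working with sums of log densities also avoids the numerical underflow that can occur when many small density values are multiplied together.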
Interpreting the Likelihood Function
The interpretation of the likelihood function centers on the idea of "support" or "plausibility." A higher value of the likelihood function for a particular parameter value (\theta_1) compared to another parameter value (\theta_2) suggests that the observed data were more likely to have occurred if the true parameter was (\theta_1) than if it was (\theta_2). It does not, however, represent a probability distribution over the parameters themselves.
For example, if (L(\theta_1 | \text{data}) = 2 \times L(\theta_2 | \text{data})), the observed data are twice as likely under parameter (\theta_1) as under parameter (\theta_2). This provides a basis for selecting the parameter value that best "explains" the observed data, which is the core principle behind Maximum Likelihood Estimation. In Bayesian statistics, the likelihood function is combined with a prior probability distribution to derive a posterior probability distribution for the parameters.
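The sketch below illustrates both uses with hypothetical data, an assumed normal model, and an assumed N(0, 1) prior: comparing two candidate parameter values via a likelihood ratio, and combining the likelihood with a prior over a parameter grid to obtain an approximate posterior.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical observed data (illustrative values, not from the article)
data = np.array([0.4, -1.2, 0.7, 0.1, -0.5])

def likelihood(mu, sigma=1.0):
    """L(mu | data): product of normal densities with the data held fixed."""
    return np.prod(norm.pdf(data, loc=mu, scale=sigma))

# Likelihood ratio: how much more plausible one candidate mean is than another
print(likelihood(0.0) / likelihood(0.5))

# Bayesian use: posterior over a grid of mu values, proportional to prior * likelihood
mu_grid = np.linspace(-2, 2, 401)
prior = norm.pdf(mu_grid, loc=0.0, scale=1.0)              # assumed N(0, 1) prior
post = prior * np.array([likelihood(m) for m in mu_grid])
post /= post.sum() * (mu_grid[1] - mu_grid[0])             # normalize to integrate to ~1
```

Note that the raw likelihood curve is only rescaled by the prior and the normalizing constant; the likelihood itself is never treated as a probability distribution over the parameter.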
Hypothetical Example
Consider a hypothetical investment analyst trying to estimate the average daily return ((\mu)) of a particular stock, assuming that daily returns follow a normal probability distribution with a known standard deviation ((\sigma)) of 1%. The analyst observes three days of returns: 0.5%, 1.2%, and -0.3%.
To estimate (\mu) using the likelihood function, the analyst considers different possible values for (\mu). The probability density function for a single normal observation (x_i) is given by:

(f(x_i | \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right))
Since the observations are independent, the likelihood function for the three observations is the product of their individual probability densities:

(L(\mu | x_1, x_2, x_3, \sigma) = f(x_1 | \mu, \sigma) \times f(x_2 | \mu, \sigma) \times f(x_3 | \mu, \sigma))
Let's evaluate the likelihood for two possible values of (\mu): 0.6% and 0.8%.
For (\mu = 0.6%):
- (x_1 = 0.5%): (f(0.005 | 0.006, 0.01))
- (x_2 = 1.2%): (f(0.012 | 0.006, 0.01))
- (x_3 = -0.3%): (f(-0.003 | 0.006, 0.01))
The analyst calculates the product of these three values to get (L(0.006 | 0.005, 0.012, -0.003, 0.01)).
For (\mu = 0.8%):
- (x_1 = 0.5%): (f(0.005 | 0.008, 0.01))
- (x_2 = 1.2%): (f(0.012 | 0.008, 0.01))
- (x_3 = -0.3%): (f(-0.003 | 0.008, 0.01))
The analyst calculates (L(0.008 | 0.005, 0.012, -0.003, 0.01)). By comparing these two likelihood values, the analyst can determine which of the two (\mu) values provides a better explanation for the observed returns. The goal of Maximum Likelihood Estimation would be to find the (\mu) that yields the highest possible likelihood.
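The comparison can be carried out numerically. The following Python sketch uses the example's three observed returns and the known standard deviation of 1% to evaluate the likelihood at the two candidate means; scipy.stats.norm is used here purely for convenience.

```python
import numpy as np
from scipy.stats import norm

# Observed daily returns from the example, in decimal form, and the known sigma of 1%
returns = np.array([0.005, 0.012, -0.003])
sigma = 0.01

def likelihood(mu):
    """L(mu | returns, sigma): product of the three normal densities."""
    return np.prod(norm.pdf(returns, loc=mu, scale=sigma))

L_06 = likelihood(0.006)   # candidate mean of 0.6%
L_08 = likelihood(0.008)   # candidate mean of 0.8%
print(L_06, L_08, L_06 / L_08)

# For a normal model with known sigma, the MLE of mu is the sample mean
print(returns.mean())      # approximately 0.00467, i.e. about 0.47% per day
```

Because the sample mean of the three returns is about 0.47%, the candidate (\mu = 0.6%) lies closer to it and yields the higher likelihood; for a normal model with known (\sigma), the maximum likelihood estimate of (\mu) is exactly the sample mean.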
Practical Applications
The likelihood function is integral to numerous applications in financial modeling, econometrics, and quantitative finance. Its ability to estimate model parameters that best fit observed data makes it a powerful tool for a wide range of analytical tasks.
Key applications include:
- Option Pricing Model Calibration: The likelihood function is used to calibrate parameters for complex financial models, such as stochastic volatility models, which are crucial for pricing options. By maximizing the likelihood of observed historical stock prices, analysts can estimate parameters efficiently.
- Risk Assessment: In risk assessment, the likelihood function helps estimate the parameters of probability distributions that model financial asset returns, aiding in quantifying potential risks associated with various investment strategies.
- Portfolio Optimization: It assists in estimating parameters for models used in portfolio optimization, enabling investors to construct portfolios that balance risk and return based on estimated market dynamics.
- Econometric Modeling: In econometrics, likelihood-based methods are widely used to estimate parameters for various models, including linear regression, logistic regression, and time-series models like ARIMA, especially when strong assumptions about the data generating process can be made.
- Credit Default Probability Assessment: The likelihood function can be applied to assess the probability of credit defaults by estimating parameters in credit risk models.
The versatility of the likelihood function allows it to be applied to fitting models to observed data by finding the parameter values that make the observed data most probable. This rigorous framework is highly valued in quantitative finance for enabling data-driven parameter estimation and predictive analysis.
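As a deliberately simplified illustration of likelihood-based estimation in this setting, the sketch below fits a normal model to a simulated return series by numerically minimizing the negative log-likelihood; the simulated data, the normal model, and the use of scipy.optimize are illustrative choices, not a specific calibration procedure drawn from the applications above.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Simulated daily returns standing in for historical data (illustrative only)
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0005, scale=0.01, size=500)

def neg_log_lik(params):
    """Negative log-likelihood of a normal return model; minimizing it gives the MLE."""
    mu, log_sigma = params              # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(returns, loc=mu, scale=sigma))

result = minimize(neg_log_lik, x0=[0.0, np.log(0.02)])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                # estimates should be close to 0.0005 and 0.01
```

The same pattern, writing down a model's log-likelihood and handing it to a numerical optimizer, extends to richer return models, though more complex likelihoods can require careful starting values and constraints.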
Limitations and Criticisms
Despite its wide applicability and desirable properties, the likelihood function, particularly in the context of Maximum Likelihood Estimation, has certain limitations and criticisms:
- Model Specification Dependence: The effectiveness of likelihood-based methods heavily relies on the assumption that the chosen statistical model is correct. If the underlying probability distribution is misspecified, the resulting parameter estimates can be biased or inaccurate.
- Sensitivity to Outliers: The likelihood function can be sensitive to unusual or extreme data points (outliers). Since the method aims to maximize the probability of observing the given data, outliers can disproportionately influence the parameter estimates, skewing results.
- Small Sample Sizes: Maximum Likelihood Estimation generally performs best with large sample sizes. For small datasets, estimates can be unstable or biased, and the desirable asymptotic properties (such as consistency and efficiency) may not fully hold. For example, the maximum likelihood estimate of a normal distribution's variance is biased downward in small samples.
- Computational Complexity: For complex models or those with many parameters, maximizing the likelihood function can be computationally intensive and may require iterative numerical methods, making the analysis slower and more complex. While software exists to mitigate this, the underlying mathematical challenges remain.
- Overfitting Risk: If the model is overly flexible relative to the amount of data, the likelihood function can be maximized to "memorize" the noise in the data rather than the underlying signal, leading to overfitting. This means the model performs well on existing data but poorly on new, unseen data.
- Lack of Prior Information Integration (in pure MLE): In its pure frequentist form, MLE does not directly incorporate prior beliefs or existing knowledge about the parameters. While Bayesian statistics addresses this by integrating a prior probability distribution, classical MLE focuses solely on the likelihood of the data given the parameters.
These limitations highlight the importance of careful model selection, data quality assessment, and a nuanced interpretation of results when employing likelihood-based methods in financial modeling and other analytical contexts.
Likelihood Function vs. Probability Density Function
While closely related, the likelihood function and the probability density function (or probability mass function) serve distinct conceptual purposes:
| Feature | Likelihood Function | Probability Density Function (PDF) / Probability Mass Function (PMF) |
|---|---|---|
| Variables | Data are fixed, parameters are variables | Parameters are fixed, data are variables |
| Purpose | Evaluates how well a model (with specific parameters) explains observed data | Describes the probability of observing particular data points given a model with fixed parameters |
| Output | A measure of "plausibility" or "support" for parameter values | A measure of probability (for PMF) or probability density (for PDF) for data values |
| Sum/Integral | Does not necessarily sum or integrate to 1 over parameter values | Integrates to 1 (for PDF) or sums to 1 (for PMF) over all possible data values |
| Notation | (L(\theta \mid x)) or (L(\theta; x)) | (f(x \mid \theta)) for a PDF or (P(X = x \mid \theta)) for a PMF |
The primary point of confusion arises because the mathematical form of the likelihood function is derived directly from the PDF or PMF of the assumed probability distribution. However, their interpretations are inverted: the PDF answers "what is the probability of this data given these parameters?", while the likelihood function answers "what are the most plausible parameters given this observed data?".
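A quick numerical check makes the "Sum/Integral" row of the table concrete. In the sketch below, which assumes an exponential model and an arbitrary observed value, the density integrates to 1 over the data, while the same formula viewed as a function of the rate parameter does not.

```python
import numpy as np
from scipy.integrate import quad

x_obs = 0.5  # a single observed value (arbitrary, for illustration)

# PDF view: rate fixed at lam = 2, data x varies -- the density integrates to 1
pdf_area = quad(lambda x: 2.0 * np.exp(-2.0 * x), 0, np.inf)[0]

# Likelihood view: data fixed at x_obs, rate lam varies -- no reason for the area to be 1
lik_area = quad(lambda lam: lam * np.exp(-lam * x_obs), 0, np.inf)[0]

print(pdf_area)  # ~1.0
print(lik_area)  # ~4.0, so the likelihood curve is not a probability distribution over lam
```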
FAQs
1. Why is it called "likelihood" and not "probability"?
Ronald Fisher deliberately chose the term "likelihood" to distinguish it from probability. While both involve assessing the chances of an event, a probability distribution describes the chance of observing certain data given known parameters. In contrast, the likelihood function assesses the plausibility of different parameter values given already observed data. It is a measure of support for parameter values, not a probability distribution over those parameters.
2. Can the likelihood function ever be zero or negative?
The likelihood function, being derived from a probability distribution (which outputs non-negative values), will always be non-negative. It can be zero if the observed data are impossible under the given parameter values. However, it will never be negative. The log-likelihood function, which is the logarithm of the likelihood, can be negative, particularly when the likelihood values are between 0 and 1 (as is often the case for probability densities).
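A small numerical illustration (values chosen arbitrarily): a density between 0 and 1 has a negative logarithm, while a density greater than 1, which can occur for tightly concentrated continuous distributions, has a positive one.

```python
import numpy as np
from scipy.stats import norm

# A standard normal density evaluated at 1.0 lies between 0 and 1, so its log is negative
print(np.log(norm.pdf(1.0)))             # log(~0.242) is about -1.42

# A tightly concentrated density can exceed 1, giving a positive log-likelihood term
print(np.log(norm.pdf(0.0, scale=0.1)))  # log(~3.99) is about 1.38
```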
3. How is the likelihood function used in investment analysis?
In investment analysis and financial modeling, the likelihood function is primarily used to estimate the unknown parameters of models. For example, it can help determine the parameters of a model describing asset returns, volatility, or credit risk. By finding the parameter values that maximize the likelihood function, analysts can obtain the "best fit" parameters for their models based on historical data. This is crucial for tasks like risk assessment and portfolio optimization.