What Is Maximum Likelihood?
Maximum likelihood is a fundamental method of parameter estimation in statistics. It identifies the parameters of a probability distribution that make the observed data most probable. Within the broader field of statistical inference, maximum likelihood offers a systematic approach to fitting a statistical model to a dataset. The core idea behind maximum likelihood is to find the model parameters that maximize the likelihood function, which quantifies how likely the observed data are under different parameter values.
History and Origin
The method of maximum likelihood was developed by the British statistician Ronald Fisher. While he discussed numerical procedures as early as 1912, Fisher formally introduced and extensively analyzed the method in his seminal 1922 paper, "On the Mathematical Foundations of Theoretical Statistics." His work laid the groundwork for much of modern data analysis, providing a rigorous framework for estimating unknown parameters from observed data. Fisher's contributions revolutionized the field of statistics by emphasizing the importance of a systematic approach to estimation, building upon the concept of "likelihood" as a measure of how well a particular set of parameters explains the observed data. More information on Fisher's life and work is available through the MacTutor History of Mathematics Archive.
Key Takeaways
- Maximum likelihood is a method for estimating the parameters of a statistical model.
- It seeks to find the parameter values that make the observed data most probable.
- The method involves maximizing the likelihood function, which represents the probability of observing the given data for various parameter sets.
- Maximum likelihood estimators possess desirable statistical properties, such as consistency, asymptotic normality, and efficiency under certain conditions.
- It is widely used across various quantitative fields, including econometrics and quantitative finance.
Formula and Calculation
The objective of maximum likelihood estimation is to maximize the likelihood function, denoted as \(L(\theta | x)\), where \(x\) represents the observed data and \(\theta\) represents the unknown parameters of the model. For independent and identically distributed (i.i.d.) observations \(x_1, x_2, \ldots, x_n\), the likelihood function is the product of the individual probability density functions (PDFs) or probability mass functions (PMFs) evaluated at the observed data points:

\[
L(\theta | x) = \prod_{i=1}^{n} f(x_i | \theta)
\]
Often, it is computationally more convenient to work with the log-likelihood function, \(\ell(\theta | x)\), because the logarithm transforms the product into a sum, simplifying differentiation:

\[
\ell(\theta | x) = \ln L(\theta | x) = \sum_{i=1}^{n} \ln f(x_i | \theta)
\]
To find the maximum likelihood estimate (MLE) for \(\theta\), one typically:
- Writes down the likelihood function or, more commonly, the log-likelihood function.
- Takes the partial derivative of the log-likelihood function with respect to each parameter in \(\theta\).
- Sets these derivatives equal to zero.
- Solves the resulting system of equations for \(\theta\).
These equations are often referred to as the likelihood equations. In some cases, numerical optimization methods are required to solve for the parameters. The process yields the estimated parameters, which are central to various forms of financial modeling.
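When the likelihood equations have no closed-form solution, the maximization is carried out numerically. The following is a minimal sketch of that workflow in Python, assuming NumPy and SciPy are available; the normal model, the toy data, and all variable names are illustrative choices, not a prescribed implementation.

```python
# Minimal sketch: numerical maximum likelihood for an i.i.d. normal model.
# Assumes NumPy and SciPy; data and starting values are purely illustrative.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def negative_log_likelihood(params, data):
    """Negative log-likelihood of i.i.d. normal observations."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)          # optimize log(sigma) so sigma stays positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

data = np.array([-0.5, 0.2, 1.1, -0.3, 0.8, 0.0, -0.1, 0.5])   # toy observations
result = minimize(negative_log_likelihood, x0=[0.0, 0.0], args=(data,))

mu_hat = result.x[0]
sigma_hat = np.exp(result.x[1])
print(mu_hat, sigma_hat)               # should match the closed-form normal MLEs
```

Minimizing the negative log-likelihood is equivalent to maximizing the likelihood, which is why general-purpose optimizers, which typically only minimize, are handed the negated log-likelihood.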
Interpreting the Maximum Likelihood
Interpreting the output of maximum likelihood estimation involves understanding that the resulting parameter values are those that provide the best "fit" for the observed data, in the sense that they maximize the probability of seeing that specific dataset. The estimates obtained through maximum likelihood are point estimates, meaning they provide single, most plausible values for the underlying parameters of the stochastic processes generating the data.
While maximum likelihood estimates identify the parameter values under which the observed data are most probable, they do not by themselves provide a measure of uncertainty. For this, practitioners typically rely on the asymptotic properties of maximum likelihood estimators, for example using asymptotic normality to construct confidence intervals or to carry out hypothesis testing. The estimated parameters can then be used to make predictions, simulate future outcomes, or further analyze the characteristics of the data-generating process.
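As a rough illustration of how asymptotic normality is used in practice, the sketch below (assuming NumPy and SciPy; the model and simulated data are illustrative) approximates standard errors from the curvature of the log-likelihood at its maximum. The inverse-Hessian returned by a quasi-Newton optimizer is only an approximation to the inverse observed information, so dedicated routines are preferable in real applications.

```python
# Rough sketch: approximate standard errors for normal-model MLEs from the
# curvature of the log-likelihood. Assumes NumPy/SciPy; data are simulated
# here purely for illustration.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(params, data):
    mu, sigma = params
    if sigma <= 0:                      # keep the optimizer away from invalid values
        return 1e10
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

data = np.random.default_rng(0).normal(loc=0.3, scale=0.6, size=250)
fit = minimize(neg_log_likelihood, x0=[data.mean(), data.std()],
               args=(data,), method="BFGS")

# BFGS's inverse-Hessian roughly approximates the estimator's asymptotic covariance.
std_errors = np.sqrt(np.diag(fit.hess_inv))
print(fit.x)                            # point estimates (mu_hat, sigma_hat)
print(std_errors)                       # approximate 95% CI: estimate +/- 1.96 * SE
```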
Hypothetical Example
Consider an investment manager who wants to estimate the average daily return (\(\mu\)) and the volatility (\(\sigma\)) of a particular stock, assuming daily returns follow a normal distribution. She collects 20 days of historical daily return data.
Suppose the observed daily returns (as percentages) are:
-0.5, 0.2, 1.1, -0.3, 0.8, 0.0, -0.1, 0.5, 0.7, -0.2, 0.9, -0.4, 0.3, 1.0, -0.6, 0.4, 0.6, -0.0, 1.2, -0.1
The probability density function for a normal distribution is:

\[
f(x_i | \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
\]
The log-likelihood function for \(n\) i.i.d. observations is:

\[
\ell(\mu, \sigma | x) = -\frac{n}{2} \ln(2\pi) - n \ln \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2
\]
To find the maximum likelihood estimates for \(\mu\) and \(\sigma\), the manager would take the partial derivatives of \(\ell(\mu, \sigma | x)\) with respect to \(\mu\) and \(\sigma\), set them to zero, and solve. For the normal distribution, the maximum likelihood estimate of the mean is simply the sample mean, \(\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i\), and the estimate of the variance is the sample variance with \(n\) in the denominator rather than \(n-1\), \(\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu})^2\).
For these 20 observations, the sample mean works out to 0.275% (roughly 0.28%), and the maximum likelihood standard deviation (computed with \(n\) in the denominator) is roughly 0.54%; these values are the maximum likelihood estimates for \(\mu\) and \(\sigma\), respectively. These estimates could then inform portfolio optimization strategies.
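A short script can confirm these figures directly from the 20 hypothetical returns above. This is a sketch assuming NumPy; the point is simply that the closed-form normal MLEs are the sample mean and the n-denominator standard deviation.

```python
# Check of the closed-form normal MLEs for the 20 hypothetical daily returns.
# Assumes NumPy; note the MLE variance divides by n, not n - 1.
import numpy as np

returns = np.array([-0.5, 0.2, 1.1, -0.3, 0.8, 0.0, -0.1, 0.5, 0.7, -0.2,
                    0.9, -0.4, 0.3, 1.0, -0.6, 0.4, 0.6, -0.0, 1.2, -0.1])

mu_hat = returns.mean()            # about 0.275 (daily return, in percent)
sigma_hat = returns.std(ddof=0)    # about 0.54 (ddof=0 gives the n-denominator MLE)
print(mu_hat, sigma_hat)
```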
Practical Applications
Maximum likelihood is a cornerstone of quantitative analysis across various financial disciplines:
- Econometrics: It is extensively used to estimate parameters in econometric models, such as regression analysis models (e.g., linear regression with normally distributed errors, logistic regression) and time series analysis models (e.g., ARIMA, GARCH models for financial volatility).
- Forecasting: Financial institutions employ maximum likelihood to estimate parameters of models used for macroeconomic and financial forecasting, including inflation and interest rates. For instance, the Federal Reserve Bank of San Francisco has explored the use of artificial intelligence and machine learning tools, which often incorporate maximum likelihood principles, for inflation forecasts. The performance of inflation forecasts by the Federal Open Market Committee (FOMC) also relies on robust statistical modeling.
- Risk Management: Maximum likelihood helps estimate parameters for various risk models, such as those for value-at-risk (VaR) or credit risk, by fitting distributions to historical loss data (a small sketch of this appears after this list).
- Derivative Pricing: Parameters for sophisticated derivative pricing models, especially those involving stochastic volatility or jump processes, are often estimated using maximum likelihood techniques.
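As a brief illustration of the distribution-fitting step mentioned under Risk Management, the sketch below (assuming NumPy and SciPy; the simulated returns and the Student-t choice are purely illustrative) fits a heavy-tailed distribution to return data and reads a VaR figure off the fitted model. SciPy's `fit` method for continuous distributions estimates parameters by maximum likelihood by default.

```python
# Sketch: fit a Student-t distribution to return data by maximum likelihood
# and derive a 99% value-at-risk figure from the fitted model.
# Assumes NumPy/SciPy; the simulated data are purely illustrative.
import numpy as np
from scipy import stats

returns = np.random.default_rng(1).standard_t(df=4, size=1000) * 0.8   # toy daily returns (%)

df_hat, loc_hat, scale_hat = stats.t.fit(returns)     # MLE of (df, loc, scale)
var_99 = -stats.t.ppf(0.01, df_hat, loc=loc_hat, scale=scale_hat)      # 99% VaR as a loss (%)
print(df_hat, loc_hat, scale_hat, var_99)
```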
Limitations and Criticisms
Despite its widespread use and desirable properties, maximum likelihood has several limitations:
- Computational Complexity: For complex models or large datasets, finding the maximum likelihood estimate can be computationally intensive, often requiring numerical optimization algorithms. These algorithms may struggle with convergence or get trapped in local optima, particularly in high-dimensional parameter spaces.
- Model Specification: Maximum likelihood estimates are only as good as the underlying model specification. If the chosen statistical model does not accurately represent the true data-generating process, the estimates may be biased or inefficient.
- Sensitivity to Outliers: Like many statistical methods, maximum likelihood can be sensitive to outliers in the data, which may disproportionately influence the estimated parameters.
- Asymptotic Properties: While maximum likelihood estimators are known for their strong asymptotic properties (meaning they perform well with large sample sizes), their finite-sample performance can be poor; for example, the normal-model maximum likelihood estimator of the variance is biased downward in small samples.
- No Prior Information: The pure maximum likelihood approach does not inherently incorporate prior beliefs or existing knowledge about the parameters. It relies solely on the information contained within the observed data.
Maximum Likelihood vs. Bayesian Inference
Maximum likelihood and Bayesian inference are two distinct paradigms for statistical inference, often confused due to their shared goal of estimating unknown parameters.
The key difference lies in their philosophical approach to parameters. Maximum likelihood treats the parameters as fixed but unknown quantities and seeks the single set of parameter values that maximizes the probability of observing the given data. It focuses on the likelihood of the data given the parameters, \(P(\text{data} | \text{parameters})\).
In contrast, Bayesian inference treats parameters as random variables and aims to determine their posterior probability distribution given the observed data. This approach combines the likelihood of the data (similar to maximum likelihood) with a "prior" distribution, which quantifies initial beliefs about the parameters before observing any data. Bayesian inference then updates these prior beliefs using Bayes' theorem to arrive at a posterior distribution, \(P(\text{parameters} | \text{data})\). While maximum likelihood provides a point estimate, Bayesian inference provides an entire distribution for the parameters, offering a richer understanding of uncertainty.
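The contrast is easiest to see on a toy problem. In the sketch below (assuming NumPy and SciPy; the data and the Beta(2, 2) prior are arbitrary illustrative choices), maximum likelihood returns a single number for a Bernoulli success probability, while the Bayesian analysis returns a full posterior distribution via Beta-Bernoulli conjugacy.

```python
# Toy contrast: MLE point estimate vs. Bayesian posterior for a Bernoulli
# success probability. Assumes NumPy/SciPy; the prior is an arbitrary choice.
import numpy as np
from scipy.stats import beta

outcomes = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # 7 successes in 10 trials

# Maximum likelihood: a single point estimate (the sample proportion).
p_mle = outcomes.mean()                                # 0.7

# Bayesian inference: prior + likelihood -> posterior distribution.
prior_a, prior_b = 2.0, 2.0                            # Beta(2, 2) prior
post_a = prior_a + outcomes.sum()
post_b = prior_b + len(outcomes) - outcomes.sum()
posterior = beta(post_a, post_b)

print(p_mle)                                           # point estimate only
print(posterior.mean(), posterior.interval(0.95))      # summary of a whole distribution
```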
FAQs
What is the goal of maximum likelihood estimation?
The primary goal of maximum likelihood estimation is to find the specific values for the parameters of a statistical model that make the observed data as probable as possible. It aims to identify the parameter settings that best "explain" the data.
Is maximum likelihood always the best estimation method?
While maximum likelihood estimators have desirable properties such as consistency and efficiency (for large samples), they are not always the "best" in every scenario. Their effectiveness depends on the accuracy of the underlying model assumptions, the sample size, and the specific context of the data analysis problem. In some cases, other methods, such as the method of moments or Bayesian inference, might be preferred, especially if strong prior information is available or if small sample sizes are a concern.
Can maximum likelihood be used for qualitative data?
Yes, maximum likelihood can be used for qualitative (categorical) data. For example, in logistic regression analysis, which models the probability of a binary outcome (e.g., success/failure), maximum likelihood is the standard method for estimating the coefficients. The underlying probability distribution for categorical data, such as the Bernoulli or multinomial distribution, is used to construct the likelihood function.
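As a sketch of how this works for a binary outcome (assuming NumPy and SciPy; the tiny dataset and parameterization are made up for illustration), the code below writes the Bernoulli log-likelihood under a logistic model and maximizes it numerically, which is the estimation step behind logistic regression.

```python
# Sketch: logistic regression coefficients estimated by maximum likelihood.
# Assumes NumPy/SciPy; the tiny dataset is invented for illustration.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit                 # logistic function 1 / (1 + exp(-z))

x = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])          # binary outcomes with some overlap

def neg_log_likelihood(params):
    intercept, slope = params
    p = expit(intercept + slope * x)            # P(y = 1 | x) under the logistic model
    p = np.clip(p, 1e-12, 1 - 1e-12)            # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(fit.x)                                    # estimated intercept and slope
```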