
Maximum likelihood estimation

What Is Maximum Likelihood Estimation?

Maximum Likelihood Estimation (MLE) is a fundamental statistical method within Quantitative Finance used to estimate the parameters of an assumed probability distribution, given a set of observed data. The core principle behind maximum likelihood estimation is to find the parameter values that make the observed data most probable under the chosen statistical model. This process involves maximizing a likelihood function to identify the parameters that best explain the data's generation. The output of this method is known as the maximum likelihood estimate. MLE is a widely adopted technique in statistical inference due to its intuitive logic and broad applicability across various data and model types.

History and Origin

The method of maximum likelihood estimation is primarily attributed to the British statistician and geneticist Sir Ronald Fisher. While earlier hints of the concept appeared in the work of mathematicians like Carl Friedrich Gauss and Pierre-Simon Laplace, Fisher formalized and extensively developed the approach. He first introduced the numerical procedure in a 1912 paper while still an undergraduate and later coined the term "likelihood" in 1921.10 By 1922, Fisher formally presented the "maximum likelihood" method, establishing it as a cornerstone of modern statistics.9 His contributions laid the framework for contemporary statistical science, influencing numerous fields, including experimental design and parameter estimation.

Key Takeaways

  • Maximum Likelihood Estimation (MLE) is a statistical method for estimating the parameters of a probability distribution based on observed data.
  • It operates by identifying the parameter values that maximize the likelihood of obtaining the observed data.
  • MLE is widely used in various fields, including quantitative finance, for tasks such as financial modeling and risk assessment.
  • The method is known for properties like consistency and efficiency for large sample sizes.
  • Computation typically involves maximizing the log-likelihood function, often requiring numerical optimization techniques.

Formula and Calculation

The objective of maximum likelihood estimation is to find the parameter(s) (\theta) that maximize the likelihood function (L(\theta | \mathbf{x})), where (\mathbf{x} = (x_1, x_2, \ldots, x_n)) represents the observed data.

For independent and identically distributed (i.i.d.) observations, the likelihood function is the product of the probability density (or mass) functions for each observation:

L(\theta | \mathbf{x}) = \prod_{i=1}^{n} f(x_i | \theta)

In practice, it is often more convenient to work with the natural logarithm of the likelihood function, known as the log-likelihood function, because it transforms products into sums, simplifying differentiation:

\ell(\theta | \mathbf{x}) = \log L(\theta | \mathbf{x}) = \sum_{i=1}^{n} \log f(x_i | \theta)

To find the maximum likelihood estimates, one typically takes the derivative of the log-likelihood function with respect to the parameter(s) (\theta), sets it to zero, and solves for (\theta):

\frac{\partial \ell(\theta | \mathbf{x})}{\partial \theta} = 0

This process identifies the parameter values that maximize the likelihood that the assumed model produced the observed data.
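
In practice, this maximization is usually carried out numerically. The sketch below is a minimal illustration, not part of the original text: it assumes an exponential model with made-up observations and uses SciPy to minimize the negative log-likelihood, then compares the result with the known closed-form answer (one over the sample mean).

```python
# Minimal sketch (illustrative assumptions): numerical maximum likelihood for an
# exponential model with rate parameter "lam"; the data below are hypothetical.
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([0.8, 1.3, 0.4, 2.1, 0.9, 1.7, 0.6, 1.1])  # hypothetical observations

def neg_log_likelihood(lam):
    # Exponential density: f(x | lam) = lam * exp(-lam * x), so
    # log f(x | lam) = log(lam) - lam * x, summed over all observations.
    return -(len(data) * np.log(lam) - lam * data.sum())

# Maximize the log-likelihood by minimizing its negative over lam > 0.
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")

print("numerical MLE:", result.x)
print("closed-form MLE (1 / sample mean):", 1.0 / data.mean())
```

Because the exponential MLE has a closed form, the two printed values should agree closely; for models without closed-form solutions, only the numerical route is available.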

Interpreting the Maximum Likelihood Estimation

Interpreting the results of maximum likelihood estimation involves understanding that the estimated parameters are those which render the observed data most plausible under the assumed statistical model. For example, if a financial analyst uses MLE to estimate the mean and standard deviation of asset returns, the resulting estimates represent the mean and standard deviation of a chosen probability distribution that would have been most likely to generate the historical return data observed.

Maximum likelihood estimates are point estimates: they yield single values for the parameters rather than ranges. Their interpretation is conditional on the model assumption. If the assumed model (e.g., a normal or Poisson distribution) is a good fit for the actual data-generating process, the maximum likelihood estimates are expected to be accurate and to possess desirable statistical properties, such as consistency and efficiency, as the sample size increases.

Hypothetical Example

Consider an investment manager who wants to estimate the average daily return ((\mu)) of a new stock and its daily return variability ((\sigma)). They assume that the stock's daily returns follow a normal distribution. Over 10 trading days, the observed returns are: 0.005, -0.002, 0.010, 0.001, -0.005, 0.003, 0.007, -0.001, 0.004, 0.008.

To use maximum likelihood estimation, the manager would define the likelihood function for these observed returns based on the normal distribution's probability density function. Since the observations are assumed independent, the joint likelihood is the product of individual probabilities. To simplify, they would work with the log-likelihood.

The probability density function for a normal distribution is given by:

f(x | \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

The log-likelihood function for (n) i.i.d. normal observations is:

\ell(\mu, \sigma | \mathbf{x}) = -\frac{n}{2}\log(2\pi) - n\log(\sigma) - \sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}

The manager would then take the partial derivatives of this log-likelihood function with respect to (\mu) and (\sigma), set them to zero, and solve the resulting equations. For the normal distribution, the maximum likelihood estimate of the mean (\mu) turns out to be the sample mean, and the estimate of the variance (\sigma^2) is the sample variance computed with a divisor of (n) rather than (n-1); this estimator is slightly biased downward, but the bias shrinks as the sample size grows.

Using the given data, the sample mean would be calculated, providing the maximum likelihood estimate for (\mu). Similarly, the sample variance would yield the estimate for (\sigma^2). These estimates are the values for the normal distribution's parameters that make the occurrence of the observed 10 daily returns most probable. This approach helps the manager derive estimated parameters that best fit their data.
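
As a hedged illustration of this example, the short sketch below computes the closed-form normal MLEs for the ten hypothetical returns above (the variable names and printed formatting are choices made here for illustration, not part of the original text).

```python
# Minimal sketch of the manager's calculation under the normality assumption.
# For a normal model the MLEs have closed forms: the sample mean for mu and the
# sample variance with divisor n (not n - 1) for sigma^2.
import numpy as np

returns = np.array([0.005, -0.002, 0.010, 0.001, -0.005,
                    0.003, 0.007, -0.001, 0.004, 0.008])

mu_hat = returns.mean()                        # MLE of the mean; equals 0.003 here
sigma2_hat = ((returns - mu_hat) ** 2).mean()  # MLE of the variance (divisor n)
sigma_hat = np.sqrt(sigma2_hat)                # roughly 0.0045 for these returns

print(f"mu_hat = {mu_hat:.4f}, sigma_hat = {sigma_hat:.5f}")
```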

Practical Applications

Maximum likelihood estimation is a versatile statistical tool with widespread applications across quantitative finance and other data-intensive fields. In investing and markets, MLE is employed for:

  • Option Pricing Model Calibration: MLE can be used to estimate the parameters of complex models, such as stochastic volatility models, which are crucial for accurately pricing derivatives. By fitting these models to observed market prices, analysts can derive parameters that reflect current market conditions.8
  • Risk Management: It assists in estimating parameters for Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) models, helping institutions quantify and manage potential financial losses.
  • Time Series Analysis: In time series modeling, maximum likelihood estimation is used to determine optimal coefficients for models like ARIMA (Autoregressive Integrated Moving Average) and to calibrate regime-switching models that identify different market states.7
  • Regression Analysis: MLE is fundamental in fitting various regression models, including logistic regression for predicting binary outcomes (e.g., probability of default) and Poisson regression for count data.6 These applications are essential for understanding relationships between financial variables and forecasting future events (see the sketch after this list).
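
As a sketch of the regression use case above, logistic regression coefficients can be obtained by maximizing the Bernoulli log-likelihood numerically. The tiny data set and variable names (a leverage ratio predicting a default flag) are hypothetical and chosen only for illustration.

```python
# Hedged sketch: logistic regression fit by maximum likelihood.
# Hypothetical data: leverage ratios and a 0/1 default indicator.
import numpy as np
from scipy.optimize import minimize

leverage = np.array([0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1.1, 1.3])  # predictor
default = np.array([0,   0,   1,   0,   1,   0,   1,   1])      # 1 = defaulted

def neg_log_likelihood(params):
    intercept, slope = params
    z = intercept + slope * leverage
    p = 1.0 / (1.0 + np.exp(-z))           # modeled default probability
    # Bernoulli log-likelihood: sum of y*log(p) + (1 - y)*log(1 - p)
    return -np.sum(default * np.log(p) + (1 - default) * np.log(1 - p))

fit = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print("MLE coefficients (intercept, slope):", fit.x)
```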

The ability of maximum likelihood estimation to provide a rigorous framework for fitting models to observed data by finding parameter values that make the observed data most probable makes it an indispensable tool for data-driven decision-making in financial markets.

Limitations and Criticisms

Despite its wide applicability and desirable properties, maximum likelihood estimation has several limitations. One notable drawback is its sensitivity to the correct specification of the statistical model. If the assumed probability distribution underlying the data is incorrect, the maximum likelihood estimates can be biased and misleading.5 This means that even with a large amount of data, the estimates may not converge to the true parameter values.

Another limitation is the potential for bias when dealing with small sample sizes. While maximum likelihood estimators are asymptotically unbiased (meaning their bias approaches zero as the sample size grows), they can exhibit significant bias in smaller datasets.4 For instance, the MLE for population variance tends to underestimate the true variance in small samples.3

Computational challenges can also arise, particularly for complex models with numerous parameters or when the likelihood function has multiple local maxima. Finding the global maximum of the likelihood function often requires iterative numerical optimization methods, which can be computationally intensive and sensitive to initial starting values.2 Furthermore, MLE can be sensitive to outliers in the data, as extreme values can disproportionately influence the maximization process.1

Maximum Likelihood Estimation vs. Method of Moments

Maximum Likelihood Estimation (MLE) and the Method of Moments are both techniques used for parameter estimation in statistical models, but they operate on different principles.

Maximum Likelihood Estimation seeks to find the parameters that maximize the likelihood function, essentially selecting the parameters that make the observed data most probable under the assumed statistical model. It requires specifying the probability distribution of the data. Its strength lies in its asymptotic properties, such as consistency, efficiency, and asymptotic normality for large sample sizes.

The Method of Moments, on the other hand, estimates parameters by equating sample moments (e.g., the sample mean and sample variance) to their theoretical counterparts (population moments) and solving the resulting equations. This method does not require explicit knowledge of the full probability distribution beyond what is needed to derive the theoretical moments. While often simpler computationally, Method of Moments estimators are generally less statistically efficient than maximum likelihood estimators, particularly when the assumed distribution is correctly specified.

The core confusion often arises because, in some common cases (e.g., estimating the mean and variance of a normal distribution), the maximum likelihood estimates numerically coincide with the Method of Moments estimates. However, this is not universally true, and MLE generally offers more desirable statistical properties under broad conditions, especially when the underlying data distribution is correctly specified.
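
The sketch below illustrates the difference on simulated gamma-distributed data, where the two sets of estimates generally disagree; the simulation settings (shape 2.0, scale 1.5, 500 draws) are arbitrary choices for illustration.

```python
# Minimal sketch (illustrative assumptions): Method of Moments vs. maximum
# likelihood for a gamma distribution, where the two estimates typically differ.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(2.0, 1.5, size=500)  # simulated data: true shape 2.0, scale 1.5

# Method of Moments: match the sample mean and variance to the theoretical
# moments mean = shape * scale and variance = shape * scale^2.
m, v = data.mean(), data.var()
shape_mom = m**2 / v
scale_mom = v / m

# Maximum likelihood: scipy numerically maximizes the gamma log-likelihood
# (floc=0 fixes the location parameter at zero).
shape_mle, loc, scale_mle = stats.gamma.fit(data, floc=0)

print(f"MoM:  shape={shape_mom:.3f}, scale={scale_mom:.3f}")
print(f"MLE:  shape={shape_mle:.3f}, scale={scale_mle:.3f}")
```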

FAQs

What does "likelihood" mean in Maximum Likelihood Estimation?

In Maximum Likelihood Estimation, "likelihood" refers to how probable the observed data is, given specific values for the parameters of a statistical model. It quantifies how well a particular set of parameter values explains the observed data. It's not a probability itself, but rather a measure of plausibility.

Why is the log-likelihood function often used instead of the likelihood function?

The log-likelihood function is used primarily for computational convenience. Taking the logarithm transforms the product of probabilities into a sum of logarithms, which is much easier to differentiate and maximize, especially for complex models. Since the logarithm is a monotonically increasing function, maximizing the log-likelihood is equivalent to maximizing the original likelihood function.

Is Maximum Likelihood Estimation always the best method for parameter estimation?

While Maximum Likelihood Estimation is a powerful and widely used method with many desirable properties like consistency and efficiency, it is not always the "best" in every scenario. Its performance can be affected by small sample sizes, model misspecification, and sensitivity to outliers. Other methods, such as Bayesian estimation or robust statistics, might be more appropriate depending on the specific characteristics of the data and the estimation problem.

How does Maximum Likelihood Estimation relate to machine learning?

Maximum Likelihood Estimation is a foundational concept in many machine learning algorithms. It is used in methods such as logistic regression, neural networks, and generative models to estimate model parameters. The goal is often to find the parameters that maximize the probability of the training data given the model, which aligns directly with the principles of MLE.