
Maximum Likelihood Estimation

What Is Maximum Likelihood Estimation?

Maximum likelihood estimation (MLE) is a powerful statistical method used to estimate the parameters of a statistical model. Within the broader field of statistical estimation and quantitative finance, MLE seeks to find the parameter values that make the observed data most probable. It achieves this by maximizing a specific mathematical function known as the likelihood function. The core idea behind maximum likelihood estimation is to choose the parameters that best "fit" the data, in the sense that they maximize the probability of observing the data given the chosen model. This method is a cornerstone in data analysis for various disciplines, including economics, engineering, and finance.

History and Origin

The method of maximum likelihood estimation has roots tracing back to early statisticians, but it was rigorously developed and popularized by Sir Ronald Aylmer Fisher in the early 20th century. Fisher, a British statistician and geneticist, first presented the numerical procedure in 1912, though he coined the term "likelihood" later in 1921 and formally introduced the method of maximum likelihood in 1922. Fisher's work provided a unified approach to parameter estimation, demonstrating several desirable properties of maximum likelihood estimators under certain conditions. His contributions laid much of the foundation for modern statistical theory, impacting fields far beyond his initial work in agriculture and genetics.

Key Takeaways

  • Maximum likelihood estimation (MLE) is a method for estimating the parameters of a statistical model.
  • It works by identifying the parameter values that maximize the likelihood function, thereby making the observed data most probable.
  • MLE provides a consistent and asymptotically efficient approach to parameter estimation under suitable conditions.
  • It is widely used across various quantitative fields, including econometrics and financial modeling.
  • Despite its advantages, MLE can be sensitive to model misspecification and requires sufficient sample size for optimal performance.

Formula and Calculation

At its core, maximum likelihood estimation involves defining a likelihood function (L(\theta | x)), which is the joint probability density function (PDF) or probability mass function (PMF) of the observed data points (x_1, x_2, \ldots, x_n), treated as a function of the unknown parameters (\theta).

For independent and identically distributed (i.i.d.) observations, the likelihood function is the product of the individual probability densities:

L(\theta | x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i | \theta)

Often, it is computationally more convenient to work with the log-likelihood function, denoted as (\ell(\theta | x)), which converts the product into a sum, simplifying the differentiation process required for optimization:

\ell(\theta | x_1, \ldots, x_n) = \log L(\theta | x_1, \ldots, x_n) = \sum_{i=1}^{n} \log f(x_i | \theta)

To find the maximum likelihood estimates (\hat{\theta}), one takes the partial derivatives of the log-likelihood function with respect to each parameter, sets them to zero, and solves the resulting system of equations. These equations are known as the likelihood equations. In many cases, these equations do not have a closed-form solution and require numerical methods to solve.
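When no closed form exists, the negative log-likelihood is typically minimized with a numerical optimizer. The sketch below illustrates this for normally distributed data, a case where a closed form does exist, so the numerical answer can be checked against the sample mean and standard deviation. The simulated data, SciPy routine, and starting values are illustrative assumptions, not part of any standard specification.

```python
# Minimal sketch: solving the likelihood equations numerically with SciPy.
# The normal-distribution MLE has a closed form (sample mean and the biased
# sample standard deviation), which lets us check the numerical optimizer.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=0.001, scale=0.01, size=250)  # hypothetical daily returns

def neg_log_likelihood(params, x):
    """Negative log-likelihood of i.i.d. normal data; minimized instead of maximized."""
    mu, log_sigma = params            # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, np.log(0.02)], args=(data,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

print(mu_hat, sigma_hat)              # numerical MLE
print(data.mean(), data.std(ddof=0))  # closed-form MLE for comparison
```

The two sets of estimates should agree to the optimizer's tolerance; reparameterizing in terms of log(sigma) is a common trick to keep the scale parameter positive without constrained optimization.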

Interpreting Maximum Likelihood Estimation

Interpreting the results of maximum likelihood estimation revolves around understanding that the chosen parameters are those that provide the best "explanation" for the observed data, assuming the underlying statistical model is correct. The estimates obtained are point estimates, representing the most likely single values for the true but unknown population parameters.

For example, if using MLE to estimate the mean and variance of a normal distribution from a sample, the resulting estimates are those specific mean and variance values that would make it most probable to draw the exact sample observed. While MLE estimators possess desirable properties like consistency (converging to the true parameter as sample size increases) and asymptotic efficiency (achieving the lowest possible variance among consistent estimators for large samples), their direct numerical interpretation depends heavily on the specific model and context. Researchers also often use these estimates to perform hypothesis testing on the parameters.

Hypothetical Example

Consider a scenario where a financial analyst wants to estimate the average daily return ((\mu)) of a particular stock, assuming its daily returns follow a normal distribution with a known standard deviation ((\sigma)). While in practice (\sigma) is also estimated, for simplicity, let's assume (\sigma = 0.01) (1%). The analyst observes the following daily returns for a week: ([0.005, 0.012, -0.003, 0.008, 0.001]).

The probability density function for a single observation (x_i) from a normal distribution is:
f(x_i | \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)

The log-likelihood function for the five observed returns would be:
\ell(\mu | x_1, \ldots, x_5) = \sum_{i=1}^{5} \log \left( \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) \right)
\ell(\mu | x_1, \ldots, x_5) = \sum_{i=1}^{5} \left( -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x_i - \mu)^2}{2\sigma^2} \right)

To find the MLE for (\mu), the analyst would take the derivative of (\ell) with respect to (\mu) and set it to zero:
\frac{\partial \ell}{\partial \mu} = \sum_{i=1}^{5} \frac{x_i - \mu}{\sigma^2} = 0
Since (\sigma^2) is a positive constant, this simplifies to:
\sum_{i=1}^{5} (x_i - \mu) = 0
\sum_{i=1}^{5} x_i - 5\mu = 0
\hat{\mu}_{MLE} = \frac{1}{5} \sum_{i=1}^{5} x_i

In this specific case of a normal distribution with known variance, the maximum likelihood estimate of the mean is simply the sample average. For the given returns, (\hat{\mu}_{MLE} = (0.005 + 0.012 - 0.003 + 0.008 + 0.001) / 5 = 0.023 / 5 = 0.0046). This means that given the observed returns, an average daily return of 0.46% makes the occurrence of these specific returns most probable.
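As a quick numerical check on this example (a hypothetical sketch, not part of the original calculation), the snippet below evaluates the log-likelihood over a grid of candidate means with (\sigma) fixed at 0.01 and confirms that it peaks at the sample average of 0.0046.

```python
# Sketch: evaluate the log-likelihood of the five observed returns over a grid
# of candidate means, with sigma fixed at 0.01 as in the example above.
import numpy as np
from scipy.stats import norm

returns = np.array([0.005, 0.012, -0.003, 0.008, 0.001])
sigma = 0.01

mu_grid = np.linspace(-0.01, 0.02, 3001)
log_lik = np.array([norm.logpdf(returns, loc=mu, scale=sigma).sum() for mu in mu_grid])

mu_hat = mu_grid[np.argmax(log_lik)]
print(mu_hat)          # ~0.0046, where the log-likelihood is largest
print(returns.mean())  # 0.0046, the sample average
```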

Practical Applications

Maximum likelihood estimation is a foundational technique with extensive practical applications across various quantitative fields, especially in finance and economics. In financial modeling, it is employed for a wide range of tasks, including the estimation of parameters for complex stochastic processes that describe asset prices, interest rates, or volatilities. For instance, MLE can be used to calibrate parameters in option pricing models, such as the Heston model for stochastic volatility.

Another significant application is in estimating the equity risk premium, which is the expected return on the aggregate stock market in excess of the risk-free rate. Researchers utilize maximum likelihood methods to account for information contained in dividends and prices, often leading to more robust estimates compared to simple historical averages. Furthermore, MLE is crucial in econometrics for estimating the coefficients of regression models, particularly for discrete choice models like probit and logit, where the dependent variable is binary or categorical. It is also used in advanced macroeconomic models, such as Dynamic Stochastic General Equilibrium (DSGE) models, to estimate parameters that govern economic behavior and policy implications.
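To make the discrete-choice case concrete, the sketch below estimates logit coefficients by numerically maximizing the Bernoulli log-likelihood. The synthetic data, coefficient values, and optimizer settings are illustrative assumptions.

```python
# Sketch: estimating logit coefficients by maximum likelihood.
# Log-likelihood: sum_i [ y_i*log(p_i) + (1 - y_i)*log(1 - p_i) ],
# with p_i = 1 / (1 + exp(-x_i' beta)); we minimize its negative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
true_beta = np.array([-0.5, 1.2])
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)

def neg_log_likelihood(beta, X, y):
    """Negative Bernoulli log-likelihood of the logit model, written in a numerically stable form."""
    z = X @ beta
    # -loglik = sum[ log(1 + exp(z)) - y*z ]
    return np.sum(np.logaddexp(0.0, z) - y * z)

beta_hat = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y)).x
print(beta_hat)  # should be close to the true coefficients in large samples
```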

Limitations and Criticisms

While maximum likelihood estimation is a widely used and powerful technique, it is not without limitations and criticisms. One primary concern is its sensitivity to model misspecification. MLE assumes that the chosen statistical model correctly represents the underlying data-generating process. If the model is incorrect (e.g., assuming a normal distribution when the data is actually fat-tailed), the estimates can be biased or inconsistent, leading to inaccurate conclusions.

Another limitation is the requirement for a sufficiently large sample size for the desirable asymptotic properties of MLE (like consistency and efficiency) to hold. In small samples, maximum likelihood estimators can exhibit significant bias, and their theoretical optimality may not apply. Computational complexity can also be a challenge, particularly for models with many parameters or complex likelihood functions, often requiring iterative numerical methods that can be slow or sensitive to starting values. Additionally, MLE can be sensitive to outliers in the data, as extreme values can disproportionately influence the maximization of the likelihood function. Despite these drawbacks, careful model selection and validation, along with robust statistical practices, can mitigate many of these issues.

Maximum Likelihood Estimation vs. Least Squares Estimation

Maximum likelihood estimation (MLE) and least squares estimation (LSE) are both fundamental methods for parameter estimation, but they operate on different principles and are suited for different contexts.

| Feature | Maximum Likelihood Estimation (MLE) | Least Squares Estimation (LSE) |
|---|---|---|
| Principle | Maximizes the probability of observing the given data (likelihood). | Minimizes the sum of squared residuals (errors). |
| Assumptions | Requires specification of the full probability distribution of the data. | Primarily assumes linearity, homoscedasticity, and uncorrelated errors for OLS. |
| Goal | Finds parameters that make the observed data most "likely" under a specific model. | Finds parameters that provide the "best fit" to the data by minimizing prediction errors. |
| Applicability | Applicable to a wide range of models, including those with non-normal errors, discrete outcomes, and complex stochastic processes. | Primarily used for linear regression models; extensions exist (e.g., weighted least squares). |
| Relationship | If the errors are independently and normally distributed, OLS estimates for linear regression are equivalent to MLE estimates. | LSE is a special case of MLE under the assumption of normal, i.i.d. errors. |

The key confusion often arises in linear regression. When the errors in a linear regression model are assumed to be independent and identically distributed normal random variables, the parameters estimated by ordinary least squares (OLS) are exactly the same as those estimated by maximum likelihood. However, MLE is a more general framework that can be applied to a much broader class of statistical models where the distributional assumptions are different (e.g., logistic regression for binary outcomes, Poisson regression for count data), whereas LSE's direct application is typically limited to models that predict a continuous outcome and assume certain properties of the errors.
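That equivalence can be verified numerically. The sketch below fits the same simulated linear model by closed-form OLS and by maximizing a Gaussian log-likelihood; the data and parameter values are hypothetical, but the two coefficient vectors should agree to numerical precision.

```python
# Sketch: with i.i.d. normal errors, OLS and Gaussian MLE give the same coefficients.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Closed-form OLS coefficients
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gaussian MLE: maximize the normal log-likelihood over (beta, log sigma)
def neg_log_likelihood(params, X, y):
    beta, log_sigma = params[:-1], params[-1]
    return -np.sum(norm.logpdf(y - X @ beta, scale=np.exp(log_sigma)))

params_mle = minimize(neg_log_likelihood, x0=np.zeros(3), args=(X, y)).x
print(beta_ols)        # OLS coefficients
print(params_mle[:2])  # MLE coefficients -- numerically the same
```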

FAQs

What is the main goal of Maximum Likelihood Estimation?

The main goal of maximum likelihood estimation is to find the values of a model's parameters that maximize the likelihood function. This effectively means finding the parameter values under which the observed data is most probable.

Why is the log-likelihood function often used instead of the likelihood function?

The log-likelihood function is often used because it simplifies the mathematical calculations involved in finding the maximum. Taking the logarithm transforms products into sums, which are easier to differentiate, especially when dealing with many data points. Maximizing the log-likelihood is equivalent to maximizing the likelihood, as the logarithm is a monotonically increasing function.
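A brief numerical illustration of the practical motivation (with hypothetical data): multiplying thousands of small density values underflows to zero in floating-point arithmetic, while summing their logarithms remains well behaved.

```python
# Sketch: the product of many small densities underflows, the sum of logs does not.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(size=5000)

densities = norm.pdf(x)
print(np.prod(densities))         # 0.0 -- underflows in double precision
print(np.sum(np.log(densities)))  # a finite log-likelihood (roughly -7000 here)
```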

What are some common applications of MLE in finance?

In finance, maximum likelihood estimation is used for tasks such as estimating parameters for asset pricing models, calibrating stochastic processes for option pricing, modeling credit risk, and forecasting financial time series. It's also integral to many advanced econometric models used in financial research.

Does MLE always provide the best estimates?

Under ideal conditions (correct model specification, large sample size), maximum likelihood estimators possess desirable properties like consistency, asymptotic normality, and asymptotic efficiency (meaning they have the lowest possible variance among consistent estimators). However, if the model is misspecified, the sample size is small, or the data contains significant outliers, the estimates may not be optimal or reliable.

Is MLE sensitive to outliers?

Yes, maximum likelihood estimation can be sensitive to outliers because the likelihood function's maximization process is influenced by all observed data points. Extreme values can significantly skew the parameter estimates, potentially leading to a poor fit for the majority of the data. Robust estimation methods might be considered in the presence of significant outliers.
