
Weighted least squares

What Is Weighted Least Squares?

Weighted least squares (WLS) is a statistical modeling technique used in regression analysis to account for differing levels of precision or reliability among data points. It falls under the broader umbrella of quantitative finance and statistical modeling. Unlike ordinary least squares (OLS), which assumes that all observations contribute equally and have uniform variance of errors, weighted least squares assigns a "weight" to each observation. This weight inversely relates to the variance of its error term, meaning more precise or reliable observations are given greater influence in determining the model's parameter estimation. Weighted least squares is particularly useful when dealing with heteroscedasticity, a condition where the dispersion of residuals is not constant across all levels of the independent variables.

History and Origin

The concept of least squares, from which weighted least squares originates, traces its roots to the early 19th century. The method was independently developed by mathematicians Adrien-Marie Legendre and Carl Friedrich Gauss. Legendre was the first to publish the method in 1805 in his "Nouvelles méthodes pour la détermination des orbites des comètes" (New Methods for the Determination of Comet Orbits). However, Carl Friedrich Gauss later claimed to have used the method as early as 1795 for astronomical calculations, particularly in predicting the orbit of the asteroid Ceres, though he did not publish his findings until 1809. Gauss's contribution was significant as he connected the least squares method with principles of probability theory, including the normal distribution. The extension to weighted least squares emerged as practitioners recognized that not all observations carry equal informational value or possess identical error characteristics, leading to the need to assign different "weights" to observations in the minimization process.

Key Takeaways

  • Weighted least squares (WLS) is a regression technique that assigns different weights to individual observations.
  • It is primarily used to address heteroscedasticity, where the variance of the error terms is not constant.
  • WLS gives more precise observations (those with smaller error variance) greater influence in the model fitting.
  • By accounting for unequal variances, WLS provides more efficient and reliable parameter estimates than standard ordinary least squares (OLS) when heteroscedasticity is present.
  • The effectiveness of WLS heavily depends on accurately determining or estimating the appropriate weights for each data point.

Formula and Calculation

The objective of weighted least squares is to minimize the sum of the squared residuals, where each residual is multiplied by a corresponding weight. If (y_i) represents the observed dependent variable, (\mathbf{x}_i) is the vector of independent variable values for the (i)-th observation, and (\boldsymbol{\beta}) is the vector of regression coefficients, the weighted least squares objective function is:

\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} w_i (y_i - \mathbf{x}_i^\top \boldsymbol{\beta})^2

Where:

  • (n) is the number of observations.
  • (w_i) is the weight assigned to the (i)-th observation.
  • ((y_i - \mathbf{x}_i^\top \boldsymbol{\beta})) is the residual for the (i)-th observation.

In matrix notation, the estimated coefficients for weighted least squares ((\hat{\boldsymbol{\beta}}_{\text{WLS}})) are given by:

\hat{\boldsymbol{\beta}}_{\text{WLS}} = (\mathbf{X}^\top \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{W} \mathbf{y}

Where:

  • (\mathbf{X}) is the design matrix containing the independent variables.
  • (\mathbf{W}) is an (n \times n) diagonal matrix where the diagonal elements are the weights (w_i). Typically, the weights are inversely proportional to the variance of the error term for each observation, i.e., (w_i = 1/\sigma_i^2).
  • (\mathbf{y}) is the vector of dependent variable observations.

This formula ensures that observations with higher variances (and thus smaller weights) have less influence on the final coefficient estimates, while those with lower variances (and larger weights) have greater influence.
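To make the closed-form expression concrete, here is a minimal NumPy sketch that computes the WLS coefficients directly from the matrix formula above. The data, the per-observation error variances, and the resulting weights are hypothetical illustrations, not part of the original text.

```python
import numpy as np

# Hypothetical data: five observations, an intercept column plus one predictor.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
y = np.array([2.1, 4.2, 5.8, 8.4, 9.7])

# Assumed error variances for each observation; weights are their inverses,
# so the noisier later observations receive less influence.
sigma2 = np.array([0.5, 0.5, 1.0, 2.0, 4.0])
W = np.diag(1.0 / sigma2)

# Closed-form WLS estimate: solve (X'WX) beta = X'Wy for beta.
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_wls)  # [intercept, slope]
```

Using `np.linalg.solve` rather than an explicit matrix inverse is the standard, numerically stabler way to evaluate the closed form.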

Interpreting the Weighted Least Squares

Interpreting the results of a weighted least squares regression is similar to interpreting an OLS regression, with the added nuance of considering the impact of the weights. The estimated coefficients still represent the change in the dependent variable for a one-unit change in the corresponding independent variable, holding other variables constant. However, these coefficients are now derived from a process that prioritizes more reliable observations.

When evaluating a WLS model, it is crucial to understand why certain observations were weighted differently. If the weights correctly reflect the inverse of the error variances, then the WLS estimates are more efficient—meaning they have smaller standard errors—compared to OLS estimates under heteroscedasticity. This translates to more precise and trustworthy prediction intervals and hypothesis tests. Examining residual plots after applying WLS can help confirm if the weighting scheme effectively addressed the heteroscedasticity, often showing a more uniform spread of residuals.
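As a sketch of this diagnostic workflow, the following snippet (using statsmodels, with synthetic data in which the true error variance is known by construction) fits both OLS and WLS and compares their standard errors; if the weighting scheme is appropriate, the weighted residuals should show a roughly uniform spread.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
# Synthetic heteroscedastic data: error standard deviation grows with x.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
# Weights are the inverse of the (known, in this toy example) error variance.
wls = sm.WLS(y, X, weights=1.0 / (0.5 * x) ** 2).fit()

print(ols.bse)  # OLS standard errors
print(wls.bse)  # WLS standard errors: smaller when the weights are correct

# Weighted residuals should have a uniform spread if the weighting worked;
# plotting them against the fitted values is the usual visual check.
weighted_resid = np.sqrt(wls.model.weights) * wls.resid
print(weighted_resid.std())
```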

Hypothetical Example

Consider a financial analyst modeling the relationship between a company's advertising expenditure and its quarterly sales. The analyst collects sales data from various companies, but some companies provide sales figures based on very granular, real-time tracking, while others only offer estimated or less precise quarterly reports.

If the analyst were to use ordinary least squares, all sales data points would be treated equally: the coefficient estimates would remain unbiased, but the less precise figures would add unnecessary noise to the fitted regression line and distort its standard errors, making the estimated relationship less reliable than it could be.

Instead, the analyst decides to use weighted least squares. For companies providing highly accurate, real-time sales data, they assign a higher weight (e.g., proportional to the inverse of a small estimated variance). For companies providing less precise, estimated sales data, they assign a lower weight (proportional to the inverse of a larger estimated variance).

By applying WLS, the regression algorithm will "listen" more to the more reliable data, producing a sales-to-advertising relationship that is less distorted by the noise in the less accurate observations. This approach would result in a more accurate model of how advertising impacts sales, which could then be used for more reliable forecasting and budgeting decisions.
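A minimal sketch of this scenario in Python, using statsmodels and entirely made-up numbers: half of the hypothetical companies report precise sales figures and half report noisy estimates, and the weights reflect that difference in reporting precision.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Hypothetical sample: advertising spend (in $ millions) for 30 companies.
ad_spend = rng.uniform(1, 20, size=30)

# The first 15 companies report precise sales; the rest give rough estimates.
noise_sd = np.where(np.arange(30) < 15, 0.5, 3.0)
sales = 5.0 + 1.8 * ad_spend + rng.normal(scale=noise_sd)

X = sm.add_constant(ad_spend)
weights = 1.0 / noise_sd ** 2  # precise reporters get larger weights

wls = sm.WLS(sales, X, weights=weights).fit()
print(wls.params)  # estimated intercept and sales-per-$M of advertising
```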

Practical Applications

Weighted least squares finds numerous applications in quantitative finance and time series analysis where data quality or volatility varies.

  • Financial Modeling: WLS is extensively used to improve the accuracy and reliability of financial models, especially when dealing with heteroscedasticity and non-constant variance in financial data. For example, in modeling stock returns, the volatility of returns often changes over time (periods of high market uncertainty versus stable periods). WLS can assign lower weights to data points from high-volatility periods, thus yielding more robust estimates.
  • Portfolio Optimization: In portfolio optimization and asset allocation, WLS can sharpen the estimates of expected returns and covariances that determine the optimal weights of different assets in a portfolio, helping portfolio managers achieve investment objectives more effectively.
  • Risk Analysis and Stress Testing: WLS can be applied in risk analysis to estimate the potential impact of extreme events on financial portfolios. Scenarios can be weighted based on their likelihood and potential impact, allowing risk managers to identify vulnerabilities.
  • Return Predictability: Academic research has explored the use of weighted least squares in analyzing return predictability regressions, demonstrating its ability to yield more efficient estimates compared to OLS, especially when dealing with varying signal-to-noise ratios in financial data.

Limitations and Criticisms

Despite its advantages in handling heteroscedasticity, weighted least squares has its limitations and faces certain criticisms:

  • Unknown Weights: The primary challenge with WLS is that the true error variances (and thus the ideal weights) are rarely known in real-world applications. They must be estimated, which introduces an additional layer of uncertainty. If the estimated weights are inaccurate, especially in small samples or when the weight estimates for extreme data points are based on limited observations, the WLS results can be less reliable than OLS.
  • Sensitivity to Outliers: Like other least squares methods, WLS can be sensitive to outliers. If an outlier is mistakenly assigned a high weight due to an incorrect assumption about its error variance, it can disproportionately influence the model's coefficients, potentially leading to misleading predictions.
  • Interpretation Complexity: While the coefficients are still interpretable, the rationale behind the chosen weighting scheme must be clearly understood and justified. Transforming the original variables or weighting observations can make direct interpretation of the coefficients more challenging, as they are now conditional on the specific weighting applied.
  • Assumption of Uncorrelated Errors: WLS typically assumes that while variances may differ, the errors themselves are uncorrelated across observations. If errors are correlated (e.g., in some time series data), Generalized Least Squares (GLS) or more complex methods might be required for optimal parameter estimation.

Weighted Least Squares vs. Ordinary Least Squares

Weighted least squares (WLS) and Ordinary Least Squares (OLS) are both methods for estimating the unknown parameters in a linear regression analysis. The fundamental difference lies in their assumptions about the error terms.

| Feature | Ordinary Least Squares (OLS) | Weighted Least Squares (WLS) |
| --- | --- | --- |
| Assumption | Assumes homoscedasticity: constant variance of errors for all observations. | Addresses heteroscedasticity: allows for non-constant variance of errors across observations. |
| Error treatment | Treats all prediction errors (residuals) equally in the minimization process. | Assigns different weights to errors, giving less reliable observations lower influence. |
| Weighting | Implicitly assigns equal weights (a weight of 1) to all observations. | Explicitly assigns weights, typically inversely proportional to the estimated variance of each observation's error. |
| Efficiency | Inefficient and produces biased standard errors when heteroscedasticity is present. | More efficient and produces unbiased standard errors when heteroscedasticity is present and weights are correctly specified. |
| Application | Suitable when error variance is constant. | Preferred when error variance varies across observations, such as in financial data or survey data with varying precision. |

While OLS is simpler to implement and interpret, its assumptions are often violated in real-world data, particularly in fields like finance. WLS offers a more robust solution by explicitly incorporating varying data quality or error characteristics into the model fitting process, leading to more reliable inferences under such conditions.

FAQs

When should I use Weighted Least Squares instead of Ordinary Least Squares?

You should consider using weighted least squares when the assumption of homoscedasticity (constant error variance) is violated, meaning the variability of the error terms changes across different levels of your independent variables. This is common in financial data, survey data, or when observations have different levels of precision.
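One common way to check this in practice is a formal heteroscedasticity test on the OLS residuals. The sketch below uses the Breusch-Pagan test from statsmodels on synthetic data; a small p-value suggests the constant-variance assumption is violated and WLS may be worth considering.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)  # variance grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Breusch-Pagan test: the null hypothesis is homoscedasticity.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols.resid, X)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")
```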

How are the "weights" determined in Weighted Least Squares?

Ideally, the weights are inversely proportional to the true variance of the error for each observation. For instance, if one observation's error variance is twice another's, it would receive half the weight. In practice, these true variances are usually unknown and must be estimated, typically by fitting a preliminary ordinary least squares regression and then regressing the squared (or log-squared) residuals on the independent variables, as sketched below.
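A minimal sketch of that two-step procedure (often called feasible WLS) in Python, with synthetic data; regressing the log of the squared residuals keeps the fitted variance estimates positive after exponentiating.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 400)
y = 1.0 + 2.0 * x + rng.normal(scale=0.4 * x)  # true variance grows with x

X = sm.add_constant(x)

# Step 1: preliminary OLS fit to obtain residuals.
ols = sm.OLS(y, X).fit()

# Step 2: model the variance by regressing log squared residuals on X;
# exponentiating the fitted values guarantees positive variance estimates.
var_model = sm.OLS(np.log(ols.resid ** 2), X).fit()
est_variance = np.exp(var_model.fittedvalues)

# Step 3: refit with weights equal to the inverse estimated variance.
fwls = sm.WLS(y, X, weights=1.0 / est_variance).fit()
print(fwls.params)
```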

Can Weighted Least Squares fix all problems with my regression model?

No, weighted least squares primarily addresses issues related to heteroscedasticity (unequal error variances). It does not correct for other common regression problems such as omitted variable bias, multicollinearity, or misspecification of the functional form of the model. It also does not necessarily make the model robust to severe outliers if the weights themselves are influenced by these outliers.
