Sample variance

What Is Sample Variance?

Sample variance is a statistical measure that quantifies the spread of data points in a sample relative to its mean. It is a fundamental concept within quantitative analysis and serves as an estimator for the variance of an entire population when only a subset of data is available. By indicating how far individual data points deviate from the average, sample variance provides insight into the volatility or dispersion of a dataset. In finance, understanding sample variance is crucial for assessing the risk associated with investment returns.

History and Origin

The concept of variance, from which sample variance is derived, took shape in the late 19th and early 20th centuries and is closely associated with the pioneering work of Karl Pearson. Pearson, often referred to as one of the founders of modern statistics, formalized many statistical techniques still widely used today. His contributions included developing the standard deviation, a term he introduced in the 1890s, as a key measure of data dispersion.¹⁶,¹⁵ The term "variance" for the square of the standard deviation was introduced later, by Ronald Fisher in 1918. Pearson's work laid the foundation for applying statistical analysis to various fields, including biology, genetics, and eventually finance. His development of these tools provided a rigorous mathematical framework for quantifying the spread of data, which was essential for the advancement of empirical research and probabilistic modeling.¹⁴

Key Takeaways

Sample variance measures the average squared deviation of individual data points from the sample mean, with the sum of squared deviations divided by n-1 rather than n.
It is used to estimate the variability of a larger population when only a sample is observed.
A higher sample variance indicates greater dispersion or volatility within the dataset.
It is a key component in calculating standard deviation, another common measure of data spread.
In finance, sample variance helps assess investment risk and is a foundational element in portfolio theory.

Formula and Calculation

The formula below applies to ungrouped data. Because deviations are measured from the sample mean rather than the unknown population mean, dividing the sum of squared deviations by (n) would tend to underestimate the true variability of the population. The denominator therefore uses (n-1), an adjustment known as Bessel's correction, which makes the result an unbiased estimator of the population variance.¹³

The formula for sample variance ((s^2)) is:

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}

Where:

(s^2) = Sample variance
(x_i) = Each individual data point in the sample
(\bar{x}) = The sample mean (the average of all data points in the sample)
(n) = The total number of data points in the sample
(\sum) = Summation symbol, indicating the sum of all squared differences

The numerator calculates the sum of the squared differences between each data point and the sample mean.¹² This squaring ensures that both positive and negative deviations contribute positively to the measure of spread and prevents them from canceling each other out.
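The formula above can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the function name `sample_variance` and the `observations` list are hypothetical, and the result is cross-checked against `statistics.variance` from the standard library, which also divides by n-1.

```python
import statistics


def sample_variance(data):
    """Sample variance with Bessel's correction (divide by n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    mean = sum(data) / n
    # Squaring keeps positive and negative deviations from cancelling out.
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


# Hypothetical data, chosen only for illustration.
observations = [4.0, 7.0, 1.0, 8.0]

print(sample_variance(observations))       # 10.0
print(statistics.variance(observations))   # same value from the standard library
```

With a mean of 5, the squared deviations are 1, 4, 16, and 9, which sum to 30; dividing by n-1 = 3 gives 10.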

Interpreting the Sample Variance

Interpreting sample variance involves understanding that it represents, approximately, the average squared distance of each observation from the mean (the sum of squared deviations is divided by n-1 rather than n). A larger value for sample variance indicates that the individual data points are widely dispersed around the mean, suggesting higher volatility or risk in financial contexts. Conversely, a smaller sample variance implies that data points are clustered closely around the mean, indicating lower dispersion.

For instance, in analyzing investment returns, a high sample variance would suggest that returns have fluctuated significantly over the observed period. This might appeal to a risk-seeking investor but could deter a risk-averse individual. When comparing two investment options, the one with a lower sample variance for its historical returns would generally be considered less volatile, assuming similar expected returns. While useful, sample variance is expressed in squared units, which can make direct interpretation challenging. For practical interpretation, its square root, the standard deviation, is often preferred as it is in the same units as the original data.

Hypothetical Example

Consider a hypothetical investor, Sarah, who is evaluating two different stocks, Stock A and Stock B, based on their monthly returns over the last six months. She wants to use sample variance to understand the historical volatility of each stock.

Stock A Monthly Returns: 2%, 5%, -1%, 3%, 4%, 1%
Stock B Monthly Returns: 8%, -3%, 10%, -5%, 7%, 2%

Step 1: Calculate the mean ((\bar{x})) for each stock.

For Stock A:
(\bar{x}_A = (2 + 5 - 1 + 3 + 4 + 1) / 6 = 14 / 6 \approx 2.33%)

For Stock B:
(\bar{x}_B = (8 - 3 + 10 - 5 + 7 + 2) / 6 = 19 / 6 \approx 3.17%)

Step 2: Calculate the squared deviation, ((x_i - \bar{x})^2), for each data point.

For Stock A:

((2 - 2.33)^2 = (-0.33)^2 \approx 0.11)
((5 - 2.33)^2 = (2.67)^2 \approx 7.13)
((-1 - 2.33)^2 = (-3.33)^2 \approx 11.09)
((3 - 2.33)^2 = (0.67)^2 \approx 0.45)
((4 - 2.33)^2 = (1.67)^2 \approx 2.79)
((1 - 2.33)^2 = (-1.33)^2 \approx 1.77)

Sum of squared deviations for Stock A (\approx 0.11 + 7.13 + 11.09 + 0.45 + 2.79 + 1.77 = 23.34)

For Stock B:

((8 - 3.17)^2 = (4.83)^2 \approx 23.33)
((-3 - 3.17)^2 = (-6.17)^2 \approx 38.07)
((10 - 3.17)^2 = (6.83)^2 \approx 46.65)
((-5 - 3.17)^2 = (-8.17)^2 \approx 66.75)
((7 - 3.17)^2 = (3.83)^2 \approx 14.67)
((2 - 3.17)^2 = (-1.17)^2 \approx 1.37)

Sum of squared deviations for Stock B (\approx 23.33 + 38.07 + 46.65 + 66.75 + 14.67 + 1.37 = 190.84)

Step 3: Apply the sample variance formula.

For Stock A ((n-1 = 6-1 = 5)):
(s^2_A = 23.34 / 5 = 4.668)

For Stock B ((n-1 = 6-1 = 5)):
(s^2_B = 190.84 / 5 = 38.168)

Conclusion: Stock B has a significantly higher sample variance (38.168) compared to Stock A (4.668). This indicates that Stock B's monthly returns have been much more dispersed around its average return, signifying greater historical volatility and potentially higher risk for Sarah.
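The three steps of Sarah's calculation can be reproduced with a short Python sketch. It assumes only the monthly returns listed in the example; the variable names `stock_a` and `stock_b` are illustrative, and `statistics.variance` from the standard library applies the n-1 denominator.

```python
import statistics

# Monthly returns in percent, taken from the example.
stock_a = [2, 5, -1, 3, 4, 1]
stock_b = [8, -3, 10, -5, 7, 2]

for name, returns in (("Stock A", stock_a), ("Stock B", stock_b)):
    mean = statistics.mean(returns)
    var = statistics.variance(returns)   # divides by n - 1
    print(f"{name}: mean = {mean:.2f}, sample variance = {var:.3f}")
```

Run without intermediate rounding, this gives about 4.667 for Stock A and 38.167 for Stock B. The hand calculation arrives at 4.668 and 38.168 because it rounds each mean to two decimal places before squaring; the conclusion is unaffected.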

Practical Applications

Sample variance plays a vital role in various areas of finance and investing:

Risk Assessment: It is a core measure of volatility for individual securities and portfolios. Investors use sample variance to gauge the potential fluctuations in investment returns and assess the inherent risk of an asset. Assets with higher sample variance are generally considered riskier.¹¹,¹⁰
Portfolio Management: In modern portfolio theory, sample variance (or its square root, standard deviation) is used to calculate portfolio variance, which is crucial for optimizing asset allocation. By combining assets with different levels of volatility and correlations, managers aim to achieve desired risk-return profiles.
Performance Evaluation: Analysts may compare the sample variance of an investment's returns against a benchmark or other similar investments to understand its relative stability. While not a standalone measure of performance, it contributes to a holistic view of an investment's characteristics.
Financial Modeling: Sample variance is a key input in many financial models, including those used for pricing options (like the Black-Scholes model) and simulating future market scenarios (e.g., Monte Carlo simulations).
Regulatory Compliance: Regulatory bodies, such as the U.S. Securities and Exchange Commission (SEC), require certain financial disclosures that indirectly rely on measures of variability and risk. While the SEC does not typically mandate specific sample variance calculations for public reporting, the underlying principles of risk quantification influence how firms analyze and report their financial health and operations. The SEC's rulemaking process emphasizes transparency and investor protection, often requiring disclosures related to financial risks.⁹,⁸

Limitations and Criticisms

Despite its widespread use, sample variance has several limitations and criticisms, particularly in the context of financial risk management:

Symmetry Assumption: Sample variance treats positive (upside) and negative (downside) deviations from the mean equally. However, investors are typically more concerned with downside risk (the potential for losses) than with positive deviations. This symmetrical treatment can lead to an incomplete picture of an investment's true risk profile.⁷
Reliance on Historical Data: Sample variance is calculated using past observations. While historical data can be indicative, it does not guarantee future performance. Rapidly changing market conditions or unforeseen events can render historical sample variance less predictive of future volatility.⁶
Sensitivity to Outliers: Because sample variance involves squaring the deviations, extreme data points (outliers) can disproportionately influence the result, potentially misrepresenting the typical spread of the data.
Units of Measurement: The result of sample variance is in squared units of the original data, which can make it less intuitive for direct interpretation compared to the standard deviation. If returns are measured in percentage points, the variance is in "squared percentage points."
Assumptions of Normality: Many financial models that use variance implicitly assume that returns follow a normal probability distribution. However, financial returns often exhibit "fat tails" (more frequent extreme events) and skewness (asymmetrical distributions), meaning a normal distribution assumption may not accurately capture true market behavior.⁵,⁴ For this reason, alternative measures like Expected Shortfall (Conditional Value-at-Risk) are sometimes preferred for capturing extreme tail risk.³
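The sensitivity to outliers noted in the list above can be seen in a small sketch. Both return series are made up for illustration: the second is identical to the first except that its final month is replaced by a single extreme loss.

```python
import statistics

# Hypothetical monthly returns in percent (illustrative values only).
typical = [1, 2, 1, 3, 2, 1]
with_outlier = [1, 2, 1, 3, 2, -20]   # same series, one extreme month

var_typical = statistics.variance(typical)
var_outlier = statistics.variance(with_outlier)

print(f"without outlier: {var_typical:.2f}")
print(f"with outlier:    {var_outlier:.2f}")
print(f"ratio:           {var_outlier / var_typical:.0f}x")
```

Changing one observation out of six raises the sample variance by roughly two orders of magnitude here, because the large deviation is squared before it is summed.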

Sample Variance vs. Standard Deviation

Sample variance and standard deviation are closely related measures of dispersion, both derived from the same set of data points. The key difference lies in their calculation and interpretation. Sample variance is the sum of the squared deviations from the sample mean divided by n-1. It provides a numerical value representing the overall spread, but its units are squared, making it less intuitive for direct understanding. Standard deviation, on the other hand, is simply the square root of the sample variance. This transformation brings the measure back to the original units of the data, making it more interpretable. For example, if a stock's returns are in percentages, its sample variance will be in "percentage squared," while its standard deviation will be in percentages. Therefore, standard deviation is more commonly reported and understood as a measure of volatility in finance and capital markets.
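As a brief illustration of the unit difference, the sketch below computes both measures for Stock A's monthly returns from the hypothetical example. The variable names are illustrative; `statistics.variance` and `statistics.stdev` are the standard-library sample versions of each measure.

```python
import math
import statistics

returns_pct = [2, 5, -1, 3, 4, 1]   # Stock A monthly returns, in percent

variance = statistics.variance(returns_pct)   # units: percent squared
std_dev = math.sqrt(variance)                 # units: percent

print(f"sample variance:    {variance:.3f} (percent squared)")
print(f"standard deviation: {std_dev:.3f} (percent)")

# The standard library's sample standard deviation gives the same number.
print(math.isclose(std_dev, statistics.stdev(returns_pct)))
```

The standard deviation can be read on the same scale as the returns themselves, which is why it is the figure more often quoted.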

FAQs

What does a high sample variance indicate?

A high sample variance indicates that the data points in your sample are widely spread out from the average (mean). In finance, this implies greater volatility or unpredictability in investment returns.

Why is (n-1) used in the sample variance formula instead of (n)?

The use of (n-1) (known as Bessel's correction) in the denominator provides an unbiased estimator of the true population variance. When working with a sample, deviations are measured from the sample mean instead of the true population mean, and the data points are on average closer to their own sample mean than to the population mean, so dividing by (n) tends to underestimate the true spread. Dividing by (n-1) corrects for this bias, making the sample variance an unbiased estimate of the population variance.²
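The effect of the correction can be checked exactly on a toy case. The sketch below is an illustration under stated assumptions: the population is the six faces of a fair die (a hypothetical choice), and every equally likely sample of size 3 drawn with replacement is enumerated, so no random simulation is needed. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction
from itertools import product


def variance(data, divisor):
    """Sum of squared deviations from the data's own mean, over `divisor`."""
    mean = Fraction(sum(data), len(data))
    return sum((x - mean) ** 2 for x in data) / divisor


# Hypothetical population: the six faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
population_variance = variance(population, len(population))

# Every equally likely sample of size n drawn with replacement.
n = 3
samples = list(product(population, repeat=n))

avg_divide_by_n = sum(variance(s, n) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(population_variance)        # 35/12
print(avg_divide_by_n)            # 35/18, too small on average
print(avg_divide_by_n_minus_1)    # 35/12, matches the population variance
```

Averaged over all possible samples, the n-1 version recovers the population variance exactly, while the n version falls short by a factor of (n-1)/n.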

Can sample variance be negative?

No, sample variance cannot be negative. This is because the calculation involves squaring the deviation of each data point from the mean. Squaring any real number (positive or negative) always results in a non-negative value. Therefore, the sum of squared deviations will always be non-negative, and since (n-1) is positive for any sample of at least two observations, the sample variance will always be non-negative.

How is sample variance used in modern portfolio theory?

In modern portfolio theory, sample variance is a fundamental measure of an individual asset's risk. It is also used in conjunction with covariance to calculate the overall risk (variance) of a diversified portfolio. This helps in making informed decisions about asset allocation to achieve desired risk-return profiles.
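A minimal sketch of how variance and covariance combine for a two-asset portfolio follows. It reuses the Stock A and Stock B returns from the hypothetical example; the equal weights and the helper `sample_covariance` are assumptions made for illustration. The two-asset formula is cross-checked against the sample variance of the portfolio's own return series.

```python
import statistics


def sample_covariance(xs, ys):
    """Sample covariance of two equal-length series (divide by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


stock_a = [2, 5, -1, 3, 4, 1]     # monthly returns in percent
stock_b = [8, -3, 10, -5, 7, 2]
w_a, w_b = 0.5, 0.5               # hypothetical portfolio weights

# Two-asset portfolio variance: w_a^2 s_a^2 + w_b^2 s_b^2 + 2 w_a w_b cov(a, b)
portfolio_variance = (
    w_a ** 2 * statistics.variance(stock_a)
    + w_b ** 2 * statistics.variance(stock_b)
    + 2 * w_a * w_b * sample_covariance(stock_a, stock_b)
)

# Cross-check: sample variance of the portfolio's monthly return series.
portfolio_returns = [w_a * a + w_b * b for a, b in zip(stock_a, stock_b)]
direct = statistics.variance(portfolio_returns)

print(f"{portfolio_variance:.3f} {direct:.3f}")
```

In this sample the two stocks have a negative covariance, so the portfolio variance (about 6.875) is well below the simple average of the two individual variances.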

Is sample variance the same as population variance?

No, sample variance and population variance are distinct. Population variance measures the spread of all data points in an entire population. Sample variance, conversely, is an estimate of the population variance calculated from a subset (sample) of that population. While related, they use slightly different formulas due to the statistical adjustment required when inferring population characteristics from a sample.¹
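The difference between the two formulas is only the denominator, which a short sketch makes concrete. It treats Stock A's six monthly returns from the hypothetical example first as a complete population and then as a sample; `statistics.pvariance` divides by n and `statistics.variance` divides by n-1.

```python
import statistics

data = [2, 5, -1, 3, 4, 1]   # Stock A monthly returns, in percent
n = len(data)

as_population = statistics.pvariance(data)   # divides by n
as_sample = statistics.variance(data)        # divides by n - 1

print(f"population formula: {as_population:.3f}")
print(f"sample formula:     {as_sample:.3f}")
print(f"ratio:              {as_sample / as_population:.2f}")   # n / (n - 1)
```

The gap between the two shrinks as n grows, since n / (n-1) approaches 1.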