Skewed data

What Is Skewed Data?

Skewed data refers to a distribution of data points that is not symmetrical around its mean. In a symmetrical distribution, such as the normal distribution, data points are spread evenly on both sides of the mean, and for a normal distribution the mean, median, and mode are all equal. In skewed data, by contrast, the data points cluster more on one side, so the "tail" of the distribution is longer on either the left or the right. This concept is fundamental in statistical analysis and quantitative finance because it reveals characteristics of the data that measures of central tendency and variance alone do not capture. Understanding skewed data is crucial for accurate descriptive statistics and for informed decision-making in finance, particularly in risk management.

History and Origin

The concept of skewness, as a measure of the asymmetry of a probability distribution, gained prominence through the work of statisticians in the late 19th and early 20th centuries. Karl Pearson, a pivotal figure in the development of modern statistics, used statistical moments to describe the shape of distributions, including their skewness and kurtosis. His application of mathematical methods to biological and social data helped establish descriptive statistics as a systematic discipline, and he introduced several tools for describing data quantitatively, including the term "standard deviation" for measuring dispersion.¹⁰,⁹

Key Takeaways

Skewed data indicates an asymmetrical distribution where values are not evenly distributed around the mean.
Positive skew (right-skewed) means the tail extends to the right, often implying a few high values pulling the mean higher than the median.
Negative skew (left-skewed) means the tail extends to the left, often implying a few low values pulling the mean lower than the median.
Understanding skewness is vital in finance for assessing risk, particularly in asset returns, where non-normal distributions are common.
Skewness, along with other statistical moments, helps provide a more complete picture of data characteristics beyond just average and volatility.

Formula and Calculation

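Skewness is most commonly measured with Pearson's moment coefficient of skewness, also called the third standardized moment: the average cubed deviation from the mean, divided by the cube of the standard deviation. Cubing preserves the sign of each deviation and magnifies large ones, so a long right tail produces a positive value and a long left tail produces a negative one.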

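For a sample, the adjusted Fisher-Pearson coefficient of skewness (often written G1) is:

\[ G_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 \]

The leading factor is a small-sample adjustment; it approaches 1/n as n grows, so for large samples G1 is essentially the average cubed standardized deviation.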

Where:

\(x_i\) = individual data point
\(\bar{x}\) = the sample mean of the data
\(s\) = the sample standard deviation of the data (computed with an \(n - 1\) denominator)
\(n\) = the number of data points in the sample
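To make the sample formula concrete, here is a minimal Python sketch that transcribes it term by term, assuming the data arrive as a plain list of numbers. The function name `sample_skewness` is our own illustrative choice, not a library API; it uses the \(n - 1\) standard deviation, as in the definition of \(s\).

```python
import math

def sample_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness G1 (hypothetical helper name)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 data points")
    mean = sum(xs) / n
    # Sample standard deviation with the n - 1 denominator.
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    # Sum of cubed standardized deviations, then the small-sample factor.
    total = sum(((x - mean) / s) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * total

print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric data: essentially zero
print(sample_skewness([1, 1, 1, 2, 10]))  # one large value: positive skew
```

Cubing each standardized deviation keeps its sign, which is why the single large value in the second list dominates the sum and yields a positive result.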

The population skewness \(\gamma_1\), which the sample formula estimates, is defined in terms of moments:

\[ \gamma_1 = \frac{\mu_3}{\sigma^3} \]

Where:

\(\mu_3 = E[(X - \mu)^3]\) = the third central moment, that is, the expected value of the cubed difference between \(X\) and its mean \(\mu\)
\(\sigma\) = the standard deviation of the population
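The population definition can be applied directly to a discrete distribution given as values and probabilities. The sketch below (the helper `population_skewness` is a hypothetical name) computes \(\mu_3 / \sigma^3\) from the moments and checks it on a Bernoulli variable with success probability p, whose skewness has the known closed form \((1 - 2p)/\sqrt{p(1 - p)}\).

```python
import math

def population_skewness(values, probs):
    """gamma_1 = mu_3 / sigma^3 for a discrete distribution (hypothetical helper)."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(v * p for v, p in zip(values, probs))
    # Second and third central moments: E[(X - mu)^2] and E[(X - mu)^3].
    var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
    mu3 = sum((v - mu) ** 3 * p for v, p in zip(values, probs))
    # var ** 1.5 equals sigma ** 3.
    return mu3 / var ** 1.5

# Bernoulli(p = 0.1): a 10% chance of 1, otherwise 0.
p = 0.1
gamma1 = population_skewness([0, 1], [1 - p, p])
closed_form = (1 - 2 * p) / math.sqrt(p * (1 - p))
print(gamma1, closed_form)  # both about 2.67
```

A rare "success" behaves like a long right tail, so the skewness is strongly positive; with p above 0.5 the sign flips.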

Interpreting Skewed Data

Interpreting skewed data involves examining the shape of its probability distribution:

Zero Skew: A skewness value near zero is consistent with a symmetrical distribution, although it does not by itself guarantee symmetry. In a symmetrical, single-peaked distribution, the mean, median, and mode are equal or very close to each other. Every normal distribution has zero skew.
Positive Skew (Right-Skewed): A positive skewness value indicates that the tail of the distribution extends to the right. This means there are a few unusually high values that pull the mean to the right of the median and mode. For positively skewed data, typically Mode < Median < Mean. An example might be household income, where a few very high earners pull the average income higher than what most people earn.
Negative Skew (Left-Skewed): A negative skewness value indicates that the tail of the distribution extends to the left. This means there are a few unusually low values that pull the mean to the left of the median and mode. For negatively skewed data, typically Mean < Median < Mode. An example could be the age at death in a developed country, where most people live to an old age, but a few early deaths extend the left tail.
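The rules of thumb above can be checked on small data sets. In this sketch, the helper `describe_shape` and both data sets are invented for illustration, loosely mimicking the income and age-at-death examples; it reports the mean, the median, and the adjusted sample skewness so you can see that a positive value typically goes with mean > median and a negative value with mean < median.

```python
import statistics

def describe_shape(xs):
    """Return mean, median and adjusted sample skewness G1 (hypothetical helper)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    median = statistics.median(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    g1 = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)
    return mean, median, g1

# Hypothetical data: incomes in thousands, with one very high earner,
# and ages at death, with one early death.
incomes = [28, 32, 35, 38, 41, 45, 250]
ages_at_death = [35, 72, 78, 81, 83, 85, 88]

for label, data in [("incomes", incomes), ("ages_at_death", ages_at_death)]:
    mean, median, g1 = describe_shape(data)
    print(f"{label}: mean={mean:.1f}, median={median}, skew={g1:+.2f}")
```

The ordering of mean and median is a useful quick check, but it is a tendency rather than a rule, so the skewness coefficient itself is the more reliable signal.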

In financial contexts, understanding skewness provides insight into the likelihood of extreme positive or negative outcomes, which is crucial for investment analysis.

Hypothetical Example

Consider a small hypothetical investment fund's monthly returns over six months: 2%, 3%, 2.5%, 1%, 0.5%, 20%.

Calculate the Mean:
\(\bar{x} = \frac{2 + 3 + 2.5 + 1 + 0.5 + 20}{6} = \frac{29}{6} \approx 4.83\%\)
Order the Data to Find Median:
0.5%, 1%, 2%, 2.5%, 3%, 20%
The median is the average of the two middle values: \(\frac{2 + 2.5}{2} = 2.25\%\)
Identify the Mode:
No value repeats in this small dataset, so it has no mode.
Observe the Skew:
The mean (4.83%) is well above the median (2.25%) because of the single outlier return of 20%. Applying the sample skewness formula gives a value of about 2.36, a strong positive skew. Most returns were modest, and one exceptionally high month accounts for much of the average. In portfolio management, skewness of this kind warns investors that the average return may overstate what a typical month delivers.
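The steps of this example can be reproduced with Python's standard `statistics` module. This is a minimal sketch that takes the six monthly returns from the text as percentages and applies the adjusted sample skewness formula from the calculation section; the variable names are our own.

```python
import statistics

returns = [2, 3, 2.5, 1, 0.5, 20]  # monthly returns in percent, from the example

n = len(returns)
mean = statistics.fmean(returns)
median = statistics.median(returns)
s = statistics.stdev(returns)  # sample standard deviation (n - 1 denominator)
# Adjusted Fisher-Pearson sample skewness
G1 = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in returns)

print(f"mean   = {mean:.2f}%")    # 4.83%
print(f"median = {median:.2f}%")  # 2.25%
print(f"G1     = {G1:.2f}")       # 2.36
print(len(set(returns)) == n)     # True: no value repeats, so there is no mode
```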

Practical Applications

Skewed data has several important practical applications in finance and economics:

Asset Returns Analysis: Financial asset returns, such as those on stocks or bonds, rarely follow a perfect normal distribution. They often exhibit skewness, meaning large moves in one direction are more likely than in the other. For instance, equity returns frequently show negative skew: moderate gains are common, but there is a small chance of very large losses. This insight is critical for asset allocation and for setting risk tolerance.⁸,⁷,⁶
Option Pricing: Skewness is a key factor in the pricing of options. The phenomenon of "volatility skew" or "volatility smile" in options markets reflects the market's expectation of future price movements that deviate from a normal distribution. For example, out-of-the-money put options on equity indices typically trade at higher implied volatilities than equally out-of-the-money call options, reflecting market fear of sharp downside moves (negative skew).⁵
Risk Modeling: In financial modeling and risk management, incorporating skewness into models provides a more realistic assessment of potential losses or gains. Traditional risk measures that assume normal distributions can underestimate tail risks (extreme events) if the actual distribution is significantly skewed. The Federal Reserve Bank of San Francisco, for example, publishes data on Treasury Yield Skewness as an indicator of future interest rate risks, demonstrating the real-world utility of this measure.⁴
Credit Risk: For analyzing credit risk, distributions of default rates or loan losses are often highly positively skewed, meaning a high probability of many small losses and a small probability of very large, catastrophic losses.
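The risk-modeling point can be illustrated with a deliberately extreme toy series. In this sketch, the returns are hypothetical: 19 quiet months of +1% and one crash month of -15%. It fits a normal distribution with the same mean and standard deviation using `statistics.NormalDist`, then compares how often a loss of 15% or worse actually occurs in the data with the probability the normal model assigns to it. Note that this series is both negatively skewed and fat-tailed, so the gap reflects both features, not skewness alone.

```python
from statistics import NormalDist, fmean, stdev

# Hypothetical monthly returns (%): 19 quiet months and one crash month.
returns = [1.0] * 19 + [-15.0]

n = len(returns)
mean = fmean(returns)
s = stdev(returns)
# Adjusted sample skewness: negative because of the long left tail.
skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in returns)

# Frequency of a loss of 15% or worse in the data vs. under a normal
# distribution with the same mean and standard deviation.
empirical = sum(r <= -15.0 for r in returns) / n
normal_model = NormalDist(mu=mean, sigma=s).cdf(-15.0)

print(f"mean={mean:.2f}%, sd={s:.2f}%, skew={skew:.2f}")
print(f"P(return <= -15%): data {empirical:.1%}, normal model {normal_model:.5%}")
```

The data show the crash in one month out of twenty, while the fitted normal model treats it as a vanishingly rare event, which is the kind of understatement of tail risk described above.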

Limitations and Criticisms

While highly informative, analyzing skewed data and the skewness measure itself comes with certain limitations and criticisms:

Sensitivity to Outliers: Skewness can be highly sensitive to extreme values or outliers in a dataset. A single unusually high or low data point can significantly impact the skewness coefficient, potentially distorting the perceived asymmetry of the overall distribution.
Interpretation with Kurtosis: Skewness should ideally be interpreted in conjunction with kurtosis, another statistical moment that measures the "tailedness" of a distribution. A distribution can be skewed and also have fat tails (high kurtosis), meaning extreme events are more likely than under a normal distribution, regardless of the skew. Ignoring kurtosis when analyzing skewness can lead to an incomplete or misleading understanding of the data's risk profile, especially in portfolio management.³,²
Predictive Power: While skewness can offer insights into past data, its predictive power for future returns or market behavior is not always consistent. Markets are dynamic, and past skewness does not guarantee future outcomes. Some research suggests that while skewness is a significant indicator of returns, especially across asset classes, its relationship with future returns can be complex and may not always align with intuitive preferences.¹
Data Requirements: Calculating reliable skewness requires a sufficient amount of data. For small sample sizes, the calculated skewness might not accurately represent the true underlying population distribution.
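Both the outlier sensitivity and the small-sample problem show up in the fund example. This sketch recomputes the adjusted sample skewness with and without the single 20% month and prints a commonly used approximation for the standard error of G1, \(\sqrt{6n(n-1)/((n-2)(n+1)(n+3))}\), which assumes the sample comes from a normal population. The helper names `adjusted_skewness` and `skewness_standard_error` are hypothetical.

```python
import math
import statistics

def adjusted_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness G1 (hypothetical helper)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)

def skewness_standard_error(n):
    """Approximate standard error of G1 for a normal sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

with_outlier = [2, 3, 2.5, 1, 0.5, 20]  # fund returns from the example (%)
without_outlier = with_outlier[:-1]     # drop the single 20% month

for data in (with_outlier, without_outlier):
    n = len(data)
    print(f"n={n}: G1={adjusted_skewness(data):+.2f}, "
          f"SE~{skewness_standard_error(n):.2f}")
```

Removing one observation flips the sign of the estimate, and with only five or six points the approximate standard error is close to 1, so small-sample skewness figures should be read with caution.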

Skewed Data vs. Kurtosis

While both skewness and kurtosis describe the shape of a data distribution, they measure different aspects of that shape. Skewness quantifies the asymmetry of the distribution. It tells us whether the data points are concentrated more on one side of the mean, resulting in a longer tail on either the left or the right. A positive skew means the tail is on the right (more extreme high values), and a negative skew means the tail is on the left (more extreme low values).

Kurtosis, on the other hand, measures the "tailedness" of a distribution, usually expressed relative to a normal distribution as excess kurtosis. It describes how likely extreme values (outliers) are in either tail. A distribution with high kurtosis (leptokurtic) has fatter tails than a normal distribution, indicating a higher probability of extreme events; it is often, though not always, also more sharply peaked. Conversely, a distribution with low kurtosis (platykurtic) has thinner tails and often a flatter shape. While skewness captures the direction of asymmetry, kurtosis captures the weight of extreme deviations from the center, regardless of direction. Both are considered "higher moments" in descriptive statistics and provide a more complete picture of data than the mean and standard deviation alone.
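The difference between the two measures is easy to see on small data sets. This sketch uses the simple moment ratios \(m_3 / m_2^{3/2}\) and \(m_4 / m_2^2 - 3\) (sample moments with an n denominator, no small-sample adjustment) so the results for the two symmetric sets come out as round numbers. The helper `moment_shape` and the "flat" and "fat_tailed" data sets are invented for illustration; the third set is the fund example.

```python
import statistics

def moment_shape(xs):
    """Moment skewness m3 / m2**1.5 and excess kurtosis m4 / m2**2 - 3 (hypothetical helper)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]               # symmetric, no extreme values
fat_tailed = [-10] + [0] * 8 + [10]  # symmetric, two extreme values
skewed = [2, 3, 2.5, 1, 0.5, 20]     # the fund returns from the example

for label, data in [("flat", flat), ("fat_tailed", fat_tailed), ("skewed", skewed)]:
    skew, excess_kurt = moment_shape(data)
    print(f"{label:>10}: skew={skew:+.2f}, excess kurtosis={excess_kurt:+.2f}")
```

Both symmetric sets have zero skewness, yet one has negative excess kurtosis (thin tails) and the other positive excess kurtosis (heavy tails), while the fund returns are both skewed and heavy-tailed.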

FAQs

What does "positively skewed" mean?

Positively skewed data, also known as right-skewed, means the tail of the distribution extends to the right. This indicates that most of the data points are clustered on the lower end of the range, but there are a few unusually high values that pull the mean towards the right. In such a distribution, the mean is typically greater than the median.

Why is skewed data important in finance?

Skewed data is particularly important in finance because financial returns often do not follow a symmetrical normal distribution. Understanding the skewness of asset returns helps investors assess the likelihood of extreme positive or negative outcomes. For instance, a negatively skewed return distribution for a stock implies a higher probability of small gains and a lower, but significant, probability of large losses, which is critical for risk management and investment decision-making.

Can a distribution have zero skewness but not be a normal distribution?

Yes. A symmetrical distribution always has zero skewness (provided its third moment exists), but zero skewness alone does not make a distribution normal. A symmetrical distribution can still differ from a normal distribution in its kurtosis, that is, in how heavy its tails are. For example, the uniform and Laplace distributions are both symmetrical, but their excess kurtosis values are -1.2 and 3, respectively, compared with 0 for the normal distribution.