Sample median

What Is Sample Median?

The sample median is the middle value in a numerically ordered data set of observations. As a key measure of central tendency within descriptive statistics and data analysis, the sample median divides a collection of data points into two equal halves, so that half of the values fall at or below it and half fall at or above it. Unlike the arithmetic mean, the sample median is particularly resistant to the influence of extreme values or outliers in the data.

History and Origin

The concept of the median as a statistical measure of central tendency has roots stretching back centuries, though its formal recognition and widespread use in statistics are more recent. Early mentions of a "middle" value can be traced to various fields, but the mathematician Edward Wright is often credited with proposing its use in 1599, in the context of determining compass variation for navigation.⁶ Later, in the 19th century, figures like Francis Galton championed the median for its utility, especially when dealing with skewed distributions or data influenced by extreme observations.⁵ Its acceptance grew as statisticians sought more robust measures that could accurately represent the "typical" value in diverse data sets, particularly those with asymmetrical frequency distributions.

Key Takeaways

The sample median represents the middle value of a data set when arranged in ascending or descending order (or the average of the two middle values when the number of observations is even).
It is less susceptible to the distortion caused by outliers or extreme values compared to the sample mean.
The calculation method for the sample median differs slightly depending on whether the data set contains an odd or even number of observations.
It is a fundamental measure of central tendency and is widely used in economic reporting, financial analysis, and other fields.
The sample median provides a robust indicator of the typical value in skewed wealth distribution or income data.

Formula and Calculation

To calculate the sample median, the first step is always to arrange the data points in ascending or descending order. The calculation then depends on the total number of observations in the data set, denoted as (n).

For an odd number of observations:
If (n) is odd, the sample median is the middle value in the ordered list. Its position is given by:

\text{Median Position} = \frac{n+1}{2}

The value at this position is the sample median.

For an even number of observations:
If (n) is even, there are two middle values in the ordered list. The sample median is the average of these two middle values. Their positions are given by:

\begin{aligned}
\text{First Middle Position} &= \frac{n}{2} \\
\text{Second Middle Position} &= \frac{n}{2} + 1
\end{aligned}

The sample median is then the average of the values at these two positions.
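The two cases above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `sample_median` is chosen here for the example and does not come from any particular library.

```python
def sample_median(values):
    """Return the sample median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # step 1: arrange the data in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("sample median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n/2 and n/2 + 1 are 0-based mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2
```

Note the shift between the 1-based positions used in the formulas and the 0-based indices used by Python lists. In practice, the standard library's `statistics.median` performs the same calculation.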

Interpreting the Sample Median

Interpreting the sample median involves understanding its position as the midpoint of a data set. When examining a series of financial returns or investment performance figures, the sample median indicates the point where half of the observations fall below it and half fall above. This characteristic makes it especially useful when the data might contain unusually high or low values, as these extremes do not distort the median's representation of the typical value. For instance, in market data, a few exceptionally volatile days would heavily influence the mean return, but the median return would still reflect the more typical daily performance. It gives a clearer picture of where the bulk of the observations lie, without being swayed by exceptional events or measurement anomalies.

Hypothetical Example

Consider a small investment portfolio's annual returns over seven years: 5%, 8%, -2%, 15%, 7%, 6%, and 9%.

To find the sample median, first arrange the returns in ascending order:
-2%, 5%, 6%, 7%, 8%, 9%, 15%

Since there are seven observations (n = 7), an odd number, the sample median is the middle value. The position is (7 + 1)/2 = 4.

The value at the 4th position is 7%, so the sample median annual return for this portfolio is 7%. Three of the seven years earned less than 7% and three earned more. The median offers a straightforward view of the typical return: the most extreme years in the sample (the -2% loss and the 15% gain) determine only which side of the median they fall on, not its value, whereas they enter directly into measures such as the mean.
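The arithmetic in this example can be checked with a short Python sketch. It uses only the seven returns listed above; the variable names are illustrative.

```python
import statistics

returns = [5, 8, -2, 15, 7, 6, 9]  # annual returns in percent, from the example

ordered = sorted(returns)
position = (len(ordered) + 1) // 2      # 1-based median position for odd n
median_return = ordered[position - 1]   # convert to a 0-based list index
mean_return = statistics.mean(returns)

print(ordered)                # [-2, 5, 6, 7, 8, 9, 15]
print(position)               # 4
print(median_return)          # 7
print(round(mean_return, 2))  # 6.86
```

Here the mean (about 6.86%) and the median (7%) are close, because the one low year and the one high year roughly offset each other.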

Practical Applications

The sample median finds extensive use across various financial and economic contexts due to its robustness to extreme values.

Economic Reporting: Government agencies and economists frequently use median household income and median wealth data to describe the financial well-being of a population. This provides a more accurate representation of typical income or wealth, as it is not skewed by the disproportionately high incomes or assets of a small percentage of extremely wealthy individuals. For example, the Federal Reserve provides detailed tables on the distribution of household wealth in the U.S., using the median as a primary indicator.⁴ Similarly, the St. Louis Federal Reserve provides historical data on real median household income in the United States.³ This data is also utilized in legal frameworks, such as the U.S. Trustee Program's means testing for bankruptcy, which relies on Census Bureau median income data.²
Real Estate: Median home prices are widely reported to give a typical value for properties in a given area. The median is preferred over the mean price because a few very expensive or inexpensive homes can significantly distort the average, making the median a more reliable indicator for buyers, sellers, and analysts evaluating property investments.
Salary and Wage Analysis: Companies and labor organizations often cite median salaries or wages to reflect compensation levels more realistically, minimizing the impact of a few highly paid executives or low-wage part-time employees.
Investment Analysis: While the mean is common for investment returns, the sample median can offer insight into the typical return of an asset or portfolio, especially when dealing with data sets that might have occasional, exceptionally large gains or losses that could skew the arithmetic mean.

Limitations and Criticisms

While the sample median offers significant advantages, particularly its robustness to outliers, it also has certain limitations.

One criticism is that the sample median does not incorporate the magnitude of all data points in its calculation, only their position. This means it might not fully reflect the overall distribution or variability of the data. For instance, two different data sets could have the same median but vastly different ranges or densities of values. This can lead to a less complete understanding of the data's spread compared to measures like standard deviation.
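A small Python sketch makes this limitation concrete. The two data sets below are invented purely for illustration: they share the same median but differ greatly in spread.

```python
import statistics

# Two hypothetical data sets with the same middle value.
tight = [48, 49, 50, 51, 52]
wide = [0, 10, 50, 90, 100]

print(statistics.median(tight), statistics.median(wide))  # 50 50

# The sample standard deviation reveals the difference the median hides.
print(round(statistics.stdev(tight), 2))  # 1.58
print(round(statistics.stdev(wide), 2))   # 45.28
```

Reporting the median alongside a measure of spread, such as the standard deviation or the interquartile range, gives a fuller description than the median alone.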

Additionally, the sample median can be a less precise estimator than the sample mean, which matters most in small data sets. While the median is robust to extreme values, its efficiency (how tightly its estimates cluster around the true center of the distribution for a given sample size) is lower than that of the mean for symmetric distributions without outliers; for normally distributed data, the median's sampling variance in large samples is about π/2 ≈ 1.57 times that of the mean. However, its strength lies precisely where distributions are skewed or contain problematic observations. Research highlights its utility as a robust estimator in various statistical applications, including meta-analysis, where it can provide a more reliable measure of central tendency in the presence of small-study effects and outliers.¹
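The efficiency point can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not a formal result: it draws repeated samples from a standard normal distribution (a symmetric distribution without outliers) and compares how much the mean and the median vary from sample to sample. The sample size of 25, the 2,000 trials, and the helper name `sampling_variance` are arbitrary choices made for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def sampling_variance(estimator, n=25, trials=2000):
    """Variance of an estimator across repeated standard-normal samples."""
    estimates = [
        estimator([random.gauss(0, 1) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.variance(estimates)


var_mean = sampling_variance(statistics.mean)
var_median = sampling_variance(statistics.median)

# For normal data the median's estimates are expected to be more spread out.
print(var_mean, var_median, var_median / var_mean)
```

A ratio above 1 indicates that, for this kind of data, the median needs more observations than the mean to reach the same precision. Repeating the experiment with a heavy-tailed or contaminated distribution would be expected to reverse the comparison.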

Sample Median vs. Sample Mean

The sample median and the sample mean are both measures of central tendency, but they convey different information about a data set. The sample mean, often referred to as the "average," is calculated by summing all data points and dividing by the number of observations. It uses every value in the data set in its calculation, making it sensitive to extreme values or outliers. A single very large or very small number can significantly pull the mean away from the bulk of the data.

In contrast, the sample median is the middle value in an ordered data set. Its position, rather than the numerical magnitude of all values, is what determines it. Consequently, the sample median is unaffected by how far outliers are from the rest of the data, providing a more robust measure for skewed distributions. For example, in typical income data, where a small number of extremely high earners exist, the median income will be a more accurate reflection of what a typical person earns than the mean income, which would be inflated by the outliers. However, the sample mean is preferred for symmetric distributions without outliers, as it incorporates all data and is often more efficient for statistical inference in such cases. When a distribution exhibits skewness, the mean is pulled in the direction of the skew, while the median remains closer to the center of the majority of the data.
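The contrast can be illustrated with a brief Python sketch. The income figures are hypothetical, chosen only to show the effect of adding a single very high earner to a small group.

```python
import statistics

# Hypothetical annual incomes for five people.
incomes = [32_000, 38_000, 41_000, 45_000, 52_000]
print(statistics.mean(incomes))    # 41600
print(statistics.median(incomes))  # 41000

# Add one extremely high earner and recompute both measures.
with_outlier = incomes + [2_000_000]
print(statistics.mean(with_outlier))    # 368000
print(statistics.median(with_outlier))  # 43000.0
```

The single outlier raises the mean almost ninefold, while the median moves only from 41,000 to 43,000 because the outlier changes which values sit in the middle, not how large they are.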

FAQs

Why is the sample median often preferred over the mean in certain financial contexts?

The sample median is preferred when a data set contains outliers or is significantly skewed, as it is not distorted by these extreme values. For instance, in wealth distribution or real estate prices, the median gives a more realistic picture of the "typical" value than the mean.

How does the sample median relate to percentiles?

The sample median is equivalent to the 50th percentile. This means that 50% of the data points in a sorted list are below the median, and 50% are above it. It's a specific type of quantile, which divides a frequency distribution into equal parts.
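As a quick check of this relationship, the sketch below computes quartiles for the seven portfolio returns from the hypothetical example using Python's `statistics.quantiles` (available from Python 3.8). The second quartile is the 50th percentile. Other percentiles can differ slightly between software packages because interpolation conventions vary, so this is an illustration for this data set rather than a general guarantee about every percentile.

```python
import statistics

returns = [5, 8, -2, 15, 7, 6, 9]  # annual returns in percent

# n=4 splits the data into four groups and returns three cut points:
# the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(returns, n=4)

print(q2)                                # second quartile (50th percentile)
print(q2 == statistics.median(returns))  # True
```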

Can a data set have no sample median?

No. Every non-empty numerical data set that can be ordered will have a sample median. If there's an odd number of observations, it's the single middle value. If there's an even number, it's the average of the two middle values.

What is the difference between sample median and sample mode?

The sample median is the middle value of an ordered data set. The mode is the value that appears most frequently in a data set. A data set can have one mode (unimodal), multiple modes (multimodal), or no mode if all values appear with the same frequency.
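The difference can be seen with Python's standard `statistics` module (`multimode` is available from Python 3.8). The small data sets below are made up for illustration.

```python
import statistics

one_mode = [1, 2, 2, 3, 9]
two_modes = [1, 1, 2, 3, 3]
all_unique = [1, 2, 3]

print(statistics.median(one_mode), statistics.multimode(one_mode))    # 2 [2]
print(statistics.median(two_modes), statistics.multimode(two_modes))  # 2 [1, 3]

# When every value is equally frequent, multimode returns all of them;
# under the convention described above, such a data set has no mode.
print(statistics.multimode(all_unique))  # [1, 2, 3]
```

All three data sets have a median of 2, yet their modes differ, which shows that the two measures answer different questions about the data.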