Median

What Is Median?

The median is a statistical measure representing the middle value in a dataset when the values are arranged in ascending or descending order. It is a key concept within descriptive statistics, a branch of mathematics used to summarize and describe the features of a dataset. Unlike other measures of central tendency such as the mean, the median is less affected by extremely large or small values, often referred to as outliers. This makes the median a robust indicator of the typical value, particularly useful in financial data analysis where data distributions can be uneven.

History and Origin

The concept of the median has roots tracing back to the 16th century. Edward Wright, in his 1599 book on navigation, suggested using a middle value in a series of observations to determine location, believing it to be the most likely correct value. The formal term "median" (valeur médiane) was first used in 1843 by French mathematician Antoine Augustin Cournot, referring to the value that divides a probability distribution into two equal halves. Later, Gustav Theodor Fechner popularized the median in the formal analysis of sociological and psychological phenomena, further establishing its role as a significant statistical measure.

Key Takeaways

  • The median is the middle value of a dataset when ordered numerically.
  • It is a robust measure of central tendency, less influenced by extreme values or outliers.
  • The median effectively divides a dataset into two equal halves, with 50% of values falling above it and 50% falling below.
  • It is particularly useful for analyzing skewed distributions where the mean might be misleading.
  • The median is a key metric in various financial and economic analyses, including income and housing price reporting.

Formula and Calculation

Calculating the median depends on whether the dataset contains an odd or even number of values.

Step 1: Order the Data
Arrange all data points in ascending order.

Step 2: Find the Middle Value

  • For an odd number of data points (n): The median is the value at the position \(\frac{n+1}{2}\).

    Example: For the dataset {10, 15, 20, 25, 30}, n = 5. The median is at position \(\frac{5+1}{2} = 3\), which is 20.

  • For an even number of data points (n): The median is the average of the two middle values, found at positions \(\frac{n}{2}\) and \(\frac{n}{2} + 1\).

    Example: For the dataset {10, 15, 20, 25, 30, 35}, n = 6. The median is the average of the values at positions \(\frac{6}{2} = 3\) (which is 20) and \(\frac{6}{2} + 1 = 4\) (which is 25).

    \[ \text{Median} = \frac{\text{Value at } (\frac{n}{2})\text{th position} + \text{Value at } (\frac{n}{2} + 1)\text{th position}}{2} \]

    In this example, the median is \(\frac{20+25}{2} = 22.5\).

The median can also be understood as the 50th percentile of a dataset, dividing the data into lower and upper halves. It is also the second of the three quartiles.
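
The two cases above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `median_of` is a hypothetical choice, and the input is assumed to be a non-empty collection of numbers.

```python
def median_of(values):
    """Return the median of a non-empty collection of numbers."""
    if not values:
        raise ValueError("median_of() requires at least one value")
    ordered = sorted(values)   # Step 1: order the data
    n = len(ordered)
    mid = n // 2               # zero-based index of the (upper) middle value
    if n % 2 == 1:
        # Odd n: one-based position (n + 1) / 2 is zero-based index mid
        return ordered[mid]
    # Even n: average the values at one-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([10, 15, 20, 25, 30]))      # 20
print(median_of([10, 15, 20, 25, 30, 35]))  # 22.5
```

In practice, Python's standard library offers `statistics.median`, which follows the same odd/even rule, so a hand-written version like this is mainly useful for seeing the steps.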

Interpreting the Median

The median offers a clear interpretation: it is the point where half the observed values fall below it and half fall above it. This makes it an intuitive measure for understanding the "typical" value within a dataset, especially when dealing with variables like income distribution or asset prices. For example, if the median household income in a region is $75,000, it means that 50% of households earn less than $75,000 and 50% earn more. This interpretation provides a practical benchmark for evaluating economic conditions or individual financial standing. The median helps in grasping the typical magnitude of financial metrics without being skewed by a few extremely high or low data points.

Hypothetical Example

Consider an investment club with five members who report their annual portfolio returns: 5%, 8%, 12%, 18%, and 100%.

  1. Order the returns: First, arrange the returns in ascending order: 5%, 8%, 12%, 18%, 100%.
  2. Identify the middle value: Since there are five data points (an odd number), the median is the middle value. In this ordered list, the third value is 12%.

Therefore, the median portfolio return for this investment club is 12%. The average (mean) return, by contrast, is 28.6%, a figure inflated by the single 100% return and higher than what four of the five members actually earned. The median provides a more representative picture of a "typical" return for the majority of the members, illustrating its resistance to outliers. This makes it valuable for understanding the general performance without distortion.
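
The comparison in this example can be checked with Python's standard `statistics` module. The sketch below uses the five returns from the example; the variable names are illustrative only.

```python
from statistics import mean, median

returns_pct = [5, 8, 12, 18, 100]   # annual returns from the example, in percent

club_median = median(returns_pct)
club_mean = mean(returns_pct)

print(f"median: {club_median}%")    # median: 12%
print(f"mean:   {club_mean}%")      # mean:   28.6%

# Dropping the 100% outlier moves the mean far more than the median.
without_outlier = returns_pct[:-1]
print(median(without_outlier))      # 10.0
print(mean(without_outlier))        # 10.75
```

Removing one extreme value shifts the mean from 28.6 to 10.75, while the median moves only from 12 to 10.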

Practical Applications

The median is widely used across various financial domains due to its robustness against extreme values.

  • Real Estate: In the housing market, the median sales price of homes is a frequently cited statistic. For instance, the Federal Reserve Bank of St. Louis publishes median sales prices for houses sold in the United States through its FRED database. The median is preferred over the mean here because a few very high-priced luxury homes would heavily inflate the average, giving a misleading impression of affordability and market trends for typical homes.
  • Income and Wealth: Government agencies like the U.S. Census Bureau extensively use median household income to report economic well-being. This provides a more accurate reflection of the typical household's financial standing, as it is not distorted by the incomes of a small number of extremely wealthy individuals.
  • Investment Analysis: In investment analysis, the median can be used to analyze returns on a set of mutual funds or the performance of a portfolio, offering a clearer view when some investments have exceptionally high or low returns. It helps to understand the typical performance without being skewed by a few extreme outcomes.
  • Compensation and Salaries: Companies and labor organizations often use median salaries to benchmark compensation levels, providing a more realistic figure than the mean, which could be skewed by a few highly paid executives.

Limitations and Criticisms

While the median is a valuable central tendency measure, it has certain limitations. One primary criticism is that it does not consider the precise value of every observation in the dataset, potentially leading to a loss of information, especially compared to the mean. This characteristic means that the median does not reflect the magnitude of differences between data points, only their relative position. For example, two datasets with the same median can have vastly different ranges or distributions.

Furthermore, the median is not as amenable to complex mathematical calculations as the mean, which limits its use in certain advanced statistical tests and modeling. When data is highly symmetrical or normally distributed, the mean, median, and mode will be very close, and the mean might offer more statistical power. However, when data exhibits strong skewness, relying solely on the median without considering other measures or the overall distribution can lead to an incomplete understanding of the data. As FasterCapital notes, the median can be misleading when used without other measures of central tendency or dispersion.
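
As a small illustration of the point that two datasets can share a median while differing widely in spread, consider the sketch below. Both lists are hypothetical values invented for the example.

```python
from statistics import median

narrow = [18, 19, 20, 21, 22]   # hypothetical values clustered near 20
wide = [1, 5, 20, 60, 400]      # hypothetical values spread far from 20

for name, data in (("narrow", narrow), ("wide", wide)):
    spread = max(data) - min(data)   # range: largest minus smallest value
    print(name, "median:", median(data), "range:", spread)
# narrow median: 20 range: 4
# wide median: 20 range: 399
```

The median alone cannot distinguish these two datasets, which is why it is usually reported alongside a measure of dispersion.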

Median vs. Mean

The median and the mean are both fundamental measures of central tendency, but they convey different aspects of a dataset and are best suited for different situations.

| Feature | Median | Mean (Arithmetic Average) |
| --- | --- | --- |
| Definition | The middle value in an ordered dataset. | The sum of all values divided by the number of values. |
| Sensitivity to Outliers | Highly resistant to outliers. | Highly sensitive to outliers; extreme values can significantly pull the mean towards them. |
| Data Distribution | Preferred for skewed distributions, where data is not symmetrical (e.g., income, asset valuation). | Best for symmetrical or normally distributed data, where values are evenly spread around the center. |
| Information Used | Uses positional information; only considers the middle value(s). | Uses all data points in its calculation, reflecting the total magnitude of the values. |
| Calculation | Requires ordering the data, then finding the middle element(s). | Requires summing all values and dividing. |
| Interpretation | Represents the "typical" value, where half the data is below and half is above. | Represents the "average" value, or the balancing point of the dataset. |

Confusion often arises when interpreting data, particularly in financial contexts where distributions are frequently skewed. For example, if discussing average CEO compensation, the mean might be astronomically high due to a few outliers with extraordinary earnings. In contrast, the median CEO compensation would provide a more realistic picture of what a "typical" CEO earns, as it is unaffected by these extreme figures. For effective risk management and informed decision-making, understanding when to apply the median versus the mean is crucial for accurate data analysis.

FAQs

When is the median a better measure than the mean?

The median is generally a better measure than the mean when the dataset contains outliers or is significantly skewed (asymmetrical). For example, when analyzing household incomes or home prices, the median provides a more representative "typical" value because it is not distorted by a few extremely high or low values.

Can the median be used for non-numeric data?

The median can be used for ordinal data, which is categorical data that has a meaningful order or ranking, even if the intervals between values are not uniform. For instance, customer satisfaction ratings (e.g., "poor," "fair," "good," "excellent") could have a median, as they can be ordered. However, it cannot be used for nominal data, which has no inherent order (e.g., colors, types of assets).
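
One way to take a median of ordinal labels is to map each label to its rank and pick the middle rank without averaging. The sketch below assumes the four-level scale from the example; the helper `ordinal_median` and the sample responses are hypothetical. It uses `statistics.median_low`, which returns the lower of the two middle values for an even count, so the result is always a real category.

```python
from statistics import median_low

SCALE = ["poor", "fair", "good", "excellent"]   # ordered from lowest to highest


def ordinal_median(ratings, scale=SCALE):
    """Return the (lower) median category of a list of ordinal ratings."""
    ranks = [scale.index(r) for r in ratings]   # map each label to its rank
    return scale[median_low(ranks)]             # median_low never averages


# Hypothetical survey responses
responses = ["good", "poor", "excellent", "good", "fair", "good", "fair"]
print(ordinal_median(responses))  # good
```

Averaging two middle categories (for example "fair" and "good") has no defined meaning on an ordinal scale, which is the reason for choosing the lower median here; taking the upper median would be an equally defensible convention.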

Is the median always a value within the dataset?

If the dataset has an odd number of values, the median will always be one of the values in the dataset. If the dataset has an even number of values, the median is calculated as the average of the two middle values, and this result may or may not be one of the original data points.
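
The even-count case can be seen with the six-value dataset used earlier. This short sketch relies on Python's standard `statistics` module, where `median_low` and `median_high` return the two middle values themselves instead of their average.

```python
from statistics import median, median_low, median_high

data = [10, 15, 20, 25, 30, 35]

print(median(data))          # 22.5
print(median(data) in data)  # False
print(median_low(data))      # 20
print(median_high(data))     # 25
```

When a result that actually occurs in the data is required, one of the two middle values can be reported in place of their average.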

How does the median relate to percentiles?

The median is equivalent to the 50th percentile. This means that 50% of the data points in the set fall below the median, and 50% fall above it. It is also the second quartile, dividing the lower 50% from the upper 50% of the data.
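
This relationship can be checked numerically. The sketch below uses `statistics.quantiles` (available in Python 3.8 and later) with its default interpolation method. Percentile conventions differ between tools, so other libraries may return slightly different values for the first and third quartiles, though the middle cut point matches the median here.

```python
from statistics import median, quantiles

data = [10, 15, 20, 25, 30, 35]

q1, q2, q3 = quantiles(data, n=4)   # three cut points: the quartiles
p50 = quantiles(data, n=100)[49]    # 50th of the 99 percentile cut points

print(median(data))  # 22.5
print(q2)            # 22.5
print(p50)           # 22.5
```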