What Is Median?
Median is a statistical measure representing the middle value in an ordered data set. When a series of numbers is arranged in ascending or descending order, the median is the value that separates the higher half from the lower half of the data, making it a key component of descriptive statistics. It is particularly useful in data analysis and quantitative analysis for describing the central tendency of a distribution, especially when the data may contain extreme values or outliers. Unlike the arithmetic mean, the median is largely unaffected by such extreme values, so it offers a more representative central point for skewed distributions.
History and Origin
The concept of the median has roots tracing back centuries, with early astronomers and mathematicians recognizing the utility of a middle-point value in a series of observations. Edward Wright, a mathematician working on compass variation, is credited with one of the earliest documented proposals for its use, in 1599 [18, 19]. He suggested that among many observations, the "middlemost" was most likely to be closest to the truth [17].
Later, in 1843, Antoine Augustin Cournot first used the term "valeur médiane" to describe the value dividing a probability distribution into two equal halves. Francis Galton, a prominent statistician of the late nineteenth century, further popularized the English term "median" in 1881, having previously used "middle-most value" and "medium" [16]. His work contributed significantly to establishing the median as a formal part of data analysis.
Key Takeaways
- The median is the middle value in a numerically ordered data set.
- It divides an ordered data set into two halves: at least half of the values lie at or below it, and at least half lie at or above it.
- The median is robust to outliers, meaning extreme values do not significantly distort it.
- It is often preferred over the mean for skewed distributions, such as income or wealth distribution data.
- Calculation requires ordering the data first, then identifying the central value(s).
Formula and Calculation
The calculation of the median depends on whether the total number of observations (n) in the data set is odd or even.
Step 1: Order the Data
First, arrange all data points in ascending (or descending) order.
Step 2: Determine the Position of the Median
- For an odd number of observations (n):
  The median is the value at the central position.
  For example, in the data set {3, 5, 8, 11, 15}, n=5. The median position is (5+1)/2 = 3. The 3rd value is 8, so the median is 8.
- For an even number of observations (n):
  The median is the average of the two middle values.
  For example, in the data set {10, 12, 15, 18, 20, 22}, n=6. The median positions are 6/2 = 3 and (6/2)+1 = 4. The 3rd value is 15, and the 4th value is 18. The median is (15 + 18) / 2 = 16.5.
These steps yield an exact median for any ordered numerical data set.
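The two steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `median_of` is hypothetical, and Python's standard library already offers `statistics.median` for the same purpose.

```python
def median_of(values):
    """Return the median of a non-empty collection of numbers (illustrative helper)."""
    ordered = sorted(values)  # Step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the 1-based position (n + 1) / 2 is the 0-based index n // 2.
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and (n/2) + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([3, 5, 8, 11, 15]))         # 8
print(median_of([10, 12, 15, 18, 20, 22]))  # 16.5
```

The two calls reproduce the odd and even examples given above.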
Interpreting the Median
Interpreting the median provides insight into the typical value within a data set, especially when data is not symmetrically distributed. For instance, in discussions of household income inequality, the median household income is frequently cited as a more accurate representation of the financial standing of the "middle" household than the mean. This is because a small number of extremely high incomes would inflate the mean, but not the median.
When evaluating numerical information in financial analysis, understanding the median's position relative to other measures of central tendency, such as the mean, can reveal important characteristics about the data's skewness. If the median is lower than the mean, it suggests a right-skewed distribution (more high values pulling the mean up), common in income or real estate price data. Conversely, if the median is higher than the mean, the distribution is left-skewed.
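The mean-versus-median comparison described above can be expressed as a small check. This is a rough rule-of-thumb sketch rather than a formal skewness test; the function name `skew_hint` and the sample figures are made up for illustration.

```python
from statistics import mean, median


def skew_hint(values):
    """Suggest a skew direction by comparing mean and median (heuristic only)."""
    m, md = mean(values), median(values)
    if m > md:
        return "right-skewed"   # high values pull the mean above the median
    if m < md:
        return "left-skewed"    # low values pull the mean below the median
    return "roughly symmetric"


# Hypothetical home prices with one very expensive property.
print(skew_hint([200_000, 220_000, 250_000, 270_000, 900_000]))  # right-skewed
# Hypothetical scores with one very low value.
print(skew_hint([1, 8, 9, 10, 10]))                              # left-skewed
```

Because the comparison looks only at two summary numbers, it can mislead on small or multimodal data sets, so it is best treated as a first indication.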
Hypothetical Example
Consider a small start-up company with the following annual salaries for its seven employees (in USD):
$40,000, $45,000, $50,000, $55,000, $60,000, $70,000, $300,000
To find the median salary:
- Order the data: The salaries are already ordered from lowest to highest: {40,000, 45,000, 50,000, 55,000, 60,000, 70,000, 300,000}.
- Count the number of observations (n): There are 7 employees, so n=7.
- Calculate the median position: Since n is odd, the median position is (n+1)/2 = (7+1)/2 = 4.
- Identify the median value: The value at the 4th position in the ordered list is $55,000.
Thus, the median salary for this company is $55,000. This figure offers a realistic representation of a typical salary, as the outlier salary of $300,000 for the CEO does not disproportionately influence it. This contrasts with the mean salary, which would be significantly higher ($88,571.43), potentially giving a misleading impression of general employee compensation. The example shows how the median can give a more accurate picture of typical employee compensation than the mean when an outlier is present.
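The salary example can be reproduced with Python's standard `statistics` module. The variable names are illustrative; the figures are the seven salaries listed above.

```python
from statistics import mean, median

# Annual salaries (USD) from the hypothetical start-up example.
salaries = [40_000, 45_000, 50_000, 55_000, 60_000, 70_000, 300_000]

median_salary = median(salaries)  # 4th of the 7 ordered values
mean_salary = mean(salaries)      # 620,000 / 7

print(f"Median: ${median_salary:,.2f}")  # Median: $55,000.00
print(f"Mean:   ${mean_salary:,.2f}")    # Mean:   $88,571.43
```

The single $300,000 salary raises the mean by more than $30,000 relative to the median, while the median stays at the middle employee's salary.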
Practical Applications
The median is widely applied in various financial contexts due to its resistance to extreme values, making it particularly valuable when analyzing skewed data.
- Income and Wealth Studies: Government bodies and researchers frequently use median income and median household wealth to track economic well-being and identify trends in income inequality. For example, the U.S. Census Bureau and the Federal Reserve regularly publish median household income and wealth data, offering crucial economic indicators that reflect the financial status of the typical American household [14, 15]. Similarly, the OECD uses median disposable income to compare living standards across its member countries [13].
- Real Estate Markets: Median home prices are a standard metric used to describe housing markets. Unlike the average, the median price is not skewed by a few extremely expensive or inexpensive properties, offering a more accurate reflection of what a typical home might cost in a given area.
- Investment Performance: Although performance is often reported using mean returns, median returns can provide a useful alternative perspective over varying periods, especially for portfolios or funds that have experienced significant one-time gains or losses that would skew an average.
- Salary and Compensation Analysis: Businesses and economists use median salaries to benchmark compensation levels for different job roles or industries. This provides a fair comparison that isn't distorted by a few highly paid executives or very low-paid entry-level positions. The U.S. Bureau of Labor Statistics, for instance, provides median weekly earnings for various occupations [12].
- Portfolio Management: In portfolio management, analysts might look at the median size of holdings or the median asset allocation across a group of similar portfolios to identify common characteristics or deviations.
Limitations and Criticisms
Despite its advantages, the median has certain limitations that warrant consideration. One primary criticism is that the median does not use the magnitude of every observation in a data set, only the middle value(s) [9, 10, 11]. This means that information about the size of values away from the middle, especially extreme values, is not incorporated into the measure itself [7, 8]. For instance, two vastly different data sets, such as {1, 2, 3} and {1, 2, 100000}, have the same median (2) despite their significant differences in overall sum and spread [6].
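A short sketch makes this limitation concrete, using the two data sets mentioned above. The variable names are illustrative only.

```python
from statistics import median

a = [1, 2, 3]
b = [1, 2, 100000]

# Both data sets share the same middle value...
print(median(a), median(b))              # 2 2
# ...yet their totals and ranges differ enormously.
print(sum(a), sum(b))                    # 6 100003
print(max(a) - min(a), max(b) - min(b))  # 2 99999
```

Reporting a measure of spread alongside the median is one way to recover the information that the median alone leaves out.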
Furthermore, the median is less amenable to certain mathematical and algebraic calculations than the mean [4, 5]. This can make it less suitable for advanced statistical analyses where a measure that incorporates all data points and their precise values is necessary. For example, if the goal is to determine the total sum of all values in a data set, knowing only the median and the number of observations is insufficient, whereas the mean allows it directly, since the sum equals the mean multiplied by the number of observations [2, 3]. The median may also be less reliable for small data sets, where its representativeness can be limited [1].
Median vs. Mean
The median and the mean are both measures of central tendency, but they approach the concept of "average" differently, making each suitable for different analytical contexts.
Feature | Median | Mean |
---|---|---|
Definition | The middle value of an ordered data set. | The arithmetic average (sum of all values divided by the count). |
Sensitivity to Outliers | Largely unaffected by extreme values. | Highly affected by extreme values. |
Uses All Data Points | Only considers the position of the middle value(s). | Utilizes every value in its calculation. |
Data Distribution | Preferred for skewed distributions (e.g., income, home prices). | Preferred for symmetric or normally distributed data. |
Algebraic Properties | Less amenable to further mathematical calculations. | Highly amenable to further mathematical calculations. |
Common Application | Typical value in skewed data, like median household income. | Average value, like average test scores or stock returns. |
The choice between median and mean often depends on the nature of the data set and the objective of the financial analysis. When extreme values are present and a representative "middle" is desired, the median is often the more appropriate measure.
FAQs
What is the primary advantage of using the median?
The primary advantage of the median is its robustness to outliers. Unlike the mean, the median is not disproportionately influenced by extreme values in a data set, making it a more accurate representation of the "typical" value in skewed distributions, such as income data.
When should the median be used instead of the mean?
The median should be used instead of the mean when the data set contains extreme values or is highly skewed. Common examples include analyzing household incomes, real estate prices, or wealth distribution, where a few very high or very low values could significantly distort the mean and give a misleading impression of the central tendency.
Can the median be used for non-numeric data?
The median can be used for ordinal data, which is categorical data that has a meaningful order (e.g., education levels: high school, bachelor's, master's). However, it cannot be used for nominal data, which is categorical data without a specific order (e.g., types of investments), as ordering is a prerequisite for calculating the median.
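One way to sketch a median for ordinal data is to map each category to its rank and take the middle rank. The category list below follows the education example above. The helper name `ordinal_median` and the sample observations are hypothetical. The sketch uses `statistics.median_low` so that an even number of observations returns an actual category rather than an average of two ranks, which is a design assumption, not the only possible convention.

```python
from statistics import median_low

# Ordered categories, lowest to highest.
LEVELS = ["high school", "bachelor's", "master's"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def ordinal_median(observations):
    """Return the middle category of ordinal observations (illustrative helper)."""
    ranks = [RANK[obs] for obs in observations]
    # median_low picks the lower of the two middle ranks when the count is even.
    return LEVELS[median_low(ranks)]


sample = ["bachelor's", "high school", "master's", "bachelor's", "high school"]
print(ordinal_median(sample))  # bachelor's
```

Nominal categories have no defined ranking, so no equivalent mapping exists for them.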