
Histogram

What Is a Histogram?

A histogram is a graphical representation that organizes a group of data points into user-specified ranges. It is a fundamental tool in statistical analysis used to visualize the frequency distribution of numerical data, providing insights into its underlying distribution. By grouping data into "bins" or intervals, a histogram shows how many data points fall within each range, with the height of each bar representing the frequency of observations in that bin. This visual summary helps in understanding the shape, spread, and central tendency of a data set.

History and Origin

The term "histogram" was first introduced by British mathematician and statistician Karl Pearson on November 18, 1891, during his Gresham lecture series.8 Pearson coined the term to describe a "time-diagram" intended for historical purposes, such as charting time periods.7 His work was foundational to modern statistical tools and to the visualization of continuous data, and it challenged the prevailing belief that all natural data followed a normal distribution.6 The idea of summarizing data graphically has earlier roots, however: William Playfair is credited with creating various graphical methods, including the bar chart.5

Key Takeaways

  • A histogram visually represents the frequency distribution of numerical data.
  • Data is divided into adjacent, non-overlapping intervals called "bins," and the height of each bar (when bins have equal width) indicates the frequency of observations within that bin.
  • Histograms help identify patterns such as skewness, outliers, and the overall shape of the data's distribution.
  • They are widely used in data visualization for exploratory data analysis.
  • The choice of bin width significantly impacts the appearance and interpretability of a histogram.

Formula and Calculation

While there isn't a single "formula" for a histogram itself, its construction involves two primary steps: defining bins and counting frequencies.

  1. Determine the Range of Data: Find the minimum and maximum values in your data set.
  2. Choose the Number of Bins (or Bin Width): This is a crucial step. The number of bins, denoted k, or equivalently the bin width, determines the granularity of the histogram. Common rules of thumb include Sturges' formula, which gives k directly, and Scott's rule, which gives a bin width; in practice the choice is often adjusted based on the data and the insights desired.
    • Sturges' Formula:
      $$k = 1 + \log_2 n$$
      Where:
      • k = number of bins (usually rounded up to a whole number)
      • n = number of data points in the sample
    • Bin Width Calculation:
      $$\text{Bin Width} = \frac{\text{Maximum Value} - \text{Minimum Value}}{k}$$
  3. Count Frequencies: For each bin, count how many data points fall within its defined range. This count is the frequency for that bin.
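The three steps above can be sketched in Python. This is a minimal illustration, not a reference implementation. The function names are hypothetical, and rounding Sturges' result up is an assumed convention. Bins are taken to be half-open `[left, right)`, with the last bin closed so that the maximum value is counted.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def sturges_bins(n: int) -> int:
    """Sturges' formula k = 1 + log2(n), rounded up to a whole number of bins."""
    if n < 1:
        raise ValueError("need at least one data point")
    return math.ceil(1 + math.log2(n))


def equal_width_edges(lo: float, hi: float, k: int) -> List[float]:
    """Return k + 1 bin edges from lo to hi, each bin (hi - lo) / k wide."""
    width = (hi - lo) / k
    return [lo + i * width for i in range(k)] + [hi]


def count_frequencies(data: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count points per bin. Bins are [left, right); the last bin also includes its right edge."""
    k = len(edges) - 1
    counts = [0] * k
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # outside the histogram's range; ignored in this sketch
        i = bisect_right(edges, x) - 1  # index of the last edge <= x
        counts[min(i, k - 1)] += 1      # a value equal to the top edge goes in the last bin
    return counts


# Usage with a small hypothetical data set
data = [12, 15, 18, 23, 27, 31, 44, 45, 58, 100]
k = sturges_bins(len(data))                     # 1 + log2(10) ≈ 4.32, rounded up to 5
edges = equal_width_edges(min(data), max(data), k)
print(count_frequencies(data, edges))           # [5, 3, 1, 0, 1]
```

Using the data's own minimum and maximum as the outer edges guarantees every point lands in some bin. The half-open rule ensures a value sitting exactly on an interior edge is counted only once.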

For example, if a data set has values from 10 to 100, and you choose 9 bins, each bin would cover a range of 10 (e.g., 10-20, 20-30, etc.). The histogram would then graphically represent how many data points fall into each of these 10-unit intervals.
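The arithmetic for this example, applying the bin-width formula above:

$$\text{Bin Width} = \frac{100 - 10}{9} = \frac{90}{9} = 10$$

The bin edges therefore fall at 10, 20, 30, ..., 100: ten edges bounding nine bins.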

Interpreting the Histogram

Interpreting a histogram involves analyzing its shape, center, and spread to understand the characteristics of the underlying quantitative data. The shape reveals whether the distribution is symmetric (like the bell-shaped normal distribution), skewed (with a longer tail extending to one side), bimodal (having two peaks), or uniform (bars of roughly equal height). The center can be estimated by observing where the bulk of the data lies; for a roughly symmetric histogram, this is close to both the mean and the median. The spread indicates the variability of the data, showing whether data points are tightly clustered or widely dispersed.4 For instance, a wider histogram implies greater variability. Outliers or gaps in the histogram can also point to unusual data points or separate groupings within the data.
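When only the bin counts are available, the center and spread can be estimated numerically as well as visually. A common approximation, assumed here, treats every observation in a bin as if it sat at the bin's midpoint. The function name and data below are hypothetical, and the results are estimates rather than the exact statistics of the raw data.

```python
import math
from typing import Sequence, Tuple


def grouped_center_and_spread(edges: Sequence[float], counts: Sequence[int]) -> Tuple[float, float]:
    """Estimate the mean and (population) standard deviation from binned counts.

    Each observation is assumed to lie at its bin's midpoint.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    n = sum(counts)
    if n == 0:
        raise ValueError("histogram is empty")
    mids = [(edges[i] + edges[i + 1]) / 2 for i in range(len(counts))]
    mean = sum(m * c for m, c in zip(mids, counts)) / n
    var = sum(c * (m - mean) ** 2 for m, c in zip(mids, counts)) / n
    return mean, math.sqrt(var)


# Hypothetical histogram: four bins of width 10 holding 1, 3, 4 and 2 observations
edges = [0, 10, 20, 30, 40]
counts = [1, 3, 4, 2]
mean, sd = grouped_center_and_spread(edges, counts)
print(mean, sd)  # 22.0 9.0
```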

Hypothetical Example

Consider an investor analyzing the daily percentage returns of a particular stock over the past year. Instead of looking at 252 individual daily returns, they can construct a histogram to understand the typical range and frequency of these returns.

  1. Collect Data: Gather 252 daily percentage returns for the stock.
  2. Determine Range: Suppose the lowest return was -5% and the highest was +7%.
  3. Choose Bins: The investor decides to use 12 bins, each covering a 1% range (e.g., -5% to -4%, -4% to -3%, ..., +6% to +7%).
  4. Count Frequencies:
    • Bin 1 (-5% to -4%): 5 days
    • Bin 2 (-4% to -3%): 10 days
    • ...
    • Bin 6 (0% to 1%): 60 days (most frequent)
    • ...
    • Bin 12 (+6% to +7%): 2 days

The resulting histogram would show a tall bar for the 0% to 1% range, indicating that the stock most frequently experienced small positive daily returns. Shorter bars at the extremes would highlight less frequent large gains or losses. This provides a quick visual summary of the stock's return distribution without needing to inspect every single data point.
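Below is a sketch of the counting step for this example. It assumes returns are expressed in percent and uses a handful of made-up returns rather than a full 252-day series. With 1% bins starting at -5%, the 0% to 1% range is the sixth bin (index 5 when counting from zero). The function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def return_histogram(
    returns_pct: Sequence[float], lo: int = -5, hi: int = 7
) -> List[Tuple[Tuple[int, int], int]]:
    """Count returns in 1-percentage-point bins [lo, lo+1), ..., [hi-1, hi]."""
    edges = list(range(lo, hi + 1))           # -5, -4, ..., 7: 13 edges, 12 bins
    counts = [0] * (len(edges) - 1)
    for r in returns_pct:
        if not lo <= r <= hi:
            raise ValueError(f"return {r}% is outside [{lo}%, {hi}%]")
        i = bisect_right(edges, r) - 1        # index of the last edge <= r
        counts[min(i, len(counts) - 1)] += 1  # a return of exactly hi% goes in the last bin
    return [((edges[i], edges[i + 1]), c) for i, c in enumerate(counts)]


# Hypothetical daily returns in percent (illustrative only, not real market data)
sample = [-4.6, -3.2, -0.4, 0.2, 0.5, 0.7, 0.9, 1.3, 2.1, 6.8]
hist = return_histogram(sample)
modal_bin, modal_count = max(hist, key=lambda item: item[1])
print(modal_bin, modal_count)  # (0, 1) 4
```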

Practical Applications

Histograms are widely applied across various domains in finance and economics due to their effectiveness in illustrating data distribution.

  • Market Analysis: Traders and analysts use histograms to examine the distribution of stock prices, trading volumes, or volatility. This can help identify common price ranges or periods of unusual activity.
  • Risk Management: In risk management, histograms can illustrate the distribution of potential losses in a portfolio, aiding in the assessment of Value at Risk (VaR).
  • Economic Indicators: Economists frequently use histograms to visualize the distribution of economic data, such as income levels, unemployment rates, or wage growth across different populations. For example, the FRED Blog from the St. Louis Federal Reserve has used histograms to illustrate the distribution of wage growth, showing how it varies across different demographic groups.3
  • Quantitative Finance: For quantitative analysts, histograms are essential for understanding the characteristics of financial time series data, aiding in model validation and parameter estimation.
  • Quality Control: In business operations, histograms are a key tool for quality control, illustrating the distribution of product measurements or process times to identify variations and defects.

Limitations and Criticisms

Despite their utility, histograms have certain limitations. A primary criticism revolves around the arbitrary choice of bin width. Different bin widths can lead to vastly different visual interpretations of the same data set, potentially obscuring important features or exaggerating minor fluctuations. For instance, a very narrow bin width might show too much detail, appearing "noisy," while a very wide bin width might smooth out important patterns, losing granularity.2 This subjectivity can make objective comparisons challenging and can potentially lead to misinterpretations if not carefully considered. Furthermore, histograms are best suited for large datasets of continuous data. They can be less effective for small datasets, where the shape might not be clearly defined, or for discrete data with few unique values. They also do not retain the individual data points, meaning specific values cannot be retrieved from the visual representation.
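The sensitivity to bin width is easy to demonstrate. The sketch below uses hypothetical data and a hypothetical function name. It counts the same eight values with two, four, and eight equal-width bins. With two bins, the histogram looks flat and the gap between the two clusters disappears. Four or eight bins reveal the gap.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_width_counts(data: Sequence[float], k: int) -> List[int]:
    """Counts for k equal-width bins spanning the data's range ([left, right), last bin closed)."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data has zero range")
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k)] + [hi]
    counts = [0] * k
    for x in data:
        counts[min(bisect_right(edges, x) - 1, k - 1)] += 1
    return counts


data = [1, 2, 2, 3, 7, 8, 8, 9]  # hypothetical: two clusters separated by a gap
for k in (2, 4, 8):
    print(k, equal_width_counts(data, k))
# 2 [4, 4]
# 4 [3, 1, 0, 4]
# 8 [1, 2, 1, 0, 0, 0, 1, 3]
```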

Histogram vs. Bar Chart

While both histograms and bar charts use bars to represent data, they are fundamentally different in their purpose and the type of data they display.

| Feature | Histogram | Bar Chart |
|---|---|---|
| Data Type | Displays the frequency distribution of continuous numerical data. | Compares categorical or discrete data. |
| X-axis | Represents continuous intervals or "bins" of numerical values. Bars are typically adjacent. | Represents distinct categories or discrete items. Bars are typically separated by gaps. |
| Bar Meaning | The area of the bar (or height, if bin widths are equal) is proportional to the frequency. | The height of the bar represents the value or frequency for a specific category. |
| Purpose | To show the shape, spread, and central tendency of a distribution. | To compare quantities across different categories. |

The core distinction is that a histogram shows how a single continuous variable is distributed, whereas a bar chart compares different categories or discrete variables.
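The point that bar area, not height, tracks frequency matters when a histogram's bins have unequal widths. This sketch assumes the common density scaling, in which each bar's height is its count divided by (total count × bin width), so that the bar areas sum to 1. NumPy's `numpy.histogram` applies the same normalization when called with `density=True`. The function name and data here are hypothetical.

```python
from typing import List, Sequence


def density_heights(edges: Sequence[float], counts: Sequence[int]) -> List[float]:
    """Scale counts so each bar's area (height * width) equals its fraction of the data."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    n = sum(counts)
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


edges = [0, 10, 20, 50]   # hypothetical: the last bin is three times as wide
counts = [20, 30, 30]
heights = density_heights(edges, counts)
print(heights)            # [0.025, 0.0375, 0.0125]

# Bar areas recover each bin's share of the 80 observations: 0.25, 0.375, 0.375
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
```

Plotted as raw counts, the second and third bars would look equally tall. The density heights show that the wide bin's observations are spread three times more thinly.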

FAQs

What is a "bin" in a histogram?

A "bin" in a histogram refers to a specific interval or range of values on the horizontal axis. All data points that fall within this range are counted, and that count determines the height of the bar for that bin. For example, a bin from 0 to 10 typically includes values from 0 up to, but not including, 10, so a value of exactly 10 is counted in the next bin (the last bin of a histogram usually includes its upper bound as well).
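The half-open convention can be made concrete with a small sketch. The function below is hypothetical. It follows the rule NumPy documents for `numpy.histogram`: every bin except the last is half-open, and the last bin includes its upper edge.

```python
from bisect import bisect_right
from typing import Sequence


def bin_index(x: float, edges: Sequence[float]) -> int:
    """Index of the bin holding x; bins are [left, right) except the closed last bin."""
    if not edges[0] <= x <= edges[-1]:
        raise ValueError(f"{x} is outside [{edges[0]}, {edges[-1]}]")
    i = bisect_right(edges, x) - 1
    return min(i, len(edges) - 2)  # x equal to the top edge belongs to the last bin


edges = [0, 10, 20, 30]
print(bin_index(0, edges))     # 0
print(bin_index(9.99, edges))  # 0
print(bin_index(10, edges))    # 1  (10 is the first value of the second bin)
print(bin_index(30, edges))    # 2  (the top edge is included in the last bin)
```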

How does bin width affect a histogram?

The choice of bin width significantly impacts the appearance and interpretability of a histogram.1 A narrow bin width can reveal fine details and irregularities but might also make the histogram look "noisy" and difficult to interpret. Conversely, a wide bin width can smooth out details, potentially hiding important patterns or features within the data distribution. Selecting an appropriate bin width is crucial for an effective visual summary.
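Rules of thumb can supply a starting bin width. As one example, the sketch below implements Scott's normal-reference rule, which sets the width to 3.49 × s × n^(-1/3), where s is the sample standard deviation. The rule assumes roughly normal data, so the result is a starting point to adjust, not a final answer. The function names and data are hypothetical.

```python
import math
import statistics
from typing import Sequence


def scott_bin_width(data: Sequence[float]) -> float:
    """Scott's normal-reference rule: width = 3.49 * sample std dev * n ** (-1/3)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two points to estimate the spread")
    return 3.49 * statistics.stdev(data) * n ** (-1 / 3)


def bins_for_width(data: Sequence[float], width: float) -> int:
    """Number of equal-width bins needed to cover the data's range at this width."""
    return max(1, math.ceil((max(data) - min(data)) / width))


data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical sample, n = 8
w = scott_bin_width(data)
print(round(w, 2), bins_for_width(data, w))  # 3.73 2
```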

Can a histogram show the average?

While a histogram does not explicitly show the numerical value of the mean or median, it provides a visual sense of where the center of the data set lies. For instance, if the histogram is symmetric and bell-shaped (approximating a normal distribution), the peak will be close to the mean. For skewed distributions, the mean is pulled toward the longer tail, away from the median.
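A quick numerical check of the last point, using Python's standard `statistics` module on a small hypothetical data set with a long right tail:

```python
import statistics

# Hypothetical right-skewed data: one large value creates a long right tail
data = [1, 2, 2, 3, 3, 3, 4, 4, 20]
mean = statistics.mean(data)      # 42 / 9 ≈ 4.67
median = statistics.median(data)  # 3
print(mean > median)              # True: the mean is pulled toward the long right tail
```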