Family wise error rate

What Is Family Wise Error Rate?

The family wise error rate (FWER) is the probability of making at least one incorrect rejection of a null hypothesis when multiple hypothesis tests are conducted simultaneously. It falls under the broader field of statistical inference, a core component of quantitative analysis in finance. When multiple statistical tests are performed on a single dataset or across related datasets, the probability of observing a false positive, or Type I error, somewhere in the set increases. The family wise error rate specifically addresses this aggregate probability, aiming to control the likelihood of concluding that an effect exists when, in reality, it does not.

History and Origin

The concept of the family wise error rate arose from the challenges posed by the multiple comparisons problem in statistics. As researchers began performing more simultaneous statistical tests, it became apparent that maintaining a desired alpha level for each individual test did not guarantee the same overall error rate for the entire set, or "family," of tests. The notion of a "familywise error rate" was developed by statistician John Tukey in 1953 to address this phenomenon, with further related concepts like experimentwise error rate proposed by Ryan in 1959. This increased attention to the problem led to the development of various procedures aimed at controlling the family wise error rate, such as the Bonferroni correction, which was formally applied to multiple comparisons by Olive Jean Dunn building on earlier work by Carlo Emilio Bonferroni.9, 10

Key Takeaways

  • The family wise error rate (FWER) is the probability of making at least one Type I error across a family of statistical tests.
  • As the number of simultaneous statistical tests increases, the unadjusted family wise error rate also increases, leading to a higher chance of false positives.
  • Controlling the FWER is crucial to maintain the reliability and validity of conclusions drawn from multiple hypothesis tests.
  • Methods like the Bonferroni correction and Holm's method are commonly used to control the family wise error rate.
  • Controlling FWER can sometimes reduce the statistical power of individual tests, increasing the risk of Type II errors (false negatives).

Formula and Calculation

The family wise error rate (FWER) can be approximated for a set of $m$ independent tests, each conducted at an individual statistical significance level of $\alpha$, by the formula:

$$FWER \leq 1 - (1 - \alpha)^m$$

Where:

  • $FWER$ is the family wise error rate.
  • $\alpha$ is the significance level for each individual test.
  • $m$ is the total number of independent tests or comparisons being performed.

For example, if an analyst performs 10 independent tests, each with an alpha level of 0.05, the unadjusted FWER would be:

$$FWER \leq 1 - (1 - 0.05)^{10} \approx 0.401$$

This indicates that there is approximately a 40.1% chance of making at least one Type I error across the 10 tests, significantly higher than the individual 5% error rate.8

To control the FWER, common procedures adjust the individual alpha level. The Bonferroni correction, for instance, adjusts the individual alpha level to $\alpha / m$. So, for 10 tests and a desired overall FWER of 0.05, each individual test would need to be performed at a significance level of $0.05 / 10 = 0.005$.
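
To make the arithmetic concrete, here is a minimal Python sketch (the helper names and numbers are purely illustrative) that computes the unadjusted FWER for a given number of independent tests and the corresponding Bonferroni-adjusted per-test threshold:

```python
# Minimal sketch: unadjusted FWER for m independent tests, plus the
# Bonferroni-adjusted per-test alpha. Numbers are illustrative only.

def unadjusted_fwer(alpha: float, m: int) -> float:
    """Probability of at least one Type I error across m independent tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha_family: float, m: int) -> float:
    """Per-test significance level that caps the FWER at alpha_family."""
    return alpha_family / m

alpha, m = 0.05, 10
print(f"Unadjusted FWER for {m} tests: {unadjusted_fwer(alpha, m):.3f}")   # ~0.401
print(f"Bonferroni per-test alpha:     {bonferroni_alpha(alpha, m):.4f}")  # 0.0050
```

Running the same sketch with $m = 20$ reproduces the roughly 64% figure used in the hypothetical example below.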

Interpreting the Family Wise Error Rate

Interpreting the family wise error rate involves understanding the trade-off between identifying true effects and avoiding false alarms. A low FWER indicates a reduced likelihood of concluding that relationships or effects exist when they do not. This is particularly important in fields where incorrect conclusions can lead to significant consequences, such as in clinical trials or investment decision-making.

When the family wise error rate is not controlled, statistical analyses are susceptible to the "look-elsewhere effect" or data dredging. This phenomenon implies that if enough tests are performed, some will inevitably appear statistically significant purely by chance, even if no real underlying effect is present. This can lead to spurious findings and misallocation of resources. By controlling FWER, researchers and analysts aim to increase confidence that any significant findings are genuine across the entire set of investigations. The same logic underlies the construction of simultaneous confidence interval estimates.

Hypothetical Example

Consider a financial analyst examining a new investment strategy that involves backtesting 20 different trading signals over the past five years. Each signal is designed to predict market movements, and the analyst wants to determine if any of them generate statistically significant returns above a benchmark. The analyst sets an individual p-value threshold of 0.05 for each signal.

If the analyst were to run 20 separate tests without adjusting for multiple comparisons, the family wise error rate would be much higher than 0.05. Using the formula $FWER \leq 1 - (1 - 0.05)^{20} \approx 0.6415$, there's over a 64% chance of finding at least one signal appearing significantly profitable by random chance, even if none of them are truly effective.

To control the FWER at a more acceptable level, say 0.05, the analyst could apply a Bonferroni correction. This would require each individual test to be significant at an alpha level of $0.05 / 20 = 0.0025$. While this makes it harder for any single signal to be deemed significant, it drastically reduces the probability of a false positive across the entire set of 20 tests, providing a more reliable basis for further investment strategy development.
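
A rough sketch of this adjustment in Python, assuming the 20 backtest p-values have already been computed (the values below are invented for illustration), using the multipletests helper from the statsmodels library:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from backtesting 20 trading signals (invented for illustration).
pvals = np.array([0.001, 0.004, 0.018, 0.030, 0.041, 0.052, 0.060, 0.075, 0.090, 0.110,
                  0.130, 0.180, 0.240, 0.310, 0.380, 0.450, 0.520, 0.610, 0.740, 0.880])

# Bonferroni control of the family wise error rate at 0.05:
# each signal is judged against 0.05 / 20 = 0.0025 instead of 0.05.
reject, pvals_adj, _, alpha_bonf = multipletests(pvals, alpha=0.05, method="bonferroni")

print(f"Per-test alpha under Bonferroni: {alpha_bonf:.4f}")                                # 0.0025
print(f"Signals 'significant' at the unadjusted 0.05 level: {int((pvals < 0.05).sum())}")  # 5
print(f"Signals significant after Bonferroni correction:    {int(reject.sum())}")          # 1
```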

Practical Applications

The family wise error rate is a critical consideration in various real-world scenarios, particularly where multiple comparisons are inherent:

  • Financial Modeling and Research: In financial modeling, analysts might test numerous factors or indicators to identify those that predict asset prices or economic outcomes. Controlling FWER prevents spurious correlations from being mistaken for causal relationships, enhancing the reliability of predictive models. For example, when evaluating the many statistics produced by a Monte Carlo simulation, FWER control helps ensure that apparently significant outcomes are not simply artifacts of the number of comparisons made.
  • Algorithmic Trading: Developing and optimizing algorithmic trading strategies often involves testing hundreds or thousands of parameter combinations. Without addressing the family wise error rate, algorithms might be built on backtest results that are merely artifacts of chance, leading to poor live trading performance.
  • Regulatory Compliance and Risk Management: Regulatory bodies and internal risk management departments may use statistical tests to identify potential market manipulation, fraud, or compliance breaches. Ensuring a controlled FWER is vital to avoid falsely flagging legitimate activities, which could lead to unnecessary investigations and reputational damage.
  • Academic Research: In academic studies across various disciplines, including finance and economics, researchers frequently test multiple hypotheses. Proper control of the family wise error rate is a cornerstone of rigorous research, preventing the publication of findings that are not truly reproducible. The failure to account for multiple comparisons has been highlighted as a contributor to the "reproducibility crisis" in some scientific fields. A classic example illustrating the dangers of not correcting for multiple comparisons involves a study that found "brain activity" in a dead salmon during an fMRI scan.6, 7 This humorous yet stark example underscores how the problem can lead to absurd conclusions.5

Limitations and Criticisms

While controlling the family wise error rate is essential for maintaining statistical rigor, methods designed to do so, such as the Bonferroni correction, are not without their limitations. A primary criticism is that these methods can be overly conservative, especially when the number of tests (m) is large or when the tests are positively correlated.4

This conservatism reduces the statistical power of the individual tests, which raises the likelihood of Type II errors. In practical terms, an overly conservative correction might cause researchers to miss genuinely significant findings (false negatives), leading to a failure to identify real effects or relationships. For instance, in drug discovery or financial innovation, missing a true effect due to stringent FWER control could delay beneficial advancements.

Critics argue that strict FWER control might be too rigid in exploratory analyses where the goal is to identify potential leads for further investigation rather than confirm definitive findings.3 In such cases, alternative approaches, like controlling the False Discovery Rate, might be more appropriate as they offer a better balance between Type I and Type II errors.2

Family Wise Error Rate vs. False Discovery Rate

The family wise error rate (FWER) and the False Discovery Rate (FDR) are both measures used to address the challenge of multiple comparisons in statistical analysis, but they differ in what they aim to control.

| Feature | Family Wise Error Rate (FWER) | False Discovery Rate (FDR) |
| --- | --- | --- |
| Definition | Probability of making at least one Type I error (false positive) in a family of tests. | Expected proportion of false positives among all rejected null hypotheses. |
| Control Goal | Minimize the chance of making any false discoveries. | Control the proportion of false discoveries among all positive findings. |
| Conservatism | More conservative; lowers individual test significance levels more aggressively. | Less conservative; offers more statistical power. |
| Primary Use Case | When making a single false positive is highly undesirable (e.g., clinical trials for new drugs, legal proceedings). | In exploratory research with many tests where some false positives are acceptable for the sake of finding true positives (e.g., genomic studies, large-scale financial data mining). |

The key difference lies in their focus: FWER aims to prevent any false positives, making it a very strict control. In contrast, FDR allows for a certain proportion of false positives among the discoveries, which can be more suitable for large-scale analyses where identifying as many true effects as possible is prioritized, even at the cost of some incorrect findings.1
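
The practical effect of that difference can be sketched by applying both corrections to the same set of p-values with statsmodels' multipletests; the p-values below are invented purely for illustration:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Invented p-values: a handful of small ones mixed in with noise.
pvals = np.array([0.001, 0.004, 0.008, 0.012, 0.03, 0.04, 0.06, 0.10,
                  0.15, 0.22, 0.30, 0.41, 0.55, 0.63, 0.74, 0.88])

# FWER control (Bonferroni): guards against making *any* false positive.
rej_fwer, _, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")

# FDR control (Benjamini-Hochberg): limits the *proportion* of false discoveries.
rej_fdr, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("Rejections under FWER control:", int(rej_fwer.sum()))   # 1
print("Rejections under FDR control: ", int(rej_fdr.sum()))    # 4
```

With these inputs the FDR procedure rejects more hypotheses than the FWER procedure, which is exactly the extra statistical power the table above describes.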

FAQs

Why is Family Wise Error Rate important in finance?

The family wise error rate is important in finance because analysts often conduct numerous statistical tests simultaneously, such as when backtesting trading strategies, evaluating various investment factors, or identifying anomalies in financial data. Without controlling the FWER, there's a significantly higher chance of finding spurious correlations or seemingly profitable strategies that are merely due to random chance, leading to poor investment decisions or misguided financial modeling.

How can the Family Wise Error Rate be controlled?

The family wise error rate can be controlled using various methods, with the Bonferroni correction being one of the most common. This method adjusts the alpha level for each individual test by dividing the desired overall alpha by the total number of tests. Other, less conservative methods include Holm's method, Šidák correction, and Tukey's Honestly Significant Difference (HSD) test. The choice of method depends on the specific characteristics of the data and the research question.
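
As an illustration of how a step-down method differs from a single fixed threshold, here is a minimal sketch of Holm's procedure; the function name holm_reject and the p-values are hypothetical, and in practice a library routine such as statsmodels' multipletests with method="holm" would normally be used:

```python
import numpy as np

def holm_reject(pvals, alpha=0.05):
    """Holm's step-down procedure (sketch, not a production implementation).

    Sort the p-values ascending and compare the k-th smallest (0-indexed)
    to alpha / (m - k); stop at the first failure, rejecting everything before it.
    """
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(order):
        if pvals[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail as well
    return reject

# Invented p-values: Holm rejects the first three, Bonferroni (0.05 / 5 = 0.01) only the first two.
print(holm_reject([0.003, 0.009, 0.012, 0.04, 0.20]))
```

Because its thresholds relax after each rejection, Holm's method never rejects fewer hypotheses than the Bonferroni correction while still controlling the family wise error rate.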

What happens if the Family Wise Error Rate is not controlled?

If the family wise error rate is not controlled, the probability of making at least one Type I error (a false positive) increases substantially with the number of tests performed. This can lead to misleading conclusions, such as identifying non-existent patterns in market data, implementing ineffective trading strategies based on random noise, or misallocating resources due to false positive signals. This issue is often referred to as the multiple comparisons problem or the "look-elsewhere effect."