Multivariate chart

What Is a Multivariate Chart?

A multivariate chart is a statistical tool used in statistical process control (SPC), and in data analysis more broadly, to monitor multiple correlated process characteristics, or variables, simultaneously. Traditional control charts track one metric at a time. A multivariate chart instead accounts for the relationships and interactions between variables, which gives a more complete view of a process's stability and performance. This matters when the variables are interdependent, because analyzing them in isolation can lead to misleading conclusions about the overall state of the system²⁴. The primary goal of a multivariate chart is to detect shifts in the collective behavior of these variables, enabling timely intervention to maintain quality and minimize process variability.

History and Origin

The foundation of modern statistical process control, which includes multivariate charting, can be traced back to the work of Walter A. Shewhart at Bell Laboratories in the 1920s. Shewhart is credited with developing the concept of control charts and distinguishing between common and special causes of variation, laying the groundwork for monitoring industrial processes²³. His pioneering work in 1924 introduced the control chart, which became fundamental to quality improvement²².

While Shewhart's initial work focused on univariate control, the need to monitor multiple related variables became apparent. The generalization of these concepts to multivariate settings was significantly advanced by Harold Hotelling. In 1931, Hotelling introduced the Hotelling's T-squared statistic, a multivariate generalization of the Student's t-statistic, which became a cornerstone in multivariate statistical analysis and forms the basis for many multivariate control charts²¹,²⁰,¹⁹. This development allowed statisticians and engineers to analyze multiple variables collectively, considering their correlation structure, a critical step toward the development and widespread adoption of the multivariate chart.

Key Takeaways

A multivariate chart monitors multiple, often correlated, process variables simultaneously.
It provides a collective assessment of process stability, accounting for interdependencies between variables.
The primary benefit is detecting shifts that might go unnoticed when monitoring variables individually.
Common types include Hotelling's T-squared chart and multivariate exponentially weighted moving average (MEWMA) charts.
Applications span various industries, including manufacturing, healthcare, and finance.

Formula and Calculation

The most widely used multivariate chart for monitoring the process mean is based on Hotelling's T-squared statistic ($T^2$). This statistic is a multivariate generalization of the squared univariate t-statistic and is used to test whether the mean vector of a multivariate sample is equal to a known population mean vector or another sample mean vector.

For a sample of observations $x_1, x_2, ..., x_n$ with $p$ variables each, the Hotelling's $T^2$ statistic is calculated as:

T^2 = n (\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0)

Where:

$n$: The sample size (number of observations in a subgroup).
$\bar{x}$: The sample mean vector (a vector of the means for each of the $p$ variables).
$\mu_0$: The hypothesized population mean vector (the target or in-control mean vector for the $p$ variables).
$S^{-1}$: The inverse of the sample covariance matrix $S$. The matrix $S$ holds the variance of each variable on its diagonal and the covariances between pairs of variables off the diagonal, indicating how the variables move together.
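A minimal sketch of the calculation in plain Python (NumPy is deliberately not assumed). Rather than inverting $S$ explicitly, it solves $S w = \bar{x} - \mu_0$ by Gaussian elimination and then forms $n (\bar{x} - \mu_0)' w$, which gives the same value and is numerically safer. The function names `solve_linear` and `hotelling_t2` are hypothetical illustrations, not part of any library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copies the input so the caller's lists are untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def hotelling_t2(xbar, mu0, S, n):
    """T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0), without forming S^{-1}."""
    d = [xi - mi for xi, mi in zip(xbar, mu0)]
    w = solve_linear(S, d)  # w = S^{-1} d
    return n * sum(di * wi for di, wi in zip(d, w))
```

With a single variable ($p = 1$), `hotelling_t2([10.5], [10.0], [[0.25]], 4)` returns 4.0, the square of the univariate t-statistic $(10.5 - 10)/(0.5/\sqrt{4}) = 2$, which matches the description of $T^2$ as a generalization of the squared t-statistic.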

The upper control limit (UCL) for a Hotelling's $T^2$ chart is often derived from the F-distribution, given the relationship between $T^2$ and the F-distribution in hypothesis testing¹⁸.

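For Phase I (analysis of historical data to establish control limits), assuming $\mu_0$ and $S$ are estimated from $N$ preliminary subgroups of size $n$ (as the grand mean vector and the pooled within-subgroup covariance matrix):

UCL = \frac{p(N-1)(n-1)}{Nn - N - p + 1} F_{\alpha, p, Nn-N-p+1}

For Phase II (real-time monitoring of new subgroups with the established limits):

UCL = \frac{p(N+1)(n-1)}{Nn - N - p + 1} F_{\alpha, p, Nn-N-p+1}

Where:

$p$: The number of variables being monitored.
$n$: Subgroup size.
$N$: Number of subgroups in the Phase I data.
$F_{\alpha, p, Nn-N-p+1}$: The upper-$\alpha$ critical value of the F-distribution with $p$ numerator and $Nn-N-p+1$ denominator degrees of freedom, where $\alpha$ is the chosen false-alarm probability.

These limits apply to subgroups of size $n \ge 2$; charts for individual observations ($n = 1$) use different limits.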
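A self-contained sketch of the subgroup-based limits, assuming $\mu_0$ and $S$ are estimated from $N$ Phase I subgroups of size $n \ge 2$. Under that assumption the UCL is $p(N \mp 1)(n-1)/(Nn-N-p+1)$ times the upper-$\alpha$ point of $F_{p,\,Nn-N-p+1}$ (minus for Phase I, plus for Phase II). The F critical value is found by bisection on the F cumulative distribution function, which is expressed through the regularized incomplete beta function (evaluated with the standard continued-fraction method). Where SciPy is available, `scipy.stats.f.ppf(1 - alpha, p, df2)` would normally replace that machinery. The function names and the default `alpha=0.0027` (the familiar three-sigma rate) are assumptions for illustration.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The continued fraction converges fast below this point; above it, use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """P(F <= f) for an F(d1, d2) random variable."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_upper_quantile(alpha, d1, d2):
    """Value f with P(F > f) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - f_cdf(hi, d1, d2) > alpha:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - f_cdf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def _ucl(p, n, N, alpha, sign):
    if n < 2:
        raise ValueError("these limits assume subgroups of size n >= 2")
    df2 = N * n - N - p + 1
    if df2 < 1:
        raise ValueError("not enough Phase I data for p variables")
    return p * (N + sign) * (n - 1) / df2 * f_upper_quantile(alpha, p, df2)


def ucl_phase1(p, n, N, alpha=0.0027):
    """Phase I UCL: p(N-1)(n-1)/(Nn-N-p+1) * F_{alpha, p, Nn-N-p+1}."""
    return _ucl(p, n, N, alpha, -1)


def ucl_phase2(p, n, N, alpha=0.0027):
    """Phase II UCL: p(N+1)(n-1)/(Nn-N-p+1) * F_{alpha, p, Nn-N-p+1}."""
    return _ucl(p, n, N, alpha, +1)
```

For example, `ucl_phase2(2, 5, 25)` gives the Phase II limit for two variables monitored in subgroups of five, after 25 Phase I subgroups. The Phase II limit is always the Phase I limit scaled by $(N+1)/(N-1)$, reflecting the extra uncertainty in a new subgroup that did not contribute to the estimates.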

The calculation of the covariance matrix is central to understanding the interrelationships of the variables being monitored.
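One common way to estimate the covariance matrix from Phase I data, sketched here under the assumption of equal-sized subgroups, is to compute each subgroup's sample covariance matrix (dividing by the subgroup size minus one) and average them into a pooled matrix. The mean of the subgroup mean vectors then serves as the in-control mean vector. The helper names are hypothetical and the code is plain Python.

```python
def mean_vector(rows):
    """Column means of a list of p-dimensional observations."""
    count = len(rows)
    return [sum(col) / count for col in zip(*rows)]


def sample_covariance(rows):
    """Unbiased sample covariance matrix (denominator: count - 1)."""
    count = len(rows)
    if count < 2:
        raise ValueError("need at least two observations")
    means = mean_vector(rows)
    p = len(means)
    dev = [[x - m for x, m in zip(row, means)] for row in rows]
    # Entry (i, j) sums the products of deviations of variables i and j.
    return [[sum(r[i] * r[j] for r in dev) / (count - 1) for j in range(p)]
            for i in range(p)]


def pooled_covariance(subgroups):
    """Average of the subgroup covariance matrices (equal subgroup sizes assumed)."""
    mats = [sample_covariance(g) for g in subgroups]
    p, k = len(mats[0]), len(mats)
    return [[sum(m[i][j] for m in mats) / k for j in range(p)] for i in range(p)]


def grand_mean(subgroups):
    """Mean of the subgroup mean vectors."""
    return mean_vector([mean_vector(g) for g in subgroups])
```

Pooling within-subgroup covariances means a shift in the process mean between subgroups does not inflate the estimate of $S$, which is one reason this estimator is preferred over simply lumping all Phase I observations together.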

Interpreting the Multivariate Chart

Interpreting a multivariate chart involves observing the plotted statistic (e.g., $T^2$) relative to its upper control limit (UCL). As long as the plotted points remain below the UCL, the process is considered to be in a state of statistical control with respect to the multiple variables being monitored. A point plotting above the UCL, however, signals that the process is out of control, indicating that at least one, or a combination of, the variables has shifted significantly from its expected state¹⁷.

Unlike univariate charts, a single out-of-control signal on a multivariate chart does not immediately reveal which specific variable or combination of variables caused the signal¹⁶. Further investigation, often involving supplementary univariate charts or decomposition methods, is required to pinpoint the root cause of the shift. This makes a multivariate chart an excellent initial detection tool, prompting a deeper dive into the individual components when an issue is flagged. The chart's scale is typically unrelated to the individual scales of the variables it monitors, reflecting the aggregated multivariate distance from the target.
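One decomposition-style diagnostic, in the spirit of the Mason-Tracy-Young (MYT) decomposition, compares the full $T^2$ with the $T^2$ computed after dropping variable $j$. The difference is the conditional term for variable $j$ given all the others: a large value suggests variable $j$ is unusual relative to what the remaining variables predict. This sketch is illustrative only (plain Python, hypothetical function names), not a complete MYT implementation.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def t2(xbar, mu0, S, n):
    """Hotelling T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)."""
    d = [x - m for x, m in zip(xbar, mu0)]
    return n * sum(di * wi for di, wi in zip(d, _solve(S, d)))


def conditional_contributions(xbar, mu0, S, n):
    """For each variable j, T^2 minus the T^2 of the other p - 1 variables."""
    full = t2(xbar, mu0, S, n)
    p = len(xbar)
    out = []
    for j in range(p):
        keep = [i for i in range(p) if i != j]
        sub_S = [[S[r][c] for c in keep] for r in keep]  # drop row and column j
        reduced = t2([xbar[i] for i in keep], [mu0[i] for i in keep], sub_S, n)
        out.append(full - reduced)
    return out
```

For two variables with unit variances and correlation 0.9, a deviation of $(+1, -1)$ gives $T^2 = 20$. Each variable on its own has a $T^2$ of only 1, but each conditional term is 19, pointing to the broken relationship between the variables rather than to either one alone.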

Hypothetical Example

Imagine a technology company that manufactures smart devices. The quality of these devices depends on two critical characteristics: battery life (measured in hours) and processor speed (measured in GHz). These two characteristics are often correlated; for instance, a faster processor typically drains the battery more quickly.

A quality control team wants to monitor these two factors jointly using a multivariate chart. They collect daily samples of 10 devices, measuring both battery life ($X_1$) and processor speed ($X_2$) for each. Over several weeks, they gather initial data to establish the in-control process.

Suppose the established in-control mean vector is $\mu_0 = \begin{pmatrix} 120 \\ 2.5 \end{pmatrix}$ (120 hours battery life, 2.5 GHz processor speed) and the historical covariance matrix $S$ has been calculated.

On a particular day, the average battery life for a sample of 10 devices is $\bar{x}_1 = 115$ hours, and the average processor speed is $\bar{x}_2 = 2.6$ GHz. Individually, these values might still fall within typical univariate control limits for battery life and processor speed. However, when the team calculates the Hotelling's $T^2$ statistic using these new sample means and the established covariance matrix, the combined deviation from the target might exceed the upper control limit of the multivariate chart.

For instance, if the calculated $T^2$ value for this day is 15, and the established UCL for their multivariate chart is 10, the chart would signal an out-of-control condition. This indicates that while each factor might seem acceptable on its own, their combined performance is abnormal, potentially due to an unexpected interaction or a common underlying issue affecting both. The team would then investigate further to identify the specific cause, perhaps a batch of faulty components affecting both battery and processor performance. This example highlights how a multivariate chart can detect systemic issues that individual performance metrics might miss.

Practical Applications

Multivariate charts find wide-ranging applications in various sectors, extending beyond their traditional use in manufacturing and industrial quality management.

In finance, multivariate charts are employed for risk management and portfolio optimization. They can monitor multiple financial metrics simultaneously, such as stock prices, trading volumes, and credit risk indicators, allowing financial analysts to detect unusual patterns or shifts that could signal market instability or changes in investment risk¹⁵,¹⁴. For example, a multivariate chart could track the joint behavior of several asset classes within an investment portfolio, helping to identify when the portfolio's overall risk profile deviates from its target, even if individual asset volatilities remain within acceptable ranges.

In healthcare, these charts can monitor patient outcomes or treatment effectiveness by considering multiple physiological parameters or patient attributes simultaneously¹³. For instance, a hospital might use a multivariate chart to track blood pressure, heart rate, and oxygen saturation for patients in a critical care unit, detecting early signs of patient deterioration that might not be evident from any single vital sign alone.

In environmental monitoring, multivariate charts can track various pollutants or environmental factors in conjunction, identifying deviations that indicate an emerging ecological problem. For instance, a multivariate chart could monitor water temperature, pH levels, and dissolved oxygen levels in a body of water to detect pollution events.

These applications underscore the multivariate chart's utility in complex systems where multiple interrelated factors influence outcomes. By providing a collective assessment, they enable proactive decision-making and continuous improvement across diverse fields.

Limitations and Criticisms

While multivariate charts offer significant advantages for monitoring complex processes, they also come with certain limitations and criticisms. One primary challenge lies in their interpretation, especially when an out-of-control signal occurs. Unlike a univariate chart, which directly points to a deviation in a single variable, a multivariate chart's signal doesn't immediately identify which specific variable or combination of variables caused the alarm¹². This ambiguity means that further diagnostic tools or supplementary univariate charts are often necessary to pinpoint the root cause, adding an extra layer of analysis.

Another criticism is that the visual representation of a multivariate chart can be less intuitive than a simple Shewhart chart, particularly because the scale on the multivariate chart does not correspond directly to the scale of any individual variable being monitored¹¹. This can make it challenging for non-statisticians to grasp the full implications of a plotted point.

Furthermore, if the assumption of multivariate normality for the data is significantly violated, the statistical validity of some multivariate charts, such as Hotelling's $T^2$ chart, can be compromised¹⁰. While robust methods exist, practitioners must be aware of the underlying assumptions. Relying solely on a single multivariate statistic like the generalized variance (a measure of multivariate dispersion) can also be misleading if it doesn't adequately capture the complex interdependencies within the data distribution⁹. In scenarios with a large number of correlated variables, the complexity of managing and interpreting the underlying statistical models for these charts can also be a practical challenge.
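To illustrate why the generalized variance alone can mislead: it is the determinant $|S|$ of the covariance matrix, a single number, so very different covariance structures can share it. This sketch (plain Python, hypothetical helper name `det`) evaluates three 2×2 matrices: independent variables with equal variances, independent variables with unequal variances, and positively correlated variables. All three have $|S| = 1$.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # each row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return result


equal_independent = [[1.0, 0.0], [0.0, 1.0]]
unequal_independent = [[4.0, 0.0], [0.0, 0.25]]
correlated = [[1.25, 0.75], [0.75, 1.25]]
generalized_variances = [det(s) for s in (equal_independent, unequal_independent, correlated)]
```

A chart that tracks only $|S|$ would treat a change from the first structure to the third as "no change", even though the variables have gone from independent to strongly correlated.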

Multivariate Chart vs. Univariate Chart

The fundamental distinction between a multivariate chart and a univariate chart lies in the number and nature of the variables they monitor.

Feature	Multivariate Chart	Univariate Chart
Variables Monitored	Multiple correlated variables simultaneously	A single variable
Focus	Collective behavior and interrelationships of variables	Individual performance of a single variable
Sensitivity	More sensitive to small shifts in the overall process, especially when variables are correlated	Sensitive to shifts in a single variable
False Alarms	Maintains a low overall false alarm rate for the entire system of variables⁸,⁷	Increased risk of false alarms when many individual charts are monitored⁶
Interpretation	Signals overall out-of-control condition; requires further analysis to identify specific variable at fault⁵	Directly identifies which variable is out of control
Complexity	Higher mathematical and interpretative complexity due to covariance structures	Simpler to understand and interpret
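The false-alarm row of the table can be made concrete. If $p$ univariate charts each have false-alarm probability $\alpha$ and, as a simplifying assumption, the variables are independent, the chance that at least one chart signals on an in-control sample is $1 - (1 - \alpha)^p$. The sketch below also inverts that relation to find the per-chart $\alpha$ that holds the overall rate at a target (a Šidák-type adjustment). Function names are hypothetical; with correlated variables the exact rate differs, which is one reason a single multivariate statistic with one control limit is attractive.

```python
def familywise_false_alarm_rate(alpha, p):
    """P(at least one of p independent in-control charts signals on a sample)."""
    return 1.0 - (1.0 - alpha) ** p


def per_chart_alpha(target, p):
    """Per-chart alpha keeping the familywise rate at `target` (independence assumed)."""
    return 1.0 - (1.0 - target) ** (1.0 / p)


# Ten three-sigma charts (alpha = 0.0027 each) run side by side.
overall = familywise_false_alarm_rate(0.0027, 10)
adjusted = per_chart_alpha(0.0027, 10)
```

With ten three-sigma charts the overall false-alarm rate is about 0.0267, roughly ten times that of a single chart; tightening each chart to `adjusted` restores the 0.0027 overall rate at the cost of sensitivity on every individual chart.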

The main point of confusion often arises when multiple variables in a process are actually interdependent. In such cases, using several individual univariate charts can be misleading. A process might appear to be in control based on each individual chart, but the combined effect of small, simultaneous shifts in correlated variables could indicate a significant problem that only a multivariate chart would detect⁴. For instance, if temperature and pressure in a manufacturing process are positively correlated (they normally rise and fall together), a slight, acceptable increase in temperature coupled with a slight, acceptable decrease in pressure breaks that usual pattern. Together, these changes might signal a process deviation that a multivariate chart would capture but two separate univariate charts might miss.
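A small numeric sketch of the temperature and pressure case, assuming known in-control parameters, standardized variables, and a hypothetical correlation of 0.8. With known parameters and multivariate normal data, $T^2$ for a single observation follows a chi-square distribution with $p$ degrees of freedom. For $p = 2$ its upper-$\alpha$ point has the closed form $-2 \ln \alpha$, so no statistical library is needed. Function names are hypothetical.

```python
import math


def t2_two_standardized(z1, z2, rho):
    """T^2 for two standardized deviations with known correlation rho.

    With unit variances, T^2 = (z1^2 + z2^2 - 2 rho z1 z2) / (1 - rho^2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return (z1 * z1 + z2 * z2 - 2.0 * rho * z1 * z2) / (1.0 - rho * rho)


def chi2_2df_upper(alpha):
    """Upper-alpha quantile of chi-square with 2 df (closed form: -2 ln alpha)."""
    return -2.0 * math.log(alpha)


limit = chi2_2df_upper(0.0027)  # about 11.83, matching a three-sigma false-alarm rate
# Temperature +2 sigma, pressure -2 sigma: each is inside a 3-sigma univariate limit.
t2_correlated = t2_two_standardized(2.0, -2.0, 0.8)    # 40
t2_uncorrelated = t2_two_standardized(2.0, -2.0, 0.0)  # 8
```

Each reading is only two standard deviations from target, inside a three-sigma univariate limit, yet with correlation 0.8 the value $T^2 = 40$ far exceeds the limit of about 11.83. If the same readings came from uncorrelated variables, $T^2$ would be 8 and no signal would occur.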

FAQs

What is the primary benefit of using a multivariate chart?

The primary benefit of a multivariate chart is its ability to monitor multiple variables simultaneously, taking into account their interrelationships. This allows for the detection of process shifts that might go unnoticed if variables were monitored individually, especially when small changes in several correlated variables combine to create a significant deviation.

When should I use a multivariate chart instead of multiple univariate charts?

You should use a multivariate chart when the variables you are monitoring are correlated or interdependent, and their collective behavior is important to assess. If analyzing variables individually might miss critical interactions or lead to an increased rate of false alarms, a multivariate chart provides a more accurate and efficient overall picture of process stability.

What is Hotelling's T-squared statistic?

Hotelling's T-squared statistic is a generalization of the familiar Student's t-test used in multivariate statistical analysis. It measures the statistical distance between a multivariate sample mean and a hypothesized population mean, considering the variance and covariance structure of the data. It is a cornerstone for many multivariate control charts, including the T-squared chart itself.

Can a multivariate chart tell me which specific variable caused a problem?

No, a multivariate chart typically signals that the overall process, considering all monitored variables, is out of control. It does not directly indicate which specific variable or combination of variables is responsible for the out-of-control signal³. To identify the root cause, you usually need to perform additional diagnostic analysis, such as examining supplementary univariate charts or using decomposition techniques.

Are multivariate charts used in finance?

Yes, multivariate charts are used in finance for purposes such as risk assessment, portfolio monitoring, and detecting unusual patterns in financial markets. They can track multiple financial indicators like stock prices, interest rates, and commodity prices simultaneously to assess the overall health or risk profile of an investment or market segment²,¹.