Lesson 4: Multivariate Normal Distribution


This lesson is concerned with the multivariate normal distribution. Just as the univariate normal distribution tends to be the most important statistical distribution in univariate statistics, the multivariate normal distribution is the most important distribution in multivariate statistics.

The question one might ask is, "Why is the multivariate normal distribution so important?" There are three reasons why this might be so:

  1. Mathematical Simplicity. It turns out that this distribution is relatively easy to work with, so it is easy to obtain multivariate methods based on this particular distribution.
  2. Multivariate version of the Central Limit Theorem. You might recall in the univariate course that we had a central limit theorem for the sample mean for large samples of random variables. A similar result is available in multivariate statistics that says if we have a collection of random vectors X1, X2,...,Xn that are independent and identically distributed, then the sample mean vector, \(\bar{x}\), is going to be approximately multivariate normally distributed for large samples.
  3. Many natural phenomena may also be modeled using this distribution, just as in the univariate case.

Learning objectives & outcomes

Upon completion of this lesson, you should be able to:

4.1 - Comparing Distribution Types

Univariate Normal Distributions

Before defining the multivariate normal distribution, we will revisit the univariate normal distribution. A random variable X is normally distributed with mean μ and variance σ2 if it has the probability density function:

\(\phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\{-\frac{1}{2\sigma^2}(x-\mu)^2\}\)

This results in the usual bell-shaped curve that you will see throughout statistics. In this expression, you will see the squared difference between the variable x and its mean μ; this difference is smallest when x equals μ. Consequently the quantity \(-\frac{1}{2\sigma^2}(x-\mu)^2\) takes its largest value when x equals μ, and since the exponential function is monotone, the normal density takes its maximum value when x equals μ.

The variance σ2 defines the spread of the distribution about that maximum. If σ2 is large then the spread is going to be large, otherwise if the σ2 value is small then the spread will be small.

As shorthand notation we may use the expression below:

\(X \sim N(\mu, \sigma^2)\)

indicating that X is distributed according to (denoted by the wavy symbol, 'tilde') a normal distribution (denoted by N), with mean μ and variance σ2.

Multivariate Normal Distributions

If we have a p x 1 random vector X that is distributed according to a multivariate normal distribution with population mean vector μ and population variance-covariance matrix Σ, then this random vector, X, will have the joint density function as shown in the expression below:

\(\phi(\textbf{x}) = \left(\frac{1}{2\pi}\right)^{p/2}|\Sigma|^{-1/2}\exp\{-\frac{1}{2}(\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu})\}\)
Here |Σ| denotes the determinant of the variance-covariance matrix Σ, and Σ raised to the minus one inside the exponent is just the inverse of that matrix. Again, this density takes its maximum value when the vector x equals the mean vector μ and decreases away from that maximum.

If p is equal to 2, then we have just a bivariate normal distribution and this will yield a bell-shaped curve but now in three dimensions.

The shorthand notation, similar to the univariate version above, is

\(\textbf{X} \sim N(\mathbf{\mu},\Sigma)\)

We read this as: the vector X 'is distributed as' multivariate normal with mean vector μ and variance-covariance matrix Σ.

Some things to note about the multivariate normal distribution:

1. The following term appearing inside the exponent of the multivariate normal distribution is a quadratic form:

\((\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu})\)
This particular quadratic form is also called the squared Mahalanobis distance between the random vector x and the mean vector μ.

2. If the variables are uncorrelated then the variance-covariance matrix will be a diagonal matrix with variances of the individual variables appearing on the main diagonal of the matrix and zeros everywhere else:

\(\Sigma = \left(\begin{array}{cccc}\sigma^2_1 & 0 & \dots & 0\\ 0 & \sigma^2_2 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \sigma^2_p \end{array}\right)\)

In this case the multivariate normal density function simplifies to the expression below:

\(\phi(\textbf{x}) = \prod_{j=1}^{p}\frac{1}{\sqrt{2\pi\sigma^2_j}}\exp\{-\frac{1}{2\sigma^2_j}(x_j-\mu_j)^2\}\)

Note that the product term, given by capital pi (Π), acts very much like the summation sign, but instead of adding we multiply over the elements ranging from j = 1 to p. Inside this product is the familiar univariate normal density where the random variables are subscripted by j. In this case, the elements of the random vector, X1, X2, ..., Xp, are independent random variables. (A small numerical check of this factorization is sketched after this list.)

3. We could also consider linear combinations of the elements of a multivariate normal random variable as shown in the expression below:

\(Y = \sum_{j=1}^{p}c_jX_j =\textbf{c}'\textbf{X}\)

Note that to define a linear combination, the random variables Xj need not be uncorrelated. The coefficients cj are arbitrary; specific values are selected according to the problem of interest, and so the choice is influenced very much by subject-matter knowledge. Looking back at the Women's Nutrition Survey Data, for example, we selected the coefficients to obtain the total intake of vitamins A and C.

Now suppose that the random vector X is multivariate normal with mean μ and variance-covariance matrix Σ.

\(\textbf{X} \sim N(\mathbf{\mu},\Sigma)\)

Then Y is normally distributed with mean:

\(\textbf{c}'\mathbf{\mu} = \sum_{j=1}^{p}c_j\mu_j\)

and variance:

\(\textbf{c}'\Sigma \textbf{c} =\sum_{j=1}^{p}\sum_{k=1}^{p}c_jc_k\sigma_{jk}\)

See the previous lesson to review the computation of the population mean of a linear combination of random variables.

In summary, Y is normally distributed with mean c transposed μ and variance c transposed Σ times c.

\(Y \sim N(\textbf{c}'\mathbf{\mu},\textbf{c}'\Sigma\textbf{c})\)

As we have seen before, these quantities may be estimated using sample estimates of the population parameters.
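As a quick numerical check of point 2 above (the factorization of the joint density when Σ is diagonal), the following sketch in Python compares the multivariate normal density with a diagonal covariance matrix to the product of the corresponding univariate densities. The mean vector, variances, and evaluation point are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Hypothetical mean vector and variances for p = 3 uncorrelated variables
mu = np.array([1.0, -2.0, 0.5])
sigma2 = np.array([2.0, 0.5, 1.5])

x = np.array([0.7, -1.4, 1.1])   # an arbitrary point at which to evaluate the density

# Joint density with the diagonal covariance matrix diag(sigma2)
joint = multivariate_normal(mean=mu, cov=np.diag(sigma2)).pdf(x)

# Product of the p univariate normal densities
product = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))

print(joint, product)   # the two values agree up to floating-point rounding
```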

Other Useful Results for the Multivariate Normal

For variables with a multivariate normal distribution with mean vector μ and covariance matrix Σ, some useful facts are:

Example 3 - Linear Combination of the Cholesterol Measurements

Cholesterol measurements were taken on n = heart-attack patients at 0, 2, and 4 days following the attack. Treatment was given to reduce the cholesterol level. The sample mean vector is:

Variable  Mean
X1 = 0-Day 259.5
X2 = 2-Day 230.8
X3 = 4-Day 221.5

The covariance matrix is

        0-Day   2-Day   4-Day
0-Day    2276    1508     813
2-Day    1508    2206    1349
4-Day     813    1349    1865

Suppose that we are interested in the difference X1 – X2, the difference between the 0-day and the 2-day measurements. We can write the linear combination of interest as

\(\textbf{a}'\textbf{X} = \left(\begin{array}{ccc}1 & -1 & 0 \end{array}\right)
\left(\begin{array}{c}X_1\\X_2\\X_3 \end{array}\right) = X_1 - X_2\)

The mean value for the difference is \(\left(\begin{array}{ccc}1 & -1 & 0 \end{array}\right)\left(\begin{array}{c}259.5\\230.8\\221.5 \end{array}\right) = 28.7\).

The variance is \( \left(\begin{array}{ccc}1 & -1 & 0 \end{array}\right) \left(\begin{array}{ccc}2276 & 1508 & 813\\ 1508 & 2206 & 1349\\ 813 & 1349 & 1865 \end{array}\right) \left(\begin{array}{c}1\\-1\\0 \end{array}\right)= \left(\begin{array}{ccc}768 & -698 & -536 \end{array}\right)\left(\begin{array}{c}1\\-1\\0 \end{array}\right) = 1466\).

If we assume the three measurements have a multivariate normal distribution, then the distribution of the difference X1 – X2 has a univariate normal distribution.
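The same arithmetic can be checked with a few lines of code. This is a minimal sketch in Python using the sample mean vector and covariance matrix given above; the variable names are illustrative only.

```python
import numpy as np

# Sample mean vector and covariance matrix from the cholesterol example above
xbar = np.array([259.5, 230.8, 221.5])
S = np.array([[2276, 1508,  813],
              [1508, 2206, 1349],
              [ 813, 1349, 1865]])

a = np.array([1, -1, 0])   # coefficients picking out the difference X1 - X2

mean_diff = a @ xbar       # a' xbar = 28.7
var_diff = a @ S @ a       # a' S a  = 1466
print(mean_diff, var_diff)
```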

4.2 - Example: Bivariate Normal Distribution

To further understand the multivariate normal distribution it is helpful to look at the bivariate normal distribution. Here our understanding is facilitated by being able to draw pictures of what this distribution looks like.

Suppose we have just two variables, X1 and X2, and that they are bivariate normally distributed with mean vector components μ1 and μ2 and the variance-covariance matrix shown below:

\(\left(\begin{array}{c}X_1\\X_2 \end{array}\right) \sim N \left[\left(\begin{array}{c}\mu_1\\ \mu_2 \end{array}\right), \left(\begin{array}{cc}\sigma^2_1 & \rho \sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma^2_2 \end{array}\right)\right]\)

In this case we have the variances for the two variables on the diagonal and on the off-diagonal we have the covariance between the two variables. This covariance is equal to the correlation times the product of the two standard deviations. The determinant of the variance-covariance matrix is simply equal to the product of the variances times 1 minus the squared correlation.

\(|\Sigma| = \sigma^2_1\sigma^2_2(1-\rho^2)\)

The inverse of the variance-covariance matrix takes the form below:

\(\Sigma^{-1} = \frac{1}{\sigma^2_1\sigma^2_2(1-\rho^2)} \left(\begin{array}{cc}\sigma^2_2 & -\rho \sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma^2_1 \end{array}\right)\)

Substituting in the expressions for the determinant and the inverse of the variance-covariance matrix we obtain, after some simplification, the joint probability density function of (X1, X2) for the bivariate normal distribution as shown below:

\(\phi(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2-2\rho \left(\frac{x_1-\mu_1}{\sigma_1}\right) \left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right]\}\)
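As a minimal sketch, the density above can be evaluated directly and compared with a general-purpose multivariate normal density. The parameter values below (means of 0, unit standard deviations, ρ = 0.7) are hypothetical choices for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters: means 0, unit standard deviations, correlation 0.7
mu1, mu2 = 0.0, 0.0
s1, s2, rho = 1.0, 1.0, 0.7

def phi(x1, x2):
    """Bivariate normal density written out exactly as in the formula above."""
    z = (((x1 - mu1) / s1) ** 2
         - 2 * rho * ((x1 - mu1) / s1) * ((x2 - mu2) / s2)
         + ((x2 - mu2) / s2) ** 2)
    return np.exp(-z / (2 * (1 - rho ** 2))) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

# Check against a general-purpose multivariate normal density
Sigma = np.array([[s1 ** 2,       rho * s1 * s2],
                  [rho * s1 * s2, s2 ** 2      ]])
mvn = multivariate_normal(mean=[mu1, mu2], cov=Sigma)

print(phi(0.5, -0.3), mvn.pdf([0.5, -0.3]))   # the two values agree
```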

The following three plots show the bivariate normal distribution for various values of the correlation ρ.

The first plot shows the case where the correlation ρ is equal to zero. This special case is called the circular normal distribution. Here, we have a perfectly symmetric bell-shaped curve in three dimensions.
As ρ increases, the bell-shaped surface becomes stretched along the 45-degree line. For ρ = 0.7 we can see that the curve extends out toward minus 4 and plus 4 along that line and becomes flattened in the perpendicular direction.
Increasing ρ to 0.9, the curve stretches even further along the 45-degree line and becomes flatter still in the perpendicular direction.

These three curves were produced using the SAS program normplot.sas shown below. The desired correlation is specified in the third line of the SAS code (here at 0.9). No other changes are required to run this program. It would be a good idea to try this program for various values of r between -1 and 1 to explore how the shape of the normal distribution varies with the correlation.

SAS Program

Note that this code assumes that the variances are both equal to one.


Click on the arrow in the window below to see how you can use Minitab to create plots of the bivariate distribution. You will need the formula that is found in the text file phi_equation_r=0.7.txt.

minitab dialog box

Bivariate plots using Minitab

4.3 - Exponent of Multivariate Normal Distribution

Recall the Multivariate Normal Density function below:

\(\phi(\textbf{x}) = \left(\frac{1}{2\pi}\right)^{p/2}|\Sigma|^{-1/2}\exp\{-\frac{1}{2}(\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu})\}\)

You will note that this density function, φ(x), only depends on x through the squared Mahalanobis distance:

\((\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu})\)

Setting this quadratic form equal to a constant gives the equation of a hyper-ellipse centered at μ.

For a bivariate normal, where p = 2 variables, we have an ellipse as shown in the plot below:

Useful Facts about the Exponent Component    \( (\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu})\) 

If we define a specific hyper-ellipse by setting the squared Mahalanobis distance equal to the upper α critical value of the chi-square distribution with p degrees of freedom, then the probability that the random vector X falls inside the ellipse is 1 - α.

\(\text{Pr}\{(\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu}) \le \chi^2_{p,\alpha}\}=1-\alpha\)

This particular ellipse is called the (1 - α) x 100% prediction ellipse for a multivariate normal random vector with mean vector μ and variance-covariance matrix Σ.
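The probability statement above can be illustrated numerically. The sketch below, with a hypothetical bivariate mean vector and covariance matrix, draws a large sample and checks that roughly 1 - α of the points fall inside the 95% prediction ellipse.

```python
import numpy as np
from scipy.stats import chi2

p, alpha = 2, 0.05
crit = chi2.ppf(1 - alpha, df=p)      # chi-square critical value (about 5.99 for p = 2)

# Hypothetical bivariate normal parameters
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
Sinv = np.linalg.inv(Sigma)

# Monte Carlo check: the fraction of draws inside the ellipse should be about 1 - alpha
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=100_000)
d2 = np.einsum('ij,jk,ik->i', X - mu, Sinv, X - mu)   # squared Mahalanobis distances
print(np.mean(d2 <= crit))                            # approximately 0.95
```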

Calculating Mahalanobis Distance With SAS

SAS does not provide Mahalanobis distances directly, but we can compute them using principal components. The steps (illustrated in the sketch following the list) are:

  1. Determine principal components for the correlation matrix of the x-variables.
  2. Standardize the principal component scores so that each principal component has standard deviation = 1. For each component, this is done by dividing the scores by the square root of the eigenvalue. In SAS, use the STD option as part of the PROC PRINCOMP command to automate this standardization.
  3. For each observation, calculate d2 = sum of squared standardized principal components scores. This will equal the squared Mahalanobis distance.
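To see why these steps give the squared Mahalanobis distance, here is a minimal sketch of the same computation outside of SAS, in Python with a small simulated data set (the data and variable names are hypothetical). It confirms that the sum of squared standardized principal component scores equals the usual formula \((\textbf{x}-\bar{\textbf{x}})'\textbf{S}^{-1}(\textbf{x}-\bar{\textbf{x}})\).

```python
import numpy as np

# Hypothetical data: n = 30 observations on p = 4 correlated variables
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4)) @ np.array([[1.0, 0.5, 0.2, 0.0],
                                         [0.0, 1.0, 0.3, 0.1],
                                         [0.0, 0.0, 1.0, 0.4],
                                         [0.0, 0.0, 0.0, 1.0]])

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)        # sample covariance matrix
R = np.corrcoef(X, rowvar=False)   # sample correlation matrix

# Steps 1-2: principal components of R, with scores standardized to unit variance
Z = (X - xbar) / X.std(axis=0, ddof=1)         # standardized variables
eigval, eigvec = np.linalg.eigh(R)             # eigenvalues/eigenvectors of R
scores_std = (Z @ eigvec) / np.sqrt(eigval)    # standardized principal component scores

# Step 3: d^2 = sum of squared standardized scores for each observation
d2_pca = np.sum(scores_std ** 2, axis=1)

# Direct computation of the squared Mahalanobis distances for comparison
diff = X - xbar
d2_direct = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)

print(np.allclose(d2_pca, d2_direct))   # True
```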

Example - Calculating and Printing Mahalanobis Distances in SAS

Suppose we have four x-variables, called x1, x2, x3, x4, and they have already been read into SAS. The following SAS code (mahalonobis.sas) will determine standardized principal components and calculate Mahalanobis distances (the printout will include observation numbers). Within the DATA step, the “uss(of prin1-prin4)” function calculates the uncorrected sum of squares for the variables prin1-prin4. This value will be computed for each observation in the “pcout” data set. The result of the DATA step will be a SAS data set named “mahal” that will include the original variables, the standardized principal component scores (named prin1-prin4) and the Mahalanobis distance (named dist2).

SAS program


Click on the graphic or the link below to walk through how to find Mahalanobis distances using Minitab.

minitab dialog box

Finding Mahalanobis distances using Minitab

4.4 - Multivariate Normality and Outliers

Q-Q Plot for Evaluating Multivariate Normality and Outliers

The variable \(d^2 = (\textbf{x}-\mathbf{\mu})'\Sigma^{-1}(\textbf{x}-\mathbf{\mu})\) has a chi-square distribution with p degrees of freedom, and for “large” samples the observed Mahalanobis distances have an approximate chi-square distribution. This result can be used to evaluate (subjectively) whether a data point may be an outlier and whether observed data may have a multivariate normal distribution.

A Q-Q plot can be used to picture the Mahalanobis distances for the sample. The basic idea is the same as for a normal probability plot. For multivariate data, we plot the ordered Mahalanobis distances versus estimated quantiles (percentiles) for a sample of size n from a chi-squared distribution with p degrees of freedom. This should resemble a straight line for data from a multivariate normal distribution. Outliers will show up as points on the upper right side of the plot for which the Mahalanobis distance is notably greater than the chi-square quantile value.

Determining the Quantiles
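One common convention is to take the chi-square quantiles at the plotting positions (j - 0.5)/n for the jth ordered distance. The following is a minimal Python sketch of the Q-Q plot under that convention; the function name is just illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

def chisq_qq_plot(d2, p):
    """Q-Q plot of ordered squared Mahalanobis distances vs. chi-square quantiles."""
    n = len(d2)
    d2_sorted = np.sort(d2)
    probs = (np.arange(1, n + 1) - 0.5) / n   # plotting positions (j - 0.5)/n
    q = chi2.ppf(probs, df=p)                 # chi-square quantiles with p df
    plt.scatter(q, d2_sorted)
    plt.plot(q, q, linestyle='--')            # 45-degree reference line
    plt.xlabel(f'Chi-square({p}) quantiles')
    plt.ylabel('Ordered squared Mahalanobis distances')
    plt.show()
```

With the 30 distances from the board stiffness example below and p = 4, a call to this function would produce a plot analogous to the SAS graph described in that example.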

Example: Q-Q Plot for Board Stiffness Data

This example reproduces Example 4.14 in the text (page 187). For each of n = 30 boards, there are p = 4 measurements of board stiffness. Each measurement was done using a different method.

A SAS plot of the Mahalanobis distances is given below. The distances are on the vertical and the chi-square quantiles are on the horizontal. At the right side of the plot we see an upward bending. This indicates possible outliers (and a possible violation of multivariate normality). In particular, the final point has d2 ≈ 16 whereas the quantile value on the horizontal is about 12.5. The next-to-last point on the plot might also be an outlier. A printout of the distances, before they were ordered for the plot, shows that the two possible outliers are boards 16 and 9, respectively.


The SAS code (Q_Qplot.sas) below is used to produce the graph just given.

SAS program

The data step reads the dataset.

The proc princomp calculates the principal components and stores the standardized principal components in a dataset named pcresult.

The next data step calculates the Mahalanobis distances and keeps them in a dataset named mahal.

The proc print will print the distances (with observation numbers).

Next we sort the mahal dataset in order of the distances. This is to prepare for the Q-Q plot.

In the next data step we compute estimated quantiles of a chi-square distribution with df = 4. In the prb = line, the value 30 is the sample size, and in the cinv function the value 4 is the df (because we have 4 variables).

Finally, the gplot procedure plots distances versus chi-square quantiles.


Click on the graphic or the link below to walk through how to produce a QQ plot for the board stiffness dataset using Minitab.

minitab dialog box

Producing a QQ plot using Minitab

4.5 - Eigenvalues and Eigenvectors

The next thing that we would like to be able to do is to describe the shape of this ellipse mathematically so that we can understand how the data are distributed in multiple dimensions under a multivariate normal. To do this we first must define the eigenvalues and the eigenvectors of a matrix.

In particular we will consider the computation of the eigenvalues and eigenvectors of a symmetric matrix A as shown below:

\(\textbf{A} = \left(\begin{array}{cccc}a_{11} & a_{12} & \dots & a_{1p}\\ a_{21} & a_{22} & \dots & a_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ a_{p1} & a_{p2} & \dots & a_{pp} \end{array}\right)\)

Note: we would call the matrix symmetric if the elements \(a_{ij}\) are equal to \(a_{ji}\) for each i and j.

Usually A is taken to be either the variance-covariance matrix Σ, or the correlation matrix, or their estimates S and R, respectively.

Eigenvalues and eigenvectors are used for:

For the present we will be primarily concerned with eigenvalues and eigenvectors of the variance-covariance matrix.

First of all let's define what these terms are...


Definition: If we have a p x p matrix A we are going to have p eigenvalues, λ1, λ2, ..., λp. They are obtained by solving the equation given in the expression below:

\(|\textbf{A}-\lambda\textbf{I}| = 0\)
On the left-hand side, we have the matrix A minus λ times the Identity matrix. When we calculate the determinant of the resulting matrix, we end up with a polynomial of order p. Setting this polynomial equal to zero, and solving for λ we obtain the desired eigenvalues. In general, we will have p solutions and so there are p eigenvalues, not necessarily all unique.

Definition: The corresponding eigenvectors e1, e2, ...,ep are obtained by solving the expression below:

\((\textbf{A}-\lambda_j\textbf{I})\textbf{e}_j = \mathbf{0}\)

Here, we take the matrix A minus the jth eigenvalue times the identity matrix, multiply this quantity by the jth eigenvector, and set it all equal to zero. Solving this system yields the eigenvector ej associated with the eigenvalue λj.

Note: This does not generally have a unique solution. So, to obtain a unique solution we will often require that ej transposed ej is equal to 1; or, if you like, that the sum of the squared elements of ej is equal to 1.

\(\textbf{e}'_j\textbf{e}_j = 1\)

Also note that eigenvectors corresponding to different eigenvalues are orthogonal. In situations where two (or more) eigenvalues are equal, the corresponding eigenvectors can still be chosen to be orthogonal.

Example: Consider the 2 x 2 matrix

To illustrate these calculations consider the correlation matrix R as shown below:

\(\textbf{R} = \left(\begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array}\right)\)

Then, using the definition of the eigenvalues, we must calculate the determinant of R - λ times the Identity matrix.

Subtracting λ times the identity matrix from R yields a matrix with 1 - λ on the diagonal and ρ on the off-diagonal. Calculating its determinant we obtain (1 - λ) squared minus ρ2:

\(\left|\begin{array}{cc}1-\lambda & \rho \\ \rho & 1-\lambda \end{array}\right| = (1-\lambda)^2-\rho^2 = \lambda^2-2\lambda+1-\rho^2\)

Setting this expression equal to zero we end up with the following...

\( \lambda^2-2\lambda+1-\rho^2=0\)

To solve for λ we use the general result that any solution of the quadratic equation below:

\(ay^2+by+c = 0\)

is given by the following expression:

\(y = \frac{-b\pm \sqrt{b^2-4ac}}{2a}\)

Here, a = 1, b = -2 (the coefficient of λ), and c = 1 - ρ2. Substituting these terms into the formula above, we obtain that λ must be equal to 1 plus or minus the correlation ρ.

\(\lambda = \frac{2 \pm \sqrt{4-4(1-\rho^2)}}{2} = 1\pm\sqrt{1-(1-\rho^2)} = 1 \pm \rho\)

Here we will take the following solutions:

\( \begin{array}{ccc}\lambda_1 & = & 1+\rho \\ \lambda_2 & = & 1-\rho \end{array}\)

Next, to obtain the corresponding eigenvectors, we must solve a system of equations below:

\((\textbf{R}-\lambda\textbf{I})\textbf{e} = \mathbf{0}\)

That is, R minus λ times the identity matrix, multiplied by the eigenvector e, is set equal to the zero vector. For this specific problem, this translates to the expression below:

\(\left\{\left(\begin{array}{cc}1 & \rho \\ \rho & 1 \end{array}\right)-\lambda\left(\begin{array}{cc}1 &0\\0 & 1 \end{array}\right)\right \}\left(\begin{array}{c} e_1 \\ e_2 \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \end{array}\right)\)

This simplifies as follows:

\(\left(\begin{array}{cc}1-\lambda & \rho \\ \rho & 1-\lambda \end{array}\right) \left(\begin{array}{c} e_1 \\ e_2 \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \end{array}\right)\)

Yielding a system of two equations with two unknowns:

\(\begin{array}{lcc}(1-\lambda)e_1 + \rho e_2 & = & 0\\ \rho e_1+(1-\lambda)e_2 & = & 0 \end{array}\)

Note that this does not have a unique solution. If (e1, e2) is one solution, then a second solution can be obtained by multiplying the first solution by any non-zero constant c, i.e., (ce1, ce2). Therefore, we will require the additional condition that the sum of the squared values of e1 and e2 are equal to 1.

\(e^2_1+e^2_2 = 1\)

Consider the first equation:

\((1-\lambda)e_1 + \rho e_2 = 0\)

Solving this equation for e2, we obtain the following:

\(e_2 = -\frac{(1-\lambda)}{\rho}e_1\)

Substituting this into \(e^2_1+e^2_2 = 1\)  we get the following:

\(e^2_1 + \frac{(1-\lambda)^2}{\rho^2}e^2_1 = 1\)

Recall that \(\lambda = 1 \pm \rho\). In either case we end up finding that \((1-\lambda)^2 = \rho^2\), so that the expression above simplifies to:

\(2e^2_1 = 1\)

Or, in other words:

\(e_1 = \frac{1}{\sqrt{2}}\)

Using the expression for e2 which we obtained above,

\(e_2 = -\frac{1-\lambda}{\rho}e_1\)

we get

\(e_2 = \frac{1}{\sqrt{2}}\) for \(\lambda = 1+\rho\) and \(e_2 = -\frac{1}{\sqrt{2}}\) for \(\lambda = 1-\rho\)

Therefore, the two eigenvectors are given by the two vectors as shown below:

\(\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{array}\right)\) for \(\lambda_1 = 1+ \rho\) and \(\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} \end{array}\right)\) for \(\lambda_2 = 1- \rho\)
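As a quick check of this derivation, the sketch below computes the eigenvalues and eigenvectors of R numerically for a hypothetical correlation of ρ = 0.6 (eigenvectors are determined only up to sign).

```python
import numpy as np

rho = 0.6                           # a hypothetical correlation
R = np.array([[1.0, rho],
              [rho, 1.0]])

eigval, eigvec = np.linalg.eigh(R)  # eigh returns eigenvalues in ascending order

print(eigval)    # [0.4, 1.6], i.e., 1 - rho and 1 + rho
print(eigvec)    # columns are (1/sqrt(2), -1/sqrt(2)) and (1/sqrt(2), 1/sqrt(2)), up to sign
```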

We now consider some properties of the eigenvalues of the variance-covariance matrix. Suppose that λ1 through λp are the eigenvalues of the variance-covariance matrix Σ. By definition, the total variation is given by the sum of the variances, and it turns out that this is also equal to the sum of the eigenvalues of the variance-covariance matrix. Thus, the total variation is:

\(\sum_{j=1}^{p}\sigma^2_j = \sigma^2_1 + \sigma^2_2 +\dots + \sigma^2_p = \lambda_1 + \lambda_2 + \dots + \lambda_p = \sum_{j=1}^{p}\lambda_j\)

The generalized variance is equal to the product of the eigenvalues:

\(|\Sigma| = \prod_{j=1}^{p}\lambda_j = \lambda_1 \times \lambda_2 \times \dots \times \lambda_p\)
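These two identities (trace equals the sum of the eigenvalues, determinant equals their product) are easy to confirm numerically; the covariance matrix below is a hypothetical example.

```python
import numpy as np

# A hypothetical variance-covariance matrix
Sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

eigval = np.linalg.eigvalsh(Sigma)

print(np.trace(Sigma), eigval.sum())        # total variation: trace = sum of eigenvalues
print(np.linalg.det(Sigma), eigval.prod())  # generalized variance: determinant = product of eigenvalues
```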

4.6 - Geometry of the Multivariate Normal Distribution

The geometry of the multivariate normal distribution can be investigated by considering the orientation and shape of the prediction ellipse as depicted in the following diagram:


The (1 - α) x 100% prediction ellipse above is centered on the population means μ1 and μ2.

The ellipse has axes pointing in the directions of the eigenvectors e1, e2, ..., ep. Here, in this diagram for the bivariate normal, the longest axis of the ellipse points in the direction of the first eigenvector e1 and the shorter axis is perpendicular to the first, pointing in the direction of the second eigenvector e2.

The corresponding half-lengths of the axes are obtained by the following expression:

\(l_j = \sqrt{\lambda_j\chi^2_{p,\alpha}}\)

The plot above captures the lengths of these axes within the ellipse.

The volume (area) of the hyper-ellipse is equal to:

\(\frac{2\pi^{p/2}}{p\Gamma\left(\frac{p}{2}\right)}(\chi^2_{p,\alpha})^{p/2}|\Sigma|^{1/2}\)
Note that this is a function of the square-root of the generalized variance (given by the square root of the determinant of the variance-covariance matrix). Thus, the volume (area) of the prediction ellipse is proportional to the square root of the generalized variance.

In this expression for the volume (area) of the hyper-ellipse, Γ(x) is the gamma function. To compute the gamma function, consider the two special cases:

-p even

\(\Gamma\left(\frac{p}{2}\right) = \left(\frac{p}{2}-1\right)!\)

-p odd

\(\Gamma\left(\frac{p}{2}\right) = \frac{1 \times 3 \times 5 \times \dots \times (p-2) \times \sqrt{\pi}}{2^{(p-1)/2}}\)
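Putting the pieces together, the sketch below is a small helper that evaluates this volume formula for a given covariance matrix and α; scipy's gamma function covers both the even and odd p cases. The function name is illustrative only.

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import gamma

def ellipse_volume(Sigma, alpha):
    """Volume (area when p = 2) of the (1 - alpha) x 100% prediction hyper-ellipse."""
    p = Sigma.shape[0]
    crit = chi2.ppf(1 - alpha, df=p)   # chi-square critical value with p df
    return (2 * np.pi ** (p / 2) / (p * gamma(p / 2))
            * crit ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))
```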

We shall illustrate the shape of the multivariate normal distribution using the Wechsler Adult Intelligence Scale data.

4.7 - Example: Wechsler Adult Intelligence Scale

Here we have data on n = 37 subjects taking the Wechsler Adult Intelligence Test. This test is broken up into four different components:

  1. Information
  2. Similarities
  3. Arithmetic
  4. Picture Completion

The data are stored in the file named wechslet.txt in five different columns. The first column is the ID number of the subjects, followed by the four component tasks in the remaining four columns.

These data may be analyzed using the SAS program wechsler.sas shown below.

SAS Program

inspect program

Walk through the procedures of the program by clicking on the "Inspect" button. You can also launch the program by clicking on the "Launch SAS" button on the right. Just as in previous lessons, marking up a print out of the SAS program is also a good strategy for learning how this program is put together.

The SAS output, (wechsler.lst), gives the results of the data analyses. Because the SAS output is usually a relatively long document, printing these pages of output out and marking them with notes is highly recommended if not required!

In the output, means and standard deviations for all four variables are given at the top of the second page of the output. Means are given in the first row. Those means have been copied into the table shown below:


Click on the graphic or the link below to walk through how to produce the covariance matrix for the Wechsler Adult Intelligence Test data using Minitab.

minitab dialog box

Producing the covariance matrix using Minitab

Sample Means:

Variance-Covariance Matrix

\(\textbf{S} = \left(\begin{array}{rrrr}11.474 & 9.086 & 6.383 & 2.071\\ 9.086 & 12.086 & 5.938 & 0.544\\ 6.383 & 5.938 & 11.090 & 1.791\\ 2.071 & 0.544 & 1.791 & 3.694 \end{array}\right)\)

Here, for example, the variance for Information was 11.474. For Similarities it was 12.086. The covariance between Similarities and Information is 9.086. The total variance, which is the sum of the variances, comes out to be approximately 38.344.

The eigenvalues are given below:

\(\lambda_1 = 26.245\),  \(\lambda_2 = 6.255\),  \(\lambda_3 = 3.932\),  \(\lambda_4 = 1.912\)

and finally at the bottom of the table we have the corresponding eigenvectors. They have been listed here below:

\(\mathbf{e_1}=\left(\begin{array}{r}0.606\\0.605\\0.505\\0.110 \end{array}\right)\),  \(\mathbf{e_2}=\left(\begin{array}{r}-0.218\\-0.496\\0.795\\0.274 \end{array}\right)\),  \(\mathbf{e_3}=\left(\begin{array}{r}0.461\\-0.320\\-0.335\\0.757 \end{array}\right)\),  \(\mathbf{e_4}=\left(\begin{array}{r}-0.611\\0.535\\-0.035\\0.582 \end{array}\right)\)

For example, the eigenvector corresponding to the eigenvalue 26.245 has elements 0.606, 0.605, 0.505, and 0.110.

Now, let's consider the shape of the 95% prediction ellipse formed by the multivariate normal distribution whose variance-covariance matrix is equal to the sample variance-covariance matrix which we just obtained.

Recall the formula for the half-lengths of the axes of this ellipse: the square root of the eigenvalue times the critical value from a chi-square table. In this case, since we have four variables, this should be a chi-square with four degrees of freedom. For a 95% prediction ellipse, the critical value of the chi-square with four degrees of freedom is 9.49 from the statistical table.

For the first and longest axis of the 95% prediction ellipse, we substitute 26.245 for the largest eigenvalue, multiply by 9.49, and take the square root. We end up with a half-length of 15.782, as shown below:

\(l_1=\sqrt{\lambda_1\chi^2_{4,0.05}} = \sqrt{26.245 \times 9.49} = 15.782\)

The direction of the axis is given by the first eigenvector. Looking at this first eigenvector we can see large positive elements corresponding to the first three variables. In other words, large elements for Information, Similarities, and Arithmetic. This suggests that this particular axis points in the direction specified by e1; that is, increasing values of Information, Similarities, and Arithmetic.

The half-length of the second longest axis can be obtained by substituting 6.255 for the second eigenvalue, multiplying this by 9.49, and taking the square root. We obtain a half-length of about 7.7, or about half the length of the first axis.

\(l_2=\sqrt{\lambda_2\chi^2_{4,0.05}} = \sqrt{6.255 \times 9.49} = 7.705\)

So, if you were to picture this particular ellipse you would see that the second axis is about half the length of the first and longest axis.

Looking at the corresponding eigenvector, e2, we can see that this axis points in the direction of increasing values of the third variable, Arithmetic, and decreasing values of the second variable, Similarities.

Similar calculations can then be carried out for the third longest axis of the ellipse as shown below:

\(l_3=\sqrt{\lambda_3\chi^2_{4,0.05}} = \sqrt{3.931 \times 9.49} = 6.108\)

This third axis has half-length of 6.108, which is not much shorter or smaller than the second axis. It points in the direction of e3; that is, increasing values of Picture Completion and Information, and decreasing values of Similarities and Arithmetic.

The shortest axis has half-length of about 4.260 as shown below:

\(l_4=\sqrt{\lambda_4\chi^2_{4,0.05}} = \sqrt{1.912 \times 9.49} = 4.260\)

It points in the direction of e4; that is, increasing values of Similarities and Picture Completion, and decreasing values of Information.

The overall shape of the ellipse can be obtained by comparing the lengths of the various axes. What we have here is basically an ellipse in the shape of a slightly squashed football.

We can also obtain the volume of the hyper-ellipse using the formula that was given earlier. Again, for a 95% prediction ellipse with four degrees of freedom, the critical value from the chi-square table is 9.49. Substituting into our expression, we have the product of the eigenvalues inside the square root. The gamma function is evaluated at 2, and gamma of 2 is simply equal to 1. Carrying out the math we end up with a volume of 15,613.132 as shown below:

\(\begin{array}{lcl} \frac{2\pi^{p/2}}{p\Gamma\left(\frac{p}{2}\right)}(\chi^2_{p,\alpha})^{p/2}|\Sigma|^{1/2} & = & \frac{2\pi^{p/2}}{p\Gamma\left(\frac{p}{2}\right)}(\chi^2_{p,\alpha})^{p/2}\sqrt{\prod_{j=1}^{p}\lambda_j} \\ & = & \frac{2\pi^2}{4\Gamma(2)}(9.49)^2\sqrt{26.245 \times 6.255 \times 3.932 \times 1.912}\\ & = & 444.429 \sqrt{1234.17086}\\ & = & 15613.132\end{array}\)
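The numbers above are easy to reproduce from the sample variance-covariance matrix S. The sketch below uses scipy's chi-square quantile in place of the tabled 9.49, so the results differ slightly in the last decimals; it recomputes the eigenvalues, axis half-lengths, and volume.

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import gamma

# Sample variance-covariance matrix for the Wechsler data
S = np.array([[11.474,  9.086,  6.383, 2.071],
              [ 9.086, 12.086,  5.938, 0.544],
              [ 6.383,  5.938, 11.090, 1.791],
              [ 2.071,  0.544,  1.791, 3.694]])

p = 4
crit = chi2.ppf(0.95, df=p)                     # about 9.49

eigval = np.sort(np.linalg.eigvalsh(S))[::-1]   # 26.245, 6.255, 3.932, 1.912
half_lengths = np.sqrt(eigval * crit)           # about 15.78, 7.70, 6.11, 4.26

volume = (2 * np.pi ** (p / 2) / (p * gamma(p / 2))
          * crit ** (p / 2) * np.sqrt(np.linalg.det(S)))
print(half_lengths, volume)                     # volume is roughly 15,600
```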

4.8 - Special Cases: p = 2

To further understand the shape of the multivariate normal distribution, let's return to the special case where we have p = 2 variables.

If ρ = 0, there is zero correlation, and the eigenvalues turn out to be equal to the variances of the two variables. So, for example, the first eigenvalue would be equal to \(\sigma^2_1\) and the second eigenvalue would be equal to \(\sigma^2_2\) as shown below:

\(\lambda_1 = \sigma^2_1\) and \(\lambda_2 = \sigma^2_2\)

The corresponding eigenvectors will have elements 1 and 0 for the first eigenvalue and 0 and 1 for the second eigenvalue.

\(\mathbf{e}_1 = \left(\begin{array}{c} 1\\ 0\end{array}\right)\), \(\mathbf{e}_2 = \left(\begin{array}{c} 0\\ 1\end{array}\right)\)

So, in this case, the axes of the ellipse are parallel to the coordinate axes.

If there is zero correlation, and the variances are equal so that \(\sigma^2_1 = \sigma^2_2\), then the eigenvalues will be equal to one another, and instead of an ellipse we will get a circle. In this special case we have a so-called circular normal distribution.

SAS plot

If the correlation is greater than zero, then the longer axis of the ellipse will have a positive slope.

SAS plot

Conversely, if the correlation is less than zero, then the longer axis of the ellipse will have a negative slope.

As the correlation approaches plus or minus 1, the larger eigenvalue will approach the sum of the two variances, and the smaller eigenvalue will approach zero:

\(\lambda_1 \rightarrow \sigma^2_1 +\sigma^2_2\) and \(\lambda_2 \rightarrow 0\)

So, what is going to happen in this case is that the ellipse becomes more and more elongated as the correlation approaches plus or minus one.

SAS plot
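This limiting behavior can also be seen numerically. The sketch below uses hypothetical standard deviations of 1 and 2 and lets ρ approach 1; the larger eigenvalue approaches \(\sigma^2_1+\sigma^2_2 = 5\) and the smaller approaches 0.

```python
import numpy as np

s1, s2 = 1.0, 2.0   # hypothetical standard deviations
for rho in [0.9, 0.99, 0.999]:
    Sigma = np.array([[s1 ** 2,       rho * s1 * s2],
                      [rho * s1 * s2, s2 ** 2      ]])
    eigval = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
    print(rho, eigval)   # largest approaches s1^2 + s2^2 = 5, smallest approaches 0
```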

The SAS program ellplot.sas below can be used to plot the 95% confidence ellipse corresponding to any specified variance-covariance matrix.

SAS program

inspect program


The bivariate confidence ellipse for this example cannot be generated using Minitab.

4.9 - Summary

In this lesson we learned about:

Next, complete the homework problems that will give you a chance to put what you have learned to use...