This lesson is concerned with the multivariate normal distribution. Just as the univariate normal distribution tends to be the most important statistical distribution in univariate statistics, the multivariate normal distribution is the most important distribution in multivariate statistics.
The question one might ask is, "Why is the multivariate normal distribution so important?" There are three reasons why this might be so:
Upon completion of this lesson, you should be able to:
Before defining the multivariate normal distribution we will visit the univariate normal distribution. A random variable X is normally distributed with mean μ and variance σ^{2} if it has the probability density function of X as:
\(\phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\{\frac{1}{2\sigma^2}(x\mu)^2\}\)
This results in the usual bellshaped curve that you will see throughout statistics. In this expression, you will see the squared difference between the variable x and its mean, μ. This value will be minimum when x is equal to μ. The quantity 2σ^{2}(x  μ)^{2} will take its largest value when x is equal to μ or likewise, since the exponential function is a monotone function, the normal density takes a maximum value when x is equal to μ.
The variance σ^{2} defines the spread of the distribution about that maximum. If σ^{2} is large then the spread is going to be large, otherwise if the σ^{2} value is small then the spread will be small.
As shorthand notation we may use the expression below:
\(X \sim N(\mu, \sigma^2)\)
indicating that X is distributed according to (denoted by the wavey symbol 'tilde') a normal distribution (denoted by N), with mean μ and variance σ^{2}.
If we have a p x 1 random vector X that is distributed according to a multivariate normal distribution with population mean vector μ and population variancecovariance matrix Σ, then this random vector, X, will have the joint density function as shown in the expression below:
\(\phi(\textbf{x})=\left(\frac{1}{2\pi}\right)^{p/2}\Sigma^{1/2}\exp\{\frac{1}{2}(\textbf{x}\mathbf{\mu})'\Sigma^{1}(\textbf{x}\mathbf{\mu})\}\)
The Σ denotes determinant of the variancecovariance matrix Σ and inside the exponent Σ raised to the minus one is just the inverse of the variancecovariance matrix Σ. Again, this distribution will take maximum values when the vector X is equal to the mean vector μ, and decrease around that maximum.
If p is equal to 2, then we have just a bivariate normal distribution and this will yield a bellshaped curve but now in three dimensions.
The shorthand notation, similar to the univariate version above, is
\(\textbf{X} \sim N(\mathbf{\mu},\Sigma)\)
We use the expression the vector X 'is distributed as' multivariate normal with mean vector μ and variancecovariance matrix Σ.
Some things to note about the multivariate normal distribution:
1. The following term appearing inside the exponent of the multivariate normal distribution is a quadratic form:
\((\textbf{x}\mathbf{\mu})'\Sigma^{1}(\textbf{x}\mathbf{\mu})\)
This particular quadratic form is also called the squared Mahalanobis distance between the random vector x and the mean vector μ.
2. If the variables are uncorrelated then the variancecovariance matrix will be a diagonal matrix with variances of the individual variables appearing on the main diagonal of the matrix and zeros everywhere else:
\(\Sigma = \left(\begin{array}{cccc}\sigma^2_1 & 0 & \dots & 0\\ 0 & \sigma^2_2 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \sigma^2_p \end{array}\right)\)
In this case the multivariate normal density function simplifies to the expression below:
\(\phi(\textbf{x}) = \prod_{j=1}^{p}\frac{1}{\sqrt{2\pi\sigma^2_j}}\exp\{\frac{1}{2\sigma^2_j}(x_j\mu_j)^2\}\)
Note that the product term, given by 'captial' pi, (Π), acts very much like the summation sign, but instead of adding we multiply over the elements ranging from j=1 to j=p. Inside this product is the familiar univariate normal distribution where the random variables are subscripted by j. In this case, the elements of the random vector, X_{1}, X_{2},., X_{p}, are going to be independent random variables.
3. We could also consider linear combinations of the elements of a multivariate normal random variable as shown in the expression below:
\(Y = \sum_{j=1}^{p}c_jX_j =\textbf{c}'\textbf{X}\)
Note that to define a linear combination, the random variables X_{j }need not be uncorrelated. The coefficients c_{j} are chosen arbitrarily, specific values are selected according to the problem of interest and so is influenced very much by subject matter knowledge. Looking back at the Women's Nutrition Survey Data, for example, we selected the coefficients to obtain the total intake of vitamins A and C.
Now suppose that the random vector X is multivariate normal with mean μ and variancecovariance matrix Σ.
\(\textbf{X} \sim N(\mathbf{\mu},\Sigma)\)
Then Y is normally distributed with mean:
\(\textbf{c}'\mathbf{\mu} = \sum_{j=1}^{p}c_j\mu_j\)
and variance:
\(\textbf{c}'\Sigma \textbf{c} =\sum_{j=1}^{p}\sum_{k=1}^{p}c_jc_k\sigma_{jk}\)
See previous lesson to review the computation of the population mean of a linear combination of random variables.
In summary, Y is normally distributed with mean c transposed μ and variance c transposed Σ times c.
\(Y \sim N(\textbf{c}'\mathbf{\mu},\textbf{c}'\Sigma\textbf{c}')\)
As we have seen before, these quantities may be estimated using sample estimates of the population parameters.
For variables with a multivariate normal distribution with mean vector μ and covariance matrix Σ, some useful facts are:
Measurements were taken on n heartattack patients on their cholesterol levels. For each patient, measurements were taken 0, 2, and 4 days following the attack. Treatment was given to reduce cholesterol level. The sample mean vector is:
Variable  Mean 
X_{1} = 0Day  259.5 
X_{2} = 2Day  230.8 
X_{3} = 4Day  221.5 
The covariance matrix is
0Day  2Day  4day  
0Day  2276  1508  813 
2Day  1508  2206  1349 
4Day  813  1349  1865 
Suppose that we are interested in the difference X_{1} – X_{2}, the difference between the 0day and the 2day measurements. We can write the linear combination of interest as
\(\textbf{a}'\textbf{x}= \left(\begin{array}{ccc}1 & 1 & 0 \end{array}\right)
\left(\begin{array}{c}x_1\\ x_2\\ x_3 \end{array}\right)\)
The mean value for the difference is \(\left(\begin{array}{ccc}1 & 1 & 0 \end{array}\right)\left(\begin{array}{c}259.5\\230.8\\221.5 \end{array}\right) = 28.7\).
The variance is \( \left(\begin{array}{ccc}1 & 1 & 0 \end{array}\right) \left(\begin{array}{ccc}2276 & 1508 & 813\\ 1508 & 2206 & 1349\\ 813 & 1349 & 1865 \end{array}\right) \left(\begin{array}{c}1\\1\\0 \end{array}\right)= \left(\begin{array}{ccc}768 & 698 & 536 \end{array}\right)\left(\begin{array}{c}1\\1\\0 \end{array}\right) = 1466\).
If we assume the three measurements have a multivariate normal distribution, then the distribution of the difference X_{1} – X_{2} has a univariate normal distribution.
To further understand the multivariate normal distribution it is helpful to look at the bivariate normal distribution. Here our understanding is facilitated by being able to draw pictures of what this distribution looks like.
We have just two variables, X_{1} and X_{2} and that these are bivariately normally distributed with mean vector components μ_{1} and μ_{2} and variancecovariance matrix takes the form as shown below:
\(\left(\begin{array}{c}X_1\\X_2 \end{array}\right) \sim N \left[\left(\begin{array}{c}\mu_1\\ \mu_2 \end{array}\right), \left(\begin{array}{cc}\sigma^2_1 & \rho \sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma^2_2 \end{array}\right)\right]\)
In this case we have the variances for the two variables on the diagonal and on the offdiagonal we have the covariance between the two variables. This covariance is equal to the correlation times the product of the two standard deviations. The determinant of the variancecovariance matrix is simply equal to the product of the variances times 1 minus the squared correlation.
\(\Sigma = \sigma^2_1\sigma^2_2(1\rho^2)\)
The inverse of the variancecovariance matrix takes the form below:
\(\Sigma^{1} = \frac{1}{\sigma^2_1\sigma^2_2(1\rho^2)} \left(\begin{array}{cc}\sigma^2_2 & \rho \sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma^2_1 \end{array}\right)\)
Substituting in the expressions for the determinant and the inverse of the variancecovariance matrix we obtain, after some simplification, the joint probability density function of (X_{1}, X_{2}) for the bivariate normal distribution as shown below:
\(\phi(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1\rho^2}}\exp\{\frac{1}{2(1\rho^2)}\left[\left(\frac{x_1\mu_1}{\sigma_1}\right)^22\rho \left(\frac{x_1\mu_1}{\sigma_1}\right) \left(\frac{x_2\mu_2}{\sigma_2}\right) + \left(\frac{x_2\mu_2}{\sigma_2}\right)^2 \right]\}\)
The links to the following three plots are plots of the bivariate distribution for the various values for the correlation row.
The first plot shows the case where the correlation ρ is equal to zero. This special case is called the circular normal distribution. Here, we have a perfectly symmetric bellshaped curve in three dimensions.  
As ρ increases that bellshaped curve becomes flattened on the 45 degree line. So for ρ equals 0.7 we can see that the curve extends out towards minus 4 and plus 4 and becomes flattened in the perpendicular direction.  
Increasing ρ to 0.9 the curve becomes broader and the 45 degree line and even flatter still in the perpendicular direction. 
These three curves were produced using the SAS program normplot.sas shown below. The desired correlation is specified in the third line of the SAS code (here at 0.9). No other changes are required to run this program. It would be a good idea to try this program for various values of r between 1 and 1 to explore how the shape of the normal distribution varies with the correlation.
Note that this code assumes that the variances are both equal to one.

Click on the arrow in the window below to see how you can use Minitab to create plots of the bivariate distribution. You will need the formula that is found in the text file phi_equation_r=0.7.txt.
Recall the Multivariate Normal Density function below:
\(\phi(\textbf{x}) = \left(\frac{1}{2\pi}\right)^{p/2}\Sigma^{1/2}\exp\{\frac{1}{2}(\textbf{x}\mathbf{\mu})'\Sigma^{1}(\textbf{x}\mathbf{\mu})\}\)
You will note that this density function, φ(x), only depends on x through the squared Mahalanobis distance:
\((\textbf{x}\mathbf{\mu})'\Sigma^{1}(\textbf{x}\mathbf{\mu})\)
This is the equation for a hyperellipse centered at μ.
For a bivariate normal, where p = 2 variables, we have an ellipse as shown in the plot below:
If we define a specific hyperellipse by taking the squared Mahalanobis distance equal to a critical value of the chisquare distribution with p degrees of freedom and evaluate this at α, then the probability that the random value X will fall inside the ellipse is going to be equal to 1  α.
\(\text{Pr}\{(\textbf{x}\mathbf{\mu})'\Sigma^{1}(\textbf{x}\mathbf{\mu}) \le \chi^2_{p,\alpha}\}=1\alpha\)
This particular ellipse is called the (1  α) x 100% prediction ellipse for a multivariate normal random vector with mean vector μ and variancecovariance matrix Σ.
SAS does not provide Mahalanobis distance directly, but we can compute them using principal components. The steps are:
Suppose we have four xvariables, called x1, x2, x3, x4, and they have already been read into SAS. The following SAS code (mahalonobis.sas) will determine standardized principal components and calculate Mahalanobis distances (the printout will include observation numbers). Within the DATA step, the “uss(of prin1prin4)” function calculates the uncorrected sum of squares for the variables prin1prin4. This value will be computed for each observation in the “pcout” data set. The result of the DATA step will be a SAS data set named “mahal” that will include the original variables, the standardized principal component scores (named prin1prin4) and the Mahalanobis distance (named dist2).

Click on the graphic or the link below to walk through how to find Mahalonobis distances using Minitab.
The variable \(d^2 = (\textbf{x}\mathbf{\mu})'\Sigma^{1}(\textbf{x}\mathbf{\mu})\) has a chisquare distribution with p degrees of freedom, and for “large” samples the observed Mahalanobis distances have an approximate chisquare distribution. This result can be used to evaluate (subjectively) whether a data point may be an outlier and whether observed data may have a multivariate normal distribution.
A QQ plot can be used to picture the Mahalanobis distances for the sample. The basic idea is the same as for a normal probability plot. For multivariate data, we plot the ordered Mahalanobis distances versus estimated quantiles (percentiles) for a sample of size n from a chisquared distribution with p degrees of freedom. This should resemble a straightline for data from a multivariate normal distribution. Outliers will show up as points on the upper right side of the plot for which the Mahalanobis distance is notably greater than the chisquare quantile value.
Determining the Quantiles
This example reproduces Example 4.14 in the text (page 187) . For each of n = 30 boards, there are p = 4 measurements of board stiffness. Each measurement was done using a different method.
A SAS plot of the Mahalanobis distances is given below. The distances are on the vertical and the chisquare quantiles are on the horizontal. At the right side of the plot we see an upward bending. This indicates possible outliers (and a possible violation of multivariate normality). In particular, the final point has d^{2 }≈ 16 whereas the quantile value on the horizontal is about 12.5. The nexttolast point on the plot might also be an outlier. A printout of the distances, before they were ordered for the plot, shows that the two possible outliers are boards 16 and 9, respectively.
The SAS code (Q_Qplot.sas) below is used to produce the graph just given follows.
The data step reads the dataset.
The proc princomp calculates the principal components and stores the standardized principal components in a dataset named pcresult.
The next data step calculates the Mahalanobis distances and keeps them in a dataset named mahal.
The proc print will print the distances (with observation numbers).
Next we sort the mahal dataset in order of the distances. This is to prepare for the QQ plot.
In the next data step we compute estimated quantiles of a chisquare distribution with df = 4. In the prb = line, the value 30 is the sample size and in the cinv function the value 4 is the df (because we have 4 variables.
Finally, the gplot procedure plots distances versus chisquare quantiles.

Click on the graphic or the link below to walk through how to produce a QQ plot for the borad stiffness dataset using Minitab.
The next thing that we would like to be able to do is to describe the shape of this ellipse mathematically so that we can understand how the data are distributed in multiple dimensions under a multivariate normal. To do this we first must define the eigenvalues and the eigenvectors of a matrix.
In particular we will consider the computation of the eigenvalues and eigenvectors of a symmetric matrix A as shown below:
\(\textbf{A} = \left(\begin{array}{cccc}a_{11} & a_{12} & \dots & a_{1p}\\ a_{21} & a_{22} & \dots & a_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ a_{p1} & a_{p2} & \dots & a_{pp} \end{array}\right)\)
Note: we would call the matrix symmetric if the elements a_{ij} are equal to a_{ji} for each i and j.
Usually A is taken to be either the variancecovariance matrix Σ, or the correlation matrix, or their estimates S and R, respectively.
Eigenvalues and eigenvectors are used for:
For the present we will be primarily concerned with eigenvalues and eigenvectors of the variancecovariance matrix.
First of all let's define what these terms are...
Definition: If we have a p x p matrix A we are going to have p eigenvalues, λ_{1}, λ_{2} ... λ_{p}. They are obtained by solving the equation given in the expression below:
\(\textbf{A}\lambda\textbf{I}=0\)
On the lefthand side, we have the matrix A minus λ times the Identity matrix. When we calculate the determinant of the resulting matrix, we end up with a polynomial of order p. Setting this polynomial equal to zero, and solving for λ we obtain the desired eigenvalues. In general, we will have p solutions and so there are p eigenvalues, not necessarily all unique.
Definition: The corresponding eigenvectors e_{1}, e_{2}, ...,e_{p} are obtained by solving the expression below:
\((\textbf{A}\lambda_j\textbf{I})\textbf{e}_j = \mathbf{0}\)
Here, we have the difference between the matrix A minus the j^{th} eignevalue times the Identity matrix, this quantity is then multiplied by the j^{th} eigenvector and set it all equal to zero. This will obtain the eigenvector e_{j} associated with eigenvalue λ_{j}.
Note: This does not have a generally have a unique solution. So, to obtain a unique solution we will often require that e_{j} transposed e_{j} is equal to 1. Or, if you like, the sum of the square elements of e_{j} is equal to 1.
\(\textbf{e}'_j\textbf{e}_j = 1\)
Also note that eigenvectors corresponding to different eigenvalues are orthogonal. In situations, where two (or more) eigenvalues are equal, corresponding eigenvectors may still be chosen to be orthogonal.
Example: Consider the 2 x 2 matrix
To illustrate these calculations consider the correlation matrix R as shown below:
\(\textbf{R} = \left(\begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array}\right)\)
Then, using the definition of the eigenvalues, we must calculate the determinant of R  λ times the Identity matrix.
So, R in the expression above is given in blue, and the Identity matrix follows in red, and λ here is the eigenvalue that we wish to solve for. Carrying out the math we end up with the matrix with 1  λ on the diagonal and ρ on the offdiagonal. Then calculating this determinant we obtain 1  λ squared minus ρ^{2}:
\(\left\begin{array}{cc}1\lambda & \rho \\ \rho & 1\lambda \end{array}\right = (1\lambda)^2\rho^2 = \lambda^22\lambda+1\rho^2\)
Setting this expression equal to zero we end up with the following...
\( \lambda^22\lambda+1\rho^2=0\)
To solve for λ we use the general result that any solution to the second order polynomial below:
\(ay^2+by+c = 0\)
is given by the following expression:
\(y = \frac{b\pm \sqrt{b^24ac}}{2a}\)
Here, a = 1, b = 2 (the term that precedes λ) and c is equal to 1  ρ^{2}. Substituting these terms in the equation above, we obtain that λ must be equal to 1 plus or minus the correlation ρ.
\(\lambda = \frac{2 \pm \sqrt{2^24(1\rho^2)}}{2} = 1\pm\sqrt{1(1\rho^2)} = 1 \pm \rho\)
Here we will take the following solutions:
\( \begin{array}{ccc}\lambda_1 & = & 1+\rho \\ \lambda_2 & = & 1\rho \end{array}\)
Next, to obtain the corresponding eigenvectors, we must solve a system of equations below:
\((\textbf{R}\lambda\textbf{I})\textbf{e} = \mathbf{0}\)
This equals R  λ times I, this product multiplied by the eigenvector quantity times the eigenvector e equal to 0. Or in other words, this is translated for this specific problem in the expression below:
\(\left\{\left(\begin{array}{cc}1 & \rho \\ \rho & 1 \end{array}\right)\lambda\left(\begin{array}{cc}1 &0\\0 & 1 \end{array}\right)\right \}\left(\begin{array}{c} e_1 \\ e_2 \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \end{array}\right)\)
This simplifies as follows:
\(\left(\begin{array}{cc}1\lambda & \rho \\ \rho & 1\lambda \end{array}\right) \left(\begin{array}{c} e_1 \\ e_2 \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \end{array}\right)\)
Yielding a system of two equations with two unknowns:
\(\begin{array}{lcc}(1\lambda)e_1 + \rho e_2 & = & 0\\ \rho e_1+(1\lambda)e_2 & = & 0 \end{array}\)
Note that this does not have a unique solution. If (e_{1}, e_{2}) is one solution, then a second solution can be obtained by multiplying the first solution by any nonzero constant c, i.e., (ce_{1}, ce_{2}). Therefore, we will require the additional condition that the sum of the squared values of e_{1} and e_{2} are equal to 1.
\(e^2_1+e^2_2 = 1\)
Consider the first equation:
\((1\lambda)e_1 + \rho e_2 = 0\)
Solving this equation for e_{2} and we obtain the following:
\(e_2 = \frac{(1\lambda)}{\rho}e_1\)
Substituting this into \(e^2_1+e^2_2 = 1\) we get the following:
\(e^2_1 + \frac{(1\lambda)^2}{\rho^2}e^2_1 = 1\)
Recall that \(\lambda = 1 \pm \rho\). In either case we end up finding that \((1\lambda)^2 = \rho^2\), so that the expression above simplifies to:
\(2e^2_1 = 1\)
Or, in other words:
\(e_1 = \frac{1}{\sqrt{2}}\)
Using the expression for e_{2} which we obtained above,
\(e_2 = \frac{1\lambda}{\rho}e_1\)
we get
\(e_2 = \frac{1}{\sqrt{2}}\) for \(\lambda = 1 + \rho\) and \(e_2 = \frac{1}{\sqrt{2}}\) for \(\lambda = 1\rho\)
Therefore, the two eigenvectors are given by the two vectors as shown below:
\(\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{array}\right)\) for \(\lambda_1 = 1+ \rho\) and \(\left(\begin{array}{c}\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{array}\right)\) for \(\lambda_2 = 1 \rho\)
Some properties of the eigenvalues of the variancecovariance matrix are to be considered at this point. Suppose that λ_{1} through λ_{p} are the eigenvalues of the variancecovariance matrix Σ. By definition, the total variation is given by the sum of the variances. It turns out that this is also equal to the sum of the eigenvalues of the variancecovariance matrix. Thus, the total variation is:
\(\sum_{j=1}^{p}s^2_j = s^2_1 + s^2_2 +\dots + s^2_p = \lambda_1 + \lambda_2 + \dots + \lambda_p = \sum_{j=1}^{p}\lambda_j\)
The generalized variance is equal to the product of the eigenvalues:
\(\Sigma = \prod_{j=1}^{p}\lambda_j = \lambda_1 \times \lambda_2 \times \dots \times \lambda_p\)
The geometry of the multivariate normal distribution can be investigated by considering the orientation, and shape of the prediction ellipse as depicted in the following diagram:
The (1  α) x 100% prediction ellipse above is centered on the population means μ_{1} and μ_{2}.
The ellipse has axes pointing in the directions of the eigenvectors e_{1}, e_{2}, ..., e_{p}. Here, in this diagram for the bivariate normal, the longest axis of the ellipse points in the direction of the first eigenvector e_{1} and the shorter axis is perpendicular to the first, pointing in the direction of the second eigenvector e_{2}.
The corresponding halflengths of the axes are obtained by the following expression:
\(l_j = \sqrt{\lambda_j\chi^2_{p,\alpha}}\)
The plot above captures the lengths of these axes within the ellipse.
The volume (area) of the hyperellipse is equal to:
\(\frac{2\pi^{p/2}}{p\Gamma\left(\frac{p}{2}\right)}(\chi^2_{p,\alpha})^{p/2}\Sigma^{1/2}\)
Note that this is a function of the squareroot of the generalized variance (given by the square root of the determinant of the variancecovariance matrix). Thus, the volume (area) of the prediction ellipse is proportional to the square root of the generalized variance.
In this expression for the volume (area) of the hyperellipse, Γ(x) is the gamma function. To compute the gamma function, consider the two special cases:
p even
\(\Gamma\left(\frac{p}{2}\right) = \left(\frac{p}{2}1\right)!\)
p odd
\(\Gamma\left(\frac{p}{2}\right) = \frac{1 \times 3 \times 5 \times \dots \times (p2) \times \sqrt{\pi}}{2^{(p1)/2}}\)
We shall illustrate the shape of the multivariate normal distribution using the Wechsler Adult Intelligence Scale data.
Here we have data on n = 37 subjects taking the Wechsler Adult Intelligence Test . This test is broken up into four different components:
The data are stored in the file named wechslet.txt in five different columns. The first column is the ID number of the subjects, followed by the four component tasks in the remaining four columns.
These data may be analyzed using the sas program wechsler.sas shown below.
Walk through the procedures of the program by clicking on the "Inspect" button. You can also launch the program by clicking on the "Launch SAS" button on the right. Just as in previous lessons, marking up a print out of the SAS program is also a good strategy for learning how this program is put together.
The SAS output, (wechsler.lst), gives the results of the data analyses. Because the SAS output is usually a relatively long document, printing these pages of output out and marking them with notes is highly recommended if not required!
In the output, means and standard deviations for all four variables are given at the top of second page of the output. Means are given in the first row. Those means have been copied into the table shown below:

Click on the graphic or the link below to walk through how to produce the covariance matrix for the Wechsler Adult Intelligence Test data using Minitab.
Sample Means:
Variable

Mean

Information 
12.568

Similarities 
9.568

Arithmetic 
11.486

Picture Completion 
7.973

VarianceCovariance Matrix
\(\textbf{S} = \left(\begin{array}{rrrr}11.474 & 9.086 & 6.383 & 2.071\\ 9.086 & 12.086 & 5.938 & 0.544\\ 6.383 & 5.938 & 11.090 & 1.791\\ 2.071 & 0.544 & 1.791 & 3.694 \end{array}\right)\)
Here, for example, the variance for Information was 11.474. For Similarities it was 12.086. The covariance between Similarities and Information is 9.086. The total variance, which is the sum of the variances comes out to be 38.344, approximately.
The eigenvalues are given below:
\(\lambda_1 = 26.245\), \(\lambda_2 = 6.255\), \(\lambda_3 = 3.932\), \(\lambda_4 = 1.912\)
and finally at the bottom of the table we have the corresponding eigenvectors. They have been listed here below:
\(\mathbf{e_1}=\left(\begin{array}{r}0.606\\0.605\\0.505\\0.110 \end{array}\right)\), \(\mathbf{e_2}=\left(\begin{array}{r}0.218\\0.496\\0.795\\0.274 \end{array}\right)\), \(\mathbf{e_3}=\left(\begin{array}{r}0.461\\0.320\\0.335\\0.757 \end{array}\right)\), \(\mathbf{e_4}=\left(\begin{array}{r}0.611\\0.535\\0.035\\0.582 \end{array}\right)\)
For example, the eigenvectors corresponding the the eigenvalue 26.245, those elements are 0.606, 0.605, 0.505, and 0.110.
Now, let's consider the shape of the 95% prediction ellipse formed by the multivariate normal distribution whose variancecovariance matrix is equal to the sample variancecovariance matrix which we just obtained.
Recall the formula for the halflengths of the axis of this ellipse. This is equal to the square root of the eigenvalue times the critical value from a chisquare table.In this case, since we have four variables, this should be chisquare with four degrees of freedom. In this case, if we are going to consider a 95% prediction ellipse, the critical value for chisquare with four degrees of freedom is equal to 9.49 from the statistical table.
For looking at the first and longest axis of a 95% prediction ellipse, we substitute 26.245 for the largest eigenvalue, multiplied by 9.49 and take the square root. We end up with a 95% prediction ellipse with a halflength of 15.782 as shown below:
\(l_1=\sqrt{\lambda_1\chi^2_{4,0.05}} = \sqrt{26.245 \times 9.49} = 15.782\)
The direction of the axis is given by the first eigenvector. Looking at this first eigenvector we can see large positive elements corresponding to the first three variables. In other words, large elements for Information, Similarities, and Arithmetic. This suggests that this particular axis points in the direction specified by e_{1}; that is, increasing values of Information, Similarities, and Arithmetic.
The halflength of second longest axis can be obtained by substituting 6.255 for the second eigenvalue, multiplying this by 9.49, and taking the square root. We obtain a halflength of about 7.7, or about half the length of the first axis.
\(l_2=\sqrt{\lambda_2\chi^2_{4,0.05}} = \sqrt{6.255 \times 9.49} = 7.705\)
So, if you were to picture this particular ellipse you would see that the second axis is about half the length of the first and longest axis.
Looking at the corresponding eigenvector, e_{2}, we can see that this particular axis is pointed in the direction of points in the direction of increasing values for the third value, or Arithmetic and decreasing value for Similarities, the second variable.
Similar calculations can then be carried out for the third longest axis of the ellipse as shown below:
\(l_3=\sqrt{\lambda_3\chi^2_{4,0.05}} = \sqrt{3.931 \times 9.49} = 6.108\)
This third axis has halflength of 6.108, which is not much shorter or smaller than the second axis. It points in the direction of e_{3}; that is, increasing values of Picture Completion and Information, and decreasing values of Similarities and Arithmetic.
The shortest axis has halflength of about 4.260 as show below:
\(l_4=\sqrt{\lambda_4\chi^2_{4,0.05}} = \sqrt{1.912 \times 9.49} = 4.260\)
It points in the direction of e_{4}; that is, increasing values of Similarities and Picture Completion, and decreasing values of Information.
The overall shape of the ellipse can be obtained by comparing the lengths of the various axis. What we have here is basically and ellipse that is the shape of a slightly squashed football.
We can also obtain the volume of the hyperellipse using the formula that was given earlier. Again, our critical value from the chisquare, if we are looking at a 95% prediction ellipse, with four degrees of freedom is given at 9.49. Substituting into our expression we have the product of the eigenvalues in the square root. The gamma function is evaluated at 2, and gamma of 2 is simply equal to 1. Carrying out the math we end up with a volume of 15,613.132 as shown below:
\(\begin{array}{lcl} \frac{2\pi^{p/2}}{p\Gamma\left(\frac{p}{2}\right)}(\chi^2_{p,\alpha})^{p/2}\Sigma^{1/2} & = & \frac{2\pi^{p/2}}{p\Gamma\left(\frac{p}{2}\right)}(\chi^2_{p,\alpha})^{p/2}\sqrt{\prod_{j=1}^{p}\lambda_j} \\ & = & \frac{2\pi^2}{4\Gamma(2)}(9.49)^2\sqrt{26.245 \times 6.255 \times 3.932 \times 1.912}\\ & = & 444.429 \sqrt{1234.17086}\\ & = & 15613.132\end{array}\)
To further understand the shape of the multivariate normal distribution, let's return to the special case where we have p = 2 variables.
If ρ = 0, there is zero correlation, and the eigenvalues turn out to be equal to the variances of the two variables. So, for example, the first eigenvalue would be equal to σ^{2}_{1} and the second eigenvalue would be equal to σ^{2}_{2} as shown below:
\(\lambda_1 = \sigma^2_1\) and \(\lambda_2 = \sigma^2_2\)
the corresponding eigenvectors will have elements 1 and 0 for the first eigenvalue and 0 and 1 for the second eigenvalue.
\(\mathbf{e}_1 = \left(\begin{array}{c} 1\\ 0\end{array}\right)\), \(\mathbf{e}_2 = \left(\begin{array}{c} 0\\ 1\end{array}\right)\)
So, the axis of the ellipse, in this case, are parallel to the coordinate axis.
If there is zero correlation, and the variances are equal so that σ^{2}_{1} = σ^{2}_{2}, then the eigenvalues will be equal to one another, and instead of an ellipse we will get a circle. In this special case we have a socalled circular normal distribution.
If the correlation is greater than zero, then the longer axis of the ellipse will have a positive slope.
Conversely, if the correlation is less than zero, then the longer axis of the ellipse will have a negative slope.
As the correlation approaches plus or minus 1, the larger eigenvalue will approach the sum of the two variances, and the smaller eigenvalue will approach zero:
\(\lambda_1 \rightarrow \sigma^2_1 +\sigma^2_2\) and \(\lambda_2 \rightarrow 0\)
So, what is going to happen in this case is that the ellipse becomes more and more elongated as the correlation approaches one.
The SAS program ellplot.sas below can be used to plot the 95% confidence ellipse corresponding to any specified variancecovariance matrix.

The bivaraite confidence interval for this example cannot be generated using Minitab.
In this lesson we learned about:
Next, complete the homework problems that will give you a chance to put what you have learned to use...