11.1 - Principal Component Analysis (PCA) Procedure


Suppose that we have a random vector \(\textbf{X}\),

\(\textbf{X} = \left(\begin{array}{c} X_1\\ X_2\\ \vdots \\X_p\end{array}\right)\)

with population variance-covariance matrix

\(\text{var}(\textbf{X}) = \Sigma = \left(\begin{array}{cccc}\sigma^2_1 & \sigma_{12} & \dots &\sigma_{1p}\\ \sigma_{21} & \sigma^2_2 & \dots &\sigma_{2p}\\  \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \dots & \sigma^2_p\end{array}\right)\)

Consider the linear combinations

\(\begin{array}{lll} Y_1 & = & e_{11}X_1 + e_{12}X_2 + \dots + e_{1p}X_p \\ Y_2 & = & e_{21}X_1 + e_{22}X_2 + \dots + e_{2p}X_p \\ & & \vdots \\ Y_p & = & e_{p1}X_1 + e_{p2}X_2 + \dots +e_{pp}X_p\end{array}\)

Each of these can be thought of as a linear regression predicting \(Y_i\) from \(X_1, X_2, \dots, X_p\). There is no intercept, but \(e_{i1}, e_{i2}, \dots, e_{ip}\) can be viewed as regression coefficients.

Note that \(Y_i\) is a function of our random data, so it is also random. Therefore it has a population variance

\[\text{var}(Y_i) = \sum_{k=1}^{p}\sum_{l=1}^{p}e_{ik}e_{il}\sigma_{kl} = \mathbf{e}'_i\Sigma\mathbf{e}_i\]

Moreover, \(Y_i\) and \(Y_j\) have population covariance

\[\text{cov}(Y_i, Y_j) = \sum_{k=1}^{p}\sum_{l=1}^{p}e_{ik}e_{jl}\sigma_{kl} = \mathbf{e}'_i\Sigma\mathbf{e}_j\]

In these expressions, \(\mathbf{e}_i\) is the column vector that collects the coefficients \(e_{i1}, e_{i2}, \dots, e_{ip}\):

\(\mathbf{e}_i = \left(\begin{array}{c} e_{i1}\\ e_{i2}\\ \vdots \\ e_{ip}\end{array}\right)\)
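The double sums for \(\text{var}(Y_i)\) and \(\text{cov}(Y_i, Y_j)\) are the matrix products \(\mathbf{e}'_i\Sigma\mathbf{e}_i\) and \(\mathbf{e}'_i\Sigma\mathbf{e}_j\) written out entry by entry. The minimal pure-Python sketch below evaluates both forms and shows that they agree. It uses a hypothetical 2 × 2 covariance matrix and two arbitrarily chosen unit-length coefficient vectors, which are not principal components. The helper names `bilinear`, `matvec` and `dot` are illustrative only.

```python
def bilinear(a, sigma, b):
    """Double-sum form: sum over k, l of a[k] * b[l] * sigma[k][l]."""
    p = len(a)
    return sum(a[k] * b[l] * sigma[k][l] for k in range(p) for l in range(p))

def matvec(sigma, v):
    """Matrix-vector product Sigma v for a list-of-lists matrix."""
    return [sum(s * x for s, x in zip(row, v)) for row in sigma]

def dot(a, b):
    """Inner product a' b."""
    return sum(x * y for x, y in zip(a, b))

sigma = [[4.0, 2.0],
         [2.0, 3.0]]   # hypothetical covariance matrix
e1 = [0.6, 0.8]        # arbitrary unit-length coefficient vectors,
e2 = [0.8, -0.6]       # chosen for illustration (not principal components)

var_y1 = bilinear(e1, sigma, e1)               # double sum for var(Y_1)
var_y1_matrix = dot(e1, matvec(sigma, e1))     # matrix form e_1' Sigma e_1
cov_y1_y2 = bilinear(e1, sigma, e2)            # double sum for cov(Y_1, Y_2)
cov_y1_y2_matrix = dot(e1, matvec(sigma, e2))  # matrix form e_1' Sigma e_2
print(var_y1, var_y1_matrix, cov_y1_y2, cov_y1_y2_matrix)
```

Both variance values come out at approximately 5.28, and both covariance values at approximately 1.04. These two coefficient vectors are orthogonal, yet the resulting \(Y_1\) and \(Y_2\) are still correlated.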

First Principal Component (PCA1): \(Y_1\)

The first principal component is the linear combination of the X-variables that has maximum variance among all linear combinations with unit-length coefficient vectors. It accounts for as much of the variation in the data as possible.


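Specifically, we choose the coefficients \(e_{11}, e_{12}, \dots, e_{1p}\) for the first component so that its variance is maximized, subject to the constraint that the sum of the squared coefficients equals one. This constraint is required because, without it, the variance could be made arbitrarily large simply by multiplying all of the coefficients by a large constant. Even with the constraint, the answer is unique only up to a change of sign, since \(\mathbf{e}_1\) and \(-\mathbf{e}_1\) give the same variance (and assuming no other direction attains the same maximum).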
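To see why some normalization is needed, consider what scaling does to the variance. Multiplying every coefficient by a constant \(c\) multiplies the variance by \(c^2\), while flipping the sign of every coefficient leaves it unchanged. The minimal pure-Python check below illustrates both facts. It uses a hypothetical 2 × 2 covariance matrix, and the helper `quad_form` is an illustrative name.

```python
def quad_form(e, sigma):
    """var(Y) = sum_k sum_l e[k] * e[l] * sigma[k][l] for Y = e' X."""
    p = len(e)
    return sum(e[k] * e[l] * sigma[k][l] for k in range(p) for l in range(p))

sigma = [[4.0, 2.0],
         [2.0, 3.0]]   # hypothetical covariance matrix
e = [0.6, 0.8]         # unit length: 0.36 + 0.64 = 1

# Without the constraint e' e = 1, scaling e inflates the variance without bound.
for c in (1.0, 10.0, 100.0):
    scaled = [c * x for x in e]
    print(c, quad_form(scaled, sigma))   # grows like c**2 * quad_form(e, sigma)

# Flipping the sign of e leaves the variance unchanged, so e and -e tie.
print(quad_form([-x for x in e], sigma))
```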

More formally, select \(e_{11}, e_{12}, \dots, e_{1p}\) to maximize

\[\text{var}(Y_1) = \sum_{k=1}^{p}\sum_{l=1}^{p}e_{1k}e_{1l}\sigma_{kl} = \mathbf{e}'_1\Sigma\mathbf{e}_1\]

subject to the constraint that

\[\mathbf{e}'_1\mathbf{e}_1 = \sum_{j=1}^{p}e^2_{1j} = 1\]
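The text states this optimization problem but not how to solve it. A standard result, not derived here, is that the maximizing \(\mathbf{e}_1\) is an eigenvector of \(\Sigma\) for its largest eigenvalue, and the maximum variance equals that eigenvalue. The sketch below approximates that eigenvector by power iteration, one simple technique among several, on a hypothetical 2 × 2 covariance matrix. It assumes \(\Sigma\) is symmetric positive semi-definite with a unique largest eigenvalue. The names `first_pc` and `variance` and their parameters are illustrative, not from any library. A brute-force scan over unit vectors is included as a sanity check.

```python
import math
import random

def matvec(sigma, v):
    """Matrix-vector product Sigma v for a list-of-lists matrix."""
    return [sum(s * x for s, x in zip(row, v)) for row in sigma]

def normalize(v):
    """Rescale v to unit length, enforcing the constraint e' e = 1."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def variance(e, sigma):
    """var(Y) = e' Sigma e for the linear combination Y = e' X."""
    return sum(a * b for a, b in zip(e, matvec(sigma, e)))

def first_pc(sigma, tol=1e-12, max_iter=10_000, seed=0):
    """Approximate e_1 by power iteration; returns (e_1, var(Y_1)).

    Assumes sigma is symmetric positive semi-definite with a unique
    largest eigenvalue. The sign of the returned e_1 is arbitrary."""
    rng = random.Random(seed)
    e = normalize([rng.uniform(-1.0, 1.0) for _ in sigma])  # random start
    for _ in range(max_iter):
        e_new = normalize(matvec(sigma, e))
        converged = math.dist(e_new, e) < tol
        e = e_new
        if converged:
            break
    return e, variance(e, sigma)

sigma = [[4.0, 2.0],
         [2.0, 3.0]]   # hypothetical covariance matrix
e1, var_y1 = first_pc(sigma)

# Brute-force check: scan unit vectors (cos t, sin t) over a half circle
# (e and -e give the same variance) and record the largest variance found.
grid_max = max(variance([math.cos(t), math.sin(t)], sigma)
               for t in (math.pi * i / 2000 for i in range(2000)))
print(e1, var_y1, grid_max)
```

For this \(\Sigma\), `var_y1` should be close to \((7 + \sqrt{17})/2 \approx 5.56\), its largest eigenvalue. No direction on the grid should exceed it.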

Second Principal Component (PCA2): \(Y_2\)

The second principal component is the linear combination of the X-variables that accounts for as much of the remaining variation as possible, subject to the constraint that the correlation between the first and second components is 0.

Select \(e_{21}, e_{22}, \dots, e_{2p}\) to maximize the variance of this new component,

\[\text{var}(Y_2) = \sum_{k=1}^{p}\sum_{l=1}^{p}e_{2k}e_{2l}\sigma_{kl} = \mathbf{e}'_2\Sigma\mathbf{e}_2\]

subject to the constraint that the sum of the squared coefficients equals one,

\[\mathbf{e}'_2\mathbf{e}_2 = \sum_{j=1}^{p}e^2_{2j} = 1\]

and to the additional constraint that the two components are uncorrelated:

\[\text{cov}(Y_1, Y_2) = \sum_{k=1}^{p}\sum_{l=1}^{p}e_{1k}e_{2l}\sigma_{kl} = \mathbf{e}'_1\Sigma\mathbf{e}_2 = 0\]
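For \(p = 2\) the second component can be written down directly. Take the standard result that \(\Sigma\mathbf{e}_1 = \lambda_1\mathbf{e}_1\), where \(\lambda_1\) is the largest eigenvalue. Because \(\Sigma\) is symmetric, \(\text{cov}(Y_1, Y_2) = \mathbf{e}'_1\Sigma\mathbf{e}_2 = \lambda_1\mathbf{e}'_1\mathbf{e}_2\). With \(\lambda_1 > 0\), the uncorrelatedness constraint therefore forces \(\mathbf{e}_2\) to be orthogonal to \(\mathbf{e}_1\). In two dimensions the only unit vectors left are \(\pm(-e_{12}, e_{11})\). The sketch below checks this numerically on a hypothetical 2 × 2 covariance matrix. It uses the closed-form 2 × 2 eigen-solution and assumes \(\sigma_{12} \neq 0\); the helper name `bilinear` is illustrative.

```python
import math

def bilinear(a, sigma, b):
    """Return sum_k sum_l a[k] * b[l] * sigma[k][l], i.e. a' Sigma b."""
    return sum(a[k] * b[l] * sigma[k][l]
               for k in range(len(a)) for l in range(len(b)))

sigma = [[4.0, 2.0],
         [2.0, 3.0]]  # hypothetical covariance matrix; assumes sigma[0][1] != 0

# Closed-form largest eigenvalue of a symmetric 2x2 matrix.
tr = sigma[0][0] + sigma[1][1]
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
lam1 = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0

# (Sigma - lam1 I) e = 0 is solved by e proportional to (sigma12, lam1 - sigma11).
raw = [sigma[0][1], lam1 - sigma[0][0]]
norm = math.hypot(raw[0], raw[1])
e1 = [raw[0] / norm, raw[1] / norm]

# In two dimensions the only unit vectors orthogonal to e1 are +/- this one.
e2 = [-e1[1], e1[0]]

var_y1 = bilinear(e1, sigma, e1)      # var(Y_1) = e_1' Sigma e_1
var_y2 = bilinear(e2, sigma, e2)      # var(Y_2) = e_2' Sigma e_2
cov_y1_y2 = bilinear(e1, sigma, e2)   # cov(Y_1, Y_2) = e_1' Sigma e_2
print(var_y1, var_y2, cov_y1_y2)
```

The covariance should be zero up to rounding error. The two variances should be about 5.56 and 1.44, which sum to the trace of \(\Sigma\) (7 here).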

All subsequent principal components are defined the same way. Each is the linear combination that accounts for as much of the remaining variation as possible while being uncorrelated with all of the previously defined principal components. In general:

\(i\)th Principal Component (PCA\(i\)): \(Y_i\)

We select \(e_{i1}, e_{i2}, \dots, e_{ip}\) to maximize

\[\text{var}(Y_i) = \sum_{k=1}^{p}\sum_{l=1}^{p}e_{ik}e_{il}\sigma_{kl} = \mathbf{e}'_i\Sigma\mathbf{e}_i\]

subject to the constraint that the sum of the squared coefficients equals one, along with the additional constraints that this new component is uncorrelated with all of the previously defined components:

\(\mathbf{e}'_i\mathbf{e}_i = \sum_{j=1}^{p}e^2_{ij} = 1\)

\(\text{cov}(Y_1, Y_i) = \sum_{k=1}^{p}\sum_{l=1}^{p}e_{1k}e_{il}\sigma_{kl} = \mathbf{e}'_1\Sigma\mathbf{e}_i = 0\),

\(\text{cov}(Y_2, Y_i) = \sum_{k=1}^{p}\sum_{l=1}^{p}e_{2k}e_{il}\sigma_{kl} = \mathbf{e}'_2\Sigma\mathbf{e}_i = 0\),

\(\vdots\)

\(\text{cov}(Y_{i-1}, Y_i) = \sum_{k=1}^{p}\sum_{l=1}^{p}e_{i-1,k}e_{il}\sigma_{kl} = \mathbf{e}'_{i-1}\Sigma\mathbf{e}_i = 0\)

Therefore all principal components are uncorrelated with one another.
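The sequential definition above can be sketched as repeated power iteration with deflation. After finding \(\mathbf{e}_i\), subtract \(\text{var}(Y_i)\,\mathbf{e}_i\mathbf{e}'_i\) from a working copy of \(\Sigma\). The next search then finds the direction of largest remaining variance. This relies on a standard result not derived in this section: the \(i\)th principal component is the eigenvector of \(\Sigma\) for its \(i\)th largest eigenvalue. The sketch assumes \(\Sigma\) is symmetric positive definite with distinct eigenvalues. The 3 × 3 matrix is made up, and the function names are hypothetical. Power iteration is used only because it is short; in practice one would typically call a library eigenvalue routine.

```python
import math
import random

def matvec(a, v):
    """Matrix-vector product a v for a list-of-lists matrix."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def dot(a, b):
    """Inner product a' b."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Rescale v to unit length (the constraint e' e = 1)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def leading_eigvec(a, rng, tol=1e-12, max_iter=10_000):
    """Power iteration: unit eigenvector for the largest eigenvalue of a."""
    e = normalize([rng.uniform(-1.0, 1.0) for _ in a])
    for _ in range(max_iter):
        e_new = normalize(matvec(a, e))
        if math.dist(e_new, e) < tol:
            return e_new
        e = e_new
    return e

def principal_components(sigma, seed=0):
    """Return [(e_i, var(Y_i)), ...] for i = 1..p using deflation.

    Assumes sigma is symmetric positive definite with distinct eigenvalues."""
    p = len(sigma)
    rng = random.Random(seed)
    work = [row[:] for row in sigma]       # copy of Sigma that gets deflated
    comps = []
    for _ in range(p):
        e = leading_eigvec(work, rng)
        var_y = dot(e, matvec(sigma, e))   # var(Y_i) = e_i' Sigma e_i
        comps.append((e, var_y))
        # Remove the variance already explained along e_i, so the next
        # search finds the direction of largest remaining variance.
        work = [[work[k][l] - var_y * e[k] * e[l] for l in range(p)]
                for k in range(p)]
    return comps

sigma = [[4.0, 2.0, 0.6],
         [2.0, 3.0, 0.4],
         [0.6, 0.4, 1.0]]   # hypothetical covariance matrix

comps = principal_components(sigma)
for i, (e, var_y) in enumerate(comps, start=1):
    print(f"Y_{i}: var = {var_y:.4f}, e = {[round(x, 4) for x in e]}")

# Every pair of components should have covariance e_i' Sigma e_j close to 0.
for i in range(len(comps)):
    for j in range(i + 1, len(comps)):
        cov = dot(comps[i][0], matvec(sigma, comps[j][0]))
        print(f"cov(Y_{i + 1}, Y_{j + 1}) = {cov:.2e}")
```

The printed variances should appear in decreasing order and sum to the trace of \(\Sigma\) (8 here). The printed covariances should be on the order of rounding error.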