Lesson 0: Matrices and Vectors


Introduction: Why Matrix Algebra?

Univariate statistics is concerned with a single random scalar variable \(Y\).

In multivariate analysis, we are concerned with the joint analysis of multiple dependent variables. These variables can be represented using matrices and vectors, which simplifies the notation and provides a compact format for expressing important formulas.

Example 1: Suppose that we measure the variables \(x_1\) = height (cm), \(x_2\) = left forearm length (cm) and \(x_3\) = left foot length (cm) for participants in a study of the physical characteristics of adult humans. These three variables can be represented in the following column vector:

\[\mathbf{x}= \left(\begin{array}{l}x_1\\ x_2\\x_3 \end{array}\right)\]

The observed data for a specific individual, say the ith individual, might also be represented in an analogous vector. Suppose that the ith person in the sample has height = 175 cm, forearm length = 25.5 cm and foot length = 27 cm. In vector notation these observed data could be written as:

\[\mathbf{x_i} = \left(\begin{array}{l}x_{i1}\\x_{i2}\\x_{i3}\end{array}\right)=\left(\begin{array}{l}175\\25.5\\27.0\end{array}\right)\]

Notice the use and placement of the subscript i to represent the ith individual.

Definitions of Matrix and Vector

  • A matrix is a two-dimensional array of numbers or formulas.
  • A vector is a matrix with either only one column or only one row. A column vector has only one column. A row vector has only one row.
  • The dimension of a matrix is expressed as number of rows × number of columns. For instance, a matrix with 10 rows and 3 columns is said to be a 10 × 3 matrix. The vectors written in Example 1 above are 3 × 1 matrices.
  • A square matrix is one for which the numbers of rows and columns are the same. For instance, a 4 × 4 matrix is a square matrix.
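As a rough illustration of these definitions, the following Python sketch stores a matrix as a list of rows and reports its dimension. The helper names (dimension, is_square, is_vector) are invented for this example and do not come from any library; it also assumes that every row has the same length.

```python
def dimension(matrix):
    """Return (rows, columns) for a matrix stored as a list of equal-length rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    return n_rows, n_cols


def is_square(matrix):
    """A square matrix has as many rows as columns."""
    n_rows, n_cols = dimension(matrix)
    return n_rows == n_cols


def is_vector(matrix):
    """A vector is a matrix with only one row or only one column."""
    n_rows, n_cols = dimension(matrix)
    return n_rows == 1 or n_cols == 1


# The observed data vector from Example 1, written as a 3 x 1 column vector.
x_i = [[175.0], [25.5], [27.0]]

print(dimension(x_i))   # (3, 1)
print(is_vector(x_i))   # True
print(is_square(x_i))   # False
```

Storing the column vector as three one-element rows keeps the "rows × columns" convention explicit: the outer list length is the number of rows, and the inner list length is the number of columns.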

The Data Matrix in Multivariate Problems

Usually the observed data are represented by a matrix in which the rows are observations and the columns are variables. This is exactly the way the data are normally prepared for statistical software such as SAS or Minitab. 

The usual notation is n = the number of observed units (people, animals, companies, etc.) and p = number of variables measured on each unit. Thus the data matrix will be an n × p matrix.

Example 2: Suppose that we have scores for n = 6 college students who have taken the verbal and the science subtests of the College Qualification Test (CQT). We have p = 2 variables: (1) the verbal score and (2) the science score for each student. The data matrix is the following 6 × 2 matrix:

\[\mathbf{X}=\left(\begin{array}{ll}41&26\\39&26\\53&21\\67&33\\61&27\\67&29\end{array}\right)\]

In the matrix just given, the first column gives the data for \(x_1\) = verbal score whereas the second column gives data for \(x_2\) = science score. Each row gives data for a student in the sample. To repeat – the rows are observations, the columns are variables.
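The rows-as-observations, columns-as-variables layout can be sketched in plain Python as follows. This is only an illustration using the Example 2 scores; the variable names (X, verbal, science) are chosen here for readability and are not part of any statistical package.

```python
# Data matrix from Example 2: one row per student, one column per variable.
X = [
    [41, 26],
    [39, 26],
    [53, 21],
    [67, 33],
    [61, 27],
    [67, 29],
]

n = len(X)       # number of observed units (students)
p = len(X[0])    # number of variables (verbal, science)

# Column j collects the values of variable j across all observations.
verbal = [row[0] for row in X]
science = [row[1] for row in X]

print(n, p)      # 6 2
print(verbal)    # [41, 39, 53, 67, 61, 67]
print(science)   # [26, 26, 21, 33, 27, 29]
```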

Notation notes:

Note that we have used a lowercase \(\textbf{x}\) to denote the vector of variables in Example 1 and an uppercase \(\textbf{X}\) to represent the data matrix in Example 2. In matrix terms, the ith row of the data matrix \(\textbf{X}\) is the transpose of the data vector

\(\mathbf{x_i}=\left(\begin{array}{l}x_{i1}\\x_{i2}\end{array}\right)\), as we defined data vectors in Example 1.
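The relationship between a row of the data matrix and the corresponding data vector can be checked with a small sketch. The transpose helper below is written for this example (it is not a library function), and the choice of the third student is arbitrary.

```python
def transpose(matrix):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


# Data matrix from Example 2 (rows are students, columns are variables).
X = [[41, 26], [39, 26], [53, 21], [67, 33], [61, 27], [67, 29]]

i = 3  # third student, counting from 1 as in the text

# Data vector for student i, written as a 2 x 1 column vector.
x_i = [[value] for value in X[i - 1]]

# The ith row of X, written as a 1 x 2 matrix.
row_i = [X[i - 1]]

print(x_i)                      # [[53], [21]]
print(transpose(x_i))           # [[53, 21]]
print(transpose(x_i) == row_i)  # True
```

Python lists are indexed from 0, so the ith observation in the text's 1-based notation sits at position i - 1.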