Lesson 1: Measures of Central Tendency, Dispersion and Association

Introduction

Goal: This lesson provides a partial description of the joint distribution of the data. Three aspects of the data are important, the first two of which should already be familiar from univariate statistics. These are:

  1. Central Tendency. What is a typical value for each variable?
  2. Dispersion. How far apart are the individual observations from a central value for a given variable?
  3. Association. This might (or might not!) be a new measure for you. When more than one variable is studied, how does each variable relate to the remaining variables? How are the variables simultaneously related to one another? Are they positively or negatively related?
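As a preview of these three ideas, the sketch below computes a sample mean (central tendency), a sample variance (dispersion), and a sample correlation (association) for a small made-up data set. The data values and function names are hypothetical, and the n - 1 divisor is assumed here as the usual sample convention, ahead of the formal definitions.

```python
from math import sqrt

# Hypothetical toy data: two variables measured on the same five subjects.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def mean(values):
    """Central tendency: the arithmetic average."""
    return sum(values) / len(values)

def sample_variance(values):
    """Dispersion: average squared deviation from the mean (n - 1 divisor)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def sample_covariance(a, b):
    """Association: average product of paired deviations (n - 1 divisor)."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def sample_correlation(a, b):
    """Association on a unit-free scale between -1 and 1."""
    return sample_covariance(a, b) / sqrt(sample_variance(a) * sample_variance(b))

print("mean of x:", mean(x))
print("variance of x:", sample_variance(x))
print("correlation of x and y:", sample_correlation(x, y))
```

A positive correlation here indicates that larger values of one variable tend to occur with larger values of the other.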

Population Parameters and Sample Statistics

 Statistics, as a subject matter, is the science and art of using sample information to make generalizations about populations.

A population is the collection of all people, plants, animals, or objects of interest about which we wish to make statistical inferences (generalizations). The population may also be viewed as the collection of all possible random draws from a stochastic model; for example, independent draws from a normal distribution with a given population mean and population variance.

A population parameter is a numerical characteristic of a population. In nearly all statistical problems we do not know the value of a parameter because we do not measure the entire population. We use sample data to make an inference about the value of a parameter.

A sample is the subset of the population that we actually measure or observe.

A sample statistic is a numerical characteristic of a sample. A sample statistic estimates the unknown value of a population parameter. Information summarized by sample statistics is sometimes referred to as descriptive statistics.
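The distinction between a parameter and a statistic can be illustrated with a small simulation. This is a minimal sketch, assuming the population is the stochastic model "independent draws from a normal distribution"; the population mean of 10, standard deviation of 2, sample size, and seed are all hypothetical choices made for illustration.

```python
import random

# Hypothetical population parameters of a normal stochastic model.
POP_MEAN = 10.0
POP_SD = 2.0

def draw_sample(n, seed=0):
    """Draw n independent observations from the assumed normal population."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]

def sample_mean(values):
    """Sample statistic that estimates the population mean."""
    return sum(values) / len(values)

sample = draw_sample(1000)
x_bar = sample_mean(sample)
print(f"population mean = {POP_MEAN}, sample mean = {x_bar:.3f}")
```

In practice the population mean is unknown and only the sample mean is available; the simulation fixes the parameter so the two can be compared.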

The following notation will be used:

\(X_{ij}\) = Observation for variable \(j\) in subject \(i\)

\(p\) = Number of variables

\(n\) = Number of subjects

In the example to come, we'll have data on 737 people (subjects) and 5 nutritional outcomes (variables). So:

\(p\) = 5 variables

\(n\) = 737 subjects

In multivariate statistics we will always be working with vectors of observations, so we arrange the data for the \(p\) variables on each subject into a vector. In the expression below, \(\textbf{X}_i\) is the vector of observations for the \(i\)th subject, \(i\) = 1 to \(n\) (737). The data for the \(j\)th variable are located in the \(j\)th element of this subject's vector, \(j\) = 1 to \(p\) (5).

\[\mathbf{X}_i = \left(\begin{array}{l}X_{i1}\\X_{i2}\\ \vdots \\ X_{ip}\end{array}\right)\]
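One way to picture this layout in software is as a table with one row per subject and one column per variable. The sketch below uses made-up numbers for 3 subjects and 5 variables; the values and the helper names `observation_vector` and `observation` are hypothetical. The helpers take the 1-based indices of the notation above and convert them to Python's 0-based indexing.

```python
# Hypothetical data: n = 3 subjects (rows), p = 5 variables (columns).
# The numbers are made up purely to show the layout.
data = [
    [10.0, 1.5, 60.0, 300.0, 7.0],  # X_1
    [12.0, 2.0, 75.0, 450.0, 9.5],  # X_2
    [8.0, 1.0, 55.0, 280.0, 6.0],   # X_3
]

n = len(data)     # number of subjects
p = len(data[0])  # number of variables

def observation_vector(data, i):
    """Return X_i, the vector of all p observations on subject i (1-based)."""
    return data[i - 1]

def observation(data, i, j):
    """Return X_ij, the observation for variable j on subject i (1-based)."""
    return data[i - 1][j - 1]

print(n, p)
print(observation_vector(data, 2))
print(observation(data, 2, 3))
```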

Learning Objectives & Outcomes

Upon completion of this lesson, you should be able to do the following:

  • interpret measures of central tendency, dispersion, and association;
  • calculate sample means, variances, covariances, and correlations using a hand calculator;
  • use software like SAS or Minitab to compute sample means, variances, covariances, and correlations.