Lesson 18: Correlation and Agreement


Many biostatistical analyses are conducted to study the relationship between two continuous or ordinal scale variables within a group of patients.

Purposes of these analyses include:

  1. assessing correlation between the two variables, i.e., identifying whether values of one variable tend to be higher (or possibly lower) for higher values of the other variable;
  2. assessing the amount of agreement between the values of the two variables, i.e., comparing alternative ways of measuring or assessing the same response;
  3. assessing the ability of one variable to predict values of the other variable, i.e., formulating predictive models via regression analyses.

This lesson will focus only on correlation and agreement (items 1 and 2 above).

Learning objectives & outcomes

Upon completion of this lesson, you should be able to do the following:

  1. calculate and interpret the Pearson, Spearman, and Kendall correlation coefficients;
  2. construct approximate confidence intervals for a correlation coefficient using Fisher's Z transformation;
  3. recognize common uses and misuses of correlation analyses;
  4. measure agreement between two continuous measurements with the concordance correlation coefficient, and between two categorical assessments with Cohen's kappa statistic.

18.1 - Pearson Correlation Coefficient

Correlation is a general method of analysis useful when studying possible association between two continuous or ordinal scale variables. Several measures of correlation exist. The appropriate type for a particular situation depends on the distribution and measurement scale of the data. Three measures of correlation are commonly applied in biostatistics and these will be discussed below.

Suppose that we have two variables of interest, denoted as X and Y, and suppose that we have a bivariate sample of size n:

(X1 , Y1 ), (X2 , Y2 ), ... , (Xn , Yn )

and we define the following statistics:

\[\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i , \quad S_{XX}=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2\]

\[\bar{Y}=\frac{1}{n}\sum_{i=1}^{n}Y_i , \quad S_{YY}=\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar{Y})^2\]

\[S_{XY}=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})\]


These statistics above represent the sample mean for X, the sample variance for X, the sample mean for Y, the sample variance for Y, and the sample covariance between X and Y, respectively. These should be very familiar to you.

The sample Pearson correlation coefficient (also called the sample product-moment correlation coefficient) for measuring the association between variables X and Y is given by the following formula:

\[r_p=\frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}}\]

The sample Pearson correlation coefficient, rp , is the point estimate of the population Pearson correlation coefficient, ρp .


The Pearson correlation coefficient measures the degree of linear relationship between X and Y and -1 ≤ rp ≤ +1, so that rp is a "unitless" quantity, i.e., when you construct the correlation coefficient the units of measurement that are used cancel out. A value of +1 reflects perfect positive correlation and a value of -1 reflects perfect negative correlation.

For the Pearson correlation coefficient, we assume that both X and Y are measured on a continuous scale and that each is approximately normally distributed.

The Pearson correlation coefficient is invariant to location and scale transformations. This means that if every Xi is transformed to

Xi * = aXi + b

and every Yi is transformed to

Yi * = cYi + d

where a > 0, b, c > 0, and d are constants, then the correlation between X and Y is the same as the correlation between X* and Y*.
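Both the computation of rp from the sample covariance and variances, and the invariance property, can be checked directly. The following pure-Python sketch is illustrative only (it is not one of the lesson's SAS programs, and the age/body-fat data are made up):

```python
import math

def pearson_r(x, y):
    """Pearson r: sample covariance over the product of sample standard
    deviations. The n-1 divisors cancel, so raw sums of products suffice."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: age (years) and body fat (%) for six subjects.
age = [23, 31, 38, 45, 52, 60]
fat = [9.5, 14.2, 20.1, 22.4, 27.3, 31.0]
r = pearson_r(age, fat)

# Invariance: X* = 2X + 5, Y* = 0.5Y - 3 (with a, c > 0) gives the same r.
r_star = pearson_r([2 * a + 5 for a in age], [0.5 * b - 3 for b in fat])
assert abs(r - r_star) < 1e-9
```

Note that a negative scale constant (a < 0 or c < 0) would flip the sign of r, which is why the invariance property requires a > 0 and c > 0.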

With SAS, PROC CORR is used to calculate rp . The output from PROC CORR includes summary statistics for both variables and the computed value of rp . The output also contains a p-value corresponding to the test of:

H0 : ρp = 0 versus H1 : ρp ≠ 0

It should be noted that this statistical test generally is not very useful, and the associated p-value, therefore, should not be emphasized. What is more important is to construct a confidence interval.

The sampling distribution of Pearson's rp is not normal. In order to obtain confidence limits for ρp based on a standard normal distribution, we transform rp using Fisher's Z transformation to get a quantity, zp , that has an approximate normal distribution. We can then work with this transformed value. Here is what is involved in the transformation.

Fisher's Z transformation is defined as

\[z_p=\frac{1}{2}log_e\left( \frac{1+r_p}{1-r_p} \right) \sim N\left( \zeta_p , sd=\frac{1}{\sqrt{n-3}} \right)\]


where

\[\zeta_p=\frac{1}{2}log_e\left( \frac{1+\rho_p}{1-\rho_p} \right)\]

We will use this to construct the usual confidence interval: an approximate 100(1 - α)% confidence interval for ζp is given by [zp, α/2 , zp, 1-α/2 ], where

\[z_{p , \alpha/2}=z_p-\left( t_{n-3 , 1-\alpha/2}/\sqrt{n-3} \right) , z_{p , 1-\alpha/2}=z_p+\left( t_{n-3 , 1-\alpha/2}/\sqrt{n-3} \right)\]

What we really want, though, is an approximate 100(1 - α)% confidence interval for ρp . It is given by [rp, α/2 , rp, 1-α/2 ], where

\[r_{p , \alpha/2}=\frac{exp(2z_{p , \alpha/2})-1}{exp(2z_{p , \alpha/2})+1},r_{p , 1-\alpha/2}=\frac{exp(2z_{p , 1-\alpha/2})-1}{exp(2z_{p , 1-\alpha/2})+1}\]

Again, you do not have to do this by hand. PROC CORR in SAS will do this for you but it is important to have an idea of what is going on.
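As a sketch of the arithmetic (illustrative Python, not SAS; it uses the standard normal critical value 1.96 rather than the t quantile shown above, which is nearly identical here), plugging in the Pearson estimate from the example in Section 18.4 reproduces its 95% confidence interval:

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """95% CI for rho: transform r to Fisher's z, attach normal-theory
    limits on the z scale, then back-transform each limit to the r scale."""
    z = 0.5 * math.log((1 + r) / (1 - r))          # Fisher's Z
    half = z_crit / math.sqrt(n - 3)               # half-width on the z scale
    back = lambda v: (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)
    return back(z - half), back(z + half)

lo, hi = fisher_ci(0.7921, 18)   # r_p and n from the body-fat example
# lo, hi are approximately 0.516 and 0.919
```

The back-transformation is just the inverse of Fisher's Z, so the resulting interval for ρp is asymmetric about rp even though the interval for ζp is symmetric about zp.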

18.2 - Spearman Correlation Coefficient

The Spearman rank correlation coefficient, rs , is a nonparametric measure of correlation based on data ranks. It is obtained by ranking the values of the two variables (X and Y) and calculating the Pearson rp on the resulting ranks, not the data itself. Again, PROC CORR will do all of these actual calculations for you.

The Spearman rank correlation coefficient has properties similar to those of the Pearson correlation coefficient, although the Spearman rank correlation coefficient quantifies the degree of linear association between the ranks of X and the ranks of Y. Also, rs does not estimate a natural population parameter (unlike Pearson's rp which estimates ρp ).

An advantage of the Spearman rank correlation coefficient is that the X and Y values can be continuous or ordinal, and approximate normal distributions for X and Y are not required. Similar to the Pearson rp , Fisher's Z transformation can be applied to the Spearman rs to get a statistic, zs , that has an asymptotic normal distribution for calculating an asymptotic confidence interval. Again, PROC CORR will do this as well.
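The rank-then-correlate recipe can be sketched as follows (illustrative Python with made-up data; ties receive midranks, the average of the positions they occupy, as PROC CORR does):

```python
import math

def midranks(v):
    """Rank the values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1                                  # extend the tie group
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1           # 1-based average position
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman_rs(x, y):
    """Spearman's r_s: the Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))

rs = spearman_rs([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
```

Because only the ranks enter the calculation, any strictly monotone transformation of X or Y (not just a positive linear one) leaves rs unchanged.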

18.3 - Kendall Tau-b Correlation Coefficient

The Kendall tau-b correlation coefficient, τb , is a nonparametric measure of association based on the number of concordances and discordances in paired observations.

Two observations (Xi , Yi ) and (Xj , Yj ) are said to be concordant if they are in the same order with respect to each variable. That is, if

(1) Xi < Xj and Yi < Yj , or if
(2) Xi > Xj and Yi > Yj

They are discordant if they are in reverse order with respect to the two variables, i.e., the values are arranged in opposite directions. That is, if

(1) Xi < Xj and Yi > Yj , or if
(2) Xi > Xj and Yi < Yj

The two observations are tied if Xi = Xj and/or Yi = Yj .

The total number of pairs that can be constructed for a sample size of n is

\[N=\binom{n}{2}=\frac{n(n-1)}{2}\]
N can be decomposed into these five quantities:

\[N = P + Q + X_0 + Y_0 + (XY)_0\]

where P is the number of concordant pairs, Q is the number of discordant pairs, X0 is the number of pairs tied only on the X variable, Y0 is the number of pairs tied only on the Y variable, and (XY)0 is the number of pairs tied on both X and Y.

The Kendall tau-b for measuring order association between variables X and Y is given by the following formula:

\[t_b=\frac{P-Q}{\sqrt{(P+Q+X_0)(P+Q+Y_0)}}\]
This value is scaled so that -1 ≤ tb ≤ +1. Unlike the Spearman rs , it does estimate a population parameter:

\[t_b \text{ is the sample estimate of } \tau_b = Pr[\text{concordance}] - Pr[\text{discordance}]\]

The Kendall tau-b has properties similar to those of the Spearman rs . Because the sample estimate, tb , does estimate a population parameter, τb , many statisticians prefer the Kendall tau-b to the Spearman rank correlation coefficient.
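The counting scheme above can be sketched directly (illustrative Python; it loops over all N = n(n-1)/2 pairs, and pairs tied on both variables, (XY)0 , are excluded from every term, as in the decomposition of N):

```python
import math

def kendall_tau_b(x, y):
    """t_b = (P - Q) / sqrt((P + Q + X0) * (P + Q + Y0))."""
    P = Q = X0 = Y0 = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue          # tied on both X and Y: (XY)0
            elif dx == 0:
                X0 += 1           # tied only on X
            elif dy == 0:
                Y0 += 1           # tied only on Y
            elif dx * dy > 0:
                P += 1            # concordant pair
            else:
                Q += 1            # discordant pair
    return (P - Q) / math.sqrt((P + Q + X0) * (P + Q + Y0))
```

With no ties the denominator reduces to N itself and tau-b reduces to the simple Kendall tau; the brute-force double loop is O(n^2), which is fine for small samples like those in this lesson.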

18.4 - Example - Correlation Coefficients

SAS Example (19.1_correlation.sas): Age and percentage body fat were measured in 18 adults. SAS PROC CORR provides estimates of the Pearson, Spearman, and Kendall correlation coefficients. It also calculates Fisher's Z transformation for the Pearson and Spearman correlation coefficients in order to get 95% confidence intervals.

SAS Program 19.1

The resulting estimates for this example are 0.7921, 0.7539, and 0.5762, respectively for the Pearson, Spearman, and Kendall correlation coefficients. The Kendall tau-b correlation typically is smaller in magnitude than the Pearson and Spearman correlation coefficients.

The 95% confidence intervals are (0.5161, 0.9191) and (0.4429, 0.9029), respectively for the Pearson and Spearman correlation coefficients. Because the Kendall correlation typically is applied to binary or ordinal data, its 95% confidence interval can be calculated via SAS PROC FREQ (this is not shown in the SAS program above).

18.5 - Use and Misuse of Correlation Coefficients

Correlation is a widely-used analysis tool which sometimes is applied inappropriately. Some caveats regarding the use of correlation methods follow.

1. The correlation methods discussed in this chapter should be used only with independent data; they should not be applied to repeated measures data where the data are not independent. For example, it would not be appropriate to use these measures of correlation to describe the relationship between Week 4 and Week 8 blood pressures in the same patients.

2. Caution should be used in interpreting results of correlation analysis when large numbers of variables have been examined, resulting in a large number of correlation coefficients.

3. The correlation of two variables that both have been recorded repeatedly over time can be misleading and spurious. Time trends should be removed from such data before attempting to measure correlation.

4. To extend correlation results to a given population, the subjects under study must form a representative (i.e., random) sample from that population. The Pearson correlation coefficient can be very sensitive to outlying observations and all correlation coefficients are susceptible to sample selection biases.

5. Care should be taken when attempting to correlate two variables where one is a part and one represents the total. For example, we would expect to find a positive correlation between height at age ten and adult height because the second quantity "contains" the first quantity.

6. Correlation should not be used to study the relation between an initial measurement, X, and the change in that measurement over time, Y - X. X will be correlated with Y - X due to the regression to the mean phenomenon.

7. Small correlation values do not necessarily indicate that two variables are unassociated. For example, Pearson's rp will underestimate the association between two variables that show a quadratic relationship. Scatterplots should always be examined.

8. Correlation does not imply causation. If a strong correlation is observed between two variables A and B, there are several possible explanations: (a) A influences B; (b) B influences A; (c) A and B are influenced by one or more additional variables; (d) the relationship observed between A and B was a chance error.

9. "Regular" correlation coefficients are often published when the researcher really intends to compare two methods of measuring the same quantity with respect to their agreement. This is a misguided analysis, because correlation measures only the degree of association; it does not measure agreement. The next section of this lesson will present a measure of agreement.

18.6 - Concordance Correlation Coefficient for Measuring Agreement

How well do two diagnostic measurements agree? Often the measurements in a diagnostic test are made on a continuous scale. In that case we may be interested not in the correlation or linear relationship between the two measures, but in a measure of agreement.

The concordance correlation coefficient, rc , for measuring agreement between continuous variables X and Y (both approximately normally distributed), is calculated as follows:

\[r_c=\frac{2S_{XY}}{S_{XX}+S_{YY}+(\bar{X}-\bar{Y})^2}\]

Similar to the other correlation coefficients, the concordance correlation satisfies -1 ≤ rc ≤ +1. A value of rc = +1 corresponds to perfect agreement. A value of rc = -1 corresponds to perfect negative agreement, and a value of rc = 0 corresponds to no agreement. The sample estimate, rc , is an estimate of the population concordance correlation coefficient:

\[\rho_c=\frac{2\sigma_{XY}}{\sigma_{XX}+\sigma_{YY}+(\mu_X-\mu_Y)^2}\]
Let's look at an example that will help to make this concept clearer.

SAS Example (19.2_agreement_concordanc.sas) : The ACRN DICE trial was discussed earlier in this course. In that trial, participants underwent hourly blood draws between 08:00 PM and 08:00 AM once a week in order to determine the cortisol area-under-the-curve (AUC). The participants hated this! They complained about the sleep disruption every hour when the nurses came by to draw blood, so the ACRN wanted to determine for future studies if the cortisol AUC calculated on measurements every two hours was in good agreement with the cortisol AUC calculated on hourly measurements. The baseline data were used to investigate how well these two measurements agreed. If there is good agreement, the protocol could be changed to take blood every two hours.

Note for this SAS program - Run the program to view the output. This is higher level SAS than you are expected to program yourself in this course, but some of you may find the programming of interest.

The SAS program yielded rc = 0.95 and a 95% confidence interval = (0.93, 0.96). The ACRN judged this to be excellent agreement, so it will use two-hourly measurements in future studies.
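A toy computation makes the correlation-versus-agreement distinction concrete (illustrative Python, not part of the SAS program; the formula is Lin's concordance correlation computed from the usual sample moments, and the data are made up): two perfectly correlated series that differ by a constant shift have rp = 1 but rc < 1.

```python
import math

def concordance_cc(x, y):
    """r_c = 2*S_XY / (S_XX + S_YY + (xbar - ybar)^2), with n-1 divisors.
    The squared mean difference in the denominator penalizes systematic bias."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

hourly  = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]   # same shape, constant bias of +1
rc_same  = concordance_cc(hourly, hourly)    # perfect agreement: 1.0
rc_shift = concordance_cc(hourly, shifted)   # correlated but biased: below 1
```

This is exactly the failure mode of "regular" correlation noted in Section 18.5: the Pearson correlation of `hourly` and `shifted` is 1, yet the two series do not agree.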

What about binary or ordinal data? Cohen's Kappa Statistic will handle this...

18.7 - Cohen's Kappa Statistic for Measuring Agreement

Cohen's kappa statistic, κ , is a measure of agreement between categorical variables X and Y. For example, kappa can be used to compare the ability of different raters to classify subjects into one of several groups. Kappa also can be used to assess the agreement between alternative methods of categorical assessment when new techniques are under study.

Kappa is calculated from the observed and expected frequencies on the diagonal of a square contingency table. Suppose that there are n subjects on whom X and Y are measured, and suppose that there are g distinct categorical outcomes for both X and Y. Let fij denote the frequency of the number of subjects with the ith categorical response for variable X and the jth categorical response for variable Y.

Then the frequencies can be arranged in the following g × g table:

Y = 1
Y = 2
Y = g
X = 1
X = 2


X = g

The observed proportional agreement between X and Y is defined as:

\[p_o=\frac{1}{n}\sum_{i=1}^{g}f_{ii}\]

and the expected agreement by chance is:

\[p_e=\frac{1}{n^2}\sum_{i=1}^{g}f_{i+}f_{+i}\]

where fi+ is the total for the ith row and f+i is the total for the ith column. The kappa statistic is:

\[\hat{\kappa}=\frac{p_o-p_e}{1-p_e}\]
Cohen's kappa statistic is an estimate of the population coefficient:

\[\kappa=\frac{Pr[X=Y]-Pr[X=Y|X \text{ and }Y \text{ independent}]}{1-Pr[X=Y|X \text{ and }Y \text{ independent}]}\]

Generally, 0 ≤ κ ≤ 1, although negative values do occur on occasion. Cohen's kappa is ideally suited for nominal (non-ordinal) categories. Weighted kappa can be calculated for tables with ordinal categories.
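A sketch of the computation (illustrative Python; the frequency table is made up, not the radiologist data from the example below):

```python
def cohens_kappa(table):
    """Unweighted kappa from a square g x g frequency table f[i][j]:
    agreement on the diagonal, corrected for chance via the marginals."""
    g = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(g)) / n             # observed agreement
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(g)) for j in range(g)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(g)) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical table: two raters, three categories; diagonal cells are agreements.
table = [[20,  5,  1],
         [ 4, 15,  3],
         [ 2,  3, 12]]
kappa = cohens_kappa(table)
```

A perfectly diagonal table gives kappa = 1, while a table whose observed agreement equals the chance agreement implied by its marginals gives kappa = 0; negative values arise when raters agree less often than chance.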

SAS Example (19.3_agreement_Cohen.sas) : Two radiologists rated 85 patients with respect to liver lesions. The ratings were designated on an ordinal scale as:

0 ='Normal' 1 ='Benign' 2 ='Suspected' 3 ='Cancer'

SAS PROC FREQ provides an option for constructing Cohen's kappa and weighted kappa statistics.

SAS Program 19.3

The weighted kappa coefficient is 0.57 and the asymptotic 95% confidence interval is (0.44, 0.70). This indicates that the amount of agreement between the two radiologists is modest (and not as strong as the researchers had hoped it would be).

Note: Updated programs for examples 19.2 and 19.3 are in the folder for this lesson.  Take a look.

18.8 - Summary

In this lesson, among other things, we learned how to estimate and interpret the Pearson, Spearman, and Kendall correlation coefficients, construct confidence intervals via Fisher's Z transformation, and measure agreement with the concordance correlation coefficient and Cohen's kappa statistic.

Let's put what we have learned to use by completing the following homework assignment:


Look for homework assignment and the dropbox in the folder for this week in ANGEL.