8.4 - Comparing Two Population Means: Paired Data

Unit Summary

  • Inferences About the Difference Between Two Population Means for Paired Data
  • The Paired t-Procedure
  • An Example for the Paired t-Test
  • Using Minitab to Perform a Paired t-Test

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, (see Course Schedule).

 

Inferences About the Difference Between Two Population Means for Paired Data

Paired samples: The sample selected from the first population is related to the corresponding sample from the second population.

It is important to distinguish independent samples and paired samples. Some examples are given as follows.

Compare the time that males and females spend watching TV.   

Think about the following two designs.

A. We randomly select 20 males and 20 females and compare the average time they spend watching TV. Is this an independent sample or paired sample?  

B. We randomly select 20 couples and compare the time the husbands and wives spend watching TV. Is this an independent sample or paired sample?

The paired t-test will be used when handling hypothesis testing for paired data.

The Paired t-Procedure

Assumptions:

  1. Paired samples
  2. The differences of the pairs follow a normal distribution or the number of pairs is large (note here that if the number of pairs is < 30, we need to check whether the differences are normal, but we do not need to check for the normality of each population)

Hypothesis:

\(H_0: \mu_d = 0\)
\(H_a: \mu_d \ne 0\)

OR

\(H_0: \mu_d = 0\)
\(H_a: \mu_d < 0\)

OR

\(H_0: \mu_d = 0\)
\(H_a: \mu_d > 0\)

t-statistic:

Let d = the differences between the pairs of data; then \(\bar{d}\) = the mean of these differences.

The test statistic is: \(t^{*}=\frac{\bar{d}-0}{{s_d }/\sqrt{n}}\)

degrees of freedom = n - 1
where n denotes the number of pairs or the number of differences.

Paired t-interval:

\[\bar{d}\pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}\]

Note: \(s_{\bar{d}}=\frac{s_d}{\sqrt{n}}\), where \(s_d\) is the standard deviation of the sample differences and \(s_{\bar{d}}\) is the standard error of the mean difference.
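The formulas above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name `paired_t`, its arguments, and the three-pair toy data are assumptions made for the example, and the critical value must still be looked up in a t-table (here 4.303, the value of \(t_{0.025}\) with 2 degrees of freedom).

```python
import math
import statistics


def paired_t(first, second, t_crit=None):
    """Summarize the paired t-procedure for d = first - second.

    t_crit is an optional t-table value t_{alpha/2} with n - 1 degrees
    of freedom; when given, the paired t-interval is also returned.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(first, second)]
    n = len(d)                      # number of pairs
    d_bar = statistics.mean(d)      # mean of the differences
    s_d = statistics.stdev(d)       # sample standard deviation (n - 1 divisor)
    se = s_d / math.sqrt(n)         # standard error of the mean difference
    result = {"n": n, "df": n - 1, "d_bar": d_bar, "s_d": s_d,
              "se": se, "t": (d_bar - 0) / se}
    if t_crit is not None:
        result["interval"] = (d_bar - t_crit * se, d_bar + t_crit * se)
    return result


# Hypothetical toy data (three pairs), for illustration only.
toy = paired_t([3, 5, 7], [1, 2, 3], t_crit=4.303)
print(toy)
```

For the toy data the differences are 2, 3, 4, so \(\bar{d} = 3\), \(s_d = 1\), and \(t^* = 3\sqrt{3}\) with 2 degrees of freedom.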

Example: Drinking Water

Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water (zinc_conc.txt).

Does the data suggest that the true average concentration in the bottom water exceeds that of surface water?

 
Location    Zinc concentration in bottom water    Zinc concentration in surface water
1           .430                                  .415
2           .266                                  .238
3           .567                                  .390
4           .531                                  .410
5           .707                                  .605
6           .716                                  .609
7           .651                                  .632
8           .589                                  .523
9           .469                                  .411
10          .723                                  .612

To perform a paired t-test for the previous trace metal example:

Assumptions:

1. Is this a paired sample? - Yes.

2. Is this a large sample? - No.

3. Since the sample size is not large enough (less than 30), we need to check whether the differences follow a normal distribution.

In Minitab, we can use Calc > Calculator to obtain diff = bottom - surface and then create a probability plot of the differences.

From the probability plot, we conclude that the differences may come from a normal distribution.
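Outside Minitab, the coordinates of a normal probability plot for the differences can be sketched with Python's standard library. The plotting positions \((i - 0.3)/(n + 0.4)\) used here are one common convention and are an assumption of this sketch, not necessarily what Minitab uses; the helper `pearson_r` is likewise introduced only for illustration.

```python
import math
from statistics import NormalDist

bottom = [.430, .266, .567, .531, .707, .716, .651, .589, .469, .723]
surface = [.415, .238, .390, .410, .605, .609, .632, .523, .411, .612]

# Ordered differences, diff = bottom - surface.
diff = sorted(b - s for b, s in zip(bottom, surface))
n = len(diff)

# Assumed plotting positions (median-rank approximation) and the
# matching standard normal quantiles.
probs = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
z = [NormalDist().inv_cdf(p) for p in probs]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(z, diff)

# Each printed row is one point of the probability plot.
for zi, di in zip(z, diff):
    print(f"{zi:6.2f}  {di:.3f}")
print(f"correlation between quantiles and ordered differences: {r:.3f}")
```

Points that fall roughly on a straight line (equivalently, a correlation close to 1) are consistent with normally distributed differences. With only ten pairs this is an informal check, not a proof of normality.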

Step 1. Set up the hypotheses:

\(H_0: \mu_d = 0\)
\(H_a: \mu_d > 0\)

where 'd' is defined as the difference of bottom - surface.

Step 2. Write down the significance level \(\alpha = 0.05\).

Step 3. What is the critical value and the rejection region?

\(\alpha = 0.05\), df = 9
\(t_{0.05} = 1.833\)
rejection region: \( t > 1.833\)

Step 4. Compute the value of the test statistic:

\[t^{*}=\frac{\bar{d}}{\frac{s_d }{\sqrt{n}}}=\frac{0.0804}{\frac{0.0523}{\sqrt{10}}}=4.86\]

Step 5. Check whether the test statistic falls in the rejection region and determine whether to reject \(H_0\).

\(t^* = 4.86 > 1.833\)
reject \(H_0\)

Step 6. State the conclusion in words.

At \(\alpha = 0.05\), we conclude that, on average, the bottom zinc concentration is higher than the surface zinc concentration.
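The arithmetic in Steps 4 and 5 can be checked with a short Python sketch using only the standard library. The variable names are chosen for this example; the critical value 1.833 is the one given in Step 3 rather than computed.

```python
import math
import statistics

bottom = [.430, .266, .567, .531, .707, .716, .651, .589, .469, .723]
surface = [.415, .238, .390, .410, .605, .609, .632, .523, .411, .612]

# d is defined as bottom - surface, as in Step 1.
diff = [b - s for b, s in zip(bottom, surface)]
n = len(diff)

d_bar = statistics.mean(diff)            # mean difference
s_d = statistics.stdev(diff)             # standard deviation of the differences
t_star = d_bar / (s_d / math.sqrt(n))    # paired t statistic, df = n - 1

T_CRIT = 1.833                           # t_{0.05} with 9 df, taken from Step 3
reject_h0 = t_star > T_CRIT              # upper-tailed rejection region

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t* = {t_star:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The sketch reproduces \(\bar{d} = 0.0804\), \(s_d = 0.0523\), and \(t^* = 4.86\), so the statistic falls in the rejection region.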

Using Minitab to Perform a Paired t-Test

You can use a paired t-test in Minitab to perform the test. Alternatively, you can perform a 1-sample t-test on difference = bottom - surface.

1. Stat > Basic Statistics > Paired t

2. Click 'Options' to specify the confidence level for the interval and the alternative hypothesis you want to test.  The default null hypothesis is 0.

The Minitab output for paired T for bottom - surface is as follows:

Paired T for bottom - surface

              N     Mean      StDev     SE Mean
bottom        10    0.5649    0.1468    0.0464
surface       10    0.4845    0.1312    0.0415
Difference    10    0.0804    0.0523    0.0165

95% lower bound for mean difference: 0.0505
T-Test of mean difference = 0 (vs > 0): T-Value = 4.86 P-Value = 0.000

Note: In Minitab, if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower confidence bound will be constructed, respectively, rather than a confidence interval.

Using the p-value to draw a conclusion about our example:

p-value = 0.000 < 0.05

Reject \(H_0\) and conclude that bottom zinc concentration is higher than surface zinc concentration.

Note: For the zinc concentration problem, if you do not recognize the paired structure, but mistakenly use the 2-sample t-test treating them as independent samples, you will not be able to reject the null hypothesis. This demonstrates the importance of distinguishing the two types of samples. Also, it is wise to design an experiment efficiently whenever possible.
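To see why the 2-sample analysis fails here, the pooled 2-sample t statistic can be computed for the same data. This is a sketch under stated assumptions: it uses the pooled-variance form with 18 degrees of freedom, and the critical value 1.734 (the upper 5% point of the t-distribution with 18 df) comes from a standard t-table, not from this lesson.

```python
import math
import statistics

bottom = [.430, .266, .567, .531, .707, .716, .651, .589, .469, .723]
surface = [.415, .238, .390, .410, .605, .609, .632, .523, .411, .612]
n = len(bottom)  # both groups have 10 observations

# Pooled variance; with equal group sizes it is the average of the
# two sample variances.
pooled_var = (statistics.variance(bottom) + statistics.variance(surface)) / 2
se_indep = math.sqrt(pooled_var * (2 / n))

# 2-sample t statistic that (incorrectly) treats the groups as independent.
t_indep = (statistics.mean(bottom) - statistics.mean(surface)) / se_indep

T_CRIT_DF18 = 1.734  # assumed table value: t_{0.05} with 18 df

print(f"2-sample t = {t_indep:.2f}, critical value = {T_CRIT_DF18}")
print("reject H0" if t_indep > T_CRIT_DF18 else "fail to reject H0")
```

The mean difference is the same 0.0804, but the standard error is now about 0.062 instead of 0.0165, because the large location-to-location variation is no longer removed by differencing. The resulting statistic, about 1.29, does not reach the critical value.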

What if the assumption of normality is not satisfied? In this case we would use a nonparametric 1-sample test on the differences.
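The lesson does not say which nonparametric test to use. As one possible illustration, the sketch below applies the sign test to the zinc differences (the Wilcoxon signed-rank test would be another choice). The sign test concerns the median difference rather than the mean: under the null hypothesis that the median difference is 0, the number of positive differences follows a Binomial(n, 0.5) distribution.

```python
import math

bottom = [.430, .266, .567, .531, .707, .716, .651, .589, .469, .723]
surface = [.415, .238, .390, .410, .605, .609, .632, .523, .411, .612]

diff = [b - s for b, s in zip(bottom, surface)]

# Differences equal to zero carry no sign information and are dropped.
nonzero = [d for d in diff if d != 0]
n = len(nonzero)
n_pos = sum(1 for d in nonzero if d > 0)

# Upper-tailed p-value: P(X >= n_pos) for X ~ Binomial(n, 0.5).
p_value = sum(math.comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n

print(f"{n_pos} of {n} differences are positive, p-value = {p_value:.6f}")
```

All ten differences are positive, so the upper-tailed p-value is \((1/2)^{10} \approx 0.001\), which points to the same conclusion as the paired t-test for these data.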