Constructing confidence intervals to estimate a population mean


Previously we considered confidence intervals for one proportion, where the multiplier in the interval was a z-value. But what if our variable of interest is quantitative (e.g. GPA, age, height) and we want to estimate the population mean? In that situation, confidence intervals for a proportion are not appropriate, since our interest is in a mean amount and not a proportion.

Therefore we apply similar techniques, but now we estimate the population mean, μ, using the sample statistic \(\bar{x}\), and the multiplier is a t-value. These t-values come from a t-distribution, which is similar to the standard normal distribution from which the z-values came: both distributions are symmetric and centered on 0. The difference is that when using a t-table we need to consider a new feature, degrees of freedom (df), which are based on the sample size, n.

Initially we will consider confidence intervals for means in two situations:

  1. Confidence intervals for one mean
  2. Confidence intervals for a difference between two means when data is paired.

As we will see, the interval calculations are identical; only some notation differs. The reason for the similarity is that when we have paired data we can simply treat the differences as one set of data. So what is paired data?
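As a rough illustration of that reduction, the sketch below uses made-up before-and-after measurements on the same five subjects (the data, the variable names, and the choice of 95% confidence are all assumptions for illustration, not part of the lesson). Once the pairs are reduced to differences, the calculation is an ordinary one-mean interval on those differences.

```python
import math
import statistics

# Hypothetical paired measurements: the same five subjects measured twice.
before = [72, 68, 75, 80, 66]
after = [70, 65, 75, 76, 64]

# Reduce the pairs to a single data set: the within-pair differences.
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
d_bar = statistics.mean(differences)   # sample mean of the differences
s_d = statistics.stdev(differences)    # sample standard deviation (n - 1 divisor)
standard_error = s_d / math.sqrt(n)

# t* for 95% confidence with df = n - 1 = 4, read from a standard t-table.
t_star = 2.776
margin = t_star * standard_error
paired_interval = (d_bar - margin, d_bar + margin)

print(f"mean difference = {d_bar:.2f}, margin of error = {margin:.2f}")
```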

Estimating a Population Mean μ

  • The sample statistic is the sample mean \(\bar{x}\)
  • The standard error of the mean is \(\frac{s}{\sqrt{n}}\) where s is the standard deviation of individual data values.
  • The multiplier, denoted by t*, is found using the t-table in the appendix of the book. The table has columns for .90, .95, .98, and .99 confidence; use the row for df = n − 1.

Thus the formula for a confidence interval for the mean is \(\bar{x} \pm t^* \frac{s}{\sqrt{n}}\)

For large n, say over 30, using t* = 2 gives an approximate 95% confidence interval.
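The formula can be sketched as a small helper. The function name `t_interval` and the summary statistics below are hypothetical, chosen only so the arithmetic is easy to follow; the multiplier is passed in because it comes from a t-table (or, for large n, from the t* = 2 approximation).

```python
import math

def t_interval(x_bar, s, n, t_star):
    """Return (lower, upper) for x_bar ± t_star * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)   # standard error of the mean
    margin = t_star * standard_error    # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical summary statistics with n over 30, so t* = 2 is used
# as the approximate 95% multiplier.
lower, upper = t_interval(x_bar=50.0, s=12.0, n=36, t_star=2.0)
print(lower, upper)
```

Here the standard error is 12/√36 = 2, so the margin of error is 2 × 2 = 4.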

Example 1: In a class survey, students are asked if they are sleep deprived or not and also are asked how much they sleep per night. Summary statistics for the n = 22 students who said they are sleep deprived are:

[Minitab output: summary statistics for the students who are sleep deprived: N = 22, mean = 5.77, standard deviation = 1.572, standard error of the mean = 0.335]

  • Thus n = 22, \(\bar{x}\) = 5.77, s = 1.572, and the standard error of the mean is \(\frac{1.572}{\sqrt{22}} = 0.335\)
  • A confidence interval for the mean amount of sleep per night is 5.77 ± t* (0.335) for the population that feels sleep deprived.
  • Go to the t-table in the appendix of the book and use the df = 22 – 1 = 21 row. For 95% confidence the value of t* = 2.08.
  • A 95% confidence interval for μ is 5.77 ± (2.08) (0.335), which is 5.77 ± 0.70, or 5.07 to 6.47
  • Interpretation: With 95% confidence we estimate the population mean to be between 5.07 and 6.47 hours per night.

Example 1 Continued:

  • For a 99% confidence interval we would look under .99 in the df = 21 row of the t-table. This gives t* = 2.83.
  • The 99% confidence interval is 5.77 ± (2.83) (0.335), which is 5.77 ± 0.95, or 4.82 to 6.72 hours per night.

Notice that the 99% confidence interval is wider than the 95% confidence interval. In the same situation, a higher confidence level gives a wider interval.
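The comparison can be checked numerically. This sketch recomputes both intervals from the Example 1 summary statistics, using the t* values quoted from the t-table; the variable names are illustrative only.

```python
import math

# Summary statistics from Example 1.
n, x_bar, s = 22, 5.77, 1.572
standard_error = s / math.sqrt(n)

# Multipliers from the df = 21 row of the t-table, as quoted in the example.
t_stars = {0.95: 2.08, 0.99: 2.83}

intervals = {}
for level, t_star in t_stars.items():
    margin = t_star * standard_error
    intervals[level] = (x_bar - margin, x_bar + margin)

# Width of each interval (upper limit minus lower limit).
widths = {level: upper - lower for level, (lower, upper) in intervals.items()}

for level, (lower, upper) in intervals.items():
    print(f"{level:.0%}: {lower:.2f} to {upper:.2f} (width {widths[level]:.2f})")
```

Only the multiplier changes between the two intervals, so the wider 99% interval comes entirely from the larger t*.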

Finding sample size for estimating a population mean

Calculating the sample size for estimating a population mean is similar to that for estimating a population proportion: we solve for n in our margin of error. However, the multiplier t* depends on the degrees of freedom, n − 1, and therefore on the very sample size we are trying to find, so the process can be iterative: we would solve for n, reset t*, solve again, and so on until the answer stops changing. We can avoid this iterative process by using an approximate method based on the t-distribution approaching the standard normal distribution as the sample size increases. This approximate method uses the following formula:


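\(n = \left(\frac{z^* S}{E}\right)^2\)

where z* is the standard normal multiplier for the desired confidence level, E is the desired margin of error, and S is a sample standard deviation possibly based on prior studies or knowledge.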
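A minimal sketch of this calculation, taking the approximate formula to be the usual normal-based one, n = (z* S / E)², where E is the desired margin of error. The function name, the choice of E = 0.5 hours, and the reuse of s = 1.572 from Example 1 as the prior standard deviation are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(s, margin_of_error, confidence=0.95):
    """Approximate n so that z* * s / sqrt(n) is at most margin_of_error."""
    # Two-sided standard normal multiplier, e.g. about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star * s / margin_of_error) ** 2
    # Round up: a fractional n would leave the margin slightly too large.
    return math.ceil(n)

# Assumed inputs: prior standard deviation from Example 1, and a target
# margin of error of half an hour at 95% confidence.
n_needed = sample_size_for_mean(s=1.572, margin_of_error=0.5)
print(n_needed)
```

Because the normal multiplier is slightly smaller than the corresponding t*, the result is best read as a lower-end estimate of the sample size needed.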