Lesson 9 - Statistical Test Using Rejection Region Approach, Statistical Test for Population Mean, How to Use Confidence Interval to Draw Conclusion About Two-Sided Test, Power and Sample Sizes

This lesson introduces the rejection region approach to hypothesis testing and compares it to the p-value approach. It then introduces the one-sample t-test for a population mean and concludes with a discussion of how to compute power and choose the sample size for the one-sample t-test.

Lesson 9 Objectives

Upon successful completion of this lesson, you will be able to:

  • perform a statistical test for a population mean.
  • use a confidence interval to draw a conclusion about a two-sided test.
  • calculate power and choose the sample size for testing a population mean.

Lesson 9.1 - Statistical Test Using Rejection Region Approach

Unit Summary

  • Rejection Region Approach to Hypothesis Testing for One Proportion Problem
  • Comparing the P-Value Approach to the Rejection Region Approach

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, chapters 10.2, 5.6, and 5.7. Chapters 5.6 and 5.7 mainly support Lessons 9.2 and 9.3.

Let's start out here by having Dr. Wiesner walk through a comparison of the p-value approach with the rejection region approach to hypothesis testing.

Rejection Region Approach to Hypothesis Testing for One Proportion Problem

One can perform hypothesis testing using the p-value approach, or one can perform hypothesis testing using a rejection region approach. The conclusions from the two approaches are exactly the same.

There are six parts of a test when using the rejection region approach:

  1. Null and alternative hypotheses
  2. Level of significance α
  3. Test statistic
  4. Critical values and rejection region
  5. Process of checking to see whether the test statistic falls in the rejection region
  6. Conclusion in words

Test statistic: The sample statistic used to decide whether to reject Ho (and conclude Ha) or not to reject Ho.

Critical values: The values of the test statistic that separate the rejection and non-rejection regions.

Rejection region: the set of values for the test statistic that leads to rejection of Ho.

Non-rejection region: the set of values not in the rejection region, which lead to non-rejection of Ho.

As mentioned in Lesson 8, the logic of hypothesis testing is to reject the null hypothesis if the sample data are not consistent with it. Thus, one rejects the null hypothesis if the observed test statistic is more extreme, in the direction of the alternative hypothesis, than one can tolerate. The critical values are the boundary values determined by the preset significance level α: when Ho is true, the probability that the test statistic falls in the rejection region is α.

One-proportion Z-test for π

Step 0. Check the conditions for the one-proportion z-test to be valid:

  1. nπo ≥ 5
  2. n(1 - πo) ≥ 5

Step 1. Set up the hypotheses as one of:

Two-tailed:    Ho : π = πo   versus   Ha : π ≠ πo
Right-tailed:  Ho : π = πo   versus   Ha : π > πo
Left-tailed:   Ho : π = πo   versus   Ha : π < πo

Step 2. Decide on the significance level, α.

Step 3. Compute the value of the test statistic:

\[z=\frac{\hat{\pi}-\pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}}\]

Step 4. Find the appropriate critical values for the test using the z-table. Write down clearly the rejection region for the problem.

Step 5. Check to see if the value of the test statistic falls in the rejection region. If it does, then reject Ho (and conclude Ha). If it does not fall in the rejection region, do not reject Ho.

Step 6. State the conclusion in words.
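The steps above can be sketched in a few lines of Python. This is a minimal illustration, not course-supplied code: the function name `one_proportion_z_test` and its `tail` argument are our own choices, and the standard library's `statistics.NormalDist` stands in for the z-table when finding critical values.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, pi0, alpha, tail="two"):
    """Rejection-region version of the one-proportion z-test (hypothetical helper).

    x: number of successes, n: sample size, pi0: hypothesized proportion,
    tail: "two", "right", or "left". Returns (z, critical values, reject Ho?).
    """
    # Step 0: validity conditions for the normal approximation
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("need n*pi0 >= 5 and n*(1 - pi0) >= 5")
    # Step 3: test statistic
    z = (x / n - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    # Steps 4-5: critical values from the standard normal, then the decision
    std_normal = NormalDist()
    if tail == "two":
        c = std_normal.inv_cdf(1 - alpha / 2)
        return z, (-c, c), (z < -c or z > c)
    if tail == "right":
        c = std_normal.inv_cdf(1 - alpha)
        return z, (c,), z > c
    if tail == "left":
        c = std_normal.inv_cdf(alpha)  # negative critical value
        return z, (c,), z < c
    raise ValueError('tail must be "two", "right", or "left"')


# Left-handedness example: 10 of 30 people, Ho: pi = 0.25, alpha = 0.05
z, crit, reject = one_proportion_z_test(10, 30, 0.25, 0.05)
print(f"z = {z:.3f}, critical values = ±{crit[1]:.2f}, reject Ho: {reject}")
```

For the left-handedness example that follows, this prints `z = 1.054, critical values = ±1.96, reject Ho: False`, matching the hand calculation.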

An expert claims that the probability of a person being left-handed is 0.25. In a random sample of 30 people, 10 are left-handed. Using α = 0.05, is there sufficient evidence to conclude that the population proportion of left-handed people is different from 0.25?

a. Use the rejection region approach to perform the testing.

Step 0. Can we use the one-proportion z-test?

The answer is yes since the hypothesized value πo is 0.25 and we can check that:

nπo = 30 · 0.25 = 7.5 ≥ 5,
n(1 - πo) = 30 · (1 - 0.25) = 22.5 ≥ 5.

Step 1. Set up the hypotheses (since the research hypothesis is to check whether the proportion is different from 0.25, we set it up as a two-tailed test):

Ho: π = 0.25
Ha: π ≠ 0.25

Step 2. Decide on the significance level, α.

According to the question, α = 0.05.

Step 3. Compute the value of the test statistic:

\[z=\frac{\hat{\pi}-\pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}}=\frac{10/30-0.25}{\sqrt{\frac{0.25(1-0.25)}{30}}}=1.054\]

Step 4. Find the appropriate critical values for the test using the z-table. Write down clearly the rejection region for the problem. We can use Table 2 to find the value of \(z_{0.025}\), since the row for df = ∞ (infinity) gives z-values.

From Table 2, \(z_{0.025}\) is found to be 1.96, and thus the critical values are ±1.96. The rejection region for the two-tailed test is given by:

z > 1.96 or z < -1.96

Step 5. Check whether the value of the test statistic falls in the rejection region. If it does, then reject Ho (and conclude Ha). If it does not fall in the rejection region, do not reject Ho.

The observed z-value is 1.05 and will be denoted as z*. Since z* does not fall within the rejection region, we do not reject Ho.

Step 6. State the conclusion in words.

Based on the observed data, there is not enough evidence to conclude that the population proportion of left-handed people is different from 0.25.

b. Use the p-value approach to perform the testing.

Step 0 - Step 3. The first few steps (Step 0 - Step 3) are exactly the same as the rejection region approach.

Step 4. Compute the p-value. Since this is a two-tailed test:

\(\text{p-value}=2\times P\left(z>\left|\frac{\hat{\pi}-\pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}}\right|\right)\)
\(=2\times P\left(z>\left|\frac{10/30-0.25}{\sqrt{\frac{0.25(1-0.25)}{30}}}\right|\right)\)
\(=2\times P(z>1.054)=0.2918\)

Step 5. Since p-value = 0.2918 > 0.05 (the α value), we do not reject the null hypothesis.

Step 6. Conclusion in words:

Based on the observed data, there is insufficient evidence to conclude that the population proportion of left-handed people is different from 0.25.
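The p-value in Step 4 can be checked with a short Python sketch. It is an illustration under our own assumptions: `one_proportion_p_value` is a hypothetical helper, and `statistics.NormalDist` replaces the z-table lookup.

```python
import math
from statistics import NormalDist


def one_proportion_p_value(x, n, pi0, tail="two"):
    """p-value of the one-proportion z-test (hypothetical helper)."""
    z = (x / n - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    cdf = NormalDist().cdf
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))  # 2 * P(Z > |z|)
    if tail == "right":
        return 1 - cdf(z)             # P(Z > z)
    if tail == "left":
        return cdf(z)                 # P(Z < z)
    raise ValueError('tail must be "two", "right", or "left"')


alpha = 0.05
p_value = one_proportion_p_value(10, 30, 0.25)
print(f"p-value = {p_value:.4f}; reject Ho: {p_value <= alpha}")
```

With the unrounded statistic z = 1.0541, the p-value is about 0.292; rounding z to 1.05 before using the table gives about 0.294. Either way the p-value is far above 0.05, so the conclusion is unchanged.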

Comparing the P-Value Approach to the Rejection Region Approach

Both approaches always lead to the same conclusion, so either one can be used. However, the p-value approach has the following advantages:

  1. With the rejection region approach, you need to look up a new critical value in the table every time the α value changes, whereas the p-value is computed once and can be compared with any α.
  2. Beyond deciding whether to reject Ho by comparing the p-value to α, the p-value also indicates the strength of the evidence against Ho: the smaller the p-value, the stronger the evidence.
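To make the first advantage concrete, the sketch below reruns the left-handedness example for several α levels. It is our own illustration, and the list of α values is arbitrary. A new critical value is needed for every α, while the p-value is computed once, and the two approaches agree at every level.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
z = (10 / 30 - 0.25) / math.sqrt(0.25 * (1 - 0.25) / 30)
p_value = 2 * (1 - std_normal.cdf(abs(z)))  # computed once, reused for every alpha

decisions = {}
for alpha in (0.01, 0.05, 0.10, 0.30):
    crit = std_normal.inv_cdf(1 - alpha / 2)  # a new critical value for each alpha
    reject_rr = z > crit or z < -crit         # rejection region approach
    reject_p = p_value <= alpha               # p-value approach
    decisions[alpha] = (reject_rr, reject_p)
    print(f"alpha = {alpha:.2f}: critical values ±{crit:.3f}, "
          f"rejection region rejects: {reject_rr}, p-value rejects: {reject_p}")
```

With α = 0.30, which is chosen only for illustration, both approaches reject Ho because the p-value of about 0.29 falls below α.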

Lesson 9.2 - Statistical Test for Population Mean

Unit Summary

  • Null and Alternative Hypothesis for Testing a Population Mean
  • Performing a t-Test by Rejection Region Approach
  • Performing a t-Test by the P-Value Approach
  • Using a Confidence Interval to Draw a Conclusion About a Two-Tailed Test

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, chapters 5.6 and 5.7.

Null and Alternative Hypothesis for Testing a Population Mean

When the parameter to be tested is the population mean μ and the population standard deviation σ is unknown, the test statistic follows a t-distribution with n - 1 degrees of freedom when Ho is true (under the conditions in Step 0).

One-sample t-Test for the Population Mean μ

Step 0. Conditions for the one-sample t-test to be valid for testing one population mean:

The data follow a normal distribution, or the sample size is large.

Step 1. Set up the hypotheses as one of:

Two-tailed:    Ho : μ = μ0   versus   Ha : μ ≠ μ0
Right-tailed:  Ho : μ = μ0   versus   Ha : μ > μ0
Left-tailed:   Ho : μ = μ0   versus   Ha : μ < μ0

Step 2. Decide on the significance level, α.

Step 3. Compute the value of the one-sample t-test statistic:

\[t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}\]

Step 4. Find the appropriate critical values for the test using the t-table with n - 1 degrees of freedom. Write down clearly the rejection region for the problem. Alternatively, compute the p-value if you are using the p-value approach.

Step 5. Check to see if the value of the test statistic falls in the rejection region. If it does, then reject Ho (and conclude Ha). If it does not fall in the rejection region, do not reject Ho. Alternatively, compare the p-value to α if you are using the p-value approach. If the p-value ≤ α, reject Ho (and conclude Ha). If the p-value > α, do not reject Ho.

Step 6. State the conclusion in words.

Performing a t-Test by Rejection Region Approach

The mean length of a certain type of construction lumber is supposed to be 8.5 feet. A random sample of 81 pieces of this lumber gives a sample mean of 8.3 feet and a sample standard deviation of 1.2 feet. A builder claims that the mean length of the lumber is different from 8.5 feet. Do the data support the builder's claim at α = 0.05? Use the rejection region approach.

Step 0. Can we use the one-sample t-test?

Yes. The sample size (81) is large, so we do not need to check whether the data follow a normal distribution.

Note: As our textbook suggests, it is also acceptable to use a one-sample z-test for the population mean here, since the t-distribution is close to the standard normal distribution when the sample size is large.

Step 1. Set up the hypotheses:

Ho : μ = 8.5
Ha : μ ≠ 8.5

Step 2. Decide on the significance level, α.

α = 0.05

Step 3. Compute the value of the test statistic:

\[t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}=\frac{8.3-8.5}{1.2/\sqrt{81}}=-1.5\]

The observed value of the t statistic is -1.5. We can denote it as t*.

Step 4. Find the appropriate critical values for the test using the t-table. Write down clearly the rejection region for the problem.

Since n = 81, the degrees of freedom are n - 1 = 80, and the critical values are ±\(t_{0.025}\). From Minitab, \(t_{0.025}\) with 80 degrees of freedom is 1.99.

The critical values are ±1.99 and the rejection region for the two-tailed test is given by:

t > 1.99 or t < -1.99.

Step 5. Check whether the value of the test statistic falls in the rejection region.

Since -1.5 does not fall in the rejection region, we cannot reject Ho at α = 0.05.

Step 6. State the conclusion in words:

At α = 0.05, the data do not provide sufficient evidence to conclude that the mean length of the construction lumber is different from 8.5 feet.
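The critical value read from Minitab can also be reproduced without a t-table. The sketch below is a minimal, standard-library-only illustration under our own assumptions. The helper `t_cdf` approximates the t cumulative distribution function by integrating the t density with Simpson's rule, and `t_quantile` inverts it by bisection. Both names are hypothetical. In real work, a statistics package is the practical choice.

```python
import math


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t with df degrees of freedom (hypothetical helper).

    Integrates the t density from 0 to |x| with Simpson's rule (steps is even)
    and uses the symmetry of the distribution about 0.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """t with P(T <= t) = p, by bisection (hypothetical; 0.5 <= p < 1, answer below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Lumber example: n = 81, sample mean 8.3, s = 1.2, Ho: mu = 8.5, alpha = 0.05
n, xbar, s, mu0, alpha = 81, 8.3, 1.2, 8.5, 0.05
t_star = (xbar - mu0) / (s / math.sqrt(n))  # observed test statistic
crit = t_quantile(1 - alpha / 2, n - 1)     # t_0.025 with 80 df
reject = t_star > crit or t_star < -crit     # two-tailed rejection region
print(f"t* = {t_star:.2f}, critical values = ±{crit:.2f}, reject Ho: {reject}")
```

This prints `t* = -1.50, critical values = ±1.99, reject Ho: False`, which matches Steps 3 to 5 above.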

Performing a t-Test by the P-Value Approach

Since the t-table is not as detailed as the z-table, we can only estimate the p-value of a t-test using the t-table. In order to obtain the exact p-value of a t-test, one has to use a statistical package such as Minitab.

Minitab commands to obtain P(t > t*):

  1. Calc > Probability Distributions > t-distribution
  2. Choose the cumulative distribution to find P(t ≤ t*) and obtain P(t > t*) as 1 - P(t ≤ t*).

Use the p-value approach to draw a conclusion from the previous example.

Since it is a two-tailed test, the p-value is:

p-value = 2 · P(t > |-1.5|) = 2 · P(t > 1.5)

[Figure: Minitab t distribution dialog box]

The output of Minitab will give the value:

Cumulative Distribution Function

Student's t distribution with 80 DF

  x   P( X ≤ x )
1.5     0.931225

p-value = 2 · P(t > 1.5) = 2 · (1 - P(t ≤ 1.5))

= 2 · (1 - 0.931225) = 0.13755

Since the computed p-value is larger than 0.05, we cannot reject the null hypothesis at level 0.05.
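The Minitab cumulative probability can be cross-checked numerically. This standard-library sketch is our own illustration. Its hypothetical `t_cdf` helper integrates the t density with Simpson's rule, standing in for Minitab's Calc > Probability Distributions dialog.

```python
import math


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t with df degrees of freedom (hypothetical helper).

    Integrates the t density from 0 to |x| with Simpson's rule (steps is even)
    and uses the symmetry of the distribution about 0.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


df = 80
t_star = -1.5
cdf_value = t_cdf(abs(t_star), df)  # P(T <= 1.5), as in the Minitab output
p_value = 2 * (1 - cdf_value)       # two-tailed p-value
print(f"P(T <= 1.5) = {cdf_value:.6f}, p-value = {p_value:.5f}")
```

The two printed values should agree with the Minitab output and the p-value above to several decimal places.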

Using a Confidence Interval to Draw a Conclusion About a Two-tailed Test

For the two-tailed test:

Ho : μ = μ0
Ha : μ ≠ μ0

The null hypothesis will be rejected at level α if and only if the value μ0 does not fall within the 100(1 - α)% confidence interval for μ.

Let's use the previous example to show how to use a confidence interval to draw a conclusion about a two-tailed test. A 95% confidence interval for the mean lumber length is:

(8.03, 8.57)

For the two-tailed test:

Ho : μ = 8.5
Ha : μ ≠ 8.5

Since 8.5 falls within the 95% confidence interval, we cannot reject the null hypothesis at level 0.05.
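The equivalence can be checked on the lumber data with a short sketch. This is an illustration under our own assumptions: the hypothetical helpers `t_cdf` and `t_quantile` replace the t-table, and the confidence interval is the usual x̄ ± \(t_{0.025}\)·s/√n.

```python
import math


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t, via Simpson's rule on the density (hypothetical helper)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """t with P(T <= t) = p, by bisection (hypothetical; 0.5 <= p < 1, answer below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n, xbar, s, mu0, alpha = 81, 8.3, 1.2, 8.5, 0.05
se = s / math.sqrt(n)
crit = t_quantile(1 - alpha / 2, n - 1)
ci = (xbar - crit * se, xbar + crit * se)     # 95% confidence interval for mu
reject_by_ci = not (ci[0] <= mu0 <= ci[1])    # reject iff mu0 is outside the CI
reject_by_test = abs((xbar - mu0) / se) > crit  # two-tailed t-test decision
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}); "
      f"reject by CI: {reject_by_ci}; reject by t-test: {reject_by_test}")
```

This prints the interval (8.03, 8.57) and `False` for both decisions. The confidence interval and the two-tailed t-test at α = 0.05 agree, as they must.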

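It is possible to use a one-sided confidence bound to draw a conclusion about a one-sided test, but you have to be very careful to obtain the correct bound: use \(t_\alpha\) rather than \(t_{\alpha/2}\), a lower confidence bound for a right-tailed test (Ha: μ > μ0), and an upper confidence bound for a left-tailed test (Ha: μ < μ0).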

Lesson 9.3 - Power and Sample Sizes Determination for Testing a Population Mean

Unit Summary

  • Why Do We Need to Compute the Power of a Test?
  • Power and Type II Error of a Test
  • Choosing the Sample Size for Testing Population Mean
  • Using Minitab to Perform a One-Sample t-Test and to Compute Power
  • Statistical and Practical Significances

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, chapter 5.5, and Minitab Help on sample size computation.

Why Do We Need to Compute the Power of a Test?

When the data indicate that one cannot reject the null hypothesis, does it mean that one can accept the null hypothesis? For example, when the p-value computed from the data is 0.12, one fails to reject the null hypothesis at α = 0.05. Can we say that the data support the null hypothesis?

Answer: When you perform hypothesis testing, you only set the probability of a Type I error (α) and guard against it. Thus, we can only report the strength of the evidence against the null hypothesis. One can sidestep the concern about Type II error as long as the conclusion never states that the null hypothesis is accepted. When the null hypothesis cannot be rejected, there are two possible cases: 1) the null hypothesis can be accepted, or 2) the sample size is not large enough to either accept or reject the null hypothesis. To make the distinction, one has to check β, the probability of a Type II error. If β at a likely value of the parameter is small, then one can accept the null hypothesis. If β is large, then one cannot accept the null hypothesis.

The relationship between α and β:

If the sample size is fixed, then decreasing α will increase β. If one wants both to decrease, then one has to increase the sample size.
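The trade-off can be seen numerically. The sketch below uses a normal approximation, treating σ as known. The values σ = 13.08 and a difference of 5 are illustrative and are borrowed from the weight-change example in this lesson. The helper `beta_two_tailed` is hypothetical, and its numbers are somewhat optimistic compared with exact t-based power.

```python
import math
from statistics import NormalDist


def beta_two_tailed(n, sigma, diff, alpha):
    """Normal-approximation Type II error probability of a two-tailed test of
    Ho: mu = mu0 when the true mean is mu0 + diff (sigma treated as known)."""
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(diff) * math.sqrt(n) / sigma
    power = std_normal.cdf(shift - crit) + std_normal.cdf(-shift - crit)
    return 1 - power


sigma, diff = 13.08, 5  # illustrative values from the weight-change example
betas = {}
for n in (14, 56):
    for alpha in (0.10, 0.05, 0.01):
        betas[(n, alpha)] = beta_two_tailed(n, sigma, diff, alpha)
        print(f"n = {n:2d}, alpha = {alpha:.2f}: beta is about {betas[(n, alpha)]:.3f}")
```

Within each sample size, β grows as α shrinks. Moving from n = 14 to n = 56 lowers β at every α.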

Power and Type II Error of a Test

Power = the probability of correctly rejecting a false null hypothesis = 1 - β .

Choosing the Sample Size for Testing Population Mean

Refer to page 218 (5th edition) or page 243 (6th edition) of our textbook to see graphs of the probability of a Type II error.

Acceptable values of power are usually larger than 0.7; common choices are 0.8 or 0.85. As with the choice of α, the acceptable value of power depends on the problem.

The following are interrelated: Power (which is 1 - β), sample size, α , and the distance between the actual mean and the mean specified in the null hypothesis.

Using Minitab to Perform a One-Sample T-Test and to Compute Power

To calculate the smallest sample size needed for specified α, β, and μa (μa is the likely value of μ at which you want to evaluate the power; it is chosen subjectively to reflect the likely value of μ based on the user's prior knowledge):

One-Tailed test:

\[n=\sigma^2 \frac{(t_\alpha + t_\beta)^2}{(\mu_0-\mu_a)^2}\]

Two-Tailed test:

\[n=\sigma^2 \frac{(t_{\alpha/2 }+ t_\beta)^2}{(\mu_0-\mu_a)^2}\]

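Note: The above two formulas are included for your reference only. When you need to compute the sample size, you can simply use Minitab. The formulas given in our textbook approximate the ones above by replacing t with z. Because \(t_\alpha\) and \(t_\beta\) depend on the degrees of freedom n - 1, the t-based formulas have to be solved iteratively.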
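The two-tailed formula can be sketched in Python, both in the textbook's z form and in the t form. The t form is solved by iteration because the t quantiles depend on n. This is our own illustration, not Minitab's algorithm. The names `n_z`, `n_t`, `t_cdf`, and `t_quantile` are hypothetical, σ is treated as known, and \(t_\beta\) is taken as the upper-β point, so that P(T ≤ \(t_\beta\)) = power.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t, via Simpson's rule on the density (hypothetical helper)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """t with P(T <= t) = p, by bisection (hypothetical; 0.5 <= p < 1, answer below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n_z(sigma, diff, alpha, power, two_sided=True):
    """Textbook version: the sample-size formula with t replaced by z."""
    std_normal = NormalDist()
    a = alpha / 2 if two_sided else alpha
    z_a = std_normal.inv_cdf(1 - a)
    z_b = std_normal.inv_cdf(power)  # z_beta, since beta = 1 - power
    return math.ceil(sigma ** 2 * (z_a + z_b) ** 2 / diff ** 2)


def n_t(sigma, diff, alpha, power, two_sided=True, max_iter=50):
    """t version: t_alpha and t_beta depend on df = n - 1, so iterate to a fixed point."""
    a = alpha / 2 if two_sided else alpha
    n = n_z(sigma, diff, alpha, power, two_sided)  # starting guess
    for _ in range(max_iter):
        df = max(n - 1, 1)
        t_a = t_quantile(1 - a, df)
        t_b = t_quantile(power, df)
        new_n = math.ceil(sigma ** 2 * (t_a + t_b) ** 2 / diff ** 2)
        if new_n == n:
            break
        n = new_n
    return n


sigma, diff, alpha, power = 13.08, 5, 0.05, 0.8  # weight-change example, part e
print("z-based n:", n_z(sigma, diff, alpha, power))
print("t-based n:", n_t(sigma, diff, alpha, power))
```

For these inputs the z version gives 54, and the iterated t version gives 56, the same answer Minitab reports in part e. The t formula is still an approximation. Minitab searches for the smallest n whose exact power reaches the target, so in other problems the two answers may differ by one.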

Using Minitab to compute the sample size or the power:

Stat > Power and Sample Size > 1-Sample t

Note: The minimum difference referred to in Minitab is the difference between μ0 and μa.

Note: One-sample t-tests are used to perform hypothesis tests of the mean. To calculate power or sample size for these tests, you need to determine the minimum difference (effect) that you consider to be meaningful. Then you can determine the power you have, or the sample size you need, to be able to reject the null hypothesis when the true value differs from the hypothesized value by this minimum difference.

Power and sample size for 1-Sample t Minitab Help

The weight changes (in pounds) of 14 female subjects after a six-week exercise program are recorded:

17, 7, -4, -18, 2, 9, 12, 9, -12, -9, -18, -14, -18, -20

Is there sufficient evidence that the average weight change is different from 0? (set α = 0.05)

a. State the null and alternative hypotheses:

Ho : μ = 0
Ha : μ ≠ 0

b. Use Minitab to check whether the one-sample t-test may be used.

Now, the sample size is only 14 and thus we need to use the normal probability plot to check whether the data may come from a normal distribution.

Minitab > Graph > Probability Plot

[Figure: normal probability plot of the data]

The normal probability plot shows no evidence that the data depart from a normal distribution, so the one-sample t-test can be used.

c. Use Minitab to perform the test and draw a conclusion using the p-value.

Stat > Basic Statistics > 1-Sample t

Dialog box items:

  • Variables: Select the column(s) containing the variable(s) on which you want to perform the hypothesis test.
  • Test mean: Check the box for Perform hypothesis test to run a one-sample t-test, then enter the null hypothesis value in the Hypothesized mean text box. For this example, enter 0.

Click Options... In the Confidence level text box, type your desired confidence level (for this example, use 95). In the Alternative hypothesis list, select the desired alternative hypothesis from: mean ≠ hypothesized mean, mean < hypothesized mean, mean > hypothesized mean. For this example, select mean ≠ hypothesized mean.

One-Sample T: C1

Test of μ = 0 vs ≠ 0

Variable   N    Mean   StDev   SE Mean
C1        14   -4.07   13.08      3.50

Variable   95.0% CI            T       P
C1         (-11.62, 3.48)   -1.16   0.265

Using Minitab, we see that the observed t-value is -1.16 and the p-value is 0.265 which is greater than α = 0.05. We conclude that we cannot reject the null hypothesis.
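Minitab's output can be cross-checked from the raw data with a short standard-library sketch. This is our own illustration, not course code. The hypothetical helpers `t_cdf` and `t_quantile` stand in for the t-table. The statistic, p-value, and 95% interval follow the formulas used earlier in this lesson.

```python
import math
import statistics


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t, via Simpson's rule on the density (hypothetical helper)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """t with P(T <= t) = p, by bisection (hypothetical; 0.5 <= p < 1, answer below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


weight_change = [17, 7, -4, -18, 2, 9, 12, 9, -12, -9, -18, -14, -18, -20]
mu0 = 0
n = len(weight_change)
mean = statistics.mean(weight_change)
sd = statistics.stdev(weight_change)            # sample SD (divisor n - 1)
se = sd / math.sqrt(n)                          # "SE Mean" in Minitab
t_star = (mean - mu0) / se
p_value = 2 * (1 - t_cdf(abs(t_star), n - 1))   # two-tailed p-value
crit = t_quantile(0.975, n - 1)
ci = (mean - crit * se, mean + crit * se)       # 95% confidence interval

print(f"N = {n}, Mean = {mean:.2f}, StDev = {sd:.2f}, SE Mean = {se:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), T = {t_star:.2f}, P = {p_value:.3f}")
```

The printed values should match the Minitab output above.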

There are two possible reasons for failing to reject the null hypothesis:

  1. the null hypothesis is reasonable, or
  2. there's an insufficient sample size to achieve a powerful test.

We do not yet know which one is the real reason and thus proceed to compute the power of the test.

d. Use Minitab to compute the power (1 - β) of the test at the likely value μa = -5.0. Based on the computed power, would you accept the null hypothesis?

n = 14, α = 0.05

The difference to detect is 0 - (-5) = 5.

Using Minitab > Stat > Power and Sample Size > 1-Sample t, we find that power = 0.2635. The power is very low: the probability of a Type II error at μa = -5 is β = 1 - 0.2635 = 0.7365, which is too high, so we cannot accept the null hypothesis.

e. Use Minitab to find how large a sample size is needed for α = 0.05, power = 0.8, and a minimum detectable difference of 5.

From the Minitab output of the one-sample t-test, we see that the standard deviation is 13.08. We can thus estimate σ by 13.08 for the sample size computation problem:

[Figure: Minitab dialog box for Power and Sample Size for 1-Sample t]

The answer is given by:

Power and Sample Size

1-Sample t Test

Testing mean = null (versus not = null)
Calculating power for mean = null + difference
Alpha = 0.05  Sigma = 13.08

              Sample  Target  Actual
Difference      Size   Power   Power
         5        56  0.8000  0.8024

Thus, a sample size of 56 is needed to reach the target power of 0.8 (the actual power is 0.8024) for detecting a difference of 5 at α = 0.05.
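The exact power of the two-tailed one-sample t-test comes from the noncentral t distribution. As a cross-check of parts d and e, the sketch below computes that power from first principles with the Python standard library. If the true mean is μ0 + diff, then T = (Z + λ)/√(V/df), where Z is standard normal, V is chi-square with df degrees of freedom, and λ = |diff|·√n/σ. The code conditions on V and integrates numerically. The function names are hypothetical, σ = 13.08 is treated as the true σ, and small numerical differences from Minitab are possible.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t, via Simpson's rule on the density (hypothetical helper)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """t with P(T <= t) = p, by bisection (hypothetical; 0.5 <= p < 1, answer below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def power_one_sample_t(n, sigma, diff, alpha, steps=2000):
    """Power of the two-tailed one-sample t-test when the true mean is mu0 + diff.

    Conditions on V ~ chi-square(df) and integrates with Simpson's rule. Assumes n >= 4.
    """
    df = n - 1
    crit = t_quantile(1 - alpha / 2, df)
    lam = abs(diff) * math.sqrt(n) / sigma  # noncentrality parameter
    std_normal = NormalDist()
    log_k = -(df / 2) * math.log(2) - math.lgamma(df / 2)

    def integrand(v):
        if v <= 0:
            return 0.0
        chi2_density = math.exp(log_k + (df / 2 - 1) * math.log(v) - v / 2)
        cut = crit * math.sqrt(v / df)
        # P(T > crit | V = v) + P(T < -crit | V = v)
        reject = (1 - std_normal.cdf(cut - lam)) + std_normal.cdf(-cut - lam)
        return reject * chi2_density

    upper = df + 12 * math.sqrt(2 * df) + 20  # chi-square mass beyond this is negligible
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def smallest_n(sigma, diff, alpha, target_power):
    """Smallest n (starting from 4) whose power reaches target_power."""
    n = 4
    while power_one_sample_t(n, sigma, diff, alpha) < target_power:
        n += 1
    return n


sigma, diff, alpha = 13.08, 5, 0.05
power_14 = power_one_sample_t(14, sigma, diff, alpha)
n_needed = smallest_n(sigma, diff, alpha, 0.8)
actual_power = power_one_sample_t(n_needed, sigma, diff, alpha)
print(f"power at n = 14: {power_14:.4f}")
print(f"smallest n for power 0.8: {n_needed} (actual power {actual_power:.4f})")
```

For n = 14, the result should be close to Minitab's 0.2635. The search should stop at n = 56 with a power close to 0.8024.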

Statistical and Practical Significances

Words of Caution: Critics of hypothesis-testing procedures have observed that a population mean is rarely exactly equal to the value in the null hypothesis and hence, by obtaining a large enough sample, virtually any null hypothesis can be rejected. Thus, it is important to distinguish between statistical significance and practical significance.

Statistical significance concerns whether an observed effect could reasonably be due to chance, whereas practical significance means that the observed effect is large enough to be useful in the real world.
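A small hypothetical calculation makes this caution concrete. The sketch holds the observed difference between the sample mean and μ0 fixed at 0.5, with σ = 13.08, a difference that may be of no practical importance, and lets n grow. It uses the large-sample z approximation, which is reasonable at these sample sizes. The helper `two_sided_p` is our own.

```python
import math
from statistics import NormalDist


def two_sided_p(diff, sigma, n):
    """Large-sample two-tailed p-value when the sample mean differs from mu0 by diff."""
    z = abs(diff) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical: a fixed, practically tiny difference of 0.5 with sigma = 13.08
p_values = {n: two_sided_p(0.5, 13.08, n) for n in (100, 1_000, 10_000, 100_000)}
for n, p in p_values.items():
    print(f"n = {n:>7,}: p-value = {p:.4g}")
```

The same small difference is far from significant at n = 100 but is highly significant once n reaches 10,000.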

Last Words from Minitab:

Power & Sample Size Tools   

Gathering data is like tasting fine wine—you need the right amount. With wine, too small a sip keeps you from accurately assessing a subtle bouquet, but too large a sip overwhelms the palate.

We can’t tell you how big a sip to take at a wine-tasting event, but when it comes to collecting data, Minitab Statistical Software’s Power and Sample Size tools can tell you how much data you need to be sure about your results.  

Lesson 9 - Homework

Practice Problems:

1.   A dealer in recycled paper places empty trailers at various sites. The trailers are gradually filled by individuals who bring in old newspapers and magazines, and are picked up on several schedules. One such schedule involves pickup every second week. This schedule is desirable if the average amount of recycled paper is more than 1,600 cubic feet per 2-week period. The dealer’s records for eighteen 2-week periods show the following volumes (in cubic feet) at a particular site (recycled_paper.txt), with sample mean \(\bar{y}\) = 1,718.3 and s = 137.8.

a.  Construct a 95% confidence interval for \(\mu\).

b.  Compute the p-value for the test statistic. Is there strong evidence that \(\mu\) is greater than 1,600?

2.   The undergraduate GPAs of 18 students selected from a large MBA class of 800 students are given in (mba_student_gpa.txt).

Use the data in the file above to test the research hypothesis that the average undergraduate GPA of the MBA class is more than 3.5. Give the level of significance of your test. Use the p-value approach to perform the test at a default level of significance.

ASK!  If you have a question about any part of these practice problems, please post your question to the discussion forum in ANGEL.


Homework Problems to Submit

Now, find Homework 9 in the Homework Assignments folder in ANGEL and submit it to the Dropbox by the due date.

If there are data referred to in the homework problems, please find these data files in the Datasets folder in ANGEL.