7.1  Introduction to Hypothesis Testing
Unit Summary 

Reading Assignment
An Introduction to Statistical Methods and Data Analysis (see Course Schedule).
Hypothesis Testing
The second type of inference method (confidence intervals were the first) is hypothesis testing. A hypothesis, in statistics, is a statement about a population, typically expressed as a specific numerical value of a parameter. To test a hypothesis, we collect data and use it as evidence for or against that statement. Hypothesis testing follows a fixed sequence of steps, summarized below as six steps for conducting a test of a hypothesis.
1. Set up two competing hypotheses. Each hypothesis test includes two hypotheses about the population. One is the null hypothesis, denoted \(H_0\), which is a statement of a particular parameter value. This hypothesis is assumed to be true until there is evidence to suggest otherwise. The second hypothesis is called the alternative, or research, hypothesis, denoted \(H_a\). The alternative hypothesis is a statement of a range of alternative values in which the parameter may fall. One must also check that any assumptions (conditions) needed to run the test have been satisfied, e.g. normality of data, independence, and number of success and failure outcomes.
2. Set a level of significance, called alpha. This value is used as a probability cutoff for making decisions about the null hypothesis. As we will learn later, alpha represents the probability we are willing to accept of making an incorrect decision by rejecting the null hypothesis when it is in fact true. The most common alpha value is 0.05 (5%). Other popular choices are 0.01 (1%) and 0.1 (10%).
3. Calculate a test statistic. Gather sample data and calculate a test statistic where the sample statistic is compared to the parameter value. The test statistic is calculated under the assumption the null hypothesis is true, and incorporates a measure of standard error and assumptions (conditions) related to the sampling distribution. Such assumptions could be normality of data, independence, and number of success and failure outcomes.
4. Calculate a probability value (p-value), or find the rejection region. A p-value is found by using the test statistic to calculate the probability, assuming the null hypothesis is true, of the sample data producing such a test statistic or one more extreme. The rejection region is found by using alpha to find a critical value; the rejection region is the area that is more extreme than the critical value.
5. Make a test decision about the null hypothesis. In this step we decide either to reject the null hypothesis or to fail to reject the null hypothesis. Notice that we never decide to accept the null hypothesis.
6. State an overall conclusion. Once we have found the p-value or rejection region and made a statistical decision about the null hypothesis (i.e., we reject the null or fail to reject the null), we summarize our results in an overall conclusion for the test.
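Steps 4 and 5 can be sketched in code for a test statistic that follows a standard normal distribution when the null hypothesis is true. This is only an illustration under that assumption: the function name `z_test_decision` and its `alternative` labels are made up for this example, and the value 2.1 passed to it is an arbitrary test statistic, not data from this lesson. It uses `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist


def z_test_decision(z, alpha=0.05, alternative="two-sided"):
    """Return (p_value, critical_value, reject) for a z test statistic.

    Hypothetical helper for illustration; assumes z is standard normal
    when the null hypothesis is true.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "right":
        p_value = 1 - std_normal.cdf(z)           # area to the right of z
        critical = std_normal.inv_cdf(1 - alpha)  # cutoff with alpha to its right
        reject = z > critical
    elif alternative == "left":
        p_value = std_normal.cdf(z)               # area to the left of z
        critical = std_normal.inv_cdf(alpha)      # cutoff with alpha to its left
        reject = z < critical
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))    # area in both tails
        critical = std_normal.inv_cdf(1 - alpha / 2)  # alpha split across two tails
        reject = abs(z) > critical
    else:
        raise ValueError("alternative must be 'left', 'right', or 'two-sided'")
    return p_value, critical, reject


# Arbitrary test statistic, right-tailed test at alpha = 0.05.
p_value, critical, reject = z_test_decision(2.1, alpha=0.05, alternative="right")
print(f"p-value = {p_value:.4f}, critical value = {critical:.3f}, reject H0: {reject}")
```

The two routes agree: the test statistic falls in the rejection region exactly when the p-value is smaller than alpha, so either one can be used to make the decision in step 5.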
Hypotheses and Test Statistics
We will continue our discussion by considering two specific hypothesis tests: a test of one proportion, and a test of one mean. We will provide the general set up of the hypothesis and the test statistics for both tests. From there, we will branch off into specific discussions on each of these tests.
To make a judgment about the value of a parameter, the problem can be set up as a hypothesis testing problem.
The Null and Alternative Hypothesis
We usually set the hypothesis that one wants to conclude as the alternative hypothesis, also called the research hypothesis.
There are three types of alternative hypotheses:
1. The population parameter is not equal to a certain value. Referred to as a "two-sided test".
2. The population parameter is less than a certain value. Referred to as a "left-tailed test".
3. The population parameter is greater than a certain value. Referred to as a "right-tailed test".
For all three alternatives, the null hypothesis is the population parameter is equal to that certain value.
Since hypothesis tests are about a parameter value, the hypotheses use parameter notation (p for a proportion or \(\mu\) for a mean). For a test of a proportion or a test of a mean, we choose the appropriate alternative based on our research question. Below are the possible alternative hypotheses, from which we select only one based on the research question. The symbols \(p_0\) and \(\mu_0\) are placeholders in these general statements; in practice, they are replaced by the parameter value being tested, as the examples that follow illustrate.
1. The population parameter is not equal to a certain value. Referred to as a "two-tailed test".
\(H_a: p \ne p_0\), or \(H_a: \mu \ne \mu_0\)
2. The population parameter is less than a certain value. Referred to as a "left-tailed test".
\(H_a: p < p_0\), or \(H_a: \mu < \mu_0\)
3. The population parameter is greater than a certain value. Referred to as a "right-tailed test".
\(H_a: p > p_0\), or \(H_a: \mu > \mu_0\)
The null hypothesis in each case would be:
\(H_0: p = p_0\), or \(H_0: \mu = \mu_0\)
When debating the State Appropriation for Penn State, the following question is asked: "Are the majority of students at Penn State from Pennsylvania?" To answer this question, we can set it up as a hypothesis testing problem and use collected data to answer it. This example is about a population proportion, and thus we set up the hypotheses in terms of p. Here the value of \(p_0\) is 0.5, since a proportion greater than 0.5 constitutes a majority. The hypotheses would be set up as a right-tailed test:
\(H_0: p = 0.5\) vs. \(H_a: p > 0.5\)
A consumer test agency wants to see whether the mean lifetime of a brand of tires is less than 42,000 miles, as the tire manufacturer advertises that the average lifetime is at least 42,000 miles. In this example, we are discussing a mean and therefore set up the hypotheses in terms of \(\mu\). Here the value of \(\mu_0\) is 42,000. Because the consumer test agency wants to show that the mean lifetime is below 42,000, we would set up the hypotheses as a left-tailed test:
\(H_0: \mu = 42{,}000\) vs. \(H_a: \mu < 42{,}000\)
The length of a certain type of lumber from a national home building store is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet. In this example, we are discussing a mean and therefore set up the hypotheses in terms of \(\mu\). Here the value of \(\mu_0\) is 8.5. Because the builder wants to check whether the mean length is different from 8.5, she would set up the hypotheses as a two-tailed test:
\(H_0: \mu = 8.5\) vs. \(H_a: \mu \ne 8.5\)
A political news company believes the national approval rating for the current president has fallen below 40%. In this example, we are discussing a proportion and therefore set up the hypotheses in terms of p. Here the value of \(p_0\) is 0.4, and the hypotheses would be set up as a left-tailed test:
\(H_0: p = 0.4\) vs. \(H_a: p < 0.4\)
Choosing the Null and Alternative Hypothesis
If the conditions necessary to conduct the hypothesis test are satisfied, then we can use the formulas below to calculate the appropriate test statistic from our sample data. The conditions and test statistics are as follows:
Test of One Proportion: the conditions are that \(np_0\) and \(n(1 - p_0)\) are at least 5. If so, then the one proportion test statistic is:
\[Z^{*} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}\]
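As a minimal sketch of the one-proportion formula, the function below first checks the condition on \(np_0\) and \(n(1 - p_0)\) and then computes \(Z^{*}\). The function name `one_proportion_z` and the counts in the usage line (60 successes out of 100, tested against \(p_0 = 0.5\)) are made up for illustration.

```python
import math


def one_proportion_z(successes, n, p0):
    """Z* for a test of one proportion (hypothetical helper for illustration)."""
    # Condition from the text: n*p0 and n*(1 - p0) must both be at least 5.
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("conditions not met: need n*p0 >= 5 and n*(1 - p0) >= 5")
    p_hat = successes / n                           # sample proportion
    standard_error = math.sqrt(p0 * (1 - p0) / n)   # computed under H0: p = p0
    return (p_hat - p0) / standard_error


# Made-up data: 60 successes in a sample of 100, null value p0 = 0.5.
z_star = one_proportion_z(successes=60, n=100, p0=0.5)
print(round(z_star, 3))
```

Note that the standard error uses \(p_0\), not \(\hat{p}\), because the test statistic is calculated under the assumption that the null hypothesis is true.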
Test of One Mean: the conditions are similar to those used for constructing a t-confidence interval for the mean. Either the data come from an approximately normal distribution, or the sample size is large enough (at least 30), or, for a small sample (less than 30), the data are not skewed and have no outliers. If any of these conditions is satisfied, then we can calculate the following test statistic:
\[t^{*} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}\]
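The one-mean formula can be sketched the same way, starting from raw data. The function name `one_mean_t` and the measurements are made up for illustration, and the function does not check the conditions described above (approximate normality, sample size, skewness, outliers); those still need to be assessed separately.

```python
import math
import statistics


def one_mean_t(data, mu0):
    """t* for a test of one mean (hypothetical helper for illustration)."""
    n = len(data)
    x_bar = statistics.mean(data)   # sample mean
    s = statistics.stdev(data)      # sample standard deviation (n - 1 denominator)
    return (x_bar - mu0) / (s / math.sqrt(n))


# Made-up measurements, tested against a null value of mu0 = 10.
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
t_star = one_mean_t(sample, mu0=10)
print(f"t* = {t_star:.3f}")
```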
NOTE: do not get too hung up on the symbols. We just want a notation that reminds us that these values are test statistics.
The Logic of Hypothesis Testing
How do we decide whether to reject the null hypothesis?
 If the sample data are consistent with the null hypothesis, then we do not reject it.
 If the sample data are inconsistent with the null hypothesis, but consistent with the alternative, then we reject the null hypothesis and conclude that the alternative hypothesis is true.
Referring back to the first example above, say we take a random sample of 500 Penn State students and find that 278 are from Pennsylvania. Can we conclude that the proportion is larger than 0.5?
Is 278/500 = 0.556 much bigger than 0.5? What is much bigger? This depends on the standard deviation of \(\hat{p}\) under the null hypothesis.
\[\hat{p} - p_0 = 0.556 - 0.5 = 0.056\]
The standard deviation of \(\hat{p}\), if the null hypothesis is true (i.e., when \(p = p_0 = 0.5\)), is:
\[\sqrt{\frac{p_0 (1 - p_0)}{n}}=\sqrt{\frac{0.5 \cdot (1 - 0.5)}{n}}=\sqrt{\frac{0.5 \cdot (1 - 0.5)}{500}} \approx 0.0224\]
We can compare them by taking the ratio.
\[Z^{*} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}=\frac{0.556 - 0.5}{\sqrt{\frac{0.5 \cdot (1 - 0.5)}{500}}}=2.504\]
In the lumber example above, the mean length of the lumber is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet. Suppose the builder observes that the sample mean of 61 pieces of lumber is 8.3 feet, with a sample standard deviation of 1.2 feet. What will she conclude?
Is 8.3 very different from 8.5? This depends on the standard deviation of \(\bar{X}\):
\[t^{*} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}=\frac{8.3 - 8.5}{1.2/\sqrt{61}}=-1.3\]
Thus, we are asking whether -1.3 is very far from zero, since zero corresponds to the case when \(\bar{X}\) is equal to \(\mu_0\). If it is far away, then it is unlikely that the null hypothesis is true, and one rejects it. Otherwise, one cannot reject the null hypothesis.
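The arithmetic for the lumber example can be checked from the summary statistics alone. The sketch below uses only the numbers given above (n = 61, sample mean 8.3, sample standard deviation 1.2, null value 8.5); the variable names are chosen for this illustration.

```python
import math

n = 61        # number of pieces of lumber sampled
x_bar = 8.3   # sample mean length in feet
s = 1.2       # sample standard deviation in feet
mu0 = 8.5     # mean length stated in the null hypothesis

standard_error = s / math.sqrt(n)        # estimated standard deviation of the sample mean
t_star = (x_bar - mu0) / standard_error  # negative because x_bar is below mu0
print(f"standard error = {standard_error:.4f}, t* = {t_star:.2f}")
```

The test statistic comes out to about -1.30; the sign is negative because the sample mean falls below 8.5.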
How do we determine whether to reject the null hypothesis? It depends on the level of significance \(\alpha\) (step 2 of conducting a hypothesis test), and the probability the sample data would produce the observed result.
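To make this concrete for the Penn State example (278 of 500 students, right-tailed test of \(p_0 = 0.5\)), the sketch below computes the test statistic and then the probability of a result at least that extreme if the null hypothesis were true. The choice of \(\alpha = 0.05\) is an assumption made for this illustration, as are the variable names; the normal probability comes from `NormalDist` in Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

n = 500           # students sampled
successes = 278   # students from Pennsylvania
p0 = 0.5          # value of p under the null hypothesis
alpha = 0.05      # assumed significance level for this illustration

p_hat = successes / n
standard_error = math.sqrt(p0 * (1 - p0) / n)   # computed under H0
z_star = (p_hat - p0) / standard_error

# Right-tailed test: probability of a Z* this large or larger when H0 is true.
p_value = 1 - NormalDist().cdf(z_star)
reject_null = p_value < alpha

print(f"Z* = {z_star:.3f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Here the p-value is about 0.006, well below 0.05, so under this choice of alpha the null hypothesis would be rejected in favor of the alternative that a majority of students are from Pennsylvania.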