# 9.7 - Comparing Two Groups Using Software

For the following examples use either of the following datasets: Course_Survey.MTW or Course_Survey.XLS

About the data set: A few semesters ago, all students registered for STAT200 at University Park were asked to complete a survey. A total of 1004 students responded. If we assume that this sample represents the PSU-UP undergraduate population, then we can make inferences regarding this population based on the survey results.

### Independent Proportions

**Question 1:** Would you date someone with a great personality even if you did not find them attractive?

Hypotheses Statements: What would be the correct hypotheses to determine whether the true percentage of PSU-UP undergraduate students who would date someone with a great personality, even if they did not find them attractive, differs by gender?

\(H_{0} : p_1 - p_2 =0\)

\( H_{a} :p_1 - p_2 \neq 0 \)

To perform a two proportion hypothesis test in Minitab:

- Open Minitab data set
- Go to Stat > Basic Stat > 2 Proportions
- Click the radio button for Samples in One Column (this is the default)
- Click the text box for Samples (cursor should be in this box)
- Select from the variables list the variable DatePerly (be sure the variable DatePerly appears in the text box)
- Enter Gender in the text box for Subscripts
- Click Options and select the correct Alternative (e.g. not equal to); enter the correct Test Proportion value (the default of 0.0 is correct for this example). If you are using Minitab Version 15 or higher, check the box for Use Pooled Estimate of p; this option is only available in Version 15 or higher and is used when the test proportion is 0.
- Click OK twice

This should result in the following output:

To perform a two proportion hypothesis test in Minitab Express:

- Open the Course_Survey.MTW data set.
- From the menu bar, select Statistics > Two Samples > Proportions.
- Double-click on the variable *DatePerly* to insert it into the "Samples" box.
- Double-click on the variable *Gender* to insert it into the "Sample IDs" box.
- Click on the options tab and verify that the alternative hypothesis is "Difference ≠ 0" and the confidence level is 95.
- Change the test method to "Use the pooled estimate of the proportion".
- Click OK.

This should result in the following output:

**Conclusion and Decision**: Since the *p*-value is less than 0.05 we would reject the null hypothesis and conclude that there is statistical evidence that a difference exists between the true proportion of female students who would date someone with a great personality if not attracted to them and the true proportion of males who would do so.

**Confidence Interval interpretation**: We are 95% confident that the true difference between the percentage of female and the percentage of male PSU-UP undergraduate students who would date someone with a great personality even if they did not find them attractive is between 24.1% and 36.0%.

About the output:

- Difference is given as *p*(female) − *p*(male), indicating that the difference is calculated by taking Female minus Male. If you wanted the reverse, then in Minitab you would have to recode the Gender variable to 0 and 1 where the 1 would represent Female.
- The value of 0.300931 found in the Estimate for Difference would be the sample statistic used to build the confidence interval. That is, we would take this value and then add and subtract from it the margin of error.
- Since the confidence interval results in an interval that contains all positive values and we took Female minus Male, we would conclude Female PSU−UP undergraduates are more likely than their male counterparts to date someone with a great personality even if they did not find them attractive.
- The Event = Yes indicates that the Yes response was the "success" of interest. If No was of concern, then we would have to recode these responses to 0 and 1 where 1 would represent No.
- The z value of 9.45 is the test statistic one would use to find the *p*-value. This test statistic is found by taking the sample statistic (i.e. the estimated difference) minus the hypothesized value of 0 (see that Test of Difference equals 0) and dividing by the standard error.
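
The arithmetic behind this output can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation: the function name `two_proportion_z_test` is made up for this example, the counts passed to it are hypothetical (they are not the Course_Survey counts, which are not listed in this section), and the confidence interval uses the unpooled standard error, which is a common textbook convention assumed here.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, confidence=0.95):
    """Two-sided pooled z-test of H0: p1 - p2 = 0, plus a CI for p1 - p2.

    x1, x2 are the numbers of "successes" (e.g. Yes responses);
    n1, n2 are the group sample sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2  # sample statistic (estimate for the difference)

    # Pooled estimate of p, used for the test because H0 says p1 = p2
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

    # Test statistic: (sample statistic - hypothesized value of 0) / standard error
    z = (diff - 0.0) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval: estimate plus and minus the margin of error
    # (assumption: unpooled standard error for the interval)
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * se_unpooled
    return diff, z, p_value, (diff - margin, diff + margin)


# Hypothetical counts for illustration only (NOT the survey data):
# 60 of 100 in group 1 and 40 of 100 in group 2 answered Yes.
diff, z, p_value, ci = two_proportion_z_test(x1=60, n1=100, x2=40, n2=100)
print(f"difference = {diff:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

As in the Minitab output, an interval made up entirely of positive values points to a larger proportion in the first group.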

### Paired Means

**Question 2:** What are the measurements of the span of your right and left hand?

Hypotheses Statements: What would be the correct hypotheses to determine if mean right hand spans differ from mean left hand spans among PSU-UP undergraduate students?

\(H_0 : \mu_d =0\)

\(H_a : \mu_d \neq 0\)

To perform a matched pairs hypothesis test in Minitab:

- Open Minitab data set.
- Go to Stat > Basic Stat > Paired-t.
- Click the radio button for Samples in Column (this is the default).
- Click the text box for First Sample (cursor should be in this box).
- Select from the variables list the variable Rspan (be sure the variable Rspan appears in the text box) and then in Second Sample enter the variable Lspan.
- Click Options and select the correct Alternative (e.g. not equal to); enter the correct Test Mean value (the default of 0.0 is correct for this example).
- Click OK twice.

This should result in the following output:

To perform a matched pairs hypothesis test in Minitab Express:

- Open the Course_Survey.MTW data set.
- From the menu bar, select Statistics > Two Samples > Paired t.
- Double-click the variable *Rspan* to insert it into the "Sample 1" box.
- Double-click the variable *Lspan* to insert it into the "Sample 2" box.
- Click the options tab, and verify that your alternative hypothesis is "Mean difference ≠ 0", and that your confidence level is 95.
- Click OK.

This should result in the following output:

**Conclusion and Decision**: Since the *p*-value of 0.068 is greater than 0.05 we would not reject the null hypothesis. We do not have enough statistical evidence to say that, on average, the true mean length of right hand spans differs from the true mean length of left hand spans for PSU−UP undergrads.

**NOTE: This was a two-sided test since we used "not equal" in the alternative hypothesis (also see that Minitab says "not = 0"). If the research interest was to show that, on average, right-hand spans were longer than left-hand spans, then our new \(H_a\) would use > and we would divide this p-value by 2 (0.068/2 = 0.034). Halving is appropriate here because the sample mean difference is positive, which is the direction stated in that \(H_a\). In the next example we show how to use Minitab to conduct such one-sided hypothesis tests.**

**Confidence Interval interpretation**: We are 95% confident that the true mean difference between right hand spans and left hand spans for PSU−UP undergraduate students is between −0.0035 inches and 0.0998 inches.

About the output:

- Based on the text Paired T for Rspan − Lspan the difference is found by taking the right hand spans minus the left hand spans.
- The value of the Mean found in the row named Difference would be the sample statistic used to build the confidence interval. That is, we would take this value and then add and subtract from it the margin of error.
- The value in the Difference row under SE Mean is the Standard Error of the Mean and is calculated by taking the Standard Deviation found in that Difference row (0.798840) and dividing by the square root of the number of differences.
- Since the confidence interval contains zero, we cannot conclude that a difference exists between the means. This result should concur with our hypothesis result as long as the alpha value used for the test corresponds to the level of confidence (i.e. alpha of 0.05 corresponds to a 95% level of confidence; alpha of 0.10 would correspond to a 90% level of confidence).
- The T-value of 1.83 is the test statistic one would use to find the *p*-value. This test statistic is found by taking the sample statistic (i.e. the estimated difference) minus the hypothesized value of 0 (see that Test of Difference equals 0) and dividing by the standard error (SE Mean of 0.026308).
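
The relationships among these reported values can be checked with a short Python sketch. It uses only numbers quoted in this section (StDev 0.798840, SE Mean 0.026308, and the 95% interval from −0.0035 to 0.0998); the sample mean difference and the number of pairs are not quoted here, so they are back-calculated, and rounding in the reported values means the results are approximate.

```python
from math import sqrt

# Values quoted in this section for Rspan - Lspan
sd_diff = 0.798840                 # StDev in the Difference row
se_mean = 0.026308                 # SE Mean in the Difference row
ci_low, ci_high = -0.0035, 0.0998  # 95% confidence interval

# SE Mean = StDev / sqrt(n), so the number of differences can be back-calculated
n_pairs = (sd_diff / se_mean) ** 2
se_check = sd_diff / sqrt(round(n_pairs))  # should reproduce SE Mean

# The interval is (mean - margin, mean + margin), so its midpoint is the
# sample mean difference and its half-width is the margin of error
mean_diff = (ci_low + ci_high) / 2
margin = (ci_high - ci_low) / 2

# Test statistic: (sample statistic - hypothesized value of 0) / standard error
hypothesized = 0.0
t_stat = (mean_diff - hypothesized) / se_mean

# Multiplier implied by the margin of error (margin = multiplier * SE)
t_star = margin / se_mean

print(f"implied number of pairs: {n_pairs:.1f}")
print(f"SE recomputed from StDev: {se_check:.6f}")
print(f"mean difference: {mean_diff:.5f}, margin of error: {margin:.5f}")
print(f"t statistic: {t_stat:.2f}")   # agrees with the reported T-value of 1.83
print(f"implied multiplier: {t_star:.3f}")
```

The implied multiplier comes out close to 1.96, which is what one would expect for a 95% interval with a sample this large.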

### Independent Means

**Question 3:** Some students who belong to Greek organizations (e.g. fraternities and sororities) believe that they do not drink any more or less than non-Greek-organization students. However, the university administration does not agree, and wants to show that, on average, Greek students drink more frequently during any given month than their non-Greek counterparts. What would be the correct hypotheses to determine if the administration is correct? Assume that non-Greeks are group 1 and Greeks are group 2.

\(H_0 : \mu_1 - \mu_2 \geq 0\)

\(H_a : \mu_1 - \mu_2 < 0\)

To perform a two sample mean hypothesis test in Minitab:

- Open Minitab data set.
- Go to Stat > Basic Stat > 2-Sample t.
- Click the radio button for Samples in One Column (this is the default).
- Click the text box for Samples (cursor should be in this box).
- Select from the variables list the variable DaysAlco (be sure the variable DaysAlco appears in the text box).
- Enter Greek in the text box for Subscripts.
- Click check box for Assume Equal Variances (we can verify this with the output).
- Click Options and select the correct Alternative (less than for this example); enter the correct Test Difference value (default is 0.0 which is correct for this example).
- Click OK twice.

This should result in the following output:

To perform a two sample mean hypothesis test in Minitab Express:

- Open the Course_Survey.MTW data set.
- From the menu bar, select Statistics > Two Samples > t.
- Double-click on the variable *DaysAlco* to insert it into the "Samples" box.
- Double-click on the variable *Greek?* to insert it into the "Sample IDs" box.
- Click the options tab, and change the alternative hypothesis to "Difference < 0".
- Verify that the confidence level is 95, and check the box to assume equal variances.
- Click OK.

This should result in the following output:

**Conclusion and Decision**: Since the *p*-value is approximately 0.000 (we do not actually state that the *p*-value is 0) and is less than 0.05, we would reject the null hypothesis and conclude that there is statistical evidence that, on average, the Greek population drinks more days per month than the non-Greeks at PSU-UP.

**Confidence Interval interpretation**: We are 95% confident that, for PSU−UP, Greek−organization students drink on average more than 3 more days per month than non−Greek−organization students. In terms of the output, the true mean difference (non−Greek minus Greek) lies below an upper bound of about −3 days.

About the output:

- Difference is given as μ(No) − μ(Yes), indicating that the difference is calculated by taking No minus Yes for those responding to whether they belonged to a Greek organization.
- The value of −3.712 found in the Estimate for Difference would be the sample statistic used to build the confidence interval. That is, we would take this value and then **only add** the margin of error. We only add since we are conducting a one−sided test of hypothesis and this side is for less than. The 95% upper bound provides the upper limit to our confidence interval, and combining it with our alternative hypothesis implies that the true mean difference is estimated to be no greater than this bound (i.e. in magnitude, the true mean difference is 3.003 days or more). If this seems confusing, consider reversing the order and subtracting non−Greeks from Greeks. The results would be the same except the bound would be positive and would represent the lower bound. The interpretation then might seem clearer.
- Unlike the paired test, there is no Difference row with a standard deviation here. The standard error of the difference is based on the pooled standard deviation and is calculated by multiplying the pooled standard deviation by the square root of (1/n₁ + 1/n₂), where n₁ and n₂ are the two group sample sizes.
- The t value of −8.61 is the test statistic one would use to find the *p*-value. This test statistic is found by taking the sample statistic (i.e. the estimated difference) minus the hypothesized value of 0 (see that Test of Difference equals 0) and dividing by the standard error.
- The line "Both use Pooled StDev = 5.3574" indicates that the pooled variance assumption was used, which makes sense since the ratio between the two standard deviations, 5.52 and 4.48, is not greater than 2.
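
The one-sided bound and the pooling check can be reproduced approximately in Python from the values quoted in this section. This is a rough sketch under stated assumptions: the standard error is back-calculated from the estimate and the t value because it is not quoted here, and the critical value comes from the standard normal distribution as a large-sample stand-in for the t distribution, so the result will differ slightly from Minitab's.

```python
from statistics import NormalDist

# Values quoted in this section
estimate = -3.712            # Estimate for Difference, mu(No) - mu(Yes)
t_value = -8.61              # reported test statistic
group_sds = (5.52, 4.48)     # the two sample standard deviations

# t = (estimate - hypothesized value of 0) / SE, so SE = estimate / t
hypothesized = 0.0
se_diff = (estimate - hypothesized) / t_value

# One-sided 95% upper bound: only ADD the margin of error.
# Assumption: normal critical value as a large-sample approximation.
critical = NormalDist().inv_cdf(0.95)
upper_bound = estimate + critical * se_diff

# Rule of thumb used in the text: pooling is reasonable when the larger
# standard deviation is no more than twice the smaller one
sd_ratio = max(group_sds) / min(group_sds)
pooling_reasonable = sd_ratio <= 2

print(f"implied standard error: {se_diff:.4f}")
print(f"approximate 95% upper bound: {upper_bound:.3f}")
print(f"SD ratio: {sd_ratio:.2f}, pooling reasonable: {pooling_reasonable}")
```

The approximate bound lands very close to the −3.003 implied by the output, and the ratio of standard deviations is well under 2.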