We can use statistical inference (i.e., hypothesis testing) to draw conclusions about how the population of y-values relates to the population of x-values, based on the sample of x- and y-values. Our model extends beyond a simple straight-line summary because it includes a parameter for the natural variation about the regression line that we see in real-life relationships. For each x, the regression line tells us how a population of y-values would respond on average. We call this average the mean response. Naturally, we expect some variation above and below the mean response: not every response for a given x will be the same, just as not all people of the same height have the same weight. The equation \(E(Y) = \beta_0 + \beta_1 x\) describes this population relationship: for any given x-value, the mean y-value is \(\beta_0 + \beta_1 x\). [NOTE: Some texts use the notation \(\mu_y\) in place of E(Y). These notations are read, respectively, as "the mean of y" and "the expectation of y". Both have the same meaning.] There are some assumptions, however, that come with this analysis.
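To make the idea of a mean response with variation around it concrete, here is a minimal simulation sketch. The parameter values (`BETA0`, `BETA1`, `SIGMA`) are hypothetical and are not taken from the exam data, and the normally distributed scatter is an assumption made only for this illustration.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA0 = 12.0   # intercept
BETA1 = 0.75   # slope
SIGMA = 10.0   # spread of individual responses about the line (assumed normal)


def mean_response(x):
    """Population mean of y at a given x: beta0 + beta1 * x."""
    return BETA0 + BETA1 * x


def simulate_responses(x, n, seed=0):
    """Draw n individual responses at x, scattered about the mean response."""
    rng = random.Random(seed)
    return [rng.gauss(mean_response(x), SIGMA) for _ in range(n)]


ys = simulate_responses(80, 10_000)
avg = sum(ys) / len(ys)
print("mean response at x = 80:", mean_response(80))
print("average of simulated responses:", round(avg, 2))
print("smallest and largest simulated response:", round(min(ys), 1), round(max(ys), 1))
```

Individual simulated responses fall well above and below the line, while their average settles close to the mean response.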
Returning to our output for the final exam data, we can conduct a hypothesis test of the slope of the regression line, using t-test methods, with the following hypotheses:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
From this output we concern ourselves with the second row under Predictor; the first row, Constant, is not relevant for our purposes. This second Predictor row shows our estimate of \(\beta_1\), the sample slope \(b_1\), as 0.7513; its standard error \(SE_{b_1}\) of 0.1414; a t-value of 5.31; and a p-value of 0.000, or approximately 0. Since the test in Minitab is whether the true slope is zero or not zero, we are conducting a two-sided hypothesis test. In general, the t test statistic is found by \(t = (b_1 - 0)/SE_{b_1}\), or in this example t = 0.7513/0.1414 = 5.31. The p-value is found by doubling the probability P(T ≥ |t|), where T follows a t distribution with n − 2 degrees of freedom. In this example, since the p-value is less than our standard alpha value of 0.05, we reject H0 and decide that the true slope does differ significantly from zero. We then conclude that Quiz Average is a significant predictor of scores on the Final exam.
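The arithmetic behind this test can be sketched in a few lines of Python. The slope and standard error are the values reported in the output. The degrees of freedom are not restated in this section, so the values tried below are placeholders only. The helper `two_sided_p` is a hypothetical name for this illustration; it integrates the t density numerically with Simpson's rule and is not how Minitab computes the p-value.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value 2 * P(T >= |t|); steps must be even for Simpson's rule."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3           # P(0 <= T <= |t|)
    return 2 * (0.5 - area)        # double the upper-tail probability


b1 = 0.7513      # estimated slope from the output
se_b1 = 0.1414   # standard error of the slope from the output
t_stat = (b1 - 0) / se_b1
print("t =", round(t_stat, 2))

# Placeholder degrees of freedom (n - 2); the real value depends on the sample size.
for df in (10, 30, 48):
    print("df =", df, "two-sided p =", two_sided_p(t_stat, df))
```

For each of these placeholder degrees of freedom the p-value is far below 0.05, so the decision to reject H0 does not hinge on the exact sample size here.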
We now turn to questions 2 and 3, which concern estimating both a population mean response and an individual response at a given x-value, denoted \(x^*\). Formulas exist and can be found in various texts, but we will use Minitab for the calculations. Keep in mind that estimating in statistics typically means using confidence intervals. That is the case here as well: we will use Minitab to calculate interval estimates for both quantities.
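For reference, the formulas usually given in introductory texts take the form below. The notation is assumed here rather than taken from this lesson: \(\hat{y} = b_0 + b_1 x^*\) is the fitted value at \(x^*\), \(s\) is the standard deviation of the residuals, \(n\) is the sample size, and \(t^*\) is the critical value from a t distribution with \(n - 2\) degrees of freedom.

Confidence interval for the mean response at \(x^*\):

\[ \hat{y} \pm t^* \, s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \]

Prediction interval for an individual response at \(x^*\):

\[ \hat{y} \pm t^* \, s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \]

The only difference is the extra 1 under the square root, which accounts for the variation of a single observation about the mean response.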
Sticking with the exam data, what would be a 95% confidence interval for the mean Final exam score for all students with a Quiz Average of 84.44? And what would be a 95% prediction interval for the Final exam score of a particular student with an 84.44 Quiz Average? To use Minitab or SPSS we follow our initial regression steps but with a few additions:
The confidence interval, which can be found in the output window under the heading 95% CI in Minitab and in the SPSS data spreadsheet under the LMCI and UMCI headings, is (72.79, 78.33). This is interpreted as "We are 95% confident that the true mean Final exam score for students with an 84.44 Quiz Average is between 72.79% and 78.33%."
The prediction interval, which can be found in the output window under the heading 95% PI in Minitab and in the SPSS data spreadsheet under the LICI and UICI headings, is (55.84, 95.28). This is interpreted as "We are 95% confident that the Final exam score for an individual student with an 84.44 Quiz Average is between 55.84% and 95.28%."
You should notice that the confidence interval is narrower than the prediction interval, which makes sense intuitively. The confidence interval estimates an average, or mean, response for all students with this quiz average, so you should expect it to be more precise than a prediction of the exact Final score for just one student.
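A quick check of the two reported intervals makes the comparison concrete. This sketch uses only the interval endpoints quoted above; the helper name `center_and_width` is introduced just for this illustration.

```python
# Interval endpoints as reported in the output for a Quiz Average of 84.44.
ci = (72.79, 78.33)   # 95% confidence interval for the mean response
pi = (55.84, 95.28)   # 95% prediction interval for one student


def center_and_width(interval):
    """Return the midpoint and the width of a (low, high) interval."""
    low, high = interval
    return (low + high) / 2, high - low


ci_center, ci_width = center_and_width(ci)
pi_center, pi_width = center_and_width(pi)

print(f"CI: center {ci_center:.2f}, width {ci_width:.2f}")
print(f"PI: center {pi_center:.2f}, width {pi_width:.2f}")
print(f"PI is about {pi_width / ci_width:.1f} times as wide as the CI")
```

Both intervals share the same center, 75.56, which is the fitted Final exam score at this quiz average. They differ only in width: 5.54 points for the confidence interval against 39.44 points for the prediction interval.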