Lesson 7: MLR Estimation, Prediction & Model Assumptions


Introduction

This lesson extends the methods from Lessons 3 and 4 to the context of multiple linear regression. We focus on using a multiple linear regression model to answer two specific research questions:

  • What is the average response for a given set of values of the predictors x1, x2, ...?
  • What is the value of the response likely to be for a given set of values of the predictors x1, x2, ...?

In particular, we will learn how to calculate and interpret:

  • A confidence interval for estimating the mean response for a given set of values of the predictors x1, x2, ....
  • A prediction interval for predicting a new response for a given set of values of the predictors x1, x2, ....
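
As a rough illustration of the difference between the two intervals, here is a minimal sketch in Python using NumPy. The function name `mean_ci_and_pi`, the tiny dataset, and the choice to pass the t multiplier in by hand (for example, from a t-table) are all assumptions made for this example, not part of the lesson. The sketch uses the standard matrix formulas: the standard error of the fit is sqrt(MSE × x_h'(X'X)^-1 x_h), and the standard error for a new response adds MSE under the square root.

```python
import numpy as np

def mean_ci_and_pi(X, y, x_new, t_mult):
    """Return (fit, ci, pi) at x_new for a least squares fit of y on X.

    X      : (n, k) array of predictor values (no intercept column)
    y      : (n,) array of responses
    x_new  : (k,) predictor values at which to estimate / predict
    t_mult : t multiplier with n - (k + 1) degrees of freedom
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])   # design matrix with intercept
    p = Xd.shape[1]                         # number of regression coefficients
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    b = XtX_inv @ Xd.T @ y                  # least squares estimates
    resid = y - Xd @ b
    mse = resid @ resid / (n - p)           # estimate of the error variance
    xh = np.concatenate([[1.0], np.asarray(x_new, dtype=float)])
    fit = float(xh @ b)
    se_fit = float(np.sqrt(mse * (xh @ XtX_inv @ xh)))   # for the mean response
    se_pred = float(np.sqrt(mse + se_fit**2))            # for a new response
    ci = (fit - t_mult * se_fit, fit + t_mult * se_fit)
    pi = (fit - t_mult * se_pred, fit + t_mult * se_pred)
    return fit, ci, pi

# Made-up data with two predictors (illustration only).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 8], [8, 7]]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 12.0, 15.2, 15.8]

# 95% intervals: n = 8, 3 coefficients, so 5 degrees of freedom, t = 2.571.
fit, ci, pi = mean_ci_and_pi(X, y, x_new=[4.5, 4.5], t_mult=2.571)
print("fitted value:", fit)
print("95% CI for the mean response:", ci)
print("95% PI for a new response:", pi)
```

Both intervals are centered at the same fitted value; the prediction interval is wider because it also accounts for the variation of an individual response around its mean.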

How do we evaluate a model? How do we know if the model we are using is good? One way to consider these questions is to assess whether the assumptions underlying the multiple linear regression model seem reasonable when applied to the dataset in question. Since the assumptions relate to the (population) prediction errors, we do this through the study of the (sample) estimated errors, the residuals.
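
In symbols, the relationship between errors and residuals can be sketched as follows. The notation (k predictors, coefficients β estimated by b, errors ε, residuals e) is the usual textbook convention and is assumed here rather than taken from this introduction.

```latex
% Population model: the errors \epsilon_i are unobserved
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i,
\qquad \epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)

% Fitted model: the residuals e_i are the observable estimates of the errors
\hat{y}_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik},
\qquad e_i = y_i - \hat{y}_i
```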

Learning objectives & outcomes

Upon completion of this lesson, you should be able to do the following:

  • Distinguish between estimating a mean response (confidence interval) and predicting a new observation (prediction interval).
  • Understand the various factors that affect the width of a confidence interval for a mean response.
  • Understand why a prediction interval for a new response is wider than the corresponding confidence interval for a mean response.
  • Know that the formula for a prediction interval depends strongly on the condition that the error terms are normally distributed, while the formula for the confidence interval is not so dependent on this condition for large samples.
  • Know the types of research questions that can be answered using the materials and methods of this lesson.
  • Understand why we need to check the assumptions of our model.
  • Know the things that can go wrong with the linear regression model.
  • Know how we can detect various problems with the model using a residuals vs. fits plot.
  • Know how we can detect various problems with the model using a residuals vs. predictor plot.
  • Know how we can detect a certain kind of dependent error terms using a residuals vs. order plot.
  • Know how we can detect non-normal error terms using a normal probability plot.
  • Apply some numerical tests for assessing model assumptions.
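
The residual-based checks listed above all start from the same quantities. The following is a minimal sketch, assuming NumPy and Python's standard `statistics` module, that computes them for a small made-up dataset. The function name `residual_diagnostics`, the data, and the plotting-position formula (i - 0.375) / (n + 0.25) are assumptions chosen for this illustration; statistical software may use slightly different conventions.

```python
import numpy as np
from statistics import NormalDist

def residual_diagnostics(X, y):
    """Compute the quantities behind the usual residual plots."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])        # design matrix with intercept
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # least squares estimates
    fitted = Xd @ b
    resid = y - fitted

    # Normal probability plot: sorted residuals vs. theoretical normal scores.
    sorted_resid = np.sort(resid)
    probs = (np.arange(1, n + 1) - 0.375) / (n + 0.25)   # assumed plotting positions
    normal_scores = np.array([NormalDist().inv_cdf(p) for p in probs])

    # Correlation of the plotted points; values near 1 mean the points
    # fall close to a straight line.
    npp_corr = float(np.corrcoef(normal_scores, sorted_resid)[0, 1])

    return {
        "fitted": fitted,                # x-axis of a residuals vs. fits plot
        "resid": resid,                  # y-axis of the residual plots
        "order": np.arange(1, n + 1),    # x-axis of a residuals vs. order plot
        "normal_scores": normal_scores,
        "sorted_resid": sorted_resid,
        "npp_corr": npp_corr,
    }

# Made-up data with two predictors (illustration only).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 8], [8, 7]]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 12.0, 15.2, 15.8]

diag = residual_diagnostics(X, y)
print("residuals:", np.round(diag["resid"], 3))
print("normal probability plot correlation:", round(diag["npp_corr"], 3))
```

Plotting `resid` against `fitted`, against each column of `X`, and against `order` gives the residuals vs. fits, residuals vs. predictor, and residuals vs. order plots, respectively. The residuals vs. order plot is only meaningful if the rows are recorded in the order in which the data were collected.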