Lesson 5: Multiple Linear Regression

Introduction

In this lesson, we make our first (and last?!) major jump in the course. We move from the simple linear regression model with one predictor to the multiple linear regression model with two or more predictors. That is, we use the adjective "simple" to denote that our model has only one predictor, and we use the adjective "multiple" to indicate that our model has at least two predictors.

In the multiple regression setting, because of the potentially large number of predictors, it is more efficient to use matrices to define the regression model and the subsequent analyses. This lesson considers some of the more important multiple regression formulas in matrix form. If you're unsure about any of this, it may be a good time to take a look at this Matrix Algebra Review.
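
As a rough illustration of what "matrix form" means here, the Python sketch below (assuming NumPy is available) builds a design matrix X with a leading column of ones for the intercept and solves the least squares problem for the coefficient vector, which in matrix notation is b = (X'X)^(-1) X'y. The data and the names x1, x2, y, and fit_ols are made up for this example and are not part of the course materials.

```python
import numpy as np


def fit_ols(X, y):
    """Return the least squares coefficients b that minimize ||y - X b||^2."""
    # lstsq solves the same problem as b = (X'X)^(-1) X'y,
    # but avoids forming the inverse explicitly.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b


# Hypothetical data: the response is exactly 1 + 2*x1 + 3*x2 (no error term),
# so the fit should recover those three coefficients.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])

b = fit_ols(X, y)
print(np.round(b, 6))  # intercept, slope for x1, slope for x2
```

Adding another predictor only adds another column to X; the formula for b stays the same, which is why the matrix notation is convenient.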

The good news!

The good news is that everything you learned about the simple linear regression model extends — with at most minor modification — to the multiple linear regression model. Think about it — you don't have to forget all of that good stuff you learned! In particular:

  • The models have similar "LINE" assumptions. The only real difference is that whereas in simple linear regression we think of the distribution of errors at a fixed value of the single predictor, with multiple linear regression we have to think of the distribution of errors at a fixed set of values for all the predictors. All of the model checking procedures we learned earlier are useful in the multiple linear regression framework, although the process becomes more involved since we now have multiple predictors. We'll explore this issue further in Lesson 7.
  • The use and interpretation of r² (which we'll denote R² in the context of multiple linear regression) remains the same. However, with multiple linear regression we can also make use of an "adjusted" R² value, which is useful for model building purposes. We'll explore this measure further in Lesson 10.
  • With a minor generalization of the degrees of freedom, we use t-tests and t-intervals for the regression slope coefficients to assess whether a predictor is significantly linearly related to the response, after controlling for the effects of all the other predictors in the model.
  • With a minor generalization of the degrees of freedom, we use prediction intervals for predicting an individual response and confidence intervals for estimating the mean response. We'll explore these further in Lesson 7.
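
The quantities mentioned in the list above can be sketched numerically. The Python snippet below (assuming NumPy; the data and the function name regression_summary are hypothetical) computes R², adjusted R², the error degrees of freedom n - p, and the standard error and t-statistic for each coefficient, where p counts all coefficients including the intercept. It uses the common definitions R² = 1 - SSE/SSTO and adjusted R² = 1 - ((n - 1)/(n - p))(SSE/SSTO). It does not compute p-values or intervals, since those also need a t critical value from a table or a statistics library.

```python
import numpy as np


def regression_summary(X, y):
    """Fit y = X b by least squares and return basic summary quantities."""
    n, p = X.shape  # p = number of coefficients, including the intercept
    b, *_ = np.linalg.lstsq(X, y, rcond=None)

    resid = y - X @ b
    sse = float(resid @ resid)                  # error sum of squares
    ssto = float(np.sum((y - y.mean()) ** 2))   # total sum of squares

    df_error = n - p                            # generalizes n - 2 from simple regression
    mse = sse / df_error

    r2 = 1.0 - sse / ssto
    adj_r2 = 1.0 - (n - 1) / (n - p) * (sse / ssto)

    # Standard errors: square roots of the diagonal of MSE * (X'X)^(-1).
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    t = b / se                                  # t-statistics for H0: coefficient = 0

    return {"b": b, "se": se, "t": t, "r2": r2, "adj_r2": adj_r2, "df_error": df_error}


# Hypothetical data: roughly 1 + 2*x1 + 3*x2 plus small made-up errors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([9.3, 7.8, 19.1, 18.4, 28.6, 28.2])
X = np.column_stack([np.ones_like(x1), x1, x2])

summary = regression_summary(X, y)
for name, value in summary.items():
    print(name, np.round(value, 4))
```

With one predictor, p = 2 and the error degrees of freedom reduce to the familiar n - 2, which is the "minor generalization" the list refers to.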

Learning objectives and outcomes

Upon completion of this lesson, you should be able to do the following:

  • Know how to calculate a confidence interval for a single slope parameter in the multiple regression setting.
  • Be able to interpret the coefficients of a multiple regression model.
  • Understand what the scope of the model is in the multiple regression setting.
  • Understand the calculation and interpretation of R² in a multiple regression setting.
  • Understand the calculation and use of adjusted R² in a multiple regression setting.