Lesson 5: Regression Shrinkage Methods

Key Learning Goals for this Lesson:
  • Understand and know how to perform ridge regression.
  • Understand and know how to perform LASSO.

Textbook reading: Section 6.2: Shrinkage Methods


  • Linear regression: $E(Y | X) = X \beta$;
  • Or, for a more general regression function: $E(Y | X) = f(X)$;
  • In a prediction context, the values of the individual components on the right-hand side matter less than their total contribution.

Variable Selection:

  • Two forces drive variable selection: the desire for a parsimonious regression model (one that is simpler and easier to interpret), and the need for greater accuracy in prediction.
  • The notion of what makes a variable "important" is still not well understood, but one interpretation (Breiman, 2001) is that a variable is important if dropping it seriously affects prediction accuracy.
  • Selecting variables in regression models is a complicated problem, and there are many conflicting views on which variable selection procedure or criterion is best, e.g. the likelihood ratio test (LRT), the F-test, AIC, and BIC.

There are two main types of stepwise procedures in regression:

  • Backward elimination: start with all variables and, at each step, eliminate the least important variable from those still selected.

  • Forward selection: start with no variables and, at each step, add the most important variable from those remaining.

  • A hybrid version incorporates ideas from both main types: it alternates backward and forward steps, and stops when every variable has either been retained for inclusion or removed.
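The two main procedures above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration under stated assumptions, not the method used in the lesson: "importance" is measured here by the residual sum of squares (RSS) of a least-squares fit with an intercept, the procedures run for a fixed number of steps rather than using an F-test, AIC, or BIC stopping rule, and the function names (`solve`, `rss`, `forward_selection`, `backward_elimination`) and the toy data are hypothetical.

```python
# Greedy stepwise selection by RSS. Assumes the chosen columns are linearly
# independent, so that the normal equations have a unique solution.

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rss(X, y, cols):
    """RSS of the least-squares fit of y on an intercept plus the columns in cols."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    ZtZ = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    Zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    beta = solve(ZtZ, Zty)  # normal equations: (Z'Z) beta = Z'y
    return sum((yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))

def forward_selection(X, y, max_vars):
    """Start empty; at each step add the variable that lowers RSS the most."""
    selected, remaining, path = [], list(range(len(X[0]))), []
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        path.append((best, rss(X, y, selected)))
    return path

def backward_elimination(X, y, min_vars):
    """Start full; at each step drop the variable whose removal raises RSS the least."""
    selected, path = list(range(len(X[0]))), []
    while len(selected) > min_vars:
        drop = min(selected,
                   key=lambda j: rss(X, y, [k for k in selected if k != j]))
        selected.remove(drop)
        path.append((drop, rss(X, y, selected)))
    return path

# Toy data (hypothetical): y depends strongly on column 0, weakly on column 2,
# and not at all on column 1.
X = [[1, 1, 2], [2, 0, 1], [3, 1, 4], [4, 0, 3],
     [5, 1, 1], [6, 0, 5], [7, 1, 2], [8, 0, 4]]
noise = [0.1, 0.1, -0.1, -0.1, 0.05, -0.05, -0.05, 0.05]
y = [3 * r[0] + 0.5 * r[2] + e for r, e in zip(X, noise)]

fwd = forward_selection(X, y, max_vars=3)     # list of (added column, RSS so far)
bwd = backward_elimination(X, y, min_vars=1)  # list of (dropped column, RSS so far)
print("forward:", fwd)
print("backward:", bwd)
```

Each returned path records the order in which variables enter or leave, together with the RSS of the model at that step. A hybrid procedure would alternate one step of each kind. Because RSS never increases when a variable is added, a practical version needs a penalized criterion or a test to decide when to stop.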

Criticisms of Stepwise Methods:

  • There is no guarantee that the subsets obtained from different stepwise procedures will contain the same variables, or that any of them will be the "best" subset.

  • When there are more variables than observations (p > n), backward elimination is typically not a feasible procedure.

  • The maximum or minimum of a set of correlated F statistics is not itself an F statistic.

  • A stepwise procedure produces a single answer (a very specific subset) to the variable selection problem, although several different subsets may be equally good for regression purposes.

  • The computation is easy with the R functions step() or regsubsets() (the latter from the leaps package). However, to specify a practically good answer, you must know the practical context in which your inference will be used.

    Scott Zeger on 'how to pick the wrong model': Turn your scientific problem over to a computer that, knowing nothing about your science or your question, is very good at optimizing AIC, BIC, ...
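
Two of the criticisms listed above can be made precise with a short calculation. This is a sketch in standard notation that the lesson itself does not introduce: $X$ is assumed to be the $n \times p$ design matrix, and $F_1, \dots, F_k$ are assumed to be the candidate F statistics compared at a single step.

For $p > n$:

$$\operatorname{rank}(X^T X) = \operatorname{rank}(X) \le \min(n, p) = n < p,$$

so the $p \times p$ matrix $X^T X$ is singular and the normal equations $X^T X \beta = X^T Y$ have no unique solution. Backward elimination therefore has no well-defined full-model fit to start from.

For the maximum of several F statistics: for any critical value $c$,

$$P\left(\max_{1 \le j \le k} F_j > c\right) \ge P(F_1 > c),$$

because the event $\{F_1 > c\}$ is contained in the event $\{\max_j F_j > c\}$. With more than one candidate the inequality is typically strict, so comparing the largest statistic against the nominal F critical value overstates its significance.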