Lesson 5: Regression Shrinkage Methods
Textbook reading: Section 6.2: Shrinkage Methods
- Linear regression: $E(Y \mid X) = X \beta$;
- Or for a more general regression function: $E(Y \mid X) = f(X)$;
- In a prediction context, the values of the individual components of the right-hand side are of less concern; interest lies in their total contribution.
- The driving forces behind variable selection:
  - The desire for a parsimonious regression model (one that is simpler and easier to interpret);
  - The need for greater accuracy in prediction.
- The notion of what makes a variable "important" is still not well understood, but one interpretation (Breiman, 2001) is that a variable is important if dropping it seriously affects prediction accuracy.
- Selecting variables in regression models is a complicated problem, and there are many conflicting views on which variable selection procedure is best, e.g. procedures based on the likelihood ratio test (LRT), the F-test, AIC, or BIC.
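For reference, the two information criteria named above have the following standard definitions. The symbols are introduced here and are not part of the original notes: $\hat{L}$ is the maximized likelihood of the candidate model, $k$ is its number of estimated parameters, and $n$ is the number of observations.

$$
\mathrm{AIC} = -2 \log \hat{L} + 2k, \qquad \mathrm{BIC} = -2 \log \hat{L} + k \log n .
$$

Since $\log n > 2$ once $n \ge 8$, BIC charges more per additional parameter than AIC and therefore tends to favor smaller models, which is one reason the two criteria can disagree.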
There are two main types of stepwise procedures in regression:
- Backward elimination: eliminate the least important variable from the selected ones.
- Forward selection: add the most important variable from the remaining ones.
- A hybrid version that incorporates ideas from both main types: it alternates backward and forward steps, and stops when every variable has either been retained for inclusion or removed.
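Forward selection as described above can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the algorithm used by R's step(): "most important" is taken to mean "gives the largest drop in residual sum of squares", the stopping rule is a hypothetical `max_vars` cap, and the helper names (`solve`, `rss`, `forward_selection`) and the simulated data are made up for the example.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rss(X, y, cols):
    """Residual sum of squares for least squares of y on an intercept plus columns `cols`."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    beta = solve(XtX, Xty)  # normal equations: (Z'Z) beta = Z'y
    return sum((yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2
               for z, yi in zip(Z, y))

def forward_selection(X, y, max_vars):
    """Greedy forward selection: at each step add the remaining variable
    whose inclusion yields the smallest RSS (assumed importance measure)."""
    selected, remaining, path = [], list(range(len(X[0]))), []
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
        path.append((best, rss(X, y, selected)))
    return path

# Simulated data (an assumption for illustration): only columns 0 and 2 matter.
random.seed(0)
n, p = 60, 5
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[2] + random.gauss(0, 0.5) for row in X]

path = forward_selection(X, y, max_vars=3)
order = [j for j, _ in path]
for j, value in path:
    print(f"added column {j}, RSS = {value:.2f}")
```

Backward elimination would run the same loop in reverse, starting from all columns and dropping the one whose removal increases the RSS least. In practice the fixed `max_vars` cap would be replaced by a criterion such as AIC, BIC, or an F-test.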
Criticisms of Stepwise Methods:
- There is no guarantee that the subsets obtained from different stepwise procedures will contain the same variables, or that any of them will be the "best" subset.
- When there are more variables than observations (p > n), backward elimination is typically not a feasible procedure, because its starting point, the full model, cannot be fit by ordinary least squares.
- The maximum or minimum of a set of correlated F statistics is not itself an F statistic.
- Stepwise selection produces a single answer (a very specific subset) to the variable selection problem, although several different subsets may be equally good for regression purposes.
- The computation is easy with the R functions step() or regsubsets() (the latter from the leaps package). However, to arrive at an answer that is good in practice, you must know the practical context in which your inference will be used.
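The point about the maximum of correlated F statistics can be checked with a small simulation. The sketch below is an illustration under assumptions of our own, not part of the original notes: the response is pure noise, the ten predictors share a common factor so that their F statistics are correlated, and `F_CRIT` is the approximate 95th percentile of the F(1, 28) distribution. It compares how often a single pre-chosen F statistic exceeds that cutoff with how often the largest of the ten does.

```python
import random

def f_stat(x, y):
    """F statistic for the slope in a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)          # squared sample correlation
    return r2 * (n - 2) / (1 - r2)       # F on (1, n - 2) degrees of freedom

random.seed(1)
n, p, reps = 30, 10, 2000
F_CRIT = 4.196  # approximate 95th percentile of F(1, 28)

single_hits = 0
max_hits = 0
for _ in range(reps):
    # The response is unrelated to every predictor (the null hypothesis holds).
    y = [random.gauss(0, 1) for _ in range(n)]
    # Predictors share a common factor, so they (and their F statistics) are correlated.
    common = [random.gauss(0, 1) for _ in range(n)]
    stats = []
    for _ in range(p):
        x = [c + random.gauss(0, 1) for c in common]
        stats.append(f_stat(x, y))
    single_hits += int(stats[0] > F_CRIT)    # one predictor chosen in advance
    max_hits += int(max(stats) > F_CRIT)     # the "best" predictor, chosen after looking

single_rate = single_hits / reps
max_rate = max_hits / reps
print(f"rejection rate, pre-chosen predictor: {single_rate:.3f}")
print(f"rejection rate, best of {p} predictors: {max_rate:.3f}")
```

The first rate should sit near the nominal 5%, while the second should be well above it. This is the sense in which the statistic for the variable picked at a stepwise step does not follow the F distribution used to judge it.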
Scott Zeger on 'how to pick the wrong model': Turn your scientific problem over to a computer that, knowing nothing about your science or your question, is very good at optimizing AIC, BIC, ...