Lesson 6: Logistic Regression


Thus far, our focus has been on describing interactions or associations between two or three categorical variables, mostly via single summary statistics and significance testing. From this lesson on, we will focus on modeling. Models can handle more complicated situations and analyze the simultaneous effects of multiple variables, including mixtures of categorical and continuous variables. In Lesson 6 and Lesson 7, we study binary logistic regression, which we will see is an example of a generalized linear model.

Binary logistic regression is a special type of regression in which a binary response variable is related to a set of explanatory variables, which can be discrete and/or continuous. The important point to note is that in linear regression, the expected value of the response variable is modeled as a function of the values taken by the predictors. In logistic regression, the probability (or odds) of the response taking a particular value is modeled as a function of the values taken by the predictors. As in linear regression (and unlike the log-linear models that we will see later), we make an explicit distinction between a response variable and one or more predictor (explanatory) variables. We begin with two-way tables, then progress to three-way tables, where all explanatory variables are categorical. Then we introduce binary logistic regression with continuous predictors as well. In the last part, we focus on model diagnostics and model selection.
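The contrast between the two models can be written out side by side. This is a sketch in standard textbook notation rather than notation taken from this lesson: the symbols $\pi(x)$ (the probability that the response takes the value of interest), $\beta_0, \ldots, \beta_k$ (coefficients), and $x_1, \ldots, x_k$ (predictors) are assumptions introduced here for illustration.

```latex
\begin{align*}
\text{Linear regression:}\quad
  & E(Y \mid x) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \\[4pt]
\text{Logistic regression:}\quad
  & \log\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
  \qquad \pi(x) = P(Y = 1 \mid x) \\[4pt]
\text{Equivalently:}\quad
  & \pi(x) = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}
                  {1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}
\end{align*}
```

The left-hand side of the logistic model is the log-odds (logit) of the response, so each coefficient describes a change in log-odds, and its exponential describes a multiplicative change in odds.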

Logistic regression is applicable, for example, if:

  • we want to model the probabilities of a response variable as a function of some explanatory variables, e.g. "success" of admission as a function of gender.
  • we want to perform descriptive discriminant analyses, such as describing the differences between individuals in separate groups as a function of explanatory variables, e.g. students admitted and rejected as a function of gender.
  • we want to predict probabilities that individuals fall into the two categories of the binary response as a function of some explanatory variables, e.g. what is the probability that a student is admitted given that the student is female?
  • we want to classify individuals into two categories based on explanatory variables, e.g. classify new students into "admitted" or "rejected" group depending on their gender.
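The admission-by-gender uses listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lesson's own analysis: the counts are invented, the names `counts`, `log_odds`, and `predicted_probability` are hypothetical, and the 0.5 classification cutoff is an arbitrary choice. It relies on the fact that, with a single binary predictor, the logistic model is saturated, so the maximum likelihood estimates reproduce the observed log-odds in each group.

```python
import math

# Hypothetical 2 x 2 table of admission by gender.
# These counts are invented for illustration only.
counts = {
    "male":   {"admitted": 60, "rejected": 40},
    "female": {"admitted": 30, "rejected": 70},
}


def log_odds(group):
    """Observed log-odds of admission within one group."""
    c = counts[group]
    return math.log(c["admitted"] / c["rejected"])


# Model: logit(P(admitted)) = beta0 + beta1 * female, with female in {0, 1}.
# With one binary predictor the model is saturated, so the ML estimates
# equal the observed log-odds (intercept) and log odds ratio (slope).
beta0 = log_odds("male")
beta1 = log_odds("female") - log_odds("male")


def predicted_probability(female):
    """Fitted probability of admission for female = 0 or 1."""
    eta = beta0 + beta1 * female
    return 1.0 / (1.0 + math.exp(-eta))


# Odds ratio of admission, females versus males.
odds_ratio = math.exp(beta1)

# Predicted probabilities for each group.
p_male = predicted_probability(0)
p_female = predicted_probability(1)

# Classification with an assumed cutoff of 0.5.
label_male = "admitted" if p_male >= 0.5 else "rejected"
label_female = "admitted" if p_female >= 0.5 else "rejected"

print(f"odds ratio (female vs. male): {odds_ratio:.4f}")
print(f"P(admitted | male)   = {p_male:.2f} -> {label_male}")
print(f"P(admitted | female) = {p_female:.2f} -> {label_female}")
```

The same fitted quantities serve all four purposes: the coefficients model and describe the group difference, `predicted_probability` gives the predicted probabilities, and comparing those probabilities to a cutoff yields a classification.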

Key Concepts

  • Generalized Linear Models
  • Binary Logistic Regression for 2 × 2 and 2 × J tables
  • Binary Logistic Regression for 2 × I × J tables and k-way tables
  • Model Fit and Parameter Estimation & Interpretation
  • Link to test of independence and log-linear model of independence
  • Multiple Logistic Regression with categorical explanatory variables
  • Multiple Binary Logistic Regression with a combination of categorical and continuous predictors
  • Model diagnostics

Objectives

  • Understand the basic ideas behind modeling categorical data with binary logistic regression.
  • Understand how to fit the model and interpret the parameter estimates, especially in terms of odds and odds ratios.
  • Understand the basic ideas behind modeling a binary response as a function of two or more categorical explanatory variables.
  • Understand the basic ideas behind modeling a binary response as a function of continuous and categorical explanatory variables.

Useful Links

Readings

  • Agresti (2007) Ch 3, Sec 3.1-3.2, Ch. 4, Ch. 5
  • Agresti (2013) Ch 4, Sec 4.1-4.2, Ch. 5, Ch. 6; more advanced Ch 4, Sec 4.4-4.7, and Ch. 7

To complete this lesson you should:

  • Read the online course material
  • Read the suggested textbook pages
  • Complete the discussion questions and exercises placed throughout the online Lesson 6 material