6.4 - Summary Points for Logistic Regression

Logit models represent how a binary (or multinomial) response variable is related to a set of explanatory variables, which can be discrete and/or continuous. In this lesson we focused on binary logistic regression. Below is a brief summary, along with the connections to log-linear and probit models.


For a more detailed discussion, refer to Agresti (2007), Ch. 3; Agresti (2013), Ch. 4 (pages 115-118, 132-135); and/or McCullagh & Nelder (1989).

The Link between Logit and Log-Linear Models

Logit and log-linear models are related in the sense that log-linear models are more general than logit models, and some logit models are equivalent to certain log-linear models (e.g., consider the admissions data example or the Boy Scout example).

The Link between Logit and Probit Models

Both model how a binary response variable depends on a set of explanatory variables. They have the same systematic component, the linear predictor \(\beta_0+\beta x\), but they differ in the link function.

The logistic regression model

\(\text{logit}(\pi(x))=\text{log} \left(\dfrac{\pi(x)}{1-\pi(x)}\right)=\beta_0+\beta x\)

uses the logistic cumulative distribution function (cdf).
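As a quick sketch of the logit link and its inverse (the coefficient values below are made up purely for illustration, and SciPy is assumed to be available):

```python
import numpy as np
from scipy.special import logit, expit  # expit is the logistic cdf, the inverse of logit

# Hypothetical coefficients, for illustration only
beta0, beta = -1.0, 0.5
x = np.array([0.0, 1.0, 2.0, 3.0])

# Linear predictor on the log-odds scale: logit(pi(x)) = beta0 + beta * x
eta = beta0 + beta * x

# Invert the link to recover fitted probabilities pi(x) = 1 / (1 + exp(-eta))
pi = expit(eta)

# Round-trip check: applying the link to the probabilities recovers the linear predictor
assert np.allclose(logit(pi), eta)
```

The key point is that the model is linear on the log-odds scale, while the fitted probabilities are obtained by passing the linear predictor through the logistic cdf.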

The probit model

\(\text{probit}(\pi(x))=\beta_0+\beta x\)

uses the standard normal cdf \(\Phi\); the probit link

\(\text{probit}(\pi)=\Phi^{-1}(\pi)\)

is the inverse of the standard normal cdf. For example, probit(0.975) = 1.96, probit(0.950) = 1.64, and probit(0.5) = 0.
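These probit values can be checked directly with SciPy, where the inverse of the standard normal cdf is the quantile function `norm.ppf`:

```python
from scipy.stats import norm

# probit(pi) is the standard normal quantile function (inverse cdf)
probit = norm.ppf

print(round(probit(0.975), 2))  # 1.96
print(round(probit(0.950), 2))  # 1.64
print(round(probit(0.5), 2))    # 0.0
```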

Fitted values between these two models are often very similar. Rarely does one of these models fit substantially better (or worse) than the other, although more difference can be observed with sparse data.
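One way to see why the fits are so similar: a rescaled logistic cdf tracks the standard normal cdf closely over the whole real line (the rescaling factor of about 1.7 used below is a common rule of thumb, not part of either model):

```python
import numpy as np
from scipy.special import expit   # logistic cdf
from scipy.stats import norm      # standard normal cdf

z = np.linspace(-4, 4, 801)

# The logistic cdf, with its argument scaled by ~1.7, stays within a couple of
# percentage points of the standard normal cdf everywhere, which is why logit
# and probit models usually give very similar fitted probabilities.
diff = np.abs(expit(1.7 * z) - norm.cdf(z))
print(diff.max() < 0.02)  # True
```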

Why does this work?

Think back to introductory statistics and the normal approximation to the binomial distribution: the logistic and standard normal cdfs have very similar shapes, so the two link functions typically produce similar fitted values.

For more on probit models, see Agresti (2007), Section 3.2.4, or Agresti (2013), Section 6.6.
