8.4 - The Proportional-Odds Cumulative Logit Model

Cumulative-logit Models for Ordinal Responses

The proportional-odds cumulative logit model is possibly the most popular model for ordinal data. This model uses cumulative probabilities up to a threshold, thereby collapsing the whole range of ordinal categories into a binary outcome at that threshold. Let the response be Y = 1, 2, ..., J, where the ordering is natural. The associated probabilities are {π1, π2, ..., πJ}, and the cumulative probability of a response less than or equal to j is:

$P(Y \leq j)=\pi_1+\ldots+\pi_j$

Then a cumulative logit is defined as

$\text{log}\left(\dfrac{P(Y \leq j)}{P(Y > j)}\right)=\text{log}\left(\dfrac{P(Y \leq j)}{1-P(Y \leq j)}\right)=\text{log}\left(\dfrac{\pi_1+\ldots+\pi_j}{\pi_{j+1}+\ldots+\pi_J}\right)$

This is the log-odds of the event Y ≤ j. It compares two cumulative probabilities, one of the less-than-or-equal type and the other of the greater-than type, and so measures how likely the response is to be in category j or below versus in a category higher than j.
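As a small numerical illustration of the definition above, the following sketch computes the cumulative logits from a vector of category probabilities. The function name `cumulative_logits` and the probabilities are hypothetical, chosen only for this example.

```python
import math

def cumulative_logits(probs):
    """Return the J-1 cumulative logits for category probabilities pi_1..pi_J."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    logits = []
    cum = 0.0
    for p in probs[:-1]:   # P(Y <= J) = 1, so the last category has no logit
        cum += p           # running value of P(Y <= j)
        logits.append(math.log(cum / (1.0 - cum)))
    return logits

# Hypothetical probabilities for J = 4 ordered categories
pi = [0.1, 0.2, 0.3, 0.4]
L = cumulative_logits(pi)
print(L)
```

Because P(Y ≤ j) can only grow with j, the resulting logits are always increasing in j.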

The sequence of cumulative logits may be defined as follows, where r denotes the number of response categories (r = J above):

\begin{array}{rcl}
L_1 &=& \text{log} \left(\dfrac{\pi_1}{\pi_2+\pi_3+\cdots+\pi_r}\right)\\
L_2 &=& \text{log} \left(\dfrac{\pi_1+\pi_2}{\pi_3+\pi_4+\cdots+\pi_r}\right)\\
& \vdots &  \\
L_{r-1} &=& \text{log} \left(\dfrac{\pi_1+\pi_2+\cdots+\pi_{r-1}}{\pi_r}\right)
\end{array}

In this notation, Lj is the log-odds of falling into or below category j versus falling above it. Suppose we incorporate covariates into the model, like this:

\begin{array}{rcl}
L_1 &=& \beta_{10}+\beta_{11}X_1+\cdots+\beta_{1p}X_p\\
L_2 &=& \beta_{20}+\beta_{21}X_1+\cdots+\beta_{2p}X_p\\
& \vdots &  \\
L_{r-1} &=& \beta_{r-1,0}+\beta_{r-1,1}X_1+\cdots+\beta_{r-1,p}X_p\\
\end{array}

Notice that (unlike the adjacent-category logit model) this is not a linear reparameterization of the baseline-category model. The cumulative logits are not simple differences between the baseline-category logits. Therefore, the above model will not give a fit equivalent to that of the baseline-category model.

Now suppose that we simplify the model by requiring the coefficient of each X-variable to be identical across the r − 1 logit equations. Then, changing the names of the intercepts to α's, the model becomes:

\begin{array}{rcl}
L_1 &=& \alpha_1+\beta_1X_1+\cdots+\beta_p X_p\\
L_2 &=& \alpha_2+\beta_1X_1+\cdots+\beta_p X_p\\
& \vdots &  \\
L_{r-1} &=& \alpha_{r-1}+\beta_1X_1+\cdots+\beta_p X_p
\end{array}

This model, called the proportional-odds cumulative logit model, has (r − 1) intercepts plus p slopes, for a total of r + p − 1 parameters to be estimated.

Notice that the intercepts can differ, but the slope for each variable stays the same across the different equations. One may think of this as a set of parallel lines (or hyperplanes) with different intercepts. The proportional-odds condition forces the lines corresponding to each cumulative logit to be parallel.
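To make the parallel-lines structure concrete, here is a minimal sketch that turns a set of intercepts and common slopes into category probabilities. All names and numeric values (`category_probs`, `alphas`, `betas`) are hypothetical and chosen only for illustration; they are not estimates from any data set.

```python
import math

def expit(t):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-t))

def category_probs(alphas, betas, x):
    """Category probabilities when logit P(Y <= j) = alpha_j + sum_k beta_k * x_k."""
    eta = sum(b * xk for b, xk in zip(betas, x))        # common linear part
    cum = [expit(a + eta) for a in alphas] + [1.0]      # P(Y <= j) for j = 1..r
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

alphas = [-1.0, 0.5, 2.0]   # hypothetical intercepts (r - 1 = 3), increasing in j
betas = [0.8, -0.3]         # hypothetical common slopes (p = 2)
probs = category_probs(alphas, betas, x=[1.0, 2.0])
n_params = len(alphas) + len(betas)   # (r - 1) + p = r + p - 1
print(probs, n_params)
```

The intercepts are taken to be increasing in j because the cumulative probabilities P(Y ≤ j) must be nondecreasing; with common slopes, this guarantees positive category probabilities at every x.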

Interpretation

• In this model, intercept αj is the log-odds of falling into or below category j when X1 = X2 = · · · = 0.
• A single parameter βk describes the effect of Xk on Y: βk is the increase in the log-odds of falling into or below any category associated with a one-unit increase in Xk, holding all the other X-variables constant. Compare this to the baseline-category logit model, where there are J-1 parameters for a single explanatory variable. Therefore, a positive slope indicates a tendency for the response level to decrease as the variable increases.
• Constant slope βk: the effect of Xk is the same for all J-1 ways to collapse Y into dichotomous outcomes.

For simplicity, let's consider only one predictor: $\text{logit}[P(Y \leq j)]=\alpha_j+\beta x$

Then the cumulative probabilities are given by: $P(Y \leq j)=\text{exp}(\alpha_j+\beta x)/(1+\text{exp}(\alpha_j+\beta x))$ and since β is the same for every j, the curves of cumulative probabilities plotted against x all have the same shape and differ only by a horizontal shift; in that sense they are parallel.

• The log of the odds ratio comparing x1 with x2 is proportional to the difference between x1 and x2, with β as the constant of proportionality. The odds ratio itself is exp[β(x1-x2)], and thus the name "proportional odds model".
Do you see how we get the above measure of odds ratio?
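One way to see this, as a short sketch using the single-predictor model $\text{logit}[P(Y \leq j)]=\alpha_j+\beta x$: take the log of the ratio of the cumulative odds at x1 to the cumulative odds at x2, and note that the intercept cancels.

\begin{array}{rcl}
\text{log}\left(\dfrac{P(Y \leq j \mid x_1)/P(Y > j \mid x_1)}{P(Y \leq j \mid x_2)/P(Y > j \mid x_2)}\right) &=& \text{logit}[P(Y \leq j \mid x_1)]-\text{logit}[P(Y \leq j \mid x_2)]\\
&=& (\alpha_j+\beta x_1)-(\alpha_j+\beta x_2)\\
&=& \beta(x_1-x_2)
\end{array}

Exponentiating both sides gives the odds ratio exp[β(x1-x2)], which does not depend on j.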

Continuous Latent Response

One reason for the proportional-odds cumulative-logit model's popularity is its connection to the idea of a continuous latent response. Suppose that the categorical outcome is actually a categorized version of an unobservable (latent) continuous variable.

For example, it is reasonable to think that a 5-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree) is a coarsened version of a continuous variable Z indicating degree of approval. The continuous scale is divided into five regions by four cut-points c1, c2, c3, c4 which are determined by nature (not by the investigator). If Z ≤ c1 we observe Y = 1; if c1 < Z ≤ c2 we observe Y = 2; and so on. Here is the connection: Suppose that Z is related to the X's through a homoscedastic linear regression. For example, with a single X, the values of Z scatter with constant spread around a straight line in X, and the cut-points divide that scatter into the observed categories.

If the regression of Z on the X's has the form

$Z=\gamma_0+\gamma_1 X_1+\gamma_2 X_2+\ldots+\gamma_p X_p+\epsilon$,

where ε is a random error from a logistic distribution with mean zero and constant variance, then the coarsened version Y will be related to the X's by a proportional-odds cumulative logit model. (The logistic distribution has a bell-shaped density similar to a normal curve. If we were to have normal errors rather than logistic errors, the cumulative logit equations would change to have a probit link. In most cases, the fits of the logit and probit models are quite similar.)
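The connection can be checked with a small simulation. The sketch below assumes a single predictor, standard logistic errors (scale 1), and hypothetical cut-points and coefficients; none of these values come from the course data. Under these assumptions P(Y ≤ j) = P(Z ≤ cj), so the cumulative logit at x is (cj − γ0) − γ1x: the intercepts are αj = cj − γ0 and the common slope is β = −γ1.

```python
import math
import random

def simulate_cumulative_logits(x, cuts, gamma0, gamma1, n, rng):
    """Empirical logit of P(Y <= j) at a fixed x, where Y is Z cut at `cuts`."""
    counts = [0] * len(cuts)
    for _ in range(n):
        u = rng.random()
        while u == 0.0:                    # avoid log(0) in the inverse CDF
            u = rng.random()
        eps = math.log(u / (1.0 - u))      # standard logistic error
        z = gamma0 + gamma1 * x + eps      # latent continuous response
        for j, c in enumerate(cuts):
            if z <= c:
                counts[j] += 1             # Z <= c_j is the event Y <= j
    return [math.log(k / (n - k)) for k in counts]

rng = random.Random(0)
cuts = [-1.0, 0.0, 1.0, 2.0]               # hypothetical cut-points c_1..c_4
gamma0, gamma1 = 0.0, 1.0                  # hypothetical regression coefficients
logits_x0 = simulate_cumulative_logits(0.0, cuts, gamma0, gamma1, 100_000, rng)
logits_x1 = simulate_cumulative_logits(1.0, cuts, gamma0, gamma1, 100_000, rng)

# Change in each cumulative logit for a one-unit increase in x
shifts = [b - a for a, b in zip(logits_x0, logits_x1)]
print(shifts)
```

Up to simulation noise, every entry of `shifts` should be close to −γ1, which is the same for all four cut-points. That common shift is the proportional-odds property.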

If the regression of Z on the X's is heteroscedastic—for example, if the variance increases with the mean—then the logit equations will "fan out" and not have constant slope. A model with non-constant slopes is somewhat undesirable, because the lines are not parallel; the logit lines will eventually cross each other, implying negative probabilities for some categories.
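The crossing problem can be illustrated numerically. The sketch below uses two hypothetical non-parallel logit lines for a three-category response (the intercepts and slopes are made up for this example) and evaluates the implied probability of the middle category on either side of the crossing point.

```python
import math

def expit(t):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical non-parallel cumulative logit lines for a 3-category response
alpha1, beta1 = -1.0, 1.0    # logit P(Y <= 1) = alpha1 + beta1 * x
alpha2, beta2 = 1.0, 0.5     # logit P(Y <= 2) = alpha2 + beta2 * x

x_cross = (alpha2 - alpha1) / (beta1 - beta2)   # x at which the two lines meet

def implied_pi2(x):
    """Implied P(Y = 2) = P(Y <= 2) - P(Y <= 1)."""
    return expit(alpha2 + beta2 * x) - expit(alpha1 + beta1 * x)

for x in (0.0, x_cross, 6.0):
    print(f"x = {x:.1f}  implied P(Y = 2) = {implied_pi2(x):+.4f}")
```

Beyond the crossing point, the fitted P(Y ≤ 1) exceeds the fitted P(Y ≤ 2), so the implied probability of category 2 is negative.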

Two examples of fitting this model in SAS and R are discussed next. For more details on this and related multinomial models, see Agresti (2007, Sec 6.2 and Sec 6.3) or Agresti (2013, Sec 8.2 and Sec 8.3).

If time permits, you should also read and listen to the Case Study: The Ice Cream Study at Penn State, where Dr. Bill Harkness unravels the "mystery" of polytomous logistic regression (through SAS, although we do provide the R code too). While doing this, please try to note:

(1) Why does Dr. Harkness say the ordinary chi-square test is not sufficient for this type of data?

(2) Why is this particular example quadratic in nature?

(3) Can we use the ordinary regression model for this data? And at which point would it be OK to approximate a categorical response variable as continuous; e.g. with how many levels?

(4) Does the proportional odds model for the ice cream data fit well?