8.1 - Polytomous (Multinomial) Logistic Regression


We have already learned about binary logistic regression, where the response is a binary variable with 'success' and 'failure' as its only two categories. But logistic regression can be extended to handle responses, Y, that are polytomous, i.e., taking r > 2 categories. (Note: the word "polychotomous" is sometimes used, but this word does not exist!)

When r = 2, Y is dichotomous and we can model the log odds that an event occurs versus does not occur. For binary logistic regression there is only one logit that we can form:

\(\text{logit}(\pi)=\text{log}\left(\dfrac{\pi}{1-\pi}\right)\)

When r > 2, we have a multi-category or polytomous response variable. There are r(r − 1)/2 logits (odds) that we can form, but only (r − 1) of them are non-redundant. There are different ways to form a set of (r − 1) non-redundant logits, and these will lead to different polytomous (multinomial) logistic regression models.
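For example, taking the last category as the baseline, one common set of (r − 1) non-redundant logits (the baseline-category logits) is

\(\text{log}\left(\dfrac{\pi_1}{\pi_r}\right),\ \text{log}\left(\dfrac{\pi_2}{\pi_r}\right),\ \ldots,\ \text{log}\left(\dfrac{\pi_{r-1}}{\pi_r}\right)\)

Any other logit is a difference of two of these, e.g., \(\text{log}(\pi_1/\pi_2)=\text{log}(\pi_1/\pi_r)-\text{log}(\pi_2/\pi_r)\), which is why only (r − 1) of the r(r − 1)/2 possible logits are non-redundant.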

Multinomial logistic regression models how a multinomial response variable Y depends on a set of k explanatory variables, X = (X1, X2, ..., Xk). This is also a GLM where the random component assumes that the distribution of Y is Multinomial(n, \(\boldsymbol{\pi}\)), where \(\boldsymbol{\pi}\) is a vector of "success" probabilities for each category. The systematic component consists of the explanatory variables (which can be continuous, discrete, or both) and is linear in the parameters, e.g., β0 + β1x1 + ... + βkxk. As in linear regression, transformations of the X's themselves are allowed. The link function is the generalized logit, that is, the logit link applied to each of the non-redundant logits discussed above.
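As a rough illustration (not part of this lesson's own analyses), the sketch below simulates a three-category response and fits the model in Python with statsmodels' MNLogit, which parameterizes the model through baseline-category logits; the predictors, sample size, and "true" coefficients are all made up for the example.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, r = 200, 3                                   # 200 subjects, r = 3 response categories
x1 = rng.normal(size=n)                         # a continuous predictor
x2 = rng.integers(0, 2, size=n)                 # a binary predictor

# hypothetical true linear predictors for categories 1 and 2 versus baseline category 0
eta1 = 0.5 + 1.0 * x1 - 0.5 * x2
eta2 = -0.2 + 0.3 * x1 + 1.0 * x2
probs = np.column_stack([np.ones(n), np.exp(eta1), np.exp(eta2)])
probs /= probs.sum(axis=1, keepdims=True)       # inverse of the generalized logit link

y = np.array([rng.choice(r, p=p) for p in probs])

X = sm.add_constant(np.column_stack([x1, x2]))  # systematic component: beta0 + beta1*x1 + beta2*x2
fit = sm.MNLogit(y, X).fit()                    # estimates r - 1 = 2 non-redundant logits
print(fit.params)                               # one column of coefficients per logit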

When analyzing a polytomous response, it's important to note whether the response is ordinal (consisting of ordered categories) or nominal (consisting of unordered categories). For the binary logistic model, this question does not arise.

Some types of models are appropriate only for ordinal responses, e.g., the cumulative logits model, the adjacent-categories model, and the continuation-ratios model. Other models may be used whether the response is ordinal or nominal, e.g., the baseline-category logit model and the conditional logit model.

If the response is ordinal, we do not necessarily have to take the ordering into account, but this information is only very rarely ignored. Ordinality in the response is vital information; neglecting it almost always leads to sub-optimal models. Using the natural ordering can

  • lead to a simpler, more parsimonious model and
  • increase power to detect relationships with other variables.

If the response variable is polytomous and all the potential predictors are discrete as well, we could describe the multi-way contingency table by a loglinear model. However, if you are analyzing a set of categorical variables, and one of them is clearly a "response" while the others are predictors, I recommend that you use logistic rather than loglinear models. Fitting a loglinear model in this setting could have two disadvantages:

  • It has many more parameters, and many of them are not of interest. The loglinear model, as we will learn later, describes the joint distribution of all the variables, whereas the logistic model describes only the conditional distribution of the response given the predictors.
  • The loglinear model is often more complicated to interpret. In the loglinear model, the effect of a predictor X on the response Y is described by the XY association. In a logit model, however, the effect of X on Y is a main effect.

 


Grouped versus ungrouped response & the sampling model

As we have already pointed out in the lessons on logistic regression, data can come in ungrouped (e.g., database) form or grouped (e.g., tabular) form.

Consider a study that investigates cheese preference for four types of cheese; for the detailed analysis, see the Cheese Tasting example. The response variable Y is a Likert-scale response with nine categories:

Y = 1 for strong dislike,
Y = 2 for dislike,
...
Y = 9 for excellent taste.

The main predictor of interest is type of cheese (A, B, C and D). The data could arrive in ungrouped form, with one record per subject (as below) where the first column indicates the type of cheese and the second column the value of Y:

A 1
A 3
B 4
C 1
D 9
A 3
B 2
D 7
D 1
...
D 9
...

Or it could arrive in grouped form (e.g., table):

          Response category
Cheese    1    2    3    4    5    6    7    8    9
A         0    0    1    7    8    8   19    8    1
B         6    9   12   11    7    6    1    0    0
C         1    1    6    8   23    7    5    1    0
D         0    0    0    1    3    7   14   16   11
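As a small illustration, the ungrouped records can be collapsed into the grouped, tabular form with a cross-tabulation; the sketch below does this in Python with pandas, using only the first few ungrouped records listed above rather than the full cheese dataset.

import pandas as pd

# first few ungrouped records from the listing above (cheese, response)
ungrouped = pd.DataFrame({
    "cheese":   ["A", "A", "B", "C", "D", "A", "B", "D", "D"],
    "response": [ 1,   3,   4,   1,   9,   3,   2,   7,   1 ],
})

# grouped form: one row per cheese, one column per observed response category
grouped = pd.crosstab(ungrouped["cheese"], ungrouped["response"])
print(grouped)

# the row totals give the multinomial index n_i for each cheese
print(grouped.sum(axis=1))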

Sampling Model

In ungrouped form, the response occupies a single column of the dataset, but in grouped form the response occupies r columns. Most computer programs for polytomous logistic regression can handle grouped or ungrouped data.

Whether the data are grouped or ungrouped, we will imagine the response to be multinomial. That is, the "response" for row i,

\(y_i=(y_{i1},y_{i2},\ldots,y_{ir})^T \),

is assumed to have a multinomial distribution with index \(n_i=\sum_{j=1}^r y_{ij}\) and parameter

\(\pi_i=(\pi_{i1},\pi_{i2},\ldots,\pi_{ir})^T \).

For example, for the first row, cheese A, \(\pi_1=(\pi_{11},\pi_{12},\ldots,\pi_{19})^T\).

  • If the data are grouped, then ni is the total number of "trials" in the ith row of the dataset, and yij is the number of trials on which outcome j occurred. For example, for the first row, cheese A, n1 = 52; there are 0 people who strongly dislike this cheese (y11 = 0) and 0 who simply dislike it (y12 = 0).
  • If the data are ungrouped, then ni = 1, and yij = 1 if outcome j occurred for subject i and yij = 0 otherwise. Note, however, that we do not have to actually create a dataset with columns of 0's and 1's; a single column containing the response level 1, 2, . . . , r is sufficient, as in the cheese example above.
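Written out, the multinomial assumption for row i says that

\(P(y_{i1},y_{i2},\ldots,y_{ir})=\dfrac{n_i!}{y_{i1}!y_{i2}!\cdots y_{ir}!}\pi_{i1}^{y_{i1}}\pi_{i2}^{y_{i2}}\cdots \pi_{ir}^{y_{ir}}\)

In the ungrouped case, where ni = 1, this reduces to the single probability πij of the one category j that was observed.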

Describing polytomous responses by a sequence of binary models

In some cases, it makes sense to "factor" the response into a sequence of binary choices and model them with a sequence of ordinary logistic models. The number of binary logistic regressions needed is equal to the number of response categories minus one, i.e., r − 1.

For example, consider a medical study to investigate the long-term effects of radiation exposure on mortality. The response variable has four levels (Y=1 if alive, Y=2 if dead from cause other than cancer, Y=3 if dead from cancer other than leukemia, and Y=4 if dead from leukemia). The main predictor of interest is level of exposure (low, medium, high). The four-level response can be modeled via a single multinomial model, or as a sequence of binary choices in three stages:

[Figure: the four-level response factored into a three-stage sequence of binary splits: alive vs. dead; among the dead, cancer vs. other causes; among cancer deaths, leukemia vs. other cancers.]

  • The stage 1 model, which is fit to all subjects, describes the log-odds of death.
  • The stage 2 model, which is fit only to the subjects that die, describes the log-odds of death due to cancer versus death from other causes.
  • The stage 3 model, which is fit only to the subjects who die of cancer, describes the log-odds of death due to leukemia versus death due to other cancers.

Because the multinomial distribution can be factored into a sequence of conditional binomials, we can fit these three logistic models separately. The overall likelihood function factors into three independent likelihoods.
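For the four-level mortality response, for example, writing π1, . . . , π4 for the probabilities of the four outcomes, the conditional probabilities modeled at the three stages are

\(P(\text{death})=\pi_2+\pi_3+\pi_4,\quad P(\text{cancer}\mid\text{death})=\dfrac{\pi_3+\pi_4}{\pi_2+\pi_3+\pi_4},\quad P(\text{leukemia}\mid\text{cancer death})=\dfrac{\pi_4}{\pi_3+\pi_4}\)

and multiplying the fitted conditional probabilities recovers the multinomial probabilities, e.g., \(\pi_4=P(\text{death})\,P(\text{cancer}\mid\text{death})\,P(\text{leukemia}\mid\text{cancer death})\).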

This approach is attractive when the response can be naturally arranged as a sequence of binary choices. But in situations where arranging such a sequence is unnatural, we should probably fit a single multinomial model to the entire response. The cheese example above would not be a good candidate for the binary-sequence approach, since we are dealing with four very different cheese types.