6.2.4  Explanatory Variable with Multiple Levels
The concepts discussed with binary predictors extend to predictors with multiple levels. In this lesson we consider,
 Y_{i} binary response, X_{i} discrete explanatory variable (with k = 3 levels), and
 link to the analysis of 2 × 3 tables
but the basic ideas easily extend to any 2 × J table.
We begin by replicating the analysis of the original 3 × 2 tables with logistic regression.
How many parents smoke?

Student smokes?


Yes (Z = 1)

No (Z = 2)


Both (Y = 1)

400

1380

One (Y = 2)

416

1823

Neither (Y = 3)

188

1168

First, we reexpress the data in terms of y_{i }= number of smoking students, and n_{i }= number of students for the three groups based on parents behavior:
How many parents smoke?

Student smokes?


y_{i}

n_{i}


Both

400

1780

One

416

2239

Neither

188

1356

Then we decide on a baseline level for the explanatory variable X, and create k − 1 dummy indicators if X is a categorical variable with k levels.
For our example, let's parent smoking = Neither be a baseline, and define a pair of dummy indicators (or design variables) that takes one of two values,
X_{1} = 1 if parent smoking = One ,
X_{1} = 0 if otherwise,X_{2} = 1 if parent smoking = Both,
X_{2} = 0 if otherwise.
Let \(\dfrac{\pi}{1\pi}=\) odds of student smoking; notice that we dropped the index i denoting each individual student to simplify the notation  we are still modeling each student or rather a randomly chosen student from the same classification cell. Then the model
\(\text{log}\left(\dfrac{\pi}{1\pi}\right)=\beta_0+\beta_1 X_1+\beta_2 X_2\)
says that the logodds of a student smoking is β_{0} for parents smoking = neither, β_{0} + β_{1} for parents smoking = one and β_{0} + β_{2} for parents smoking = both, and exp(β_{j}) = conditional odds ratio for level j of predictor X versus the baseline. Therefore,
\begin{array}{rcl}
\beta_1 &=& \text{log_odds for }one  \text{log_odds for }neither \\
exp(\beta_1) &=& \dfrac{\text{odds smoking}one}{\text{odds smoking}neither} \\
\beta_2 &=& \text{log_odds for }both  \text{log_odds for }neither \\
exp(\beta_2) &=& \dfrac{\text{odds smoking}both}{\text{odds smoking}neither} \\
\dfrac{exp(\beta_1)}{exp(\beta_2)}&=& exp(\beta_1\beta_2)=\dfrac{\text{odds smoking}one}{\text{odds smoking}both}
\end{array}
and we expect to get \(\hat{\beta}_1=ln(1.42)=0.351\), e.g. 1.42=(416 × 1168)/(188 × 1823), and \(\hat{\beta}_2=ln(1.80)=0.588\), e.g., 1.80=400 × 1168/188 × 1380. The estimated intercept should be \(\hat{\beta}_0=ln(188/1168)=1.826\).
Fitting A Multilevel Predictor Model in SAS and R
There are different versions of SAS program on the course web site to fit this data. Here is one from smoke.sas where '0' = neither parent smokes, '1' = one smokes, and '2' = both smoke, and we use PROC LOGISTIC; notice we could use proc GENMOD too.
The option param=ref tells SAS to create a set of two dummy variables to distinguish among the three categories, where '0'=neither is a baseline because of option descending and ref=first (see the previous section for details).
Another way of doing the same in smoke.sas but using character labels (e..g, "both", "one", "neither") rather than numbers for the categories:
In the class statement, the option order=data tells SAS to sort the categories of S by the order in which they appear in the dataset rather than alphabetical order. The option ref='neither' makes neither the reference group (i.e. the group for which both dummy variables are zero). Let's look at some relevant portions of the output of smoke.lst that differ from the analysis of the corresponding 2 × 2 table from the previous section of the notes.
Fisher scoring is a variant of NewtonRaphson method for ML estimation. In logistic regression they are equivalent. Notice now there are 3 observations since we have 3 groupings by the levels of the explanatory variable.
From an explanatory variable S with 3 levels (0,1,2), we created two dummy variables, i.e., design variables:
X_{1} = 1 if parent smoking = One,
X_{1} = 0 otherwise,X_{2} = 1 if parent smoking=Both,
X_{2} = 0 otherwise.
Since parent smoking = Neither is equal 0 for both dummy variables, that's the baseline.
Here, we specified neither to be a reference variable. We used option data=order so that both takes values (1,0), and one (0,1). If we didn't use that option, the other levels would be set based on the alphabetical order, but in this case they coincide.
What are the parameter estimates? What is the fitted model?
Here the s_{1} and s_{2} correspond to x_{1} and x_{2} dummy variables. The saturated model is,
\(\text{logit}(\pi)=1.8266+0.3491 X_1+0.5882 X_2\)
and the estimated probability of a child smoking given the explanatory variable:
\(\hat{\pi}_i=\dfrac{\text{exp}(1.8266+0.3491 X_1+0.5882 X_2)}{1+\text{exp}(1.8266+0.3491 X_1+0.5882 X_2)}\)
For example, the predicted probability of a student smoking given that only one parent is smoking is:
\begin{align}
P(Y_i=1neither=0,one=1,both=0) &= P(Y_i=1X_1=1,X_2=0)\\
&= \dfrac{\text{exp}(1.8266+0.3491)}{1+\text{exp}(1.8266+0.3491)}\\
&= 0.1858\\
\end{align}
In SAS, if we include the statement
output out=predict pred=phat reschi=pearson resdev=deviance;
then SAS creates a new dataset called "results" that includes all of the variables in the original dataset, the predicted probabilities \(\hat{\pi}_i\), and the Pearson and deviance residuals. We can add some code to calculate and print out the estimated expected number of successes \(\hat{\mu}_i=n_i\hat{\pi}_i\) (e.g., "shat") and failures \(n_i\hat{\mu}_i=n_i(1\hat{\pi}_i)\) (e.g., "fhat").
Running this program gives a new output section:
Here "s" are the levels of the categorical predictor for parents' smoking behavior, "y" as before the number of students smoking for each level of the predictor, "n" the marginal counts for each level of the predictor", "prob" is the estimated probability of "success" (e.g. a student smoking given the level of the predictor), "shat" and "fhat" expected number of successes and failures respectively, and "pearson" and "deviance" are Pearson and Deviance residuals.
All of the "shat" and "fhat" values, that is predicted number of success and failures are greater than 5.0, so the χ^{2} approximation is trustworthy.
Below is the R code (smoke.R) that replicates the analysis of the original 2 × 3 table with logistic regression.
First, let’s see the table we created for the analysis.
You should again notice the difference in data input. In SAS, we input y and n as data columns; here we just input data columns as yes and no.
Next, the model information is displayed.
If you don’t specify the baseline using the “class” option, R will set the first level as the default. Here, it’s “0”. the parentsmoke1 and parentsmoke2 correspond to the X_{1} and X_{2} dummy variables. The saturated model is:
\(\text{logit}(\pi)=1.8266+0.3491 X_1+0.5882 X_2\)
and the estimated probability of a child smoking given the explanatory variable:
\(\hat{\pi}_i=\dfrac{\text{exp}(1.8266+0.3491 X_1+0.5882 X_2)}{1+\text{exp}(1.8266+0.3491 X_1+0.5882 X_2)}\)
For example, the predicted probability of a student smoking given that only one parent is smoking is:
\begin{align}
P(Y_i=1neither=0,one=1,both=0) &= P(Y_i=1X_1=1,X_2=0)\\
&= \dfrac{\text{exp}(1.8266+0.3491)}{1+\text{exp}(1.8266+0.3491)}\\
&= 0.1858\\
\end{align}
In R, deviance residuals are given directly in the output by summary() function.
> smoke.logistic<glm(response~parentsmoke, family=binomial(link=logit))
> summary(smoke.logistic)
Call:
glm(formula = response ~ parentsmoke, family = binomial(link = logit))
Deviance Residuals:
[1] 0 0 0
Coefficients:
Estimate Std. Error z value Pr(>z)
(Intercept) 1.82661 0.07858 23.244 < 2e16 ***
parentsmoke1 0.34905 0.09554 3.654 0.000259 ***
parentsmoke2 0.58823 0.09695 6.067 1.3e09 ***

We can also obtain the deviance and Pearson residuals by using residulas.glm() function. Do help(residuals.glm()) to study this object more.
To obtain deviance residuals
> residuals(smoke.logistic)
[1] 0 0 0
To obtain Person residuals:
> residuals(smoke.logistic, type="pearson")
1 2 3
2.440787e13 4.355932e13 1.477321e11
To obtain the predicted values, use function predict.glm() with which we can specify the type of predicted values we want.
type is the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of logodds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
For example, the code below gives predicted probabilities.
> predict(smoke.logistic, type="response")
1 2 3
0.2247191 0.1857972 0.1386431
All of the predicted number of success and failures are greater than 5.0 so the χ^{2} approximation is trustworthy.
Overall goodnessoffit
The goodnessoffit statistics Χ^{2} and G^{2} from this model are both zero, because the model is saturated. Suppose that we fit the interceptonly model as before by removing the predictors from the model statement:
The goodnessoffit statistics are shown below.
The Pearson statistic Χ^{2} = 37.5663 and the deviance G^{2} = 38.3658 are precisely equal to the usual Χ^{2} and G^{2} for testing independence in the 2 × 3 table.
Clearly, in this example, the saturated model fits perfectly and the independence model does not fit well.
We can use the HosmerLemeshow test to assess the overall fit of the model. As in the previous example, the Hosmer and Lemeshow statistic is not very meaningful since the number of groups is small.
The residual deviance is almost 0 because the model is saturated. We can find G^{2 }= 38.366 by null deviance, which are precisely equal to the usual G^{2} for testing independence in the 2 by 3 table. Since this statistics is large which leads to small pvalues, it provides evidence against the interceptonly model in favor of the current model. You can also find the same result in the output from the anova() part.
Clearly, in this example, the saturated model fits perfectly and the independence model does not fit well.
We can use the HosmerLemeshow test to assess the overall fit of the model. As in the previous example, the Hosmer and Lemeshow statistic is not very meaningful since the number of groups is small.
Here is a link to a program in R that produces this Hosmer and Lemeshow statistic (ROCandHL.R)
> ### TO use HomserLemeshow statistic (although not relevant here since the number of groups is small)
> ## First run ROCandHL.R script which has function hosmerlem() in it
> ## hosmerlem() takes the vector of successes, predicted vector of success and g=# of groups as input
> ## produce the vector of predicted success "yhat"
> yhat=rowSums(response)*predict(smoke.logistic, type="response")
> yhat
1 2 3
400 416 188
>
> hosmerlem(response[,1], yhat, g=3) ## here run 3 groups
X^2 Df P(>Chi)
"1.00593633284243e24" "1" "."
We have shown that analyzing a 2 × 3 table for associations is equivalent to a binary logistic regression with two dummy variables as predictors. For 2 × J tables, we would fit a binary logistic regression with J − 1 dummy variables.
Testing the Joint Significance of All Predictors.
In our model
\(\text{log}\left(\dfrac{\pi}{1\pi}\right)=\beta_0+\beta_1 X_1+\beta_2 X_2\)
Test H_{0} : β_{1} = β_{2} = 0 versus the alternative that at least one of the coefficients β_{1}, . . . , β_{k } is not zero.
In other words, this is testing the null hypothesis that an interceptonly model is correct,
\(\text{log}\left(\dfrac{\pi}{1\pi}\right)=\beta_0\)
versus the alternative that the current model is correct
\(\text{log}\left(\dfrac{\pi}{1\pi}\right)=\beta_0+\beta_1 X_1+\beta_2 X_2\)
This test has k degrees of freedom (e.g. the number of dummy or indicator variables, that is the number of βparameters (except the intercept).
Large chisquare statistics lead to small pvalues and provide evidence against the interceptonly model in favor of the current model.
Testing for an Arbitrary Group of Coefficients
For the previous test this would be equivalent to testing "interceptonly" model vs. full (saturated) model.
The likelihoodratio statistic is:
ΔG^{2} = −2 log L from reduced model
−(−2 log L from current model)
and the degrees of freedom is k (the number of coefficients in question). The pvalue is \(P(\chi^2_k \geq \Delta G^2)\).
For our example, ΔG^{2} = 5176.510 − 5138.144 = 38.3658 with df = 3 − 1 = 2.
Notice that this matches:
Likelihood Ratio 38.3658 2 <.0001
from the "Testing Global Hypothesis: BETA=0" section.
How would you test only for X_{2}? Which two models can you compare? How do you interpret this case? 
For example, here is the saturated model we just looked at in SAS.
Compare this to
model y/n = s1 /scale=none lackfit;
Testing Individual Parameters
Testing a hypothesis that the probability of the characteristic depends on the value of the jth variable.
Testing H_{0} : β_{j} = 0 versus H_{1} : β_{j} ≠ 0.
A value of z^{2} bigger than 3.84 indicates that we can reject the null hypothesis β_{j} = 0 at the .05level.
From the second row of this part of the ouput:
\(\beta_1:\left(\dfrac{0.3491}{0.0955}\right)^2=13.3481\)
A value of z bigger than 1.96 or small pvalue indicates that we can reject the null hypothesis β_{j} = 0 at the .05level.
Confidence Intervals: An approximate (1 − α) × 100% confidence interval for j is given by
\(\hat{\beta}_j \pm z_{(1\alpha/2)} \times SE(\hat{\beta}_j)\)
For example, a 95% CI for β_{1}
0.3491 ± 1.96 × 0.0955 = (0.16192, 0.5368)
Then, the 95% CI for the oddsratio of a student smoking if one parent is smoking in comparison to neither smoking is:
(exp(0.16192), exp(0.5368)) = (1.176, 1.710)
Student and parents' smoking behaviors are dependent. Furthermore,
 estimated conditional odds ratio of a student smoking between one parent smoking and neither smoking: exp(β_{1}) = exp(0.3491) = 1.418,
 estimated conditional odds ratio of a student smoking between both parents smoking and neither smoking: exp(β_{2}) = exp(0.5882) = 1.801, and
 estimated conditional odds ratio of a student smoking between both parents smoking and one smoking: exp(β_{2})/exp(β_{1}) = 1.801/1.418 = 1.27 = exp(β_{2} − β_{1}) = exp(0.5882 − 0.3491) = 1.27, i.e., a student who has both parents smoking is 27% more likely (on the odds scale) to smoke than the a student who has only one parent smoking.