10.2.5 - Conditional Independence


Here two of the variables are conditionally independent given the third; the possible models are (DA, SA), (DS, SA), and (DS, DA). Consider modeling D as independent of A given S, i.e., the (DS, SA) model.

Objective: test whether D and A are conditionally independent given S.

Main assumptions:

• The N = IJK counts in the cells are assumed to be independent observations of a Poisson random variable, and
• there is no three-way interaction among the three categorical variables, i.e., $\lambda_{ijk}^{DSA}=0$ for all i, j, k, and
• there is no partial DA interaction, $\lambda_{ik}^{DA}=0$ for all i, k; equivalently, the conditional DA odds ratio equals 1 at every level j of S:
  $\theta_{DA(j)} = \dfrac{\mu_{ijk}\mu_{i'jk'}}{\mu_{i'jk}\mu_{ijk'}}=1$

Model Structure:

$\text{log}(\mu_{ijk})=\lambda+\lambda_i^D+\lambda_j^S+\lambda_k^A+\lambda_{ij}^{DS}+\lambda_{jk}^{SA}$
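Because this model is decomposable, its maximum likelihood fitted values have a closed form, $\hat\mu_{ijk} = n_{ij+}n_{+jk}/n_{+j+}$. The minimal R sketch below checks that glm() reproduces it. It assumes berk.data is the built-in UCBAdmissions table converted with as.data.frame(); the notes do not show how berk.data was created, so treat that step as an assumption.

```r
# Assumption: berk.data is the built-in UCBAdmissions table as a data frame
# (columns Admit, Gender, Dept, Freq); the notes do not show this step.
berk.data <- as.data.frame(UCBAdmissions)

# (DS, SA) model: D = Dept, S = Gender, A = Admit
fit <- glm(Freq ~ Admit + Gender + Dept + Gender:Dept + Gender:Admit,
           family = poisson(), data = berk.data)

# Margins needed for the closed-form MLE mu_ijk = n_ij+ * n_+jk / n_+j+
n.DS <- xtabs(Freq ~ Dept + Gender, data = berk.data)   # n_ij+
n.SA <- xtabs(Freq ~ Gender + Admit, data = berk.data)  # n_+jk
n.S  <- xtabs(Freq ~ Gender, data = berk.data)          # n_+j+

# Look up the margins for every cell (character-matrix indexing by dimnames)
dept   <- as.character(berk.data$Dept)
gender <- as.character(berk.data$Gender)
admit  <- as.character(berk.data$Admit)
mu.closed <- n.DS[cbind(dept, gender)] * n.SA[cbind(gender, admit)] /
  as.numeric(n.S[gender])

# Largest discrepancy between the closed form and the IRLS fit (numerically ~0)
max(abs(mu.closed - fitted(fit)))
```

Iteratively fitting the Poisson GLM is not actually needed for this model, but the comparison shows that the two routes give the same $\hat\mu_{ijk}$.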

Parameter estimates and interpretation:

This model implies that the partial odds ratios are characterized by the two-way interaction terms: the association between D and S does NOT depend on the level of A, and the association between S and A does NOT depend on the level of D.
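To see where these statements come from, substitute the model structure into the log of each conditional odds ratio. Every main-effect term and every unrelated interaction term cancels in pairs:

$$
\begin{aligned}
\log\theta_{DS(k)} &= \log\mu_{ijk}+\log\mu_{i'j'k}-\log\mu_{i'jk}-\log\mu_{ij'k} = \lambda_{ij}^{DS}+\lambda_{i'j'}^{DS}-\lambda_{i'j}^{DS}-\lambda_{ij'}^{DS}\\
\log\theta_{SA(i)} &= \log\mu_{ijk}+\log\mu_{ij'k'}-\log\mu_{ij'k}-\log\mu_{ijk'} = \lambda_{jk}^{SA}+\lambda_{j'k'}^{SA}-\lambda_{j'k}^{SA}-\lambda_{jk'}^{SA}\\
\log\theta_{DA(j)} &= \log\mu_{ijk}+\log\mu_{i'jk'}-\log\mu_{i'jk}-\log\mu_{ijk'} = 0
\end{aligned}
$$

The first right-hand side contains no k and the second contains no i, so these odds ratios are the same at every level of the conditioning variable; the third line is why the model forces $\theta_{DA(j)}=1$.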

$\theta_{DS(A=reject)}=\theta_{DS(A=accept)}=\dfrac{\text{exp}(\lambda_{ij}^{DS})\text{exp}(\lambda_{i'j'}^{DS})}{\text{exp}(\lambda_{i'j}^{DS})\text{exp}(\lambda_{ij'}^{DS})}$

$\theta_{SA(D="A")}=\ldots=\theta_{SA(D="F")}=\dfrac{\text{exp}(\lambda_{jk}^{SA})\text{exp}(\lambda_{j'k'}^{SA})}{\text{exp}(\lambda_{j'k}^{SA})\text{exp}(\lambda_{jk'}^{SA})}$
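The same result can be checked numerically. This minimal R sketch (again assuming berk.data is as.data.frame(UCBAdmissions)) reshapes the fitted counts into an Admit x Gender x Dept array and computes the fitted Gender-Admit odds ratio in each department, plus the fitted Dept-Admit odds ratio for departments A vs B within each gender. The choice of departments A and B is arbitrary; any pair gives 1.

```r
# Assumption: berk.data is as.data.frame(UCBAdmissions), as in the notes' R code
berk.data <- as.data.frame(UCBAdmissions)
fit <- glm(Freq ~ Admit + Gender + Dept + Gender:Dept + Gender:Admit,
           family = poisson(), data = berk.data)

# as.data.frame() lists cells in array order (Admit fastest, then Gender, then Dept),
# so the fitted values can be reshaped back into an Admit x Gender x Dept array
mu <- array(fitted(fit), dim = dim(UCBAdmissions),
            dimnames = dimnames(UCBAdmissions))

# Fitted SA (Gender-Admit) odds ratio within each department
or.SA <- apply(mu, 3, function(t) (t["Admitted", "Male"] * t["Rejected", "Female"]) /
                                  (t["Admitted", "Female"] * t["Rejected", "Male"]))

# Fitted DA odds ratio (Dept A vs B, Admitted vs Rejected) within each gender
or.DA <- apply(mu, 2, function(t) (t["Admitted", "A"] * t["Rejected", "B"]) /
                                  (t["Rejected", "A"] * t["Admitted", "B"]))

round(or.SA, 4)  # the same value in every department (up to rounding)
round(or.DA, 4)  # equal to 1 for both genders
```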

How do these relate to the partial and marginal tables? How would you interpret the parameters? Do you notice any similarity to the parameter estimates from the joint independence model? What is the estimated coefficient of the DS association for department A? Update the given SAS or R code to fit the above model, run it, and SUBMIT your answers on ANGEL; the notes below may help.

In SAS, this model can be fitted as:

Model Fit:

The goodness-of-fit statistics indicate that the model does not fit.

How did we get these DF?

df = (IJK-1) - ((I-1) + (J-1) + (K-1) + (I-1)(J-1) + (J-1)(K-1)) = J(I-1)(K-1)
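For the Berkeley data, with I = 6 departments, J = 2 genders, and K = 2 admission outcomes, the count works out as follows. The SA interaction is indexed by j and k in the model structure, so it contributes (J-1)(K-1) parameters:

$$
\begin{aligned}
df &= (IJK-1) - \big[(I-1)+(J-1)+(K-1)+(I-1)(J-1)+(J-1)(K-1)\big]\\
   &= (24-1) - (5+1+1+5+1) = 23 - 13 = 10 = J(I-1)(K-1) = 2\cdot 5\cdot 1,
\end{aligned}
$$

which agrees with the 10 residual degrees of freedom reported by R for this model.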

So where is the lack of fit?

As before, look at the residuals. For example, the adjusted residual for the first cell is -12.14792, a large deviation from zero.
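An adjusted residual is the Pearson residual divided by sqrt(1 - h), where h is the cell's leverage. The minimal R sketch below computes them for every cell, assuming berk.data is as.data.frame(UCBAdmissions). SAS may order the cells differently, so its "first cell" need not be row 1 here.

```r
# Assumption: berk.data is as.data.frame(UCBAdmissions); the SAS run may order
# cells differently, so "first cell" there need not be row 1 here.
berk.data <- as.data.frame(UCBAdmissions)
fit <- glm(Freq ~ Admit + Gender + Dept + Gender:Dept + Gender:Admit,
           family = poisson(), data = berk.data)

# Adjusted residual = Pearson residual / sqrt(1 - leverage); the Poisson
# dispersion is fixed at 1, so rstandard(type = "pearson") returns exactly this
adj <- rstandard(fit, type = "pearson")
adj.by.hand <- residuals(fit, type = "pearson") / sqrt(1 - hatvalues(fit))

# Cells with the largest adjusted residuals (biggest lack of fit)
head(cbind(berk.data, adj = round(adj, 2))[order(-abs(adj)), ], 6)
```

Sorting by absolute size is one simple way to locate the cells driving the lack of fit.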

We can also assess each term's contribution and its significance:

This is like an ANOVA table in ANOVA and regression models. All parameters are significantly different from zero; that is, they contribute significantly to describing the relationships among our variables, but the model's overall lack of fit suggests that they are not sufficient.

In R, the (DS, SA) model can be fitted as:

berk.cind=glm(berk.data$Freq~berk.data$Admit+berk.data$Gender+berk.data$Dept+berk.data$Gender*berk.data$Dept+berk.data$Gender*berk.data$Admit, family=poisson())

Model Fit:

The goodness-of-fit statistics indicate that the model does not fit, e.g., Residual deviance:  783.6  on 10  degrees of freedom
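The goodness-of-fit statistics can also be computed directly from the fitted model. This minimal R sketch again assumes berk.data is as.data.frame(UCBAdmissions). It returns the deviance G^2, the Pearson X^2, the residual df, and their upper-tail chi-squared p-values.

```r
# Assumption: berk.data is as.data.frame(UCBAdmissions)
berk.data <- as.data.frame(UCBAdmissions)
fit <- glm(Freq ~ Admit + Gender + Dept + Gender:Dept + Gender:Admit,
           family = poisson(), data = berk.data)

G2     <- deviance(fit)                            # likelihood-ratio statistic G^2
X2     <- sum(residuals(fit, type = "pearson")^2)  # Pearson chi-squared X^2
res.df <- df.residual(fit)                         # 24 cells - 14 parameters

# Upper-tail p-values from the chi-squared reference distribution
p.G2 <- pchisq(G2, res.df, lower.tail = FALSE)
p.X2 <- pchisq(X2, res.df, lower.tail = FALSE)
c(G2 = G2, X2 = X2, df = res.df, p.G2 = p.G2, p.X2 = p.X2)
```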

How did we get these DF?

df = (IJK-1) - ((I-1) + (J-1) + (K-1) + (I-1)(J-1) + (J-1)(K-1)) = J(I-1)(K-1)

So where is the lack of fit?

As before, look at the residuals. For example, the Pearson residual for the first cell is 7.551454, a large deviation from zero.
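The first row of as.data.frame(UCBAdmissions) is the cell (Admitted, Male, Dept A), so this residual can be reproduced by hand from the closed-form fitted value $\hat\mu = n_{ij+}n_{+jk}/n_{+j+}$. The minimal R sketch below assumes, as before, that berk.data was built with as.data.frame(UCBAdmissions).

```r
# Assumption: berk.data is as.data.frame(UCBAdmissions)
berk.data <- as.data.frame(UCBAdmissions)
fit <- glm(Freq ~ Admit + Gender + Dept + Gender:Dept + Gender:Admit,
           family = poisson(), data = berk.data)

with(berk.data, {
  # Margins for the first cell: Dept A males, admitted males, all males
  n.AM   <<- sum(Freq[Dept == "A" & Gender == "Male"])          # 825
  n.MAdm <<- sum(Freq[Gender == "Male" & Admit == "Admitted"])  # 1198
  n.M    <<- sum(Freq[Gender == "Male"])                        # 2691
})

mu1 <- n.AM * n.MAdm / n.M                    # closed-form fitted count
r1  <- (berk.data$Freq[1] - mu1) / sqrt(mu1)  # Pearson residual (obs - fit)/sqrt(fit)
c(by.hand = r1, glm = unname(residuals(fit, type = "pearson")[1]))
```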

We can also assess each term's contribution and its significance by running the command anova(berk.cind):

Analysis of Deviance Table

Model: poisson, link: log

Response: berk.data$Freq

Terms added sequentially (first to last)

                                 Df Deviance Resid. Df Resid. Dev
NULL                                                23    2650.10
berk.data$Admit                   1   230.03        22    2420.07
berk.data$Gender                  1   162.87        21    2257.19
berk.data$Dept                    5   159.52        16    2097.67
berk.data$Gender:berk.data$Dept   5  1220.61        11     877.06
berk.data$Admit:berk.data$Gender  1    93.45        10     783.61

This is like an ANOVA table in ANOVA and regression models. All terms are significant; that is, each contributes significantly to describing the relationships among our variables, but the model's overall lack of fit suggests that together they are not sufficient. If you compare this with the ANOVA table from the SAS output given above, some of the values differ because we entered the variables into the model in a different order. R adds the terms sequentially, so each row depends on the terms already in the model, just as with sequential sums of squares in linear regression. The values also differ because R reports the change in deviance (the likelihood-ratio statistic, in the "Deviance" column) for each term, whereas SAS reports the usual chi-squared statistics. To compute the p-value for, e.g., the last term, use 1-pchisq(93.45, 1). It is essentially 0, indicating that this term contributes significantly to the fit of the model.
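R can attach these p-values to every row of the table. The minimal sketch below assumes berk.data is as.data.frame(UCBAdmissions) and passes it through the data argument. That only shortens the term labels; the sequential deviances are the same as with the berk.data$ form.

```r
# Assumption: berk.data is as.data.frame(UCBAdmissions). Using data = gives shorter
# term labels than berk.data$..., but the same sequential deviances.
berk.data <- as.data.frame(UCBAdmissions)
berk.cind <- glm(Freq ~ Admit + Gender + Dept + Gender:Dept + Gender:Admit,
                 family = poisson(), data = berk.data)

# Sequential (first to last) analysis of deviance with chi-squared p-values
tab <- anova(berk.cind, test = "Chisq")
tab

# p-value for the last term (Admit:Gender, 1 df) computed directly.
# lower.tail = FALSE avoids the cancellation in 1 - pchisq(), which returns
# exactly 0 here because pchisq(93.45, 1) rounds to 1 in double precision.
p.last <- pchisq(93.45, df = 1, lower.tail = FALSE)
p.last
```

Either form leads to the same conclusion; the lower.tail = FALSE version simply reports how small the p-value is instead of rounding it to 0.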