# 5.4.5 - Cochran-Mantel-Haenszel Test

This is another way to test for conditional independence: by exploring associations in the partial tables of a 2 × 2 × *K* table. Recall that the null hypothesis of conditional independence is equivalent to the statement that **all** conditional odds ratios given the levels *k* are equal to 1, i.e.,

*H*_{0} : θ_{AB(1)} = θ_{AB(2)} = ... = θ_{AB(K)} = 1

The Cochran-Mantel-Haenszel (*CMH*) test statistic is

\(M^2=\dfrac{[\sum_k(n_{11k}-\mu_{11k})]^2}{\sum_k Var(n_{11k})}\)

where \(\mu_{11k}=E(n_{11k})=\frac{n_{1+k}n_{+1k}}{n_{++k}}\) is the expected frequency of the first cell in the *k*th partial table assuming that conditional independence holds, and the variance of cell (1, 1) is

\(Var(n_{11k})=\dfrac{n_{1+k}n_{2+k}n_{+1k}n_{+2k}}{n^2_{++k}(n_{++k}-1)}\).
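The two formulas above translate directly into a few lines of code. The following is a minimal sketch in plain Python, assuming each partial table is given as `[[n11, n12], [n21, n22]]`; the function name `cmh_statistic` and the example counts are hypothetical and chosen only for illustration.

```python
def cmh_statistic(tables):
    """Return M^2 for a list of 2x2 partial tables [[n11, n12], [n21, n22]]."""
    diff_sum = 0.0  # accumulates n11k - mu11k over the K strata
    var_sum = 0.0   # accumulates Var(n11k) over the K strata
    for (n11, n12), (n21, n22) in tables:
        row1, row2 = n11 + n12, n21 + n22  # n_{1+k}, n_{2+k}
        col1, col2 = n11 + n21, n12 + n22  # n_{+1k}, n_{+2k}
        n = row1 + row2                    # n_{++k}
        diff_sum += n11 - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return diff_sum ** 2 / var_sum


# Hypothetical counts for K = 2 strata (illustration only, not from the text).
example_tables = [
    [[10, 20], [5, 25]],
    [[15, 15], [8, 22]],
]
m2 = cmh_statistic(example_tables)
print(f"M^2 = {m2:.4f}")
```

Note that the deviations are summed across strata before squaring, which is why the sign of each stratum's deviation matters for this statistic.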

### Properties of the CMH statistic

- For large samples, when *H*_{0} is true, *CMH* has a chi-squared distribution with *df* = 1.
- If all θ_{AB(k)} = 1, then *CMH* is close to zero.
- If some or all θ_{AB(k)} > 1, then *CMH* is large.
- If some or all θ_{AB(k)} < 1, then *CMH* is large.
- If some θ_{AB(k)} < 1 and others θ_{AB(k)} > 1, *CMH* is NOT an appropriate test; that is, the test works well if the conditional odds ratios are in the same direction and comparable in size.
- The *CMH* can be generalized to *I* × *J* × *K* tables (see Agresti (2007), Sec. 7.3.5-7.3.7 or Agresti (2013), Sec. 8.4.3-8.4.4). The generalization varies depending on the nature of the variables:
  - the general association statistic treats both variables as nominal and thus has *df* = (*I* − 1) × (*J* − 1)
  - the row mean scores differ statistic treats the row variable as nominal and the column variable as ordinal, and has *df* = *I* − 1
  - the nonzero correlation statistic treats both variables as ordinal, and has *df* = 1
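The warning about odds ratios in opposite directions can be made concrete with a small numerical sketch. The counts below are hypothetical and constructed so that one stratum has a strong positive association and the other an equally strong negative one; the helper names `cmh_statistic` and `odds_ratio` are assumptions introduced for this illustration.

```python
def cmh_statistic(tables):
    """M^2 for a list of 2x2 partial tables [[n11, n12], [n21, n22]]."""
    diff_sum, var_sum = 0.0, 0.0
    for (n11, n12), (n21, n22) in tables:
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        n = row1 + row2
        diff_sum += n11 - row1 * col1 / n  # signed deviation in this stratum
        var_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return diff_sum ** 2 / var_sum


def odds_ratio(table):
    """Sample odds ratio n11*n22 / (n12*n21) of a single 2x2 table."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)


# Hypothetical strata with associations in opposite directions.
opposite_tables = [
    [[20, 10], [10, 20]],  # positive association
    [[10, 20], [20, 10]],  # negative association of the same strength
]
stratum_ors = [odds_ratio(t) for t in opposite_tables]
m2_opposite = cmh_statistic(opposite_tables)
print("conditional odds ratios:", stratum_ors)
print("M^2:", m2_opposite)
```

In this constructed case the positive and negative deviations cancel in the numerator, so the statistic gives no sign of an association even though neither conditional odds ratio is near 1.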

### Common odds-ratio estimate

As we have seen before, it’s always informative to have a summary estimate of the strength of association (rather than just a hypothesis test). If the associations are similar across the partial tables, we can summarize them with a single value: **an estimate of the common odds ratio**, which for a 2 × 2 × *K* table equals:

\(\hat{\theta}_{MH}=\dfrac{\sum_k(n_{11k}n_{22k})/n_{++k}}{\sum_k(n_{12k}n_{21k})/n_{++k}}\)

This is a useful summary statistic, especially if the model of *homogeneous associations* holds, as we will see in the next section.
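The Mantel-Haenszel estimate above is a ratio of two weighted sums, which the following minimal Python sketch computes. The function name `mantel_haenszel_or` and the counts are hypothetical, used only to show the arithmetic.

```python
def mantel_haenszel_or(tables):
    """Common odds ratio estimate for 2x2 partial tables [[n11, n12], [n21, n22]]."""
    num = 0.0  # sum over k of n11k * n22k / n++k
    den = 0.0  # sum over k of n12k * n21k / n++k
    for (n11, n12), (n21, n22) in tables:
        n = n11 + n12 + n21 + n22
        num += n11 * n22 / n
        den += n12 * n21 / n
    return num / den


# Hypothetical counts for K = 2 strata (illustration only, not from the text).
example_tables = [
    [[10, 20], [5, 25]],
    [[15, 15], [8, 22]],
]
theta_mh = mantel_haenszel_or(example_tables)
print(f"common odds ratio estimate = {theta_mh:.3f}")
```

Because each stratum contributes in proportion to its size, the estimate sits between the stratum-specific odds ratios rather than being a simple average of them.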

### Example - Boy Scouts and Juvenile Delinquency

For the boy scout data, based on the first method of doing individual chi-squared tests in each conditional table, we concluded that *B* and *D* are independent given *S*. Here we repeat our analysis using the CMH statistic.

In the SAS program file boys.sas, the **cmh** option (e.g., **tables SES*scouts*delinquent / chisq cmh**) adds the *CMH* statistics to the summary output.

The *general association* statistic is small, *CMH* = 0.0080, which is very close to zero and indicates that the conditional independence model is a good fit for these data; i.e., we cannot reject the null hypothesis.

The hypothesis of conditional independence is tenable; thus θ_{BD(high)} = θ_{BD(mid)} = θ_{BD(low)} = 1 is also tenable. The output also shows that the association can be summarized with a common odds ratio estimate of 0.978, with a 95% CI of (0.597, 1.601).

Since θ_{BD(high)} ≈ θ_{BD(mid)} ≈ θ_{BD(low)}, the *CMH* is typically a more powerful statistic than the Pearson chi-squared statistic we calculated in the previous section, *X*^{2} = 0.160.
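As a cross-check on the reported values, the statistic and the common odds ratio can be recomputed by hand. The sketch below is plain Python and rests on an assumption: the counts are taken to be the boy scout table from the earlier lesson (they are not listed in this section), with rows scout yes/no and columns delinquent yes/no within each SES level. The function names are hypothetical, and no continuity correction is applied.

```python
from math import erfc, sqrt


def cmh_statistic(tables):
    """M^2 for a list of 2x2 partial tables [[n11, n12], [n21, n22]]."""
    diff_sum, var_sum = 0.0, 0.0
    for (n11, n12), (n21, n22) in tables:
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        n = row1 + row2
        diff_sum += n11 - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return diff_sum ** 2 / var_sum


def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio estimate."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / den


# ASSUMED counts (not shown in this section): rows = scout yes/no,
# columns = delinquent yes/no, one 2x2 table per SES level.
boys = {
    "low": [[11, 43], [42, 169]],
    "mid": [[14, 104], [20, 132]],
    "high": [[8, 196], [2, 59]],
}
tables = list(boys.values())

m2_boys = cmh_statistic(tables)
# Upper-tail probability of a chi-squared variable with 1 df.
p_boys = erfc(sqrt(m2_boys / 2))
or_boys = mantel_haenszel_or(tables)
print(f"CMH = {m2_boys:.4f}, p-value = {p_boys:.4f}, common OR = {or_boys:.3f}")
```

If the assumed counts are right, the printed values should land close to the reported *CMH* of 0.0080 and the common odds ratio of 0.978. The confidence interval is not reproduced here because its variance formula is not given in this section.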

In R, the corresponding function is mantelhaen.test(), which is used in the file boys.R.

It gives the same value as SAS (Mantel-Haenszel *X*^{2} = 0.008, df = 1, p-value = 0.9287). It computes only the *general association* version of the CMH statistic, which treats both variables as nominal. The value is very close to zero and indicates that the conditional independence model is a good fit for these data; i.e., we cannot reject the null hypothesis.
