# 5.4.5 - Cochran-Mantel-Haenszel Test

The Cochran-Mantel-Haenszel (CMH) test is another way to test for conditional independence in 2 × 2 × K tables, by combining the associations observed in the K partial tables. Recall that the null hypothesis of conditional independence is equivalent to the statement that all conditional odds ratios given the levels k = 1, ..., K are equal to 1, i.e.,

H0 : θAB(1) = θAB(2) = ... = θAB(K) = 1

The Cochran-Mantel-Haenszel (CMH) test statistic is

$$M^2=\dfrac{[\sum_k(n_{11k}-\mu_{11k})]^2}{\sum_k Var(n_{11k})}$$

where $$\mu_{11k}=E(n_{11k})=\frac{n_{1+k}n_{+1k}}{n_{++k}}$$ is the expected frequency of the first cell in the kth partial table assuming conditional independence holds, and the variance of cell (1, 1) is

$$Var(n_{11k})=\dfrac{n_{1+k}n_{2+k}n_{+1k}n_{+2k}}{n^2_{++k}(n_{++k}-1)}.$$
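The formula for M² can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the SAS or R output: the function name `cmh_statistic` and the counts in `toy_tables` are hypothetical, each partial table is assumed to be given as `[[n11, n12], [n21, n22]]`, and no continuity correction is applied.

```python
from math import erfc, sqrt

def cmh_statistic(tables):
    """Return (M2, p_value) for a list of 2x2 partial tables."""
    diff_sum = 0.0  # running sum of n11k - mu11k over strata
    var_sum = 0.0   # running sum of Var(n11k) over strata
    for (n11, n12), (n21, n22) in tables:
        row1, row2 = n11 + n12, n21 + n22   # n_{1+k}, n_{2+k}
        col1, col2 = n11 + n21, n12 + n22   # n_{+1k}, n_{+2k}
        n = row1 + row2                     # n_{++k}
        mu11 = row1 * col1 / n
        var11 = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
        diff_sum += n11 - mu11
        var_sum += var11
    m2 = diff_sum ** 2 / var_sum
    # Upper tail of a chi-squared distribution with df = 1
    p_value = erfc(sqrt(m2 / 2))
    return m2, p_value

# Hypothetical counts, for illustration only
toy_tables = [
    [[10, 5], [5, 10]],
    [[8, 4], [6, 12]],
]
m2, p_value = cmh_statistic(toy_tables)
print(f"M^2 = {m2:.4f}, p-value = {p_value:.4f}")
```

Note that the deviations are summed across strata before squaring, which is why the statistic has a single degree of freedom regardless of K.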

### Properties of the CMH statistic

• For large samples, when H0 is true, the CMH statistic has an approximate chi-squared distribution with df = 1.
• If all θAB(k) = 1, then the CMH statistic is close to zero.
• If some or all θAB(k) > 1 and none are below 1, then the CMH statistic tends to be large.
• If some or all θAB(k) < 1 and none are above 1, then the CMH statistic tends to be large.
• If some θAB(k) < 1 and others θAB(k) > 1, then CMH is NOT an appropriate test, because the deviations n11k − μ11k have opposite signs and cancel in the numerator; that is, the test works well only if the conditional odds ratios are in the same direction and comparable in size.
• The CMH test can be generalized to I × J × K tables (see Agresti (2007), Sec. 7.3.5-7.3.7 or Agresti (2013), Sec. 8.4.3-8.4.4). The generalization varies depending on the nature of the variables:
  • the general association statistic treats both variables as nominal and thus has df = (I − 1) × (J − 1)
  • the row mean scores differ statistic treats the row variable as nominal and the column variable as ordinal, and has df = I − 1
  • the nonzero correlation statistic treats both variables as ordinal, and has df = 1
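The warning about conditional odds ratios in opposite directions can be checked numerically. The sketch below uses hypothetical counts and a hypothetical helper `cmh_m2`, which implements the formula for M² without a continuity correction. It compares two strata with the same association against two strata whose associations are mirror images of each other.

```python
def cmh_m2(tables):
    """CMH statistic M^2 for a list of 2x2 tables [[n11, n12], [n21, n22]]."""
    diff_sum = 0.0
    var_sum = 0.0
    for (n11, n12), (n21, n22) in tables:
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        n = row1 + row2
        diff_sum += n11 - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return diff_sum ** 2 / var_sum

# Hypothetical strata: odds ratio 4 in both partial tables
same_direction = [
    [[20, 10], [10, 20]],
    [[20, 10], [10, 20]],
]
# Hypothetical strata: odds ratio 4 in one table and 1/4 in the other
opposite_direction = [
    [[20, 10], [10, 20]],
    [[10, 20], [20, 10]],
]

m2_same = cmh_m2(same_direction)
m2_opposite = cmh_m2(opposite_direction)
print(f"same direction:     M^2 = {m2_same:.3f}")
print(f"opposite direction: M^2 = {m2_opposite:.3f}")
```

In the second case each partial table shows a strong association, yet the deviations of +5 and −5 cancel in the numerator and the statistic is zero. The test then fails to detect the conditional dependence.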

### Common odds-ratio estimate

As we have seen before, it’s always informative to have a summary estimate of strength of association (rather than just a hypothesis test). If the associations are similar across the partial tables, we can summarize them with a single value: an estimate of the common odds ratio, which for a 2 × 2 × K table equals:

$$\hat{\theta}_{MH}=\dfrac{\sum_k(n_{11k}n_{22k})/n_{++k}}{\sum_k(n_{12k}n_{21k})/n_{++k}}$$

This is a useful summary statistic, especially if the model of homogeneous associations holds, as we will see in the next section.
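As a rough sketch of the Mantel-Haenszel estimator, the function below accumulates the numerator and denominator sums stratum by stratum. The name `mantel_haenszel_or` and the counts are hypothetical; the toy strata were chosen so that both partial tables have an odds ratio of exactly 4.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio for 2x2 tables [[n11, n12], [n21, n22]]."""
    num = 0.0  # sum over k of n11k * n22k / n++k
    den = 0.0  # sum over k of n12k * n21k / n++k
    for (n11, n12), (n21, n22) in tables:
        n = n11 + n12 + n21 + n22
        num += n11 * n22 / n
        den += n12 * n21 / n
    return num / den

# Hypothetical counts, for illustration only
toy_tables = [
    [[10, 5], [5, 10]],   # stratum odds ratio: (10*10)/(5*5) = 4
    [[8, 4], [6, 12]],    # stratum odds ratio: (8*12)/(4*6) = 4
]

# Odds ratio within each stratum, for comparison with the pooled estimate
stratum_ors = [(t[0][0] * t[1][1]) / (t[0][1] * t[1][0]) for t in toy_tables]
common_or = mantel_haenszel_or(toy_tables)
print("stratum odds ratios:", stratum_ors)
print("Mantel-Haenszel estimate:", round(common_or, 4))
```

When the stratum odds ratios agree, the common estimate reproduces them. When they differ, it is a weighted compromise, which is why it is most meaningful under homogeneous association.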

### Example - Boy Scouts and Juvenile Delinquency

For the boy scout data, using the first method of doing individual chi-squared tests in each conditional table, we concluded that B (boy scout status) and D (delinquency) are independent given S (socioeconomic status). Here we repeat our analysis using the CMH statistic.

In the SAS program file boys.sas, the cmh option (e.g., tables SES*scouts*delinquent / chisq cmh) produces summary statistics output that includes the CMH statistics:

The general association statistic, CMH = 0.0080, is very close to zero, which indicates that the conditional independence model is a good fit for these data; i.e., we cannot reject the null hypothesis.

Since the hypothesis of conditional independence is tenable, so is θBD(high) = θBD(mid) = θBD(low) = 1. The association can be summarized with the Mantel-Haenszel estimate of the common odds ratio, 0.978, with a 95% CI of (0.597, 1.601).

Since θBD(high) ≈ θBD(mid) ≈ θBD(low), the CMH test, which has a single degree of freedom, is typically more powerful than the Pearson chi-squared statistic we calculated in the previous section, X² = 0.160.

In R, the corresponding function is mantelhaen.test(), which is used in the file boys.R as shown below:

Here is the output:

It gives the same value as SAS (Mantel-Haenszel X² = 0.008, df = 1, p-value = 0.9287). R computes only the general association version of the CMH statistic, which treats both variables as nominal. The statistic is very close to zero, which indicates that the conditional independence model is a good fit for these data; i.e., we cannot reject the null hypothesis.

The R output leads to the same conclusions as the SAS output: conditional independence is tenable, and the association can be summarized with the common odds ratio estimate of 0.978, with a 95% CI of (0.597, 1.601).
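The quoted values can also be checked by hand from the formulas in this section. The sketch below is not the contents of boys.sas or boys.R. The counts are an assumption: they are the boy scout counts as recalled from the course dataset, and they should be verified against the data file before being relied on. Cell (1, 1) is taken to be scouts who are delinquent, and no continuity correction is applied.

```python
from math import erfc, sqrt

# ASSUMED counts (verify against the course data file):
# each table is [[scout & delinquent, scout & not], [non-scout & delinquent, non-scout & not]]
boys = {
    "low":  [[11, 43], [42, 169]],
    "mid":  [[14, 104], [20, 132]],
    "high": [[8, 196], [2, 59]],
}

diff_sum = var_sum = 0.0   # pieces of the CMH statistic
num = den = 0.0            # pieces of the Mantel-Haenszel odds ratio
for (n11, n12), (n21, n22) in boys.values():
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    n = row1 + row2
    diff_sum += n11 - row1 * col1 / n
    var_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    num += n11 * n22 / n
    den += n12 * n21 / n

m2 = diff_sum ** 2 / var_sum       # CMH statistic, no continuity correction
p_value = erfc(sqrt(m2 / 2))       # chi-squared upper tail, df = 1
or_mh = num / den                  # common odds ratio estimate

print(f"CMH = {m2:.4f}, p-value = {p_value:.4f}, common OR = {or_mh:.3f}")
```

If the assumed counts are right, the result should agree with the CMH statistic, p-value, and common odds ratio reported in the SAS and R output. The confidence interval is not reproduced here, since it needs a variance estimate for the log odds ratio that this section does not give.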