The Wilcoxon Signed Rank Test for a Median

Developed in 1945 by the statistician Frank Wilcoxon, the signed rank test was one of the first "nonparametric" procedures developed. It is considered a nonparametric procedure, because we make only two simple assumptions about the underlying distribution of the data, namely that:

(1) the random variable X is continuous

(2) the probability density function of X is symmetric

Then, upon taking a random sample X1, X2, ..., Xn, we are interested in testing the null hypothesis that the median m equals a specified value m0:

\(H_0 : m=m_0\)

against any of the possible alternative hypotheses:

\(H_A : m > m_0\)   or   \(H_A : m < m_0\)   or   \(H_A : m \ne m_0\)

As we often do, let's motivate the procedure by way of example.

Example

Let Xi denote the length, in centimeters, of a randomly selected pygmy sunfish, i = 1, 2, ..., 10. If we obtain the following data set:

5.0   3.9   5.2   5.5   2.8   6.1   6.4   2.6   1.7   4.3

can we conclude that the median length of pygmy sunfish differs significantly from 3.7 centimeters?

Solution. We are interested in testing the null hypothesis H0: m = 3.7 against the alternative hypothesis HA: m ≠ 3.7. In general, the Wilcoxon signed rank test procedure requires five steps. We'll introduce each of the steps as we apply them to the data in this example.

Step #1. In general, calculate Xi − m0 for i = 1, 2, ..., n. In this case, we have to calculate Xi − 3.7 for i = 1, 2, ..., 10:

data

Step #2. In general, calculate the absolute value of Xi − m0, that is, |Xi − m0| for i = 1, 2, ..., n. In this case, we have to calculate |Xi − 3.7| for i = 1, 2, ..., 10:

data

Step #3. Determine the rank Ri, i = 1, 2, ..., n of the absolute values (in ascending order) according to their magnitude. In this case, the value of 0.2 is the smallest, so it gets rank 1. The value of 0.6 is the next smallest, so it gets rank 2. We continue ranking the data in this way until we have assigned a rank to each of the data values:

data

Step #4. Determine the value of W, the Wilcoxon signed rank test statistic:

\[ W=\sum_{i=1}^{n}Z_i R_i\]

where Zi is an indicator variable with Zi = 0 if Xi − m0 is negative and Zi = 1 if Xi − m0 is positive. That is, with Zi defined as such, W is then the sum of the positive signed ranks. In this case, because the first observation yields a positive X1 − 3.7, namely 1.3, Z1 = 1. Because the fifth observation yields a negative X5 − 3.7, namely −0.9, Z5 = 0. Determining Zi as such for i = 1, 2, ..., 10, we get:

data

And, therefore W equals 40:

\[ W=(1)(5)+(1)(1)+ \cdots +(0)(8)+(1)(2) =5+1+6+7+9+10+2=40\]
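To make Steps #1 through #4 concrete, here is a minimal Python sketch that reproduces the calculation for the sunfish data. The variable names (lengths, m0, ranks, and so on) are our own choices for illustration, and the simple ranking used here assumes there are no tied absolute deviations and no deviations equal to zero, which happens to be true for this data set.

```python
# Sunfish lengths (cm) from the example and the hypothesized median.
lengths = [5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3]
m0 = 3.7

# Step 1: signed deviations from the hypothesized median.
diffs = [x - m0 for x in lengths]

# Step 2: absolute deviations.
abs_diffs = [abs(d) for d in diffs]

# Step 3: ranks of the absolute deviations (1 = smallest).
# Assumption: no ties and no zero deviations, so a plain sort suffices.
order = sorted(range(len(abs_diffs)), key=lambda i: abs_diffs[i])
ranks = [0] * len(abs_diffs)
for rank, i in enumerate(order, start=1):
    ranks[i] = rank

# Step 4: indicators Z_i and the statistic W = sum of Z_i * R_i.
z = [1 if d > 0 else 0 for d in diffs]
w = sum(zi * ri for zi, ri in zip(z, ranks))

# Print one row per observation: X_i, X_i - m0, |X_i - m0|, R_i, Z_i.
for x, d, a, r, zi in zip(lengths, diffs, abs_diffs, ranks, z):
    print(f"{x:4.1f}  {d:5.1f}  {a:4.1f}  {r:2d}  {zi}")
print("W =", w)
```

The last line should report W = 40, in agreement with the hand calculation.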

Step #5. Determine if the observed value of W is extreme in light of the assumed value of the median under the null hypothesis. That is, calculate the P-value associated with W, and make a decision about whether to reject or not to reject. Whoa, nellie!  We're going to have to take a break from this example before we can finish, as we first have to learn something about the distribution of W.

The Distribution of W

As is always the case, in order to find the distribution of the discrete random variable W, we need:

(1) to find the range of possible values of W, that is, we need to specify the support of W

(2) to determine the probability that W takes on each of the values in the support

Let's tackle the support of W first. Well, the smallest that \(W=\sum_{i=1}^{n}Z_i R_i\) could be is 0. That would happen if each observation Xi fell below the value of the median m0 specified in the null hypothesis, thereby causing Zi = 0, for i = 1, 2, ..., n:

drawing

The largest that \(W=\sum_{i=1}^{n}Z_i R_i\) could be is \(\frac{n(n+1)}{2}\). That would happen if each observation fell above the value of the median m0 specified in the null hypothesis, thereby causing Zi = 1, for i = 1, 2, ..., n:

drawing

and therefore W reduces to the sum of the integers from 1 to n:

\[W=\sum_{i=1}^{n}Z_i R_i=\sum_{i=1}^{n}i=\frac{n(n+1)}{2}\]

So, in summary, W is a discrete random variable whose support ranges between 0 and n(n+1)/2.

Now, if we have a small sample size n, such as we do in the above example, we could use the exact probability distribution of W to calculate the P-values for our hypothesis tests. Errr.... first we have to determine the exact probability distribution of W. Doing so is very doable. It just takes some thinking and perhaps a bit of tedious work. Let's make our discussion concrete by considering a very small sample size, n = 3, say. In that case, the possible values of W are the integers 0, 1, 2, 3, 4, 5, 6. Now, each of the three data points would be assigned a rank Ri of either 1, 2, or 3, and depending on whether the data point fell above or below the hypothesized median m0, each of the three possible ranks 1, 2, or 3 would remain either a positive signed rank or become a negative signed rank. In this case, because we are considering such a small sample size, we can easily enumerate each of the possible outcomes, as well as sum W of the positive ranks to see how each arrangement results in one of the possible values of W:

W when n equals 3

There we have it. We're just about done with finding the exact probability distribution of W when n = 3. All we have to do is recognize that under the null hypothesis, each of the above eight arrangements (columns) is equally likely. Therefore, we can use the classical approach to assigning the probabilities. That is:

  • P(W = 0) = 1/8, because there is only one way that W = 0
  • P(W = 1) = 1/8, because there is only one way that W = 1
  • P(W = 2) = 1/8, because there is only one way that W = 2
  • P(W = 3) = 2/8, because there are two ways that W = 3
  • P(W = 4) = 1/8, because there is only one way that W = 4
  • P(W = 5) = 1/8, because there is only one way that W = 5
  • P(W = 6) = 1/8, because there is only one way that W = 6

And, just to make sure that we haven't made an error in our calculations, we can verify that the sum of the probabilities over the support 0, 1, ..., 6 is indeed 1/8 + 1/8 + 1/8 + 2/8 + 1/8 + 1/8 + 1/8 = 1.
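If you'd rather not enumerate the arrangements by hand, the same brute-force idea can be sketched in a few lines of Python. The function name enumerate_w is hypothetical, chosen just for this illustration; it lists every one of the 2^n ways of including or excluding each rank and tallies the resulting values of W.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def enumerate_w(n):
    """Exact null distribution of W, found by listing all 2**n sign patterns."""
    counts = Counter()
    for signs in product([0, 1], repeat=n):
        # signs[i] plays the role of Z_i for the observation with rank i + 1.
        w = sum(z * rank for z, rank in zip(signs, range(1, n + 1)))
        counts[w] += 1
    total = 2 ** n
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}

pmf3 = enumerate_w(3)
for w, p in pmf3.items():
    print(w, p)
```

Fractions are printed in lowest terms, so the entry for W = 3 shows up as 1/4 rather than 2/8. This approach is only practical for small n, because the number of arrangements doubles with each additional observation.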

Hmmm. That was easy enough. Let's do the same thing for a sample size of n = 4. Well, in that case, the possible values of W are the integers 0, 1, 2, ..., 10. Now, each of the four data points would be assigned a rank Ri of either 1, 2, 3, or 4, and depending on whether the data point fell above or below the hypothesized median m0, each of the four possible ranks 1, 2, 3, or 4 would remain either a positive signed rank or become a negative signed rank. Again, because we are considering such a small sample size, we can easily enumerate each of the possible outcomes, as well as the sum W of the positive ranks, to see how each arrangement results in one of the possible values of W:

W when n equals 4

Again, under the null hypothesis, each of the above 16 arrangements is equally likely, so we can use the classical approach to assigning the probabilities:

  • P(W = 0) = 1/16, because there is only one way that W = 0
  • P(W = 1) = 1/16, because there is only one way that W = 1
  • P(W = 2) = 1/16, because there is only one way that W = 2
  • P(W = 3) = 2/16, because there are two ways that W = 3
  • and so on...
  • P(W = 9) = 1/16, because there is only one way that W = 9
  • P(W = 10) = 1/16, because there is only one way that W = 10

Do you want to do the calculation for the case where n = 5? Here's what the enumeration of possible outcomes looks like:

W when n equals 5

After having worked through finding the exact probability distribution of W for the cases where n = 3, 4, and 5, we should be able to make some generalizations. First, note that, in general, there are \(2^n\) ways to assign a + or a − to the n ranks, and therefore the probability that W takes on a particular value w is:

\[P(W=w)=f(w)=\frac{c(w)}{2^n}\]

where c(w) = the number of possible ways to assign a + or a − to the first n integers so that \(\sum_{i=1}^{n}Z_i R_i=w\). 
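For larger n, listing all \(2^n\) arrangements gets tedious, but c(w) can be counted without listing them. The sketch below is one possible way to do it, not necessarily how published tables were produced: it treats c(w) as the number of subsets of {1, 2, ..., n} that sum to w and builds the counts up one rank at a time. The names count_ways and exact_pmf are our own.

```python
def count_ways(n):
    """Return a list c where c[w] = number of subsets of {1, ..., n} summing to w."""
    max_w = n * (n + 1) // 2
    c = [0] * (max_w + 1)
    c[0] = 1  # the empty subset gives W = 0
    for rank in range(1, n + 1):
        # Go from high to low so each rank is used at most once per subset.
        for w in range(max_w, rank - 1, -1):
            c[w] += c[w - rank]
    return c

def exact_pmf(n):
    """Return f(w) = c(w) / 2**n for w = 0, 1, ..., n(n+1)/2."""
    return [cw / 2 ** n for cw in count_ways(n)]

print(count_ways(3))   # counts for w = 0, 1, ..., 6
print(count_ways(4))   # counts for w = 0, 1, ..., 10
print(exact_pmf(4)[:2])
```

For n = 3 the counts should match the enumeration above (one way each, except two ways for W = 3), and for any n the counts should add up to \(2^n\).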

Okay, now that we have the general idea of how to determine the exact probability distribution of W, we can breathe a sigh of relief when it comes to actually analyzing a set of data. That's because someone else has done the dirty work for us for sample sizes n = 3, 4, ..., 12, and published the relevant results in a statistical table of W. (Our textbook authors chose not to include such a table in our textbook.) By relevant, I mean the probabilities in the "tails" of the distribution of W. After all, that's what P-values generally are, that is, probabilities in the tails of the distribution under the null hypothesis.

As the table of W suggests, our determination of the probability distribution of W when n = 4 agrees with the results published in the table:

table

because both we and the table claim that:

\[P(W \le 0)=P(W \ge 10)=0.062\]

and:

\(P(W \le 1)=P(W =0)+P(W =1)=0.062+0.062=0.125\)

\(P(W \ge 9)=P(W =9)+P(W =10)=0.062+0.062=0.125\)

Okay, it should be pretty obvious that working with the exact distribution of W is going to be pretty limiting when it comes to large sample sizes. In that case, we do what we typically do when we have large sample sizes, namely use an approximate distribution of W.

Theorem. When the null hypothesis is true, for large n:

\[W'={\sum_{i=1}^{n}Z_i R_i - \frac{n(n+1)}{4} \over \sqrt{\frac{n(n+1)(2n+1)}{24}}}\]

follows an approximate standard normal distribution N(0, 1).

Proof. Because the Central Limit Theorem is at work here, the approximate standard normal distribution part of the theorem is trivial. Our proof therefore reduces to showing that the mean and variance of W are:

\(E(W)=\frac{n(n+1)}{4}\)   and   \(Var(W)=\frac{n(n+1)(2n+1)}{24}\)

respectively. To find E(W) and Var(W), note that \(W=\sum_{i=1}^{n}Z_i R_i\) has the same distribution as \(U=\sum_{i=1}^{n}U_i\) where:

  • Ui = 0 with probability ½
  • Ui = i with probability ½

In case that claim was less than obvious, consider this intuitive, hand-waving kind of argument:

  • W and U are both sums of a subset of the numbers 1, 2, ..., n
  • Under symmetry, an equally likely chance of getting assigned either a + or a − is equivalent to having an equally likely chance of being included in the sum or not.

At any rate, we therefore have:

\[E(W)=E(U)=\sum_{i=1}^{n}E(U_i)=\sum_{i=1}^{n}\left[0\left(\frac{1}{2}\right)+i\left(\frac{1}{2}\right) \right]=\frac{1}{2}\sum_{i=1}^{n}i=\frac{1}{2}\times\frac{n(n+1)}{2}=\frac{n(n+1)}{4} \]

and:

\[Var(W) =Var(U)=\sum_{i=1}^{n}Var(U_i)\]

because the Ui's are independent under the null hypothesis. Now:

\[Var(U_i) = E(U_{i}^{2})-\left[E(U_i)\right]^2 = \left[0^2\left(\frac{1}{2}\right)+i^2\left(\frac{1}{2}\right) \right]-\left(\frac{i}{2}\right)^2 = \frac{i^2}{2}-\frac{i^2}{4} = \frac{i^2}{4}\]

and therefore:

\[Var(W)=\sum_{i=1}^{n}Var(U_i)=\sum_{i=1}^{n}\frac{i^2}{4}=\frac{1}{4}\sum_{i=1}^{n}i^2=\frac{1}{4}\times\frac{n(n+1)(2n+1)}{6}=\frac{n(n+1)(2n+1)}{24}  \]

Therefore, in summary, under the null hypothesis, we have that:

\[W'=\frac{\sum_{i=1}^{n}Z_i R_i - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}  \]

follows an approximate standard normal distribution, as was to be proved.
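As a sanity check on the mean and variance formulas (not a substitute for the proof), we can compute E(W) and Var(W) exactly for a few small n by brute force and compare them with n(n+1)/4 and n(n+1)(2n+1)/24. The helper name exact_mean_var is hypothetical, and exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import product

def exact_mean_var(n):
    """Exact mean and variance of W under the null, by full enumeration."""
    values = [
        sum(z * rank for z, rank in zip(signs, range(1, n + 1)))
        for signs in product([0, 1], repeat=n)
    ]
    total = len(values)  # 2**n equally likely arrangements
    mean = Fraction(sum(values), total)
    var = Fraction(sum(v * v for v in values), total) - mean ** 2
    return mean, var

for n in range(3, 9):
    mean, var = exact_mean_var(n)
    # Compare against the formulas derived in the proof.
    assert mean == Fraction(n * (n + 1), 4)
    assert var == Fraction(n * (n + 1) * (2 * n + 1), 24)
    print(n, mean, var)
```

For n = 3, for instance, this gives a mean of 3 and a variance of 7/2, which is what the formulas predict.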

Let's return to our example now to complete our work.

Example (continued)

Let Xi denote the length of a randomly selected pygmy sunfish, i = 1, 2, ..., 10. If we obtain the following data set:

5.0   3.9   5.2   5.5   2.8   6.1   6.4   2.6   1.7   4.3

can we conclude that the median length of pygmy sunfish differs significantly from 3.7 centimeters?

Solution. Recall that we are interested in testing the null hypothesis H0: m = 3.7 against the alternative hypothesis HA: m ≠ 3.7. The last time we worked on this example, we got as far as determining that W = 40 for the given data set. Now, we just have to use what we know about the distribution of W to complete our hypothesis test. Well, in this case, with n = 10, our sample size is fairly small so we can use the exact distribution of W. The upper and lower percentiles of the Wilcoxon signed rank statistic when n = 10 are:

W table when n is 10

Therefore, our P-value is 2 × 0.116 = 0.232. Because our P-value is large, we cannot reject the null hypothesis. There is insufficient evidence at the 0.05 level to conclude that the median length of pygmy sunfish differs significantly from 3.7 centimeters.
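If a table of W isn't handy, the tail probability can be computed directly from P(W = w) = c(w)/2^n. Here is a hedged sketch under our own naming (count_ways, w_obs), which counts the subsets of the ranks 1, ..., 10 that sum to each w and then adds up the upper tail from the observed W = 40. Doubling the one-sided tail for the two-sided P-value mirrors what we did with the table.

```python
def count_ways(n):
    """c[w] = number of subsets of {1, ..., n} whose elements sum to w."""
    max_w = n * (n + 1) // 2
    c = [0] * (max_w + 1)
    c[0] = 1
    for rank in range(1, n + 1):
        for w in range(max_w, rank - 1, -1):
            c[w] += c[w - rank]
    return c

n, w_obs = 10, 40
c = count_ways(n)

# Upper-tail probability P(W >= 40) under the null hypothesis.
upper_tail = sum(c[w_obs:]) / 2 ** n

# Two-sided P-value: double the tail, capped at 1.
p_value = min(1.0, 2 * upper_tail)

print(round(upper_tail, 3), round(p_value, 3))
```

The tail probability should come out to about 0.116 and the two-sided P-value to about 0.232, matching the table-based answer.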

Notes

A couple of notes are worth mentioning before we take a look at another example:

(1) Our textbook authors define \(W=\sum_{i=1}^{n}R_i\) as the sum of all of the signed ranks, as opposed to just the sum of the positive ranks. That is perfectly fine, but not the most typical way of defining W.

(2) W is based on the ranks of the deviations from the hypothesized median m0, not on the deviations themselves. In the above example, W = 40 even if x7 = 6.4 or 10000 (now that's a pretty strange sunfish) because its rank would be unchanged. It is in this sense that W protects against the effect of outliers.
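The outlier remark in note (2) is easy to check numerically. In the sketch below, signed_rank_w is a hypothetical helper of ours that returns the sum of the positive signed ranks (assuming no ties and no zero deviations); we compute W for the original sunfish data and again after replacing x7 = 6.4 with 10000.

```python
def signed_rank_w(data, m0):
    """Sum of the ranks of |x - m0| over observations with x - m0 > 0.

    Assumes no tied absolute deviations and no deviations equal to zero.
    """
    diffs = [x - m0 for x in data]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

lengths = [5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3]

# Replace the seventh observation (index 6) with an absurdly large value.
with_outlier = lengths.copy()
with_outlier[6] = 10000.0

w_original = signed_rank_w(lengths, 3.7)
w_outlier = signed_rank_w(with_outlier, 3.7)
print(w_original, w_outlier)
```

Both calls should give the same value of W, because the seventh observation already had the largest absolute deviation and therefore keeps rank 10 either way.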

Now for that last example.

Example

The median age of the onset of diabetes is thought to be 45 years. The ages at onset of a random sample of 30 people with diabetes are:

35.5   44.5  39.8  33.3  51.4  51.3  30.5  48.9   42.1   40.3
46.8   38.0  40.1  36.8  39.3  65.4  42.6  42.8   59.8   52.4
26.2   60.9  45.6  27.1  47.3  36.6  55.6  45.1   52.2   43.5

Assuming the distribution of the age of the onset of diabetes is symmetric, is there evidence to conclude that the median age of the onset of diabetes differs significantly from 45 years?

Solution. We are interested in testing the null hypothesis H0: m = 45 against the alternative hypothesis HA: m ≠ 45. We can use Minitab's calculator and statistical functions to do the dirty work for us:

table

Then, summing the last column, we get:

data

Because we have a large sample (n = 30), we can use the normal approximation to the distribution of W. In this case, our P-value is defined as two times the probability that W ≤ 200. Therefore, using a half-unit correction for continuity, our transformed signed rank statistic is:

\[W'=\frac{200.5 - \left(\frac{30(31)}{4}\right)}{\sqrt{\frac{30(31)(61)}{24}}}=-0.6581 \]

Therefore, upon using a normal probability calculator (or table), we get that our P-value is:

\[P \approx 2 \times P(W' < -0.66)=2(0.2546) \approx 0.51 \]

Because our P-value is large, we cannot reject the null hypothesis. There is insufficient evidence at the 0.05 level to conclude that the median age of the onset of diabetes differs significantly from 45 years. 
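For readers without Minitab, the same calculation can be sketched in plain Python. This is only an illustration under our own assumptions: the variable names are ours, the ranking assumes no ties or zero deviations (true for these data), and the standard normal cumulative probability is computed from the error function in the math module rather than read from a table.

```python
import math

ages = [
    35.5, 44.5, 39.8, 33.3, 51.4, 51.3, 30.5, 48.9, 42.1, 40.3,
    46.8, 38.0, 40.1, 36.8, 39.3, 65.4, 42.6, 42.8, 59.8, 52.4,
    26.2, 60.9, 45.6, 27.1, 47.3, 36.6, 55.6, 45.1, 52.2, 43.5,
]
m0 = 45.0

# Signed deviations, then W = sum of ranks of |deviation| for positive deviations.
diffs = [x - m0 for x in ages]
order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
w = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

# Mean and standard deviation of W under the null hypothesis.
n = len(ages)
mean_w = n * (n + 1) / 4
sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)

# Half-unit continuity correction for the lower tail P(W <= w).
z = (w + 0.5 - mean_w) / sd_w

# Standard normal CDF via the error function, then the two-sided P-value.
lower_tail = 0.5 * (1 + math.erf(z / math.sqrt(2)))
p_value = 2 * lower_tail

print(w, round(z, 2), round(p_value, 2))
```

The continuity correction is applied toward the mean because the observed W falls below n(n+1)/4, so the lower tail is the relevant one here. The resulting P-value should agree with the hand calculation to two decimal places.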

By the way, we can even be lazier and let Minitab do all of the calculation work for us. Under the Stat menu, if we select Nonparametrics, and then 1-Sample Wilcoxon, we get:

minitab