# For A Median

### The Method

As is generally the case, let's motivate the method for calculating **a confidence interval for a population median *m*** by way of a concrete example. Suppose *Y*_{1} < *Y*_{2} < *Y*_{3} < *Y*_{4} < *Y*_{5} are the order statistics of a random sample of size *n* = 5 from a continuous distribution. Our work from the previous lesson tells us that *Y*_{3} serves as a good point estimator of the median *m*. Let's see what we can come up with for a confidence interval given we have these order statistics at our disposal. Well, suppose we suggested that the interval constrained by the first and fifth order statistics, that is, (*Y*_{1}, *Y*_{5}), would serve as a good interval. How confident can we be that the interval (*Y*_{1}, *Y*_{5}) would contain the unknown population median *m*? To answer that question, we simply need to calculate the following probability:

*P*(*Y*_{1} < *m* < *Y*_{5})

Calculating the probability reduces to a simple binomial calculation once we figure out all the ways in which the population median *m* can be sandwiched between *Y*_{1} and *Y*_{5}. There are four such ways. First, the population median *m* is sandwiched between *Y*_{1} and *Y*_{5} if the first order statistic is the only order statistic less than the median *m*, that is, if *Y*_{1} < *m* < *Y*_{2}.

Second, the population median *m* is sandwiched between *Y*_{1} and *Y*_{5} if the first two order statistics are the only order statistics less than the median *m*, that is, if *Y*_{2} < *m* < *Y*_{3}.

Third, the population median *m* is sandwiched between *Y*_{1} and *Y*_{5} if the first three order statistics are less than the median *m*, and the fourth and fifth order statistics are greater than *m*, that is, if *Y*_{3} < *m* < *Y*_{4}.

And, fourth, the population median *m* is sandwiched between *Y*_{1} and *Y*_{5} if the fifth order statistic is the only order statistic greater than the median *m*, that is, if *Y*_{4} < *m* < *Y*_{5}.

This means that in order to calculate the probability *P*(*Y*_{1} < *m* < *Y*_{5}), we need to calculate the probability of each of the above events. Now, if we let *W* denote the number of *X*_{i} < *m*, then *W* is a binomial random variable with *n* mutually independent trials and probability of success *p* = *P*(*X*_{i} < *m*) = 0.5. And, reviewing the events as depicted above, the desired probability is calculated as:

*P*(*Y*_{1} < *m* < *Y*_{5}) = *P*(*W* = 1) + *P*(*W* = 2) + *P*(*W* = 3) + *P*(*W* = 4)

The binomial p.m.f. (or, alternatively, the binomial table) makes the calculation straightforward:

\[P(Y_1<m<Y_5)=\sum_{k=1}^{4}P(W=k)=\sum_{k=1}^{4}\binom{5}{k}(0.5)^k(0.5)^{5-k}=\frac{30}{32}=0.9375 \]

So, the probability that the random interval (*Y*_{1}, *Y*_{5}) contains the median *m* is 0.9375. We aren't always so lucky with arriving at a decent confidence coefficient on our first try. Sometimes we have to try again, aiming to get a confidence coefficient that is at least 90%, but as close to 95% as possible. In this case, the confidence coefficient for the interval (*Y*_{2}, *Y*_{4}) is:

\[P(Y_2<m<Y_4)=\sum_{k=2}^{3}P(W=k)=\sum_{k=2}^{3}\binom{5}{k}(0.5)^k(0.5)^{5-k}=0.6250 \]

Clearly, we would be better served to stick with the interval (*Y*_{1}, *Y*_{5}) in this case. Let's take a look at an example.
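For readers who prefer to check such sums by machine, here is a minimal Python sketch of the calculation above. The function name `confidence_coefficient` is ours, not from the text, and the sketch assumes a continuous distribution so that each observation falls below *m* with probability 0.5.

```python
from math import comb


def confidence_coefficient(n: int, i: int, j: int) -> float:
    """Exact P(Y_i < m < Y_j) for a sample of size n.

    The event occurs when W, the number of observations below the
    median, satisfies i <= W <= j - 1, where W ~ Binomial(n, 0.5).
    """
    if not 1 <= i < j <= n:
        raise ValueError("need 1 <= i < j <= n")
    # Each outcome with exactly k observations below m has probability
    # (0.5)^n, so we count the favorable outcomes and divide by 2^n.
    favorable = sum(comb(n, k) for k in range(i, j))
    return favorable / 2 ** n


print(confidence_coefficient(5, 1, 5))  # interval (Y_1, Y_5)
print(confidence_coefficient(5, 2, 4))  # interval (Y_2, Y_4)
```

The two calls evaluate 30/32 and 20/32, the sums worked out by hand above.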

### Example

An ecology laboratory studied tree dispersion patterns for the sugar maple whose seeds are dispersed by the wind. In a 50-meter by 50-meter plot, the laboratory researchers measured distances between like trees yielding the following distances, in meters and in increasing order, for 19 sugar maple trees:

**2.10 2.35 2.35 3.10 3.10 3.15 3.90 3.90 4.00 4.80 5.00 5.00 5.15 5.35 5.50 6.00 6.00 6.25 6.45**

Find a reasonable confidence interval for the median.

**Solution.** Because there are *n* = 19 data points, *y*_{10} = 4.80 serves as a good point estimator for the population median *m*. Let's go up and down a few spots from there to consider:

(*y*_{6}, *y*_{14}) = (3.15, 5.35)

as a possible confidence interval for *m*. The confidence coefficient associated with the interval (*Y*_{6}, *Y*_{14}) is calculated using a binomial table with *n* = 19 and *p* = 0.5:

\[P(Y_6<m<Y_{14})=P(6\le W \le 13)=P(W \le 13) -P(W \le 5)=0.9682-0.0318=0.9364 \]

We can therefore be 93.64% confident that the population median falls in the interval (3.15, 5.35).

Could we do any better? Well, if we were to use the narrower interval (*y*_{7}, *y*_{13}) = (3.90, 5.15) instead, its confidence coefficient is not quite as good:

\[P(Y_7<m<Y_{13})=P(7\le W \le 12)=P(W \le 12) -P(W \le 6)=0.9165-0.0835=0.8330 \]

Or, if we were to use the wider interval (*y*_{5}, *y*_{15}) = (3.10, 5.50) instead, its confidence coefficient is perhaps a bit too high:

\[P(Y_5<m<Y_{15})=P(5\le W \le 14)=P(W \le 14) -P(W \le 4)=0.9904-0.0096=0.9808 \]

In general, we should aim to get a confidence coefficient of at least 90%, but as close to 95% as possible. And, we shouldn't really "shop around" for an interval after we've collected the data. We should decide in advance which confidence interval we are going to use, and commit to using it even after the data have been collected.
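The three candidate intervals in this example can also be checked with a short Python sketch. The helper `median_interval` is a hypothetical name of ours, and it computes the binomial probabilities exactly rather than reading them from a table, so a result may differ from the table-based figures in the fourth decimal place.

```python
from math import comb

# Distances (in meters) between the 19 sugar maple trees, from the example.
distances = [2.10, 2.35, 2.35, 3.10, 3.10, 3.15, 3.90, 3.90, 4.00, 4.80,
             5.00, 5.00, 5.15, 5.35, 5.50, 6.00, 6.00, 6.25, 6.45]


def median_interval(data, i, j):
    """Return (y_i, y_j, coefficient) for the interval (Y_i, Y_j).

    i and j are 1-based positions among the order statistics, and the
    coefficient is P(i <= W <= j - 1) with W ~ Binomial(n, 0.5).
    """
    y = sorted(data)
    n = len(y)
    coeff = sum(comb(n, k) for k in range(i, j)) / 2 ** n
    return y[i - 1], y[j - 1], coeff


for i, j in [(7, 13), (6, 14), (5, 15)]:
    lo, hi, coeff = median_interval(distances, i, j)
    print(f"(y_{i}, y_{j}) = ({lo:.2f}, {hi:.2f}), coefficient = {coeff:.4f}")
```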

### A Helpful Table

Yeehaw! The authors of your textbook did a very kind thing for us by calculating the confidence coefficients for confidence intervals for the median *m* for various sample sizes *n*. The resulting confidence coefficients are reported in the following table (or you can look in your textbook if you don't want to use a magnifying glass to see this one):

The reading of the table is pretty straightforward. For example, if we have a sample of size *n* = 12, the table tells us we can be 96.14% confident that the population median falls in the interval constrained by the third and tenth order statistics, that is, in the interval (*Y*_{3}, *Y*_{10}). And, if we have a sample of size *n* = 18, the table tells us we can be 96.92% confident that the population median falls in the interval constrained by the fifth and fourteenth order statistics, that is, in the interval (*Y*_{5}, *Y*_{14}).
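If the textbook isn't handy, a table of this kind can be generated with a few lines of Python. What follows is only a sketch under two assumptions of ours: that the intervals are symmetric, of the form (*Y*_{i}, *Y*_{n+1−i}), and that the selection rule is "at least 90%, but as close to 95% as possible." The function name `symmetric_choice` is hypothetical, and exact binomial values may differ from printed table entries in the last digit.

```python
from math import comb


def symmetric_choice(n, floor=0.90, target=0.95):
    """Pick the interval (Y_i, Y_{n+1-i}) whose exact coefficient is at
    least `floor` and closest to `target`.

    Returns (i, j, coefficient), or None if no symmetric interval
    reaches the floor (which happens for very small n).
    """
    best = None
    for i in range(1, n // 2 + 1):
        j = n + 1 - i
        coeff = sum(comb(n, k) for k in range(i, j)) / 2 ** n
        if coeff < floor:
            continue
        if best is None or abs(coeff - target) < abs(best[2] - target):
            best = (i, j, coeff)
    return best


print(" n  interval      coefficient")
for n in range(5, 21):
    i, j, coeff = symmetric_choice(n)
    print(f"{n:2d}  (Y_{i}, Y_{j})".ljust(18) + f"{100 * coeff:.2f}%")
```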

### Normal Approximations of the Confidence Coefficients

All of our confidence coefficient calculations have involved binomial probabilities. It stands to reason, then, that if our sample size *n* is larger than 20, say, we could use the normal approximation to the binomial distribution. In our case, *W*, the number of *X*_{i} < *m*, follows a binomial distribution with mean and variance:

\(\mu=np=0.5n\) and \(\sigma^2=np(1-p)=0.5(1-0.5)n=0.25n\)

respectively. Therefore:

\[Z=\frac{W-0.5n}{\sqrt{0.25n}} \]

follows, at least approximately, the standard normal *N*(0,1) distribution.
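As a rough illustration, the approximation can be wrapped in a small Python function. This is a sketch, not a prescribed implementation: the name `approx_confidence_coefficient` is ours, and we assume the usual half-unit continuity correction when converting *P*(*i* ≤ *W* ≤ *j* − 1) to a statement about *Z*.

```python
from math import erf, sqrt


def std_normal_cdf(z: float) -> float:
    """Standard normal c.d.f., written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def approx_confidence_coefficient(n: int, i: int, j: int) -> float:
    """Normal approximation to P(Y_i < m < Y_j) = P(i <= W <= j - 1).

    W has mean 0.5n and variance 0.25n. The endpoints i and j - 1 are
    widened by 0.5 (continuity correction, an assumption of this sketch).
    """
    mu = 0.5 * n
    sigma = sqrt(0.25 * n)
    lower = (i - 0.5 - mu) / sigma
    upper = (j - 1 + 0.5 - mu) / sigma
    return std_normal_cdf(upper) - std_normal_cdf(lower)


# Hypothetical illustration: n = 30 and the interval (Y_10, Y_21).
print(round(approx_confidence_coefficient(30, 10, 21), 4))
```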

### Example

A sample of 26 offshore oil workers took part in a simulated escape exercise, resulting in the following data on time (in seconds) to complete the escape:

**325 325 334 339 356 356 359 359 363 364 364 366 369 370 373 373 374 375 389 392 393 394 397 402 403 424**

Use the normal approximation to the binomial to find the approximate confidence coefficient associated with the (*Y*_{8}, *Y*_{18}) confidence interval for the median *m*. (The data are from the journal article "Oxygen Consumption and Ventilation During Escape from an Offshore Platform," *Ergonomics* 1997: 281-292.)

**Solution.** In this case, the mean and variance are:

\(\mu=np=0.5(26)=13\) and \(\sigma^2=np(1-p)=0.5(1-0.5)n=0.25(26)=6.5\)

respectively. Therefore, the approximate confidence coefficient for the interval (*Y*_{8}, *Y*_{18}) is:

\[P(Y_8<m<Y_{18})=P(8 \le W \le 17)=P\left(\frac{7.5-13}{\sqrt{6.5}} < Z < \frac{17.5-13}{\sqrt{6.5}} \right)\]

which can be simplified to:

\[P(Y_8<m<Y_{18})=P(-2.16 \le Z \le 1.77)=0.9616 - 0.0154 = 0.9462\]

We can be approximately 94.6% confident that the median time of all escapes is between 359 and 375 seconds.
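It's natural to wonder how far the approximation is from the exact binomial answer here. The following Python sketch computes both for the escape-time data. The variable names are ours, and the approximate value is computed from unrounded *z*-values, so it can differ slightly from a figure obtained with a normal table.

```python
from math import comb, erf, sqrt

# Escape times (in seconds) for the 26 offshore oil workers.
times = [325, 325, 334, 339, 356, 356, 359, 359, 363, 364, 364, 366, 369,
         370, 373, 373, 374, 375, 389, 392, 393, 394, 397, 402, 403, 424]

n = len(times)
i, j = 8, 18                      # the interval (Y_8, Y_18)
y = sorted(times)

# Exact coefficient: P(8 <= W <= 17) with W ~ Binomial(26, 0.5).
exact = sum(comb(n, k) for k in range(i, j)) / 2 ** n


def phi(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Normal approximation with the same 7.5 and 17.5 endpoints used above.
mu, sigma = 0.5 * n, sqrt(0.25 * n)
approx = phi((j - 0.5 - mu) / sigma) - phi((i - 0.5 - mu) / sigma)

print(f"interval: ({y[i - 1]}, {y[j - 1]})")
print(f"exact = {exact:.4f}, approximate = {approx:.4f}")
```

The two coefficients agree to about two decimal places, which supports using the approximation for a sample of this size.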