Neyman-Pearson Lemma

As we learned from our work in the previous lesson, whenever we perform a hypothesis test, we should make sure that the test we are conducting has sufficient power to detect a meaningful difference from the null hypothesis. That said, how can we be sure that the T-test for a mean μ is the "most powerful" test we could use? Is there instead a K-test or a V-test or you-name-the-letter-of-the-alphabet-test that would provide us with more power? A very important result, known as the Neyman Pearson Lemma, will reassure us that each of the tests we learned in Section 7 is the most powerful test for testing statistical hypotheses about the parameter under the assumed probability distribution. Before we can present the lemma, however, we need to (1) define some notation, (2) learn the distinction between simple and composite hypotheses, and (3) define what it means to have a best critical region of size α. First, the notation.

 Notation. If X1, X2, ..., Xn is a random sample of size n from a distribution with probability density (or mass) function f(x;θ), then the joint probability density (or mass) function of  X1, X2, ..., Xn is denoted by the likelihood function L(θ). That is, the joint p.d.f. or p.m.f. is: $$L(\theta) =L(\theta; x_1, x_2, ... , x_n) = f(x_1;\theta) \times f(x_2;\theta) \times ... \times f(x_n;\theta)$$  Note that for the sake of ease, we drop the reference to the sample x1, x2, ..., xn in using L(θ) as the notation for the likelihood function. We'll want to keep in mind though that the likelihood L(θ) still depends on the sample data.
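
To make the notation concrete, here is a minimal Python sketch of L(θ) as a product of densities. It assumes, purely for illustration, an exponential density with mean θ and a made-up sample; the names `exponential_pdf`, `likelihood`, and `sample` are hypothetical and not part of the lesson.

```python
import math

def exponential_pdf(x, theta):
    """Density (1/theta) * exp(-x/theta) for x >= 0, and 0 otherwise."""
    if x < 0:
        return 0.0
    return math.exp(-x / theta) / theta

def likelihood(theta, sample, pdf):
    """L(theta): the product of pdf(x_i; theta) over the observed sample."""
    return math.prod(pdf(x, theta) for x in sample)

# Hypothetical observed sample, made up for illustration.
sample = [1.2, 0.4, 3.1, 2.2]

# The same data give a different likelihood for each candidate theta.
for theta in (1.0, 2.0, 3.0):
    print(theta, likelihood(theta, sample, exponential_pdf))
```

The sample is held fixed while θ varies, which is the sense in which L(θ) is written as a function of θ alone even though it still depends on the data.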

Now, the definition of simple and composite hypotheses.

 Definition. If a random sample is taken from a distribution with parameter θ, a hypothesis is said to be a simple hypothesis if the hypothesis uniquely specifies the distribution of the population from which the sample is taken. Any hypothesis that is not a simple hypothesis is called a composite hypothesis.

Example

Suppose X1, X2, ..., Xn is a random sample from an exponential distribution with parameter θ. Is the hypothesis H: θ = 3 a simple or a composite hypothesis?

Solution. The p.d.f. of an exponential random variable is:

$f(x) = \frac{1}{\theta}e^{-x/\theta}$

for x ≥ 0. Under the hypothesis H: θ = 3, the p.d.f. of an exponential random variable is:

$f(x) = \frac{1}{3}e^{-x/3}$

for x ≥ 0. Because we can uniquely specify the p.d.f. under the hypothesis H: θ = 3, the hypothesis is a simple hypothesis.

Example

Suppose X1, X2, ..., Xn is a random sample from an exponential distribution with parameter θ. Is the hypothesis H: θ > 2 a simple or a composite hypothesis?

Solution. Again, the p.d.f. of an exponential random variable is:

$f(x) = \frac{1}{\theta}e^{-x/\theta}$

for x ≥ 0. Under the hypothesis H: θ > 2, the p.d.f. of an exponential random variable could be:

$f(x) = \frac{1}{3}e^{-x/3}$

for x ≥ 0. Or, the p.d.f. could be:

$f(x) = \frac{1}{22}e^{-x/22}$

for x ≥ 0. The p.d.f. could, in fact, be any of an infinite number of possible exponential probability density functions. Because the p.d.f. is not uniquely specified under the hypothesis H: θ > 2, the hypothesis is a composite hypothesis.

Example

Suppose X1, X2, ..., Xn is a random sample from a normal distribution with mean μ and unknown variance σ². Is the hypothesis H: μ = 12 a simple or a composite hypothesis?

Solution. The p.d.f. of a normal random variable is:

$f(x)= \frac{1}{\sigma\sqrt{2\pi}} \exp \left[-\frac{(x-\mu)^2}{2\sigma^2} \right]$

for −∞ < x < ∞, −∞ < μ < ∞, and σ > 0. Under the hypothesis H: μ = 12, the p.d.f. of a normal random variable is:

$f(x)= \frac{1}{\sigma\sqrt{2\pi}} \exp \left[-\frac{(x-12)^2}{2\sigma^2} \right]$

for −∞ < x < ∞ and σ > 0. In this case, the mean parameter μ = 12 is uniquely specified in the p.d.f., but the variance σ² is not. Therefore, the hypothesis H: μ = 12 is a composite hypothesis.

And, finally, the definition of a best critical region of size α.

 Definition. Consider the test of the simple null hypothesis H0: θ = θ0 against the simple alternative hypothesis HA: θ = θa. Let C and D be critical regions of size α, that is, let: $$\alpha = P(C;\theta_0)$$  and   $$\alpha = P(D;\theta_0)$$ Then, C is a best critical region of size α if the power of the test at θ = θa is the largest among all possible hypothesis tests of size α. More formally, C is the best critical region of size α if, for every other critical region D of size α, we have: $$P(C;\theta_a) \ge P(D;\theta_a)$$ that is, C is the best critical region of size α if the power of C is at least as great as the power of every other critical region D of size α. We say that C is the most powerful size α test.
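
The definition compares regions of the same size by their power. The following Python sketch does this for just two candidate regions. The setting is an assumption chosen for illustration: a single observation from a normal distribution with known standard deviation 1/3, H0: μ = 3, and HA: μ = 4. The variable names are hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
sigma = 1 / 3                                # assumed known standard deviation
null = NormalDist(mu=3, sigma=sigma)         # distribution of X under H0
alternative = NormalDist(mu=4, sigma=sigma)  # distribution of X under HA

# Region C: reject when x >= c (right tail of the null distribution).
c = null.inv_cdf(1 - alpha)
size_C = 1 - null.cdf(c)
power_C = 1 - alternative.cdf(c)

# Region D: reject when x <= d (left tail of the null distribution).
d = null.inv_cdf(alpha)
size_D = null.cdf(d)
power_D = alternative.cdf(d)

print(f"C: size={size_C:.3f}, power={power_C:.4f}")
print(f"D: size={size_D:.3f}, power={power_D:.2e}")
```

Both regions have size 0.05, yet C has far more power than D at μ = 4. Comparing two regions does not show that C is best; it only illustrates the kind of comparison the definition asks us to make against every region of size α.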

Now that we have clearly defined what we mean for a critical region C to be "best," we're ready to turn to the Neyman Pearson Lemma to learn what form a hypothesis test must take in order for it to be the best, that is, to be the most powerful test.

 The Neyman Pearson Lemma. Suppose we have a random sample X1, X2, ..., Xn from a probability distribution with parameter θ. Then, if C is a critical region of size α and k is a constant such that: $$\frac{L(\theta_0)}{L(\theta_a)} \le k$$ inside the critical region C and: $$\frac{L(\theta_0)}{L(\theta_a)} \ge k$$ outside the critical region C, then C is the best, that is, most powerful, critical region for testing the simple null hypothesis H0: θ = θ0 against the simple alternative hypothesis HA: θ = θa.

Proof. See Hogg and Tanis, pages 400-401 (8th edition pages 513-14).

Well, okay, so perhaps the proof isn't all that particularly enlightening, but perhaps if we take a look at a simple example, we'll become more enlightened. Suppose X is a single observation (that's one data point!) from a normal population with unknown mean μ and known standard deviation σ = 1/3. Then, we can apply the Neyman Pearson Lemma when testing the simple null hypothesis H0: μ = 3 against the simple alternative hypothesis HA: μ = 4. The lemma tells us that, in order to be the most powerful test, the ratio of the likelihoods:

$\frac{L(\mu_0)}{L(\mu_a)} = \frac{L(3)}{L(4)}$

should be small for sample points X inside the critical region ("less than or equal to some constant k") and large for sample points X outside of the critical region ("greater than or equal to some constant k"). In this case, because we are dealing with just one observation X, the ratio of the likelihoods equals the ratio of the normal probability curves:

$\frac{L(3)}{L(4)}= \frac{f(x; 3, 1/9)}{f(x; 4, 1/9)}$

Then, the following drawing summarizes the situation:

In short, it makes intuitive sense that we would want to reject H0: μ = 3 in favor of HA: μ = 4 if our observed x is large, that is, if our observed x falls in the critical region C. Well, as the drawing illustrates, it is those large X values in C for which the ratio of the likelihoods is small; and, it is for the small X values not in C for which the ratio of the likelihoods is large. Just as the Neyman Pearson Lemma suggests!
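
The same conclusion can be checked algebraically. As a sketch, substitute the normal p.d.f. with σ² = 1/9, so that 1/(2σ²) = 9/2, and let the normalizing constants cancel:

$$\frac{L(3)}{L(4)} = \frac{\exp \left[ -\frac{9}{2}(x-3)^2 \right]}{\exp \left[ -\frac{9}{2}(x-4)^2 \right]} = \exp \left[ -\frac{9}{2} \left( (x-3)^2 - (x-4)^2 \right) \right] = \exp \left[ -\frac{9}{2}(2x-7) \right]$$

The exponent decreases as x increases, so the ratio is a strictly decreasing function of x. Requiring the ratio to be at most k is therefore the same as requiring:

$$x \ge \frac{7}{2} - \frac{1}{9}\ln(k)$$

which is a region of large x values, as described above.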

Well, okay, that's the intuition behind the Neyman Pearson Lemma. Now, let's take a look at a few examples of the lemma in action.

Example

Suppose X is a single observation (again, one data point!) from a population with probability density function given by:

$$f(x) = \theta x^{\theta -1}$$

for 0 < x < 1. Find the test with the best critical region, that is, find the most powerful test, with significance level α = 0.05, for testing the simple null hypothesis H0: θ = 3 against the simple alternative hypothesis HA: θ = 2.

Solution. Because both the null and alternative hypotheses are simple hypotheses, we can apply the Neyman Pearson Lemma in an attempt to find the most powerful test. The lemma tells us that the ratio of the likelihoods under the null and alternative must be less than or equal to some constant k inside the critical region. Again, because we are dealing with just one observation X, the ratio of the likelihoods equals the ratio of the probability density functions, giving us:

$\frac{L(\theta_0)}{L(\theta_a)}= \frac{3x^{3-1}}{2x^{2-1}}= \frac{3}{2}x \le k$

That is, the lemma tells us that the form of the rejection region for the most powerful test is:

$\frac{3}{2}x \le k$

or alternatively, since (2/3)k is just a new constant k*, the rejection region for the most powerful test is of the form:

$x < \frac{2}{3}k = k^*$

Now, it's just a matter of finding k*, and our work is done. We want α = P(Type I Error) = P(rejecting the null hypothesis when the null hypothesis is true) to equal 0.05. In order for that to happen, the following must hold:

$\alpha = P( X < k^* \text{ when } \theta = 3) = \int_{0}^{k^*} 3x^2dx = 0.05$

Doing the integration, we get:

$\left[ x^3\right]^{x=k^*}_{x=0} = (k^*)^3 =0.05$

And, solving for k*, we get:

$k^* =(0.05)^{1/3} \approx 0.368$

That is, the Neyman Pearson Lemma tells us that the rejection region of the most powerful test for testing H0: θ = 3 against HA: θ = 2, under the assumed probability distribution, is:

$x < 0.368$

That is, among all of the possible tests for testing H0: θ = 3 against HA: θ = 2, based on a single observation X and with a significance level of 0.05, this test has the largest possible value for the power under the alternative hypothesis, that is, when θ = 2.
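
As a numerical check on this example, the following Python sketch recomputes k* and the power at θ = 2. It uses the fact that the density θx^(θ−1) on 0 < x < 1 has cumulative distribution function x^θ. The competing upper-tail region is an assumption added only for comparison; it does not appear in the example.

```python
alpha = 0.05
theta_null, theta_alt = 3, 2

def cdf(x, theta):
    """P(X <= x) for the density theta * x**(theta - 1) on 0 < x < 1."""
    return x ** theta

# Most powerful region: reject when x < k_star,
# where P(X < k_star; theta = 3) = alpha.
k_star = alpha ** (1 / theta_null)
size_best = cdf(k_star, theta_null)
power_best = cdf(k_star, theta_alt)

# A competing region of the same size, chosen only for comparison:
# reject when x > u, where P(X > u; theta = 3) = alpha.
u = (1 - alpha) ** (1 / theta_null)
size_other = 1 - cdf(u, theta_null)
power_other = 1 - cdf(u, theta_alt)

print(f"k* = {k_star:.3f}")
print(f"lower-tail region: size={size_best:.3f}, power={power_best:.4f}")
print(f"upper-tail region: size={size_other:.3f}, power={power_other:.4f}")
```

The lower-tail region has power of roughly 0.136 at θ = 2, while the upper-tail region of the same size has power of roughly 0.034. The best test is not necessarily a high-powered one in absolute terms; it is only the best available at that size.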

Example

Suppose X1, X2, ..., Xn is a random sample from a normal population with mean μ and variance 16. Find the test with the best critical region, that is, find the most powerful test, with a sample size of n = 16 and a significance level α = 0.05 to test the simple null hypothesis H0: μ = 10 against the simple alternative hypothesis HA: μ = 15.

Solution. Because the variance is specified, both the null and alternative hypotheses are simple hypotheses. Therefore, we can apply the Neyman Pearson Lemma in an attempt to find the most powerful test. The lemma tells us that the ratio of the likelihoods under the null and alternative must be less than or equal to some constant k inside the critical region:

$\frac{L(10)}{L(15)}= \frac{(32\pi)^{-16/2} \exp \left[ -(1/32)\sum_{i=1}^{16}(x_i -10)^2 \right]}{(32\pi)^{-16/2} \exp \left[ -(1/32)\sum_{i=1}^{16}(x_i -15)^2 \right]} \le k$

Simplifying, we get:

$\exp \left[ - \left( \frac{1}{32} \right) \left( \sum_{i=1}^{16}(x_i -10)^2 - \sum_{i=1}^{16}(x_i -15)^2 \right) \right] \le k$

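And, simplifying yet more, by expanding the squares and cancelling the terms in x_i², we get:

$\exp \left[ - \left( \frac{1}{32} \right) \left( 10\sum_{i=1}^{16}x_i - 2000 \right) \right] \le k$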

Now, taking the natural logarithm of both sides of the inequality, collecting like terms, and multiplying through by 32, we get:

$-10\sum x_i +2000 \le 32\ln(k)$

And, moving the constant term on the left-side of the inequality to the right-side, and dividing through by −160, we get:

$\frac{1}{16}\sum x_i \ge -\frac{1}{160}(32\ln(k)-2000)$

That is, the Neyman Pearson Lemma tells us that the rejection region for the most powerful test for testing H0: μ = 10 against HA: μ = 15, under the normal probability model, is of the form:

$\bar{x} \ge k^*$
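
The algebra above can be checked numerically. The following Python sketch compares the log of the likelihood ratio, computed directly from the two normal likelihoods, with the simplified expression (−10Σx_i + 2000)/32. The simulated sample, the seed, and the helper `log_likelihood` are assumptions for illustration only.

```python
import math
import random

def log_likelihood(mu, sample, variance=16):
    """Log of the normal likelihood L(mu) with known variance."""
    n = len(sample)
    return (-(n / 2) * math.log(2 * math.pi * variance)
            - sum((x - mu) ** 2 for x in sample) / (2 * variance))

random.seed(0)  # arbitrary seed, only for reproducibility
sample = [random.gauss(10, 4) for _ in range(16)]  # simulated data, not from the text

# ln[L(10)/L(15)] computed directly from the two likelihoods ...
direct = log_likelihood(10, sample) - log_likelihood(15, sample)
# ... and from the simplified expression (-10 * sum(x_i) + 2000) / 32.
simplified = (-10 * sum(sample) + 2000) / 32

print(direct, simplified)
```

The two values agree up to floating-point rounding. Because the simplified expression depends on the data only through Σx_i, and decreases as Σx_i grows, a small likelihood ratio corresponds to a large sample mean.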

where k* is selected so that the size of the critical region is α = 0.05. That's simple enough, as it just involves a normal probability calculation! Under the null hypothesis, the sample mean is normally distributed with mean 10 and standard deviation 4/4 = 1. Therefore, the critical value k* is 11.645:

$k^* = 10 + z_{0.05}(1) = 10 + 1.645 = 11.645$

That is, the Neyman Pearson Lemma tells us that the rejection region for the most powerful test for testing H0: μ = 10 against HA: μ = 15, under the normal probability model, is:

$\bar{x} \ge 11.645$

The power of such a test when μ = 15 is:

$P(\bar{X} > 11.645 \text{ when } \mu = 15) = P \left( Z > \frac{11.645-15}{\sqrt{16} / \sqrt{16} } \right) = P(Z > -3.36) = 0.9996$

The power can't get much better than that, and the Neyman Pearson Lemma tells us that we shouldn't expect it to get better! That is, the Lemma tells us that there is no other test out there that will give us greater power for testing H0: μ = 10 against HA: μ = 15.
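
The critical value and the power can be reproduced with a short Python sketch. It uses `statistics.NormalDist` from the standard library; the variable names are hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
n = 16
sigma = 4                  # square root of the known variance 16
se = sigma / n ** 0.5      # standard deviation of the sample mean: 4/4 = 1

# Critical value: upper 5% point of the sample mean's distribution under H0.
k_star = NormalDist(mu=10, sigma=se).inv_cdf(1 - alpha)

# Power: probability that the sample mean falls in the rejection region
# when mu = 15.
power = 1 - NormalDist(mu=15, sigma=se).cdf(k_star)

print(f"k* = {k_star:.3f}, power = {power:.4f}")
```

Up to rounding, this reproduces the critical value 11.645 and the power 0.9996 found above.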