Estimating a Proportion for a Small, Finite Population

The methods of the last page, in which we derived a formula for the sample size necessary for estimating a population proportion p, work just fine when the population in question is very large. When we have smaller, finite populations, however, such as the students in a high school or the residents of a small town, the formula we derived previously requires a slight modification. Let's start, as usual, by taking a look at an example.

Example

A researcher is studying the population of a small town in India of N = 2000 people.  She's interested in estimating p for several yes/no questions on a survey.

How many people n does she have to randomly sample (without replacement) to ensure that her estimates \(\hat{p}\) are within ε = 0.04 of the true proportions p?

Solution. We can't even begin to address the answer to this question until we derive a confidence interval for a proportion for a small, finite population!

Theorem. An approximate (1−α)100% confidence interval for a proportion p of a small population is:

\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} \cdot \dfrac{N-n}{N-1}}\)

Proof. We'll use the example above, where possible, to make the proof more concrete. Suppose we take a random sample, \(X_1, X_2, \ldots, X_n\), without replacement, of size n from a population of size N. In the case of the example, N = 2000. Suppose also, unknown to us, that for a particular survey question there are \(N_1\) respondents who would respond "yes" to the question, and therefore \(N-N_1\) respondents who would respond "no." That is, our small finite population looks like this:

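[Figure: a box representing the population of N people, containing \(N_1\) "yes" respondents and \(N-N_1\) "no" respondents]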

If that's the case, the true proportion (but unknown to us) of yes respondents is:

\(p=P(Yes)=\dfrac{N_1}{N}\)

while the true proportion (but unknown to us) of no respondents is:

\(1-p=P(No)=1-\dfrac{N_1}{N}=\dfrac{N-N_1}{N}\)

Now, let X denote the number of respondents in the sample who say yes, so that:

\(X=\sum\limits_{i=1}^n X_i\)

where \(X_i=1\) if respondent i answers yes, and \(X_i=0\) if respondent i answers no. Then, the proportion in the sample who say yes is:

\(\hat{p}=\dfrac{\sum\limits_{i=1}^n X_i}{n}\)

Then, \(X=\sum\limits_{i=1}^n X_i\) is a hypergeometric random variable with mean:

\(E(X)=n\dfrac{N_1}{N}=np\)

and variance:

\(Var(X)=n\dfrac{N_1}{N}\left(1-\dfrac{N_1}{N}\right)\left(\dfrac{N-n}{N-1}\right)=np(1-p)\left(\dfrac{N-n}{N-1}\right)\)

It follows that \(\hat{p}=X/n\) has mean \(E(\hat{p})=p\) and variance:

\(Var(\hat{p})=\dfrac{p(1-p)}{n}\left(\dfrac{N-n}{N-1}\right)\)
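
If you'd like to convince yourself of these two facts numerically, here is a minimal sketch that sums over the hypergeometric probability mass function directly. The population values (N = 50, N1 = 20, n = 10) and the function names are hypothetical, chosen only to keep the arithmetic small; they are not part of the example.

```python
from math import comb

def hypergeom_pmf(k, N, N1, n):
    """P(X = k) when n people are drawn without replacement from N,
    of whom N1 would answer yes."""
    return comb(N1, k) * comb(N - N1, n - k) / comb(N, n)

def phat_moments(N, N1, n):
    """Exact mean and variance of p-hat = X / n, by direct summation."""
    lo, hi = max(0, n - (N - N1)), min(n, N1)   # support of X
    ks = range(lo, hi + 1)
    mean = sum((k / n) * hypergeom_pmf(k, N, N1, n) for k in ks)
    var = sum((k / n - mean) ** 2 * hypergeom_pmf(k, N, N1, n) for k in ks)
    return mean, var

# Hypothetical population, for illustration only.
N, N1, n = 50, 20, 10
p = N1 / N
mean, var = phat_moments(N, N1, n)
formula_var = p * (1 - p) / n * (N - n) / (N - 1)

print("mean of p-hat:", mean, " p:", p)
print("variance of p-hat:", var, " formula:", formula_var)
```

The summed mean and variance should agree with p and with the variance formula above up to floating-point rounding.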

Then, the Central Limit Theorem tells us that:

\(\dfrac{\hat{p}-p}{\sqrt{\dfrac{p(1-p)}{n} \left(\dfrac{N-n}{N-1}\right) }}\)

follows an approximate standard normal distribution. Now, it's just a matter of doing the typical confidence interval derivation, in which we start with a probability statement, manipulate the quantity inside the parentheses, and substitute sample estimates where necessary. We've done that a number of times now, so skipping all of the details here, we get that an approximate (1−α)100% confidence interval for p of a small population is:

\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} \cdot \dfrac{N-n}{N-1}}\)

By the way, it is worthwhile noting that if the sample size n is much smaller than the population size N, that is, if \(n \ll N\), then:

\(\dfrac{N-n}{N-1}\approx 1\)

and the confidence interval for p of a small population becomes quite similar to the confidence interval for p of a large population:

\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)
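
To see the two intervals side by side, here is a minimal sketch of both formulas in one function. The function name `proportion_ci`, its parameters, and the sample values (a sample proportion of 0.30 from n = 400 people in a town of N = 2000) are assumptions made for illustration; they do not come from the example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, N=None, conf=0.95):
    """Approximate confidence interval for a proportion.

    If N is given, the finite population correction (N - n) / (N - 1)
    is applied; if N is None, the large-population interval is returned.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    se = sqrt(p_hat * (1 - p_hat) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return p_hat - z * se, p_hat + z * se

# Hypothetical sample, for illustration only.
large = proportion_ci(0.30, 400)
small = proportion_ci(0.30, 400, N=2000)
print("no correction:   ", large)
print("with correction: ", small)
```

Because the correction factor is less than 1 whenever n > 1, the corrected interval is the narrower of the two, and the gap closes as N grows.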

Example (continued)

A researcher is studying the population of a small town in India of N = 2000 people. She's interested in estimating p for several yes/no questions on a survey.

How many people n does she have to randomly sample (without replacement) to ensure that her estimates \(\hat{p}\) are within ε = 0.04 of the true proportions p?

Solution. Now that we know the correct formula for the confidence interval for p of a small population, we can follow the same procedure we did for determining the sample size for estimating a proportion p of a large population. The researcher's goal is to estimate p so that the error is no larger than 0.04. That is, the goal is to calculate a 95% confidence interval of the form:

\(\hat{p}\pm \epsilon=\hat{p}\pm 0.04\)

Now, we know the formula for an approximate (1−α)100% confidence interval for a proportion p of a small population is:

\(\hat{p}\pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} \cdot \dfrac{N-n}{N-1}}\)

So, again, we should proceed by equating the terms appearing after each of the above ± signs, and solving for n. That is, equate:

\(\epsilon=z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}\cdot \dfrac{N-n}{N-1}}\)

and solve for n. Doing the algebra yields:

\(n=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})/\epsilon^2}{\dfrac{N-1}{N}+\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{N\epsilon^2}}\)

That looks simply dreadful! Let's make it look a little more friendly to the eyes:

\(n=\dfrac{m}{1+\dfrac{m-1}{N}}\)

where m is defined as the sample size necessary for estimating the proportion p for a large population, that is, when a correction for the population being small and finite is not made. That is:

\(m=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}\)
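
For readers who want to see the algebra, one way to get from the equated terms to the friendlier form is sketched here, using the m just defined. This is only one possible route through the steps.

```latex
\begin{align*}
\epsilon^2 &= z^2_{\alpha/2}\,\dfrac{\hat{p}(1-\hat{p})}{n}\cdot\dfrac{N-n}{N-1}
  && \text{(square both sides)}\\
n(N-1) &= m(N-n)
  && \text{(divide by } \epsilon^2\text{, substitute } m\text{, clear the fractions)}\\
n(N-1+m) &= mN
  && \text{(collect the terms in } n\text{)}\\
n &= \dfrac{mN}{N+m-1}=\dfrac{m}{1+\dfrac{m-1}{N}}
  && \text{(divide numerator and denominator by } N\text{)}
\end{align*}
```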

Now, before we make the calculation for our particular example, let's take a step back and summarize what we have just learned.

Definition. The sample size necessary for estimating a population proportion p of a small finite population with (1−α)100% confidence and error no larger than ε is:

\(n=\dfrac{m}{1+\dfrac{m-1}{N}}\)

where:

\(m=\dfrac{z^2_{\alpha/2}\hat{p}(1-\hat{p})}{\epsilon^2}\)

is the sample size necessary for estimating the proportion p for a large population.
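
The definition translates into a few lines of code. Here is a minimal sketch; the function names and the illustrative inputs (N = 500, ε = 0.05) are assumptions, and the critical value is taken from the standard normal distribution rather than rounded to 1.96.

```python
from math import ceil
from statistics import NormalDist

def large_population_size(eps, p_hat=0.5, conf=0.95):
    """m: sample size with no finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return z ** 2 * p_hat * (1 - p_hat) / eps ** 2

def finite_population_size(N, eps, p_hat=0.5, conf=0.95):
    """n = m / (1 + (m - 1) / N): sample size for a population of size N."""
    m = large_population_size(eps, p_hat, conf)
    return m / (1 + (m - 1) / N)

# Hypothetical inputs, for illustration only.
m = large_population_size(0.05)
n = finite_population_size(500, 0.05)
print("m =", m, "-> round up to", ceil(m))
print("n =", n, "-> round up to", ceil(n))
```

Both results are rounded up at the end, since a fraction of a person cannot be sampled.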

 

Example (continued)

A researcher is studying the population of a small town in India of N = 2000 people. She's interested in estimating p for several yes/no questions on a survey.

How many people n does she have to randomly sample (without replacement) to ensure that her estimates \(\hat{p}\) are within ε = 0.04 of the true proportions p?

Solution. Okay, once and for all, let's calculate this very patient researcher's sample size! Because the researcher has many different questions on the survey, it would behoove her to use a sample proportion of 0.50 in her calculations, since \(\hat{p}(1-\hat{p})\) is largest at 0.50 and that choice therefore yields a sample size large enough for every question. If the maximum error ε is 0.04, the sample proportion is 0.5, and the researcher doesn't make the finite population correction, then she needs:

\(m=\dfrac{(1.96^2)(\frac{1}{4})}{0.04^2}=600.25\)

or 601 people to estimate p with 95% confidence. But, upon making the correction for the small, finite population, we see that the researcher really only needs:

\(n=\dfrac{m}{1+\dfrac{m-1}{N}}=\dfrac{601}{1+\dfrac{601-1}{2000}}=462.3\)

or 463 people to estimate p with 95% confidence.
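
As a sanity check, we can plug the answer back into the margin-of-error formula. The sketch below reuses the example's numbers (z = 1.96, N = 2000, ε = 0.04, sample proportion 0.5); the variable and function names are my own. It also shows what happens if the unrounded 600.25 is carried through instead of 601: the corrected size comes out at about 461.9, so rounding m up first, as done above, is a slightly conservative choice.

```python
from math import ceil, sqrt

Z = 1.96        # rounded 95% critical value used in the example
N = 2000        # population size
EPS = 0.04      # maximum error
P_HAT = 0.5     # conservative sample proportion

def margin_of_error(n):
    """Half-width of the interval, with the finite population correction."""
    return Z * sqrt(P_HAT * (1 - P_HAT) / n * (N - n) / (N - 1))

m_exact = Z ** 2 * P_HAT * (1 - P_HAT) / EPS ** 2    # about 600.25
m_rounded = ceil(m_exact)                             # 601, as in the example

n_from_rounded = m_rounded / (1 + (m_rounded - 1) / N)
n_from_exact = m_exact / (1 + (m_exact - 1) / N)

print("n using m = 601:   ", n_from_rounded, "->", ceil(n_from_rounded))
print("n using m = 600.25:", n_from_exact, "->", ceil(n_from_exact))
for n in (461, 462, 463):
    print(n, "people give a margin of", margin_of_error(n))
```

Either way, the required sample is well below the 601 people needed without the correction.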

Effect of Population Size N

The following table illustrates how the sample size n that is necessary for estimating a population proportion p (with 95% confidence) is affected by the size of the population N. If \(\hat{p}=0.5\), then the sample size n is:

[Table: necessary sample size n for a range of population sizes N]

This table suggests, perhaps not surprisingly, that as the size of the population N decreases, so does the necessary size n of the sample.
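
A pattern like this is easy to reproduce for yourself. The sketch below assumes a maximum error of ε = 0.04 and z = 1.96, as in the example, and rounds m up to 601 before applying the correction, again following the example. The population sizes listed are hypothetical choices of mine, not necessarily the ones in the table.

```python
from math import ceil

Z, EPS, P_HAT = 1.96, 0.04, 0.5    # assumed: same settings as the example

# Large-population sample size, rounded up first as in the example.
m = ceil(Z ** 2 * P_HAT * (1 - P_HAT) / EPS ** 2)

def required_n(N):
    """Sample size after the finite population correction, rounded up."""
    return ceil(m / (1 + (m - 1) / N))

# Hypothetical population sizes, for illustration only.
populations = [500, 1000, 2000, 5000, 10000, 100000]
sizes = [required_n(N) for N in populations]

for N, n in zip(populations, sizes):
    print(f"N = {N:>7}  ->  n = {n}")
```

As N grows, the required n climbs toward m, the large-population sample size, and never exceeds it.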