Lesson 10: The Binomial Distribution

Introduction

In this lesson, and some of the lessons that follow in this section, we'll be looking at specially named discrete probability mass functions, such as the geometric distribution, the hypergeometric distribution, and the Poisson distribution. As you can probably gather from its title, this lesson explores the well-known binomial distribution.

The basic idea behind this lesson, and the ones that follow, is that when certain conditions are met, we can derive a general formula for the probability mass function of a discrete random variable X. We can then use that formula to calculate probabilities concerning X rather than resorting to first principles. Sometimes the probability calculations can be tedious. In those cases, we might want to take advantage of cumulative probability tables that others have created. We'll do exactly that for the binomial distribution. We'll also derive formulas for the mean, variance, and standard deviation of a binomial random variable.


The Probability Mass Function

Example

We previously looked at an example in which three fans were randomly selected at a football game in which Penn State is playing Notre Dame. Each fan was identified as either a Penn State fan (P) or a Notre Dame fan (N), yielding the following sample space:

S = {PPP, PPN, PNP, NPP, NNP, NPN, PNN, NNN}

We let X = the number of Penn State fans selected. The possible values of X were, therefore, 0, 1, 2, or 3. Now, we could find probabilities of individual events, P(PPP) or P(PPN), for example. Alternatively, we could find P(X = x), the probability that X takes on a particular value x. Let's do that (again)! This time, though, we will be less interested in obtaining the actual probabilities than in looking for a pattern in our calculations, so that we can derive a formula for calculating similar probabilities.

Solution. Since the game is a home game, let's again suppose that 80% of the fans attending the game are Penn State fans, while 20% are Notre Dame fans. That is, P(P) = 0.8 and P(N) = 0.2. Then, by independence:

\(P(X=0)=P(NNN)=0.2\times 0.2\times 0.2=1\times(0.8)^0\times(0.2)^3\)

And, by independence and mutual exclusivity of NNP, NPN, and PNN:

\(P(X=1)=P(NNP)+P(NPN)+P(PNN)=3\times 0.8\times 0.2\times 0.2=3\times(0.8)^1\times(0.2)^2\)

Likewise, by independence and mutual exclusivity of PPN, PNP, and NPP:

\(P(X=2)=P(PPN)+P(PNP)+P(NPP)=3\times 0.8\times 0.8\times 0.2=3\times(0.8)^2\times(0.2)^1\)

Finally, by independence:

\(P(X=3)=P(PPP)=0.8\times 0.8\times 0.8=1\times(0.8)^3\times(0.2)^0\)

Do you see a pattern in our calculations? It seems that, in each case, we multiply the number of ways of obtaining x Penn State fans first by the probability of x Penn State fans, \((0.8)^x\), and then by the probability of 3 − x Notre Dame fans, \((0.2)^{3-x}\).
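To check the pattern numerically, the sketch below (Python, chosen only for illustration) enumerates the eight outcomes in S, multiplies the individual probabilities under the independence assumption, adds the mutually exclusive outcomes, and compares each total with the "number of ways × \((0.8)^x\) × \((0.2)^{3-x}\)" pattern. The names `P_FAN`, `prob_by_enumeration`, and `prob_by_pattern` are hypothetical helpers introduced here.

```python
from itertools import product
from math import comb

P_FAN = {"P": 0.8, "N": 0.2}  # P(Penn State fan), P(Notre Dame fan)

def prob_by_enumeration(x):
    """Add up P(outcome) over all outcomes in S with exactly x Penn State fans."""
    total = 0.0
    for outcome in product("PN", repeat=3):  # the 8 outcomes in S
        if outcome.count("P") == x:
            prob = 1.0
            for fan in outcome:              # independence: multiply
                prob *= P_FAN[fan]
            total += prob                    # mutually exclusive: add
    return total

def prob_by_pattern(x):
    """(number of ways) * 0.8^x * 0.2^(3 - x)"""
    return comb(3, x) * 0.8**x * 0.2 ** (3 - x)

for x in range(4):
    print(x, round(prob_by_enumeration(x), 3), round(prob_by_pattern(x), 3))
```

Both functions agree (up to floating-point rounding) with the hand calculations: 0.008, 0.096, 0.384, and 0.512 for x = 0, 1, 2, 3.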

This example lends itself to the creation of a general formula for the probability mass function of a binomial random variable X.

Definition. The probability mass function of a binomial random variable X is:

\(f(x)=\dbinom{n}{x} p^x (1-p)^{n-x}\) for x = 0, 1, 2, ..., n
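A direct translation of this formula into code might look like the following minimal sketch. `binom_pmf` is a hypothetical helper name; it assumes x and n are nonnegative integers and 0 ≤ p ≤ 1, and it uses Python's `math.comb(n, k)` (Python 3.8+) for the binomial coefficient.

```python
from math import comb

def binom_pmf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n; 0 otherwise."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# The Penn State example is b(3, 0.8):
print([round(binom_pmf(x, 3, 0.8), 3) for x in range(4)])
```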

We denote the binomial distribution as b(n, p). That is, we say:

X ~ b(n, p)

where the tilde (~) is read "is distributed as," and n and p are called parameters of the distribution.

Let's verify that the given p.m.f. is a valid one!
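A valid p.m.f. needs two things: every f(x) is nonnegative, and the values sum to 1. One standard argument for the sum uses the binomial theorem, \((a+b)^n=\sum_{x=0}^{n}\dbinom{n}{x}a^x b^{n-x}\), with a = p and b = 1 − p:

```latex
\begin{aligned}
f(x) &= \binom{n}{x}p^x(1-p)^{n-x} \ge 0 && \text{for } 0 \le p \le 1, \\
\sum_{x=0}^{n} f(x) &= \sum_{x=0}^{n}\binom{n}{x}p^x(1-p)^{n-x} = \bigl(p+(1-p)\bigr)^n = 1^n = 1.
\end{aligned}
```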

Now that we know the formula for the probability mass function of a binomial random variable, we had better spend some time making sure we can recognize when we actually have one!

Is X Binomial?

Definition. A discrete random variable X is a binomial random variable if:

  1. An experiment, or trial, is performed in exactly the same way n times.
  2. Each of the n trials has only two possible outcomes.  One of the outcomes is called a "success," while the other is called a "failure." Such a trial is called a Bernoulli trial.
  3. The n trials are independent.
  4. The probability of success, denoted p, is the same for each trial. The probability of failure is q = 1 − p.
  5. The random variable X = the number of successes in the n trials.

Example

A coin is weighted so that there is a 70% chance of getting a head on any particular toss. Toss the coin, in exactly the same way, 100 times. Let X equal the number of heads tossed. Is X a binomial random variable?

Solution. Yes, X is a binomial random variable, because:

  1. The coin is tossed in exactly the same way 100 times.
  2. Each toss results in either a head (success) or a tail (failure).
  3. One toss doesn't affect the outcome of another toss.  The trials are independent.
  4. The probability of getting a head is 0.70 for each toss of the coin.
  5. X equals the number of heads (successes).

Example

A college administrator randomly samples students until he finds four that have volunteered to work for a local organization. Let X equal the number of students sampled. Is X a binomial random variable?

Solution. No, X is not a binomial random variable, because the number of trials n was not fixed in advance, and X does not equal the number of volunteers in the sample.

Example

A Quality Control Inspector (QCI) investigates a lot containing 15 skeins of yarn. The QCI randomly samples (without replacement) 5 skeins of yarn from the lot. Let X equal the number of skeins with acceptable color. Is X a binomial random variable?

Solution. No, X is not a binomial random variable, because p, the probability that a randomly selected skein has acceptable color, changes from trial to trial. For example, suppose, unknown to the QCI, that 9 of the 15 skeins of yarn in the lot are acceptable. For the first trial, p equals 9/15. However, for the second trial, p equals 8/14 if an acceptable skein was selected in the first trial, or 9/14 if an unacceptable one was. Rather than being a binomial random variable, X is a hypergeometric random variable. If we continue to assume that 9 of the 15 skeins of yarn in the lot are acceptable, then X has the following probability mass function:

 \(f(x)=P(X=x)=\dfrac{\dbinom{9}{x} \dbinom{6}{5-x}}{\dbinom{15}{5}}\) for x = 0, 1, 2,..., 5
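As a quick check, the sketch below evaluates this hypergeometric p.m.f. for x = 0, 1, ..., 5 under the same assumption (9 acceptable and 6 unacceptable skeins) and confirms that the six probabilities sum to 1. `yarn_pmf` is a hypothetical helper name.

```python
from math import comb

def yarn_pmf(x, good=9, bad=6, n=5):
    """Hypergeometric p.m.f.: C(9, x) C(6, 5 - x) / C(15, 5)."""
    return comb(good, x) * comb(bad, n - x) / comb(good + bad, n)

probs = [yarn_pmf(x) for x in range(6)]
print([round(prob, 4) for prob in probs])
print(sum(probs))  # 1, up to floating-point rounding
```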

Example

A Gallup Poll of n = 1000 random adult Americans is conducted. Let X equal the number in the sample who own a sport utility vehicle (SUV). Is X a binomial random variable?

Solution. No, X is technically a hypergeometric random variable, not a binomial random variable, because, just as in the previous example, sampling takes place without replacement. Therefore, p, the probability of selecting an SUV owner, has the potential to change from trial to trial. To make this point concrete, suppose that the population consists of N = 270,000,000 adult Americans, and that half of them (135,000,000) own an SUV, while the other half (135,000,000) do not. Then, on the first trial, p equals ½ (from 135,000,000 divided by 270,000,000). Suppose an SUV owner was selected on the first trial. Then, on the second trial, p equals 134,999,999 divided by 269,999,999, which equals (punching it into a calculator) 0.499999... Hmmmmm! Isn't that 0.499999... close enough to ½ to just call it ½? Yes... that's what we do!

In general, when the sample size n is small in relation to the population size N, we assume a random variable X, whose value is determined by sampling without replacement, follows (approximately) a binomial distribution. On the other hand, if the sample size n is close to the population size N, then we assume the random variable X follows a hypergeometric distribution.
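The sketch below makes this rule of thumb concrete using the illustrative numbers from the two previous examples. For the poll (n = 1000 drawn from N = 270,000,000, with 135,000,000 "successes") it compares P(X = 500) under the hypergeometric model and under b(1000, 0.5); for the yarn lot (n = 5 drawn from N = 15, with 9 acceptable) it compares P(X = 3) under the hypergeometric model and under b(5, 9/15). The helper names are hypothetical; Python's arbitrary-precision integers keep the huge binomial coefficients exact until the final division.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x) when n items are drawn without replacement from N items, K of them successes."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# n small relative to N (the SUV poll): the two models agree closely.
big_h = hypergeom_pmf(500, N=270_000_000, K=135_000_000, n=1000)
big_b = binom_pmf(500, 1000, 0.5)

# n a sizable fraction of N (the yarn lot): the two models differ noticeably.
small_h = hypergeom_pmf(3, N=15, K=9, n=5)
small_b = binom_pmf(3, 5, 9 / 15)

print(big_h, big_b)
print(small_h, small_b)
```

The first pair agrees to several decimal places (both about 0.0252), while the second pair does not (about 0.420 versus 0.346).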

Cumulative Binomial Probabilities

Example

By some estimates, twenty percent (20%) of Americans have no health insurance. Randomly sample n = 15 Americans. Let X denote the number in the sample with no health insurance. What is the probability that exactly 3 of the 15 sampled have no health insurance?

Solution. Since n = 15 is small relative to the population of N = 300,000,000 Americans, and all of the other criteria pass muster (two possible outcomes, independent trials, ....), the random variable X can be assumed to follow a binomial distribution with n = 15 and p = 0.20. Using the probability mass function for a binomial random variable, the calculation is then relatively straightforward:

\(P(X=3)=\dbinom{15}{3}(0.20)^3 (0.80)^{12}\approx 0.25\)

That is, there is a 25% chance, in sampling 15 random Americans, that we would find exactly 3 that had no health insurance.

What is the probability that at most one of those sampled has no health insurance?

Solution. "At most one" means either 0 or 1 of those sampled have no health insurance. That is, we need to find:

P(X ≤ 1) = P(X = 0) + P(X = 1)

Using the probability mass function for a binomial random variable with n = 15 and p = 0.20, we have:

\(P(X \leq 1)=\dbinom{15}{0}(0.2)^0 (0.8)^{15}+ \dbinom{15}{1}(0.2)^1(0.8)^{14}=0.0352+0.1319=0.167\)

That is, we have a 16.7% chance, in sampling 15 random Americans, that we would find at most one that had no health insurance.
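Both of the answers so far can be reproduced with a few lines of code. This is a minimal sketch with a hypothetical `binom_pmf` helper, assuming X ~ b(15, 0.20) as above.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ b(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 15, 0.20
exactly_3 = binom_pmf(3, n, p)                        # P(X = 3)
at_most_1 = binom_pmf(0, n, p) + binom_pmf(1, n, p)   # P(X <= 1)
print(round(exactly_3, 4), round(at_most_1, 4))
```

The two printed values are 0.2501 and 0.1671.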

What is the probability that more than seven have no health insurance?

Solution. Yikes! "More than seven" in the sample means 8, 9, 10, 11, 12, 13, 14, 15. As the following picture illustrates, there are two ways that we can calculate P(X > 7):

Probability X is more than 7

We could calculate P(X > 7) by adding up P(X = 8), P(X = 9), up to P(X = 15). Alternatively, we could calculate P(X > 7) by finding P(X ≤ 7) and subtracting it from 1. But to find P(X ≤ 7), we'd still have to add up P(X = 0), P(X = 1), up to P(X = 7). Either way, it becomes readily apparent that answering this question is going to involve more work than the previous two questions. It would clearly be helpful if we had an alternative to using the binomial p.m.f. to calculate binomial probabilities. The alternative typically used involves cumulative binomial probabilities.

An Aside On Cumulative Probability Distributions

Definition. The function:

F(x) = P(X ≤ x)

is called a cumulative probability distribution. For a discrete random variable X, the cumulative probability distribution F(x) is determined by:

\(F(x)=\sum\limits_{m=0}^x f(m)=f(0)+f(1)+\cdots+f(x)\)

You'll first want to note that the probability mass function, f(x), of a discrete random variable X is distinguished from the cumulative probability distribution, F(x), of a discrete random variable X by the use of a lowercase f and an uppercase F. That is, the notation f(3) means P(X = 3), while the notation F(3) means P(X ≤ 3).
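Because F(x) is just a running sum of f(x), it is easy to compute directly. The sketch below is hypothetical (the helper names are ours, not Minitab's or any calculator's) and rebuilds the n = 15, p = 0.20 column used in the health insurance examples, rounded to four decimals; the rounding is an assumption about how a printed table is formatted.

```python
from math import comb

def binom_pmf(m, n, p):
    """f(m) = C(n, m) p^m (1 - p)^(n - m)"""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

def binom_cdf(x, n, p):
    """F(x) = f(0) + f(1) + ... + f(x) = P(X <= x); 0 for x < 0."""
    if x < 0:
        return 0.0
    return sum(binom_pmf(m, n, p) for m in range(min(x, n) + 1))

# One column of a cumulative table: n = 15, p = 0.20, x = 0, 1, ..., 15
column = {x: round(binom_cdf(x, 15, 0.20), 4) for x in range(16)}
for x, F in column.items():
    print(f"{x:>2}  {F:.4f}")
```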

Now the standard procedure is to report probabilities for a particular distribution as cumulative probabilities, whether in statistical software such as Minitab, a TI-80-something calculator, or in a table like Table II in the back of your textbook. If you take a look at the table, you'll see that it goes on for five pages.  Let's just take a look at the top of the first page of the table in order to get a feel for how the table works:

In summary, to use the table in the back of your textbook, as well as that found in the back of most probability textbooks, to find cumulative binomial probabilities, do the following:

  1. Find n, the number in the sample, in the first column on the left.
  2. Find the column containing p, the probability of success.
  3. Find the x in the second column on the left for which you want to find F(x) = P(X ≤ x).

Let's try it out on our health insurance example.

Example

Again, by some estimates, twenty percent (20%) of Americans have no health insurance. Randomly sample n = 15 Americans. Let X denote the number in the sample with no health insurance. Use the cumulative binomial probability table in the back of your book to find the probability that at most 1 of the 15 sampled has no health insurance.

Solution. The probability that at most 1 has no health insurance can be written as P(X ≤ 1). To find P(X ≤ 1) using the binomial table, we: 

  1. Find n = 15 in the first column on the left.
  2. Find the column containing p = 0.20.
  3. Find the 1 in the second column on the left, since we want to find F(1) = P(X ≤ 1).

Now, all we need to do is read the probability value where the p = 0.20 column and the (n = 15, x = 1) row intersect. What do you get?

Binomial Table with n = 15 p = 0.20


We've used the cumulative binomial probability table to determine that the probability that at most 1 of the 15 sampled has no health insurance is 0.1671. For kicks, since it wouldn't take a lot of work in this case, you might want to verify that you'd get the same answer using the binomial p.m.f.

What is the probability that more than 7 have no health insurance?

Solution. As we determined previously, we can calculate P(X > 7) by finding P(X ≤ 7) and subtracting it from 1:

More than 7

The good news is that the cumulative binomial probability table makes it easy to determine P(X ≤ 7). To find P(X ≤ 7) using the binomial table, we: 

  1. Find n = 15 in the first column on the left.
  2. Find the column containing p = 0.20.
  3. Find the 7 in the second column on the left, since we want to find F(7) = P(X ≤ 7).

Now, all we need to do is read the probability value where the p = 0.20 column and the (n = 15, x = 7) row intersect. What do you get?

Binomial Table with n = 15 p = 0.20


The cumulative binomial probability table tells us that P(X ≤ 7) = 0.9958. Therefore:

P(X > 7) = 1 − 0.9958 = 0.0042

That is, the probability that more than 7 in a random sample of 15 would have no health insurance is 0.0042.

What is the probability that exactly 3 have no health insurance?

Solution. We can calculate P(X = 3) by finding P(X ≤ 2) and subtracting it from P(X ≤ 3), as illustrated here:

Probability exactly 3

 To find P(X ≤ 2) and P(X ≤ 3) using the binomial table, we: 

  1. Find n = 15 in the first column on the left.
  2. Find the column containing p = 0.20.
  3. Find the 3 in the second column on the left, since we want to find F(3) = P(X ≤ 3). And, find the 2 in the second column on the left, since we want to find F(2) = P(X ≤ 2).

Now, all we need to do is (1) read the probability value where the p = 0.20 column and the (n = 15, x = 3) row intersect, and (2) read the probability value where the p = 0.20 column and the (n = 15, x = 2) row intersect. What do you get?

Binomial Table with n = 15 p = 0.20


The cumulative binomial probability table tells us that P(X ≤ 3) = 0.6482 and P(X ≤ 2) = 0.3980. Therefore:

P(X = 3) = P(X ≤ 3) − P(X ≤ 2) = 0.6482 − 0.3980 = 0.2502

That is, there is about a 25% chance that exactly 3 people in a random sample of 15 would have no health insurance. Again, for kicks, since it wouldn't take a lot of work in this case, you might want to verify that you'd get the same answer using the binomial p.m.f.

What is the probability that at least 1 has no health insurance?

Solution. We can calculate P(X ≥ 1) by finding P(X ≤ 0) and subtracting it from 1, as illustrated here:

Probability at least 1

 To find P(X ≤ 0) using the binomial table, we: 

  1. Find n = 15 in the first column on the left.
  2. Find the column containing p = 0.20.
  3. Find the 0 in the second column on the left, since we want to find F(0) = P(X ≤ 0).

Now, all we need to do is read the probability value where the p = 0.20 column and the (n = 15, x = 0) row intersect. What do you get?

Binomial Table with n = 15 p = 0.20


The cumulative binomial probability table tells us that P(X ≤ 0) = 0.0352. Therefore:

P(X ≥ 1) = 1 − 0.0352 = 0.9648

That is, the probability that at least one person in a random sample of 15 would have no health insurance is 0.9648.

What is the probability that fewer than 5 have no health insurance?

Solution. "Fewer than 5" means 0, 1, 2, 3, or 4. That is, P(X < 5) = P(X ≤ 4), and P(X ≤ 4) can be readily found using the cumulative binomial table. To find P(X ≤ 4), we:

  1. Find n = 15 in the first column on the left.
  2. Find the column containing p = 0.20.
  3. Find the 4 in the second column on the left, since we want to find F(4) = P(X ≤ 4).

Now, all we need to do is read the probability value where the p = 0.20 column and the (n = 15, x = 4) row intersect. What do you get?

Binomial Table with n = 15 p = 0.20


The cumulative binomial probability table tells us that P(X ≤ 4) = 0.8358. That is, the probability that fewer than 5 people in a random sample of 15 would have no health insurance is 0.8358.
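Each of the table-based answers in this example can be cross-checked by computing F(x) directly instead of reading it from the table. In this sketch, `binom_cdf` and `F` are hypothetical helpers, X ~ b(15, 0.20), and every probability is built from F alone, just as in the table lookups.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ b(n, p); an empty sum (0) when x < 0."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(x + 1))

def F(x):
    """Cumulative probabilities for the health insurance example, X ~ b(15, 0.20)."""
    return binom_cdf(x, 15, 0.20)

answers = {
    "P(X <= 1)": F(1),        # at most 1
    "P(X > 7)": 1 - F(7),     # more than 7
    "P(X = 3)": F(3) - F(2),  # exactly 3
    "P(X >= 1)": 1 - F(0),    # at least 1
    "P(X < 5)": F(4),         # fewer than 5
}
for label, value in answers.items():
    print(f"{label:>10}: {value:.4f}")
```

The "exactly 3" value comes out as 0.2501 rather than 0.2502, because the table entries 0.6482 and 0.3980 were each rounded to four decimals before being subtracted.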

We have now taken a look at an example involving all of the possible scenarios (at most x, more than x, exactly x, at least x, and fewer than x) of the kinds of binomial probabilities that you might need to find. Oops! Have you noticed that p, the probability of success, in the binomial table in the back of the book only goes up to 0.50? What happens if your p equals 0.60 or 0.70? All you need to do in that case is turn the problem on its head! For example, suppose you have n = 10 and p = 0.60, and you are looking for the probability of at most 3 successes. Just relabel each success as a failure, and vice versa! That is, finding the probability of at most 3 successes is equivalent to finding the probability of at least 7 failures, with the probability of a failure being 0.40. Shall we make this more concrete by looking at a specific example?

Example

Many utility companies promote energy conservation by offering discount rates to consumers who keep their energy usage below certain established subsidy standards. A recent EPA report notes that 70% of the island residents of Puerto Rico have reduced their electricity usage sufficiently to qualify for discounted rates. If ten residential subscribers are randomly selected from San Juan, Puerto Rico, what is the probability that at least four qualify for the favorable rates?

Solution. If we let X denote the number of subscribers who qualify for favorable rates, then X is a binomial random variable with n = 10 and p = 0.70. And, if we let Y denote the number of subscribers who don't qualify for favorable rates, then Y, which equals 10 − X, is a binomial random variable with n = 10 and q = 1 − p = 0.30. We are interested in finding P(X ≥ 4). We can't use the cumulative binomial tables, because they only go up to p = 0.50. The good news is that we can rewrite P(X ≥ 4) as a probability statement in terms of Y:

P(X ≥ 4) = P(−X ≤ −4) = P(10 − X ≤ 10 − 4) = P(Y ≤ 6)

Now it's just a matter of looking up the probability in the right place on our cumulative binomial table. To find P(Y ≤ 6), we: 

  1. Find n = 10 in the first column on the left.
  2. Find the column containing p = 0.30.
  3. Find the 6 in the second column on the left, since we want to find F(6) = P(Y ≤ 6).

Now, all we need to do is read the probability value where the p = 0.30 column and the (n = 10, y = 6) row intersect. What do you get?

Binomial table with n = 10 and p = 0.30


The cumulative binomial probability table tells us that P(X ≥ 4) = P(Y ≤ 6) = 0.9894. That is, the probability that at least four people in a random sample of ten would qualify for favorable rates is 0.9894.
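A short sketch confirms that relabeling successes and failures gives the same answer as working with p = 0.70 directly. `binom_cdf` is a hypothetical helper; software along these lines is also one way to handle values of p that no table lists.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ b(n, p)."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(x + 1))

n, p = 10, 0.70
direct = 1 - binom_cdf(3, n, p)        # P(X >= 4) = 1 - P(X <= 3), X ~ b(10, 0.70)
via_failures = binom_cdf(6, n, 1 - p)  # P(Y <= 6), Y = 10 - X ~ b(10, 0.30)
print(round(direct, 4), round(via_failures, 4))
```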

If you need binomial probabilities for values of p that aren't in the table, such as 0.37 or 0.61, you can use statistical software, such as Minitab, to determine the cumulative binomial probabilities. You can then still use the methods illustrated on this page to find the specific probabilities (more than x, fewer than x, ...) that you need.

Effect of n and p on Shape

Other than briefly looking at the picture of the histogram at the top of the cumulative binomial probability table in the back of your book, we haven't spent much time thinking about what a binomial distribution actually looks like. Well, let's do that now! The bottom-line take-home message is that the shape of the binomial distribution depends, not surprisingly, on two things:

  1. n, the number of independent trials
  2. p, the probability of success

For small p and small n, the binomial distribution is what we call skewed right. That is, the bulk of the probability falls in the smaller numbers 0, 1, 2,..., and the distribution tails off to the right. For example, here's a picture of the binomial distribution when n = 15 and p = 0.2:

[Figure: histogram of the binomial p.m.f. with n = 15 and p = 0.2]

For large p and small n, the binomial distribution is what we call skewed left. That is, the bulk of the probability falls in the larger numbers n, n−1, n−2,...  and the distribution tails off to the left. For example, here's a picture of the binomial distribution when n = 15 and p = 0.8:

[Figure: histogram of the binomial p.m.f. with n = 15 and p = 0.8]

For p = 0.5, whether n is large or small, the binomial distribution is what we call symmetric. That is, the distribution has no skewness. For example, here's a picture of the binomial distribution when n = 15 and p = 0.5:

[Figure: histogram of the binomial p.m.f. with n = 15 and p = 0.5]

For small p and large n, the binomial distribution approaches symmetry. For example, if p = 0.2 and n is small, we'd expect the binomial distribution to be skewed to the right. For large n, however, the distribution is nearly symmetric. For example, here's a picture of the binomial distribution when n = 40 and p = 0.2:

[Figure: histogram of the binomial p.m.f. with n = 40 and p = 0.2]

You might find it educational to play around yourself with various values of the n and p parameters to see their effect on the shape of the binomial distribution.
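If you'd like a number to go with the pictures, one standard summary (not used elsewhere in this lesson) is the coefficient of skewness, which for b(n, p) equals \((1-2p)/\sqrt{np(1-p)}\): positive for right skew, negative for left skew, and zero for symmetry. The sketch below evaluates it for the four (n, p) pairs pictured above; `binom_skewness` is a hypothetical helper name.

```python
from math import sqrt

def binom_skewness(n, p):
    """Skewness of b(n, p): (1 - 2p) / sqrt(n p (1 - p)).
    Positive means skewed right, negative means skewed left, 0 means symmetric."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

for n, p in [(15, 0.2), (15, 0.8), (15, 0.5), (40, 0.2)]:
    print(n, p, round(binom_skewness(n, p), 3))
```

It gives about 0.387 for (15, 0.2), about −0.387 for (15, 0.8), 0 for (15, 0.5), and about 0.237 for (40, 0.2). For fixed p the formula shrinks like \(1/\sqrt{n}\), which is why the skew fades as n grows.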

Interactivity

In order to participate in this interactivity, you will need to make sure that you have already downloaded a free version of the Mathematica player:


Once you've downloaded Mathematica, you then need to download this demo (by clicking on Download Live Version). Once you've downloaded and opened the demo in Mathematica, you can participate in the interactivity:

  1. First, use the sliders (or the plus signs +) to set n = 5 and p = 0.2. Notice that the binomial distribution is skewed to the right.
  2. Then, as you move the sample size slider to the right in order to increase n, notice that the distribution moves from being skewed to the right to approaching symmetry.
  3. Now, set p = 0.5. Then, as you move the sample size slider in either direction, notice that regardless of the value of n, the binomial distribution is symmetric.
  4. Then, do whatever you want with the sliders until you think you fully understand the effect of n and p on the shape of the binomial distribution.

The Mean and Variance

Theorem. If X is a binomial random variable, then the mean of X is:

 μ = np

Proof.
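One standard way to prove this pulls a factor of np out of the sum and recognizes what remains as the b(n − 1, p) probabilities summed over all of their values:

```latex
\begin{aligned}
E(X) &= \sum_{x=0}^{n} x\binom{n}{x}p^x(1-p)^{n-x} \\
     &= \sum_{x=1}^{n} \frac{n!}{(x-1)!\,(n-x)!}\,p^x(1-p)^{n-x} && \text{(the $x=0$ term is zero)} \\
     &= np\sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,(n-x)!}\,p^{x-1}(1-p)^{n-x} \\
     &= np\sum_{k=0}^{n-1} \binom{n-1}{k}p^{k}(1-p)^{(n-1)-k} && \text{(let $k=x-1$)} \\
     &= np\,\bigl(p+(1-p)\bigr)^{n-1} = np
\end{aligned}
```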

Theorem. If X is a binomial random variable, then the variance of X is:

\(\sigma^2=np(1-p)\)

and the standard deviation of X is:

\(\sigma=\sqrt{np(1-p)}\)

The proof of this theorem is quite extensive, so we will break it up into three parts. Here's the first part:

Proof.
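One common way to start, and the one that sets up the E[X(X − 1)] calculation in the second part, is to rewrite the shortcut formula σ² = E(X²) − μ² using E(X²) = E[X(X − 1)] + E(X), together with μ = np:

```latex
\begin{aligned}
\sigma^2 &= E(X^2) - \mu^2 \\
         &= E[X(X-1) + X] - \mu^2 \\
         &= E[X(X-1)] + E(X) - \mu^2 \\
         &= E[X(X-1)] + np - (np)^2
\end{aligned}
```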

Here's the second part:

Proof. The definition of the expected value of a function gives us:

\(E[X(X-1)]=\sum\limits_{x=0}^n x(x-1)\times f(x)=\sum\limits_{x=0}^n x(x-1)\times \dfrac{n!}{x!(n-x)!}p^x(1-p)^{n-x}\)

The terms of the summation for x = 0 and x = 1 equal zero, because the factor x(x − 1) is zero for those values. Therefore, the bottom index on the summation can be changed from x = 0 to x = 2, as it is here:

\(E[X(X-1)]=\sum\limits_{x=2}^n x(x-1)\times \dfrac{n!}{x!(n-x)!}p^x(1-p)^{n-x}\)

Now, let's see how we can simplify that summation:
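One way to carry out the simplification: cancel x(x − 1) against x!, factor out n(n − 1)p², and recognize the remaining sum as the b(n − 2, p) probabilities summed over all of their values:

```latex
\begin{aligned}
E[X(X-1)] &= \sum_{x=2}^{n} \frac{n!}{(x-2)!\,(n-x)!}\,p^x(1-p)^{n-x} && \text{(cancel $x(x-1)$ against $x!$)} \\
          &= n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!\,(n-x)!}\,p^{x-2}(1-p)^{n-x} \\
          &= n(n-1)p^2 \sum_{k=0}^{n-2} \binom{n-2}{k}p^{k}(1-p)^{(n-2)-k} && \text{(let $k=x-2$)} \\
          &= n(n-1)p^2\,\bigl(p+(1-p)\bigr)^{n-2} = n(n-1)p^2
\end{aligned}
```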

And, here's the final part that ties all of our previous work together:

Proof.
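Substituting E[X(X − 1)] = n(n − 1)p² into σ² = E[X(X − 1)] + np − (np)² gives the result:

```latex
\begin{aligned}
\sigma^2 &= E[X(X-1)] + np - (np)^2 \\
         &= n(n-1)p^2 + np - n^2p^2 \\
         &= n^2p^2 - np^2 + np - n^2p^2 \\
         &= np - np^2 = np(1-p)
\end{aligned}
```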

Example

The probability that a planted radish seed germinates is 0.80. A gardener plants nine seeds. Let X denote the number of radish seeds that successfully germinate. What is the average number of seeds the gardener could expect to germinate?

Solution. Because X is a binomial random variable, the mean of X is np. Therefore, the gardener could expect, on average, 9 × 0.80 = 7.2 seeds to germinate.

What does it mean that the average is 7.2 seeds? Obviously, a seed either germinates or not. You can't have two-tenths of a seed germinating. Recall that the mean is a long-run (population) average. What the 7.2 means is... if the gardener conducted this experiment... that is, planting nine radish seeds and observing the number that germinated... over and over and over again, the average number of seeds that would germinate would be 7.2. The number observed for any particular experiment would be an integer (that is, whole seeds), but when you take the average of all of the integers from the repeated experiments, you need not obtain an integer, as is the case here.  In general, the average of a discrete random variable need not be an integer.
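A small simulation illustrates this long-run interpretation. The sketch plants nine seeds, each germinating independently with probability 0.80, repeats that experiment many times, and averages the counts. `average_germinated` is a hypothetical helper, and the seed value 1 for the random number generator is arbitrary (it only makes runs reproducible).

```python
import random

def average_germinated(trials, n=9, p=0.80, seed=1):
    """Repeat 'plant n seeds, count germinations' many times; return the average count."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    for _ in range(trials):
        # each seed germinates independently with probability p
        total += sum(rng.random() < p for _ in range(n))
    return total / trials

for trials in (10, 1_000, 100_000):
    print(trials, average_germinated(trials))
```

Every single experiment yields a whole number of seeds, but the averages settle near 7.2 as the number of repetitions grows.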

What are the variance and standard deviation of X?

Solution. The variance of X is:

np(1 − p) = 9 × 0.80 × 0.2 = 1.44

Therefore, the standard deviation of X is the square root of 1.44, or 1.20.
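As a cross-check, the sketch below computes the mean and variance straight from their definitions, using all ten values of the b(9, 0.80) p.m.f., and compares them with np and np(1 − p).

```python
from math import comb, sqrt

n, p = 9, 0.80
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

mean = sum(x * f for x, f in enumerate(pmf))                    # E(X)
variance = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))  # E[(X - mu)^2]
std_dev = sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 4))  # from the definitions
print(round(n * p, 4), round(n * p * (1 - p), 4), round(sqrt(n * p * (1 - p)), 4))  # from the formulas
```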