4.2 - Binomial Distributions


Unit Summary

  • Binomial Experiment and Binomial Random Variable
  • Binomial Distribution and How to Evaluate It
  • Mean \(\mu\) and Standard Deviation \(\sigma\) of Binomial Random Variable
  • Some Applications of the Binomial Distribution's Mean and Standard Deviation

Reading Assignment
An Introduction to Statistical Methods and Data Analysis, (see your Course Schedule)

 

Binomial Experiments and Binomial Random Variables

A binary categorical variable is a variable that has two possible outcomes. For example, gender (male/female) and having a tattoo (yes/no) are both binary categorical variables.

 

The Binomial Distribution and How to Evaluate It

The binomial distribution is a special discrete distribution where there are two distinct complementary outcomes, a “success” and a “failure”.

Let's use the example from the previous page investigating the number of prior convictions for prisoners at a state prison at which there were 500 prisoners.

Let Success = no priors (0)
Let Failure = priors (1, 2, 3, or 4)

A Note on Notation!

\(\pi\) (or p) = probability of success

Common notation uses either p or \(\pi\) for the probability of “success” and usually q for the probability of “failure”. \(\pi\) is what is used in the text and online notes. “Success” is whatever outcome the researcher chooses to count, not necessarily a positive outcome. The symbol \(\pi\) in this case does NOT refer to the numerical value 3.14.

Looking back on our example, we can find that:

\(\pi = 0.16\), or \(p = 0.16\)
\(q = 0.84\)
\(\pi + q = 1\), or \(p + q = 1\)

A special discrete random variable is the binomial. We have a binomial experiment if ALL of the following four conditions are satisfied:

  1. The experiment consists of n identical trials.
  2. Each trial results in one of the two outcomes, called success and failure.
  3. The probability of success, denoted \(\pi\), remains the same from trial to trial.
  4. The n trials are independent. That is, the outcome of any trial does not affect the outcome of the others.
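To make the four conditions concrete, here is a minimal simulation sketch. It assumes n = 10 trials (an illustrative value, not from the example) and reuses the success probability 0.16 from above; the function name `run_binomial_experiment` is hypothetical.

```python
import random

def run_binomial_experiment(n, p, rng):
    """Run n identical, independent trials, each a success with
    probability p, and return the number of successes."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n, p = 10, 0.16          # n is an assumed illustrative value

# Repeat the whole experiment many times and average the success counts.
counts = [run_binomial_experiment(n, p, rng) for _ in range(10_000)]
average = sum(counts) / len(counts)
print(average)           # should be close to n * p = 1.6
```

Each call to `run_binomial_experiment` is one binomial experiment, and the value it returns is one observation of the binomial random variable.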

If the four conditions are satisfied, then a binomial random variable will be distributed with a mean, \(\mu\), of \(np\) and a standard deviation, \(\sigma\), of \(\sqrt{np(1-p)}\).

A Note on Notation!

\(\mu=n \pi\)

\(\sigma=\sqrt{n \pi (1-\pi)}\)

where π is the probability of success in a given trial, and n is the number of trials in the binomial experiment.

Note: \(\pi\) is just a symbol for the probability of success and NOT the value 3.14....  Alternatively we could use p instead of \(\pi\) and the formula for mean and standard deviation would be as follows:

\(\mu=n p\)

\(\sigma=\sqrt{np (1-p)}\)

In general, we see the mean of a binomial is the number of trials times the probability of success. The standard deviation is the square root of the product of the mean and the probability of failure.
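As a quick illustration, the two formulas can be sketched in a few lines of Python. The helper name `binomial_mean_sd` and the values n = 10, p = 0.5 are assumptions chosen only for demonstration.

```python
import math

def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of a binomial random variable
    with n trials and success probability p."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))   # sqrt(mean * probability of failure)
    return mean, sd

mean, sd = binomial_mean_sd(10, 0.5)  # illustrative values
print(mean, round(sd, 2))             # 5.0 1.58
```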

Example: FBI Crime Survey

An FBI survey shows that about 80% of all property crimes go unsolved. Suppose that in your town 3 such crimes are committed and they are each deemed independent of each other.

What is the probability that a) exactly 1 of the three crimes will be solved and b) at least one of the crimes will be solved?

First, we must determine if this situation satisfies ALL four conditions of a binomial experiment stated above:

  1. Is there a fixed number of trials? YES, the number of trials is fixed at 3 (n = 3).
  2. Does each trial have only 2 outcomes? YES (solved and unsolved).
  3. Do all the trials have the same probability of success? YES (p = 0.2, since 80% go unsolved).
  4. Are all the crimes independent? YES (stated in the description).

a) Let's solve the first part: the probability that exactly 1 of the three crimes will be solved. With three such events (crimes) there are three sequences in which only one is solved:

Solved First, Unsolved Second, Unsolved Third = 0.2 × 0.8 × 0.8 = 0.128
Unsolved First, Solved Second, Unsolved Third = 0.8 × 0.2 × 0.8 = 0.128
Unsolved First, Unsolved Second, Solved Third = 0.8 × 0.8 × 0.2 = 0.128

We add these three probabilities and get 0.384. Looking at this from a formula standpoint, we have three possible sequences, each involving one solved and two unsolved events. Putting this together gives us the following:

\[(3) * (0.2) * (0.8)^2 = 0.384\]
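The counting argument can be checked by brute force. The sketch below lists all sequences of solved (S) and unsolved (U) crimes and keeps those with exactly one S; the labels "S" and "U" are introduced here only for illustration.

```python
from itertools import product

p_solved = 0.2
p_unsolved = 1 - p_solved

total = 0.0
for seq in product("SU", repeat=3):      # all 8 possible sequences
    if seq.count("S") == 1:              # keep those with exactly one solved
        prob = 1.0
        for outcome in seq:
            prob *= p_solved if outcome == "S" else p_unsolved
        print("".join(seq), round(prob, 3))
        total += prob

print(round(total, 3))                   # 0.384
```

Three sequences survive the filter (SUU, USU, UUS), each with probability 0.128.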

The example above and its formula illustrate the motivation behind the binomial formula for finding exact probabilities P(X = x).

\[P(X = x) = \frac{n!}{x! (n-x)!} p^x (1 - p)^{n-x}\]

Let's apply this formula in our example.  In our example n = 3 and X = 1.

If we fill in the formula above using the data from our example it would be:

\[P(X = 1) = \frac{3!}{1! (3-1)!} (0.2)^1 (1 - 0.2)^{3-1} = 3(0.2)(0.8)^2 = 0.384\]
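The same calculation can be sketched in Python. The function name `binomial_pmf` is an assumption for this illustration; `math.comb(n, x)` computes \(\frac{n!}{x!(n-x)!}\) and requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials
    and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

prob_one_solved = binomial_pmf(1, 3, 0.2)
print(round(prob_one_solved, 3))   # 0.384
```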

A Note on Notation! - the '!' sign

The exclamation point (!) is used in math to represent the factorial operation. The factorial of a positive whole number is that number multiplied by every whole number below it, down to one (excluding 0). For example, 3! = 3 × 2 × 1 = 6

Remember 1! = 1
Remember 0! = 1
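For readers who want to check factorials by computer, here is a short sketch using Python's standard `math.factorial`. The variable names are illustrative.

```python
from math import factorial

print(factorial(3))   # 6
print(factorial(1))   # 1
print(factorial(0))   # 1

# Number of sequences with x = 1 success in n = 3 trials: n! / (x! (n - x)!)
n, x = 3, 1
sequences = factorial(n) // (factorial(x) * factorial(n - x))
print(sequences)      # 3
```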

b) OK, now let's tackle the second part of the question. What is the probability that at least one of the crimes will be solved?

Here we are looking to solve P(X ≥ 1).

The long way to solve for P(X ≥ 1) is to compute \(P(x=1) + P(x=2) + P(x=3)\) as follows:

\(P(x=1) = \frac{3!}{1!\,2!} (0.2)^1 (0.8)^2 = 0.384\) (from part a)
\(P(x=2) = \frac{3!}{2!\,1!} (0.2)^2 (0.8)^1 = 0.096\)
\(P(x=3) = \frac{3!}{3!\,0!} (0.2)^3 (0.8)^0 = 0.008\)

We add up all of the above probabilities and get 0.488.

OR, we could simplify our work by using the complement rule. The complement of the event X ≥ 1 is X < 1, so P(X ≥ 1) = 1 - P(X < 1), which is equal to 1 - P(X = 0). We have carried out this solution below.

\(P(x \geq 1) = 1 - P(x<1) = 1 - P(x=0) = 1 - \frac{3!}{0! (3-0)!} (0.2)^0 (1 - 0.2)^{3-0} = 1 - 1(1)(0.8)^3 = 1 - 0.512 = 0.488\)
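A short sketch confirming that the long way and the complement rule agree. The helper `pmf` is a hypothetical name for the binomial formula with n = 3 and p = 0.2.

```python
from math import comb

n, p = 3, 0.2

def pmf(x):
    """P(X = x) for the crime example (n = 3, p = 0.2)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

long_way = pmf(1) + pmf(2) + pmf(3)   # add up every case with x >= 1
short_way = 1 - pmf(0)                # complement rule

print(round(long_way, 3), round(short_way, 3))   # 0.488 0.488
```

The complement rule needs one evaluation of the formula instead of three, and the saving grows as n gets larger.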

In such a situation where three crimes happen, what is the expected number of crimes that remain unsolved, and what is the standard deviation? Here we apply the formulas from above, counting an unsolved crime as the “success” so that p = 0.8.

\(\mu = 3(0.8) = 2.4\)
\(\sigma = \sqrt{3(0.8)(0.2)} = \sqrt{0.48} \approx 0.69\)

Below is another example in which we illustrate how to use the formula to compute binomial probabilities. In this example we use Y to represent the random variable and \(\pi\) to represent the probability of success (similar to what some texts use). Remember, \(\pi\) is just a symbol for the probability of success and NOT the value 3.14....

Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring.

Find the probability that there will be no red flowered plants in the five offspring.

Y = # of red flowered plants in the five offspring. Here, the number of red flowered plants has a binomial distribution with n = 5, \(\pi\) = 0.25.

\[P(Y=0)= \frac{5!}{0!(5-0)!} {\pi}^0 {(1-\pi)}^5=1 (0.25)^0 (0.75)^5 =0.237\]

  Now, find the probability that there will be four or more red flowered plants.  

Try to figure out your answer first.
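One way to work this out, applying the binomial formula with n = 5 and \(\pi\) = 0.25, is to add the probabilities for Y = 4 and Y = 5:

\[P(Y \geq 4) = P(Y=4) + P(Y=5) = \frac{5!}{4!\,1!} (0.25)^4 (0.75)^1 + \frac{5!}{5!\,0!} (0.25)^5 (0.75)^0 = 0.01465 + 0.00098 \approx 0.016\]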

The mean of a distribution is also called the expected value of the distribution.

Of the five cross-fertilized offspring, how many red flowered plants do you expect?  

Try to figure out your answer first.
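Applying the formula for the mean with n = 5 and \(\pi\) = 0.25 gives:

\[\mu = n \pi = 5 (0.25) = 1.25\]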

Note: Y can only take values 0, 1, 2, ..., n, but the expected value (mean) of Y may be some value other than those that can be assumed by Y.

What is the standard deviation of Y, the number of red flowered plants in the five cross-fertilized offspring?  

Try to figure out your answer first.
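Applying the formula for the standard deviation with n = 5 and \(\pi\) = 0.25 gives:

\[\sigma = \sqrt{n \pi (1-\pi)} = \sqrt{5 (0.25)(0.75)} = \sqrt{0.9375} \approx 0.97\]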

Some Applications of the Binomial Distribution's Mean and Standard Deviation

A pharmaceutical company claims that a new treatment is successful in reducing fever in more than 60% of the cases. The treatment was tried on 40 randomly selected cases and 11 were successful. Do you doubt the company's claim? (i.e., Can you reject the company's claim?)

Answer: If the claim is valid, then Y (the number of successful cases) has a binomial distribution with n = 40 and π greater than 0.6.

We will first consider the boundary case, π = 0.6. Is y = 11 a likely outcome from Y when \(\pi\) = 0.6?

Answer: When \(\pi = 0.6\), \(\mu = n \pi = 40 (0.6) = 24\)

\( \sigma=\sqrt{n \pi (1-\pi)}=\sqrt{40\cdot0.6\cdot0.4}=3.1\)
\(\mu - 3\sigma = 24 - 3(3.1) = 14.7\) , \(\mu + 3\sigma = 24 + 3(3.1) = 33.3\)

The observed value, y = 11, is less than 14.7 and so is not within 3σ of the mean; thus 11 is unlikely to be observed when π = 0.6. The same argument carries through to π greater than 0.6, because a larger π gives a larger mean and puts 11 even further out in the lower tail.
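The 3σ check can be sketched in Python as follows. The variable names are illustrative, and the numbers are those of the boundary case π = 0.6.

```python
import math

n, p, observed = 40, 0.6, 11

mu = n * p                              # 24.0
sigma = math.sqrt(n * p * (1 - p))      # about 3.1
lower, upper = mu - 3 * sigma, mu + 3 * sigma

# Is the observed count inside the interval mu +/- 3 sigma?
within = lower <= observed <= upper
print(round(lower, 1), round(upper, 1), within)   # 14.7 33.3 False
```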


An Alternative Approach: Compute the probability of observing a value as small as or smaller than 11, assuming \(\pi\) = 0.6. If the probability is large, do not doubt the claim. If the probability is small, doubt the claim. Using Minitab, we get the following output:

Probability Density Function

Binomial with n = 40 and p = 0.6

 x     P(X = x)
 0     0.0000000
 1     0.0000000
 2     0.0000000
 3     0.0000000
 4     0.0000000
 5     0.0000000
 6     0.0000000
 7     0.0000000
 8     0.0000002
 9     0.0000013
10     0.0000059
11     0.0000242

We thus obtain P(X ≤ 11) = 0.0000316. The probability is very small, so we doubt the claim. (Note: It is incorrect to compute only the probability at 11, since the probability of any single value is usually very small when the sample size is large.)
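For readers without Minitab, the same cumulative probability can be sketched in Python by summing the binomial formula over x = 0, 1, ..., 11. The helper name `pmf` is an assumption for this illustration.

```python
from math import comb

n, p = 40, 0.6

def pmf(x):
    """P(X = x) for a binomial with n = 40 and p = 0.6."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Cumulative probability: add the individual probabilities for x = 0..11.
cumulative = sum(pmf(x) for x in range(12))
print(f"{cumulative:.7f}")   # approximately 0.0000316
```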


 

How are probability values and cumulative probability values related? This is an important relationship to understand: the cumulative probability P(X ≤ x) is the sum of the individual probabilities P(X = k) for k = 0, 1, ..., x, just as P(X ≤ 11) above is the sum of the probabilities for x = 0 through 11.