Probability Distributions

Introduction

Learning objectives for this lesson

Upon completion of this lesson, you should be able to:

Random Variables

A random variable is a numerical characteristic of each event in a sample space, or equivalently, of each individual in a population.

Examples:

Random variables are classified into two broad types: discrete and continuous.

Examples of discrete random variables:

Examples of continuous random variables:

Note: In practice, we don't measure accurately enough to truly see all possible values of a continuous random variable. For instance, somebody may in reality have exercised 4.2341567 hours last week, but they would probably round that to 4. Nevertheless, hours of exercise last week is inherently a continuous random variable.

Probability Distributions: Discrete Random Variables

For a discrete random variable, its probability distribution (also called the probability distribution function) is any table, graph, or formula that gives each possible value and the probability of that value. Note: The total of all probabilities across the distribution must be 1, and each individual probability must be between 0 and 1, inclusive.

Examples:

(1) Probability Distribution for Number of Heads in 4 flips of a coin.

Heads          0      1      2      3      4
Probability    1/16   4/16   6/16   4/16   1/16

This could be found by listing all 16 possible sequences of heads and tails for four flips, and then counting how many sequences there are for each possible number of heads.
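The listing-and-counting approach just described can be sketched in a few lines of Python. This is only an illustration; the variable names (`sequences`, `counts`, `distribution`) are made up for this sketch and are not part of the lesson.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# List all 2**4 = 16 equally likely sequences of heads (H) and tails (T).
sequences = list(product("HT", repeat=4))

# Count how many sequences contain each possible number of heads.
counts = Counter(seq.count("H") for seq in sequences)

# Each sequence has probability 1/16, so
# P(X = k) = (number of sequences with k heads) / 16.
distribution = {k: Fraction(counts[k], len(sequences)) for k in sorted(counts)}

for heads in distribution:
    print(heads, f"{counts[heads]}/{len(sequences)}")
```

Exact fractions are used so that the results can be compared directly with the table (note that `Fraction` reduces 4/16 to 1/4 internally, which is why the loop prints the raw counts over 16).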

(2) Probability Distribution for number of tattoos each student has in a population of students.

Tattoos        0       1       2       3       4
Probability    0.850   0.120   0.015   0.010   0.005

This could be found by doing a census of a large student population.
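The two requirements stated in the definition (each probability between 0 and 1, and a total of 1) can be checked mechanically. The helper below is a hypothetical sketch, not something the lesson provides; a small tolerance is assumed because decimal probabilities are stored as floating-point numbers.

```python
import math

def is_valid_distribution(probabilities, tol=1e-9):
    """Return True if every probability is in [0, 1] and they total 1."""
    probs = list(probabilities)
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# Tattoo distribution from the table above: value -> probability.
tattoos = {0: 0.850, 1: 0.120, 2: 0.015, 3: 0.010, 4: 0.005}
print(is_valid_distribution(tattoos.values()))
```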

Cumulative Probabilities

Often, we wish to know the probability that a variable is less than or equal to some value. This is called the cumulative probability because to find the answer, we simply add probabilities for all values qualifying as "less than or equal" to the specified value.

Example: Suppose we want to know the probability that the number of heads in four flips is 1 or less. The qualifying values are 0 and 1, so we add probabilities for those two possibilities. If we let X represent number of heads we get on four flips of a coin, then:

\( P(X<2)=P(X \leq 1 )=P(X=0)+P(X=1)=(1/16)+(4/16)=5/16 \) 

The cumulative distribution is a listing of all possible values along with the cumulative probability for each value.

Examples:

(1) Probability Distribution and Cumulative Distribution for Number of Heads in 4 flips.

Heads                     0      1      2       3       4
Probability               1/16   4/16   6/16    4/16    1/16
Cumulative Probability    1/16   5/16   11/16   15/16   1

Each cumulative probability was found by adding probabilities (in second row) up to the particular column of the table. As an example, for 2 heads, we add probabilities for 0, 1, and 2 heads to get 11/16. This is the probability the number of heads is two or less.
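The running-total idea can be sketched with `itertools.accumulate` from the Python standard library. The names `values`, `probs`, and `cdf` are illustrative choices for this sketch.

```python
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2, 3, 4]
probs = [Fraction(1, 16), Fraction(4, 16), Fraction(6, 16),
         Fraction(4, 16), Fraction(1, 16)]

# accumulate() yields running totals: p0, p0 + p1, p0 + p1 + p2, ...
cdf = dict(zip(values, accumulate(probs)))

for heads, cumulative in cdf.items():
    print(heads, cumulative)
```

Looking up `cdf[2]` gives 11/16, the probability that the number of heads is two or less.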

(2) Probability Distribution and Cumulative Distribution for number of tattoos each student has in a population of students.

Tattoos                   0       1       2       3       4
Probability               0.850   0.120   0.015   0.010   0.005
Cumulative Probability    0.850   0.970   0.985   0.995   1

As an example, probability a randomly selected student has 2 or fewer tattoos = 0.985 (calculated as 0.850  + 0.120 + 0.015).

Mean, also called Expected Value, of a Discrete Variable

The phrase expected value is a synonym for mean value in the long run (meaning for many repeats or a large sample size). For a discrete random variable, the calculation is Sum of (value × probability) where we sum over all values (after separately calculating value × probability for each value), expressed as:

\( E(X)=\sum x_i p_i\),

meaning we take each observed X value and multiply it by its respective probability. We then add these products to reach our expected value labeled E(X). [NOTE: the letter X is a common symbol used to represent a random variable. Any letter can be used.]

Example : A fair six-sided die is tossed. You win \$2 if the result is a “1”, you win \$1 if the result is a “6” but otherwise you lose \$1.

The probability distribution for X = amount won or lost

X              +2     +1     -1
Probability    1/6    1/6    4/6

\( Expected\ Value= (2 \times \frac {1}{6})+(1 \times \frac {1}{6})+(-1 \times \frac {4}{6})= - \frac {1}{6}=-\$ 0.17 \)

The interpretation is that if you play many times, the average outcome is losing 17 cents per play.

Example : Using the probability distribution for number of tattoos given above (not the cumulative!),

The mean number of tattoos per student is

\( Expected\ Value=(0 \times 0.85)+(1 \times 0.12)+ (2 \times 0.015) +(3 \times 0.010) +(4 \times 0.005) =0.20 \)
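Both expected-value calculations follow the same recipe, so they can be sketched with one small function. The function name `expected_value` and the dictionary layout (value mapped to probability) are assumptions made for this illustration.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of (value x probability) over all values of a discrete variable."""
    return sum(x * p for x, p in distribution.items())

# Die game: amount won or lost -> probability.
die_game = {2: Fraction(1, 6), 1: Fraction(1, 6), -1: Fraction(4, 6)}

# Number of tattoos -> probability.
tattoos = {0: 0.850, 1: 0.120, 2: 0.015, 3: 0.010, 4: 0.005}

print(expected_value(die_game))           # -1/6, about -$0.17 per play
print(round(expected_value(tattoos), 2))  # 0.2 tattoos per student
```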


 

Standard Deviation of a Discrete Variable

Knowing the expected value is not the only important characteristic one may want to know about a set of discrete numbers: one may also need to know the spread, or variability, of these data. For instance, you may "expect" to win \$20 when playing a particular game (which appears good!), but the spread for this might be from losing \$20 to winning \$60. Knowing such information can influence your decision on whether to play.

To calculate the standard deviation we first must calculate the variance. From the variance, we take the square root and this provides us the standard deviation. Your book provides the following formula for calculating the variance:

\( \sigma ^2= \sum (x_i-\mu)^2 p_i \) and the standard deviation is:\( \sigma = \sqrt {\sum (x_i-\mu)^2 p_i}\)

In this expression we substitute our result for E(X) into \( \mu\) , and \( \mu\) is simply the symbol used to represent the mean of some population .

However, an easier formula to use and remember for calculating the variance is the following:

\( \sigma ^2= \sum x_i^2 p_i -\mu ^2\) and again we substitute E(X) for μ.

The standard deviation is then found by taking the square root of the variance. Notice in the summation part of this equation that we only square each observed X value and not the respective probability.

Example : Going back to the first example used above for expectation involving the die, we would calculate the standard deviation for this discrete distribution by first calculating the variance:

\( \sigma ^2= \sum x_i^2 p_i -\mu ^2 = (2^2 \times \frac{1}{6})+(1^2 \times \frac{1}{6})+(-1)^2 \times \frac{4}{6}-(- \frac{1}{6})^2\)

\(= \frac{4}{6}+\frac {1}{6}+ \frac{4}{6}-\frac{1}{36} = \frac{53}{36}=1.472 \)

So the standard deviation would be the square root of 1.472, or 1.213
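As a check, the definitional formula and the shortcut formula can both be evaluated for the die game; they should give the same variance. This is a minimal sketch with illustrative variable names, using exact fractions so the two results can be compared without rounding error.

```python
from fractions import Fraction
from math import sqrt

# Die game: amount won or lost -> probability.
die_game = {2: Fraction(1, 6), 1: Fraction(1, 6), -1: Fraction(4, 6)}

# Mean, mu = E(X).
mu = sum(x * p for x, p in die_game.items())

# Definition: sum of (x - mu)^2 * p.
var_definition = sum((x - mu) ** 2 * p for x, p in die_game.items())

# Shortcut: sum of x^2 * p, minus mu^2 (only x is squared, not p).
var_shortcut = sum(x ** 2 * p for x, p in die_game.items()) - mu ** 2

sd = sqrt(var_shortcut)

print(var_definition, var_shortcut)  # both 53/36
print(round(sd, 3))                  # 1.213
```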

Binomial Random Variable

This is a specific type of discrete random variable. A binomial random variable counts how often a particular event occurs in a fixed number of tries or trials. For a variable to be a binomial random variable, ALL of the following conditions must be met:

Examples of binomial random variables:

Notation

n = number of trials (sample size)

 p = probability event of interest occurs on any one trial

Example: For the example of guessing at True-False questions, n = 30 and p = .5 (chance of getting any one question right).

Probabilities for binomial random variables

The conditions for being a binomial variable lead to a somewhat complicated formula for finding the probability any specific value occurs (such as the probability you get 20 right when you guess at 20 True-False questions).

We'll use Minitab to find probabilities for binomial random variables. Don't worry about the “by hand” formula. However, for those of you who are curious, the by hand formula for the probability of getting a specific outcome in a binomial experiment is:

\[P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}\]

Evaluating the Binomial Distribution

One can use the formula to find the probability or alternatively, use Minitab or SPSS to find the probability. In the homework, you may use the method that you are more comfortable with unless specified otherwise.

Find P(x) for n = 20, x = 3, and p = 0.4.

To calculate binomial random variable probabilities in Minitab:

  1. Open Minitab without data.
  2. From the menu bar select Calc > Probability Distributions > Binomial.
  3. Choose Probability since we want to find the probability x = 3.
  4. Enter 20 in the text box for number of trials.
  5. Enter 0.4 in the text box for probability of success (note: in Minitab versions after 14 this is now labeled event probability).
  6. Since we do not have a column of data select the radio button for Input Constant and enter 3.
  7. Click Ok.

The window in Minitab to calculate the probability with binomial distribution

Minitab output:

Probability Density Function

Binomial with n = 20 and p = 0.4

x       P(X = x)
3.00    0.0123
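For readers who want to see the by-hand formula at work, the value in this output can be reproduced directly. The sketch below uses `math.comb` from the Python standard library for n!/(x!(n-x)!); the function name `binomial_pmf` is an illustrative choice, not a Minitab or textbook name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and event probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Same inputs as the Minitab example: n = 20, p = 0.4, x = 3.
print(round(binomial_pmf(3, 20, 0.4), 4))  # 0.0123
```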


To calculate binomial random variable probabilities in Minitab Express:

  1. Open Minitab Express without data.
  2. From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Probability (PDF).
  3. Since we want to find the probability that x = 3, enter 3 into the "Value" box
  4. In the "Distribution" drop down menu, select Binomial.
  5. Enter 20 into the "Number of trials" box, and 0.4 into the "Event probability" box.
  6. Select "Display a table of probability density values" to show the output.
  7. Click Ok

The result should be the following output:

minitab express output of binomial probabilities


 

In the following example, we illustrate how to use the formula to compute binomial probabilities. If you don't like to use the formula, you can also just use Minitab to find the probabilities.

Example by hand: Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring.

Find the probability that:

a. There will be no red flowered plants in the five offspring.

X = # of red flowered plants in the five offspring. Here, the number of red flowered plants has a binomial distribution with n = 5, p = 0.25.

\(P(X=0)=\frac{5!}{0!(5-0)!} p^0 (1-p)^5 =1 \times 0.25^0 \times 0.75^5 =0.237\)

b. (Cumulative probability) There will be fewer than two red flowered plants.

Answer:

\begin{align}
P(X\ is\ 1\ or\ less)&=P(X=0)+P(X=1)\\
&= \frac{5!}{0!(5-0)!} 0.25^0 (1-0.25)^5+\frac{5!}{1!(5-1)!} 0.25^1 (1-0.25)^4\\
& = 0.237 +0.395=0.632 \\
\end{align}

In the previous example, part a was finding P(X = x) and part b was finding P(X ≤ x). The latter is called a cumulative probability because you are finding the probability that has accumulated from the minimum up to some point, i.e. from 0 to 1 in this example.
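Parts a and b of the flower example can be sketched in Python as follows. The helper names `binomial_pmf` and `binomial_cdf` are assumptions made for this illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n trials with event probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): add the probabilities for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 5, 0.25  # five offspring, 25% chance each is red

p_none = binomial_pmf(0, n, p)         # part a: P(X = 0)
p_at_most_one = binomial_cdf(1, n, p)  # part b: P(X <= 1)

print(round(p_none, 4))         # 0.2373
print(round(p_at_most_one, 4))  # 0.6328
```

The unrounded total for part b is 0.6328; the hand calculation shows 0.632 because it adds the two terms after each has been rounded to three decimals.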

To use Minitab to solve a cumulative probability binomial problem, return to Calc > Probability Distributions > Binomial as shown above. Now however, select the radio button for Cumulative Probability and then enter the respective Number of Trials (i.e. 5), Event Probability (i.e. 0.25), and click the radio button for Input Constant and enter the x-value (i.e. 1).

Expected Value and Standard Deviation for Binomial random variable

The formulas given earlier for discrete random variables could be used, but the good news is that for binomial random variables there are shortcut formulas for the expected value (the mean) and the standard deviation:

\(Expected\ Value=np\)    \(Standard\ Deviation=\sqrt {np(1-p)}\)

After you use this formula a couple of times, you'll realize this formula matches your intuition. For instance, the “expected” number of correct (random) guesses at 30 True-False questions is np = (30)(.5) = 15 (half of the questions). For a fair six-sided die rolled 60 times, the expected value of the number of times a “1” is tossed is np = (60)(1/6) = 10. The standard deviation for both of these would be, for the True-False test 

\(\sqrt{30 \times 0.5 \times (1-0.5)}=\sqrt{7.5}=2.74\) 

and for the die

\(\sqrt{60 \times \frac{1}{6}\times (1-\frac {1}{6})}=\sqrt{ \frac{50}{6}}=2.89\)   
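The claim that the general discrete formulas "could be used" can be checked numerically: the sketch below computes the mean and standard deviation both ways for the two examples. The function names are hypothetical and chosen only for this illustration.

```python
from math import comb, isclose, sqrt

def binomial_mean_sd(n, p):
    """Shortcut formulas: mean = np, standard deviation = sqrt(np(1 - p))."""
    return n * p, sqrt(n * p * (1 - p))

def mean_sd_from_definition(n, p):
    """General discrete formulas applied to the full binomial distribution."""
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, sqrt(variance)

for n, p in [(30, 0.5), (60, 1 / 6)]:
    shortcut = binomial_mean_sd(n, p)
    general = mean_sd_from_definition(n, p)
    print(n, round(shortcut[0], 2), round(shortcut[1], 2),
          isclose(shortcut[1], general[1], rel_tol=1e-9))
```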

Continuous Random Variable

Density Curves

Previously we discussed discrete random variables, and now we consider the continuous type. A continuous random variable is such that all values (to any number of decimal places) within some interval are possible outcomes. A continuous random variable has an infinite number of possible values so we can't assign probabilities to each specific value. If we did, the total probability would be infinite, rather than 1, as it is supposed to be.

To describe probabilities for a continuous random variable, we use a probability density function. A probability density function is a curve such that the area under the curve within any interval of values along the horizontal axis gives the probability for that interval.


Normal Random Variables

The most commonly encountered type of continuous random variable is a normal random variable, which has a symmetric bell-shaped density function. The center point of the distribution is the mean value, denoted by μ (pronounced "mew"). The spread of the distribution is determined by the variance, denoted by σ² (pronounced "sigma squared") or by the square root of the variance called standard deviation, denoted by σ (pronounced "sigma").

Example: Suppose vehicle speeds at a highway location have a normal distribution with mean μ = 65 mph and standard deviation σ = 5 mph. The probability density function is shown below. Notice that the horizontal axis shows speeds and the bell is centered at the mean (65 mph).

The normal curve is bell shaped and symmetrical about the mean.

Probability for an Interval = Area under the density curve in that interval

The next figure shows the probability that the speed of a randomly selected vehicle will be between 60 and 73 miles per hour, with this probability equal to the area under the curve between 60 and 73.

The area under the normal curve between 60 and 73 is the probability that the speed of a vehicle is between 60 and 73 miles per hour.
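To make "probability = area under the density curve" concrete, the area can be approximated numerically by slicing the interval into thin trapezoids. This is only a sketch: the normal density formula used in `normal_pdf` is the standard one but is not given in this lesson, and the function names and step count are illustrative assumptions.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x (standard formula, not from the lesson)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def area_under_curve(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def speed_density(x):
    """Density for vehicle speeds: mean 65 mph, standard deviation 5 mph."""
    return normal_pdf(x, 65, 5)

print(round(area_under_curve(speed_density, 60, 73), 3))   # area between 60 and 73
print(round(area_under_curve(speed_density, 30, 100), 3))  # nearly the whole curve, close to 1
```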

Empirical Rule Review

Recall that in our first lesson we learned that for bell-shaped data, about 95% of the data values will be in the interval mean ± (2 × std. dev). In our example, this is 65 ± (2 × 5), or 55 to 75. The next figure shows that the probability is about 0.95 (about 95%) that a randomly selected vehicle speed is between 55 and 75.

The area under the normal curve between 55 and 75 is 0.95.

The Empirical Rule also stated that about 99.7% (nearly all) of a bell-shaped dataset will be in the interval mean ± (3 × std. dev). In our example, this is 65 ± (3 × 5), or 50 to 80. Notice that this interval roughly gives the complete range of the density curve shown above.
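The Empirical Rule percentages can be checked against the normal curve for the speed example. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later), which is an assumption of this illustration and not a tool the lesson uses.

```python
from statistics import NormalDist

# Vehicle speeds: mean 65 mph, standard deviation 5 mph.
speeds = NormalDist(mu=65, sigma=5)

within_2_sd = speeds.cdf(75) - speeds.cdf(55)  # mean +/- 2 standard deviations
within_3_sd = speeds.cdf(80) - speeds.cdf(50)  # mean +/- 3 standard deviations

print(round(within_2_sd, 4))  # about 0.95
print(round(within_3_sd, 4))  # about 0.997
```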


Finding Probabilities for a Normal Random Variable

Remember that the cumulative probability for a value is the probability less than or equal to that value. Minitab, SPSS, Excel, and the TI-83 series of calculators will give the cumulative probability for any value of interest in a specific normal curve.

For our example of vehicle speeds, here is Minitab output showing that the probability = 0.9452 that the speed of a randomly selected vehicle is less than or equal to 73 mph.

Minitab output will give the probability of X less than or equal to 73, which is 0.9452.

We can find this probability using either Minitab or Minitab Express:

To calculate normal random variable probabilities in Minitab:

  1. Open Minitab without data.
  2. From the menu bar select Calc > Probability Distributions > Normal.
  3. Select the radio button for Cumulative Probability (this is the default option).
  4. In the text box for Mean enter 65.
  5. In the text box for Standard Deviation enter 5.
  6. Since we do not have a column of data select the radio button for Input Constant and enter 73.
  7. Click OK.

The window in Minitab to compute the cumulative probability with normal distribution

The output is as follows:

In Minitab output, the probability of X less than or equal to 73 is 0.9452.


To calculate normal random variable probabilities in Minitab Express:

  1. Open Minitab Express without any data.
  2. From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Cumulative (CDF).
  3. Since you want to know the probability that the speed of a randomly selected vehicle is less than or equal to 73 mph, make sure the "Form of input" drop-down menu says "A single value" and enter 73 into the "Value" box.
  4. Make sure the "Distribution" drop-down menu says "Normal", and enter 65 into the "Mean" box and 5 into the "Standard deviation" box.
  5. Select "Display a table of cumulative probabilities" to show the output.
  6. Click OK.

The result should be the following output:

minitab express output of the normal random variable probabilities.


Here is a figure that illustrates the cumulative probability we found using this procedure.

The area under the normal curve to the left of 73 is 0.9452.
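For readers without Minitab, the same cumulative probability can be sketched in Python. `statistics.NormalDist` (standard library, Python 3.8 or later) is assumed here as a stand-in for the Minitab dialog; it is not part of the lesson.

```python
from statistics import NormalDist

# Vehicle speeds: mean 65 mph, standard deviation 5 mph.
speeds = NormalDist(mu=65, sigma=5)

# Cumulative probability: P(X <= 73).
p_at_most_73 = speeds.cdf(73)
print(round(p_at_most_73, 4))  # 0.9452
```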

"Greater than" Probabilities

Sometimes we want to know the probability that a variable has a value greater than some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written P(X > 73).

For our example, probability speed is greater than 73 = 1 - 0.9452 = 0.0548.

•  The general rule for a "greater than" situation is

P (greater than a value) = 1 - P(less than or equal to the value)

Example : Using Minitab we can find that the probability = 0.1587 that a speed is less than or equal to 60 mph. Thus the probability a speed is greater than 60 mph = 1 - 0.1587 = 0.8413.

The relevant Minitab output and a figure showing the cumulative probability for 60 mph follows:

In Minitab output, the probability of X less than or equal to 60 is 0.1587.

The area under the normal curve to the left of 60 is 0.1587
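The "greater than" rule is a one-line calculation once the cumulative probability is available. The sketch below again assumes `statistics.NormalDist` from the Python standard library; the helper name `prob_greater_than` is made up for this example.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

def prob_greater_than(dist, value):
    """P(X > value) = 1 - P(X <= value)."""
    return 1 - dist.cdf(value)

print(round(prob_greater_than(speeds, 73), 4))  # 0.0548
print(round(prob_greater_than(speeds, 60), 4))  # 0.8413
```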

"In between" Probabilities

Suppose we want to know the probability a normal random variable is within a specified interval. For instance, suppose we want to know the probability a randomly selected speed is between 60 and 73 mph. The simplest approach is to subtract the cumulative probability for 60 mph from the cumulative probability for 73. The answer is

Probability speed is between 60 and 73 = 0.9452 − 0.1587 = 0.7865.

This can be written as P(60 < X < 73) = 0.7865, where X is speed.

•  The general rule for an "in between" probability is

P( between a and b ) = cumulative probability for value b − cumulative probability for value a
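The "in between" rule can be sketched the same way, by subtracting two cumulative probabilities. `statistics.NormalDist` (Python standard library) and the helper name `prob_between` are assumptions of this illustration.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

def prob_between(dist, a, b):
    """P(a < X < b) = cumulative probability for b - cumulative probability for a."""
    return dist.cdf(b) - dist.cdf(a)

print(round(prob_between(speeds, 60, 73), 4))  # 0.7865
```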

Finding Cumulative Probabilities

Using the Standard Normal Table in the appendix of the textbook (or see a copy at Standard Normal Table)

Table A.1 in the textbook gives normal curve cumulative probabilities for standardized scores.

The cumulative probability for a value equals the cumulative probability for that value's z-score. Here, the probability that speed is less than or equal to 73 mph equals the probability that the z-score is less than or equal to 1.60. How did we arrive at this z-score?

Example

In our vehicle speed example, the standardized score for 73 mph is

\[z=\frac{73-65}{5}=1.60\]

We look in the ".00" column of the "1.6" row (1.6 plus .00 equals 1.60) to find that the cumulative probability for z = 1.60 is 0.9452, the same value we got earlier as the cumulative probability for speed = 73 mph.

Based on the normal table, the cumulative probability corresponding to a z-score of 1.60 is 0.9452.

 

Example

For speed = 60 the z-score is

\[z=\frac{60-65}{5}=-1.00\]

Table A.1 gives this information:

Based on the normal table, the cumulative probability corresponding to a z-score of −1.00 is 0.1587.

The cumulative probability is .1587 for z = -1.00 and this is also the cumulative probability for a speed of 60 mph.
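The two table lookups above can be mirrored in code: standardize the value, then ask the standard normal curve (mean 0, standard deviation 1) for the cumulative probability. In this sketch `statistics.NormalDist()` stands in for Table A.1, and the function name `z_score` is an illustrative choice.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Standardized score: z = (value - mean) / standard deviation."""
    return (value - mean) / sd

standard_normal = NormalDist()  # mean 0, standard deviation 1

for speed in (73, 60):
    z = z_score(speed, 65, 5)
    print(speed, z, round(standard_normal.cdf(z), 4))
# 73 -> z = 1.6,  cumulative probability 0.9452
# 60 -> z = -1.0, cumulative probability 0.1587
```

The same probabilities come out of `NormalDist(65, 5).cdf(speed)` directly, which is the point of the opening sentence of this section.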

 

Example

Suppose pulse rates of adult females have a normal curve distribution with mean μ = 75 and standard deviation σ = 8. What is the probability that a randomly selected female has a pulse rate greater than 85? Be careful! Notice we want a "greater than" and the interval we want is entirely above average, so we know the answer must be less than 0.5.

If we use Table A.1, the first step is to calculate the z-score for 85.

\[z=\frac{85-75}{8}=1.25\]

Information from Table A.1 is

Based on the normal table, the cumulative probability corresponding to a z-score of 1.25 is 0.8944.

Use the ".05" column of the "1.2" row to find that the cumulative probability for z = 1.25 is 0.8944.

This is not yet the answer. This is the probability the pulse is less than or equal to 85. We want a greater than probability so the answer is:

P(greater than 85) = 1 - P(less than or equal 85) = 1 − 0.8944 = 0.1056.
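The pulse-rate calculation can be sketched end to end in Python, including the sanity check that the answer must be below 0.5. `statistics.NormalDist` (standard library) is assumed in place of Table A.1.

```python
from statistics import NormalDist

# Pulse rates: mean 75, standard deviation 8.
pulse = NormalDist(mu=75, sigma=8)

z = (85 - 75) / 8                 # z-score for 85
p_at_most_85 = pulse.cdf(85)      # cumulative probability, about 0.8944
p_greater_85 = 1 - p_at_most_85   # the "greater than" probability

print(z)                       # 1.25
print(round(p_greater_85, 4))  # 0.1056
```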


Finding Percentiles

We may wish to know the value of a variable that is a specified percentile of the values.

To calculate percentiles in Minitab:

  1. Open Minitab without data.
  2. From the menu bar select Calc > Probability Distributions > Normal.
  3. Select the radio button for Inverse Cumulative Probability
  4. In the text box for Mean enter 65
  5. In the text box for Standard Deviation enter 5
  6. Since we do not have a column of data select the radio button for Input Constant and enter 0.9999
  7. Click OK
  8. The output is as follows:

In Minitab output, the percentile corresponding to probability 0.9999 is 83.5951


To calculate percentiles in Minitab Express:

  1. Open Minitab Express without data.
  2. From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Inverse (ICDF).
  3. Since we do not have a column of data, make sure the form of input is "A single value", and enter 0.9999.
  4. Make sure the distribution is "Normal", and enter the mean of 65 and the standard deviation of 5.
  5. Under output, select "Display a table of inverse cumulative probabilities".
  6. Click OK.

 The result should be the following output:

minitab express output of calculating percentiles


Note:

In Minitab output, the percentile corresponding to probability 0.9999 is 83.5951

In Minitab output for the pulse rate example (mean 75, standard deviation 8), the percentile corresponding to probability 0.25 is 69.6041
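Finding a percentile is the inverse of finding a cumulative probability, and it can be sketched with the `inv_cdf` method of `statistics.NormalDist` (Python standard library, assumed here in place of Minitab). The second value quoted above, 69.6041, matches the pulse-rate distribution (mean 75, standard deviation 8) rather than the speed distribution, so that distribution is assumed for the second calculation.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)  # vehicle speeds
pulse = NormalDist(mu=75, sigma=8)   # pulse rates (assumed for the 0.25 case)

# Inverse cumulative probability: the value below which the given
# proportion of the distribution falls.
print(round(speeds.inv_cdf(0.9999), 4))  # 83.5951
print(round(pulse.inv_cdf(0.25), 4))     # 69.6041
```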

Normal Approximation to the Binomial

Remember binomial random variables from last week's discussion? A binomial random variable can also be approximated by using normal random variable methods discussed above. This approximation can take place as long as:

  1. The population size must be at least 10 times the sample size.
  2. np ≥ 10 and n(1 − p) ≥ 10. [These constraints take care of population shapes that are unbalanced because p is too close to 0 or to 1.]

The mean of a binomial random variable is easy to grasp intuitively: Say the probability of success for each observation is 0.2 and we make 10 observations. Then on the average we should have 10 * 0.2 = 2 successes. The spread of a binomial distribution is not so intuitive, so we will not justify our formula for standard deviation.

If sample count X of successes is a binomial random variable for n fixed observations with probability of success p for each observation, then X has a mean and standard deviation as discussed in section 8.4 of:

\(Mean=np\) and \(Standard\ Deviation=\sqrt {np(1-p)}\)

And as long as the above 2 requirements for n and p are satisfied, we can approximate X with a normal random variable having the same mean and standard deviation and use the normal calculations discussed previously in these notes to solve for probabilities for X.
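As a rough illustration of how the approximation behaves, the sketch below checks the second requirement and compares an exact binomial cumulative probability with its normal approximation for the True-False example (n = 30, p = 0.5). The choice of P(X ≤ 20), the function names, and the use of `statistics.NormalDist` are all assumptions made for this example; the first requirement (population size) is not something the code can check.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_ok(n, p):
    """Second requirement only: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def binomial_cdf(x, n, p):
    """Exact P(X <= x) for a binomial random variable."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

n, p = 30, 0.5
print(normal_approx_ok(n, p))  # True: np = 15 and n(1 - p) = 15

# Normal random variable with the same mean and standard deviation.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact = binomial_cdf(20, n, p)
approx = approx_dist.cdf(20)
print(round(exact, 3), round(approx, 3))
```

The two printed values are close but not identical, which is what "approximate" means here.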

Review of Finding Probabilities

When reviewing the examples listed below, keep in mind that they apply when:

1. The variable in question follows a normal, or bell-shaped, distribution.

2. If the variable is not standardized, then you need to standardize the value first by \(z=\frac{value-mean}{s.d.} = \frac{x-\mu}{\sigma}\).

The examples cover four situations:

Finding "Less Than" Probability
Finding "Greater Than" Probability
Finding "Between" Probability
Finding "Either / Or" Probability