Lesson 5: Probability Distributions

diagram of the 'big picture' with inference highlighted

In this lesson we will begin to explore the concept of statistical inference. We will look at both discrete and continuous probability distributions. The concepts of standard error and the Central Limit Theorem will be introduced which will serve as the base for the remaining lessons in this course.

Lesson 5 Learning Objectives

Upon completion of this lesson, you will be able to:

Review

Before we begin new content, we should review a few terms from previous lessons that we will see again in this lesson:

Discrete: Data that can only take on a set number of values

Continuous: Quantitative data that can take on any value between the minimum and maximum, and any value between two other values

Probability: The likelihood of an event occurring; \(P(A)=\frac{number \: of \:events \:considered\: outcome \:A}{number \:of\: total \:events}\)

\(P(A\;\cap\;B)\): Intersection of A and B; "probability of A and B"

\(P(A\;\cup\;B)\): Union of A and B; "probability of A or B" (this also includes the probability of A and B)

Mean: The numerical average; calculated as the sum of all of the data values divided by the number of values; for a sample, represented as \(\overline{x}\).

Standard deviation: Roughly the average difference between individual data and the mean; for a sample, represented as s, \(s=\sqrt{\frac{\sum (x-\overline{x})^{2}}{n-1}}\)

Sample: A subset of the population from which data is actually collected

Population: The entire set of possible observations in which we are interested

Statistic: A measure concerning a sample (e.g., sample mean)

Parameter: A measure concerning a population (e.g., population mean)

Descriptive statistics: Methods for summarizing data (e.g., mean, median, mode, range, variance, graphs)

Inferential statistics: Methods for using sample data to make conclusions about a population

z score: Distance between an individual score and the mean in standard deviation units; also known as a standardized score.

Empirical Rule: For bell-shaped distributions, about 68% of the data will be within one standard deviation of the mean, about 95% will be within two standard deviations of the mean, and about 99.7% will be within three standard deviations of the mean

distribution displaying the features of the empirical rule

5.1 - Random Variables

Random Variables

Random variable: a numerical characteristic that takes on different values due to chance

Examples

Coin Flips
The number of heads in four flips of a coin (a numerical property of each different sequence of flips) is a random variable because the results will vary between trials.

Heights
Samples of 100 students are repeatedly drawn from the population of all Penn State students and their heights are measured. The mean height of a sample of 100 Penn State students is a random variable because the statistic will vary between samples. While most sample means will be similar to the population mean, they will not all equal the population mean due to random sampling variation.

 

Random variables are classified into two broad types: discrete and continuous. A discrete random variable has a countable set of distinct possible values. A continuous random variable is such that any value (to any number of decimal places) within some interval is a possible value.

 

Examples


Discrete Random Variables:
  • Number of heads in 4 flips of a coin (possible outcomes are 0, 1, 2, 3, 4)
  • Number of classes missed last week (possible outcomes are 0, 1, 2, 3, ..., up to the maximum number of classes)
  • Amount won or lost when betting $1 on the Pennsylvania Daily number lottery

Continuous Random Variables:

  • Heights of individuals
  • Time to finish a test
  • Hours spent exercising last week.

 

Note: In practice, we don't measure accurately enough to truly see all possible values of a continuous random variable. For instance, in reality somebody may have exercised 4.2341567 hours last week but they probably would round off to 4 hours. Nevertheless, hours of exercise last week is inherently a continuous random variable.

5.2 - Discrete Random Variables

Probability distribution: A table, graph, or formula that gives the probability of a given outcome's occurrence

Probability Distribution

For a discrete random variable, its probability distribution (also called the probability distribution function) is any table, graph, or formula that gives each possible value and the probability of that value.  

Note: The total of all probabilities across the distribution must be 1, and each individual probability must be between 0 and 1, inclusive.

 

Example

What if we flipped a fair coin four times? What are the possible outcomes and what is the probability of each?

Figure 1 below is a probability distribution for the number of heads in 4 flips of a coin. Given that P(Heads)=.50, the probability of not flipping heads at all is 1/16, or .0625. In 6.25% of all trials, we can expect that there will be no heads. This may be written as P(X=0)=.0625. Similarly, the probability of flipping heads once in four trials is 4/16, or .25. In 25% of all trials, we can expect that heads will be flipped exactly once. This may be written as P(X=1)=.25.

This probability distribution could be constructed by listing all 16 possible sequences of heads and tails for four flips (i.e., HHHH, HTHH, HTTH, HTTT, etc.), and then counting how many sequences there are for each possible number of heads.

Figure 1. Probability Distribution for Number of Heads in 4 Flips of a Coin

Heads | 0 | 1 | 2 | 3 | 4
Probability | 1/16 | 4/16 | 6/16 | 4/16 | 1/16
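The enumeration described above can be sketched in a few lines of Python. This is only an illustration and not part of the course's Minitab workflow; the names `sequences` and `distribution` are chosen here for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# List all 2^4 = 16 equally likely sequences of heads (H) and tails (T).
sequences = list(product("HT", repeat=4))

# Count how many sequences contain each possible number of heads.
counts = Counter(seq.count("H") for seq in sequences)

# Each sequence has probability 1/16, so P(X = k) = (number of sequences) / 16.
distribution = {k: Fraction(counts[k], len(sequences)) for k in sorted(counts)}

for heads, prob in distribution.items():
    print(heads, prob, float(prob))
```

Note that `Fraction` reduces to lowest terms, so 4/16 is displayed as 1/4 and 6/16 as 3/8.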

Example

A census was conducted at a university. All students were asked how many tattoos they had.

Figure 2 presents a probability distribution for the discrete variable of number of tattoos for each student. From this table we can find that 85% of students in the population do not have a tattoo, 12% of students in the population have one tattoo, 1.5% of students in the population have two tattoos, and so on. This could be written as P(X=0)=.85, P(X=1)=.12, P(X=2)=.015, etc.
 

Figure 2. Probability Distribution for Number of Tattoos Each Student Has in a Population of Students

Tattoos | 0 | 1 | 2 | 3 | 4
Probability | .850 | .120 | .015 | .010 | .005

Cumulative Probabilities

Cumulative probability: Likelihood of an outcome less than or equal to a given value occurring

To find a cumulative probability we add the probabilities for all values qualifying as "less than or equal" to the specified value.

Example

Suppose we want to know the probability that the number of heads in four flips is less than two. Let X represent the number of heads we get in four flips of a coin.

Because this is a discrete distribution, the probability of flipping fewer than two heads is equal to the probability of flipping zero heads or one head:

\(P(X<2)=P(X=0\;\cup\;X=1)\)

Flipping exactly 0 heads and flipping exactly 1 head are mutually exclusive events. Thus, \(P(X=0\;\cup\;X=1)=P(X=0)+P(X=1)\)

We can use the values from Figure 1 above to solve this equation.

\(P(X=0)+P(X=1)=(1/16)+(4/16)=5/16 \) 

 

Cumulative distribution: A listing of all possible values along with the probability of that value and all lower values occurring (i.e., the cumulative probability)

 

Example

Cumulative probabilities are found by adding the probability up to each column of the table. In Figure 3 we find the cumulative probability for one head by adding the probabilities for zero and one. The cumulative probability for two heads is found by adding the probabilities for zero, one, and two. We continue with this procedure until we reach the maximum number of heads, in this case four, which should have a cumulative probability of 1.00 because 100% of trials must have four or fewer heads.

Figure 3. Probability Distribution and Cumulative Distribution for Number of Heads in 4 Flips.

Heads | 0 | 1 | 2 | 3 | 4
Probability | 1/16 | 4/16 | 6/16 | 4/16 | 1/16
Cumulative Probability | 1/16 | 5/16 | 11/16 | 15/16 | 1
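As a rough sketch, the running-total procedure described above could be reproduced in Python as follows. The variable names are illustrative, and exact fractions are used so the results can be compared directly with the sixteenths in the table.

```python
from fractions import Fraction
from itertools import accumulate

heads = [0, 1, 2, 3, 4]
probabilities = [Fraction(n, 16) for n in (1, 4, 6, 4, 1)]

# A running total of the probabilities gives P(X <= k) for each k.
cumulative = list(accumulate(probabilities))

for k, p, c in zip(heads, probabilities, cumulative):
    print(k, p, c)

# For a count of heads, P(X < 2) is the same as P(X <= 1).
p_less_than_two = cumulative[1]
print(p_less_than_two)
```

The last cumulative value is 1, since every trial must produce four or fewer heads.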

Example

Let's construct a cumulative distribution for the data concerning number of tattoos.

Figure 4. Probability Distribution and Cumulative Distribution for Number of Tattoos Each Student Has in a Population of Students.

Tattoos | 0 | 1 | 2 | 3 | 4
Probability | .850 | .120 | .015 | .010 | .005
Cumulative Probability | .850 | .970 | .985 | .995 | 1

Note that the cumulative probability for the last column is always 1. That is, 100% of trials will be less than or equal to the maximum value.

5.3 - Expected Value of a Discrete Random Variable

Expected Value (i.e., Mean) of a Discrete Random Variable

Law of Large Numbers: Given a large number of repeated trials, the average of the results will be approximately equal to the expected value

Expected value: The mean value in the long run for many repeated samples, symbolized as \(E(X)\)

Expected Value for a Discrete Random Variable

\[E(X)=\sum x_i p_i\]

\(x_i\)= value of the ith outcome
\(p_i\) = probability of the ith outcome

According to this formula, we take each possible X value and multiply it by its respective probability. We then add these products to reach our expected value. You may have seen this before referred to as a weighted average. It is known as a weighted average because it takes into account the probability of each outcome and weights it accordingly. This is in contrast to an unweighted average, which would not take into account the probability of each outcome and would weight each possibility equally.

Let's look at a few examples of expected values for a discrete random variable:

Example

 

A fair six-sided die is tossed. You win \$2 if the result is a “1,” you win \$1 if the result is a “6,” but otherwise you lose \$1.

The Probability Distribution for X = Amount Won or Lost

X | +\$2 | +\$1 | -\$1
Probability | 1/6 | 1/6 | 4/6

\( E(X)= \$2(\frac {1}{6})+\$1 (\frac {1}{6})+(-\$1)(\frac {4}{6})= -\$ 0.17 \)

The interpretation is that if you play many times, the average outcome is losing 17 cents per play. Thus, over time you should expect to lose money.
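For readers who want to check the arithmetic, the expected value formula can be sketched as a small Python function. The function name `expected_value` is an assumption made for this illustration.

```python
from fractions import Fraction


def expected_value(values, probabilities):
    """Return E(X), the sum of x_i * p_i, for a discrete random variable."""
    return sum(x * p for x, p in zip(values, probabilities))


# Die game: win $2 on a 1, win $1 on a 6, lose $1 otherwise.
winnings = [2, 1, -1]
probs = [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)]

mean_winnings = expected_value(winnings, probs)
print(mean_winnings, round(float(mean_winnings), 2))
```

The exact result is -1/6 of a dollar per play, which rounds to the -\$0.17 shown above.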

 

Example

 

Using the probability distribution for number of tattoos, let's find the mean number of tattoos per student.

Probability Distribution for Number of Tattoos Each Student Has in a Population of Students

Tattoos | 0 | 1 | 2 | 3 | 4
Probability | .850 | .120 | .015 | .010 | .005

\( E(X)=0 (.85)+1(.12)+ 2(.015) +3 (.010) +4(.005) =.20 \)

The mean number of tattoos per student is .20.
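The Law of Large Numbers can be illustrated with a small simulation based on this distribution. The sketch below assumes we can draw students at random according to the tattoo probabilities; the seed and the number of draws are arbitrary choices made for the illustration.

```python
import random

tattoos = [0, 1, 2, 3, 4]
probs = [0.850, 0.120, 0.015, 0.010, 0.005]

# Expected value from the formula E(X) = sum of x_i * p_i.
expected = sum(x * p for x, p in zip(tattoos, probs))

# Simulate drawing many students at random from this population.
rng = random.Random(2024)  # fixed seed (arbitrary) so the run is repeatable
draws = rng.choices(tattoos, weights=probs, k=100_000)
sample_mean = sum(draws) / len(draws)

print(round(expected, 2), sample_mean)
```

With this many draws, the simulated average should land close to the expected value of .20, and it tends to get closer as the number of draws increases.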

 


Symbols for Population Parameters

Recall from Lesson 3, in a sample, the mean is symbolized by \(\overline{x}\) and the standard deviation by \(s\). Because the probabilities that we are working with here are computed using the population, they are symbolized using lower case Greek letters. The population mean is symbolized by \(\mu\) (lower case "mu") and the population standard deviation by \(\sigma \) (lower case "sigma").

Measure | Sample Statistic | Population Parameter
Mean | \(\overline{x}\) | \(\mu\)
Variance | \(s^{2}\) | \(\sigma ^{2}\)
Standard Deviation | \(s\) | \(\sigma \)

Also recall that the standard deviation is equal to the square root of the variance. Thus, \(\sigma=\sqrt{(\sigma ^{2})}\)

Standard Deviation of a Discrete Random Variable

The expected value is not the only important characteristic of a discrete random variable: one may also need to know the spread, or variability, of its values. For instance, you may "expect" to win \$20 when playing a particular game (which appears good!), but the outcomes might range from losing \$20 to winning \$60. Knowing such information can influence your decision on whether to play.

To calculate the standard deviation we first must calculate the variance. We then take the square root of the variance, which gives us the standard deviation. Conceptually, the variance of a discrete random variable is the sum of the squared differences between each value and the mean, each multiplied by the probability of obtaining that value, as seen in the conceptual formulas below:

Conceptual Formulas

Variance for a Discrete Random Variable

\( \sigma ^2= \sum [(x_i-\mu)^2 p_i] \)

Standard Deviation for a Discrete Random Variable

\( \sigma = \sqrt {\sum [(x_i-\mu)^2 p_i]}\)

\(x_i\)= value of the ith outcome
\(\mu= E(X)=\sum x_i p_i\)
\(p_i\) = probability of the ith outcome

In these expressions we substitute our result for E(X) into \( \mu\) because \( \mu\) is the symbol used to represent the mean of a population.

However, there is an easier computational formula. The computational formula will give you the same result as the conceptual formula above, but the calculations are simpler.

Computational Formulas

Variance for a Discrete Random Variable

\( \sigma ^2= [\sum (x_i^2 p_i )]-\mu ^2\)

Standard Deviation for a Discrete Random Variable

\( \sigma = \sqrt {[\sum (x_i^2 p_i)] -\mu ^2}\)

\(x_i\)= value of the ith outcome
\(\mu= E(X)=\sum x_i p_i\)
\(p_i\) = probability of the ith outcome

Notice in the summation part of this equation that we only square each observed X value and not the respective probability. Also note that the \(\mu\) is outside of the summation.

Example

Going back to the first example used above for expectation involving the dice game, we would calculate the standard deviation for this discrete distribution by first calculating the variance:

The Probability Distribution for X = Amount Won or Lost

X | +\$2 | +\$1 | -\$1
Probability | 1/6 | 1/6 | 4/6

\( \sigma ^2= [\sum x_i^2 p_i ]-\mu ^2 = [2^2 (\frac{1}{6})+1^2 (\frac{1}{6})+(-1)^2 (\frac{4}{6})]-(- \frac{1}{6})^2\)

\(=[ \frac{4}{6}+\frac {1}{6}+ \frac{4}{6}]-\frac{1}{36} = \frac{53}{36}=1.472 \)

The variance of this discrete random variable is 1.472.

\(\sigma=\sqrt{(\sigma ^{2})}\)

\(\sigma=\sqrt{1.472}=1.213\)

The standard deviation of this discrete random variable is 1.213.
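To see that the conceptual and computational formulas agree, both can be evaluated for the die game. This is a minimal sketch with illustrative variable names; exact fractions are used so the two variances can be compared without rounding.

```python
from fractions import Fraction
from math import sqrt

values = [2, 1, -1]
probs = [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)]

# Mean: mu = E(X) = sum of x_i * p_i
mu = sum(x * p for x, p in zip(values, probs))

# Conceptual formula: sum of (x_i - mu)^2 * p_i
var_conceptual = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Computational formula: [sum of x_i^2 * p_i] - mu^2
var_computational = sum(x ** 2 * p for x, p in zip(values, probs)) - mu ** 2

sigma = sqrt(var_computational)
print(var_conceptual, var_computational, round(sigma, 3))
```

Both formulas give 53/36, and the square root rounds to 1.213.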

5.4 - Binomial Random Variable

Binomial random variable: A specific type of discrete random variable that counts how often a particular event occurs in a fixed number of tries or trials

For a variable to be a binomial random variable, ALL of the following conditions must be met:

  1. There are a fixed number of trials (a fixed sample size)
  2. On each trial, the event of interest either occurs or does not
  3. The probability of occurrence (or not) is the same on each trial
  4. Trials are independent of one another

Examples of Binomial Random Variables

  • Number of correct guesses at 30 true-false questions when you randomly guess all answers
  • Number of winning lottery tickets when you buy 10 tickets of the same kind
  • Number of left-handers in a randomly selected sample of 100 unrelated people
  • Number of tails when flipping a coin 10 times

Notation

n = number of trials

 p = probability event of interest occurs on any one trial

Example

Number of correct guesses at 30 true-false questions when you randomly guess all answers
There are 30 trials, therefore n = 30
There are two possible outcomes (true and false) that are equally probable, therefore p = 1/2 = .5

Probabilities for Binomial Random Variables

The conditions for being a binomial variable lead to a somewhat complicated formula for finding the probability that any specific value occurs (such as the probability you get 20 right when you guess at 30 True-False questions).

We'll use Minitab Express to find probabilities for binomial random variables. However, for those of you who are curious, the by-hand formula for the probability of getting a specific outcome in a binomial experiment is:

Binomial Random Variable Probability

\[P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}\]

n = number of trials
x = number of successes
p = probability event of interest occurs on any one trial

! is the symbol for factorial. For a review of factorials, see the course algebra review page.

One can use the formula to find the probability or alternatively, use Minitab Express to find the probability. In the homework, you may use the method that you are more comfortable with unless specified otherwise.

In the following Minitab Express example we will find P(x) for n = 20, x = 3, and p = 0.4.

To calculate binomial random variable probabilities in Minitab:

  1. Open Minitab without data.
  2. From the menu bar select Calc > Probability Distributions > Binomial.
  3. Choose Probability since we want to find the probability x = 3.
  4. Enter 20 in the text box for number of trials.
  5. Enter 0.4 in the text box for probability of success (note that for Minitab versions after 14 this is now labeled event probability).
  6. Since we do not have a column of data select the radio button for Input Constant and enter 3.
  7. Click Ok.

The window in Minitab to calculate the probability with binomial distribution

Minitab output:

Probability Density Function

Binomial with n = 20 and p = 0.4

x | P(X = x)
3.00 | 0.0123
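The same probability can be cross-checked with the by-hand formula. The following is a minimal Python sketch; the function name `binomial_pmf` is an assumption made for this illustration, and `math.comb` computes \(\frac{n!}{x!(n-x)!}\).

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and event probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


p_three = binomial_pmf(3, 20, 0.4)
print(round(p_three, 4))
```

Rounded to four decimal places, this agrees with the Minitab output of 0.0123.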


To calculate binomial random variable probabilities in Minitab Express:

  1. Open Minitab Express without data.
  2. From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Probability (PDF).
  3. Since we want to find the probability that x = 3, enter 3 into the "Value" box
  4. In the "Distribution" drop down menu, select Binomial.
  5. Enter 20 into the "Number of trials" box, and 0.4 into the "Event probability" box.
  6. Select "Display a table of probability density values" to show the output.
  7. Click Ok

The result should be the following output:

minitab express output of binomial probabilities


 

In the following example, we illustrate how to use the formula to compute binomial probabilities by hand. If you don't like to use the formula, you can also use Minitab Express to find the probabilities.

Example

Red Flowers

Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring. Find the probability that there will be no red flowered plants in the five offspring.

X = # of red flowered plants in the five offspring.

The number of red flowered plants has a binomial distribution with n = 5, p = .25

\(P(X=0)=\frac{5!}{0!(5-0)!} p^0 (1-p)^5 =1 \times .25^0 \times .75^5 =.237\)

There is a 23.7% chance that none of the five plants will be red flowered.

Cumulative probability: Likelihood that a certain number of successes or fewer will occur.

The possible values of a binomial random variable are mutually exclusive events, therefore we can use the addition rule that we learned in Lesson 4.

Example

Red Flowers, cont.

Continuing with the red flowers example, what if we wanted to know the probability that there would be one or fewer red flowered plants?

\begin{align}
P(X\ is\ 1\ or\ less)&=P(X=0)+P(X=1)\\
&= \frac{5!}{0!(5-0)!} .25^0 (1-.25)^5+\frac{5!}{1!(5-1)!} .25^1 (1-.25)^4\\
& = .2373 +.3955=.6328 \\
\end{align}

There is about a 63.3% chance that one or fewer of the five plants will be red flowered.

 

In the red flowers example, we first computed P(X = x) and then P(X ≤ x). This latter expression is called a cumulative probability because you are finding the probability that has accumulated from the minimum to some point, i.e., from 0 to 1 in this example.

To use Minitab Express to solve a cumulative probability binomial problem, return to Statistics > Probability Distributions> CDF/PDF > Cumulative Distribution Function (CDF). For Value enter 1. For distribution select the binomial. There are 5 trials and the event probability is .25

To use Minitab to solve a cumulative probability binomial problem, return to Calc > Probability Distributions > Binomial as shown above. Now however, select the radio button for Cumulative Probability. For Number of Trials enter 5 and the event probability is .25. Click the radio button for Input Constant and enter the x value of 1.
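As an alternative to the software menus, a cumulative binomial probability can be sketched in Python by summing the individual probabilities. The function names `binomial_pmf` and `binomial_cdf` are assumptions made for this illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n trials with event probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x): add the probabilities for 0, 1, ..., x successes."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


# Red flowers: n = 5 offspring, p = .25 chance that each is red flowered.
p_none = binomial_pmf(0, 5, 0.25)
p_one_or_fewer = binomial_cdf(1, 5, 0.25)
print(round(p_none, 4), round(p_one_or_fewer, 4))
```

Because the values 0 and 1 are mutually exclusive, adding their probabilities is exactly the addition rule used in the worked example.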

 

Expected Value and Standard Deviation for Binomial Random Variable

The formula given earlier for discrete random variables could be used, but the good news is that for binomial random variables a shortcut formula for expected value (the mean) and standard deviation can also be used.

Binomial Random Variable Formulas

\[\mu=np\]

\[\sigma=\sqrt {np(1-p)}\]

n = number of trials
p = probability event of interest occurs on any one trial

After you use this formula a couple of times, you'll realize this formula matches your intuition. For instance, the “expected” number of correct (random) guesses at 30 True-False questions is np = (30)(.5) = 15 (half of the questions). For a fair six-sided die rolled 60 times, the expected value of the number of times a “1” is tossed is np = (60)(1/6) = 10.

The standard deviations for these would be, for the True-False test, \(\sigma=\sqrt{30 \times 0.5 \times (1-0.5)}=\sqrt{7.5}=2.74\), and for the die, \(\sigma=\sqrt{60 \times \frac{1}{6}\times (1-\frac {1}{6})}=\sqrt{ \frac{50}{6}}=2.89\).
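One way to convince yourself that the shortcut formulas match the general formulas for discrete random variables is to compute both. The sketch below does this for the True-False test; the helper `binomial_pmf` and the variable names are assumptions made for this illustration.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for n trials with event probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 30, 0.5  # 30 true-false questions, random guessing

# General discrete formulas: mu = sum of x * P(x); variance = sum of x^2 * P(x) - mu^2
mu_general = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var_general = sum(x ** 2 * binomial_pmf(x, n, p) for x in range(n + 1)) - mu_general ** 2

# Shortcut formulas for a binomial random variable.
mu_shortcut = n * p
sigma_shortcut = sqrt(n * p * (1 - p))

print(mu_general, mu_shortcut)
print(sqrt(var_general), sigma_shortcut)
```

Up to floating-point rounding, both approaches give a mean of 15 and a standard deviation of about 2.74.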

Example

Roulette

A roulette wheel has 38 slots: 18 are red, 18 are black, and 2 are green. You play five games and always bet on red.

 

How many games can you expect to win?

Recall, you play five games and always bet on red.  \(n=5\) and \(p=\frac{red \;slots}{total \;slots}=\frac{18}{38}\)

\(\mu=np=5 \times \frac{18}{38}=2.3684\)

\( \sigma=\sqrt{np(1-p)}=\sqrt{5\times\frac{18}{38}\times\left(1-\frac{18}{38}\right)}=1.1165\)

Out of 5 games, you can expect to win 2.3684 games on average, with a standard deviation of 1.1165 games.

 

What is the probability that you will win all five games?

\(P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}\)

\(P(X=5)= \frac {5!}{5!(5-5)!}\left( \frac{18}{38} \right)^5 \left(1-\frac{18}{38}\right)^{5-5}\)

\(P(X=5)=\frac{5!}{5!0!} \left(.4737^{5}\right) .5263^{0} = 1(.0238)(1)=.0238\)

There is a 2.38% chance that you will win all five out of five games.

 

If you win three or more games, you make a profit. If you win two or fewer games, you lose money. What is the probability that you will win no more than two games?

\(P(X\leq 2)=P(X=0)+P(X=1)+P(X=2)\)

\(P(X=0)=\frac {5!}{0!(5-0)!} \left ( \frac{18}{38} \right )^0\left(1-\frac{18}{38}\right)^{5-0}=.0404\)

\(P(X=1)=\frac {5!}{1!(5-1)!} \left ( \frac{18}{38} \right )^1\left(1-\frac{18}{38}\right)^{5-1}=.1817\)

\(P(X=2)=\frac {5!}{2!(5-2)!} \left ( \frac{18}{38} \right )^2\left(1-\frac{18}{38}\right)^{5-2}=.3271\)

\(P(X\leq 2)=.0404+.1817+.3271=.5493\)

There is a 54.93% chance that you will win no more than two games. In other words, there is a 54.93% chance that you will lose money.
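The roulette calculations above can be reproduced with a short Python sketch. The helper `binomial_pmf` and the variable names are assumptions made for this illustration.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for n trials with event probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 5, 18 / 38  # five games, always betting on red

mu = n * p
sigma = sqrt(n * p * (1 - p))
p_win_all = binomial_pmf(5, n, p)
p_two_or_fewer = sum(binomial_pmf(k, n, p) for k in range(3))  # k = 0, 1, 2

print(round(mu, 4), round(sigma, 4))
print(round(p_win_all, 4), round(p_two_or_fewer, 4))
```

Working with the unrounded value of 18/38 avoids the small rounding differences that can appear when intermediate results such as .4737 are rounded by hand.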

5.5 - Continuous Random Variable

Density Curves

We just discussed discrete random variables, and now we consider continuous random variables. Recall that a continuous random variable is one for which all values (to any number of decimal places) within some interval are possible outcomes. Because a continuous random variable has an infinite number of possible values, we can't assign a positive probability to each specific value. If we did, the total probability would be infinite, rather than 1, as it is supposed to be.

To describe probabilities for a continuous random variable, we use a probability density function. 

Probability density function (PDF): A curve such that the area under the curve within any interval of values along the horizontal axis gives the probability for that interval

 

Normal Random Variables

The most commonly encountered type of continuous random variable is a normal random variable, which has a symmetric bell-shaped density function. The center point of the distribution is the mean value, denoted by \(\mu\) ("mu"). The spread of the distribution is determined by the variance, denoted by \(\sigma ^{2}\) ("sigma squared"), or by the square root of the variance, called the standard deviation and denoted by \(\sigma\) ("sigma").

Example

The distribution of IQ scores is normal with a mean of 100 and standard deviation of 15.

In other words, \(\mu=100\) and \(\sigma=15\). The probability density function is shown below.

Normal distribution of IQ scores

Notice that the horizontal axis shows IQ score and the bell is centered at the mean of 100.

 

While we cannot determine the probability for any one given value because the distribution is continuous, we can determine the probability for a given interval of values.  The probability for an interval is equal to the area under the density curve. The total area under the curve is 1.00, or 100%.  In other words, 100% of observations fall under the curve.

Example

The next figure shows the probability that the IQ of a randomly selected individual will be between 115 and 130. This probability is equal to the shaded area under the curve between 115 and 130.

IQ Area Between 115 and 130

Soon we will learn how to use the standard normal distribution (i.e., z distribution) to determine what proportion of the curve is shaded.

Empirical Rule Review

The Empirical Rule can be used to estimate the proportion of observations that should fall within the intervals of one, two, and three standard deviations of the mean:

68% of observations: \(\mu\pm 1(\sigma)\)

95% of observations: \(\mu\pm 2(\sigma)\)

99.7% of observations: \(\mu\pm 3(\sigma)\)

 

Examples

Middle 95%

Given that for the distribution of IQ scores, \(\mu=100\) and \(\sigma=15\), let's apply the Empirical Rule to determine between which two scores the middle 95% of individuals fall.

Middle 95%: \(100\pm2(15)=[70,130]\)
The middle 95% of IQ scores fall between 70 and 130.

IQ Area Between 70 and 130

 

Middle 99.7%

The Empirical Rule also states that about 99.7% (nearly all) of a bell-shaped dataset will be in the interval \(mean\pm 3(standard\;deviation)\).
\(100\pm 3(15)= [55, 145]\)

About 99.7% of IQ scores are between 55 and 145. Notice that this interval roughly gives the complete range of the density curve shown above.
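The Empirical Rule percentages are rounded versions of areas under the normal curve. As a rough check, those areas can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names here are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

areas = {}
for k in (1, 2, 3):
    low = iq.mean - k * iq.stdev
    high = iq.mean + k * iq.stdev
    # Area under the density curve between low and high.
    areas[k] = iq.cdf(high) - iq.cdf(low)
    print(f"within {k} SD: [{low:g}, {high:g}] -> {areas[k]:.4f}")
```

The computed areas are approximately .6827, .9545, and .9973, which the Empirical Rule rounds to 68%, 95%, and 99.7%.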

5.6 - Finding Probabilities using Minitab Express

Here we will walk through a few examples of using Minitab Express to find various probabilities. We will use the following scenario: Suppose vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

Cumulative Probability

Remember that the cumulative probability for a value is the probability less than or equal to that value. 

Question: What is the probability that a randomly selected vehicle will be going 73 mph or slower?

Here is Minitab output showing that the probability = .9452 that the speed of a randomly selected vehicle is less than or equal to 73 mph.

Minitab output will give the probability of X less than or equal to 73, which is 0.9452.

We can find this probability using either Minitab Express or Minitab:

To calculate normal random variable probabilities in Minitab:

  1. Open Minitab without data.
  2. From the menu bar select Calc > Probability Distributions > Normal.
  3. Select the radio button for Cumulative Probability (this is the default option).
  4. In the text box for Mean enter 65.
  5. In the text box for Standard Deviation enter 5.
  6. Since we do not have a column of data, select the radio button for Input Constant and enter 73.
  7. Click OK.

The window in Minitab to compute the cumulative probability with normal distribution

The output is as follows:

In Minitab output, the probability of X less than or equal to 73 is 0.9452.


To calculate normal random variable probabilities in Minitab Express:

  1. Open Minitab Express without any data.
  2. From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Cumulative (CDF).
  3. Since you want to know the probability that the speed of a randomly selected vehicle is less than or equal to 73 mph, make sure the "Form of input" drop-down menu says "A single value" and enter 73 into the "Value" box.
  4. Make sure the "Distribution" drop-down menu says "Normal", and enter 65 into the "Mean" box and 5 into the "Standard deviation" box.
  5. Select "Display a table of cumulative probabilities" to show the output.
  6. Click OK.

The result should be the following output:

minitab express output of the normal random variable probabilities.


 

Here is a figure that illustrates the cumulative probability we found using this procedure:

Normal distribution less than 73 shaded
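For readers without access to Minitab, the same cumulative probability can be sketched with Python's standard library. This assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph.
speed = NormalDist(mu=65, sigma=5)

# Cumulative probability P(X <= 73).
p_73_or_slower = speed.cdf(73)
print(round(p_73_or_slower, 4))
```

Rounded to four decimal places, this matches the Minitab result of .9452.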

 

"Greater than" Probabilities

Sometimes we want to know the probability that a variable has a value greater than some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written \(P(X > 73)\).

Previously we found \(P(Speed\leq 73)=.9452\). The general rule for a "greater than" situation is \(P(greater\;than\;a\;value)=1-P(less\;than\;or\;equal\;to\;the\;value)\). Thus, \(P(Speed>73)=1-.9452=.0548\). The probability that a randomly selected vehicle will be going faster than 73 mph is .0548, or 5.48%.

 

Question: What is the probability that a randomly selected vehicle will be going more than 60 mph?

Using Minitab we can find that the probability is .1587 that a speed is less than or equal to 60 mph. Thus \(P(Speed>60mph)=1-.1587 = .8413\).

The relevant Minitab output and a figure showing the cumulative probability for 60 mph follows:

In Minitab output, the probability of X less than or equal to 60 is 0.1587.

Normal distribution less than 60 shaded
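The "greater than" rule can be sketched in Python by subtracting the cumulative probability from 1. As before, this assumes `statistics.NormalDist` (Python 3.8 or later), and the variable names are illustrative.

```python
from statistics import NormalDist

speed = NormalDist(mu=65, sigma=5)

# P(X > value) = 1 - P(X <= value)
p_faster_than_73 = 1 - speed.cdf(73)
p_faster_than_60 = 1 - speed.cdf(60)

print(round(p_faster_than_73, 4), round(p_faster_than_60, 4))
```

For a continuous distribution, a single exact value has probability 0, so "greater than" and "greater than or equal to" give the same result.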

 

"In between" Probabilities

Suppose we want to know the probability that a normal random variable is within a specified interval. For instance, suppose we want to know the probability that a randomly selected speed is between 60 and 73 mph. The simplest approach is to subtract the cumulative probability for 60 mph from the cumulative probability for 73 mph. In other words, \(P(60<Speed<73)=P(Speed<73)-P(Speed<60)=.9452-.1587=.7865\)

This can also be written as P(60 < X < 73) = 0.7865, where X is speed.

Minitab output for PDF between 60 and 73 mph

The general rule for an "in between" probability is P( between a and b ) = cumulative probability for value b − cumulative probability for value a

This may also be written as  \(P(a<X<b)=P(X<b)-P(X<a)\).
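The "in between" rule translates directly into a difference of two cumulative probabilities. A minimal sketch, again assuming `statistics.NormalDist` and using a hypothetical helper named `probability_between`:

```python
from statistics import NormalDist


def probability_between(dist, a, b):
    """P(a < X < b) = cumulative probability for b minus cumulative probability for a."""
    return dist.cdf(b) - dist.cdf(a)


speed = NormalDist(mu=65, sigma=5)
p_between = probability_between(speed, 60, 73)
print(round(p_between, 4))
```

The helper works for any pair of values with a less than b, because the cumulative probability never decreases as the value increases.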


Finding Percentiles

We may wish to know the value of a variable that is a specified percentile of the values.

To calculate percentiles in Minitab:

  1. Open Minitab without data.
  2. From the menu bar select Calc > Probability Distributions > Normal.
  3. Select the radio button for Inverse Cumulative Probability.
  4. In the text box for Mean enter 65.
  5. In the text box for Standard Deviation enter 5.
  6. Since we do not have a column of data, select the radio button for Input Constant and enter 0.9999.
  7. Click OK.

The output is as follows:

In Minitab output, the percentile corresponding to probability 0.9999 is 83.5951


To calculate percentiles in Minitab Express:

  1. Open Minitab Express without data.
  2. From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Inverse (ICDF).
  3. Since we do not have a column of data, make sure the form of input is "A single value", and enter 0.9999.
  4. Make sure the distribution is "Normal", and enter the mean of 65 and the standard deviation of 5.
  5. Under output, select "Display a table of inverse cumulative probabilities".
  6. Click OK.

 The result should be the following output:

minitab express output of calculating percentiles
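Finding a percentile is the inverse of finding a cumulative probability. As a cross-check on the software output, the calculation can be sketched with the `inv_cdf` method of `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

speed = NormalDist(mu=65, sigma=5)

# Speed below which 99.99% of vehicles fall (inverse cumulative probability).
percentile_9999 = speed.inv_cdf(0.9999)
print(round(percentile_9999, 4))

# Applying the cumulative probability to the result recovers the original input.
print(round(speed.cdf(percentile_9999), 4))
```

The first value agrees with the 83.5951 mph reported by Minitab, and the second confirms that the cumulative probability at that speed is 0.9999.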


Note:

In Minitab output, the percentile corresponding to probability 0.9999 is 83.5951

In Minitab output, the percentile corresponding to probability 0.25 is 69.6041

5.7 - Finding Probabilities using a Standard Normal Table

Recall from Lesson 3 the formula for computing the z-score for an individual: \(z=\frac{x-\overline x}{s}\). That formula used sample statistics. This formula can also be written using population parameters: \(z=\frac{x-\mu}{\sigma}\)

Use Table A in Appendix A of your textbook or see a copy at Standard Normal Table

Table A in the textbook gives normal curve cumulative probabilities for standardized scores. This is also known as a z table.

Row labels of Table A give possible z-scores up to one decimal place. The column labels give the second decimal place of the z-score.

The cumulative probability for a value equals the cumulative probability for that value's z-score.
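
To see where the entries of a z table come from, the sketch below rebuilds one row of cumulative probabilities in Python. It assumes Python 3.8 or later (`statistics.NormalDist`). The function name `z_table_row` is hypothetical, and the layout of a printed table such as Table A may differ from this one.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)

def z_table_row(row_label):
    """Return the ten cumulative probabilities for one z table row.

    row_label is the z-score to one decimal place (for example 1.9);
    the ten columns supply the second decimal place, .00 through .09.
    """
    # On negative rows the second decimal moves z further below zero.
    sign = math.copysign(1.0, row_label)
    row = []
    for hundredths in range(10):
        z = row_label + sign * hundredths / 100
        row.append(round(STANDARD_NORMAL.cdf(z), 4))
    return row

# Row 1.9, column .06 corresponds to z = 1.96.
print(z_table_row(1.9)[6])
```

Each entry is simply the standard normal cumulative probability at the z-score formed from the row and column labels, rounded to four decimal places.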

Example: Less Than

Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

What is the probability that a randomly selected car is going 73 mph or less?

It's often helpful to begin by sketching a normal distribution and shading in the appropriate region. From the graph below we can see that more than half of the curve is shaded in; this means that our final result should be greater than .50.

Normal distribution less than 73 shaded

Let’s use the z table to determine the proportion of the curve under 73 mph.

First, we need to compute the z score for this speed: \(z=\frac{73-65}{5}=1.60\)

Now we can use the z table to determine the proportion of the curve that is less than a z score of 1.6 by looking up 1.60. We look in the 1.6 row and the .00 column (1.6 plus .00 equals 1.60). The cumulative probability for z=1.60 is .9452, the same value that we got previously when using Minitab Express. There is a 94.52% chance of randomly selecting a vehicle that is going 73 mph or less.

Reading the z table 1.60
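
The lookup above can be double-checked with a short Python sketch, assuming Python 3.8 or later (`statistics.NormalDist`); the variable names are our own. It computes the probability two ways, through the z score on the standard normal distribution and directly from the speed distribution, to illustrate that the two cumulative probabilities agree.

```python
from statistics import NormalDist

mean_speed, sd_speed = 65, 5
speed = 73

# Route 1: standardize, then use the standard normal (z) distribution.
z = (speed - mean_speed) / sd_speed
p_from_z = NormalDist().cdf(z)

# Route 2: ask the original distribution directly.
p_direct = NormalDist(mu=mean_speed, sigma=sd_speed).cdf(speed)

print(z, round(p_from_z, 4), round(p_direct, 4))
```

Both routes should give .9452 after rounding to four decimal places.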

Example: Less Than

What is the probability that a car is going 60 mph or less?

Normal distribution less than 60 shaded

For speed = 60 the z-score is: \(z=\frac{60-65}{5}=-1.00\)

We look up -1.00 on the z table below and find a cumulative probability of .1587. There is a 15.87% chance of randomly selecting a vehicle that is going 60 mph or less.

Table A.1 gives this information:

Reading the z table -1

Example: Greater Than

Suppose pulse rates of adult females have a normal distribution with a mean of 75 and a standard deviation of 8. What is the probability that a randomly selected female has a pulse rate greater than 85? Be careful! Notice that we want a "greater than" probability and that the interval we want is entirely above average, so we know the answer must be less than .50.

Normal distribution pulse greater than 85

If we use Table A.1, the first step is to calculate the z-score associated with a pulse rate of 85: \(z=\frac{85-75}{8}=1.25\).

Given that z=1.25, we can use the z-table to determine the cumulative probability:

Reading the z table 1.25

The cumulative probability for z = 1.25 is .8944. This is the proportion below a pulse rate of 85, but we want to know the proportion above a pulse rate of 85.

\(P(X>85) = 1 - P(X<85) = 1 - .8944 = .1056\)

The probability that a randomly selected female will have a pulse rate above 85 is .1056.
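
As a rough check on the "greater than" calculation, here is a minimal Python sketch of the complement rule. It assumes Python 3.8 or later and uses variable names of our own choosing.

```python
from statistics import NormalDist

# Pulse-rate distribution from the example: mean 75, standard deviation 8.
pulse = NormalDist(mu=75, sigma=8)

p_below = pulse.cdf(85)   # cumulative probability, P(X < 85)
p_above = 1 - p_below     # complement, P(X > 85)

print(round(p_below, 4), round(p_above, 4))
```

Because the cumulative probability always measures the area to the left, the area to the right is whatever remains of the total area of 1.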

Example: In Between

We know that for IQ scores \(\mu=100\) and \(\sigma=15\). What proportion of IQ scores fall between 100 and 130?

Normal distribution of IQ scores between 100 and 130

First we must compute the z score associated with each of these IQ scores.

For an IQ of 100, \(z=\frac{100-100}{15}=0\)

For an IQ of 130, \(z=\frac{130-100}{15}=2.00\)

We are looking for the proportion of observations that fall between a z score of 0 and a z score of 2.00.

Reading the z table 0.00 and 2.00

Using the z table above, \(P(z<0.00)=.5000\) and \(P(z<2.00)=.9772\)

\(P(0.00<z<2.00)=P(z<2.00)-P(z<0.00)=.9772-.5000=.4772\)

The proportion of IQ scores between 100 and 130 is .4772, or 47.72%.
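
The "in between" subtraction can be sketched the same way in Python (3.8 or later assumed; the names below are illustrative, not from the course materials).

```python
from statistics import NormalDist

# IQ distribution from the example: mu = 100, sigma = 15.
iq = NormalDist(mu=100, sigma=15)

p_below_130 = iq.cdf(130)   # area to the left of 130
p_below_100 = iq.cdf(100)   # area to the left of 100 (the mean)

# Area between the two scores is the difference of the cumulative areas.
p_between = p_below_130 - p_below_100

print(round(p_below_100, 4), round(p_below_130, 4), round(p_between, 4))
```

The difference should come out to about .4772, in line with the z table result.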

The following table reviews the procedures that you have just learned for determining various probabilities given observations using a z table.

Type of Question: Probability less than X
Steps:

  1. Compute a z score for observation X.
  2. Look up the cumulative probability on the z table.

Type of Question: Probability greater than X
Steps:

  1. Compute a z score for observation X.
  2. Look up the cumulative probability on the z table.
  3. Subtract the cumulative probability from 1.

Type of Question: Probability between X and Y
Steps:

  1. Compute the z scores for both observations X and Y.
  2. Look up the cumulative probabilities for both z scores.
  3. Subtract the cumulative probability for X from the cumulative probability for Y (where X is the smaller value).
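
The three procedures can be folded into one small helper, sketched here in Python under the assumption of Python 3.8 or later. The function `normal_probability` and its `lower` and `upper` parameters are hypothetical names introduced only for illustration.

```python
from statistics import NormalDist

def normal_probability(mean, sd, lower=None, upper=None):
    """P(lower < X < upper) for a normal X; None means no bound on that side."""
    dist = NormalDist(mu=mean, sigma=sd)
    # Cumulative probability at each bound (the "z table lookup" step).
    p_upper = 1.0 if upper is None else dist.cdf(upper)
    p_lower = 0.0 if lower is None else dist.cdf(lower)
    return p_upper - p_lower

less_than = normal_probability(65, 5, upper=73)              # speed of 73 mph or less
greater_than = normal_probability(75, 8, lower=85)           # pulse rate above 85
between = normal_probability(100, 15, lower=100, upper=130)  # IQ from 100 to 130

print(round(less_than, 4), round(greater_than, 4), round(between, 4))
```

Treating a missing bound as a cumulative probability of 0 (no lower bound) or 1 (no upper bound) lets a single subtraction cover all three question types.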

5.8 - Review of Finding the Proportion Under the Normal Curve

On Your Own

Practice finding the proportion of observations under the normal curve. Each question can be answered using either Minitab Express or the z table. Work through each question, then view the solution and compare your answers.

HINT: Drawing the normal curve and shading in the region you are looking for is often helpful.

  1. What proportion of the standard normal curve is less than a z score of 1.64?
  2. What proportion of the standard normal curve falls above a z score of 1.33?
  3. What proportion of the standard normal curve falls between a z score of -.50 and a z score of +.50?
  4. At one private school, a minimum IQ score of 125 is necessary to be considered for admission. IQ scores have a mean of 100 and standard deviation of 15. Given this information, what proportion of children are eligible for consideration for admission to this school?
  5. ACT scores have a mean of 18 and a standard deviation of 6. What proportion of test takers score between a 20 and 26?
  6. A men’s clothing company is doing research on the height of adult American men in order to inform the sizing of the clothing that they offer. The height of males in the United States is normally distributed with a mean of 175 cm and a standard deviation of 15 cm. Men who are more than 30 cm different (shorter or taller) from the mean are classified by the apparel company as special cases because they do not fit in their regular length clothing. Given this information, what proportion of men would be classified as special cases?

5.9 - Summary

In this lesson we examined a number of probability distributions including discrete, binomial, and normal. The next lesson will continue to explore probability distributions with an emphasis on the distribution of sample means. It will also introduce a new distribution that is similar in shape to the normal distribution: the t distribution.

Take a moment to review what you learned in this lesson before continuing to the next.

Lesson 5 Learning Objectives

Upon completion of this lesson, you will be able to:

  • distinguish between discrete and continuous random variables.
  • find probabilities associated with a discrete probability distribution.
  • compute the mean and standard deviation of a discrete probability distribution.
  • find probabilities associated with a binomial distribution.
  • find probabilities associated with a normal probability distribution (i.e., z distribution) using Minitab Express and the standard normal table.