Lesson 5: Probability Distributions

In this lesson we will begin to explore the concept of statistical inference. We will look at both discrete and continuous probability distributions. The concepts of standard error and the Central Limit Theorem will be introduced; these will serve as the basis for the remaining lessons in this course.

Lesson 5 Learning Objectives

Upon completion of this lesson, you will be able to:

• distinguish between discrete and continuous random variables.
• find probabilities associated with a discrete probability distribution.
• compute the mean and standard deviation of a discrete probability distribution.
• find probabilities associated with a binomial distribution.
• find probabilities associated with a normal probability distribution (i.e., z distribution) using Minitab Express and the standard normal table.

Review

Before we begin new content, we should review a few terms from previous lessons that we will see again in this lesson:

Discrete: Data that can take on only a countable set of distinct values

Continuous: Quantitative data that can take on any value between the minimum and maximum, and any value between two other values

Probability: The likelihood of an event occurring; $$P(A)=\frac{number \: of \:events \:considered\: outcome \:A}{number \:of\: total \:events}$$

$$P(A\;\cap\;B)$$: Intersection of A and B; "probability of A and B"

$$P(A\;\cup\;B)$$: Union of A and B; "probability of A or B" (this also includes the probability of A and B)

Mean: The numerical average; calculated as the sum of all of the data values divided by the number of values; for a sample, represented as $$\overline{x}$$.

Standard deviation: Roughly the average difference between individual data and the mean; for a sample, represented as s, $$s=\sqrt{\frac{\sum (x-\overline{x})^{2}}{n-1}}$$

Sample: A subset of the population from which data is actually collected

Population: The entire set of possible observations in which we are interested

Statistic: A measure concerning a sample (e.g., sample mean)

Parameter: A measure concerning a population (e.g., population mean)

Descriptive statistics: Methods for summarizing data (e.g., mean, median, mode, range, variance, graphs)

Inferential statistics: Methods for using sample data to make conclusions about a population

z score: Distance between an individual score and the mean in standard deviation units; also known as a standardized score.

Empirical Rule: For bell-shaped distributions, about 68% of the data will be within one standard deviation of the mean, about 95% will be within two standard deviations of the mean, and about 99.7% will be within three standard deviations of the mean

5.1 - Random Variables

The word “random” is used often in everyday life. For example, you may hear someone say “We randomly decided to go out for dinner last night.” But is this really a random event? No, this is a conscious decision that was made on the basis of other variables such as hunger and the lack of satisfaction with other options such as cooking one’s own dinner.

In statistics, the word random has a different meaning: something is random when it varies by chance. For example, when rolling a six-sided die there are six equally likely outcomes, so the observed outcome on any one roll is random. The variation of a random event such as rolling a die can be described by the probability distributions that we will see in this lesson.

Random variable: a numerical characteristic that takes on different values due to chance

Examples

Coin Flips
The number of heads in four flips of a coin (a numerical property of each different sequence of flips) is a random variable because the results will vary between trials.

Heights
Samples of 100 are repeatedly drawn from the population of all Penn State students and their heights are measured. The mean height of samples of 100 Penn State students is a random variable because the statistic will vary between samples. While most sample means will be similar to the population mean, they will not all equal the population mean due to random sampling variation.

Random variables are classified into two broad types: discrete and continuous. A discrete random variable has a countable set of distinct possible values. A continuous random variable is such that any value (to any number of decimal places) within some interval is a possible value.

Examples

Discrete Random Variables:
• Number of heads in 4 flips of a coin (possible outcomes are 0, 1, 2, 3, 4)
• Number of classes missed last week (possible outcomes are 0, 1, 2, 3, ..., up to the maximum number of classes)

X        Probability
+\$2     1/6
+\$1     1/6
-\$1     4/6

$$E(X)=\$2\left(\frac{1}{6}\right)+\$1\left(\frac{1}{6}\right)+(-\$1)\left(\frac{4}{6}\right)=-\frac{\$1}{6}\approx -\$0.17$$

The interpretation is that if you play many times, the average outcome is a loss of about 17 cents per play. Thus, over time you should expect to lose money.

Example

Using the probability distribution for number of tattoos, let's find the mean number of tattoos per student.

Tattoos    Probability
0          0.85
1          0.12
2          0.015
3          0.01
4          0.005

$$E(X)=0(.85)+1(.12)+2(.015)+3(.010)+4(.005)=.20$$

The mean number of tattoos per student is .20.

Symbols for Population Parameters

Recall from Lesson 3 that in a sample, the mean is symbolized by $$\overline{x}$$ and the standard deviation by $$s$$. Because the probabilities that we are working with here are computed using the population, the corresponding measures are symbolized using lower case Greek letters. The population mean is symbolized by $$\mu$$ (lower case "mu") and the population standard deviation by $$\sigma$$ (lower case "sigma").

Measure               Sample Statistic      Population Parameter
Mean                  $$\overline{x}$$      $$\mu$$
Variance              $$s^{2}$$             $$\sigma ^{2}$$
Standard Deviation    $$s$$                 $$\sigma$$

Also recall that the standard deviation is equal to the square root of the variance. Thus, $$\sigma=\sqrt{\sigma ^{2}}$$

Standard Deviation of a Discrete Random Variable

Knowing the expected value is not the only important characteristic one may want to know about a set of discrete numbers: one may also need to know the spread, or variability, of these data. For instance, you may "expect" to win \$20 when playing a particular game (which appears good!), but the spread for this might be from losing \$20 to winning \$60. Knowing such information can influence your decision on whether to play.

To calculate the standard deviation, we first calculate the variance; taking the square root of the variance gives the standard deviation. Conceptually, the variance of a discrete random variable is the sum of the squared differences between each value and the mean, each multiplied by the probability of obtaining that value, as seen in the conceptual formulas below:

Conceptual Formulas

Variance for a Discrete Random Variable

$$\sigma ^2= \sum [(x_i-\mu)^2 p_i]$$

Standard Deviation for a Discrete Random Variable

$$\sigma = \sqrt {\sum [(x_i-\mu)^2 p_i]}$$

$$x_i$$= value of the ith outcome
$$\mu= E(X)=\sum x_i p_i$$
$$p_i$$ = probability of the ith outcome

In these expressions we substitute our result for E(X) into $$\mu$$ because $$\mu$$ is the symbol used to represent the mean of a population.

However, there is an easier computational formula. The computational formula gives the same result as the conceptual formula above, but the calculations are simpler.

Computational Formulas

Variance for a Discrete Random Variable

$$\sigma ^2= [\sum (x_i^2 p_i )]-\mu ^2$$

Standard Deviation for a Discrete Random Variable

$$\sigma = \sqrt {[\sum (x_i^2 p_i)] -\mu ^2}$$

$$x_i$$= value of the ith outcome
$$\mu= E(X)=\sum x_i p_i$$
$$p_i$$ = probability of the ith outcome

Notice that in the summation we square only each observed x value, not its probability. Also note that $$\mu^2$$ is subtracted outside of the summation.

Example

Going back to the first example used above for expectation involving the dice game, we would calculate the standard deviation for this discrete distribution by first calculating the variance:

X        Probability
+\$2     1/6
+\$1     1/6
-\$1     4/6

$$\sigma ^2= [\sum x_i^2 p_i ]-\mu ^2 = [2^2 (\frac{1}{6})+1^2 (\frac{1}{6})+(-1)^2 (\frac{4}{6})]-(- \frac{1}{6})^2$$

$$=[ \frac{4}{6}+\frac {1}{6}+ \frac{4}{6}]-\frac{1}{36} = \frac{53}{36}=1.472$$

The variance of this discrete random variable is 1.472.

$$\sigma=\sqrt{(\sigma ^{2})}$$

$$\sigma=\sqrt{1.472}=1.213$$

The standard deviation of this discrete random variable is 1.213.
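To check the expected value, variance, and standard deviation calculations in this lesson, here is a minimal Python sketch. It is not part of the course software, and the function names (mean, variance_conceptual, variance_computational) are our own. Each distribution is assumed to be stored as a dictionary that maps every value x_i to its probability p_i; the dice game uses exact fractions so the variance of 53/36 comes out exactly.

```python
import math
from fractions import Fraction

def mean(dist):
    """E(X): sum of x_i * p_i over a {value: probability} dictionary."""
    return sum(x * p for x, p in dist.items())

def variance_conceptual(dist):
    """Conceptual formula: sum of (x_i - mu)^2 * p_i."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_computational(dist):
    """Computational formula: [sum of x_i^2 * p_i] - mu^2 (only x_i is squared)."""
    mu = mean(dist)
    return sum(x ** 2 * p for x, p in dist.items()) - mu ** 2

# Dice game winnings (in dollars) with exact probabilities
dice_game = {2: Fraction(1, 6), 1: Fraction(1, 6), -1: Fraction(4, 6)}
# Number of tattoos per student
tattoos = {0: 0.85, 1: 0.12, 2: 0.015, 3: 0.01, 4: 0.005}

dice_var = variance_computational(dice_game)   # exactly 53/36
dice_sd = math.sqrt(dice_var)                  # sigma = square root of the variance

print(f"dice game: E(X) = {float(mean(dice_game)):.2f}, "
      f"variance = {float(dice_var):.3f}, sd = {dice_sd:.3f}")
print(f"tattoos: E(X) = {mean(tattoos):.2f}")
```

Both variance functions return 53/36 for the dice game, which illustrates the claim above that the conceptual and computational formulas agree.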

Video Review: Working With Discrete Random Variables

This video walks through one example of a discrete random variable. It includes the construction of a cumulative probability distribution and the calculation of the mean and standard deviation.

5.4 - Binomial Random Variable

Binomial random variable: A specific type of discrete random variable that counts how often a particular event occurs in a fixed number of tries or trials

For a variable to be a binomial random variable, ALL of the following conditions must be met:

1. There are a fixed number of trials (a fixed sample size)
2. On each trial, the event of interest either occurs or does not
3. The probability of occurrence (or not) is the same on each trial
4. Trials are independent of one another

Examples of Binomial Random Variables

• Number of correct guesses at 30 true-false questions when you randomly guess all answers
• Number of winning lottery tickets when you buy 10 tickets of the same kind
• Number of left-handers in a randomly selected sample of 100 unrelated people
• Number of tails when flipping a coin 10 times

Notation

n = number of trials

p = probability event of interest occurs on any one trial

Example

Number of correct guesses at 30 true-false questions when you randomly guess all answers
There are 30 trials, therefore n = 30
Each guess is either correct or incorrect, and because each answer is a random choice between true and false, both outcomes are equally probable, therefore p = 1/2 = .5

Probabilities for Binomial Random Variables

The conditions for being a binomial variable lead to a somewhat complicated formula for finding the probability that any specific value occurs (such as the probability that you get 20 right when you guess on 30 true-false questions).

We'll use Minitab Express to find probabilities for binomial random variables. However, for those of you who are curious, the by-hand formula for the probability of getting a specific outcome in a binomial experiment is:

Binomial Random Variable Probability

$P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}$

n = number of trials
x = number of successes
p = probability event of interest occurs on any one trial

! is the symbol for factorial. For a review of factorials, see the course algebra review page.

One can use the formula to find the probability or alternatively, use Minitab Express to find the probability. In the homework, you may use the method that you are more comfortable with unless specified otherwise.
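For readers who want to check the by-hand formula, the sketch below translates it directly into Python. It assumes Python 3.8 or later for math.comb, which computes the n!/(x!(n-x)!) term; binomial_pmf is a name we chose for illustration. It reproduces the 0.0123 from the Minitab example and the .237 from the red flowers example in this section.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = n! / (x!(n - x)!) * p^x * (1 - p)^(n - x)."""
    # math.comb(n, x) returns n! / (x!(n - x)!), the number of ways to place x successes
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binomial_pmf(3, 20, 0.4), 4))   # n = 20, p = 0.4, x = 3: 0.0123
print(round(binomial_pmf(0, 5, 0.25), 3))   # red flowers, x = 0: 0.237
```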

In the following Minitab and Minitab Express examples we will find P(x) for n = 20, x = 3, and p = 0.4.

To calculate binomial random variable probabilities in Minitab:

1. Open Minitab without data.
2. From the menu bar select Calc > Probability Distributions > Binomial.
3. Choose Probability since we want to find the probability that x = 3.
4. Enter 20 in the text box for number of trials.
5. Enter 0.4 in the text box for probability of success (note: in Minitab versions later than 14, this is labeled Event probability).
6. Since we do not have a column of data select the radio button for Input Constant and enter 3.
7. Click OK.

Minitab output:

Probability Density Function

Binomial with n = 20 and p = 0.4

x       P(X = x)
3.00    0.0123


To calculate binomial random variable probabilities in Minitab Express:

1. Open Minitab Express without data.
2. From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Probability (PDF).
3. Since we want to find the probability that x = 3, enter 3 into the "Value" box
4. In the "Distribution" drop down menu, select Binomial.
5. Enter 20 into the "Number of trials" box, and 0.4 into the "Event probability" box.
6. Select "Display a table of probability density values" to show the output.
7. Click OK.

The result should be the following output:


In the following example, we illustrate how to use the formula to compute binomial probabilities by hand. If you don't like to use the formula, you can also use Minitab Express to find the probabilities.

Example

Red Flowers

Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring. Find the probability that there will be no red flowered plants in the five offspring.

X = # of red flowered plants in the five offspring.

The number of red flowered plants has a binomial distribution with n = 5, p = .25

$$P(X=0)=\frac{5!}{0!(5-0)!} .25 ^0 (1- .25)^5 =1 \times .25^0 \times .75^5 =.237$$

There is a 23.7% chance that none of the five plants will be red flowered.

Cumulative probability: Likelihood that a certain number of successes or fewer will occur.

The events X = 0, X = 1, X = 2, and so on are mutually exclusive, therefore we can use the addition rule that we learned in Lesson 4.

Example

Red Flowers, cont.

Continuing with the red flowers example, what if we wanted to know the probability that there would be one or fewer red flowered plants?

\begin{align}
P(X\ is\ 1\ or\ less)&=P(X=0)+P(X=1)\\
&= \frac{5!}{0!(5-0)!} .25^0 (1-.25)^5+\frac{5!}{1!(5-1)!} .25^1 (1-.25)^4\\
& = .237 +.396=.633
\end{align}

There is a 63.3% chance that one or fewer of the five plants will be red flowered.

In the red flowers example, we first computed P(X = x) and then P(X ≤ x). Finding P(X ≤ x) is called finding a cumulative probability because you are finding the probability that has accumulated from the minimum to some point, i.e., from 0 to 1 in this example.

To use Minitab Express to solve a cumulative probability binomial problem, return to Statistics > Probability Distributions > CDF/PDF > Cumulative Distribution Function (CDF). For Value enter 1. For Distribution select Binomial. Enter 5 for the number of trials and .25 for the event probability.

To use Minitab to solve a cumulative probability binomial problem, return to Calc > Probability Distributions > Binomial as shown above. Now, however, select the radio button for Cumulative Probability. For Number of Trials enter 5, and for the event probability enter .25. Click the radio button for Input Constant and enter the x value of 1.
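A cumulative binomial probability is a running sum of the mutually exclusive P(X = k) terms. The hypothetical binomial_cdf helper below (our own name, written in Python) mirrors the red flowers calculation; summing the unrounded terms gives 0.6328125, which rounds to .633.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): add the mutually exclusive probabilities P(X = 0), ..., P(X = x)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

print(binomial_cdf(1, 5, 0.25))   # red flowers, P(X <= 1): 0.6328125
```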

Expected Value and Standard Deviation for Binomial Random Variable

The formula given earlier for discrete random variables could be used, but the good news is that for binomial random variables a shortcut formula for expected value (the mean) and standard deviation can also be used.

Binomial Random Variable Formulas

$\mu=np$

$\sigma=\sqrt {np(1-p)}$

n = number of trials
p = probability event of interest occurs on any one trial

After you use these formulas a couple of times, you'll realize they match your intuition. For instance, the “expected” number of correct (random) guesses at 30 True-False questions is np = (30)(.5) = 15 (half of the questions). For a fair six-sided die rolled 60 times, the expected value of the number of times a “1” is rolled is np = (60)(1/6) = 10.

The standard deviations for these would be, for the True-False test, $$\sigma=\sqrt{30 (0.5) (1-0.5)}=\sqrt{7.5}=2.74$$, and for the die, $$\sigma=\sqrt{60 \left( \frac{1}{6}\right) \left(1-\frac {1}{6}\right)}=\sqrt{ \frac{50}{6}}=2.89$$.
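As a sanity check that the shortcut formulas agree with the general discrete formulas from earlier in this lesson, the Python sketch below computes both for the true-false and die examples. The helper names are ours; mean_sd_from_distribution builds the full binomial distribution and applies the computational variance formula to it.

```python
import math
from math import comb

def binomial_mean_sd(n, p):
    """Shortcut formulas: mu = np, sigma = sqrt(np(1 - p))."""
    return n * p, math.sqrt(n * p * (1 - p))

def mean_sd_from_distribution(n, p):
    """General discrete formulas applied to every outcome 0..n of the binomial."""
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mu = sum(k * pk for k, pk in enumerate(probs))
    var = sum(k ** 2 * pk for k, pk in enumerate(probs)) - mu ** 2
    return mu, math.sqrt(var)

for label, n, p in [("true-false guesses", 30, 0.5), ("1s in 60 die rolls", 60, 1 / 6)]:
    mu, sigma = binomial_mean_sd(n, p)
    print(f"{label}: mu = {mu:.0f}, sigma = {sigma:.2f}")
```

The shortcut avoids summing over all n + 1 outcomes, which is why it is preferred once a variable is known to be binomial.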

Example

Roulette

A roulette wheel has 38 slots: 18 are red, 18 are black, and 2 are green. You play five games and always bet on red.

How many games can you expect to win?

Recall, you play five games and always bet on red.  $$n=5$$ and $$p=\frac{red \;slots}{total \;slots}=\frac{18}{38}$$

$$\mu=np=5 \left( \frac{18}{38}\right)=2.3684$$

$$\sigma=\sqrt{np(1-p)}=\sqrt{5\left(\frac{18}{38} \right) \left(1-\frac{18}{38}\right)}=1.1165$$

Out of 5 games, you can expect to win 2.3684 games on average (with a standard deviation of 1.1165).

What is the probability that you will win all five games?

$$P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}$$

$$P(X=5)= \frac {5!}{5!(5-5)!}\left( \frac{18}{38} \right)^5 \left(1-\frac{18}{38}\right)^{5-5}$$

$$P(X=5)=\frac{5!}{5!0!} \left(.4737^{5}\right) .5263^{0} = 1(.0238)(1)=.0238$$

There is a 2.38% chance that you will win all five out of five games.

If you win three or more games, you make a profit. If you win two or fewer games, you lose money. What is the probability that you will win no more than two games?

$$P(X\leq 2)=P(X=0)+P(X=1)+P(X=2)$$

$$P(X=0)=\frac {5!}{0!(5-0)!} \left ( \frac{18}{38} \right )^0\left(1-\frac{18}{38}\right)^{5-0}=.0404$$

$$P(X=1)=\frac {5!}{1!(5-1)!} \left ( \frac{18}{38} \right )^1\left(1-\frac{18}{38}\right)^{5-1}=.1817$$

$$P(X=2)=\frac {5!}{2!(5-2)!} \left ( \frac{18}{38} \right )^2\left(1-\frac{18}{38}\right)^{5-2}=.3271$$

$$P(X\leq 2)=.0404+.1817+.3271=.5493$$

There is a 54.93% chance that you will win no more than two games. In other words, there is a 54.93% chance that you will lose money.
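The roulette answers can be reproduced with the Python sketch below, which assumes the same setup (n = 5 bets on red, p = 18/38) and keeps the probabilities as exact fractions until printing. Because it sums unrounded terms, P(X ≤ 2) comes out as .5493 directly.

```python
from fractions import Fraction
from math import comb, sqrt

n = 5                 # five games
p = Fraction(18, 38)  # probability of red on one spin

def pmf(x):
    """Binomial P(X = x) for this roulette setup, kept as an exact fraction."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

mu = n * p
sigma = sqrt(n * p * (1 - p))
p_win_all = pmf(5)
p_lose_money = sum(pmf(x) for x in range(3))   # P(X <= 2) = P(0) + P(1) + P(2)

print(f"mu = {float(mu):.4f}, sigma = {sigma:.4f}")
print(f"P(X = 5) = {float(p_win_all):.4f}, P(X <= 2) = {float(p_lose_money):.4f}")
```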

Video Review: Working with Binomial Random Variables

Binomial Random Variable: Flipping a Coin Example

A fair coin is flipped 10 times. This example is used to demonstrate calculations concerning binomial random variables. Hand calculations are performed and Minitab Express is used.

Binomial Random Variable: Guessing on a Multiple Choice Quiz Example

A class is taking a multiple choice quiz. There are 6 questions, each with 4 options. The professor accidentally brought a quiz from a different, much more advanced class. All students randomly guess on each item, so the number of correct answers is a binomial random variable. We compute the mean and standard deviation by hand. We compute the probability that a student will pass (i.e., at least 60%), the probability that they will get all questions incorrect, and the probability that they will get all questions correct, all using Minitab Express.

5.5 - Continuous Random Variable

Density Curves

We just discussed discrete random variables, and now we consider continuous random variables. Recall, a continuous random variable is such that all values (to any number of decimal places) within some interval are possible outcomes. A continuous random variable has an infinite number of possible values, so we can't assign a positive probability to each specific value. If we did, the total probability would be infinite, rather than 1, as it is supposed to be.

To describe probabilities for a continuous random variable, we use a probability density function.

Probability density function (PDF): A curve such that the area under the curve within any interval of values along the horizontal gives the probability for that interval

Normal Random Variables

The most commonly encountered type of continuous random variable is a normal random variable, which has a symmetric bell-shaped density function. The center point of the distribution is the mean value, denoted by $$\mu$$ ("mu"). The spread of the distribution is determined by the variance, denoted by $$\sigma ^{2}$$ ("sigma squared") or by the square root of the variance called standard deviation, denoted by $$\sigma$$ ("sigma").

Example

The distribution of IQ scores is normal with a mean of 100 and standard deviation of 15.

In other words, $$\mu=100$$ and $$\sigma=15$$. The probability density function is shown below.

Notice that the horizontal axis shows IQ score and the bell is centered at the mean of 100.

While we cannot determine the probability for any one given value because the distribution is continuous, we can determine the probability for a given interval of values.  The probability for an interval is equal to the area under the density curve. The total area under the curve is 1.00, or 100%.  In other words, 100% of observations fall under the curve.

Example

The next figure shows the probability that the IQ of a randomly selected individual will be between 115 and 130. This probability is equal to the shaded area under the curve between 115 and 130.

Soon we will learn how to use the normal distribution (i.e., z distribution) to determine what proportion of the curve is shaded.
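As a preview of the kind of calculation the z table and software will handle, the shaded area between 115 and 130 can be found as the difference of two cumulative areas. This sketch uses Python's standard-library statistics.NormalDist (Python 3.8+), which is our substitute here rather than a tool used in this course.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
# Shaded area between 115 and 130 = area left of 130 minus area left of 115
p_between = iq.cdf(130) - iq.cdf(115)
print(round(p_between, 4))   # 0.1359
```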

Empirical Rule Review

The Empirical Rule can be used to estimate the proportion of observations that should fall within the intervals of one, two, and three standard deviations of the mean:

Middle 68% of observations: $$\mu\pm 1(\sigma)$$

Middle 95% of observations: $$\mu\pm 2(\sigma)$$

Middle 99.7% of observations: $$\mu\pm 3(\sigma)$$

Examples

Middle 95%

Given that for the distribution of IQ scores, $$\mu=100$$ and $$\sigma=15$$, let's apply the Empirical Rule to determine between which two scores the middle 95% of individuals fall.

Middle 95%: Approximately $$100\pm2(15)=[70,130]$$

Middle 99.7%

The Empirical Rule also states that about 99.7% (nearly all) of a bell-shaped dataset will be in the interval $$mean\pm 3(standard\;deviation)$$.

Middle 99.7%: Approximately $$100\pm 3(15)= [55, 145]$$
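The Empirical Rule percentages are rounded versions of exact normal-curve areas. This Python sketch, again assuming statistics.NormalDist from the standard library, computes the IQ intervals for one, two, and three standard deviations and the exact share of a normal distribution inside each; the shares come out to about 0.6827, 0.9545, and 0.9973.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
shares = {}
for k in (1, 2, 3):
    low, high = iq.mean - k * iq.stdev, iq.mean + k * iq.stdev
    shares[k] = iq.cdf(high) - iq.cdf(low)   # exact normal area within k SDs
    print(f"mean +/- {k} SD: [{low:.0f}, {high:.0f}] contains {shares[k]:.4f}")
```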

5.6 - Finding Probabilities using Software

Minitab Express or Minitab can be used to find the proportion of a normal distribution in a given range. For example, if we know that at one highway location vehicles' speeds are normally distributed with a mean of 65 mph and a standard deviation of 5 mph, we can use that information to determine what proportion of vehicles are going under or over the speed limit. In the next pages we will use this scenario to identify the proportion of vehicles at that spot going different speeds. You will also learn how to compute proportions using software.

5.6.1 - Cumulative Probabilities

The cumulative probability for a value is the probability less than or equal to that value. In notation, this is $$P(X\leq x)$$

Let's look at an example.

Scenario: Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

Question: What is the probability that a randomly selected vehicle will be going 73 mph or slower?

Here is Minitab Express output showing that $$P(X \leq 73)=0.945201$$. This tells us that the probability that a randomly selected vehicle will be going 73 mph or slower is 0.945201.

We could also say that 94.5201% of vehicles are going 73 mph or slower.
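Outside Minitab, the same cumulative probability can be sketched in Python with the standard-library statistics.NormalDist class (our choice of tool, not one used in this course); its cdf method returns P(X ≤ x).

```python
from statistics import NormalDist

speed = NormalDist(mu=65, sigma=5)   # vehicle speeds in mph
p_at_most_73 = speed.cdf(73)         # cumulative probability P(X <= 73)
print(round(p_at_most_73, 6))        # 0.945201
```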

The videos below will walk you through how to obtain this output using Minitab Express or Minitab.

To calculate normal random variable probabilities in Minitab:

1. Open Minitab without data.
2. From the menu bar select Calc > Probability Distributions > Normal.
3. Select the radio button for Cumulative Probability (this is the default option)
4. In the text box for Mean enter 65
5. In the text box for Standard Deviation enter 5
6. Since we do not have a column of data select the radio button for Input Constant and enter 73
7. Click OK
8. The output is as follows:


To calculate normal random variable probabilities in Minitab Express:

1. Open Minitab Express without any data.
2. From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Cumulative (CDF).
3. Since you want to know the probability that the speed of a randomly selected vehicle is less than or equal to 73 mph, make sure the "Form of input" drop-down menu says "A single value" and enter 73 into the "Value" box.
4. Make sure the "Distribution" drop-down menu says "Normal", and enter 65 into the "Mean" box and 5 into the "Standard deviation" box.
5. Select "Display a table of cumulative probabilities" to show the output.
6. Click OK.

The result should be the following output:


We could also find this proportion by constructing a probability distribution plot.

In Minitab Express:

1. Open Minitab Express without any data.
2. From the menu bar, select Statistics > Probability Distributions > Distribution Plot
3. Click Display Probability
4. For Distribution, select Normal (this is the default).  In this scenario, our mean is 65 and our standard deviation is 5.
5. Under Shade the area corresponding to the following: select A specified x value and Left tail.  The X value is 73.

The result is the following output, which shows us that .945201 of the distribution is less than 73 mph.

In Minitab:

1. Open Minitab without any data.
2. From the menu bar, select Graph > Probability Distribution Plots
3. Click View Probability
4. For Distribution, select Normal (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
5. Under the Shaded Area tab, select X Value and Left Tail. The X value is 73.

The result is the following output which shows us that .9452 of the distribution is less than 73 mph.

5.6.2 - "Greater than" Probabilities

Sometimes we want to know the probability that a variable has a value greater than some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written $$P(X > 73)$$.

Previously we found $$P(X\leq 73)=.9452$$. The general rule for a "greater than" situation is $$P(X > x)=1-P(X \leq x)$$. Thus, $$P(X>73)=1-.9452=.0548$$. The probability that a randomly selected vehicle will be going faster than 73 mph is .0548.
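The complement rule translates directly into code. This Python sketch, again using statistics.NormalDist as a stand-in for Minitab, computes P(X > 73) as 1 minus the cumulative probability at 73.

```python
from statistics import NormalDist

speed = NormalDist(mu=65, sigma=5)
p_greater_73 = 1 - speed.cdf(73)    # complement rule: P(X > x) = 1 - P(X <= x)
print(round(p_greater_73, 4))       # 0.0548
```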

If we did not know $$P(X \leq73)$$ we could compute this probability by constructing a probability distribution plot in Minitab Express or Minitab.

In Minitab Express:

1. Open Minitab Express without any data.
2. From the menu bar, select Statistics > Probability Distributions > Distribution Plot
3. Click Display Probability
4. For Distribution, select Normal (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
5. Under Shade the area corresponding to the following: select A specified x value and Right tail.  The X value is 73.

The result is the following output which shows us that 0.0547993 of the distribution is greater than 73 mph.

In Minitab:

1. Open Minitab without any data.
2. From the menu bar, select Graph > Probability Distribution Plots
3. Click View Probability
4. For Distribution, select Normal (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
5. Under the Shaded Area tab, select X Value and Right Tail. The X value is 73.

The result is the following output which shows us that 0.05480 of the distribution is greater than 73 mph.

5.6.3 - "In between" Probabilities

Suppose we want to know the probability that a normal random variable is within a specified interval. For instance, suppose we want to know the probability that a randomly selected vehicle is going between 60 and 73 mph.

We could compute the probability that the speed is less than 73 mph and the probability that the speed is less than 60 mph and subtract the two. In other words:

$$P (60 < X < 73) = P(X<73) - P(X<60)$$

Or, we could use statistical software to find this range:

In Minitab Express:

1. Open Minitab Express without any data.
2. From the menu bar, select Statistics > Probability Distributions > Distribution Plot
3. Click Display Probability
4. For Distribution, select Normal (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
5. Under Shade the area corresponding to the following: select A specified x value and Middle. The X value 1 is 60 and X value 2 is 73.

The result is the following output which shows us that 0.786545 of the distribution is between 60 mph and 73 mph.

In Minitab:

1. Open Minitab without any data.
2. From the menu bar, select Graph > Probability Distribution Plots
3. Click View Probability
4. For Distribution, select Normal (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
5. Under the Shaded Area tab, select X Value and Middle. The X Value 1 is 60 and X Value 2 is 73.

The result is the following output which shows us that 0.7865 of the distribution is between 60 mph and 73 mph.
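The subtraction approach described at the start of this page, P(60 < X < 73) = P(X < 73) - P(X < 60), can be sketched in Python with statistics.NormalDist (our substitute for Minitab); it matches the 0.786545 shown by Minitab Express.

```python
from statistics import NormalDist

speed = NormalDist(mu=65, sigma=5)
# Area between two values = cumulative area at the upper value minus at the lower value
p_60_to_73 = speed.cdf(73) - speed.cdf(60)
print(round(p_60_to_73, 6))   # 0.786545
```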

5.6.4 - Finding Percentiles

Percentile: The value below which a given percentage of observations fall

For example, if your test score is in the 88th percentile, then you scored better than 88% of test takers.

We may wish to know the value of a variable that is a specified percentile. For example, what speed is the 99.99th percentile of speeds at the highway location in our earlier example? Recall, the mean vehicle speed is 65 mph with a standard deviation of 5 mph.

To calculate percentiles in Minitab:

1. Open Minitab without data
2. From the menu bar select Calc > Probability Distributions > Normal
3. Select Inverse Cumulative Probability
4. Our mean is 65 and our standard deviation is 5
5. Select the radio button for Input Constant and enter .9999
6. Click OK
7. The output is as follows:


To calculate percentiles in Minitab Express:

1. Open Minitab Express without data
2. From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Inverse (ICDF) on a PC, or Statistics > Probability Distributions > Inverse Cumulative Distribution Function on a Mac
3. Form of input is A single value
4. Value is .9999
5. Distribution is Normal
6. Our mean is 65 and our standard deviation is 5
7. Under Output, select Display a table of inverse cumulative probabilities
8. Click OK

The result should be the following output:

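An inverse cumulative probability can also be sketched in Python: statistics.NormalDist.inv_cdf returns the value with a given cumulative probability below it, playing the role of Minitab's Inverse Cumulative Probability option here. Under the vehicle speed assumptions (mean 65 mph, standard deviation 5 mph), the 99.99th percentile is about 83.6 mph.

```python
from statistics import NormalDist

speed = NormalDist(mu=65, sigma=5)
cutoff = speed.inv_cdf(0.9999)   # speed with 99.99% of vehicles at or below it
print(round(cutoff, 2))          # 83.6
```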

5.7 - Finding Probabilities using a Standard Normal Table

When finding the probability associated with a score on a normal distribution it may be necessary to first convert the observation to a z score in order to use the z table to find a probability. Recall from Lesson 2 the formula for computing the z-score for an individual observation:

z Score

$z=\frac{x - \overline{x}}{s}$

z = z score
x = original individual score
$$\overline{x}$$ = mean of the original distribution
s = standard deviation of the original distribution

This formula can also be written using population parameters: $$z=\frac{x-\mu}{\sigma}$$

We will be using Table A in Appendix A of the Agresti, Franklin, and Klingenberg textbook. Table A in the textbook gives normal curve cumulative probabilities for standardized scores. This is also known as a z table. Row labels of Table A give possible z-scores up to one decimal place. The column labels give the second decimal place of the z-score. The cumulative probability for a value equals the cumulative probability for that value's z-score.

The examples on the following pages will walk you through examples of finding the probability less than, greater than, or between two values on the normal distribution. Remember, when finding the probability associated with an observation that is on a scale other than the standard normal distribution (i.e., $$\mu=0$$ and $$\sigma=1$$), you must first translate the score to a z score before using the table.
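To see why the cumulative probability of a value equals the cumulative probability of its z score, the Python sketch below standardizes two vehicle speeds and looks them up on a standard normal distribution, which is what Table A does. The z_score helper is our own, and statistics.NormalDist() with no arguments (mean 0, standard deviation 1) stands in for the table.

```python
from statistics import NormalDist

mu, sigma = 65, 5          # vehicle speed example
standard = NormalDist()    # standard normal: mean 0, standard deviation 1

def z_score(x):
    """Standardize x: z = (x - mu) / sigma."""
    return (x - mu) / sigma

for x in (73, 60):
    z = z_score(x)
    print(x, z, round(standard.cdf(z), 4))
# 73 1.6 0.9452
# 60 -1.0 0.1587
```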

5.7.1 - Probability Less Than Examples

Example #2

Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

What is the probability that a randomly selected car is going 73 mph or less?

It's often helpful to begin by sketching a normal distribution and shading in the appropriate region. From the graph below we can see that more than half of the curve is shaded in; this means that our final result should be greater than .50.

Let’s use the z table to determine the proportion of the curve under 73 mph.

First, we need to compute the z score for this speed: $$z=\frac{73-65}{5}=1.60$$

Now we can use the z table to determine the proportion of the curve that is less than a z score of 1.6 by looking up 1.60. We look in the 1.6 row and the .00 column (1.6 plus .00 equals 1.60). The cumulative probability for z=1.60 is .9452, the same value that we got previously when using Minitab Express. There is a 94.52% chance of randomly selecting a vehicle that is going 73 mph or less.

Example #3

Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

What is the probability that a randomly selected car is going 60 mph or less?

For speed = 60 the z-score is: $$z=\frac{60-65}{5}=-1.00$$

We look up -1.00 in the z table (Table A.1) and find a cumulative probability of .1587. There is a 15.87% chance of randomly selecting a vehicle that is going 60 mph or less.
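The same lookup can be sketched with Python's `statistics.NormalDist` (an assumption: Python 3.8 or later is available). The software value rounds to .1587, the standard table entry for z = -1.00.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)  # vehicle speeds in mph
p_60_or_less = speeds.cdf(60)        # area to the left of 60 mph (z = -1.00)

print(f"P(X <= 60) = {p_60_or_less:.4f}")  # P(X <= 60) = 0.1587
```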

Example #4

IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. What IQ score separates the bottom 30% from the top 70%?

This is also known as the 30th percentile.

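First we must find the z score that separates the bottom 30% from the top 70% of the distribution, which is the z score with a cumulative probability of .30. In the body of the z table, the entry closest to .3000 is .3015, in the -0.5 row and the .02 column.

The z value that separates the bottom 30% from the top 70% is approximately -0.52.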

We can convert this z score into an IQ score using $$x=z\sigma+\mu$$ with $$\mu=100$$ and $$\sigma=15$$:

$$IQ=15(-0.52)+100=92.2$$

The IQ score that separates the bottom 30% from the top 70% is 92.2.
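Software can invert the cumulative probability directly instead of searching the table. A minimal sketch with Python's `statistics.NormalDist.inv_cdf` (assumes Python 3.8 or later) shows both routes side by side:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Table route: z rounded to two decimals, then x = z * sigma + mu.
z_table = -0.52
iq_from_table = z_table * iq.stdev + iq.mean

# Software route: find the 30th percentile directly.
z_exact = NormalDist().inv_cdf(0.30)
iq_exact = iq.inv_cdf(0.30)

print(f"table: z = {z_table}, IQ = {iq_from_table:.1f}")
print(f"exact: z = {z_exact:.4f}, IQ = {iq_exact:.2f}")
```

The exact 30th percentile is about 92.13 (z ≈ -0.5244); the table answer of 92.2 differs only because z was rounded to -0.52.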

5.7.2 - Probability Greater Than Examples

Example #1

Scores on the SAT-M are normally distributed with a mean of 500 and standard deviation of 100. What SAT-M score is needed to be in the top 5% of the population?

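First we will look up the z score that separates the top 5% from the bottom 95%, that is, the z score with a cumulative probability of .95. The closest table entries are .9495 (z = 1.64) and .9505 (z = 1.65).

The z score that separates the top 5% from the bottom 95% is approximately +1.64. We can find the SAT-M score that corresponds to this z score using $$x=z\sigma+\mu$$:

$$\text{SAT-M}=100(1.64)+500=664$$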

An SAT-M score of 664 is needed to be in the top 5% of the population.
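A quick software check, sketched with Python's `statistics.NormalDist` (assumes Python 3.8 or later), finds the score with 95% of the distribution below it:

```python
from statistics import NormalDist

sat_m = NormalDist(mu=500, sigma=100)
cutoff = sat_m.inv_cdf(0.95)  # boundary between the bottom 95% and the top 5%

print(f"{cutoff:.1f}")  # 664.5
```

The exact z score is about 1.645, which gives about 664.5; using the rounded table value of 1.64 gives the 664 shown above.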

Example #2

Suppose pulse rates of adult females have a normal distribution with a mean of 75 and a standard deviation of 8. What is the probability that a randomly selected female has a pulse rate greater than 85?

If we use Table A.1, the first step is to calculate the z-score associated with a pulse rate of 85: $$z=\frac{85-75}{8}=1.25$$.

Given that z = 1.25, we can use the z table to determine the cumulative probability.

The cumulative probability for z = 1.25 is .8944. This is the proportion below a pulse rate of 85, but we want to know the proportion above a pulse rate of 85. Because the total area under the curve equals 1, we subtract the cumulative probability from 1:

$$P(X>85) = 1 - P(X<85) = 1 - .8944 = .1056$$

The probability that a randomly selected female will have a pulse rate above 85 is .1056, or 10.56%.
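The complement rule translates directly into code. A minimal sketch with Python's `statistics.NormalDist` (assumes Python 3.8 or later):

```python
from statistics import NormalDist

pulse = NormalDist(mu=75, sigma=8)

# Area above 85 = total area (1) minus the area below 85.
p_above_85 = 1 - pulse.cdf(85)

print(f"P(X > 85) = {p_above_85:.4f}")  # P(X > 85) = 0.1056
```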

5.7.3 - Probability Between Examples

Example #2

IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. What proportion of IQ scores fall between 100 and 130?

First we must compute the z score associated with each of these IQ scores.

For an IQ of 100, $$z=\frac{100-100}{15}=0$$

For an IQ of 130, $$z=\frac{130-100}{15}=2.00$$

We are looking for the proportion of observations that fall between a z score of 0 and a z score of 2.00.

Using the z table above, $$P(z<0.00)=.5000$$ and $$P(z<2.00)=.9772$$

$$P(0.00<z<2.00)=P(z<2.00)-P(z<0.00)=.9772-.5000=.4772$$

The proportion of IQ scores between 100 and 130 is .4772, or 47.72%.
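The "between" calculation is one cumulative probability minus another. A minimal sketch with Python's `statistics.NormalDist` (assumes Python 3.8 or later):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Area left of 130 minus area left of 100 leaves the area between them.
p_between = iq.cdf(130) - iq.cdf(100)

print(f"P(100 < X < 130) = {p_between:.4f}")  # P(100 < X < 130) = 0.4772
```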

5.8 - Review of Finding the Proportion Under the Normal Curve

Video Review: Working with Continuous Random Variables

Finding the Score for a Given Proportion

This video walks through one example. A group of instructors has decided to assign grades on a curve. Given the mean and standard deviation of their students' scores, they want to know what point ranges are associated with which grades. Minitab Express is used.

Practice finding the proportion of observations under the normal curve. Each question can be answered using either Minitab Express or the z table. Work through each example, then compare your answers with the solutions.

HINT: Drawing the normal curve and shading in the region you are looking for is often helpful.

1. What proportion of the standard normal curve is less than a z score of 1.64?

2. What proportion of the standard normal curve falls above a z score of 1.33?

3. What proportion of the standard normal curve falls between a z score of -.50 and a z score of +.50?

4. At one private school, a minimum IQ score of 125 is necessary to be considered for admission. IQ scores have a mean of 100 and standard deviation of 15. Given this information, what proportion of children are eligible for consideration for admission to this school?

5. ACT scores are normally distributed with a mean of 18 and a standard deviation of 6. What proportion of test takers score between 20 and 26?

6. A men’s clothing company is doing research on the height of adult American men in order to inform the sizing of the clothing that they offer. The height of males in the United States is normally distributed with a mean of 175 cm and a standard deviation of 15 cm. Men who are more than 30 cm shorter or taller than the mean are classified by the apparel company as special cases because they do not fit in its regular-length clothing. Given this information, what proportion of men would be classified as special cases?
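If you would like to check your answers outside Minitab Express, one option is a set of small helpers built on Python's `statistics.NormalDist` (assumes Python 3.8 or later). The function names are hypothetical, chosen to mirror the less-than, greater-than, and between cases in this lesson, plus an "outside" case covering both tails at once. For questions about the standard normal curve, use `mu=0` and `sigma=1`.

```python
from statistics import NormalDist

def prob_less(x, mu, sigma):
    """P(X < x): area to the left of x."""
    return NormalDist(mu, sigma).cdf(x)

def prob_greater(x, mu, sigma):
    """P(X > x): area to the right of x (complement of the left area)."""
    return 1 - NormalDist(mu, sigma).cdf(x)

def prob_between(a, b, mu, sigma):
    """P(a < X < b): left area at b minus left area at a (assumes a < b)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)

def prob_outside(a, b, mu, sigma):
    """P(X < a or X > b): both tails, the complement of prob_between."""
    return 1 - prob_between(a, b, mu, sigma)

# Usage with the vehicle-speed distribution from 5.7.1 (mean 65, sd 5):
# proportion of cars slower than 60 mph or faster than 70 mph.
print(f"{prob_outside(60, 70, 65, 5):.4f}")  # 0.3173
```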

5.9 - Summary

In this lesson we examined a number of probability distributions including discrete, binomial, and normal. The next lesson will continue to explore probability distributions with an emphasis on the distribution of sample means. It will also introduce a new distribution that is similar in shape to the normal distribution: the t distribution.

Take a moment to review what you learned in this lesson before continuing to the next.

Lesson 5 Learning Objectives

Upon completion of this lesson, you will be able to:

• distinguish between discrete and continuous random variables.
• find probabilities associated with a discrete probability distribution.
• compute the mean and standard deviation of a discrete probability distribution.
• find probabilities associated with a binomial distribution.
• find probabilities associated with a normal probability distribution (i.e., z distribution) using Minitab Express and the standard normal table.