In this lesson we will begin to explore the concept of statistical inference. We will look at both discrete and continuous probability distributions. The concepts of standard error and the Central Limit Theorem will be introduced which will serve as the base for the remaining lessons in this course.

**Lesson 5 Learning Objectives**

Upon completion of this lesson, you will be able to:

- distinguish between discrete and continuous random variables.
- find probabilities associated with a discrete probability distribution.
- compute the mean and standard deviation of a discrete probability distribution.
- find probabilities associated with a binomial distribution.
- find probabilities associated with a normal probability distribution (i.e., z distribution) using Minitab Express and the standard normal table.

Before we begin new content, we should review a few terms from previous lessons that we will see again in this lesson:

**Discrete**: Data that can only take on a set number of values

**Continuous**: Quantitative data that can take on any value between the minimum and maximum, and any value between two other values

**Probability**: The likelihood of an event occurring; \(P(A)=\frac{number \: of \:events \:considered\: outcome \:A}{number \:of\: total \:events}\)

**\(P(A\;\cap\;B)\)**: Intersection of A and B; "probability of A and B"

**\(P(A\;\cup\;B)\)**: Union of A and B; "probability of A or B" (this also includes the probability of A and B)

**Mean**: The numerical average; calculated as the sum of all of the data values divided by the number of values; represented as \(\overline{X}\).

**Standard deviation**: Roughly the average difference between individual data and the mean; for a sample, represented as s, \(s=\sqrt{\frac{\sum (x-\overline{x})^{2}}{n-1}}\)

**Sample**: A subset of the population from which data is actually collected

**Population**: The entire set of possible observations in which we are interested

**Statistic**: A measure concerning a sample (e.g., sample mean)

**Parameter**: A measure concerning a population (e.g., population mean)

**Descriptive statistics**: Methods for summarizing data (e.g., mean, median, mode, range, variance, graphs)

**Inferential statistics**: Methods for using sample data to make conclusions about a population

**z score**: Distance between an individual score and the mean in standard deviation units; also known as a standardized score.

**Empirical Rule**: For bell-shaped distributions, about 68% of the data will be within one standard deviation of the mean, about 95% will be within two standard deviations of the mean, and about 99.7% will be within three standard deviations of the mean

The word “random” is used often in everyday life. For example, you may hear someone say “We randomly decided to go out for dinner last night.” But is this really a random event? No, this is a conscious decision that was made on the basis of other variables such as hunger and the lack of satisfaction with other options such as cooking one’s own dinner.

In statistics, the word random has a different meaning. Something is random when it varies by chance. For example, when rolling a six-sided die there are six equally likely outcomes, and the observed outcome on any one roll is random. The variation of a random event such as rolling a die can be described by the probability distributions that we will see in this lesson.

**Random variable**: a numerical characteristic that takes on different values due to chance

**Coin Flips**

The number of heads in four flips of a coin (a numerical property of each different sequence of flips) is a **random variable** because the results will vary between trials.

**Heights**

Samples of 100 are repeatedly drawn from the population of all Penn State students and their heights are measured. The mean height of samples of 100 Penn State students is a **random variable** because the statistic will vary between samples. While most sample means will be similar to the population mean, they will not all equal the population mean due to random sampling variation.

Random variables are classified into two broad types: discrete and continuous. A **discrete random variable** has a countable set of distinct possible values. A **continuous random variable** is such that any value (to any number of decimal places) within some interval is a possible value.

**Discrete Random Variables:**

- Number of heads in 4 flips of a coin (possible outcomes are 0, 1, 2, 3, 4)
- Number of classes missed last week (possible outcomes are 0, 1, 2, 3, ..., up to the maximum number of classes)
- Amount won or lost when betting $1 on the Pennsylvania Daily number lottery

**Continuous Random Variables:**

- Heights of individuals
- Time to finish a test
- Hours spent exercising last week

**Note**: In practice, we don't measure accurately enough to truly see all possible values of a continuous random variable. For instance, in reality somebody may have exercised 4.2341567 hours last week but they probably would round off to 4. Nevertheless, hours of exercise last week is inherently a continuous random variable.

**Probability distribution**: A table, graph, or formula that gives the probability of a given outcome's occurrence

For a discrete random variable, its **probability distribution** (also called the **probability distribution function**) is any table, graph, or formula that gives each possible value and the probability of that value.

**Note**: The total of all probabilities across the distribution must be 1, and each individual probability must be between 0 and 1, inclusive.

What if we flipped a fair coin four times? What are the possible outcomes and what is the probability of each?

Figure 1 below is a probability distribution for the number of heads in 4 flips of a coin. Given that P(Heads)=.50, the probability of not flipping heads at all is 1/16, or .0625. In 6.25% of all trials, we can expect that there will be no heads. This may be written as P(X=0)=.0625. Similarly, the probability of flipping heads once in four trials is 4/16, or .25. In 25% of all trials, we can expect that heads will be flipped exactly once. This may be written as P(X=1)=.25.

This probability distribution could be constructed by listing all 16 possible sequences of heads and tails for four flips (e.g., HHHH, HTHH, HTTH, HTTT), and then counting how many sequences there are for each possible number of heads. Alternatively, in Section 5.4 you will see how these could be computed using binomial random variable techniques.

Heads | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | 1/16 | 4/16 | 6/16 | 4/16 | 1/16 |
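The counting approach described above can be sketched in Python. This is only an illustration, not part of the Minitab workflow used in this course; the names `sequences` and `distribution` are made up for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**4 = 16 equally likely sequences of four fair-coin flips.
sequences = list(product("HT", repeat=4))

# Count how many sequences produce each possible number of heads.
heads_counts = Counter(seq.count("H") for seq in sequences)

# Each sequence has probability 1/16, so P(X = k) = (sequences with k heads) / 16.
distribution = {k: Fraction(heads_counts[k], len(sequences)) for k in range(5)}

for k, prob in distribution.items():
    # Fraction reduces automatically, so 4/16 is displayed as 1/4.
    print(f"P(X={k}) = {prob} = {float(prob):.4f}")
```

Because `Fraction` stores reduced fractions, 4/16 and 6/16 appear as 1/4 and 3/8; the decimal values match the table.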

A census was conducted at a university. All students were asked how many tattoos they had.

Figure 2 presents a probability distribution for the discrete variable of number of tattoos for each student. From this table we can find that 85% of students in the population do not have a tattoo, 12% of students in the population have one tattoo, 1.5% of students in the population have two tattoos, and so on. This could be written as P(X=0)=.85, P(X=1)=.12, P(X=2)=.015, etc.

Tattoos | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | .850 | .120 | .015 | .010 | .005 |

**Cumulative probability**: Likelihood of an outcome less than or equal to a given value occurring

To find a **cumulative probability** we add the probabilities for all values qualifying as "less than or equal" to the specified value.

Suppose we want to know the probability that the number of heads in four flips is less than two. Let X represent the number of heads we get on four flips of a coin.

Because this is a discrete distribution, the probability of flipping fewer than two heads is equal to the probability of flipping zero heads or one head:

\(P(X<2)=P(X=0\;\cup\;X=1)\)

Flipping exactly 0 heads and flipping exactly 1 head are mutually exclusive events. Thus, \(P(X=0\;\cup\;X=1)=P(X=0)+P(X=1)\)

We can use the values from Figure 1 above to solve this equation.

\(P(X=0)+P(X=1)=(1/16)+(4/16)=5/16 \)

**Cumulative distribution**: A listing of all possible values along with the probability of that value and all lower values occurring (i.e., the **cumulative probability**)

Cumulative probabilities are found by adding the probability up to each column of the table. In Figure 3 we find the cumulative probability for one head by adding the probabilities for zero and one. The cumulative probability for two heads is found by adding the probabilities for zero, one, and two. We continue with this procedure until we reach the maximum number of heads, in this case four, which should have a cumulative probability of 1.00 because 100% of trials must have four or fewer heads.

Heads | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | 1/16 | 4/16 | 6/16 | 4/16 | 1/16 |
Cumulative Probability | 1/16 | 5/16 | 11/16 | 15/16 | 1 |

Let's construct a cumulative distribution for the data concerning number of tattoos.

Tattoos | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | .850 | .120 | .015 | .010 | .005 |
Cumulative Probability | .850 | .970 | .985 | .995 | 1 |

Note that the cumulative probability for the last column is always 1. That is, 100% of trials will be less than or equal to the maximum value.
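As a rough check, the running totals in both cumulative distributions can be reproduced with a few lines of Python. This sketch uses the standard library's `itertools.accumulate`; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import accumulate

# Probabilities for 0, 1, 2, 3, 4 heads in four flips of a fair coin.
heads_probs = [Fraction(1, 16), Fraction(4, 16), Fraction(6, 16),
               Fraction(4, 16), Fraction(1, 16)]

# Probabilities for 0, 1, 2, 3, 4 tattoos.
tattoo_probs = [0.850, 0.120, 0.015, 0.010, 0.005]

# accumulate() yields running sums, i.e. P(X <= x) for each x in order.
heads_cumulative = list(accumulate(heads_probs))
# Round to three decimals to hide floating-point noise in the running sums.
tattoo_cumulative = [round(c, 3) for c in accumulate(tattoo_probs)]

print([str(c) for c in heads_cumulative])
print(tattoo_cumulative)
```

The last entry of each list is 1, in line with the note above.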

**Law of Large Numbers**: Given a large number of repeated trials, the average of the results will be approximately equal to the expected value

**Expected value**: The mean value in the long run for many repeated samples, symbolized as \(E(X)\)

**Expected Value for a Discrete Random Variable**

\[E(X)=\sum x_i p_i\]

\(x_i\) = value of the \(i^{th}\) outcome

\(p_i\) = probability of the \(i^{th}\) outcome

According to this formula, we take each possible X value and multiply it by its respective probability. We then add these products to reach our expected value. You may have seen this before referred to as a **weighted average**. It is known as a weighted average because it takes into account the probability of each outcome and weights it accordingly. This is in contrast to an unweighted average, which would not take into account the probability of each outcome and would weight each possibility equally.

Let's look at a few examples of expected values for a discrete random variable:

A fair six-sided die is tossed. You win \$2 if the result is a “1,” you win \$1 if the result is a “6,” but otherwise you lose \$1.

X | +\$2 | +\$1 | -\$1 |
---|---|---|---|
Probability | 1/6 | 1/6 | 4/6 |

\( E(X)= \$2(\frac {1}{6})+\$1 (\frac {1}{6})+(-\$1)(\frac {4}{6})=\$\frac{-1}{6}= -\$ 0.17 \)

The interpretation is that if you play many times, the average outcome is losing 17 cents per play. Thus, over time you should expect to lose money.
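The "many plays" interpretation can be illustrated with a small simulation. This is a sketch under assumptions: the function name `play_once`, the seed, and the number of plays are arbitrary choices made for the example, and a different seed will give a slightly different average.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def play_once(rng):
    """Return the winnings (in dollars) from one play of the die game."""
    roll = rng.randint(1, 6)  # fair six-sided die
    if roll == 1:
        return 2
    if roll == 6:
        return 1
    return -1

n_plays = 100_000
average = sum(play_once(rng) for _ in range(n_plays)) / n_plays

# With this many plays the average should land near E(X) = -1/6.
print(f"average winnings per play: {average:.3f}")
```

This is the Law of Large Numbers at work: as the number of plays grows, the simulated average settles close to the expected value.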

Using the probability distribution for number of tattoos, let's find the mean number of tattoos per student.

Tattoos | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | .850 | .120 | .015 | .010 | .005 |

\( E(X)=0 (.85)+1(.12)+ 2(.015) +3 (.010) +4(.005) =.20 \)

The mean number of tattoos per student is .20.
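The weighted-average calculation can be written as a short helper. The function name `expected_value` is hypothetical; it simply applies \(E(X)=\sum x_i p_i\) after checking that the probabilities sum to 1.

```python
from fractions import Fraction

def expected_value(values, probs):
    """Return the sum of x_i * p_i for a discrete random variable."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Tattoo example: values 0 through 4 with the probabilities from the table.
tattoo_mean = expected_value([0, 1, 2, 3, 4], [0.850, 0.120, 0.015, 0.010, 0.005])

# Die game example, using exact fractions to avoid rounding.
die_mean = expected_value([2, 1, -1],
                          [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)])

print(round(tattoo_mean, 2))   # mean number of tattoos per student
print(die_mean)                # expected winnings per play, as a fraction
```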

Recall from Lesson 3, in a sample, the mean is symbolized by \(\overline{x}\) and the standard deviation by \(s\). Because the probabilities that we are working with here are computed using the population, they are symbolized using lower case Greek letters. The population mean is symbolized by \(\mu\) (lower case "mu") and the population standard deviation by \(\sigma \) (lower case "sigma").

 | Sample Statistic | Population Parameter |
---|---|---|
Mean | \(\overline{x}\) | \(\mu\) |
Variance | \(s^{2}\) | \(\sigma ^{2}\) |
Standard Deviation | \(s\) | \(\sigma \) |

Also recall that the standard deviation is equal to the square root of the variance. Thus, \(\sigma=\sqrt{(\sigma ^{2})}\)

The expected value is not the only important characteristic of a discrete distribution: one may also need to know the spread, or variability, of the possible outcomes. For instance, you may "expect" to win \$20 when playing a particular game (which appears good!), but the outcomes might range from losing \$20 to winning \$60. Knowing such information can influence your decision on whether to play.

To calculate the standard deviation we first must calculate the variance. Taking the square root of the variance gives us the standard deviation. Conceptually, the variance of a discrete random variable is the sum of the squared differences between each value and the mean, each multiplied by the probability of obtaining that value, as seen in the conceptual formulas below:

**Conceptual Formulas**

**Variance for a Discrete Random Variable**

\( \sigma ^2= \sum [(x_i-\mu)^2 p_i] \)

**Standard Deviation for a Discrete Random Variable**

\( \sigma = \sqrt {\sum [(x_i-\mu)^2 p_i]}\)

\(x_i\) = value of the \(i^{th}\) outcome

\(\mu= E(X)=\sum x_i p_i\)

\(p_i\) = probability of the \(i^{th}\) outcome

In these expressions we substitute our result for E(X) in place of \(\mu\) because \(\mu\) is the symbol used to represent the mean of a population.

However, there is an **easier** computational formula. The computational formula will give you the same result as the conceptual formula above, but the calculations are simpler.

**Computational Formulas**

**Variance for a Discrete Random Variable**

\( \sigma ^2= [\sum (x_i^2 p_i )]-\mu ^2\)

**Standard Deviation for a Discrete Random Variable**

\( \sigma = \sqrt {[\sum (x_i^2 p_i)] -\mu ^2}\)

\(x_i\) = value of the \(i^{th}\) outcome

\(\mu= E(X)=\sum x_i p_i\)

\(p_i\) = probability of the \(i^{th}\) outcome

Notice in the summation part of this equation that we only square each observed X value and not the respective probability. Also note that the \(\mu\) is outside of the summation.

Going back to the first example used above for expectation involving the dice game, we would calculate the standard deviation for this discrete distribution by first calculating the variance:

X | +\$2 | +\$1 | -\$1 |
---|---|---|---|
Probability | 1/6 | 1/6 | 4/6 |

\( \sigma ^2= [\sum x_i^2 p_i ]-\mu ^2 = [2^2 (\frac{1}{6})+1^2 (\frac{1}{6})+(-1)^2 (\frac{4}{6})]-(- \frac{1}{6})^2\)

\(=[ \frac{4}{6}+\frac {1}{6}+ \frac{4}{6}]-\frac{1}{36} = \frac{53}{36}=1.472 \)

The variance of this discrete random variable is 1.472.

\(\sigma=\sqrt{(\sigma ^{2})}\)

\(\sigma=\sqrt{1.472}=1.213\)

The standard deviation of this discrete random variable is 1.213.
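To see that the conceptual and computational formulas agree, the die game calculation can be repeated in Python with exact fractions. This is an illustrative sketch; the variable names are not from the lesson.

```python
from fractions import Fraction
from math import sqrt

values = [2, 1, -1]
probs = [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)]

# Mean: mu = sum of x_i * p_i
mu = sum(x * p for x, p in zip(values, probs))

# Conceptual formula: sum of (x_i - mu)^2 * p_i
var_conceptual = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Computational formula: [sum of x_i^2 * p_i] - mu^2
var_computational = sum(x ** 2 * p for x, p in zip(values, probs)) - mu ** 2

# Standard deviation is the square root of the variance.
sigma = sqrt(var_computational)

print(var_conceptual, var_computational, round(sigma, 3))
```

Both formulas give 53/36, and the square root rounds to 1.213.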

This video walks through one example of a discrete random variable. It includes the construction of a cumulative probability distribution and the calculation of the mean and standard deviation.

**Binomial random variable**: A specific type of discrete random variable that counts how often a particular event occurs in a fixed number of tries or trials

For a variable to be a **binomial random variable**, **ALL** of the following conditions must be met:

- There are a fixed number of trials (a fixed sample size)
- On each trial, the event of interest either occurs or does not
- The probability of occurrence (or not) is the same on each trial
- Trials are independent of one another

**Examples of Binomial Random Variables:**

- Number of correct guesses at 30 true-false questions when you randomly guess all answers
- Number of winning lottery tickets when you buy 10 tickets of the same kind
- Number of left-handers in a randomly selected sample of 100 unrelated people
- Number of tails when flipping a coin 10 times

**Notation**

n= number of trials

p= probability event of interest occurs on any one trial

Number of correct guesses at 30 true-false questions when you randomly guess all answers

There are 30 trials, therefore n = 30

There are two possible outcomes (true and false) that are equally probable, therefore p = 1/2 = .5

The conditions for being a binomial variable lead to a somewhat complicated formula for finding the probability that any specific value occurs (such as the probability you get 20 right when you guess at 30 true-false questions).

We'll use Minitab Express to find probabilities for binomial random variables. However, for those of you who are curious, the by hand formula for the probability of getting a specific outcome in a binomial experiment is:

**Binomial Random Variable Probability**

\[P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}\]

n = number of trials

x = number of successes

p = probability event of interest occurs on any one trial

! is the symbol for factorial. For a review of factorials, see the course algebra review page.

One can use the formula to find the probability or alternatively, use Minitab Express to find the probability. In the homework, you may use the method that you are more comfortable with unless specified otherwise.

In the following Minitab Express example we will find *P*(*x*) for *n* = 20, *x* =3, and p = 0.4

To calculate binomial random variable probabilities in Minitab:

- Open Minitab without data.
- From the menu bar select Calc > Probability Distributions > Binomial.
- Choose Probability since we want to find the probability that *x* = 3.
- Enter 20 in the text box for number of trials.
- Enter 0.4 in the text box for probability of success (note that in Minitab versions after 14 this is now labeled event probability).
- Since we do not have a column of data, select the radio button for Input Constant and enter 3.
- Click OK.

*Minitab output:*

Probability Density Function

Binomial with *n* = 20 and *p* = 0.4

x | P(X = x) |
---|---|
3.00 | 0.0123 |
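For readers who want to check the software output against the by-hand formula, here is a minimal Python sketch. The helper name `binomial_pmf` is made up for this example; `math.comb` (Python 3.8 or later) computes \(\frac{n!}{x!(n-x)!}\).

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and event probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Same inputs as the Minitab example: n = 20, x = 3, p = 0.4
prob = binomial_pmf(3, 20, 0.4)
print(round(prob, 4))
```

Rounded to four decimal places, the result agrees with the 0.0123 shown in the output above.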


To calculate binomial random variable probabilities in Minitab Express:

- Open Minitab Express without data.
- From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Probability (PDF).
- Since we want to find the probability that *x* = 3, enter 3 into the "Value" box.
- In the "Distribution" drop down menu, select Binomial.
- Enter 20 into the "Number of trials" box, and 0.4 into the "Event probability" box.
- Select "Display a table of probability density values" to show the output.
- Click OK.

The result should be the following output:


In the following example, we illustrate how to use the formula to compute binomial probabilities by hand. If you don't like to use the formula, you can also use Minitab Express to find the probabilities.

**Red Flowers**

Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring. Find the probability that there will be no red flowered plants in the five offspring.

*X* = # of red flowered plants in the five offspring.

The number of red flowered plants has a binomial distribution with n = 5, p = .25

\(P(X=0)=\frac{5!}{0!(5-0)!} .25 ^0 (1- .25)^5 =1 \times .25^0 \times .75^5 =.237\)

There is a 23.7% chance that none of the five plants will be red flowered.

**Cumulative probability**: Likelihood that a certain number of successes or fewer will occur.

The possible values of a binomial random variable are mutually exclusive events; therefore, we can use the addition rule that we learned in Lesson 4.

Continuing with the red flowers example, what if we wanted to know the probability that there would be one or fewer red flowered plants?

\begin{align}
P(X\ is\ 1\ or\ less)&=P(X=0)+P(X=1)\\
&= \frac{5!}{0!(5-0)!} .25^0 (1-.25)^5+\frac{5!}{1!(5-1)!} .25^1 (1-.25)^4\\
& = .237 +.396=.633
\end{align}

There is a 63.3% chance that one or fewer of the five plants will be red flowered.
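The addition of the two binomial probabilities can be sketched in Python as follows. The helper names `binomial_pmf` and `binomial_cdf` are hypothetical and only apply the formula from this lesson.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n trials with event probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): add the probabilities for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Red flowers example: n = 5 offspring, p = .25 chance of a red flower.
p_zero = binomial_pmf(0, 5, 0.25)
p_one = binomial_pmf(1, 5, 0.25)
p_one_or_fewer = binomial_cdf(1, 5, 0.25)

print(p_zero, p_one, p_one_or_fewer)
```

Without intermediate rounding, the two terms are 0.2373046875 and 0.3955078125, and their sum is 0.6328125.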

In the red flowers example, we first computed P(X = x) and then P(X ≤ x). Computing the latter is called finding a **cumulative probability** because you are finding the probability that has accumulated from the minimum to some point, i.e., from 0 to 1 in this example.

**To use Minitab Express to solve a cumulative probability binomial problem**, return to Statistics > Probability Distributions> CDF/PDF > Cumulative Distribution Function (CDF). For Value enter 1. For distribution select the binomial. There are 5 trials and the event probability is .25

**To use Minitab to solve a cumulative probability binomial problem**, return to Calc > Probability Distributions > Binomial as shown above. Now however, select the radio button for Cumulative Probability. For Number of Trials enter 5 and the event probability is .25. Click the radio button for Input Constant and enter the x value of 1.

The formula given earlier for discrete random variables could be used, but the good news is that for binomial random variables a shortcut formula for expected value (the mean) and standard deviation can also be used.

**Binomial Random Variable Formulas**

\[\mu=np\]

\[\sigma=\sqrt {np(1-p)}\]

n = number of trials

p = probability event of interest occurs on any one trial

After you use this formula a couple of times, you'll realize this formula matches your intuition. For instance, the “expected” number of correct (random) guesses at 30 True-False questions is *np* = (30)(.5) = 15 (half of the questions). For a fair six-sided die rolled 60 times, the expected value of the number of times a “1” is tossed is *np* = (60)(1/6) = 10.

The standard deviations for these would be, for the True-False test, \(\sigma=\sqrt{30 (0.5) (1-0.5)}=\sqrt{7.5}=2.74\), and for the die, \(\sigma=\sqrt{60 \left( \frac{1}{6}\right) \left(1-\frac {1}{6}\right)}=\sqrt{ \frac{50}{6}}=2.89\).
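The shortcut formulas are easy to check numerically. The function name `binomial_mean_sd` below is an assumption made for this sketch.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Return (mu, sigma) using mu = n*p and sigma = sqrt(n*p*(1-p))."""
    return n * p, sqrt(n * p * (1 - p))

# 30 true-false questions answered by random guessing.
tf_mean, tf_sd = binomial_mean_sd(30, 0.5)

# Number of 1s in 60 rolls of a fair six-sided die.
die_mean, die_sd = binomial_mean_sd(60, 1 / 6)

print(round(tf_mean, 2), round(tf_sd, 2))
print(round(die_mean, 2), round(die_sd, 2))
```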

**Roulette**

A roulette wheel has 38 slots: 18 are red, 18 are black, and 2 are green. You play five games and always bet on red.

**How many games can you expect to win?**

Recall, you play five games and always bet on red. \(n=5\) and \(p=\frac{red \;slots}{total \;slots}=\frac{18}{38}\)

\(\mu=np=5 \left( \frac{18}{38}\right)=2.3684\)

\( \sigma=\sqrt{np(1-p)}=\sqrt{5\left(\frac{18}{38} \right) \left(1-\frac{18}{38}\right)}=1.1165\)

Out of 5 games, you can expect to win 2.3684 (with a standard deviation of 1.1165).

**What is the probability that you will win all five games?**

\(P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}\)

\(P(X=5)= \frac {5!}{5!(5-5)!}\left( \frac{18}{38} \right)^5 \left(1-\frac{18}{38}\right)^{5-5}\)

\(P(X=5)=\frac{5!}{5!0!} \left(.4737^{5}\right) .5263^{0} = 1(.0238)(1)=.0238\)

There is a 2.38% chance that you will win all five out of five games.

If you win three or more games, you make a profit. If you win two or fewer games, you lose money. **What is the probability that you will win no more than two games?**

\(P(X\leq 2)=P(X=0)+P(X=1)+P(X=2)\)

\(P(X=0)=\frac {5!}{0!(5-0)!} \left ( \frac{18}{38} \right )^0\left(1-\frac{18}{38}\right)^{5-0}=.0404\)

\(P(X=1)=\frac {5!}{1!(5-1)!} \left ( \frac{18}{38} \right )^1\left(1-\frac{18}{38}\right)^{5-1}=.1817\)

\(P(X=2)=\frac {5!}{2!(5-2)!} \left ( \frac{18}{38} \right )^2\left(1-\frac{18}{38}\right)^{5-2}=.3271\)

\(P(X\leq 2)=.0404+.1817+.3271=.5493\)

There is a 54.93% chance that you will win no more than two games. In other words, there is a 54.93% chance that you will lose money.
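The roulette calculations can be reproduced in one place with a short Python sketch. The helper `binomial_pmf` is a made-up name that applies the binomial formula from this lesson.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for n trials with event probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n = 5          # games played
p = 18 / 38    # probability that red wins on any one game

mu = n * p                      # expected number of wins
sigma = sqrt(n * p * (1 - p))   # standard deviation of the number of wins

p_win_all = binomial_pmf(5, n, p)                             # P(X = 5)
p_two_or_fewer = sum(binomial_pmf(k, n, p) for k in range(3)) # P(X <= 2)

print(round(mu, 4), round(sigma, 4))
print(round(p_win_all, 4), round(p_two_or_fewer, 4))
```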

A fair coin is flipped 10 times. This example is used to demonstrate calculations concerning binomial random variables. Hand calculations are performed and Minitab Express is used.

A class is taking a multiple choice quiz. There are 6 questions, each with 4 options. The professor accidentally brought a quiz from a different, much more advanced class. All students randomly guess on each item. This is a binomial random variable. We compute the mean and standard deviation by hand. We compute the probability that a student will pass (i.e., at least 60%), the probability that they will get all questions incorrect, and the probability that they will get all questions correct, all using Minitab Express.

We just discussed discrete random variables, and now we consider *continuous random variables*. Recall, a **continuous random variable** is such that all values (to any number of decimal places) within some interval are possible outcomes. A continuous random variable has an infinite number of possible values so we can't assign probabilities to each specific value. If we did, the total probability would be infinite, rather than 1, as it is supposed to be.

To describe probabilities for a continuous random variable, we use a *probability density function*.

**Probability density function (PDF)**: A curve such that the area under the curve within any interval of values along the horizontal axis gives the probability for that interval

The most commonly encountered type of continuous random variable is a **normal random variable**, which has a symmetric bell-shaped density function. The center point of the distribution is the mean value, denoted by \(\mu\) ("mu"). The spread of the distribution is determined by the variance, denoted by \(\sigma ^{2}\) ("sigma squared") or by the square root of the variance called standard deviation, denoted by \(\sigma\) ("sigma").

The distribution of IQ scores is normal with a mean of 100 and standard deviation of 15.

In other words, \(\mu=100\) and \(\sigma=15\). The probability density function is shown below.

Notice that the horizontal axis shows IQ score and the bell is centered at the mean of 100.

While we cannot determine the probability for any one given value because the distribution is continuous, we can determine the probability for a given interval of values. The probability for an interval is equal to the area under the density curve. The total area under the curve is 1.00, or 100%. In other words, 100% of observations fall under the curve.

The next figure shows the probability that the IQ of a randomly selected individual will be between 115 and 130. This probability is equal to the shaded area under the curve between 115 and 130.

Soon we will learn how to use the normal distribution (i.e., z distribution) to determine what proportion of the curve is shaded.

The Empirical Rule can be used to estimate the proportion of observations that should fall within the intervals of one, two, and three standard deviations of the mean:

Middle 68% of observations: \(\mu\pm 1(\sigma)\)

Middle 95% of observations: \(\mu\pm 2(\sigma)\)

Middle 99.7% of observations: \(\mu\pm 3(\sigma)\)

**Middle 95%**

Given that for the distribution of IQ scores, \(\mu=100\) and \(\sigma=15\), let's apply the Empirical Rule to determine between which two scores the middle 95% of individuals fall.

Middle 95%: Approximately \(100\pm2(15)=[70,130]\)

**Middle 99.7%**

The Empirical Rule also stated that about 99.7% (nearly all) of a bell-shaped dataset will be in the interval \(mean\pm 3(standard\;deviation)\).

Middle 99.7%: Approximately \(100\pm 3(15)= [55, 145]\)
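The Empirical Rule percentages are approximations of areas under the normal curve. As a rough check, the sketch below uses Python's standard-library `statistics.NormalDist` (Python 3.8 or later) with the IQ parameters; the helper name `middle_proportion` is made up for the example.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

def middle_proportion(dist, k):
    """Proportion of the distribution within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    lower, upper = 100 - k * 15, 100 + k * 15
    print(f"[{lower}, {upper}]: {middle_proportion(iq, k):.4f}")
```

The computed areas are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated with "about."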

Minitab Express or Minitab can be used to find the proportion of a normal distribution in a given range. For example, if we know that at one highway location vehicles' speeds are normally distributed with a mean of 65 mph and a standard deviation of 5 mph, we can use that information to determine what proportion of vehicles are going under or over the speed limit. In the next pages we will use this scenario to identify the proportion of vehicles at that spot going different speeds. You will also learn how to compute proportions using software.

The **cumulative probability** for a value is the probability of obtaining a result less than or equal to that value. In notation, this is \(P(X\leq x)\)

Let's look at an example.

**Scenario:** Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

**Question:** What is the probability that a randomly selected vehicle will be going 73 mph or slower?

Here is Minitab Express output showing that \(P(X \leq 73)=0.945201\). This tells us that the probability that a randomly selected vehicle will be going 73 mph or slower is 0.945201.

We could also say that 94.5201% of vehicles are going 73 mph or slower.
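This cumulative probability can also be cross-checked outside Minitab. The following sketch assumes Python 3.8 or later and uses the standard-library `statistics.NormalDist`; it is offered only as a check, not as the course's method.

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph.
speeds = NormalDist(mu=65, sigma=5)

# Cumulative probability P(X <= 73)
p_at_most_73 = speeds.cdf(73)
print(round(p_at_most_73, 6))
```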

The steps below walk you through how to obtain this output using Minitab or Minitab Express.

To calculate normal random variable probabilities in Minitab:

- Open Minitab without data.
- From the menu bar select Calc > Probability Distribution > Normal.
- Select the radio button for Cumulative Probability (this is the default option)
- In the text box for Mean enter 65
- In the text box for Standard Deviation enter 5
- Since we do not have a column of data select the radio button for Input Constant and enter 73
- Click OK
- The output is as follows:


To calculate normal random variable probabilities in Minitab Express:

- Open Minitab Express without any data.
- From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Cumulative (CDF).
- Since you want to know the probability that the speed of a randomly selected vehicle is less than or equal to 73 mph, make sure the "Form of input" drop-down menu says "A single value" and enter 73 into the "Value" box.
- Make sure the "Distribution" drop-down menu says "Normal", and enter 65 into the "Mean" box and 5 into the "Standard deviation" box.
- Select "Display a table of cumulative probabilities" to show the output.
- Click OK.

The result should be the following output:


We could also find this proportion by constructing a probability distribution.

**In Minitab Express:**

- Open Minitab Express without any data.
- From the menu bar, select *Statistics > Probability Distributions > Distribution Plot*
- Click *Display Probability*
- For *Distribution*, select *Normal* (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
- Under *Shade the area corresponding to the following:* select *A specified x value* and *Left tail*. The *X value* is 73.

The result is the following output which shows us that .945201 of the distribution is less than 73 mph.

**In Minitab:**

- Open Minitab without any data.
- From the menu bar, select *Graph > Probability Distribution Plots*
- Click *View Probability*
- For *Distribution*, select *Normal* (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
- Under the *Shaded Area* tab, select *X Value* and *Left Tail*. The *X value* is 73.

The result is the following output which shows us that .9452 of the distribution is less than 73 mph.

Sometimes we want to know the probability that a variable has a value greater than some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written \(P(X > 73)\).

Previously we found \(P(X \leq 73)=.9452\). The general rule for a "greater than" situation is \(P(X > x)=1-P(X \leq x)\). Thus, \(P(X>73)=1-.9452=.0548\). The probability that a randomly selected vehicle will be going faster than 73 mph is .0548.
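The complement rule can be checked with the same kind of Python sketch (standard-library `statistics.NormalDist`, Python 3.8 or later; variable names are illustrative).

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# P(X > 73) = 1 - P(X <= 73)
p_over_73 = 1 - speeds.cdf(73)
print(round(p_over_73, 4))
```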

If we did not know \(P(X \leq73)\) we could compute this probability by constructing a probability distribution in Minitab Express or Minitab.

**In Minitab Express:**

- Open Minitab Express without any data.
- From the menu bar, select *Statistics > Probability Distributions > Distribution Plot*
- Click *Display Probability*
- For *Distribution*, select *Normal* (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
- Under *Shade the area corresponding to the following:* select *A specified x value* and *Right tail*. The *X value* is 73.

The result is the following output which shows us that 0.0547993 of the distribution is greater than 73 mph.

**In Minitab:**

- Open Minitab without any data.
- From the menu bar, select *Graph > Probability Distribution Plots*
- Click *View Probability*
- For *Distribution*, select *Normal* (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
- Under the *Shaded Area* tab, select *X Value* and *Right Tail*. The *X value* is 73.

The result is the following output which shows us that 0.05480 of the distribution is greater than 73 mph.

Suppose we want to know the probability that a normal random variable is **within** a specified interval. For instance, suppose we want to know the probability that the speed of a randomly selected vehicle is between 60 and 73 mph.

We could compute the probability that the speed is less than 73 mph and the probability that the speed is less than 60 mph and subtract the two. In other words:

\(P (60 < X < 73) = P(X<73) - P(X<60)\)

Or, we could use statistical software to find this range:

**In Minitab Express:**

- Open Minitab Express without any data.
- From the menu bar, select *Statistics > Probability Distributions > Distribution Plot*
- Click *Display Probability*
- For *Distribution*, select *Normal* (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
- Under *Shade the area corresponding to the following:* select *A specified x value* and *Middle*. The *X value 1* is 60 and *X value 2* is 73.

The result is the following output which shows us that 0.786545 of the distribution is between 60 mph and 73 mph.

**In Minitab:**

- Open Minitab without any data.
- From the menu bar, select *Graph > Probability Distribution Plots*
- Click *View Probability*
- For *Distribution*, select *Normal* (this is the default). In this scenario, our mean is 65 and our standard deviation is 5.
- Under the *Shaded Area* tab, select *X Value* and *Middle*. The *X Value 1* is 60 and *X Value 2* is 73.

The result is the following output which shows us that 0.7865 of the distribution is between 60 mph and 73 mph.
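The subtraction rule \(P(60 < X < 73) = P(X<73) - P(X<60)\) can also be checked in a few lines. This is a sketch in Python (an assumption of this illustration, not part of the Minitab workflow) using the standard library's `statistics.NormalDist`; the variable names are made up for the example.

```python
from statistics import NormalDist

# Vehicle speeds: mean 65 mph, standard deviation 5 mph (from the example).
speeds = NormalDist(mu=65, sigma=5)

p_below_73 = speeds.cdf(73)  # P(X < 73)
p_below_60 = speeds.cdf(60)  # P(X < 60)

# P(60 < X < 73) = P(X < 73) - P(X < 60)
p_between = p_below_73 - p_below_60

print(round(p_between, 4))  # 0.7865
```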

**Percentile:** Proportion of values below a given value

For example, if your test score is in the 88th percentile, then you scored better than 88% of test takers.

We may wish to know the value of a variable that is a specified percentile. For example, what speed is the 99.99th percentile of speeds at the highway location in our earlier example? Recall, the mean vehicle speed is 65 mph with a standard deviation of 5 mph.
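Finding a percentile is the reverse of finding a cumulative probability: we supply the proportion and ask for the value. As a rough cross-check on the software steps, here is a minimal Python sketch (Python and the variable names are assumptions of this illustration) that uses `inv_cdf` from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Vehicle speeds: mean 65 mph, standard deviation 5 mph (from the example).
speeds = NormalDist(mu=65, sigma=5)

# inv_cdf is the inverse cumulative probability: it returns the speed x
# such that P(X <= x) equals the requested proportion.
speed_9999 = speeds.inv_cdf(0.9999)

print(round(speed_9999, 1))  # 83.6
```

In other words, only about 1 in 10,000 vehicles at this location would be expected to travel faster than roughly 83.6 mph.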

To calculate percentiles in Minitab:

- Open Minitab without data
- From the menu bar select *Calc > Probability Distribution > Normal*
- Select *Inverse Cumulative Probability*
- Our mean is 65 and our standard deviation is 5
- Select the radio button for *Input Constant* and enter .9999
- Click *OK*
- The output is as follows:


To calculate percentiles in Minitab Express:

- Open Minitab Express without data
- On a PC: From the menu bar, select *Statistics > Probability Distributions > CDF/PDF > Inverse (ICDF)*
- *Form of input* is *A single value*
- *Value* is .9999
- *Distribution* is *Normal*
- Our mean is 65 and our standard deviation is 5
- Under *Output*, select *Display a table of inverse cumulative probabilities*
- Click *OK*

On a MAC: From the menu bar, select

The result should be the following output:


When finding the probability associated with a score on a normal distribution it may be necessary to first convert the observation to a z score in order to use the z table to find a probability. Recall from Lesson 2 the formula for computing the z-score for an individual observation:

**z Score**

\[z=\frac{x - \overline{x}}{s}\]

*z* = z score

*x* = original individual score

\(\overline{x}\) = mean of the original distribution

*s* = standard deviation of the original distribution

This formula can also be written using population parameters: \(z=\frac{x-\mu}{\sigma}\)

We will be using Table A in Appendix A of the Agresti, Franklin, and Klingenberg textbook. Table A in the textbook gives normal curve cumulative probabilities for standardized scores. This is also known as a z table. Row labels of Table A give possible *z*-scores up to one decimal place. The column labels give the second decimal place of the *z*-score. The cumulative probability for a value equals the cumulative probability for that value's *z*-score.

The examples on the following pages will walk you through finding the probability less than, greater than, or between two values on the normal distribution. Remember, when finding the probability associated with an observation that is on a scale other than the standard normal distribution (i.e., \(\mu=0\) and \(\sigma=1\)), you must first translate the score to a z score before using the table.

Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

What is the probability that a randomly selected car is going 73 mph or less?

It's often helpful to begin by sketching a normal distribution and shading in the appropriate region. From the graph below we can see that more than half of the curve is shaded in; this means that our final result should be greater than .50.

Let’s use the z table to determine the proportion of the curve under 73 mph.

First, we need to compute the z score for this speed: \(z=\frac{73-65}{5}=1.60\)

Now we can use the z table to determine the proportion of the curve that is less than a z score of 1.6 by looking up 1.60. We look in the 1.6 row and the .00 column (1.6 plus .00 equals 1.60). The cumulative probability for z=1.60 is .9452, the same value that we got previously when using Minitab Express. There is a 94.52% chance of randomly selecting a vehicle that is going 73 mph or less.
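The table lookup works because the cumulative probability for a value equals the cumulative probability for that value's z score. A minimal Python sketch of this idea follows; Python and the variable names are assumptions of this illustration, and `NormalDist()` with no arguments stands in for the z table.

```python
from statistics import NormalDist

mean, sd = 65, 5   # vehicle speed distribution from the example
x = 73

z = (x - mean) / sd             # z score: (73 - 65) / 5
standard_normal = NormalDist()  # mu = 0, sigma = 1 (the z distribution)

p_from_z = standard_normal.cdf(z)               # what the z table reports
p_from_x = NormalDist(mu=mean, sigma=sd).cdf(x) # same area on the mph scale

print(z)                   # 1.6
print(round(p_from_z, 4))  # 0.9452
print(round(p_from_x, 4))  # 0.9452
```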

Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

What is the probability that a randomly selected car is going 60 mph or less?

For speed = 60 the z-score is: \(z=\frac{60-65}{5}=-1.00\)

We look up -1.00 on the z table below and find a cumulative probability of .1587. There is a 15.87% chance of randomly selecting a vehicle that is going 60 mph or less.

Table A.1 gives this information:

IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. What IQ score separates the bottom 30% from the top 70%?

This is also known as the 30th percentile.

First we must look up the z score that separates the bottom 30% from the top 70% of the distribution:

The z value that separates the bottom 30% from the top 70% is approximately -0.52.

We can translate this z score into an IQ score given \(\mu=100\) and \(\sigma=15\)

\(IQ=15(-0.52)+100=92.2\)

The IQ score that separates the bottom 30% from the top 70% is 92.2
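Going from a percentile back to a score uses the z score formula solved for x, that is, \(x=\mu+\sigma z\). The sketch below, written in Python as an assumption of this illustration, compares the z value read from the table with the unrounded value returned by `statistics.NormalDist().inv_cdf`.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # IQ distribution from the example

# Unrounded z score with 30% of the standard normal curve below it.
z_exact = NormalDist().inv_cdf(0.30)

iq_exact = mu + sigma * z_exact  # x = mu + sigma * z
iq_table = mu + sigma * (-0.52)  # z rounded to two decimals, as read from the table

print(round(z_exact, 4))   # -0.5244
print(round(iq_exact, 1))  # 92.1
print(round(iq_table, 1))  # 92.2
```

The small difference between 92.1 and 92.2 comes only from rounding z to two decimal places when using the table.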

Scores on the SAT-M are normally distributed with a mean of 500 and standard deviation of 100. What SAT-M score is needed to be in the top 5% of the population?

First we will look up the z score that separates the top 5% from the bottom 95%:

The z score that separates the top 5% from the bottom 95% is approximately +1.64. We can find the SAT-M score that corresponds to this z score.

\(SAT-M=100(1.64)+500=664\)

An SAT-M score of 664 is needed to be in the top 5% of the population.
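The key step here is recognizing that the cutoff for the top 5% is the 95th percentile. A short Python sketch of that conversion is shown below; Python and the names `top_share` and `cutoff` are assumptions made for this illustration.

```python
from statistics import NormalDist

# SAT-M scores: mean 500, standard deviation 100 (from the example).
sat_m = NormalDist(mu=500, sigma=100)

top_share = 0.05
percentile = 1 - top_share   # the top 5% begins at the 95th percentile

cutoff = sat_m.inv_cdf(percentile)

print(round(cutoff))  # 664
```

The unrounded cutoff is slightly above 664 because the software uses a z score near 1.645 where the table value is rounded to 1.64.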

Suppose pulse rates of adult females are normally distributed with a mean of 75 and a standard deviation of 8. What is the probability that a randomly selected female has a pulse rate **greater than 85**?

If we use Table A.1, the first step is to calculate the z-score associated with a pulse rate of 85: \(z=\frac{85-75}{8}=1.25\).

Given that z=1.25, we can use the z-table to determine the cumulative probability:

The cumulative probability for z = 1.25 is .8944. This is the proportion below a pulse rate of 85, but we want to know the proportion above a pulse rate of 85.

\(P(X>85) = 1 - P(X<85) = 1 - .8944 = .1056\)

The probability that a randomly selected female will have a pulse rate above 85 is .1056, or 10.56%.
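One way to build intuition for what this probability means is to simulate many pulse rates and count how often they exceed 85. The following Python sketch is an illustration only: the simulation approach, the sample size, and the seed are assumptions added here, not part of the lesson's method. It uses `statistics.NormalDist.samples` from the standard library.

```python
from statistics import NormalDist

# Pulse rates: mean 75, standard deviation 8 (from the example).
pulse = NormalDist(mu=75, sigma=8)

# Exact right-tail probability from the complement rule.
exact = 1 - pulse.cdf(85)

# Simulated pulse rates; the seed is arbitrary and only makes the run repeatable.
draws = pulse.samples(100_000, seed=1)
simulated = sum(rate > 85 for rate in draws) / len(draws)

print(round(exact, 4))  # 0.1056
print(simulated)        # close to the exact value, but not identical
```

The simulated proportion approaches the exact probability as the number of draws increases.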

We know IQ scores have a mean of 100 and standard deviation of 15. What proportion of IQ scores fall between 100 and 130?

First we must compute the z score associated with each of these IQ scores.

For an IQ of 100, \(z=\frac{100-100}{15}=0\)

For an IQ of 130, \(z=\frac{130-100}{15}=2.00\)

We are looking for the proportion of observations that fall between a z score of 0 and a z score of 2.00.

Using the z table above, \(P(z<0.00)=.5000\) and \(P(z<2.00)=.9772\)

\(P(0.00<z<2.00)=P(z<2.00)-P(z<0.00)=.9772-.5000=.4772\)

The proportion of IQ scores between 100 and 130 is .4772, or 47.72%.
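The two-step approach used here (standardize both endpoints, then subtract cumulative probabilities) can be sketched in Python. Python and the variable names are assumptions of this illustration; `NormalDist()` with no arguments plays the role of the z table.

```python
from statistics import NormalDist

mu, sigma = 100, 15    # IQ distribution from the example
low, high = 100, 130

z_low = (low - mu) / sigma    # 0.0
z_high = (high - mu) / sigma  # 2.0

z_dist = NormalDist()  # standard normal: mu = 0, sigma = 1
proportion = z_dist.cdf(z_high) - z_dist.cdf(z_low)

print(round(proportion, 4))  # 0.4772
```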

This video walks through one example. A group of instructors have decided to assign grades on a curve. Given the mean and standard deviation of their students' scores, they want to know what point ranges are associated with which grades. Minitab Express is used.

Practice finding the proportion of observations under the normal curve. Each question can be answered using either Minitab Express or the z table. Work through each example then click the icon to view the solution and compare your answers.

**HINT**: Drawing the normal curve and shading in the region you are looking for is often helpful.

1. What proportion of the standard normal curve is less than a z score of 1.64?

2. What proportion of the standard normal curve falls above a z score of 1.33?

3. What proportion of the standard normal curve falls between a z score of -.50 and a z score of +.50?

4. At one private school, a minimum IQ score of 125 is necessary to be considered for admission. IQ scores have a mean of 100 and standard deviation of 15. Given this information, what proportion of children are eligible for consideration for admission to this school?

5. ACT scores have a mean of 18 and a standard deviation of 6. What proportion of test takers score between 20 and 26?

6. A men’s clothing company is doing research on the height of adult American men in order to inform the sizing of the clothing that they offer. The height of males in the United States is normally distributed with a mean of 175 cm and a standard deviation of 15 cm. Men who are more than 30 cm different (shorter or taller) from the mean are classified by the apparel company as special cases because they do not fit in their regular length clothing. Given this information, what proportion of men would be classified as special cases?

In this lesson we examined a number of probability distributions including discrete, binomial, and normal. The next lesson will continue to explore probability distributions with an emphasis on the distribution of sample means. It will also introduce a new distribution that is similar in shape to the normal distribution: the t distribution.

Take a moment to review what you learned in this lesson before continuing to the next.

**Lesson 5 Learning Objectives**

Upon completion of this lesson, you will be able to:

- distinguish between discrete and continuous random variables.
- find probabilities associated with a discrete probability distribution.
- compute the mean and standard deviation of a discrete probability distribution.
- find probabilities associated with a binomial distribution.
- find probabilities associated with a normal probability distribution (i.e., z distribution) using Minitab Express and the standard normal table.