In this lesson we will begin to explore the concept of statistical inference. We will look at both discrete and continuous probability distributions. The concepts of standard error and the Central Limit Theorem will be introduced which will serve as the base for the remaining lessons in this course.

**Lesson 5 Learning Objectives**

Upon completion of this lesson, you will be able to:

- distinguish between discrete and continuous random variables.
- find probabilities associated with a discrete probability distribution.
- compute the mean and standard deviation of a discrete probability distribution.
- find probabilities associated with a binomial distribution.
- find probabilities associated with a normal probability distribution (i.e., z distribution) using Minitab Express and the standard normal table.

Before we begin new content, we should review a few terms from previous lessons that we will see again in this lesson:

**Discrete**: Data that can only take on a set number of values

**Continuous**: Quantitative data that can take on any value between the minimum and maximum, and any value between two other values

**Probability**: The likelihood of an event occurring; \(P(A)=\frac{number \: of \:events \:considered\: outcome \:A}{number \:of\: total \:events}\)

**\(P(A\;\cap\;B)\)**: Intersection of A and B; "probability of A and B"

**\(P(A\;\cup\;B)\)**: Union of A and B; "probability of A or B" (this also includes the probability of A and B)

**Mean**: The numerical average; calculated as the sum of all of the data values divided by the number of values; represented as \(\overline{X}\).

**Standard deviation**: Roughly the average difference between individual data and the mean; for a sample, represented as s, \(s=\sqrt{\frac{\sum (x-\overline{x})^{2}}{n-1}}\)

**Sample**: A subset of the population from which data is actually collected

**Population**: The entire set of possible observations in which we are interested

**Statistic**: A measure concerning a sample (e.g., sample mean)

**Parameter**: A measure concerning a population (e.g., population mean)

**Descriptive statistics**: Methods for summarizing data (e.g., mean, median, mode, range, variance, graphs)

**Inferential statistics**: Methods for using sample data to make conclusions about a population

**z score**: Distance between an individual score and the mean in standard deviation units; also known as a standardized score.

**Empirical Rule**: For bell-shaped distributions, about 68% of the data will be within one standard deviation of the mean, about 95% will be within two standard deviations of the mean, and about 99.7% will be within three standard deviations of the mean

The word “random” is used often in everyday life. For example, you may hear someone say “We randomly decided to go out for dinner last night.” But is this really a random event? No, this is a conscious decision that was made on the basis of other variables such as hunger and the lack of satisfaction with other options such as cooking one’s own dinner.

In statistics, the word random has a different meaning. Something is random when it varies by chance. For example, when rolling a six-sided die there are six equally likely outcomes, and the observed outcome on any one roll is random. The variation of a random event such as rolling a die can be described by the probability distributions that we will see in this lesson.

**Random variable**: a numerical characteristic that takes on different values due to chance

**Coin Flips**

The number of heads in four flips of a coin (a numerical property of each different sequence of flips) is a **random variable** because the results will vary between trials.

**Heights**

Samples of 100 are repeatedly pulled from the population of all Penn State students and their heights are measured. The mean height of samples of 100 Penn State students is a **random variable** because the statistic will vary between samples. While most sample means will be similar to the population mean, they will not all equal the population mean due to random sampling variation.

Random variables are classified into two broad types: discrete and continuous. A **discrete random variable** has a countable set of distinct possible values. A **continuous random variable** is such that any value (to any number of decimal places) within some interval is a possible value.

**Discrete Random Variables:**

- Number of heads in 4 flips of a coin (possible outcomes are 0, 1, 2, 3, 4)
- Number of classes missed last week (possible outcomes are 0, 1, 2, 3, ..., up to the maximum number of classes)
- Amount won or lost when betting $1 on the Pennsylvania Daily number lottery

**Continuous Random Variables:**

- Heights of individuals
- Time to finish a test
- Hours spent exercising last week

**Note**: In practice, we don't measure accurately enough to truly see all possible values of a continuous random variable. For instance, in reality somebody may have exercised 4.2341567 hours last week but they probably would round off to 4. Nevertheless, hours of exercise last week is inherently a continuous random variable.

**Probability distribution**: A table, graph, or formula that gives the probability of a given outcome's occurrence

For a discrete random variable, its **probability distribution** (also called the **probability distribution function**) is any table, graph, or formula that gives each possible value and the probability of that value.

**Note**: The total of all probabilities across the distribution must be 1, and each individual probability must be between 0 and 1, inclusive.

What if we flipped a fair coin four times? What are the possible outcomes and what is the probability of each?

Figure 1 below is a probability distribution for the number of heads in 4 flips of a coin. Given that P(Heads)=.50, the probability of not flipping heads at all is 1/16, or .0625. In 6.25% of all trials, we can expect that there will be no heads. This may be written as P(X=0)=.0625. Similarly, the probability of flipping heads once in four trials is 4/16, or .25. In 25% of all trials, we can expect that heads will be flipped exactly once. This may be written as P(X=1)=.25.

This probability distribution could be constructed by listing all 16 possible sequences of heads and tails for four flips (i.e., HHHH, HTHH, HTTH, HTTT, etc.), and then counting how many sequences there are for each possible number of heads. Or, in section 5.4 you will see how these could be computed using binomial random variable techniques.

Heads | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | 1/16 | 4/16 | 6/16 | 4/16 | 1/16 |
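The listing approach described above can be sketched in a few lines of Python. This is only an illustration using the standard library; the variable names (`sequences`, `counts`, `distribution`) are our own and not part of the lesson.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**4 = 16 equally likely sequences of heads (H) and tails (T).
sequences = list(product("HT", repeat=4))

# Count how many sequences contain each possible number of heads.
counts = Counter(seq.count("H") for seq in sequences)

# Each sequence has probability 1/16, so P(X = k) = (number of sequences with k heads) / 16.
distribution = {k: Fraction(counts[k], len(sequences)) for k in sorted(counts)}

for heads, prob in distribution.items():
    # Fraction reduces automatically, so 4/16 prints as 1/4 and 6/16 as 3/8.
    print(heads, prob, float(prob))
```

The counts come out as 1, 4, 6, 4, and 1 sequences for 0 through 4 heads, matching the table.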

A census was conducted at a university. All students were asked how many tattoos they had.

Figure 2 presents a probability distribution for the discrete variable of number of tattoos for each student. From this table we can find that 85% of students in the population do not have a tattoo, 12% of students in the population have one tattoo, 1.5% of students in the population have two tattoos, and so on. This could be written as P(X=0)=.85, P(X=1)=.12, P(X=2)=.015, etc.

Tattoos | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | .850 | .120 | .015 | .010 | .005 |

**Cumulative probability**: Likelihood of an outcome less than or equal to a given value occurring

To find a **cumulative probability** we add the probabilities for all values qualifying as "less than or equal" to the specified value.

Suppose we want to know the probability that the number of heads in four flips is less than two. Let X represent the number of heads we get on four flips of a coin.

Because this is a discrete distribution, the probability of flipping fewer than two heads is equal to the probability of flipping zero heads or one head:

\(P(X<2)=P(X=0\;\cup\;X=1)\)

Flipping exactly 0 heads and flipping exactly 1 head are mutually exclusive events. Thus, \(P(X=0\;\cup\;X=1)=P(X=0)+P(X=1)\)

We can use the values from Figure 1 above to solve this equation.

\(P(X=0)+P(X=1)=(1/16)+(4/16)=5/16 \)

**Cumulative distribution**: A listing of all possible values along with the probability of that value and all lower values occurring (i.e., the **cumulative probability**)

Cumulative probabilities are found by adding the probability up to each column of the table. In Figure 3 we find the cumulative probability for one head by adding the probabilities for zero and one. The cumulative probability for two heads is found by adding the probabilities for zero, one, and two. We continue with this procedure until we reach the maximum number of heads, in this case four, which should have a cumulative probability of 1.00 because 100% of trials must have four or fewer heads.

Heads | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | 1/16 | 4/16 | 6/16 | 4/16 | 1/16 |
Cumulative Probability | 1/16 | 5/16 | 11/16 | 15/16 | 1 |

Let's construct a cumulative distribution for the data concerning number of tattoos.

Tattoos | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | .850 | .120 | .015 | .010 | .005 |
Cumulative Probability | .850 | .970 | .985 | .995 | 1 |

Note that the cumulative probability for the last column is always 1. That is, 100% of trials will be less than or equal to the maximum value.
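A running total like this is easy to check with a short script. The sketch below uses Python's `itertools.accumulate` on the tattoo probabilities; the list names are our own, and rounding to three decimals is only there to hide floating-point noise.

```python
from itertools import accumulate

tattoos = [0, 1, 2, 3, 4]
probabilities = [0.850, 0.120, 0.015, 0.010, 0.005]

# accumulate() yields the running sums: P(X <= 0), P(X <= 1), ..., P(X <= 4).
cumulative = [round(total, 3) for total in accumulate(probabilities)]

for value, prob, cum in zip(tattoos, probabilities, cumulative):
    print(value, prob, cum)
```

The last running total is 1, as it must be for any valid probability distribution.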

**Law of Large Numbers**: Given a large number of repeated trials, the average of the results will be approximately equal to the expected value

**Expected value**: The mean value in the long run for many repeated samples, symbolized as \(E(X)\)

**Expected Value for a Discrete Random Variable**

\[E(X)=\sum x_i p_i\]

\(x_i\) = value of the \(i^{th}\) outcome

\(p_i\) = probability of the \(i^{th}\) outcome

According to this formula, we take each possible X value and multiply it by its respective probability. We then add these products to reach our expected value. You may have seen this before referred to as a **weighted average**. It is known as a weighted average because it takes into account the probability of each outcome and weights it accordingly. This is in contrast to an unweighted average, which would not take into account the probability of each outcome and would weight each possibility equally.

Let's look at a few examples of expected values for a discrete random variable:

A fair six-sided die is tossed. You win \$2 if the result is a “1,” you win \$1 if the result is a “6,” but otherwise you lose \$1.

X | +\$2 | +\$1 | -\$1 |
---|---|---|---|
Probability | 1/6 | 1/6 | 4/6 |

\( E(X)= \$2(\frac {1}{6})+\$1 (\frac {1}{6})+(-\$1)(\frac {4}{6})=\$\frac{-1}{6}= -\$ 0.17 \)

The interpretation is that if you play many times, the average outcome is losing 17 cents per play. Thus, over time you should expect to lose money.
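The "many plays" interpretation can be illustrated with a small simulation, in the spirit of the Law of Large Numbers. This is a sketch only: the helper `play_once`, the seed value, and the number of plays are arbitrary choices of ours.

```python
import random

def play_once(rng):
    """Return the winnings (in dollars) from one roll of a fair six-sided die."""
    roll = rng.randint(1, 6)
    if roll == 1:
        return 2   # win $2 on a "1"
    if roll == 6:
        return 1   # win $1 on a "6"
    return -1      # lose $1 otherwise

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
n_plays = 100_000
average = sum(play_once(rng) for _ in range(n_plays)) / n_plays

# With this many plays the average should land close to -1/6 (about -0.17).
print(average)
```

A different seed gives a slightly different average, but with more plays the result tends to settle closer to the expected value.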

Using the probability distribution for number of tattoos, let's find the mean number of tattoos per student.

Tattoos | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | .850 | .120 | .015 | .010 | .005 |

\( E(X)=0 (.85)+1(.12)+ 2(.015) +3 (.010) +4(.005) =.20 \)

The mean number of tattoos per student is .20.
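Both expected values above can be reproduced with a one-line weighted sum. The function name `expected_value` below is our own; exact fractions are used for the dice game so that the result is exactly -1/6.

```python
from fractions import Fraction

def expected_value(values, probabilities):
    """Weighted average: multiply each value by its probability, then add the products."""
    return sum(x * p for x, p in zip(values, probabilities))

# Dice game: win $2, win $1, or lose $1.
dice_game = expected_value(
    [2, 1, -1],
    [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)],
)

# Number of tattoos per student.
tattoo_mean = expected_value(
    [0, 1, 2, 3, 4],
    [0.850, 0.120, 0.015, 0.010, 0.005],
)

print(dice_game, round(float(dice_game), 2))  # -1/6 -0.17
print(round(tattoo_mean, 2))                  # 0.2
```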

Recall from Lesson 3, in a sample, the mean is symbolized by \(\overline{x}\) and the standard deviation by \(s\). Because the probabilities that we are working with here are computed using the population, they are symbolized using lower case Greek letters. The population mean is symbolized by \(\mu\) (lower case "mu") and the population standard deviation by \(\sigma \) (lower case "sigma").

Measure | Sample Statistic | Population Parameter |
---|---|---|
Mean | \(\overline{x}\) | \(\mu\) |
Variance | \(s^{2}\) | \(\sigma ^{2}\) |
Standard Deviation | \(s\) | \(\sigma \) |

Also recall that the standard deviation is equal to the square root of the variance. Thus, \(\sigma=\sqrt{(\sigma ^{2})}\)

Knowing the expected value is not the only important characteristic one may want to know about a set of discrete numbers: one may also need to know the spread, or variability, of these data. For instance, you may "expect" to win \$20 when playing a particular game (which appears good!), but the spread for this might be from losing \$20 to winning \$60. Knowing such information can influence your decision on whether to play.

To calculate the standard deviation we first must calculate the variance. We then take the square root of the variance to obtain the standard deviation. Conceptually, the variance of a discrete random variable is the sum of the squared differences between each value and the mean, each multiplied by the probability of obtaining that value, as seen in the conceptual formulas below:

**Conceptual Formulas**

**Variance for a Discrete Random Variable**

\( \sigma ^2= \sum [(x_i-\mu)^2 p_i] \)

**Standard Deviation for a Discrete Random Variable**

\( \sigma = \sqrt {\sum [(x_i-\mu)^2 p_i]}\)

\(x_i\) = value of the \(i^{th}\) outcome

\(\mu= E(X)=\sum x_i p_i\)

\(p_i\) = probability of the \(i^{th}\) outcome

In these expressions we substitute our result for E(X) for \(\mu\) because \(\mu\) is the symbol used to represent the mean of a population.

However, there is an **easier** computational formula. The computational formula will give you the same result as the conceptual formula above, but the calculations are simpler.

**Computational Formulas**

**Variance for a Discrete Random Variable**

\( \sigma ^2= [\sum (x_i^2 p_i )]-\mu ^2\)

**Standard Deviation for a Discrete Random Variable**

\( \sigma = \sqrt {[\sum (x_i^2 p_i)] -\mu ^2}\)

\(x_i\) = value of the \(i^{th}\) outcome

\(\mu= E(X)=\sum x_i p_i\)

\(p_i\) = probability of the \(i^{th}\) outcome

Notice in the summation part of this equation that we only square each observed X value and not the respective probability. Also note that the \(\mu\) is outside of the summation.

Going back to the first example used above for expectation involving the dice game, we would calculate the standard deviation for this discrete distribution by first calculating the variance:

X | +\$2 | +\$1 | -\$1 |
---|---|---|---|
Probability | 1/6 | 1/6 | 4/6 |

\( \sigma ^2= [\sum x_i^2 p_i ]-\mu ^2 = [2^2 (\frac{1}{6})+1^2 (\frac{1}{6})+(-1)^2 (\frac{4}{6})]-(- \frac{1}{6})^2\)

\(=[ \frac{4}{6}+\frac {1}{6}+ \frac{4}{6}]-\frac{1}{36} = \frac{53}{36}=1.472 \)

The variance of this discrete random variable is 1.472.

\(\sigma=\sqrt{(\sigma ^{2})}\)

\(\sigma=\sqrt{1.472}=1.213\)

The standard deviation of this discrete random variable is 1.213.
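To see that the conceptual and computational formulas agree, the dice game calculation can be repeated both ways. The sketch below uses exact fractions from Python's standard library; all variable names are our own.

```python
from fractions import Fraction
from math import sqrt

values = [2, 1, -1]
probs = [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)]

# Mean (expected value): sum of x_i * p_i.
mu = sum(x * p for x, p in zip(values, probs))

# Conceptual formula: sum of (x_i - mu)^2 * p_i.
var_conceptual = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Computational formula: [sum of x_i^2 * p_i] - mu^2 (mu^2 is outside the summation).
var_computational = sum(x**2 * p for x, p in zip(values, probs)) - mu**2

sigma = sqrt(var_computational)

print(mu)                                 # -1/6
print(var_conceptual, var_computational)  # 53/36 53/36
print(round(sigma, 3))                    # 1.213
```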

This video walks through one example of a discrete random variable. It includes the construction of a cumulative probability distribution and the calculation of the mean and standard deviation.

**Binomial random variable**: A specific type of discrete random variable that counts how often a particular event occurs in a fixed number of tries or trials

For a variable to be a **binomial random variable**, **ALL** of the following conditions must be met:

- There are a fixed number of trials (a fixed sample size)
- On each trial, the event of interest either occurs or does not
- The probability of occurrence (or not) is the same on each trial
- Trials are independent of one another

**Examples of Binomial Random Variables:**

- Number of correct guesses at 30 true-false questions when you randomly guess all answers
- Number of winning lottery tickets when you buy 10 tickets of the same kind
- Number of left-handers in a randomly selected sample of 100 unrelated people
- Number of tails when flipping a coin 10 times

**Notation**

n= number of trials

p= probability event of interest occurs on any one trial

Number of correct guesses at 30 true-false questions when you randomly guess all answers

There are 30 trials, therefore n = 30

There are two possible outcomes (true and false) that are equally probable, therefore p = 1/2 = .5

The conditions for being a binomial variable lead to a somewhat complicated formula for finding the probability any specific value occurs (such as the probability you get 20 right when you guess at 30 True-False questions).

We'll use Minitab Express to find probabilities for binomial random variables. However, for those of you who are curious, the by-hand formula for the probability of getting a specific outcome in a binomial experiment is:

**Binomial Random Variable Probability**

\[P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}\]

n = number of trials

x = number of successes

p = probability event of interest occurs on any one trial

! is the symbol for factorial. For a review of factorials, see the course algebra review page.

One can use the formula to find the probability or alternatively, use Minitab Express to find the probability. In the homework, you may use the method that you are more comfortable with unless specified otherwise.

Minitab Express can be used to construct binomial distributions and compute the probability of \(P(X=k)\).

When flipping a fair coin the probability of flipping heads is 0.50. Let's say that we are going to be flipping a fair coin 10 times and we want to know the probability of flipping heads exactly five times.

To compute a binomial probability in Minitab Express:

- On a **PC**: Select **STATISTICS > Distribution Plot**; on a **Mac**: Select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Binomial*
- For *Number of trials* enter 10
- For *Event probability* enter 0.5
- Select *A specified X value*
- Select *Middle*
- For both *X value 1* and *X value 2* enter 5

The output is a binomial distribution plot. The area that is shaded in red is the proportion of the distribution where X=5.

\(P(X=5)=0.246094\)

You can also use Minitab Express to find the probability of a range of outcomes. Let's say that we are going to be flipping a fair coin 10 times and we want to know the probability of flipping heads **five or more** times.

To compute a binomial probability for a range of outcomes in Minitab Express:

- On a **PC**: Select **STATISTICS > Distribution Plot**; on a **Mac**: Select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Binomial*
- For *Number of trials* enter 10
- For *Event probability* enter 0.5
- Select *A specified X value*
- Select *Right tail*
- For *X value* enter 5

The output is a binomial distribution plot displaying \(P(X \geq 5)=0.623047\)
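For readers without Minitab Express, both coin-flip probabilities can be cross-checked directly from the binomial formula. This is a minimal sketch using Python's `math.comb`; the helper name `binomial_pmf` is our own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n trials with event probability p: n!/(x!(n-x)!) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5

# Exactly five heads in ten flips.
p_exactly_5 = binomial_pmf(5, n, p)

# Five or more heads: add the probabilities for X = 5, 6, ..., 10.
p_at_least_5 = sum(binomial_pmf(k, n, p) for k in range(5, n + 1))

print(round(p_exactly_5, 6))   # 0.246094
print(round(p_at_least_5, 6))  # 0.623047
```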

In the following example, we illustrate how to use the formula to compute binomial probabilities by hand. If you don't like to use the formula, you can also use Minitab Express to find the probabilities.

**Red Flowers**

Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring. Find the probability that there will be no red flowered plants in the five offspring.

*X* = # of red flowered plants in the five offspring.

The number of red flowered plants has a binomial distribution with n = 5, p = .25

\(P(X=0)=\frac{5!}{0!(5-0)!} .25 ^0 (1- .25)^5 =1 \times .25^0 \times .75^5 =.237\)

There is a 23.7% chance that none of the five plants will be red flowered.

**Cumulative probability**: Likelihood that a certain number of successes or fewer will occur.

Binomial random variable probabilities are mutually exclusive, therefore we can use the addition rule that we learned in Lesson 4.

Continuing with the red flowers example, what if we wanted to know the probability that there would be one or fewer red flowered plants?

\begin{align}
P(X \leq 1)&=P(X=0)+P(X=1)\\
&= \frac{5!}{0!(5-0)!} .25^0 (1-.25)^5+\frac{5!}{1!(5-1)!} .25^1 (1-.25)^4\\
& = .237 +.396=.633
\end{align}

There is a 63.3% chance that one or fewer of the five plants will be red flowered.

In the red flowers example, we first computed P(X = x) and then P(X ≤ x). This latter expression is called finding a **cumulative probability** because you are finding the probability that has accumulated from the minimum to some point, i.e., from 0 to 1 in this example.

**To use Minitab Express to solve a cumulative probability binomial problem**, return to Statistics > Probability Distributions> CDF/PDF > Cumulative Distribution Function (CDF). For Value enter 1. For distribution select the binomial. There are 5 trials and the event probability is .25

**To use Minitab to solve a cumulative probability binomial problem**, return to Calc > Probability Distributions > Binomial as shown above. Now however, select the radio button for Cumulative Probability. For Number of Trials enter 5 and the event probability is .25. Click the radio button for Input Constant and enter the x value of 1.
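The same cumulative probability can also be sketched in code by adding the individual binomial probabilities. The helper names `binomial_pmf` and `binomial_cdf` are our own, not Minitab functions.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): add the probabilities for 0, 1, ..., x successes."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Red flowers: n = 5 offspring, p = .25 chance each is red.
p_none = binomial_pmf(0, 5, 0.25)
p_one_or_fewer = binomial_cdf(1, 5, 0.25)

print(round(p_none, 4))          # 0.2373
print(round(p_one_or_fewer, 4))  # 0.6328
```

Carrying the unrounded terms through the sum gives 0.6328125; rounding each term before adding can shift the last reported digit.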

The formula given earlier for discrete random variables could be used, but the good news is that for binomial random variables a shortcut formula for expected value (the mean) and standard deviation can also be used.

**Binomial Random Variable Formulas**

\[\mu=np\]

\[\sigma=\sqrt {np(1-p)}\]

n = number of trials

p = probability event of interest occurs on any one trial

After you use this formula a couple of times, you'll realize this formula matches your intuition. For instance, the “expected” number of correct (random) guesses at 30 True-False questions is *np* = (30)(.5) = 15 (half of the questions). For a fair six-sided die rolled 60 times, the expected value of the number of times a “1” is tossed is *np* = (60)(1/6) = 10.

The standard deviations for these would be, for the True-False test, \(\sigma=\sqrt{30 (0.5) (1-0.5)}=\sqrt{7.5}=2.74\), and for the die, \(\sigma=\sqrt{60 \left( \frac{1}{6}\right) \left(1-\frac {1}{6}\right)}=\sqrt{ \frac{50}{6}}=2.89\).
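The two shortcut formulas translate directly into code. The function `binomial_mean_sd` below is a hypothetical helper of ours, shown only to reproduce the True-False and die results.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Return (mu, sigma) for a binomial random variable: mu = np, sigma = sqrt(np(1-p))."""
    return n * p, sqrt(n * p * (1 - p))

tf_mean, tf_sd = binomial_mean_sd(30, 0.5)      # 30 True-False guesses
die_mean, die_sd = binomial_mean_sd(60, 1 / 6)  # number of "1"s in 60 rolls

print(round(tf_mean, 2), round(tf_sd, 2))    # 15.0 2.74
print(round(die_mean, 2), round(die_sd, 2))  # 10.0 2.89
```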

**Roulette**

A roulette wheel has 38 slots: 18 are red, 18 are black, and 2 are green. You play five games and always bet on red.

**How many games can you expect to win?**

Recall, you play five games and always bet on red. \(n=5\) and \(p=\frac{red \;slots}{total \;slots}=\frac{18}{38}\)

\(\mu=np=5 \left( \frac{18}{38}\right)=2.3684\)

\( \sigma=\sqrt{np(1-p)}=\sqrt{5\left(\frac{18}{38} \right) \left(1-\frac{18}{38}\right)}=1.1165\)

Out of 5 games, you can expect to win 2.3684 (with a standard deviation of 1.1165).

**What is the probability that you will win all five games?**

\(P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}\)

\(P(X=5)= \frac {5!}{5!(5-5)!}\left( \frac{18}{38} \right)^5 \left(1-\frac{18}{38}\right)^{5-5}\)

\(P(X=5)=\frac{5!}{5!0!} \left(.4737^{5}\right) .5263^{0} = 1(.0238)(1)=.0238\)

There is a 2.38% chance that you will win all five out of five games.

If you win three or more games, you make a profit. If you win two or fewer games, you lose money. **What is the probability that you will win no more than two games?**

\(P(X\leq 2)=P(X=0)+P(X=1)+P(X=2)\)

\(P(X=0)=\frac {5!}{0!(5-0)!} \left ( \frac{18}{38} \right )^0\left(1-\frac{18}{38}\right)^{5-0}=.0404\)

\(P(X=1)=\frac {5!}{1!(5-1)!} \left ( \frac{18}{38} \right )^1\left(1-\frac{18}{38}\right)^{5-1}=.1817\)

\(P(X=2)=\frac {5!}{2!(5-2)!} \left ( \frac{18}{38} \right )^2\left(1-\frac{18}{38}\right)^{5-2}=.3271\)

\(P(X\leq 2)=.0404+.1817+.3271=.5493\)

There is a 54.93% chance that you will win no more than two games. In other words, there is a 54.93% chance that you will lose money.
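The full roulette example can be reproduced in one short script. This sketch keeps \(p = 18/38\) as an exact fraction and only converts to decimals for display; the helper and variable names are our own.

```python
from fractions import Fraction
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 5
p = Fraction(18, 38)  # red slots / total slots

mu = n * p                     # expected number of wins
sigma = sqrt(n * p * (1 - p))  # standard deviation of the number of wins

p_win_all = binomial_pmf(5, n, p)
p_at_most_2 = sum(binomial_pmf(k, n, p) for k in range(3))  # X = 0, 1, 2

print(round(float(mu), 4), round(sigma, 4))  # 2.3684 1.1165
print(round(float(p_win_all), 4))            # 0.0238
print(round(float(p_at_most_2), 4))          # 0.5493
```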

A fair coin is flipped 10 times. This example is used to demonstrate calculations concerning binomial random variables. Hand calculations are performed and Minitab Express is used.

A class is taking a multiple choice quiz. There are 6 questions, each with 4 options. The professor accidentally brought a quiz from a different, much more advanced class. All students randomly guess on each item. This is a binomial random variable. We compute the mean and standard deviation by hand. We compute the probability that a student will pass (i.e., at least 60%), the probability that they will get all questions incorrect, and the probability that they will get all questions correct, all using Minitab Express.

We just discussed discrete random variables, and now we consider *continuous random variables*. Recall, a **continuous random variable** is such that all values (to any number of decimal places) within some interval are possible outcomes. A continuous random variable has an infinite number of possible values so we can't assign probabilities to each specific value. If we did, the total probability would be infinite, rather than 1, as it is supposed to be.

To describe probabilities for a continuous random variable, we use a *probability density function*.

**Probability density function (PDF)**: A curve such that the area under the curve within any interval of values along the horizontal axis gives the probability for that interval

The most commonly encountered type of continuous random variable is a **normal random variable**, which has a symmetric bell-shaped density function. The center point of the distribution is the mean value, denoted by \(\mu\) ("mu"). The spread of the distribution is determined by the variance, denoted by \(\sigma ^{2}\) ("sigma squared") or by the square root of the variance called standard deviation, denoted by \(\sigma\) ("sigma").

The distribution of IQ scores is normal with a mean of 100 and standard deviation of 15.

In other words, \(\mu=100\) and \(\sigma=15\). The probability density function is shown below.

Notice that the horizontal axis shows IQ score and the bell is centered at the mean of 100.

While we cannot determine the probability for any one given value because the distribution is continuous, we can determine the probability for a given interval of values. The probability for an interval is equal to the area under the density curve. The total area under the curve is 1.00, or 100%. In other words, 100% of observations fall under the curve.

The next figure shows the probability that the IQ of a randomly selected individual will be between 115 and 130. This probability is equal to the shaded area under the curve between 115 and 130.

Soon we will learn how to use the normal distribution (i.e., z distribution) to determine what proportion of the curve is shaded.

The Empirical Rule can be used to estimate the proportion of observations that should fall within the intervals of one, two, and three standard deviations of the mean:

Middle 68% of observations: \(\mu\pm 1(\sigma)\)

Middle 95% of observations: \(\mu\pm 2(\sigma)\)

Middle 99.7% of observations: \(\mu\pm 3(\sigma)\)

**Middle 95%**

Given that for the distribution of IQ scores, \(\mu=100\) and \(\sigma=15\), let's apply the Empirical Rule to determine between which two scores the middle 95% of individuals fall.

Middle 95%: Approximately \(100\pm2(15)=[70,130]\)

**Middle 99.7%**

The Empirical Rule also states that about 99.7% (nearly all) of a bell-shaped dataset will be in the interval \(mean\pm 3(standard\;deviation)\).

Middle 99.7%: Approximately \(100\pm 3(15)= [55, 145]\)
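The Empirical Rule percentages are approximations of areas under the normal curve. As a rough check, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) to compute the area within one, two, and three standard deviations of the mean for the IQ distribution; the variable names are our own.

```python
from statistics import NormalDist

mu, sigma = 100, 15
iq = NormalDist(mu=mu, sigma=sigma)

areas = {}
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    # Area under the curve between low and high = P(X <= high) - P(X <= low).
    areas[k] = iq.cdf(high) - iq.cdf(low)
    print(k, (low, high), round(areas[k], 4))
```

The areas come out near .6827, .9545, and .9973, which is where the rounded figures of 68%, 95%, and 99.7% come from.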

Minitab Express can be used to find the proportion of a normal distribution in a given range. For example, if we know that at one highway location vehicles' speeds are normally distributed with a mean of 65 mph and a standard deviation of 5 mph, we can use that information to determine what proportion of vehicles are going under or over the speed limit. In the next pages we will use this scenario to identify the proportion of vehicles at that spot going different speeds.

The **cumulative probability** for a value is the probability of observing a value less than or equal to that value. In notation, this is \(P(X\leq x)\).

The proportion at or below a given value is also known as a **percentile**.

Let's look at an example in Minitab Express.

**Scenario**: Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

**Question**: What is the probability that a randomly selected vehicle will be going 73 mph or slower?

To calculate a probability for values **less than** a given value in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**; on a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified X value*
- Select *Left tail*
- For *X value* enter 73

This should result in the following output:

Sometimes we want to know the probability that a variable has a value greater than some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written \(P(X > 73)\).

Previously we found \(P(X\leq 73)=.9452\). The general rule for a "greater than" situation is \(P(X > x)=1-P(X \leq x)\). Thus, \(P(X>73)=1-.9452=.0548\). The probability that a randomly selected vehicle will be going faster than 73 mph is .0548.
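The "less than" and "greater than" probabilities can be cross-checked outside Minitab Express. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist`; the variable names are our own.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# Cumulative probability: P(X <= 73).
p_at_most_73 = speeds.cdf(73)

# Complement rule: P(X > 73) = 1 - P(X <= 73).
p_above_73 = 1 - p_at_most_73

print(round(p_at_most_73, 4))  # 0.9452
print(round(p_above_73, 4))    # 0.0548
```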

If we did not know \(P(X \leq73)\) we could compute this probability by constructing a probability distribution in Minitab Express or Minitab.

To calculate a probability for values **greater than** a given value in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**; on a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified X value*
- Select *Right tail*
- For *X value* enter 73

This should result in the following output:

Suppose we want to know the probability a normal random variable is **within** a specified interval. For instance, suppose we want to know the probability that a randomly selected vehicle is going between 60 and 73 mph.

To calculate a probability for values **in between** given values in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**; on a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified X value*
- Select *Middle*
- For *X value 1* enter 60 and for *X value 2* enter 73

This should result in the following output:
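As a cross-check that does not rely on the Minitab Express plot, the interval probability is the difference between two cumulative probabilities. This is a sketch using `statistics.NormalDist` (Python 3.8 or later) with our own variable names.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# P(60 <= X <= 73) = P(X <= 73) - P(X <= 60).
p_between = speeds.cdf(73) - speeds.cdf(60)

print(round(p_between, 4))  # 0.7865
```

So about 78.65% of vehicles at this location are traveling between 60 and 73 mph.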

We can also use Minitab Express to find a score or scores associated with a given proportion of a normal distribution. We will continue with our example involving car speeds with \(\mu = 65\) and \(\sigma = 5\). What speed separates the top 10% of cars from the bottom 90% of cars?

To find a score associated with a proportion of a normal distribution in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**; on a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified probability*
- Select *Right tail*
- For *Probability* enter 0.10

Note: You could also select *Left tail* and enter 0.90 as the probability.

This should result in the following output:
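Going from a proportion back to a score is the inverse of the cumulative probability. As a cross-check outside Minitab Express, the sketch below uses the `inv_cdf` method of `statistics.NormalDist` (Python 3.8 or later); the variable name `cutoff` is our own.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# The speed with 90% of cars below it (and therefore 10% above it).
cutoff = speeds.inv_cdf(0.90)

print(round(cutoff, 2))  # 71.41
```

A speed of about 71.4 mph separates the fastest 10% of cars from the rest.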


What speeds separate the middle 90% of cars from the most extreme 10% of cars? Note that the outer 10% will be split evenly between the right and left tails in this scenario.

To find a range of scores associated with a proportion of a normal distribution in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**; on a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified probability*
- Select *Middle*
- For *Probability 1* enter 0.05
- For *Probability 2* enter 0.05

Note: You could also select *Equal tails* and enter 0.10 as the area in the tails.

This should result in the following output:
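The two boundary speeds can also be sketched with `statistics.NormalDist` (Python 3.8 or later). With 5% in each tail, the lower boundary is the 5th percentile and the upper boundary is the 95th percentile; the variable names are our own.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# 5% of cars are slower than `lower`; 5% are faster than `upper`.
lower = speeds.inv_cdf(0.05)
upper = speeds.inv_cdf(0.95)

print(round(lower, 2), round(upper, 2))  # 56.78 73.22
```

The two boundaries are the same distance from the mean of 65 mph because the normal distribution is symmetric.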

When finding the probability associated with a score on a normal distribution it may be necessary to first convert the observation to a z score in order to use the z table to find a probability. Recall from Lesson 2 the formula for computing the z-score for an individual observation:

**z Score**

\[z=\frac{x - \overline{x}}{s}\]

*z* = z score

*x* = original individual score

\(\overline{x}\) = mean of the original distribution

*s* = standard deviation of the original distribution

This formula can also be written using population parameters: \(z=\frac{x-\mu}{\sigma}\)
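The formula translates directly into code. The sketch below is illustrative only: the function name `z_score` is hypothetical, and the inputs are the car speeds of 73 and 60 mph from this lesson's highway example (\(\mu = 65\), \(\sigma = 5\)).

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(73, 65, 5))  # 1.6
print(z_score(60, 65, 5))  # -1.0
```

A positive z score marks an observation above the mean and a negative z score marks one below it.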

We will be using Table A in Appendix A of the Agresti, Franklin, and Klingenberg textbook. Table A in the textbook gives normal curve cumulative probabilities for standardized scores. This is also known as a z table. Row labels of Table A give possible *z*-scores up to one decimal place. The column labels give the second decimal place of the *z*-score. The cumulative probability for a value equals the cumulative probability for that value's *z*-score.

The examples on the following pages walk through finding the probability less than, greater than, or between two values on the normal distribution. Remember, when finding the probability associated with an observation that is on a scale other than the standard normal distribution (i.e., \(\mu=0\) and \(\sigma=1\)), you must first translate the score to a z score before using the table.

Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

What is the probability that a randomly selected car is going 73 mph or less?

It's often helpful to begin by sketching a normal distribution and shading in the appropriate region. From the graph below we can see that more than half of the curve is shaded in; this means that our final result should be greater than .50.

Let’s use the z table to determine the proportion of the curve under 73 mph.

First, we need to compute the z score for this speed: \(z=\frac{73-65}{5}=1.60\)

Now we can use the z table to determine the proportion of the curve that is less than a z score of 1.6 by looking up 1.60. We look in the 1.6 row and the .00 column (1.6 plus .00 equals 1.60). The cumulative probability for z=1.60 is .9452, the same value that we got previously when using Minitab Express. There is a 94.52% chance of randomly selecting a vehicle that is going 73 mph or less.
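The table look-up can be reproduced numerically. The following is a minimal sketch, assuming Python 3.8 or later with `statistics.NormalDist` from the standard library; it computes the cumulative probability both by way of the z score and directly on the mph scale to show that the two routes agree.

```python
from statistics import NormalDist

# Route 1: standardize first, then use the standard normal (mean 0, sd 1).
z = (73 - 65) / 5
p_via_z = NormalDist().cdf(z)

# Route 2: work directly on the original mph scale.
p_direct = NormalDist(mu=65, sigma=5).cdf(73)

print(round(p_via_z, 4), round(p_direct, 4))  # 0.9452 0.9452
```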

Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

What is the probability that a randomly selected car is going 60 mph or less?

For speed = 60 the z-score is: \(z=\frac{60-65}{5}=-1.00\)

We look up -1.00 on the z table below and find a cumulative probability of .1587. There is a 15.87% chance of randomly selecting a vehicle that is going 60 mph or less.
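A negative z score can be looked up directly, or handled through the symmetry of the normal curve: the area below \(z=-1.00\) equals the area above \(z=+1.00\). The sketch below, which assumes Python 3.8 or later and its standard-library `statistics.NormalDist`, computes the value both ways.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Direct cumulative probability below z = -1.00.
p_below = std_normal.cdf(-1.00)

# Same area by symmetry: everything above z = +1.00.
p_by_symmetry = 1 - std_normal.cdf(1.00)

print(round(p_below, 4), round(p_by_symmetry, 4))  # 0.1587 0.1587
```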

Table A.1 gives this information:

IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. What IQ score separates the bottom 30% from the top 70%?

This is also known as the 30th percentile.

First we must look up the z score that separates the bottom 30% from the top 70% of the distribution:

The z value that separates the bottom 30% from the top 70% is approximately -0.52.

We can translate this z score into an IQ score given \(\mu=100\) and \(\sigma=15\):

\(\text{IQ}=15(-0.52)+100=92.2\)

The IQ score that separates the bottom 30% from the top 70% is 92.2.
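Going from a proportion back to a score reverses the usual look-up: find the z score for the percentile, then rescale with \(x = \sigma z + \mu\). Here is a minimal sketch of those two steps, assuming Python 3.8 or later (`statistics.NormalDist`); the variable names are illustrative.

```python
from statistics import NormalDist

# Step 1: z score with 30% of the standard normal curve below it.
z_30 = NormalDist().inv_cdf(0.30)

# Step 2: translate the z score to the IQ scale (mean 100, sd 15).
iq_30 = 15 * z_30 + 100

print(round(z_30, 2), round(iq_30, 1))  # -0.52 92.1
```

The computed value is slightly lower than the table-based answer because the software keeps the unrounded z score (about -0.5244) rather than the two-decimal value read from the z table.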

Scores on the SAT-M are normally distributed with a mean of 500 and standard deviation of 100. What SAT-M score is needed to be in the top 5% of the population?

First we will look up the z score that separates the top 5% from the bottom 95%:

The z score that separates the top 5% from the bottom 95% is approximately +1.64. We can find the SAT-M score that corresponds to this z score.

\(\text{SAT-M}=100(1.64)+500=664\)

An SAT-M score of 664 is needed to be in the top 5% of the population.
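For a top-tail question such as this one, the cumulative probability to look up is one minus the tail area. The sketch below assumes Python 3.8 or later and the standard-library `statistics.NormalDist`; it finds the cutoff on the SAT-M scale and then confirms that 5% of the distribution lies above it.

```python
from statistics import NormalDist

# SAT-M scores from the example: mean 500, standard deviation 100.
sat_m = NormalDist(mu=500, sigma=100)

# Top 5% means 95% of scores fall below the cutoff.
cutoff = sat_m.inv_cdf(1 - 0.05)

print(round(cutoff, 1))                 # 664.5
print(round(1 - sat_m.cdf(cutoff), 2))  # 0.05
```

The unrounded z score is about 1.645, which falls between the table entries 1.64 and 1.65, so the computed cutoff is marginally above the table-based 664.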

Suppose pulse rates of adult females have a normal distribution with a mean of 75 and a standard deviation of 8. What is the probability that a randomly selected female has a pulse rate **greater than 85**?

If we use Table A.1, the first step is to calculate the z-score associated with a pulse rate of 85: \(z=\frac{85-75}{8}=1.25\).

Given that z=1.25, we can use the z-table to determine the cumulative probability:

The cumulative probability for z = 1.25 is .8944. This is the proportion below a pulse rate of 85, but we want to know the proportion above a pulse rate of 85.

\(P(X>85) = 1 - P(X<85) = 1 - .8944 = .1056\)

The probability that a randomly selected female will have a pulse rate above 85 is .1056 or 10.56%.
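The complement rule used above can be sketched in a few lines of Python. This assumes Python 3.8 or later, where `statistics.NormalDist` is available in the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Pulse rates from the example: mean 75, standard deviation 8.
pulse = NormalDist(mu=75, sigma=8)

# cdf gives the proportion below 85; subtract from 1 for the proportion above.
p_above_85 = 1 - pulse.cdf(85)

print(round(p_above_85, 4))  # 0.1056
```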

We know IQ scores have a mean of 100 and standard deviation of 15. What proportion of IQ scores fall between 100 and 130?

First we must compute the z score associated with each of these IQ scores.

For an IQ of 100, \(z=\frac{100-100}{15}=0\)

For an IQ of 130, \(z=\frac{130-100}{15}=2.00\)

We are looking for the proportion of observations that fall between a z score of 0 and a z score of 2.00.

Using the z table above, \(P(z<0.00)=.5000\) and \(P(z<2.00)=.9772\)

\(P(0.00<z<2.00)=P(z<2.00)-P(z<0.00)=.9772-.5000=.4772\)

The proportion of IQ scores between 100 and 130 is .4772, or 47.72%.
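The area between two values is the difference of two cumulative probabilities, which is easy to verify in code. A minimal sketch, assuming Python 3.8 or later and the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

# IQ scores from the example: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Proportion below 130 minus proportion below 100.
p_between = iq.cdf(130) - iq.cdf(100)

print(round(p_between, 4))  # 0.4772
```

Always subtract the smaller cumulative probability from the larger one, so that the result is a positive proportion.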

This video walks through one example. A group of instructors have decided to assign grades on a curve. Given the mean and standard deviation of their students' scores, they want to know what point ranges are associated with which grades. Minitab Express is used.

Practice finding the proportion of observations under the normal curve. Each question can be answered using either Minitab Express or the z table. Work through each example before checking the solution.

**HINT**: Drawing the normal curve and shading in the region you are looking for is often helpful.

1. What proportion of the standard normal curve is less than a z score of 1.64?

2. What proportion of the standard normal curve falls above a z score of 1.33?

3. What proportion of the standard normal curve falls between a z score of -.50 and a z score of +.50?

4. At one private school, a minimum IQ score of 125 is necessary to be considered for admission. IQ scores have a mean of 100 and standard deviation of 15. Given this information, what proportion of children are eligible for consideration for admission to this school?

5. ACT scores have a mean of 18 and a standard deviation of 6. What proportion of test takers score between 20 and 26?

6. A men’s clothing company is doing research on the height of adult American men in order to inform the sizing of the clothing that they offer. The height of males in the United States is normally distributed with a mean of 175 cm and a standard deviation of 15 cm. Men who are more than 30 cm different (shorter or taller) from the mean are classified by the apparel company as special cases because they do not fit in their regular length clothing. Given this information, what proportion of men would be classified as special cases?

In this lesson we examined a number of probability distributions including discrete, binomial, and normal. The next lesson will continue to explore probability distributions with an emphasis on the distribution of sample means. It will also introduce a new distribution that is similar in shape to the normal distribution: the t distribution.

Take a moment to review what you learned in this lesson before continuing to the next.

**Lesson 5 Learning Objectives**

Upon completion of this lesson, you will be able to:

- distinguish between discrete and continuous random variables.
- find probabilities associated with a discrete probability distribution.
- compute the mean and standard deviation of a discrete probability distribution.
- find probabilities associated with a binomial distribution.
- find probabilities associated with a normal probability distribution (i.e., z distribution) using Minitab Express and the standard normal table.