Upon completion of this lesson, you should be able to:

- distinguish between discrete and continuous random variables,
- explain the difference between population, parameter, sample, and statistic,
- determine if a given value represents a population parameter or sample statistic,
- find probabilities associated with a discrete probability distribution,
- compute the mean and variance of a discrete probability distribution,
- find probabilities associated with a binomial distribution,
- find probabilities associated with a normal probability distribution using the standard normal table,
- determine the standard error for the sample proportion and sample mean, and
- apply the Central Limit Theorem properly to a set of continuous data.

A **random variable** is a numerical characteristic of each event in a sample space, or equivalently, each individual in a population.

Examples:

- The number of heads in four flips of a coin (a numerical property of each different sequence of flips).
- Heights of individuals in a large population.

Random variables are classified into two broad types:

- A **discrete random variable** has a countable set of distinct possible values.
- A **continuous random variable** is such that any value (to any number of decimal places) within some interval is a possible value.

**Examples of discrete random variables:**

- Number of heads in 4 flips of a coin (possible outcomes are 0, 1, 2, 3, 4).
- Number of classes missed last week (possible outcomes are 0, 1, 2, 3, ..., up to some maximum number)
- Amount won or lost when betting $1 on the Pennsylvania Daily number lottery

**Examples of continuous random variables:**

- Heights of individuals
- Time to finish a test
- Hours spent exercising last week.

**Note **: In practice, we don't measure accurately enough to truly see all possible values of a continuous random variable. For instance, in reality somebody may have exercised 4.2341567 hours last week but they probably would round off to 4. Nevertheless, hours of exercise last week is inherently a continuous random variable.

For a discrete random variable, its **probability distribution** (also called the probability distribution function) is any table, graph, or formula that gives each possible value and the probability of that value. **Note**: The total of all probabilities across the distribution must be 1, and each individual probability must be between 0 and 1, inclusive.

**Examples:**

Heads | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | 1/16 | 4/16 | 6/16 | 4/16 | 1/16 |

This could be found by listing all 16 possible sequences of heads and tails for four flips, and then counting how many sequences there are for each possible number of heads.
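The listing-and-counting approach can be sketched in a few lines of Python. This is only an illustration, not part of the course software; the function name `heads_distribution` is made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_flips):
    """Return {number_of_heads: probability} for n_flips fair coin flips."""
    # Every sequence of H and T is equally likely for a fair coin.
    sequences = list(product("HT", repeat=n_flips))
    # Count how many sequences produce each possible number of heads.
    counts = Counter(seq.count("H") for seq in sequences)
    return {k: Fraction(counts[k], len(sequences)) for k in sorted(counts)}


dist = heads_distribution(4)
for heads, prob in dist.items():
    print(heads, prob)
```

Note that `Fraction` prints values in lowest terms, so 4/16 appears as 1/4 and 6/16 as 3/8.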

Tattoos | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | 0.850 | 0.120 | 0.015 | 0.010 | 0.005 |

This could be found by doing a census of a large student population.
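The two requirements stated earlier (each probability between 0 and 1, and a total of 1) are easy to check by machine. Here is a minimal sketch using the tattoo distribution; the helper name `is_valid_distribution` is an assumption made for this example.

```python
import math


def is_valid_distribution(probabilities):
    """Check that every probability is in [0, 1] and that they total 1."""
    in_range = all(0 <= p <= 1 for p in probabilities)
    # isclose allows for tiny floating-point rounding error in the sum.
    sums_to_one = math.isclose(sum(probabilities), 1.0)
    return in_range and sums_to_one


tattoo_probs = [0.850, 0.120, 0.015, 0.010, 0.005]
print(is_valid_distribution(tattoo_probs))
```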

**Cumulative Probabilities**

Often, we wish to know the probability that a variable is less than or equal to some value. This is called the **cumulative probability** because to find the answer, we simply add probabilities for all values qualifying as "less than or equal" to the specified value.

Example: Suppose we want to know the probability that the number of heads in four flips is 1 or less. The qualifying values are 0 and 1, so we add probabilities for those two possibilities. If we let X represent number of heads we get on four flips of a coin, then:

\( P(X<2)=P(X \leq 1 )=P(X=0)+P(X=1)=(1/16)+(4/16)=5/16 \)

The **cumulative distribution** is a listing of all possible values along with *the cumulative probability* for each value.

**Examples:**

Heads | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | 1/16 | 4/16 | 6/16 | 4/16 | 1/16 |
Cumulative Probability | 1/16 | 5/16 | 11/16 | 15/16 | 1 |

Each cumulative probability was found by adding probabilities (in second row) up to the particular column of the table. As an example, for 2 heads, we add probabilities for 0, 1, and 2 heads to get 11/16. This is the probability the number of heads is two or less.
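As a rough illustration, the running sums that build a cumulative distribution can be produced with Python's `itertools.accumulate`. The variable names here are chosen for this sketch only.

```python
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2, 3, 4]
probs = [Fraction(1, 16), Fraction(4, 16), Fraction(6, 16),
         Fraction(4, 16), Fraction(1, 16)]

# accumulate yields the running total: p0, p0+p1, p0+p1+p2, ...
cumulative = list(accumulate(probs))
cdf = dict(zip(values, cumulative))

for heads, cum_prob in cdf.items():
    print(heads, cum_prob)
```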

Tattoos | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Probability | 0.850 | 0.120 | 0.015 | 0.010 | 0.005 |
Cumulative Probability | 0.850 | 0.970 | 0.985 | 0.995 | 1 |

As an example, probability a randomly selected student has 2 or fewer tattoos = 0.985 (calculated as 0.850 + 0.120 + 0.015).

The phrase **expected value **is a synonym for **mean** value in the long run (meaning for many repeats or a large sample size). For a discrete random variable, the calculation is Sum of (value × probability) where we sum over all values (after separately calculating value × probability for each value), expressed as:

\( E(X)=\sum x_i p_i\),

meaning we take each possible X value and multiply it by its respective probability. We then add these products to reach our expected value, labeled E(X). [NOTE: the letter X is a common symbol used to represent a random variable. Any letter can be used.]

**Example **: A fair six-sided die is tossed. You win \$2 if the result is a “1”, you win \$1 if the result is a “6” but otherwise you lose \$1.

X | +2 | +1 | -1 |
---|---|---|---|
Probability | 1/6 | 1/6 | 4/6 |

\( Expected\ Value= (2 \times \frac {1}{6})+(1 \times \frac {1}{6})+(-1 \times \frac {4}{6})= - \frac {1}{6}=-\$ 0.17 \)

The interpretation is that if you play many times, the average outcome is losing 17 cents per play.

**Example **: Using the probability distribution for number of tattoos given above (not the cumulative!),

The mean number of tattoos per student is

\( Expected\ Value=(0 \times 0.85)+(1 \times 0.12)+ (2 \times 0.015) +(3 \times 0.010) +(4 \times 0.005) =0.20 \)
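Both expected-value calculations above follow the same "sum of value × probability" recipe, which can be sketched as a short Python function. The name `expected_value` is an assumption for this example.

```python
from fractions import Fraction


def expected_value(values, probs):
    """Sum of (value x probability) over all possible values."""
    return sum(x * p for x, p in zip(values, probs))


# Die game: win $2 on a 1, win $1 on a 6, lose $1 otherwise.
die_mean = expected_value(
    [2, 1, -1],
    [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)],
)

# Number of tattoos per student.
tattoo_mean = expected_value(
    [0, 1, 2, 3, 4],
    [0.850, 0.120, 0.015, 0.010, 0.005],
)

print(die_mean, float(die_mean))
print(tattoo_mean)
```

Exact fractions are used for the die so the result stays as -1/6 rather than a rounded decimal.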

**Standard Deviation of a Discrete Variable **

Knowing the expected value is not the only important characteristic one may want to know about a set of discrete numbers: one may also need to know the spread, or variability, of these data. For instance, you may "expect" to win \$20 when playing a particular game (which appears good!), but the spread for this might be from losing \$20 to winning \$60. Knowing such information can influence your decision on whether to play.

To calculate the standard deviation we first must calculate the variance. From the variance, we take the square root and this provides us the standard deviation. Your book provides the following formula for calculating the variance:

\( \sigma ^2= \sum (x_i-\mu)^2 p_i \) and the standard deviation is:\( \sigma = \sqrt {\sum (x_i-\mu)^2 p_i}\)

In this expression we substitute our result for E(X) in place of \( \mu\), where \( \mu\) is simply the symbol used to represent the mean of some population.

However, an **easier** formula to use and remember for calculating the variance is the following:

\( \sigma ^2= \sum x_i^2 p_i -\mu ^2\) and again we substitute E(X) for μ.

The standard deviation is then found by taking the square root of the variance. Notice in the summation part of this equation that we only square each observed X value and **not **the respective probability.

**Example **: Going back to the first example used above for expectation involving the die, we would calculate the standard deviation for this discrete distribution by first calculating the variance:

\( \sigma ^2= \sum x_i^2 p_i -\mu ^2 = (2^2 \times \frac{1}{6})+(1^2 \times \frac{1}{6})+(-1)^2 \times \frac{4}{6}-(- \frac{1}{6})^2\)

\(= \frac{4}{6}+\frac {1}{6}+ \frac{4}{6}-\frac{1}{36} = \frac{53}{36}=1.472 \)

So the standard deviation would be the square root of 1.472, or 1.213.
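To see that the definition formula and the shortcut formula agree, here is a small Python sketch applied to the die game. The function names are invented for this illustration.

```python
import math
from fractions import Fraction


def mean(values, probs):
    return sum(x * p for x, p in zip(values, probs))


def variance_definition(values, probs):
    """Variance as the sum of (x - mu)^2 * p."""
    mu = mean(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))


def variance_shortcut(values, probs):
    """Variance as the sum of x^2 * p, minus mu^2."""
    mu = mean(values, probs)
    return sum(x ** 2 * p for x, p in zip(values, probs)) - mu ** 2


values = [2, 1, -1]
probs = [Fraction(1, 6), Fraction(1, 6), Fraction(4, 6)]

var_a = variance_definition(values, probs)
var_b = variance_shortcut(values, probs)
std_dev = math.sqrt(var_b)

print(var_a, var_b, round(std_dev, 3))
```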

A **binomial random variable** is a specific type of discrete random variable. It counts how often a particular event occurs in a fixed number of tries or trials. For a variable to be a binomial random variable, **ALL** of the following conditions must be met:

- There are a fixed number of trials (a fixed sample size).
- On each trial, the event of interest either occurs or does not.
- The probability of occurrence (or not) is the same on each trial.
- Trials are independent of one another.

**Examples of binomial random variables: **

- Number of correct guesses at 30 true-false questions when you randomly guess all answers
- Number of winning lottery tickets when you buy 10 tickets of the same kind
- Number of left-handers in a randomly selected sample of 100 unrelated people

**Notation **

n= number of trials (sample size)

p= probability event of interest occurs on any one trial

**Example**: For the guessing at true-false questions example above, n = 30 and p = 0.5 (the chance of getting any one question right).

**Probabilities for binomial random variables **

The conditions for being a binomial variable lead to a somewhat complicated formula for finding the probability that any specific value occurs (such as the probability you get 20 right when you guess at 20 True-False questions).

We'll use Minitab to find probabilities for binomial random variables. Don't worry about the “by hand” formula. However, for those of you who are curious, the by hand formula for the probability of getting a specific outcome in a binomial experiment is:

\[P(x)= \frac {n!}{x!(n-x)!} p^x (1-p)^{n-x}\]

One can use the formula to find the probability or alternatively, use Minitab or SPSS to find the probability. In the homework, you may use the method that you are more comfortable with unless specified otherwise.

Find *P*(*x*) for *n* = 20, *x* = 3, and *p* = 0.4.

To calculate binomial random variable probabilities in Minitab:

- Open Minitab without data.
- From the menu bar select Calc > Probability Distributions > Binomial.
- Choose Probability since we want to find the probability that *x* = 3.
- Enter 20 in the text box for number of trials.
- Enter 0.4 in the text box for probability of success (note that for Minitab versions over 14 this is now labeled event probability).
- Since we do not have a column of data select the radio button for Input Constant and enter 3.
- Click Ok.

*Minitab output:*

Probability Density Function

Binomial with *n* = 20 and *p* = 0.4

x | P(X = x) |
---|---|
3.00 | 0.0123 |
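For readers without Minitab, the "by hand" formula can be evaluated directly. The following Python sketch uses `math.comb` for the n!/(x!(n−x)!) term; the function name `binomial_pmf` is an assumption for this example, not a Minitab or textbook name.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


prob = binomial_pmf(3, 20, 0.4)
print(round(prob, 4))
```

The rounded result should agree with the Minitab value for *n* = 20, *x* = 3, and probability 0.4.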

To calculate binomial random variable probabilities in Minitab Express:

- Open Minitab Express without data.
- From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Probability (PDF).
- Since we want to find the probability that *x* = 3, enter 3 into the "Value" box.
- In the "Distribution" drop down menu, select Binomial.
- Enter 20 into the "Number of trials" box, and 0.4 into the "Event probability" box.
- Select "Display a table of probability density values" to show the output.
- Click Ok

The result should be the following output:

In the following example, we illustrate how to use the formula to compute binomial probabilities. If you don't like to use the formula, you can also just use Minitab to find the probabilities.

**Example by hand:** Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring.

Find the probability that:

a. There will be no red flowered plants in the five offspring.

Let X = the number of red flowered plants in the five offspring. Here, the number of red flowered plants has a binomial distribution with n = 5 and p = 0.25.

\(P(X=0)=\frac{5!}{0!(5-0)!} p^0 (1-p)^5 =1 \times 0.25^0 \times 0.75^5 =0.237\)

b. There will be fewer than two red flowered plants (a cumulative probability).

Answer:

\begin{align}
P(X\ is\ 1\ or\ less)&=P(X=0)+P(X=1)\\
&= \frac{5!}{0!(5-0)!} 0.25^0 (1-0.25)^5+\frac{5!}{1!(5-1)!} 0.25^1 (1-0.25)^4\\
& = 0.237 +0.396=0.633
\end{align}

In the previous example, part a was finding P(X = x) and part b was finding P(X ≤ x). The latter is called finding a **cumulative probability** because you are finding the probability that has accumulated from the minimum to some point, i.e. from 0 to 1 in this example.
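The flower example can be reproduced with a short Python sketch: part a is a single binomial probability, and part b adds the probabilities from 0 up to 1. The function names are assumptions made for this illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x): add the probabilities for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


p_no_red = binomial_pmf(0, 5, 0.25)       # part a
p_fewer_than_two = binomial_cdf(1, 5, 0.25)  # part b

print(p_no_red, p_fewer_than_two)
```

Because the code carries full precision, its last digit can differ slightly from a hand calculation that rounds each term before adding.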

**To use Minitab to solve a cumulative probability binomial problem**, return to Calc > Probability Distributions > Binomial as shown above. Now however, select the radio button for Cumulative Probability and then enter the respective Number of Trials (i.e. 5), Event Probability (i.e. 0.25), and click the radio button for Input Constant and enter the x-value (i.e. 1).

**Expected Value and Standard Deviation for Binomial random variable **

The formula given earlier for discrete random variables could be used, but the good news is that for binomial random variables there are shortcut formulas for the expected value (the mean) and the standard deviation:

\(Expected\ Value=np\) and \(Standard\ Deviation=\sqrt {np(1-p)}\)

After you use these formulas a couple of times, you'll realize they match your intuition. For instance, the “expected” number of correct (random) guesses at 30 True-False questions is *np* = (30)(.5) = 15 (half of the questions). For a fair six-sided die rolled 60 times, the expected value of the number of times a “1” is tossed is *np* = (60)(1/6) = 10. The standard deviations are, for the True-False test,

\(\sqrt{30 \times 0.5 \times (1-0.5)}=\sqrt{7.5}=2.74\)

and for the die

\(\sqrt{60 \times \frac{1}{6}\times (1-\frac {1}{6})}=\sqrt{ \frac{50}{6}}=2.89\)
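The shortcut formulas translate directly into code. A minimal Python sketch follows, with function names invented for this example.

```python
import math


def binomial_mean(n, p):
    """Expected value of a binomial random variable: n * p."""
    return n * p


def binomial_sd(n, p):
    """Standard deviation of a binomial random variable: sqrt(n * p * (1 - p))."""
    return math.sqrt(n * p * (1 - p))


# 30 True-False questions answered by guessing.
print(binomial_mean(30, 0.5), round(binomial_sd(30, 0.5), 2))

# A fair die rolled 60 times, counting how often a "1" appears.
print(binomial_mean(60, 1 / 6), round(binomial_sd(60, 1 / 6), 2))
```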

Previously we discussed discrete random variables, and now we consider the continuous type. A **continuous random variable** is such that all values (to any number of decimal places) within some interval are possible outcomes. A continuous random variable has an infinite number of possible values so we can't assign probabilities to each specific value. If we did, the total probability would be infinite, rather than 1, as it is supposed to be.

To describe probabilities for a continuous random variable, we use a *probability density function. *A **probability density function **is a curve such that the area under the curve within any interval of values along the horizontal gives the probability for that interval.

The most commonly encountered type of continuous random variable is a **normal random variable **, which has a symmetric bell-shaped density function. The center point of the distribution is the mean value, denoted by μ (pronounced "mew"). The spread of the distribution is determined by the variance, denoted by σ^{2} (pronounced "sigma squared") or by the square root of the variance called standard deviation, denoted by σ (pronounced "sigma").

**Example**: Suppose vehicle speeds at a highway location have a normal distribution with mean μ = 65 mph and standard deviation σ = 5 mph. The probability density function is shown below. Notice that the horizontal axis shows speeds and the bell is centered at the mean (65 mph).

**Probability for an Interval = Area under the density curve in that interval **

The next figure shows the probability that the speed of a randomly selected vehicle will be between 60 and 73 miles per hour, with this probability equal to the area under the curve between 60 and 73.

**Empirical Rule Review **

Recall that in our first lesson we learned that for bell-shaped data, about 95% of the data values will be in the interval *mean* ± (2 × *std. dev*). In our example, this is 65 ± (2 × 5), or 55 to 75. The next figure shows that the probability is about 0.95 (about 95%) that a randomly selected vehicle speed is between 55 and 75.

The Empirical Rule also stated that about 99.7% (nearly all) of a bell-shaped dataset will be in the interval *mean* ± (3 × *std. dev*). This is 65 ± (3 × 5), or 50 to 80, in our example. Notice that this interval roughly gives the complete range of the density curve shown above.
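The Empirical Rule percentages can be checked numerically for the vehicle speed example. This sketch uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper `prob_within` is a name made up for this illustration.

```python
from statistics import NormalDist

# Vehicle speeds: mean 65 mph, standard deviation 5 mph.
speeds = NormalDist(mu=65, sigma=5)


def prob_within(dist, k):
    """Probability of falling within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


print(round(prob_within(speeds, 2), 4))  # interval 55 to 75
print(round(prob_within(speeds, 3), 4))  # interval 50 to 80
```

The exact areas are close to, but not exactly, the rounded 95% and 99.7% figures quoted by the rule.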

Remember that the **cumulative probability **for a value is the probability less than or equal to that value. Minitab, SPSS, Excel, and the TI-83 series of calculators will give the *cumulative probability *for any value of interest in a specific normal curve.

For our example of vehicle speeds, here is Minitab output showing that the probability = 0.9452 that the speed of a randomly selected vehicle is less than or equal to 73 mph.

We can find this probability using either Minitab or SPSS:

To calculate normal random variable probabilities in Minitab:

- Open Minitab without data.
- From the menu bar select Calc > Probability Distributions > Normal.
- Select the radio button for Cumulative Probability (this is the default option)
- In the text box for Mean enter 65
- In the text box for Standard Deviation enter 5
- Since we do not have a column of data select the radio button for Input Constant and enter 73
- Click OK
- The output is as follows:

To calculate normal random variable probabilities in Minitab Express:

- Open Minitab Express without any data.
- From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Cumulative (CDF).
- Since you want to know the probability that the speed of a randomly selected vehicle is less than or equal to 73 mph, make sure the "Form of input" drop-down menu says "A single value" and enter 73 into the "Value" box.
- Make sure the "Distribution" drop-down menu says "Normal", and enter 65 into the "Mean" box and 5 into the "Standard deviation" box.
- Select "Display a table of cumulative probabilities" to show the output.
- Click OK.

The result should be the following output:

Here is a figure that illustrates the cumulative probability we found using this procedure.

Sometimes we want to know the probability that a variable has a value **greater than** some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written P(X > 73).

For our example, probability speed is greater than 73 = 1 - 0.9452 = 0.0548.

• The general rule for a "greater than" situation is

P (greater than a value) = 1 - P(less than or equal to the value)

*Example *: Using Minitab we can find that the probability = 0.1587 that a speed is less than or equal to 60 mph. Thus the probability a speed is greater than 60 mph = 1 - 0.1587 = 0.8413.
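As a cross-check that does not need Minitab, the cumulative and "greater than" probabilities can be computed with Python's `statistics.NormalDist`. The function name `prob_greater_than` is an assumption for this sketch.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)


def prob_greater_than(dist, value):
    """P(X > value) = 1 - P(X <= value)."""
    return 1 - dist.cdf(value)


print(round(speeds.cdf(73), 4), round(prob_greater_than(speeds, 73), 4))
print(round(speeds.cdf(60), 4), round(prob_greater_than(speeds, 60), 4))
```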

The relevant Minitab output and a figure showing the cumulative probability for 60 mph follow:

Suppose we want to know the probability a normal random variable is **within** a specified interval. For instance, suppose we want to know the probability a randomly selected speed is between 60 and 73 mph. The simplest approach is to subtract the cumulative probability for 60 mph from the cumulative probability for 73. The answer is

Probability speed is between 60 and 73 = 0.9452 − 0.1587 = 0.7865.

This can be written as P(60 < X < 73) = 0.7865, where X is speed.

• The general rule for an "in between" probability is

P( between *a *and *b *) = cumulative probability for value *b *− cumulative probability for value *a *
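The "in between" rule is a single subtraction of two cumulative probabilities. A minimal Python sketch for the speed example, with `prob_between` as a made-up helper name:

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)


def prob_between(dist, a, b):
    """P(a < X < b) = cumulative probability at b minus cumulative probability at a."""
    return dist.cdf(b) - dist.cdf(a)


p_60_to_73 = prob_between(speeds, 60, 73)
print(round(p_60_to_73, 4))
```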

Table A.1 in the textbook gives normal curve cumulative probabilities for standardized scores.

- A **standardized score** (also called *z*-score) is \(z=\frac{value-mean}{s.d.} = \frac{x-\mu}{\sigma}\).
- Row labels of Table A.1 give possible *z*-scores up to one decimal place. The column labels give the second decimal place of the *z*-score.

The cumulative probability for a value equals the cumulative probability for that value's *z*-score. Here, probability speed less than or equal to 73 mph = probability z-score less than or equal to 1.60. How did we arrive at this z-score?

**Example **

In our vehicle speed example, the standardized score for 73 mph is

\[z=\frac{73-65}{5}=1.60\]

We look in the ".00" column of the "1.6" row (1.6 **plus** .00 equals 1.60) to find that the cumulative probability for z = 1.60 is 0.9452, the same value we got earlier as the cumulative probability for speed = 73 mph.

**Example**

For speed = 60 the z-score is

\[z=\frac{60-65}{5}=-1.00\]

Table A.1 gives this information:

The cumulative probability is .1587 for *z* = -1.00 and this is also the cumulative probability for a speed of 60 mph.

**Example**

Suppose pulse rates of adult females have a normal curve distribution with mean μ = 75 and standard deviation σ = 8. What is the probability that a randomly selected female has a pulse rate **greater than 85**? *Be careful*! Notice we want a "greater than" and the interval we want is entirely above average, so we know the answer must be less than 0.5.

If we use Table A.1, the first step is to calculate the z-score for 85.

\[z=\frac{85-75}{8}=1.25\]

Information from Table A.1 is

Use the ".05" column of the "1.2" row to find that the cumulative probability for z = 1.25 is 0.8944.

This is not yet the answer. This is the probability the pulse is less than or equal to 85. We want a greater than probability so the answer is:

P(greater than 85) = 1 − P(less than or equal to 85) = 1 − 0.8944 = **0.1056**.
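The table lookup can be mimicked in code by standardizing first and then using the standard normal curve (mean 0, standard deviation 1). This is a sketch only; `z_score` is a name chosen for this example, and `NormalDist()` with no arguments is Python's standard normal.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardized score: (value - mean) / standard deviation."""
    return (x - mu) / sigma


standard_normal = NormalDist()  # mean 0, standard deviation 1

z = z_score(85, 75, 8)
p_less_or_equal = standard_normal.cdf(z)  # plays the role of the Table A.1 lookup
p_greater = 1 - p_less_or_equal

print(z, round(p_less_or_equal, 4), round(p_greater, 4))
```

The same two steps apply to the speed example: standardize 73 mph to z = 1.60, then read off the cumulative probability.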

We may wish to know the value of a variable that is a specified percentile of the values.

- We might ask what speed is the 99.99th percentile of speeds at the highway location in our earlier example.
- We might want to know what pulse rate is the 25th percentile of pulse rates.

To calculate percentiles in Minitab:

- Open Minitab without data.
- From the menu bar select Calc > Probability Distributions > Normal.
- Select the radio button for Inverse Cumulative Probability
- In the text box for Mean enter 65
- In the text box for Standard Deviation enter 5
- Since we do not have a column of data select the radio button for Input Constant and enter 0.9999
- Click OK
- The output is as follows:

To calculate percentiles in Minitab Express:

- Open Minitab Express without data.
- From the menu bar, select Statistics > Probability Distributions > CDF/PDF > Inverse (ICDF).
- Since we do not have a column of data, make sure the form of input is "A single value", and enter 0.9999.
- Make sure the distribution is "Normal", and enter the mean of 65 and the standard deviation of 5.
- Under output, select "Display a table of inverse cumulative probabilities".
- Click OK.

The result should be the following output:

Note:

- The 99.99th percentile of speeds (when mean = 65 and standard deviation = 5) is about 83.6 mph. Output from Minitab follows. Notice that now the specified cumulative probability is given first, and then the corresponding speed.

- The 25th percentile of pulse rates (when μ = 75 and σ = 8) is about 69.6. Relevant Minitab output is:
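Both percentiles can also be cross-checked outside Minitab. In Python, the inverse cumulative probability is available as `NormalDist.inv_cdf`; the variable names below are chosen for this sketch.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)
pulse_rates = NormalDist(mu=75, sigma=8)

# inv_cdf takes a cumulative probability and returns the matching value.
speed_9999 = speeds.inv_cdf(0.9999)
pulse_25 = pulse_rates.inv_cdf(0.25)

print(round(speed_9999, 1), round(pulse_25, 1))
```

As with the Minitab dialog, the input is the cumulative probability (0.9999 or 0.25) and the output is the value of the variable.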

Remember binomial random variables from last week's discussion? A binomial random variable can also be approximated using the normal random variable methods discussed above. This approximation can be used as long as:

- The population size is **at least** 10 times the sample size.
- *np* ≥ 10 **and** *n*(1 − *p*) ≥ 10. [These constraints take care of population shapes that are unbalanced because *p* is too close to 0 or to 1.]

The mean of a binomial random variable is easy to grasp intuitively: Say the probability of success for each observation is 0.2 and we make 10 observations. Then on the average we should have 10 * 0.2 = 2 successes. The spread of a binomial distribution is not so intuitive, so we will not justify our formula for standard deviation.

If the sample count X of successes is a binomial random variable for *n* fixed observations with probability of success *p* for each observation, then X has the mean and standard deviation discussed in section 8.4:

\(Mean=np\) and \(Standard\ Deviation=\sqrt {np(1-p)}\)

And as long as the above two requirements for *n* and *p* are satisfied, we can approximate X with a normal random variable having the same mean and standard deviation and use the normal calculations discussed previously in these notes to solve for probabilities for X.
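To get a feel for how close the approximation is, the sketch below compares an exact binomial cumulative probability with its normal approximation for the True-False example (n = 30, p = 0.5). The cutoff of 18 correct answers is an arbitrary value picked for illustration, the function names are assumptions, and only the conditions on *np* and *n*(1 − *p*) are checked because the code has no population size to test.

```python
from math import comb, sqrt
from statistics import NormalDist


def normal_approx_ok(n, p):
    """Check the conditions np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


def binomial_cdf_exact(x, n, p):
    """Exact P(X <= x) from the binomial formula."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def binomial_cdf_normal(x, n, p):
    """Approximate P(X <= x) using a normal curve with the same mean and standard deviation."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(x)


n, p, x = 30, 0.5, 18  # x = 18 is an illustrative choice
print(normal_approx_ok(n, p))
print(binomial_cdf_exact(x, n, p), binomial_cdf_normal(x, n, p))
```

The two printed probabilities should be reasonably close but not identical, since the normal curve is only an approximation to the binomial distribution.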

When reviewing any of the normal probability examples above, keep in mind that they apply when:

1. The variable in question follows a normal, or bell-shaped, distribution.

2. If the variable is not standardized, then you need to standardize the value first using \(z=\frac{value-mean}{s.d.} = \frac{x-\mu}{\sigma}\).
