Upon successful completion of this lesson, you should be able to:

- find the area under a normal distribution using Minitab Express and StatKey.
- compute a test statistic for a normally distributed sampling distribution.
- construct a confidence interval at any level of confidence using a normally distributed sampling distribution.
- compute the mean and standard deviation of a binomial distribution.
- find probabilities associated with a binomial distribution using Minitab Express.

Over the last three lessons you have approximated sampling distributions using bootstrapping and randomization methods. You may have noticed that many of the distributions that you constructed had similar shapes, such as those below:

These four distributions are all approximately normally distributed. You were first introduced to the normal distribution in Lesson 2 as a special type of symmetrical distribution also known as a "bell-shaped" distribution.

In this lesson we will learn about how many sampling distributions can be approximated using a normal distribution. At the end of this lesson you will also be introduced to the binomial distribution which can also be used to approximate some sampling distributions for proportions when sample sizes are smaller.

The content on this page is covered in section P.5 of the Lock^5 textbook and is also reviewed in section 5.1.

A normal distribution is a bell-shaped distribution. Theoretically, a normal distribution is continuous and may be depicted as a density curve, such as the one below:

The distribution plot above is a **standard normal distribution**. A standard normal distribution has a mean of 0 and standard deviation of 1. This is also known as a **z distribution**. You may see the notation \(N(\mu, \sigma)\) where N signifies that the distribution is normal, \(\mu\) is the mean, and \(\sigma\) is the standard deviation. A z distribution is \(N(0,1)\).

While we cannot determine the probability for any one given value because the distribution is continuous, we can determine the probability for a given interval of values. The probability for an interval is equal to the area under the density curve. The total area under the curve is 1.00, or 100%. In other words, 100% of observations fall under the curve.

In Lesson 4 you used the standard error method to construct a confidence interval. This is one application of the normal distribution. When you constructed confidence intervals using the standard error method, you used a multiplier of 2 because approximately 95% of the normal distribution falls between \(z=-2\) and \(z=+2\):

On the next few pages you'll learn how to use Minitab Express and StatKey to construct probability distribution plots. Then, we'll see how we can construct these plots for sampling distributions to construct confidence intervals or to compute p-values.

Before moving forward to learn how we can use Minitab Express and StatKey to work with normal distributions, let's review a few related topics from Lesson 2.

Recall from Lesson 2, the Empirical Rule can be used to estimate the proportion of observations that should fall within the intervals of one, two, and three standard deviations of the mean on a normal distribution:

Middle 68% of observations: \(\mu\pm 1(\sigma)\)

Middle 95% of observations: \(\mu\pm 2(\sigma)\)

Middle 99.7% of observations: \(\mu\pm 3(\sigma)\)

Suppose that SAT-Math scores are \(N(500, 100)\). Let's apply the Empirical Rule to determine the SAT-Math scores that separate the middle 68% of scores, the middle 95% of scores, and the middle 99.7% of scores.

Middle 68%: \(500\pm1(100)=[400, 600]\)

Middle 95%: \(500\pm2(100)=[300, 700]\)

Middle 99.7%: \(500\pm 3(100)= [200, 800]\)
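The Empirical Rule intervals above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`, which is not one of the tools used in this lesson; the helper name `middle_interval` is made up for illustration.

```python
from statistics import NormalDist

# SAT-Math scores modeled as N(500, 100), as in the example above.
sat_math = NormalDist(mu=500, sigma=100)

def middle_interval(dist, k):
    """Return (lower, upper, proportion) for the interval mean +/- k standard deviations."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    proportion = dist.cdf(upper) - dist.cdf(lower)
    return lower, upper, proportion

for k in (1, 2, 3):
    lower, upper, proportion = middle_interval(sat_math, k)
    print(f"within {k} SD: [{lower:.0f}, {upper:.0f}] contains {proportion:.4f}")
```

The computed proportions are close to, but not exactly, 68%, 95%, and 99.7%, which is why the Empirical Rule is described as an estimate.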

In Lesson 2 we wanted to describe one observation in relation to the distribution of all observations. We did this using a **z score**.

**z score**: Distance between an individual score and the mean in standard deviation units; also known as a standardized score

*z* Score

\[z=\frac{x - \overline{x}}{s}\]

*z* = *z* score

*x* = original data value

\(\overline{x}\) = mean of the original distribution

*s* = standard deviation of the original distribution

This equation could also be rewritten in terms of population values: \(z=\frac{x-\mu}{\sigma}\)

With algebra, we can solve for x: \(x=\mu+z\sigma\). In doing so we can transform a score on a z distribution [\(N(0,1)\)] to any other normal distribution with a mean of \(\mu\) and standard deviation of \(\sigma\).
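As a small illustration of the two formulas, the sketch below standardizes a score and then converts it back. The function names and the score of 650 are hypothetical; only the \(N(500, 100)\) SAT-Math scale comes from this lesson.

```python
def z_score(x, mean, sd):
    """Standardize x: distance from the mean in standard deviation units."""
    return (x - mean) / sd

def from_z(z, mean, sd):
    """Convert a z score back to the original scale: x = mean + z * sd."""
    return mean + z * sd

# Hypothetical SAT-Math score of 650 on the N(500, 100) scale.
z = z_score(650, mean=500, sd=100)
x = from_z(z, mean=500, sd=100)
print(z, x)  # 1.5 650.0
```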

Minitab Express can be used to find the proportion of a normal distribution in a given range. For example, if we know that at one highway location vehicles' speeds are normally distributed with a mean of 65 mph and a standard deviation of 5 mph, we can use that information to determine what proportion of vehicles are going under or over the speed limit. In the next pages we will use this scenario to identify the proportion of vehicles at that spot going different speeds.

The **cumulative probability** for a value is the probability less than or equal to that value. In notation, this is \(P(X\leq x)\).

The proportion at or below a given value is also known as a **percentile**.

Let's look at an example in Minitab Express.

**Scenario:** Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

**Question:** What is the probability that a randomly selected vehicle will be going 73 mph or slower?

To calculate a probability for values **less than** a given value in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**

  On a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified X value*
- Select *Left tail*
- For *X value* enter 73

This should result in the following output:
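If Minitab Express is not at hand, the same left-tail area can be cross-checked in code. This is a sketch using Python's standard-library `statistics.NormalDist`, which is an assumption of this example and not a tool from the lesson.

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph.
speeds = NormalDist(mu=65, sigma=5)

# Cumulative (left-tail) probability P(X <= 73).
p_at_most_73 = speeds.cdf(73)
print(round(p_at_most_73, 4))  # 0.9452
```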

Sometimes we want to know the probability that a variable has a value greater than some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written \(P(X > 73)\).

Previously we found \(P(X \leq 73)=.9452\). The general rule for a "greater than" situation is \(P(X > x)=1-P(X \leq x)\). Thus, \(P(X>73)=1-.9452=.0548\). The probability that a randomly selected vehicle will be going faster than 73 mph is .0548.

If we did not know \(P(X \leq73)\) we could compute this probability by constructing a probability distribution in Minitab Express.

To calculate a probability for values **greater than** a given value in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**

  On a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified X value*
- Select *Right tail*
- For *X value* enter 73

This should result in the following output:
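The complement rule \(P(X > x)=1-P(X \leq x)\) can also be applied directly in code. The sketch below again assumes Python's standard-library `statistics.NormalDist` in place of Minitab Express.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# Right-tail probability P(X > 73) as the complement of the left tail.
p_above_73 = 1 - speeds.cdf(73)
print(round(p_above_73, 4))  # 0.0548
```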

Suppose we want to know the probability that a normal random variable is **within** a specified interval. For instance, suppose we want to know the probability that a randomly selected vehicle is going between 60 and 73 mph.

To calculate a probability for values **in between** given values in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**

  On a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified X value*
- Select *Middle*
- For *X value 1* enter 60 and for *X value 2* enter 73

This should result in the following output:
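An interval probability is the difference between two cumulative probabilities. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the Minitab Express steps:

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# P(60 <= X <= 73) = P(X <= 73) - P(X <= 60)
p_between = speeds.cdf(73) - speeds.cdf(60)
print(round(p_between, 4))  # 0.7865
```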

We can also use Minitab Express to find a score or scores associated with a given proportion of a normal distribution. We will continue with our example involving car speeds with \(\mu = 65\) and \(\sigma = 5\). What speed separates the top 10% of cars from the bottom 90% of cars?

To find a score associated with a proportion of a normal distribution in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**

  On a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified probability*
- Select *Right tail*
- For *Probability* enter 0.10

Note: You could also select *Left tail* and enter 0.90 as the probability.

This should result in the following output:
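Finding a score from a proportion is the inverse of finding a cumulative probability. The sketch below uses `inv_cdf` from Python's standard-library `statistics.NormalDist` (an assumption of this example, not a lesson tool); it takes the left-tail area, so the top 10% corresponds to 0.90.

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# Speed with 90% of vehicles at or below it (and 10% above it).
cutoff_top_10 = speeds.inv_cdf(0.90)
print(round(cutoff_top_10, 2))  # 71.41
```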

What speeds separate the middle 90% of cars from the most extreme 10% of cars? Note that the outer 10% will be split evenly between the right and left tails in this scenario.

To find a range of scores associated with a proportion of a normal distribution in Minitab Express:

- On a **PC**: from the menu select **STATISTICS > Distribution Plot**

  On a **Mac**: from the menu select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Normal* (Note: This is the default)
- For *Mean* enter 65
- For *Standard deviation* enter 5
- Select *A specified probability*
- Select *Middle*
- For *Probability 1* enter 0.05
- For *Probability 2* enter 0.05

Note: You could also select *Equal tails* and enter 0.10 as the area in the tails

This should result in the following output:
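With 5% in each tail, the two boundary speeds are the 5th and 95th percentiles. A sketch of that calculation, assuming Python's standard-library `statistics.NormalDist` instead of Minitab Express:

```python
from statistics import NormalDist

speeds = NormalDist(mu=65, sigma=5)

# Middle 90%: leave 5% in the left tail and 5% in the right tail.
lower_speed = speeds.inv_cdf(0.05)
upper_speed = speeds.inv_cdf(0.95)
print(round(lower_speed, 2), round(upper_speed, 2))  # 56.78 73.22
```

Because the normal distribution is symmetric, the two boundaries are the same distance from the mean of 65.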

This video walks through one example of using Minitab Express to find the scores that separate different given proportions of scores. A group of instructors have decided to assign grades on a curve. Given the mean and standard deviation of their students' scores, they want to know what point ranges are associated with each letter grade.

Similar to Minitab Express, StatKey may be used to determine the area under the normal distribution.

As we saw at the beginning of this lesson, many of the sampling distributions that you have constructed and worked with this semester are approximately normally distributed. The **Central Limit Theorem** states that if the sample size is sufficiently large then the sampling distribution will be approximately normally distributed for many frequently tested statistics, such as those that we have been working with in this course: one sample mean, one sample proportion, difference in two means, difference in two proportions, the slope of a simple linear regression model, and Pearson's *r* correlation. Over the next few lessons we will examine what constitutes a "sufficiently large" sample size. Essentially, it is determined by the point at which the sampling distribution becomes approximately normally distributed.

In practice, when we construct confidence intervals and conduct hypothesis tests we often use the normal distribution (or *t* distributions which you'll see next week) as opposed to bootstrapping or randomization procedures in situations when the sampling distribution is approximately normal.

When we can approximate the sampling distribution using a normal distribution we can use some general formulas to compute test statistics and confidence intervals. In the remaining lessons of the course you will be given the formulas for computing test statistics and standard errors. Here, we are going to look at the general concepts that underlie all of the formulas that you will see in the following weeks.

When using a normal distribution, the test statistic is the standardized value that is the boundary of the p-value. Recall the formula for a z score: \(z=\frac{x-\overline x}{s}\). The formula for a test statistic will be similar. When conducting a hypothesis test the sampling distribution will be centered on the null parameter and the standard deviation is known as the standard error.

**General Form of a Test Statistic**

\[test\;statistic=\frac{sample\;statistic-null\;parameter}{standard\;error}\]

This formula puts our observed sample statistic on a standard scale (e.g., z distribution).

The normal distribution can also be used to construct confidence intervals. You used this method when you first learned to construct confidence intervals using the standard error method. Recall the formula you used:

**95% Confidence Interval**

sample statistic \(\pm\) 2 (standard error)

The 2 in this formula comes from the normal distribution. According to the 95% Rule, approximately 95% of a normal distribution falls within 2 standard deviations of the mean. Using the normal distribution, we can construct a confidence interval for any level using the following general formula:

**General Form of a Confidence Interval**

sample statistic \(\pm\) z* (standard error)

\(z^*\) is the multiplier

The \(z^*\) multiplier can be found on a z distribution. For a 90% confidence interval, for example, we would find the z scores that separate the middle 90% of the z distribution from the outer 10% of the z distribution.
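To see how the multiplier changes with the confidence level, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The function name `z_star` is made up for this illustration; in this course the multiplier is found with Minitab Express or StatKey.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """z score that leaves (1 - confidence_level) / 2 in each tail of N(0, 1)."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

For 95% confidence the multiplier is about 1.96, which the standard error method rounds to 2.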

**Research question**: Are more than 50% of all World Campus STAT 200 students female?

\(H_0: p=0.50\)

\(H_a: p>0.50\)

Data were collected from a representative sample of 501 World Campus STAT 200 students. In that sample, 284 students were female and 217 were male.

In the next lessons we'll learn how to compute the standard error using formulas. Here, let's use StatKey to conduct a randomization test to estimate the standard error:

This randomization distribution gives us a standard error of 0.022.

We can estimate the sampling distribution using a normal distribution with a mean of 0.50 (the value from our null hypothesis) and the standard error of 0.022:

We could use this distribution to compute the p-value. This is a right-tailed test. Our p-value is the area under this curve that is greater than our observed sample proportion of \(\widehat p = 284/501 = 0.567\)

Our p-value is 0.0011616

We could also compute a z test statistic and use that value to find the p-value:

\(z=\frac{sample\;statistic-null\;parameter}{standard\;error}\)

\(z=\frac{0.567-0.50}{0.022}=3.045\)

Our p-value will be the area under the z distribution [\(N(0,1)\)] above \(z=3.045\)

Our p-value is 0.0011634

The slight variation in p-values between these two methods is due to rounding.
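Both routes to the p-value can be reproduced in a few lines. This sketch assumes Python's standard-library `statistics.NormalDist` and takes the rounded sample proportion (0.567) and the StatKey standard error (0.022) from the example as given inputs.

```python
from statistics import NormalDist

p_hat = 0.567        # observed sample proportion, 284/501 rounded
null_p = 0.50        # value of p under the null hypothesis
std_error = 0.022    # standard error from the randomization distribution

# Method 1: area above p_hat on N(0.50, 0.022).
null_dist = NormalDist(mu=null_p, sigma=std_error)
p_value_direct = 1 - null_dist.cdf(p_hat)

# Method 2: standardize, round z to 3 places as in the text, then use N(0, 1).
z = round((p_hat - null_p) / std_error, 3)
p_value_from_z = 1 - NormalDist().cdf(z)

print(z, p_value_direct, p_value_from_z)
```

The two p-values agree to about five decimal places; the small gap comes from rounding z before looking up the area.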

We want to estimate the proportion of all Reese's Pieces that are orange with 90% confidence. In a random sample of 150 Reese's Pieces, 72 were orange.

We will estimate this population proportion by constructing a 90% confidence interval.

**General Form of a Confidence Interval**

sample statistic \(\pm\) z* (standard error)

\(z^*\) is the multiplier

The sample statistic here is \(\widehat p = \frac{72}{150} = 0.48\)

The \(z^*\) multiplier can be found using Minitab Express or StatKey. This is the z score that separates the middle 90% of the z distribution from the outer 10% (i.e., 5% on the left and 5% on the right).

For a 90% confidence interval, \(z^* = 1.64485\)

The standard error can be estimated using bootstrapping methods in StatKey.

From the StatKey output above, the standard error is 0.041

We can combine this information to construct the confidence interval.

\(0.48 \pm 1.64485 (0.041)\)

\(0.48 \pm 0.067\)

\([0.413, 0.547]\)

We are 90% confident that in the population of all Reese's Pieces the proportion that are orange is between 0.413 and 0.547.
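The arithmetic for this interval can be sketched in code. The example assumes Python's standard-library `statistics.NormalDist` for the multiplier and takes the bootstrap standard error of 0.041 from the StatKey output as a given input.

```python
from statistics import NormalDist

p_hat = 72 / 150      # sample proportion of orange candies
std_error = 0.041     # bootstrap standard error reported by StatKey

# 90% confidence leaves 5% in each tail, so use the 95th percentile of N(0, 1).
z_star = NormalDist().inv_cdf(0.95)

margin = z_star * std_error
lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 3), round(upper, 3))  # 0.413 0.547
```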

Binomial probabilities are covered in section P.4 of the Lock^5 textbook.

With proportions, when the sample size is small it is inappropriate to approximate the sampling distribution with a normal distribution. In those cases a binomial distribution may be used to approximate the sampling distribution.

**Binomial random variable**: A specific type of discrete random variable that counts how often a particular event occurs in a fixed number of tries or trials

For a variable to be a **binomial random variable**, **ALL** of the following conditions must be met:

- There are a fixed number of trials (a fixed sample size)
- Each trial has only two possible outcomes: the event of interest either occurs or does not
- The probability of a success is the same on each trial
- Trials are independent of one another

- Number of correct guesses at 30 true-false questions when you randomly guess all answers
- Number of winning lottery tickets when you buy 10 tickets of the same kind
- Number of left-handers in a randomly selected sample of 100 unrelated people
- Number of tails when flipping a coin 10 times

**Notation**

*n* = number of trials

*p* = probability event of interest occurs on any one trial

Number of correct guesses at 30 true-false questions when you randomly guess all answers.

There are 30 trials, therefore *n* = 30.

There are two possible outcomes (true and false) that are equally probable, therefore \(p = \frac{1}{2} = 0.5\).

Here you will learn how to compute a binomial probability by hand. When possible, you should use Minitab Express to perform these calculations. You will learn how to do this on the next page.

The formula below is used to compute the probability of exactly k successes in n trials.

**Binomial Random Variable Probability**

\[P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}\]

*n* = number of trials

*k* = number of successes

*p* = probability event of interest occurs on any one trial

\(\binom{n}{k}\) is the notation for a combination. For a review of combinations see the algebra review page in Lesson 0.

Cross-fertilizing a red flower and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring. Find the probability that there will be no red flowered plants in the five offspring.

The number of red flowered plants has a binomial distribution with n = 5, p = .25

We want to find \(P(X=0)\)

\(P(X=0)=\binom{5}{0}0.25^0(1-0.25)^{5-0}=0.237\)

There is a 23.7% chance that none of the five plants will be red flowered.
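The by-hand calculation translates directly into code. The sketch below uses `math.comb` from Python's standard library for the combination; the function name `binomial_pmf` is made up for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Flower example: n = 5 offspring, p = 0.25 chance of red, k = 0 red plants.
p_no_red = binomial_pmf(0, n=5, p=0.25)
print(round(p_no_red, 3))  # 0.237
```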

When possible you should use Minitab Express to compute binomial probabilities. This is much more efficient than performing calculations by hand.

Minitab Express can be used to construct binomial distributions and compute \(P(X=k)\):

When flipping a fair coin the probability of flipping heads is 0.50. Let's say that we are going to be flipping a fair coin 10 times and we want to know the probability of flipping heads exactly five times.

To compute a binomial probability in Minitab Express:

- On a **PC**: Select **STATISTICS > Distribution Plot**

  On a **Mac**: Select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Binomial*
- For *Number of trials* enter 10
- For *Event probability* enter 0.5
- Select *A specified X value*
- Select *Middle*
- For both *X value 1* and *X value 2* enter 5

The output is a binomial distribution plot. The area that is shaded in red is the proportion of the distribution where X=5.

\(P(X=5)=0.246094\)

You can also use Minitab Express to find the probability of a range of outcomes:

Let's say that we are going to be flipping a fair coin 10 times and we want to know the probability of flipping heads five or more times.

To compute a binomial probability for a range of outcomes in Minitab Express:

- On a **PC**: Select **STATISTICS > Distribution Plot**

  On a **Mac**: Select **Statistics > Probability Distributions > Distribution Plot**
- Select *Display Probability*
- For *Distribution* select *Binomial*
- For *Number of trials* enter 10
- For *Event probability* enter 0.5
- Select *A specified X value*
- Select *Right tail*
- For *X value* enter 5

The output is a binomial distribution plot displaying \(P(X \geq 5)=0.623047\)
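Both coin-flip probabilities can be verified by summing terms of the binomial formula. This is a sketch in plain Python (standard library only), not a replacement for the Minitab Express steps; the function names are made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_right_tail(k, n, p):
    """P(X >= k): add the probabilities of k, k + 1, ..., n successes."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))

# Fair coin flipped 10 times.
p_exactly_5 = binomial_pmf(5, n=10, p=0.5)
p_at_least_5 = binomial_right_tail(5, n=10, p=0.5)
print(round(p_exactly_5, 6), round(p_at_least_5, 6))  # 0.246094 0.623047
```

Because the binomial distribution is discrete, the right tail includes the value 5 itself, which is why \(P(X \geq 5)\) is larger than 0.5.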

Given the sample size (\(n\)) and the probability of a success (\(p\)), we can compute the mean and standard deviation for a binomial random variable. The mean (\(\mu\)) is also known as the expected value (\(E(X)\)).

**Mean of a Binomial Random Variable**

\[\mu=np\]

Also known as \(E(X)\)

**Standard Deviation of a Binomial Random Variable**

\[\sigma=\sqrt {np(1-p)}\]

In both formulas \(n\) is the number of trials and \(p\) is the probability of success on any one trial.

Cross-fertilizing a red flower and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring. Find the mean and standard deviation for the number of red flowers.

The number of red flowered plants has a binomial distribution with n = 5, p = 0.25

\(\mu=5(0.25) = 1.25\)

\(\sigma = \sqrt{5(0.25)(1-0.25)}=\sqrt{0.9375} =0.968\)

The mean number of red flowered plants when five plants are planted is 1.25 with a standard deviation of 0.968.
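The two formulas are short enough to check in code. A minimal sketch in plain Python, with a made-up helper name `binomial_mean_sd`:

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of a binomial random variable."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return mean, sd

# Flower example: n = 5 offspring, p = 0.25 chance of red.
mean_red, sd_red = binomial_mean_sd(n=5, p=0.25)
print(mean_red, round(sd_red, 3))  # 1.25 0.968
```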

In this lesson we examined two commonly used probability distributions: the normal distribution and binomial distribution. In the next few lessons you will learn how these distributions are often used to construct confidence intervals and conduct hypothesis tests.
