Lesson 7: Common Distributions

Lesson 7 Learning Objectives

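Upon successful completion of this lesson, you should be able to:

  • find the area under a normal distribution using Minitab Express and StatKey.
  • compute a test statistic for a normally distributed sampling distribution.
  • construct a confidence interval at any level of confidence using a normally distributed sampling distribution.
  • compute the mean and standard deviation of a binomial distribution.
  • find probabilities associated with a binomial distribution using Minitab Express.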

Over the last three lessons you have approximated sampling distributions using bootstrapping and randomization methods. You may have noticed that many of the distributions that you constructed had similar shapes, such as those below:

Randomization distribution for a proportion; bootstrap distribution for a single mean; bootstrap distribution for a correlation; randomization distribution for a difference in means

These four distributions are all approximately normally distributed. You were first introduced to the normal distribution in Lesson 2 as a special type of symmetrical distribution also known as a "bell-shaped" distribution.

In this lesson we will see that many sampling distributions can be approximated using a normal distribution. At the end of this lesson you will also be introduced to the binomial distribution, which can be used to approximate some sampling distributions for proportions when sample sizes are small.

7.1 - The Normal Distribution

The content on this page is covered in section P.5 of the Lock^5 textbook and is also reviewed in section 5.1.

A normal distribution is a bell-shaped distribution. Theoretically, a normal distribution is continuous and may be depicted as a density curve, such as the one below: 

Standard normal distribution plot from Minitab Express

The distribution plot above is a standard normal distribution. A standard normal distribution has a mean of 0 and a standard deviation of 1. This is also known as a z distribution. You may see the notation \(N(\mu, \sigma)\), where N signifies that the distribution is normal, \(\mu\) is the mean, and \(\sigma\) is the standard deviation. A z distribution is \(N(0,1)\).

Because the distribution is continuous, we cannot determine a probability for any one exact value (that probability is 0), but we can determine the probability for an interval of values. The probability for an interval is equal to the area under the density curve over that interval. The total area under the curve is 1.00, or 100%. In other words, 100% of observations fall under the curve.

 

In Lesson 4 you used the standard error method to construct a confidence interval; this is one application of the normal distribution. When you constructed confidence intervals using the standard error method, you used a multiplier of 2 because approximately 95% of the normal distribution falls between \(z=-2\) and \(z=+2\):

 

Standard normal distribution in StatKey for ±2

On the next few pages you'll learn how to use Minitab Express and StatKey to construct probability distribution plots. Then we'll see how to construct these plots for sampling distributions in order to build confidence intervals and compute p-values.

7.1.1 - Reviews

Before moving forward to learn how we can use Minitab Express and StatKey to work with normal distributions, let's review a few related topics from Lesson 2.

Empirical Rule / 95% Rule

distribution displaying the features of the empirical rule

Recall from Lesson 2 that the Empirical Rule can be used to estimate the proportion of observations that should fall within one, two, and three standard deviations of the mean of a normal distribution:

Middle 68% of observations: \(\mu\pm 1(\sigma)\)

Middle 95% of observations: \(\mu\pm 2(\sigma)\)

Middle 99.7% of observations: \(\mu\pm 3(\sigma)\)

Example: SAT-Math Scores

SAT-Math scores are \(N(500, 100)\). Let's apply the Empirical Rule to determine the SAT-Math scores that separate the middle 68% of scores, the middle 95% of scores, and the middle 99.7% of scores.

Middle 68%: \(500\pm1(100)=[400, 600]\)

Middle 95%: \(500\pm2(100)=[300, 700]\)

Middle 99.7%: \(500\pm 3(100)= [200, 800]\)
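The Empirical Rule percentages are rounded. As a minimal sketch (assuming Python 3.8 or later; `statistics.NormalDist` is part of Python's standard library, not the course software), the code below builds the three SAT-Math intervals from \(\mu = 500\) and \(\sigma = 100\) and reports the exact proportion of the normal distribution inside each one.

```python
from statistics import NormalDist

# SAT-Math scores modeled as N(500, 100), as in the example above
mu, sigma = 500, 100
sat = NormalDist(mu, sigma)

intervals = {}
for k in (1, 2, 3):
    # Empirical Rule interval: mean +/- k standard deviations
    lower, upper = mu - k * sigma, mu + k * sigma
    # Exact area under the normal curve between the two bounds
    exact = sat.cdf(upper) - sat.cdf(lower)
    intervals[k] = (lower, upper, exact)
    print(f"mean +/- {k} SD: [{lower}, {upper}] contains {exact:.4f} of scores")
```

The exact proportions come out to about 0.6827, 0.9545, and 0.9973, which the Empirical Rule rounds to 68%, 95%, and 99.7%.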

 

z Scores

In Lesson 2 we wanted to describe one observation in relation to the distribution of all observations. We did this using a z score.

z score: Distance between an individual score and the mean in standard deviation units; also known as a standardized score.

z Score

\[z=\frac{x - \overline{x}}{s}\]

z = z score
x = original data value
\(\overline{x}\) = mean of the original distribution
s = standard deviation of the original distribution

This equation could also be rewritten in terms of population values: \(z=\frac{x-\mu}{\sigma}\)

With algebra, we can solve for x: \(x=\mu+z\sigma\). In doing so we can transform a score on a z distribution [\(N(0,1)\)] to any other normal distribution with a mean of \(\mu\) and standard deviation of \(\sigma\). 
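The two formulas can be sketched as a pair of small Python functions. The function names are hypothetical, and the score of 650 is an arbitrary illustrative value, not from the lesson; the SAT-Math parameters \(N(500, 100)\) come from the example above.

```python
def z_score(x, mu, sigma):
    # Distance between x and the mean, in standard deviation units
    return (x - mu) / sigma

def x_from_z(z, mu, sigma):
    # Inverse transformation: x = mu + z * sigma
    return mu + z * sigma

# Hypothetical SAT-Math score of 650 on N(500, 100)
z = z_score(650, 500, 100)
# Score that sits 2 standard deviations below the mean of N(500, 100)
x = x_from_z(-2, 500, 100)
print(z, x)  # 1.5 300
```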

7.1.2 - Finding Probabilities using Minitab Express

Minitab Express can be used to find the proportion of a normal distribution in a given range. For example, if we know that at one highway location vehicle speeds are normally distributed with a mean of 65 mph and a standard deviation of 5 mph, we can use that information to determine what proportion of vehicles are going under or over the speed limit. On the next pages we will use this scenario to find the proportion of vehicles at that spot traveling at different speeds.

7.1.2.1 - "Less Than" Probabilities

The cumulative probability for a value is the probability of observing a value less than or equal to it. In notation, this is \(P(X\leq x)\).

The proportion of the distribution at or below a given value is also known as that value's percentile.

 

Let's look at an example in Minitab Express.

Scenario: Vehicle speeds at a highway location have a normal distribution with a mean of 65 mph and a standard deviation of 5 mph.

Question: What is the probability that a randomly selected vehicle will be going 73 mph or slower?

Using Minitab Express

To calculate a probability for values less than a given value in Minitab Express:

  1. On a PC: from the menu select STATISTICS > Distribution Plot
    On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
  2. Select Display Probability 
  3. For Distribution select Normal (Note: This is the default)
  4. For Mean enter 65
  5. For Standard deviation enter 5
  6. Select A specified X value
  7. Select Left tail
  8. For X value enter 73

This should result in the following output:

Minitab Express Output: Probability Less Than 73 mph
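As a cross-check outside Minitab Express, the same left-tail probability can be computed with Python's standard-library `statistics.NormalDist` (assuming Python 3.8 or later). Its `cdf` method returns \(P(X \leq x)\).

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph
speeds = NormalDist(mu=65, sigma=5)

# Left-tail (cumulative) probability P(X <= 73)
p_left = speeds.cdf(73)
print(round(p_left, 4))  # 0.9452
```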


7.1.2.2 - "Greater than" Probabilities

Sometimes we want to know the probability that a variable has a value greater than some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written \(P(X > 73)\).

Previously we found \(P(X \leq 73)=.9452\). The general rule for a "greater than" situation is \(P(X > x)=1-P(X \leq x)\). Thus, \(P(X>73)=1-.9452=.0548\). The probability that a randomly selected vehicle will be going faster than 73 mph is .0548.

If we did not know \(P(X \leq 73)\), we could compute this probability by constructing a probability distribution plot in Minitab Express.

Using Minitab Express

To calculate a probability for values greater than a given value in Minitab Express:

  1. On a PC: from the menu select STATISTICS > Distribution Plot
    On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
  2. Select Display Probability 
  3. For Distribution select Normal (Note: This is the default)
  4. For Mean enter 65
  5. For Standard deviation enter 5
  6. Select A specified X value
  7. Select Right tail
  8. For X value enter 73

This should result in the following output:

distribution plot for normal distribution where mean=65 and sd=5
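The complement rule can be sketched the same way, again assuming Python's standard-library `statistics.NormalDist` as a stand-in for Minitab Express:

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph
speeds = NormalDist(mu=65, sigma=5)

# Complement rule: P(X > x) = 1 - P(X <= x)
p_right = 1 - speeds.cdf(73)
print(round(p_right, 4))  # 0.0548
```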


7.1.2.3 - "In between" Probabilities

Suppose we want to know the probability that a normal random variable falls within a specified interval. For instance, what is the probability that a randomly selected vehicle is traveling between 60 and 73 mph?

Using Minitab Express

To calculate a probability for values in between given values in Minitab Express:

  1. On a PC: from the menu select STATISTICS > Distribution Plot
    On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
  2. Select Display Probability 
  3. For Distribution select Normal (Note: This is the default)
  4. For Mean enter 65
  5. For Standard deviation enter 5
  6. Select A specified X value
  7. Select Middle
  8. For X value 1 enter 60 and for X value 2 enter 73

This should result in the following output:

Minitab Express Output: Probability that the speed is between 60 and 73 mph
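An "in between" probability is the difference of two cumulative probabilities, \(P(60 < X < 73) = P(X \leq 73) - P(X \leq 60)\). A minimal Python sketch, assuming the standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph
speeds = NormalDist(mu=65, sigma=5)

# Area between 60 and 73 = area below 73 minus area below 60
p_between = speeds.cdf(73) - speeds.cdf(60)
print(round(p_between, 4))  # 0.7865
```

The result, about 0.7865, should agree with the Minitab Express output up to rounding.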


7.1.2.4 - Finding a Score Given a Probability

We can also use Minitab Express to find the score or scores associated with a given proportion of a normal distribution. We will continue with our example involving car speeds with \(\mu = 65\) and \(\sigma = 5\). What speed separates the top 10% of cars from the bottom 90% of cars?

Using Minitab Express - Find a Score for a Proportion

To find a score associated with a proportion of a normal distribution in Minitab Express:

  1. On a PC: from the menu select STATISTICS > Distribution Plot
    On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
  2. Select Display Probability 
  3. For Distribution select Normal (Note: This is the default)
  4. For Mean enter 65
  5. For Standard deviation enter 5
  6. Select A specified probability
  7. Select Right tail
  8. For Probability enter 0.10

Note: You could also select Left tail and enter 0.90 as the probability. 

This should result in the following output:

Finding a score given a probability
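Finding a score from a probability is the inverse of finding a cumulative probability. In Python's standard library (an assumption, not the course software), `NormalDist.inv_cdf(q)` returns the value with proportion `q` of the distribution at or below it, so the cutoff for the top 10% is `inv_cdf(0.90)`:

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph
speeds = NormalDist(mu=65, sigma=5)

# Speed with 90% of cars below it, and therefore 10% above it
cutoff = speeds.inv_cdf(0.90)
print(round(cutoff, 2))  # 71.41
```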


What speeds separate the middle 90% of cars from the most extreme 10% of cars? Note that in this scenario the outer 10% is split evenly between the right and left tails (5% in each).

Using Minitab Express - Find a Range of Scores for a Proportion

To find a range of scores associated with a proportion of a normal distribution in Minitab Express:

  1. On a PC: from the menu select STATISTICS > Distribution Plot
    On a Mac: from the menu select Statistics > Probability Distributions > Distribution Plot
  2. Select Display Probability 
  3. For Distribution select Normal (Note: This is the default)
  4. For Mean enter 65
  5. For Standard deviation enter 5
  6. Select A specified probability
  7. Select Middle
  8. For Probability 1 enter 0.05
  9. For Probability 2 enter 0.05

Note: You could also select Equal tails and enter 0.10 as the area in the tails.

This should result in the following output:

Finding a range of scores given a probability
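With 5% in each tail, the two cutoffs are the 5th and 95th percentiles. A sketch using Python's standard-library `statistics.NormalDist` (assumed here in place of Minitab Express):

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph
speeds = NormalDist(mu=65, sigma=5)

# 5% in each tail: the middle 90% lies between the 5th and 95th percentiles
lower = speeds.inv_cdf(0.05)
upper = speeds.inv_cdf(0.95)
print(round(lower, 2), round(upper, 2))  # 56.78 73.22
```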


7.1.2.5 - Video Example: Curving Scores

This video walks through one example of using Minitab Express to find the scores that separate different given proportions of scores. A group of instructors have decided to assign grades on a curve. Given the mean and standard deviation of their students' scores, they want to know what point ranges are associated with each letter grade.

7.1.3 - Finding Probabilities using StatKey

Similar to Minitab Express, StatKey may be used to determine the area under the normal distribution. 

7.2 - Central Limit Theorem

As we saw at the beginning of this lesson, many of the sampling distributions that you have constructed and worked with this semester are approximately normally distributed. The Central Limit Theorem states that if the sample size is sufficiently large, then the sampling distribution will be approximately normal for many frequently tested statistics, including those we have been working with in this course: a single mean, a single proportion, a difference in two means, a difference in two proportions, the slope of a simple linear regression model, and Pearson's r correlation. Over the next few lessons we will examine what constitutes a "sufficiently large" sample size. Essentially, a sample is large enough when the resulting sampling distribution is approximately normal.

In practice, when the sampling distribution is approximately normal, we often construct confidence intervals and conduct hypothesis tests using the normal distribution (or t distributions, which you'll see next week) instead of bootstrapping or randomization procedures.
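The Central Limit Theorem can be seen in a small simulation. The sketch below uses only Python's standard library; the exponential population (mean 1, standard deviation 1, skewness 2), the sample sizes, and the 5,000 repetitions are arbitrary choices for illustration, not from the course. It draws repeated samples, records each sample mean, and summarizes the resulting sampling distribution. As n grows, the skewness of the sample means shrinks toward 0 (a bell shape) and their standard deviation shrinks like \(1/\sqrt{n}\).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def sample_means(n, reps=5000):
    # Means of `reps` random samples of size n from a right-skewed exponential population
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    # Moment-based skewness; close to 0 for a symmetric, bell-shaped distribution
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

results = {}
for n in (1, 5, 50):
    means = sample_means(n)
    results[n] = (statistics.pstdev(means), skewness(means))
    print(f"n={n:>2}: SD of means={results[n][0]:.3f}, skewness={results[n][1]:.2f}")
```

For this population the theoretical skewness of the sample mean is \(2/\sqrt{n}\), so the simulated values should drop from roughly 2 at n = 1 to roughly 0.3 at n = 50.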

7.3 - Using the Normal Distribution

When we can approximate the sampling distribution using a normal distribution, we can use some general formulas to compute test statistics and confidence intervals. In the remaining lessons of the course you will be given the formulas for computing test statistics and standard errors. Here, we are going to look at the general concepts that underlie all of the formulas that you will see in the following weeks.

Test Statistics

When using a normal distribution, the test statistic is the standardized value that marks the boundary of the p-value area. Recall the formula for a z score: \(z=\frac{x-\overline x}{s}\). The formula for a test statistic is similar: when conducting a hypothesis test, the sampling distribution is centered on the null parameter, and its standard deviation is known as the standard error.

General Form of a Test Statistic

\[test\;statistic=\frac{sample\;statistic-null\;parameter}{standard\;error}\]

This formula puts our observed sample statistic on a standard scale (e.g., z distribution).

Confidence Intervals

The normal distribution can also be used to construct confidence intervals.  You used this method when you first learned to construct confidence intervals using the standard error method. Recall the formula you used:

95% Confidence Interval

sample statistic \(\pm\) 2 (standard error)

The 2 in this formula comes from the normal distribution: according to the 95% Rule, approximately 95% of a normal distribution falls within 2 standard deviations of the mean. Using the normal distribution, we can construct a confidence interval at any confidence level using the following general formula:

General Form of a Confidence Interval

sample statistic \(\pm z^*\) (standard error)

\(z^*\) is the multiplier

The \(z^*\) multiplier can be found on a z distribution. For a 90% confidence interval, for example, we would find the z scores that separate the middle 90% of the z distribution from the outer 10% of the z distribution.
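The multiplier for any confidence level can be computed by splitting the leftover area evenly between the two tails and finding the upper cutoff on \(N(0,1)\). A minimal sketch in Python, assuming the standard-library `statistics.NormalDist`; the helper name `z_star` is hypothetical:

```python
from statistics import NormalDist

std_normal = NormalDist()  # the z distribution, N(0, 1)

def z_star(confidence):
    # Area left over in the two tails, split evenly between left and right
    tail = (1 - confidence) / 2
    # Upper cutoff: the z score with (1 - tail) of the distribution below it
    return std_normal.inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

The 95% multiplier is about 1.960, which the standard error method rounds to 2.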

7.3.1 - Example: Hypothesis Testing

Research question: Are more than 50% of all World Campus STAT 200 students female?

\(H_0: p=0.50\)
\(H_a: p>0.50\)

Data were collected from a representative sample of 501 World Campus STAT 200 students. In that sample, 284 students were female and 217 were male. 

Later we'll learn how to compute the standard error using formulas. Here, let's use StatKey to conduct a randomization test to estimate the standard error:

StatKey randomization distribution plot

This randomization distribution gives us a standard error of 0.022. 

We can estimate the sampling distribution using a normal distribution with a mean of 0.50 (the value from our null hypothesis) and a standard error of 0.022:

Minitab Express Output for the sampling distribution

We could use this distribution to compute the p-value. This is a right-tailed test, so our p-value is the area under this curve to the right of our observed sample proportion, \(\widehat p = 284/501 = 0.567\).

Sampling distribution with p value shaded

Our p-value is 0.0011616

 

We could also compute a z test statistic and use that value to find the p-value:

\(z=\frac{sample\;statistic-null\;parameter}{standard\;error}\)

\(z=\frac{0.567-0.50}{0.022}=3.045\)

Our p-value will be the area under the z distribution [\(N(0,1)\)] above \(z=3.045\)

Minitab Express z Distribution with p-value

Our p-value is 0.0011634

The slight variation in p-values between these two methods is due to rounding. 
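Both p-value calculations above can be reproduced in a few lines. This sketch assumes Python's standard-library `statistics.NormalDist` in place of Minitab Express and uses the rounded values from the example (\(\widehat p = 0.567\), standard error 0.022):

```python
from statistics import NormalDist

p_hat = 0.567   # observed sample proportion (284/501, rounded)
p_null = 0.50   # null parameter from H0
se = 0.022      # standard error estimated from the StatKey randomization distribution

# Approach 1: right-tail area beyond p_hat on the sampling distribution N(0.50, 0.022)
p_value_sampling = 1 - NormalDist(p_null, se).cdf(p_hat)

# Approach 2: standardize first, then take the right-tail area on N(0, 1)
z = (p_hat - p_null) / se
p_value_z = 1 - NormalDist().cdf(z)

print(round(z, 3), round(p_value_sampling, 7), round(p_value_z, 7))
```

When z is not rounded, the two approaches give the same p-value (about 0.00116); the 0.0011634 in the text comes from rounding z to 3.045 before finding the area.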

7.3.2 - Example: Confidence Interval

We want to estimate the proportion of all Reese's Pieces that are orange with 90% confidence. In a random sample of 150 Reese's Pieces, 72 were orange. 

We will estimate this population proportion by constructing a 90% confidence interval. 

General Form of a Confidence Interval

sample statistic \(\pm z^*\) (standard error)

\(z^*\) is the multiplier

The sample statistic here is \(\widehat p = \frac{72}{150} = 0.48\)

The \(z^*\) multiplier can be found using Minitab Express or StatKey. This is the z score that separates the middle 90% of the z distribution from the outer 10% (i.e., 5% on the left and 5% on the right).

Minitab Express output for z multipliers, 90% CI

For a 90% confidence interval, \(z^* = 1.64485\) 

The standard error can be estimated using bootstrapping methods in StatKey.

StatKey Screenshot of Bootstrap Distribution

From the StatKey output above, the standard error is 0.041.

 

We can combine this information to construct the confidence interval.

\(0.48 \pm 1.64485 (0.041)\)

\(0.48 \pm 0.067\)

\([0.413, 0.547]\)

We are 90% confident that the proportion of all Reese's Pieces that are orange is between 0.413 and 0.547.
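The whole interval can be assembled in code. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` for \(z^*\) and the bootstrap standard error of 0.041 read from StatKey:

```python
from statistics import NormalDist

p_hat = 72 / 150   # sample proportion of orange pieces, 0.48
se = 0.041         # standard error from the StatKey bootstrap distribution
confidence = 0.90

# z* cuts off 5% in each tail of N(0, 1)
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# sample statistic +/- z* (standard error)
margin = z_star * se
lower, upper = p_hat - margin, p_hat + margin
print(f"{p_hat:.2f} +/- {margin:.3f} -> [{lower:.3f}, {upper:.3f}]")
```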

7.4 - Binomial Probabilities

Binomial probabilities are covered in section P.4 of the Lock^5 textbook.

With proportions, when the sample size is small it is inappropriate to approximate the sampling distribution with a normal distribution. In those cases a binomial distribution may be used to approximate the sampling distribution. 

 

 

Binomial random variable: A specific type of discrete random variable that counts how often a particular event occurs in a fixed number of tries or trials

For a variable to be a binomial random variable, ALL of the following conditions must be met:

  1. There are a fixed number of trials (a fixed sample size)
  2. Each trial has only two possible outcomes (a success or a failure)
  3. The probability of a success is the same on each trial
  4. Trials are independent of one another

Examples of Binomial Random Variables

  • Number of correct guesses at 30 true-false questions when you randomly guess all answers
  • Number of winning lottery tickets when you buy 10 tickets of the same kind
  • Number of left-handers in a randomly selected sample of 100 unrelated people
  • Number of tails when flipping a coin 10 times

Notation

n = number of trials

 p = probability event of interest occurs on any one trial

Example: True-False Test

Number of correct guesses at 30 true-false questions when you randomly guess all answers.
There are 30 trials, therefore n = 30.
Each guess is either correct or incorrect, and when you guess randomly these two outcomes are equally likely, therefore \(p = \frac{1}{2} = 0.5\).

7.4.1 - Formulas for Computing Binomial Probabilities

Here you will learn how to compute a binomial probability by hand. When possible, you should use Minitab Express to perform these calculations; you will learn how to do this on the next page.

The formula below is used to compute the probability of exactly k successes in n trials. 

Binomial Random Variable Probability

\[P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}\]

n = number of trials
k = number of successes
p = probability event of interest occurs on any one trial

\(\binom{n}{k}\) is the notation for a combination. For a review of combinations see the algebra review page in Lesson 0.

Example: Red Flowers

Cross-fertilizing a red flower and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring. Find the probability that there will be no red flowered plants in the five offspring.

The number of red flowered plants has a binomial distribution with n = 5, p = .25

We want to find \(P(X=0)\)

\(P(X=0)=\binom{5}{0}0.25^0(1-0.25)^{5-0}=0.237\)

There is a 23.7% chance that none of the five plants will be red flowered.
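The binomial formula translates directly into code. A minimal sketch assuming Python 3.8 or later (for `math.comb`); the function name `binomial_pmf` is hypothetical:

```python
from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Red flowers: n = 5 offspring, each red with probability p = 0.25
p_none = binomial_pmf(0, 5, 0.25)
print(round(p_none, 3))  # 0.237
```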

7.4.2 - Minitab Express: Binomial Probabilities

When possible you should use Minitab Express to compute binomial probabilities.  This is much more efficient than performing calculations by hand. 

Minitab Express can be used to construct binomial distributions and compute \(P(X=k)\):

Using Minitab Express to Compute Binomial Probability

When flipping a fair coin the probability of flipping heads is 0.50.  Let's say that we are going to be flipping a fair coin 10 times and we want to know the probability of flipping heads exactly five times.

To compute a binomial probability in Minitab Express:

  1. On a PC: Select STATISTICS > Distribution Plot
    On a Mac: Select Statistics > Probability Distributions > Distribution Plot
  2. Select Display Probability 
  3. For Distribution select Binomial
  4. For Number of trials enter 10
  5. For Event probability enter 0.50
  6. Select A specified X value
  7. Select Middle
  8. For both X value 1 and X value 2 enter 5

The output is a binomial distribution plot. The area that is shaded in red is the proportion of the distribution where X=5.

\(P(X=5)=0.246094\)

distribution plot for binomial where n=10 and p=0.5
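The same probability follows from the formula in 7.4.1. A short Python check (assuming Python 3.8 or later for `math.comb`), useful for confirming the Minitab Express output:

```python
from math import comb

# 10 flips of a fair coin, probability of exactly 5 heads
n, p, k = 10, 0.5, 5
p_exactly_5 = comb(n, k) * p**k * (1 - p) ** (n - k)
print(p_exactly_5)  # 0.24609375
```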


 

You can also use Minitab Express to find the probability of a range of outcomes:

Using Minitab Express to Compute Binomial Probability for a Range of Outcomes

Let's say that we are going to be flipping a fair coin 10 times and we want to know the probability of flipping heads five or more times.

To compute a binomial probability for a range of outcomes in Minitab Express:

  1. On a PC: Select Statistics > Distribution Plot
    On a Mac: Select Statistics > Probability Distributions > Distribution Plot
  2. Select Display Probability
  3. For Distribution select Binomial
  4. For Number of trials enter 10
  5. For Event probability enter 0.50
  6. Select A specified X value
  7. Select Right tail
  8. For X value enter 5

The output is a binomial distribution plot displaying \(P(X \geq 5)=0.623047\)

distribution plot for binomial where n=10 and p=0.5
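A right-tail binomial probability is a sum of individual probabilities, \(P(X \geq 5) = P(X=5) + P(X=6) + \dots + P(X=10)\). A Python sketch of that sum, again assuming Python 3.8 or later for `math.comb`:

```python
from math import comb

# 10 flips of a fair coin
n, p = 10, 0.5

# Right tail: add P(X = k) for k = 5, 6, ..., 10
p_at_least_5 = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(5, n + 1))
print(round(p_at_least_5, 6))  # 0.623047
```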


7.4.3 - Mean & Standard Deviation

Given the sample size (\(n\)) and the probability of a success (\(p\)), we can compute the mean and standard deviation for a binomial random variable. The mean (\(\mu\)) is also known as the expected value (\(E(X)\)).

Mean of a Binomial Random Variable

\[\mu=np\]

Also known as \(E(X)\)

Standard Deviation of a Binomial Random Variable

\[\sigma=\sqrt {np(1-p)}\]

In both formulas \(n\) is the number of trials and \(p\) is the probability of success on any one trial.

Example: Red Flowers

Cross-fertilizing a red flower and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring. Find the mean and standard deviation for the number of red flowers. 

The number of red flowered plants has a binomial distribution with n = 5, p = 0.25

\(\mu=5(0.25) = 1.25\)

\(\sigma = \sqrt{5(0.25)(1-0.25)}=\sqrt{0.9375} =0.968\)

The mean number of red flowered plants when five plants are planted is 1.25 with a standard deviation of 0.968.
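The two formulas in code, as a quick check of the red flower arithmetic (plain Python, standard library only):

```python
from math import sqrt

# Red flowers: n = 5 offspring, each red with probability p = 0.25
n, p = 5, 0.25
mean = n * p                  # mu = np
sd = sqrt(n * p * (1 - p))    # sigma = sqrt(np(1 - p))
print(mean, round(sd, 3))  # 1.25 0.968
```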

7.4.4 - Example: Multiple Choice Test

7.5 - Lesson 7 Summary

In this lesson we examined two commonly used probability distributions: the normal distribution and binomial distribution. In the next few lessons you will learn how these distributions are often used to construct confidence intervals and conduct hypothesis tests. 

Lesson 7 Learning Objectives

Upon successful completion of this lesson, you should be able to:

  • find the area under a normal distribution using Minitab Express and StatKey.
  • compute a test statistic for a normally distributed sampling distribution. 
  • construct a confidence interval at any level of confidence using a normally distributed sampling distribution.
  • compute the mean and standard deviation of a binomial distribution. 
  • find probabilities associated with a binomial distribution using Minitab Express.