6.2 - Sampling Distribution of the Sample Mean

To review, the Central Limit Theorem states that if a large enough sample is taken (typically \(n\geq 30\)), then the sampling distribution of \(\bar{x}\) is approximately a normal distribution with a mean of \(\mu\) and a standard deviation of \(\frac {\sigma}{\sqrt{n}}\).

Since in practice we usually do not know \(\mu\) or \(\sigma\), we estimate them by \(\bar{x}\) and \(s\), respectively, where \(s\) is the standard deviation of the sample. Substituting \(s\) for \(\sigma\), the standard deviation of \(\bar{x}\), \(\frac{\sigma}{\sqrt{n}}\), is estimated by \(\frac {s}{\sqrt{n}}\). This expression is known as the standard error of the mean, labeled \(SE(\overline{x})\) or \(s_{\overline{x}}\).

Standard Error of the Mean

\[SE(\overline{x})= \frac {\sigma}{\sqrt{n}}\]

If \(\sigma\) is unknown, estimate it using \(s\), giving \(SE(\overline{x})= \frac {s}{\sqrt{n}}\).
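As a minimal sketch, both versions of the formula can be written in Python. The function name `standard_error` and the four heights below are hypothetical, chosen only for illustration; `statistics.stdev` computes \(s\) with \(n-1\) in the denominator.

```python
import math
import statistics

def standard_error(sample, sigma=None):
    """Standard error of the sample mean.

    Uses sigma / sqrt(n) when the population SD sigma is known;
    otherwise substitutes the sample SD s (n - 1 in the denominator).
    """
    n = len(sample)
    if sigma is None:
        sigma = statistics.stdev(sample)  # s, the estimate of sigma
    return sigma / math.sqrt(n)

heights = [68, 71, 70, 73]                 # hypothetical sample, in inches
print(standard_error(heights, sigma=3))    # sigma known: 3 / sqrt(4) = 1.5
print(standard_error(heights))             # sigma unknown: s / sqrt(4), about 1.04
```

With \(\sigma\) known the answer depends only on \(n\); with \(\sigma\) unknown it also depends on how spread out this particular sample happens to be.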

In Lesson 5 we learned that, according to the Law of Large Numbers, if a large number of trials are performed, the mean of the results will be approximately equal to the expected value (i.e., the mean). We will apply the Law of Large Numbers in the following examples, in which we simulate drawing many random samples from a population. In each example the population is known, and we draw simple random samples of a fixed size (\(n\)). The statistics computed from these samples can be compared with the values given by the Central Limit Theorem: the mean of the distribution of sample means equals \(\mu\), and its standard deviation equals \(\frac {\sigma}{\sqrt{n}}\).
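A small illustration of the Law of Large Numbers, using a fair six-sided die (expected value 3.5) as a hypothetical example of repeated trials: as the number of rolls grows, their average settles near 3.5. The function name and seed are arbitrary choices for this sketch.

```python
import random
import statistics

def mean_of_die_rolls(num_trials, seed=0):
    """Average of num_trials simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return statistics.fmean(rng.randint(1, 6) for _ in range(num_trials))

# The expected value of one roll is (1 + 2 + ... + 6) / 6 = 3.5.
for trials in (10, 1_000, 100_000):
    print(trials, round(mean_of_die_rolls(trials), 3))
```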

Below are a few examples of simulations that draw a large number of random samples from a known population. These examples show that the mean of a distribution of sample means is approximately equal to the population mean and that the standard deviation of a distribution of sample means is approximately equal to \(\frac {\sigma}{\sqrt{n}}\).

Example

Simulating a Distribution of Sample Means from a Normal Population

Let's generate 500 samples, each consisting of the heights of 4 men. Assume the distribution of male heights is normal with \(\mu=70\) inches and \(\sigma=3\) inches. We will find the mean of each of the 500 samples of size \(n=4\).

Here are the first 10 sample means:

70.4    72.0    72.3    69.9    70.5    70.0    70.5    68.1    69.2    71.8

Descriptive statistics for the 500 values of \(\bar{x}\) (sample size 4) include the number of values, mean, median, standard deviation, standard error of the mean, minimum, and maximum. Note that the mean is 69.957 and the standard deviation is 1.496.

Theory says that the mean of \(\bar{x}\) equals \(\mu=70\), the population mean, and that \(SE(\bar{x})=\frac {\sigma}{\sqrt{n}}=\frac{3}{\sqrt{4}}=1.50\).

Our simulation shows Average(500 \(\bar{x}\)'s) = 69.957 and SE(500 \(\bar{x}\)'s) = 1.496. These values are close to those predicted by the equations, even though our samples were small (\(n<30\)). In the next example, let's increase the sample size.
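The height simulation above can be sketched in Python with the standard library. This is an illustrative reconstruction, not the software that produced the output above; because the random draws differ, its numbers will not match 69.957 and 1.496 exactly, but they should land near 70 and 1.50. The function name `simulate_sample_means` and the seed are hypothetical.

```python
import math
import random
import statistics

MU, SIGMA = 70, 3          # population of male heights, in inches
N, NUM_SAMPLES = 4, 500    # sample size and number of samples

def simulate_sample_means(mu, sigma, n, num_samples, seed=1):
    """Draw num_samples random samples of size n from a Normal(mu, sigma)
    population and return the list of their sample means."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]

xbars = simulate_sample_means(MU, SIGMA, N, NUM_SAMPLES)
print("first 10 sample means:", [round(x, 1) for x in xbars[:10]])
print("mean of x-bars:", round(statistics.fmean(xbars), 3))
print("SD of x-bars:", round(statistics.stdev(xbars), 3))
print("theory:", MU, SIGMA / math.sqrt(N))  # 70 1.5
```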


What if we had a larger sample size?

Let's change the sample size from \(n=4\) to \(n=25\) and get descriptive statistics for 500 samples again:

Descriptive statistics for the 500 values of \(\bar{x}\) (sample size 25) give a mean of 69.983 and a standard deviation of 0.592.

Theory says that the mean of \(\bar{x}\) equals \(\mu=70\), the population mean, and that \(SE(\bar{x})=\frac {\sigma}{\sqrt{n}}=\frac{3}{\sqrt{25}}=0.60\).

Our simulation shows Average(500 \(\bar{x}\)'s) = 69.983 and SE(500 \(\bar{x}\)'s) = 0.592. Again, the results of our simulation are close to those predicted by the equations.
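Under the same assumptions (a Normal(70, 3) population, seeded Python simulation), a short loop makes the effect of \(n\) visible: the simulated standard deviation of \(\bar{x}\) tracks \(\frac{\sigma}{\sqrt{n}}\), shrinking from about 1.50 to about 0.60 as \(n\) goes from 4 to 25. The helper `compare_with_theory` is hypothetical.

```python
import math
import random
import statistics

def compare_with_theory(mu, sigma, n, num_samples=500, seed=2):
    """Simulate num_samples sample means (sample size n) from a
    Normal(mu, sigma) population. Return the simulated mean of the
    x-bars, their simulated SD, and the theoretical SE sigma/sqrt(n)."""
    rng = random.Random(seed)
    xbars = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(num_samples)]
    return statistics.fmean(xbars), statistics.stdev(xbars), sigma / math.sqrt(n)

for n in (4, 25):
    sim_mean, sim_sd, theory_se = compare_with_theory(70, 3, n)
    print(f"n={n:2d}  mean={sim_mean:.3f}  SD={sim_sd:.3f}  theory SE={theory_se:.2f}")
```

Because the standard error scales with \(1/\sqrt{n}\), moving from \(n=4\) to \(n=25\) shrinks it by a factor of \(\sqrt{25/4}=2.5\), which matches \(1.50/0.60\).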

Example

Simulating a Distribution of Sample Means from a Non-Normal Population

In the previous example, the height of men was normally distributed. Here we will look at an example in which the population is not normally distributed.

Simulation: Below is a histogram of the number of CDs owned by the population of Penn State students. The distribution is strongly skewed to the right.

[Figure: Histogram of the number of CDs owned, with a normal curve overlaid]

In the population, \(\mu=84\) and \(\sigma=96\).

Let's obtain 500 samples of size 4 from this population and look at the distribution of the 500 \(\bar{x}\)'s:

In the histogram of the 500 sample means (each the average number of CDs for 4 students), the peak shifts slightly toward the middle, but the distribution is still right-skewed.

For the 500 values of \(\bar{x}\) from samples of size 4, the mean is 81.11 and the standard deviation is 45.12.

Theory says that the mean of this distribution should be \(\mu=84\), the population mean, and that \(SE(\overline{x})=\frac{\sigma}{\sqrt{n}}=\frac{96}{\sqrt{4}}=48\).

The simulation shows Average(500 \(\bar{x}\)'s) = 81.11 and SE(500 \(\bar{x}\)'s for samples of size 4) = 45.1.
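The actual CD data are not reproduced here, so this sketch assumes a hypothetical stand-in population: a gamma distribution with the same mean (84) and standard deviation (96), which, like the CD data, is strongly right-skewed. For a gamma distribution with shape \(k\) and scale \(\theta\), the mean is \(k\theta\) and the variance is \(k\theta^2\), which fixes \(k=(84/96)^2\) and \(\theta=96^2/84\). Python's `random.gammavariate(alpha, beta)` uses this shape/scale parameterization. Simulated values will differ from 81.11 and 45.12 because the population and the random draws are not the same.

```python
import math
import random
import statistics

# Hypothetical stand-in for the CD population (the real data are not shown):
# a right-skewed gamma distribution with mean 84 and SD 96.
# mean = shape * scale and variance = shape * scale**2, so:
MU, SIGMA = 84, 96
SHAPE = (MU / SIGMA) ** 2   # 0.765625
SCALE = SIGMA ** 2 / MU     # about 109.7

def simulate_skewed_sample_means(n, num_samples=500, seed=3):
    """Return num_samples sample means, each from n draws of the stand-in population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))
        for _ in range(num_samples)
    ]

xbars = simulate_skewed_sample_means(4)
print("mean of x-bars:", round(statistics.fmean(xbars), 2))
print("SD of x-bars:", round(statistics.stdev(xbars), 2))
print("theory:", MU, SIGMA / math.sqrt(4))  # 84 48.0
```

Passing 25 instead of 4 gives the analogue of the next part of this example; the theoretical SE is then \(96/\sqrt{25}=19.2\).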

What if we had a larger sample size?

Now change the sample size from \(n=4\) to \(n=25\) and again obtain descriptive statistics and a histogram:

In the histogram of the 500 sample means (each the average number of CDs for 25 students), the peak is almost in the middle, and the shape is fairly close to normal.

For the 500 values of \(\bar{x}\) from samples of size 25, the mean is 83.281 and the standard deviation is 18.268.

Theory says that the mean of this distribution should equal \(\mu=84\), the population mean, and that \(SE(\bar{x})=\frac {96}{\sqrt{25}}=19.2\).

The simulation shows Average(500 \(\bar{x}\)'s) = 83.281 and SE(500 \(\bar{x}\)'s for samples of size 25) = 18.268. A histogram of the 500 \(\bar{x}\)'s computed from samples of size 25 is beginning to look a lot like a normal curve.
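To put a number on "still right-skewed" versus "close to normal," one option is the sample skewness of the 500 \(\bar{x}\)'s. This sketch assumes the same hypothetical gamma stand-in population (mean 84, SD 96, shape \(k=(84/96)^2\)); for that population, the skewness of \(\bar{x}\) is \(2/\sqrt{nk}\), about 2.29, 1.14, and 0.46 for \(n=1, 4, 25\), so the simulated values should fall in roughly that pattern. The function names are hypothetical.

```python
import random
import statistics

# Same hypothetical gamma stand-in as before: mean 84, SD 96, right-skewed.
MU, SIGMA = 84, 96
SHAPE, SCALE = (MU / SIGMA) ** 2, SIGMA ** 2 / MU

def sample_skewness(values):
    """Moment-based skewness: the average cubed z-score (population SD)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

def skewness_of_sample_means(n, num_samples=500, seed=4):
    """Skewness of num_samples sample means, each from n stand-in draws."""
    rng = random.Random(seed)
    xbars = [
        statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))
        for _ in range(num_samples)
    ]
    return sample_skewness(xbars)

for n in (1, 4, 25):
    print(f"n={n:2d}  skewness of x-bars = {skewness_of_sample_means(n):.2f}")
```

A skewness near 0 indicates a symmetric distribution, so the shrinking values mirror the histograms becoming more normal-looking as \(n\) grows.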