# Sampling Distribution of the Sample Mean, x-bar

The **central limit theorem** states that if a large enough sample is taken (typically *n* > 30), then the sampling distribution of \(\bar{x}\) is approximately normal with mean μ and standard deviation \(\frac{\sigma}{\sqrt{n}}\). Since in practice we usually do not know μ or σ, we estimate μ by \(\bar{x}\) and \(\frac{\sigma}{\sqrt{n}}\) by \(\frac{s}{\sqrt{n}}\). Here *s*, the standard deviation of the sample, is the estimate of σ. The expression \(\frac{s}{\sqrt{n}}\) is known as the standard error of the mean, labeled **SE(**\(\bar{x}\)**)**.

Simulation: Generate 500 samples, each consisting of the heights of 4 men. Assume the distribution of male heights is normal with mean *μ* = 70" and standard deviation σ = 3.0". Then find the mean of each of the 500 samples of size 4.

Here are the first 10 sample means:

70.4 72.0 72.3 69.9 70.5 70.0 70.5 68.1 69.2 71.8

Theory says that the mean of \(\bar{x}\) is *μ* = **70**, which is also the population mean, and \(SE(\bar{x})=\frac{\sigma}{\sqrt{n}}=\frac{3}{\sqrt{4}}=1.50\).

Simulation shows: Average (500 \(\bar{x}\)'s) = **69.957** and SE (the standard deviation of the 500 \(\bar{x}\)'s) = **1.496**.
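The simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_sample_means` and the seed are hypothetical choices, not part of the original lesson, and a different random number generator will give slightly different values than 69.957 and 1.496.

```python
import math
import random
import statistics

def simulate_sample_means(n, num_samples=500, mu=70.0, sigma=3.0, seed=1):
    """Return num_samples sample means, each computed from n Normal(mu, sigma) draws."""
    rng = random.Random(seed)  # arbitrary fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]

means = simulate_sample_means(n=4)
theoretical_se = 3.0 / math.sqrt(4)  # sigma / sqrt(n)

print("first 10 sample means:", [round(m, 1) for m in means[:10]])
print("average of the 500 sample means:", round(statistics.fmean(means), 3))
print("std. dev. of the 500 sample means:", round(statistics.stdev(means), 3))
print("theoretical SE:", theoretical_se)
```

Calling `simulate_sample_means(n=25)` instead should give sample means whose spread is noticeably smaller, in line with \(\frac{3}{\sqrt{25}}=0.60\).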

Change the sample size from *n* = 4 to *n* = 25 and get descriptive statistics:

Theory says that the mean of \(\bar{x}\) is *μ* = **70**, which is also the population mean, and \(SE(\bar{x})=\frac{\sigma}{\sqrt{n}}=\frac{3}{\sqrt{25}}=0.60\).

Simulation shows: Average (500 \(\bar{x}\)'s) = **69.983** and SE (the standard deviation of the 500 \(\bar{x}\)'s) = **0.592**.

**Sampling Distribution of Sample Mean** \(\bar{x}\) **from a Non-Normal Population**

Simulation: Below is a histogram of the number of CDs owned by PSU students. The distribution is strongly skewed to the right.

Assume the population mean number of CDs owned is *μ* = 84 and the population standard deviation is σ = 96.

Let's obtain 500 samples of size 4 from this population and look at the distribution of the 500 x-bars:

Theory says that the mean of \(\bar{x}\) is *μ* = **84**, which is also the population mean, and \(SE(\bar{x})=\frac{\sigma}{\sqrt{n}}=\frac{96}{\sqrt{4}}=48\).

Simulation shows: Average (500 \(\bar{x}\)'s) = **81.11** and SE (500 \(\bar{x}\)'s for samples of size 4) = **45.1**.

Change the sample size from *n* = 4 to *n* = 25 and get descriptive statistics and curve:

Theory says that the mean of \(\bar{x}\) is *μ* = **84**, which is also the population mean, and \(SE(\bar{x})=\frac{\sigma}{\sqrt{n}}=\frac{96}{\sqrt{25}}=19.2\).

Simulation shows: Average (500 \(\bar{x}\)'s) = **83.281** and SE (500 \(\bar{x}\)'s for samples of size 25) = **18.268**.

A histogram of the 500 \(\bar{x}\)'s computed from samples of size 25 is beginning to look a lot like a normal curve.
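The same experiment can be sketched for a right-skewed population. The lesson does not provide the actual CD data, so the code below substitutes a gamma distribution whose mean and standard deviation are matched to *μ* = 84 and σ = 96. That choice of population, along with the function name and seed, is purely an assumption for illustration, so the output will not reproduce the figures quoted above exactly.

```python
import math
import random
import statistics

MU, SIGMA = 84.0, 96.0
# Assumed stand-in population: gamma with mean = SHAPE * SCALE = 84
# and standard deviation = sqrt(SHAPE) * SCALE = 96 (strongly right-skewed).
SHAPE = (MU / SIGMA) ** 2
SCALE = SIGMA ** 2 / MU

def skewed_sample_means(n, num_samples=500, seed=2):
    """Return num_samples sample means, each from n draws of the assumed gamma population."""
    rng = random.Random(seed)  # arbitrary fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.gammavariate(SHAPE, SCALE) for _ in range(n))
        for _ in range(num_samples)
    ]

for n in (4, 25):
    means = skewed_sample_means(n)
    print(
        f"n = {n}: average of x-bars = {statistics.fmean(means):.2f}, "
        f"std. dev. of x-bars = {statistics.stdev(means):.2f}, "
        f"theoretical SE = {SIGMA / math.sqrt(n):.2f}"
    )
```

Plotting a histogram of the returned means for each *n* is a natural next step for seeing the skew fade as the sample size grows.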

**i. The Law of Large Numbers** says that as the sample size increases, the sample mean will approach the population mean.

**ii. The Central Limit Theorem** says that as the sample size increases, the sampling distribution of \(\bar{X}\) (read x-bar) approaches the normal distribution. We see this effect here for *n* = 25. Generally, we assume that a sample size of about *n* = 30 is sufficient for the distribution of the sample mean to be approximately normal.

**iii. The Central Limit Theorem** is important because it enables us to calculate probabilities about sample means.

Example. Find the approximate probability that the average number of CDs owned when 100 students are asked is between 70 and 90.

Solution. Since the sample size is greater than 30, we assume the sampling distribution of \(\bar{x}\) is about normal with mean *μ* = 84 and \(SE(\bar{x})=\frac{\sigma}{\sqrt{n}}=\frac{96}{\sqrt{100}}=9.6\). We are asked to find P(70 < \(\bar{X}\) < 90). The z-scores for the two values are:

for 90: z = (90 - 84)/9.6 = 0.625, and for 70: z = (70 - 84)/9.6 = -1.46.

From tables of the normal distribution we get P(-1.46 < Z < 0.625) = .734 - .072 = .662.
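As a check on the table lookup, the same probability can be computed with Python's standard-library `statistics.NormalDist`. This is a small sketch of the calculation above; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

mu, sigma, n = 84.0, 96.0, 100
se = sigma / math.sqrt(n)          # standard error of the mean: 96 / 10

# Approximate sampling distribution of x-bar under the central limit theorem
xbar_dist = NormalDist(mu=mu, sigma=se)

z_low = (70 - mu) / se             # z-score for 70
z_high = (90 - mu) / se            # z-score for 90
prob = xbar_dist.cdf(90) - xbar_dist.cdf(70)

print(f"SE = {se}, z-scores = {z_low:.2f} and {z_high:.3f}")
print(f"P(70 < x-bar < 90) is approximately {prob:.3f}")
```

Because the code uses the unrounded z-score for 70, its answer can differ from the table-based value in the third decimal place.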

Suppose the sample size were 1600 instead of 100. Then the distribution of \(\bar{x}\) would be about normal with mean 84 and standard deviation \(\frac{\sigma}{\sqrt{n}}=\frac{96}{\sqrt{1600}}=\frac{96}{40}=2.4\). From the empirical rule we know that almost all x-bars for samples of size 1600 will be in the interval 84 ± (3)(2.4), that is, 84 ± 7.2, or between 76.8 and 91.2. The Law of Large Numbers says that as we increase the sample size, the probability that the sample mean falls within any fixed distance of the population mean approaches 1.00!
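The shrinking interval can be illustrated numerically. The sketch below recomputes the 84 ± 7.2 interval for *n* = 1600 and then, using the normal approximation, estimates the probability that \(\bar{x}\) lands within 5 CDs of the population mean for several sample sizes. The helper `prob_within`, the distance of 5, and the sample size 25600 are hypothetical choices made only for this illustration.

```python
import math
from statistics import NormalDist

MU, SIGMA = 84.0, 96.0

def prob_within(n, distance):
    """Normal-approximation probability that x-bar is within `distance` of MU."""
    se = SIGMA / math.sqrt(n)
    dist = NormalDist(MU, se)
    return dist.cdf(MU + distance) - dist.cdf(MU - distance)

# Empirical-rule interval for n = 1600: mean plus or minus 3 standard errors
se_1600 = SIGMA / math.sqrt(1600)
interval = (MU - 3 * se_1600, MU + 3 * se_1600)
print(f"SE = {se_1600}, interval = ({interval[0]:.1f}, {interval[1]:.1f})")

# The probability of being close to the population mean grows with n
for n in (100, 1600, 25600):
    print(f"n = {n}: P(|x-bar - 84| < 5) is about {prob_within(n, 5.0):.4f}")
```

The probabilities increase toward 1 as *n* grows, which is the behavior the Law of Large Numbers describes.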