Sampling Distribution of the Sample Mean, x-bar


The central limit theorem states that if a large enough sample is taken (typically n > 30), then the sampling distribution of \(\bar{x}\) is approximately a normal distribution with a mean of μ and a standard deviation of \(\frac {\sigma}{\sqrt{n}}\). Since in practice we usually do not know μ or σ, we estimate them by \(\bar{x}\) and s, respectively, where s is the standard deviation of the sample; the standard deviation of \(\bar{x}\) is then estimated by \( \frac {s}{\sqrt{n}}\). The expression \( \frac {s}{\sqrt{n}}\) is known as the standard error of the mean, labeled SE(\(\bar{x}\)).
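A minimal sketch of this estimate in Python, using only the standard library's `statistics` module. The four heights are made-up illustrative values (not data from these notes), and `standard_error` is a hypothetical helper name.

```python
import math
import statistics


def standard_error(sample):
    """Estimate SE(x-bar) = s / sqrt(n) from a single sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(n)


heights = [68.0, 71.5, 70.2, 73.1]  # hypothetical sample of 4 heights, in inches
x_bar = statistics.mean(heights)    # estimates mu
se = standard_error(heights)        # estimates sigma / sqrt(n)
print(f"x-bar = {x_bar:.3f}, s = {statistics.stdev(heights):.3f}, SE = {se:.3f}")
```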

Simulation: Generate 500 samples, each consisting of the heights of 4 men. Assume the distribution of male heights is normal with mean μ = 70" and standard deviation σ = 3.0". Then find the mean of each of the 500 samples of size 4.
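The simulation can be sketched in Python with the standard `random` module. The helper name `simulate_sample_means` and the seed are arbitrary choices for illustration, so the sample means it prints will not match the ten listed below.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps=500, seed=None):
    """Draw `reps` samples of size n from Normal(mu, sigma) and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # heights of n men
        means.append(statistics.mean(sample))              # one x-bar per sample
    return means


means = simulate_sample_means(mu=70, sigma=3.0, n=4, reps=500, seed=1)
print([round(m, 1) for m in means[:10]])  # first 10 sample means
```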

Here are the first 10 sample means:

70.4    72.0    72.3    69.9    70.5    70.0    70.5    68.1    69.2    71.8

The descriptive statistics for \(\bar{x}\) when the sample size is 4 include the variable name, number of measurements, mean, median, standard deviation, standard error of the mean, minimum, and maximum. Note that the mean of the 500 sample means is 69.957 and their standard deviation is 1.496.

Theory says that the mean of \(\bar{x}\) equals μ = 70, the population mean, and \(SE(\bar{x})=\frac {\sigma}{\sqrt{n}}=\frac{3}{\sqrt{4}}=1.50\).

Simulation shows: Average (500 \(\bar{x}\)'s) = 69.957 and SE(of 500 \(\bar{x}\)'s) = 1.496

Change the sample size from n = 4 to n = 25 and get descriptive statistics:


Theory says that the mean of \(\bar{x}\) equals μ = 70, the population mean, and \(SE(\bar{x})=\frac {\sigma}{\sqrt{n}}=\frac{3}{\sqrt{25}}=0.60\).

Simulation shows: Average (500 \(\bar{x}\)'s) = 69.983 and SE(of 500 \(\bar{x}\)'s) = 0.592
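A sketch comparing simulation with theory for both sample sizes, under the same assumptions as above (normal population, μ = 70, σ = 3, 500 repetitions; the seed is arbitrary, so the simulated figures will differ slightly from those quoted here):

```python
import math
import random
import statistics

MU, SIGMA, REPS = 70, 3.0, 500
rng = random.Random(2024)  # arbitrary seed

results = {}
for n in (4, 25):
    # 500 sample means, each from a sample of n heights
    means = [statistics.mean([rng.gauss(MU, SIGMA) for _ in range(n)]) for _ in range(REPS)]
    theory_se = SIGMA / math.sqrt(n)  # sigma / sqrt(n)
    results[n] = (statistics.mean(means), statistics.stdev(means), theory_se)
    print(f"n={n:2d}: average of x-bars = {results[n][0]:.3f}, "
          f"SD of x-bars = {results[n][1]:.3f}, theory sigma/sqrt(n) = {theory_se:.2f}")
```

The standard deviation of the simulated x-bars should track \(\sigma/\sqrt{n}\), shrinking by a factor of \(\sqrt{25/4} = 2.5\) when n goes from 4 to 25.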

Sampling Distribution of the Sample Mean \(\bar{x}\) from a Non-Normal Population

Simulation: Below is a histogram of the number of CDs owned by PSU students. The distribution is strongly skewed to the right.

[Figure: histogram of the number of CDs owned, with a normal curve overlaid]

Assume the population mean number of CDs owned is μ = 84 and the population standard deviation is σ = 96.

Let's obtain 500 samples of size 4 from this population and look at the distribution of the 500 x-bars:
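The PSU survey data are not reproduced here, so the sketch below substitutes a hypothetical right-skewed population: a gamma distribution chosen to have the same mean (84) and standard deviation (96). For a gamma distribution with shape k and scale θ, the mean is kθ and the variance is kθ², so k = μ²/σ² and θ = σ²/μ. The stand-in, the helper name `cd_sample_means`, and the seed are all assumptions, so the results will differ from the figures quoted in these notes.

```python
import random
import statistics

MU, SIGMA = 84, 96
# Hypothetical stand-in for the CD data: gamma with the same mean and SD.
SHAPE = (MU / SIGMA) ** 2  # k = mu^2 / sigma^2, about 0.766
SCALE = SIGMA ** 2 / MU    # theta = sigma^2 / mu, about 109.7


def cd_sample_means(n, reps=500, seed=None):
    """Return `reps` sample means, each from n draws of the gamma stand-in."""
    rng = random.Random(seed)
    return [statistics.mean([rng.gammavariate(SHAPE, SCALE) for _ in range(n)])
            for _ in range(reps)]


means4 = cd_sample_means(4, seed=7)
print(f"average = {statistics.mean(means4):.2f}, SD = {statistics.stdev(means4):.2f}")
```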

In the histogram of the average number of CDs for each sample of four students, the peak moves a little toward the middle, but the distribution is still skewed to the right.

For \(\bar{x}\) with samples of size 4, the total number of measurements is 500. Their mean and standard deviation are 81.11 and 45.12, respectively.

Theory says that the mean of \(\bar{x}\) equals μ = 84, the population mean, and \(SE(\bar{x})=\frac{\sigma}{\sqrt{n}}=\frac{96}{\sqrt{4}}=48\).

Simulation shows: Average (500 \(\bar{x}\)'s) = 81.11 and SE(of 500 \(\bar{x}\)'s for samples of size 4) = 45.1

Change the sample size from n = 4 to n = 25 and get descriptive statistics and curve:

In the histogram of the average number of CDs for each sample of 25 students, the peak is almost in the middle, and the curve is quite close to normal.

For \(\bar{x}\) with samples of size 25, the total number of measurements is 500. Their mean and standard deviation are 83.281 and 18.268, respectively.

Theory says that the mean of \(\bar{x}\) equals μ = 84, the population mean, and \(SE(\bar{x})=\frac{\sigma}{\sqrt{n}}=\frac {96}{\sqrt{25}}=19.2\).

Simulation shows: Average (500 \(\bar{x}\)'s) = 83.281 and SE(of 500 \(\bar{x}\)'s for samples of size 25) = 18.268.

A histogram of the 500 \(\bar{x}\)'s computed from samples of size 25 is beginning to look a lot like a normal curve.
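To put a number on "still right skewed" versus "close to normal", the sketch below (same hypothetical gamma stand-in for the CD data, arbitrary seed) computes the moment-based skewness of the 500 sample means for n = 1, 4, and 25. For the mean of n independent, identically distributed values, the skewness shrinks by a factor of \(\sqrt{n}\), so it should drop noticeably as n grows, while the SD of the x-bars tracks \(\sigma/\sqrt{n}\).

```python
import random
import statistics

MU, SIGMA = 84, 96
SHAPE, SCALE = (MU / SIGMA) ** 2, SIGMA ** 2 / MU  # hypothetical gamma stand-in for the CD data


def skewness(xs):
    """Moment-based sample skewness: mean of cubed deviations divided by SD cubed."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / sd ** 3


rng = random.Random(11)  # arbitrary seed
summary = {}
for n in (1, 4, 25):
    means = [statistics.fmean([rng.gammavariate(SHAPE, SCALE) for _ in range(n)])
             for _ in range(500)]
    summary[n] = (statistics.stdev(means), skewness(means))
    print(f"n={n:2d}: SD of x-bars = {summary[n][0]:6.2f} "
          f"(theory {SIGMA / n ** 0.5:6.2f}), skewness = {summary[n][1]:5.2f}")
```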

i. The Law of Large Numbers says that as the sample size increases, the sample mean approaches the population mean.

ii. The Central Limit Theorem says that as the sample size increases, the sampling distribution of \(\bar{X}\) (read x-bar) approaches a normal distribution. We see this effect here for n = 25. Generally, we assume that a sample size of n = 30 or more is sufficient for the distribution of the sample mean to be approximately normal.

iii. The Central Limit Theorem is important because it enables us to calculate probabilities about sample means.
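Point (i) can be illustrated with a running mean: keep drawing from the population and watch the sample mean settle near μ. This sketch again assumes the hypothetical gamma stand-in (mean 84, SD 96) rather than the real CD data, with an arbitrary seed and arbitrary checkpoints.

```python
import random

MU, SIGMA = 84, 96
SHAPE, SCALE = (MU / SIGMA) ** 2, SIGMA ** 2 / MU  # hypothetical gamma stand-in

rng = random.Random(3)  # arbitrary seed
checkpoints = (10, 100, 1_000, 10_000, 100_000)
running = {}
total = 0.0
for i in range(1, checkpoints[-1] + 1):
    total += rng.gammavariate(SHAPE, SCALE)  # one more student's CD count
    if i in checkpoints:
        running[i] = total / i               # sample mean of the first i draws
        print(f"after {i:>7,} draws: running mean = {running[i]:7.2f}")
```

With 100,000 draws the standard error is \(96/\sqrt{100000} \approx 0.3\), so the final running mean should sit very close to 84.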

Example. Find the approximate probability that the average number of CDs owned when 100 students are asked is between 70 and 90.

Solution. Since the sample size is greater than 30, we assume the sampling distribution of \(\bar{x}\) is approximately normal with mean μ = 84 and \(SE(\bar{x})=\frac{\sigma}{\sqrt{n}}=\frac{96}{\sqrt{100}}=9.6\). We are asked to find \(P(70 < \bar{X} < 90)\). The z-scores for the two values are

for 90: \(z = \frac{90 - 84}{9.6} = 0.625\), and for 70: \(z = \frac{70 - 84}{9.6} \approx -1.46\). From tables of the standard normal distribution, \(P(-1.46 < Z < 0.625) = 0.734 - 0.072 = 0.662\).
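Python's `statistics.NormalDist` (available since Python 3.8) can stand in for the table lookup. This sketch uses the unrounded z-score for 70 (about −1.458 rather than −1.46), yet the probability still rounds to the same 0.662.

```python
from statistics import NormalDist

mu, sigma, n = 84, 96, 100
se = sigma / n ** 0.5            # SE(x-bar) = 96 / 10 = 9.6
xbar_dist = NormalDist(mu, se)   # CLT approximation: x-bar ~ Normal(84, 9.6)

z_lo, z_hi = (70 - mu) / se, (90 - mu) / se
p = xbar_dist.cdf(90) - xbar_dist.cdf(70)  # P(70 < x-bar < 90)
print(f"z-scores: {z_lo:.3f} and {z_hi:.3f}; P(70 < x-bar < 90) = {p:.3f}")
```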

Suppose the sample size were 1600 instead of 100. Then the distribution of \(\bar{x}\) would be approximately normal with mean 84 and standard deviation \(\frac{\sigma}{\sqrt{n}}=\frac{96}{\sqrt{1600}}= \frac{96}{40}=2.4\). From the empirical rule, we know that almost all (about 99.7%) of the x-bars for samples of size 1600 will lie in the interval 84 ± (3)(2.4), that is, 84 ± 7.2, or between 76.8 and 91.2. The Law of Large Numbers says that as we increase the sample size, the probability that the sample mean lies within any fixed distance of the population mean approaches 1.00.
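The sketch below applies the same normal approximation across several sample sizes. The list of n values and the tolerance of 5 CDs are arbitrary illustrative choices, not from the notes. For each n it reports the empirical-rule interval μ ± 3·SE and the approximate probability that \(\bar{x}\) lands within 5 of μ; that probability climbs toward 1 as n grows, as the Law of Large Numbers predicts.

```python
from statistics import NormalDist

mu, sigma = 84, 96
TOL = 5  # hypothetical tolerance: "within 5 CDs of the population mean"

results = {}
for n in (100, 400, 1600, 6400):
    se = sigma / n ** 0.5                      # SE(x-bar) = sigma / sqrt(n)
    xbar_dist = NormalDist(mu, se)             # CLT approximation for x-bar
    lo, hi = mu - 3 * se, mu + 3 * se          # empirical-rule interval, mu +/- 3 SE
    p_close = xbar_dist.cdf(mu + TOL) - xbar_dist.cdf(mu - TOL)  # P(|x-bar - mu| < TOL)
    results[n] = (se, lo, hi, p_close)
    print(f"n={n:5d}: SE={se:5.2f}, mu +/- 3SE = ({lo:.1f}, {hi:.1f}), "
          f"P(|x-bar - {mu}| < {TOL}) = {p_close:.4f}")
```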