In a **dotplot**, each observation is represented as a dot placed above its value on a number line, with repeated values stacked on top of one another.
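A dotplot is easy to sketch in plain text. In the minimal Python sketch below, `dotplot` is a hypothetical helper written only for illustration. It draws the plot sideways: each distinct value gets one row, and each observation adds one `o` to that row.

```python
from collections import Counter

def dotplot(data):
    """Return a sideways text dotplot: one row per distinct value, one 'o' per observation."""
    counts = Counter(data)                    # how many times each value occurs
    width = max(len(str(v)) for v in counts)  # right-align the value labels
    rows = [f"{str(v).rjust(width)} | " + "o" * counts[v] for v in sorted(counts)]
    return "\n".join(rows)

print(dotplot([3, 1, 3, 2, 3, 1]))
```

For the list above this prints three rows, `1 | oo`, `2 | o` and `3 | ooo`, so the value 3 has the tallest stack.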

A **stemplot** displays the actual digits of the values in a data set. For example, two-digit numbers might be displayed with the first digit as the stem and the second digit as the leaf. **Stemplots** are best suited to small data sets.
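The stem/leaf split for two-digit numbers can be sketched as follows. This is a minimal Python sketch that assumes non-negative integers; `stemplot` is a hypothetical helper name. The stem is every digit except the last and the leaf is the last digit, and empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict

def stemplot(data):
    """Text stemplot for non-negative integers: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # e.g. 47 -> stem 4, leaf 7
        leaves[stem].append(leaf)
    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))
    rows = []
    for stem in range(lo, hi + 1):  # include empty stems so gaps remain visible
        digits = "".join(str(d) for d in leaves[stem])
        rows.append(f"{str(stem).rjust(width)} | {digits}".rstrip())
    return "\n".join(rows)

print(stemplot([12, 15, 21, 23, 23, 38, 41]))
```

Here the stem `2` carries the leaves `133`, which records the values 21, 23 and 23 while keeping every original digit.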

A **histogram** shows the shape of the distribution of a measurement variable. It displays the count, proportion, or percentage of values that fall into each of many classes (intervals) of equal width.
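The table behind a histogram can be computed directly. In this minimal Python sketch, `histogram_table` is a hypothetical helper, and it assumes classes that include their left endpoint but not their right endpoint, except that the last class also includes its right endpoint so the maximum is not lost. The example data are made up for illustration.

```python
def histogram_table(data, start, width, num_classes):
    """Count, proportion and percent of values in equal-width classes.

    Class k covers [start + k*width, start + (k+1)*width); the right end of the
    last class is also included so the largest value is counted.
    """
    counts = [0] * num_classes
    top = start + num_classes * width
    for x in data:
        if not start <= x <= top:
            raise ValueError(f"{x} lies outside [{start}, {top}]")
        k = min(int((x - start) // width), num_classes - 1)  # index of x's class
        counts[k] += 1
    n = len(data)
    return [
        (start + k * width, start + (k + 1) * width, c, c / n, 100 * c / n)
        for k, c in enumerate(counts)
    ]

# Hypothetical data, classes 0-5, 5-10, 10-15, 15-20, with a text bar per class
for lo, hi, count, prop, pct in histogram_table([2, 3, 5, 7, 8, 8, 9, 12, 15, 19], 0, 5, 4):
    print(f"{lo:>2}-{hi:<2} {count:2d} {prop:.2f} {pct:5.1f}%  " + "#" * count)
```

The counts here are 2, 5, 1 and 2. Drawing one bar per class at a height given by any of the three columns (count, proportion, or percent) produces the same shape.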

A **symmetric** histogram is one whose two halves are approximately mirror images of each other about the middle, so that equal percentages of the data lie on either side.

A histogram may be **skewed** to the right, with most of the observations on the low side and a few outliers on the high end, or **skewed** to the left, with most of the data on the high side and a few outliers on the low end.

A **boxplot** is a picture of the five-number summary: a box spans the distance from \(Q_L\) to \(Q_U\), a line inside the box marks the median, and whiskers extend from the bottom of the box down to the minimum and from the top of the box up to the maximum.
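A boxplot can be drawn from the five numbers alone. In the minimal Python sketch below, `text_boxplot` is a hypothetical helper that draws the box horizontally, so "down to the minimum" becomes "left to the minimum" and "up to the maximum" becomes "right to the maximum". The characters `[` and `]` mark \(Q_L\) and \(Q_U\), `M` marks the median, and `|` marks the whisker ends.

```python
def text_boxplot(minimum, q_l, median, q_u, maximum, width=41):
    """Draw a horizontal text boxplot from a five-number summary."""
    if maximum == minimum:
        raise ValueError("all values are equal; nothing to scale")

    def pos(v):
        # map a data value to a character column in [0, width - 1]
        return round((v - minimum) / (maximum - minimum) * (width - 1))

    p_min, p_ql, p_med, p_qu, p_max = map(pos, (minimum, q_l, median, q_u, maximum))
    chars = [" "] * width
    for i in range(p_min, p_ql):
        chars[i] = "-"  # lower whisker
    for i in range(p_qu, p_max):
        chars[i] = "-"  # upper whisker
    chars[p_min] = chars[p_max] = "|"      # whisker ends at the minimum and maximum
    chars[p_ql], chars[p_qu] = "[", "]"    # box edges at Q_L and Q_U
    chars[p_med] = "M"                     # median line inside the box
    return "".join(chars)

print(text_boxplot(0, 10, 20, 30, 40))
```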

The **mean** of a data set is the numerical average: the sum of the values divided by the number of values.

The **median** of a data set is the 50th percentile: the middle value when the numbers are put in order. When there is an even number of values, the median is the average of the two middle values.
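The mean and median definitions translate directly into code. This minimal Python sketch uses hypothetical helpers `mean` and `median` to spell out the arithmetic. Python's standard `statistics.mean` and `statistics.median` compute the same quantities and are used here only as a cross-check.

```python
import statistics

def mean(data):
    """Arithmetic average: sum of the values divided by how many there are."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    s = sorted(data)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

values = [2, 4, 4, 5, 10]
print(mean(values), median(values))                        # 5.0 4
print(statistics.mean(values), statistics.median(values))  # standard-library equivalents
```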

A **percentile** is the value below which a specified percentage of the data fall, with the remaining data above it. For example, the 30th percentile has 30% of the numbers on a list below it and 70% above it.
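Several conventions exist for computing percentiles from a finite list. The minimal Python sketch below uses one common choice, linear interpolation between neighboring sorted values, which is also the default method of NumPy's `numpy.percentile`. Other conventions can give slightly different answers. The name `percentile` is a hypothetical helper.

```python
import math

def percentile(data, p):
    """p-th percentile (0 <= p <= 100) by linear interpolation between sorted values."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    s = sorted(data)
    pos = (len(s) - 1) * p / 100  # fractional position in the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

data = list(range(1, 11))  # 1, 2, ..., 10
p30 = percentile(data, 30)
below = sum(x < p30 for x in data)
above = sum(x > p30 for x in data)
print(round(p30, 2), below, above)  # 3.7 3 7
```

For the numbers 1 through 10, the 30th percentile comes out as 3.7, with 3 of the 10 values (30%) below it and 7 (70%) above it.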

The **five-number summary** of a data set consists of the minimum, the lower quartile (\(Q_L\)), the median, the upper quartile (\(Q_U\)), and the maximum. The **five-number summary** provides the values needed to draw a boxplot.

The **lower quartile** (also called the first quartile and abbreviated as \(Q_L\)) is the 25th percentile of a data set. Thus, 25% of the values in the data set fall below \(Q_L\) and 75% are larger than \(Q_L\).

The **upper quartile** (also called the third quartile and abbreviated as \(Q_U\)) is the 75th percentile of a data set. Thus, 75% of the values in the data set fall below \(Q_U\) and 25% are larger than \(Q_U\).
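The five-number summary can be computed in a few lines. This minimal Python sketch assumes one common textbook convention for quartiles: \(Q_L\) and \(Q_U\) are the medians of the lower and upper halves of the sorted data, and the overall median is left out of both halves when the number of values is odd. Software may use other conventions and report slightly different quartiles. `five_number_summary` is a hypothetical helper name.

```python
import statistics

def five_number_summary(data):
    """(min, Q_L, median, Q_U, max), with quartiles as medians of the lower and upper halves.

    When n is odd, the overall median is excluded from both halves.
    """
    s = sorted(data)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q_l = statistics.median(s[:half])      # median of the lower half
    q_u = statistics.median(s[n - half:])  # median of the upper half
    return s[0], q_l, statistics.median(s), q_u, s[-1]

print(five_number_summary([1, 3, 4, 6, 7, 8, 10, 12, 15]))  # (1, 3.5, 7, 11.0, 15)
```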

The **interquartile range** (abbreviated as IQR) is the difference between the upper and lower quartiles of a data set: \(IQR = Q_U - Q_L\). It spans the middle 50% of the data and provides a resistant measure of the variability of a set of data.
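The IQR follows directly once the quartiles are fixed. This minimal Python sketch assumes the median-of-halves convention for \(Q_L\) and \(Q_U\). Other conventions, such as the interpolation methods offered by `statistics.quantiles`, can give slightly different values. `iqr` is a hypothetical helper name.

```python
import statistics

def iqr(data):
    """Interquartile range Q_U - Q_L, with quartiles as medians of the lower and upper halves."""
    s = sorted(data)
    half = len(s) // 2
    if half == 0:
        raise ValueError("need at least two values")
    q_l = statistics.median(s[:half])           # lower quartile
    q_u = statistics.median(s[len(s) - half:])  # upper quartile
    return q_u - q_l

print(iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 5.0
```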

An **outlier** is a data point that falls far outside the pattern of the other points in the data set. **Outliers** can have a large effect on sensitive statistics such as the mean and the standard deviation.
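A single outlier can strongly affect sensitive statistics. The minimal Python sketch below uses made-up data, the standard `statistics.mean` and `statistics.stdev` (sample standard deviation), and a hypothetical helper `mean_and_sd`. It compares the two statistics before and after one extreme value is appended.

```python
import statistics

def mean_and_sd(values):
    """Mean and sample standard deviation of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)

typical = [10, 12, 11, 13, 12, 11, 12]  # hypothetical measurements
with_outlier = typical + [60]           # one value far outside the pattern

for label, values in [("without outlier", typical), ("with outlier", with_outlier)]:
    m, sd = mean_and_sd(values)
    print(f"{label:16} mean={m:6.2f} sd={sd:6.2f}")
```

With these numbers, the one value of 60 pulls the mean from about 11.6 to about 17.6 and raises the standard deviation from about 1 to about 17.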

The **standard deviation** (SD) measures how far data values typically lie from the mean. If there were no variability, every measurement would equal the mean and the **standard deviation** would be zero. As a rule of thumb, about 68% of the values in a symmetric, bell-shaped histogram fall within one **standard deviation** of the mean.
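The standard deviation and the 68% rule of thumb can both be checked numerically. This minimal Python sketch computes the sample SD, which divides by \(n - 1\) as the standard `statistics.stdev` does. Dividing by \(n\) instead gives the population SD, as in `statistics.pstdev`. The 68% check uses simulated bell-shaped data from `random.gauss`, so the printed fraction varies slightly with the seed. The names `sample_sd` and `fraction_within_one_sd` are hypothetical helpers.

```python
import math
import random

def sample_sd(data):
    """Sample standard deviation: square root of the sum of squared deviations over n - 1."""
    n = len(data)
    m = sum(data) / n
    return math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))

def fraction_within_one_sd(data):
    """Share of values lying within one standard deviation of the mean."""
    m = sum(data) / len(data)
    s = sample_sd(data)
    return sum(abs(x - m) <= s for x in data) / len(data)

print(sample_sd([5, 5, 5, 5]))  # 0.0: no variability

random.seed(0)
bell = [random.gauss(0, 1) for _ in range(10_000)]  # simulated bell-shaped data
print(round(fraction_within_one_sd(bell), 3))       # close to 0.68
```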

A **sensitive measure** is one that is strongly affected by outliers. For example, the mean is a **sensitive measure** of location along the number line, and the standard deviation is a **sensitive measure** of variability. The range, which depends only on the two most extreme values, is the most **sensitive measure** of variability.

A **resistant measure** is one that is affected very little by outliers. For example, the median is a **resistant measure** of location along the number line, and the IQR is a **resistant measure** of variability.
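The difference between resistant and sensitive measures shows up clearly on a small example. The minimal Python sketch below uses made-up data and a hypothetical helper `resistant_and_range`. It computes quartiles by the median-of-halves convention and reports the median, IQR, and range with and without one extreme value.

```python
import statistics

def resistant_and_range(values):
    """Median, IQR (medians of lower/upper halves) and range of a list of numbers."""
    s = sorted(values)
    half = len(s) // 2
    iqr = statistics.median(s[len(s) - half:]) - statistics.median(s[:half])
    return statistics.median(s), iqr, s[-1] - s[0]

typical = [10, 12, 11, 13, 12, 11, 12]  # hypothetical measurements
with_outlier = typical + [60]           # one value far outside the pattern

print(resistant_and_range(typical))       # (12, 1, 3)
print(resistant_and_range(with_outlier))  # (12.0, 1.5, 50)
```

The median stays at 12 and the IQR moves only from 1 to 1.5, while the range jumps from 3 to 50.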