Summary Statistics – Explanation and Examples

Summary statistics are numbers or words that describe a data set or data sets simply.

This includes measures of centrality, dispersion, and correlation as well as descriptions of the overall shape of the data set.

Summary statistics are used in all branches of math and science that employ statistics. These include probability, economics, biology, psychology, and astronomy.

Before moving on with this section, make sure to review measures of central tendency and standard deviation.

This section covers:

• What are Summary Statistics?
• How to Interpret Summary Statistics
• Summary Statistics Definition
• Summary Statistics Examples

What are Summary Statistics?

Summary statistics are numbers or words that describe a data set as succinctly as possible.

These include measures of central tendency such as mean, median, and mode. They also include measures of dispersion such as range and standard deviation. Summary statistics for multivariate data sets may also include measures of correlation such as the correlation coefficient.

Descriptions of the overall data shape such as “normally distributed” or “skewed right” are also part of summary statistics.

Summary statistics give a small “snapshot” of a data set that is more approachable than large quantities of data and more easily generalized than random data points. Like the summary of a story, they analyze and describe even large data sets in just a few numbers and words.

How to Interpret Summary Statistics

It is best to interpret individual components of summary statistics in light of the other components.

In general, a larger range and larger standard deviation indicate a wider dispersion. A wider range with a smaller standard deviation indicates outliers.

Similarly, when it comes to measures of central tendency, a mean that is higher than the median indicates a skew to the right. Likewise, a mean that is less than the median indicates a skew to the left. If they are about the same, the data set is likely roughly symmetric, as a normally distributed data set would be.
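
The mean-versus-median comparison above can be sketched in a few lines of Python. This is only a rough rule of thumb, not a formal test: the function name `skew_hint`, the `tolerance` cutoff of $0.1$ standard deviations, and the three small data sets are all assumptions made up for illustration.

```python
from statistics import mean, median, stdev

def skew_hint(data, tolerance=0.1):
    """Compare mean and median, scaled by the sample standard deviation.

    `tolerance` is an arbitrary illustrative cutoff, not a standard value.
    """
    gap = (mean(data) - median(data)) / stdev(data)
    if gap > tolerance:
        return "skewed right"
    if gap < -tolerance:
        return "skewed left"
    return "roughly symmetric"

# Hypothetical data sets, made up for illustration only.
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]     # mean 4.75, median 3
left_tail = [1, 17, 18, 18, 19, 19, 20]    # mean 16, median 18
balanced = [1, 2, 3, 4, 5]                 # mean 3, median 3

print(skew_hint(right_tail))
print(skew_hint(left_tail))
print(skew_hint(balanced))
```

Scaling the gap by the standard deviation keeps the comparison from depending on the units of the data, but the cutoff itself remains a judgment call.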

Summary Statistics Definition

Summary statistics are measures of central tendency, dispersion, and correlation combined with descriptions of shape that provide a simple overview of a data set or data sets.

These measures can include mean, median, mode, standard deviation, range, and correlation coefficient.

Summary Statistics Examples

One example of an important use for summary statistics is a census. In the United States, there are over $320$ million people. This means that a census includes a lot of data points. Since a census also usually includes information such as age, family size, address, occupation, etc., these are multivariate data points!

But civil servants and politicians need to make decisions based on census results. The easiest way to make that possible is to provide decision makers with summary statistics of the census results. These snapshots are easier to understand than a collection of $320$ million+ data points.

Common Examples

This section covers common examples of problems involving summary statistics and their step-by-step solutions.

Example 1

A data set has a mean of $200$, a median of $50$, a mode of $40$, and a range of $1500$. What do the summary statistics say about this data set?

Solution

The summary statistics for this data set indicate a strong skew to the right. This means that there are one or more upper outliers.

How do they show this?

Outliers have a strong effect on the mean of a data set but very little effect on the median. This means that upper outliers will increase the average while the median stays in place. In fact, outliers are the main reason for a discrepancy between the median and the mean of a data set.

Clearly, there is a large difference between $50$ and $200$, especially since the mode is $40$. Half of the data points are greater than $50$ and half are less, with $40$ being the most commonly occurring value. Given that, it would be misleading to call $200$ a typical value.

Likewise, the wide range indicates large values are possible.

Additional summary statistics that would paint a fuller picture are the highest and lowest values along with the standard deviation.

Example 2

Find the summary statistics for the following data set.

$(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8, 9, 11, 13, 17, 25, 33)$

Solution

Common summary statistics include mean, median, mode, range, and standard deviation.

In this case, the mean is equal to:

$\frac{1(6)+2(3)+3(2)+4(2)+5+6(2)+7+8+9+11+13+17+25+33}{24} = \frac{166}{24} = \frac{83}{12}$.

This is about equal to $6.9167$.

The median in this case is equal to the average of the twelfth and thirteenth numbers. These are both four, however, so four is the median.

Since one appears more often than any other number, it is the mode.

These are the measures of central tendency. On the other hand, the common measures of dispersion are range and standard deviation.

The range is just equal to the largest number minus the smallest number. This is equal to $33-1 = 32$.

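Standard deviation, however, is more tedious to calculate. For a sample of $k$ values $n_1, \dots, n_k$ with mean $\bar{x}$, it is equal to:

$\sqrt{\frac{\sum_{i=1}^k (n_i - \bar{x})^2}{k-1}}$.

Dividing by $k$ instead of $k-1$ gives the population standard deviation, which is slightly smaller.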

These calculations take a while. For larger data sets, it is often easier to use a standard deviation calculator.

Whether calculating by hand or with technology, however, the standard deviation is about $8.086$.
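
As a check on that figure, the sum of squared deviations can be found from the sum of the squares of the terms ($2652$) and the sum of the terms ($166$). The value of about $8.086$ comes from dividing by $23$, one less than the number of terms:

$\sum_{i=1}^{24} n_i^2 - \frac{\left(\sum_{i=1}^{24} n_i\right)^2}{24} = 2652 - \frac{166^2}{24} \approx 1503.83$

$\sqrt{\frac{1503.83}{23}} \approx \sqrt{65.384} \approx 8.086$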

The total summary, then, is:

Mean: $6.9167$

Median: $4$

Mode: $1$

Range: $32$

Standard Deviation: $8.086$.

The summary statistics may also note that there are $24$ elements in the data set, with the largest value being $33$ and the smallest value being $1$.
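
The same summary can be reproduced with Python's standard `statistics` module. This is a minimal sketch rather than a prescribed method; the dictionary name `summary` and its keys are chosen here for illustration. Note that `stdev` divides by one less than the number of terms, while `pstdev` divides by the number of terms.

```python
from statistics import mean, median, mode, stdev, pstdev

data = [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6,
        7, 8, 9, 11, 13, 17, 25, 33]

summary = {
    "count": len(data),
    "mean": mean(data),
    "median": median(data),
    "mode": mode(data),
    "range": max(data) - min(data),
    "sample_sd": stdev(data),       # divides by n - 1
    "population_sd": pstdev(data),  # divides by n
}

for name, value in summary.items():
    # Show floats to four decimal places, integers as they are.
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

The sample standard deviation comes out at about $8.086$, matching the value above, while the population version is about $7.916$.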

Example 3

Consider the following data set:

$(85, 86, 88, 88, 90, 91, 94, 94, 96, 97, 98, 98, 98, 99, 99, 99, 99, 100, 100, 100, 100, 100, 100, 101, 101, 101, 102, 102, 102, 103, 103, 104, 104, 105, 106, 106, 108, 109, 110, 110, 110, 113, 115)$.

What are the summary statistics for this data set? What do these statistics say about the data set?

Solution

This data set has 43 data points. The highest value is $115$, while the lowest value is $85$. This means that the range is $115-85=30$.

The median of this data set is going to be the twenty-second term, which is $100$.

Likewise, the mode of the data set is $100$ because it appears more than any other value.

The mean of this data set is equal to:

$\frac{4314}{43}$. This is about equal to $100.3$.

Plugging the data set into a standard deviation calculator reveals that the standard deviation is approximately $6.9$.

Therefore, the summary statistics on this data set are:

Mean: $100.3$

Median: $100$

Mode: $100$

Range: $30$

Standard Deviation: $6.9$

Number of Terms: $43$

Highest Value: $115$

Lowest Value: $85$.

Based on these statistics, the data is probably approximately normally distributed, or at least roughly symmetric, because all of the measures of central tendency are almost exactly equal.
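
With $43$ values, counting repeats by hand is error-prone. One way to double-check the mode and the other figures is to tally the values with `collections.Counter`. This is a sketch only; the variable names `tally`, `mode_value`, and `mode_count` are illustrative.

```python
from collections import Counter
from statistics import mean, median, stdev

data = [85, 86, 88, 88, 90, 91, 94, 94, 96, 97, 98, 98, 98, 99, 99, 99, 99,
        100, 100, 100, 100, 100, 100, 101, 101, 101, 102, 102, 102, 103, 103,
        104, 104, 105, 106, 106, 108, 109, 110, 110, 110, 113, 115]

tally = Counter(data)  # maps each value to its number of occurrences
mode_value, mode_count = tally.most_common(1)[0]

print("count:", len(data))
print("mode:", mode_value, "appears", mode_count, "times")
print("median:", median(data))
print("mean:", round(mean(data), 1))
print("range:", max(data) - min(data))
print("sample sd:", round(stdev(data), 1))
```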

Example 4

A shipping company weighs a sample of packages, in pounds, before they are sent out. They get the following results.

$(0.1, 0.1, 0.3, 0.5, 0.8, 0.9, 1.1, 1.2, 1.4, 1.5, 1.5, 1.5, 1.6, 1.7, 1.7, 1.8, 1.9, 2.1, 2.9, 3.3, 4.0, 5.3, 5.5, 6.8, 9.2, 21.8)$.

What are the summary statistics for the data? What do they say about the data in context?

Solution

The summary statistics for this data set are:

Number of Terms: $26$

Mode (most common value): $1.5$

Median (average of the thirteenth and fourteenth terms): $1.65$

Mean (sum of the terms divided by $26$): About $3.096$

Highest Value: $21.8$

Lowest Value: $0.1$

Range (difference of highest and lowest values): $21.7$

Standard Deviation (typical distance of the terms from the mean): About $4.397$

In this data set, the median and mode are approximately the same, but the mean is a bit higher. It is not, however, a full standard deviation higher. This means that the data is slightly skewed to the right, but not too much. This is likely due to the presence of some outliers.

In context, this means that there are a few heavier packages that the company sends, but, for the most part, the packages weigh around $1.65$ pounds.
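
To see how much the heaviest package pulls on each measure, the mean and median can be recomputed with that one value left out. This is an illustrative sketch; the names `weights` and `trimmed` are not from the original problem.

```python
from statistics import mean, median

weights = [0.1, 0.1, 0.3, 0.5, 0.8, 0.9, 1.1, 1.2, 1.4, 1.5, 1.5, 1.5, 1.6,
           1.7, 1.7, 1.8, 1.9, 2.1, 2.9, 3.3, 4.0, 5.3, 5.5, 6.8, 9.2, 21.8]

# Drop the single heaviest package (21.8) and compare.
trimmed = sorted(weights)[:-1]

print("mean with / without heaviest:  ",
      round(mean(weights), 3), round(mean(trimmed), 3))
print("median with / without heaviest:",
      round(median(weights), 2), round(median(trimmed), 2))
```

The mean falls from about $3.096$ to about $2.348$, while the median only moves from $1.65$ to $1.6$, which is why the median is the better description of a typical package here.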

Practice Problems

1. A data set has a standard deviation of $1$, a mean of $0$, a median of $4$, and a mode of $3.5$. What can be said about the data set?
2. Another data set is approximately normally distributed. It has a mean of $16$ and a standard deviation of $3$. In what range do the median and mode likely fall?
3. Describe what the summary statistics would look like for a U-shaped data set.
4. Find the summary statistics for the following data set: $(-5, -4, -4, -3, -3, -3, -2, -2, -2, -1, -1, -1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 5)$.
5. A charity receives donations at an event. The donation amounts in dollars are: $(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 11, 12, 15, 15, 20, 20, 20, 20, 40, 40, 45, 50, 50, 50, 100, 200)$. Find the summary statistics for the donations and interpret them in context.

Answer Key

2. Since this data is approximately normally distributed, the median and mode are likely within $3$ units in either direction of the mean. That is, they are likely in the range of $13$ to $19$.
4. Number of terms: $28$. The mean is about $-0.3214$, the median is $0$, and the modes are $0$ and $1$. The range between the highest and lowest values of $5$ and $-5$ is $10$, and the standard deviation is about $2.405$. The data is approximately normally distributed.
5. There were $38$ donations averaging $21.08$ dollars. The most common donation was $5$ dollars, and the median donation was $10$ dollars. The donations ranged from $1$ to $200$ dollars, which means the range was $199$. In this case, the standard deviation was about $36.31$, which means that there was a lot of variance in the donation amount. The large difference between the mean and median donation indicates an outlier to the right, namely the $200$ dollar donation.
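
A data set can have more than one mode, as in Problem 4. The sketch below checks that answer with `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest count; the variable name `data` is illustrative.

```python
from statistics import mean, median, multimode, stdev

data = [-5, -4, -4, -3, -3, -3, -2, -2, -2, -1, -1, -1, 0, 0, 0, 0, 0,
        1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 5]

print("count:", len(data))
print("modes:", multimode(data))          # all values tied for most frequent
print("mean:", round(mean(data), 4))
print("median:", median(data))
print("range:", max(data) - min(data))
print("sample sd:", round(stdev(data), 3))
```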