Interquartile Range – Explanation and Examples
The interquartile range of a data set is the middle fifty percent of the data set found between the first and third quartiles.
This range tends to eliminate upper and lower outliers and focuses on the section where the majority of the data lies.
Fields that use statistics, including math, economics, business, and the sciences, make use of the interquartile range.
Before moving forward with this section, make sure to review quartiles and box and whisker plots.
This section covers:
- What is Interquartile Range?
- How to Find Interquartile Range?
- Interquartile Range Definition
- Interquartile Range Formula
What is Interquartile Range?
The interquartile range of a set of data is the middle fifty percent of the data. This data has an upper bound of the third quartile and a lower bound of the first quartile.
The interquartile range displays the middle section of the data, so it excludes upper and lower outliers. This makes it easy to focus on the more typical data points without exceptional distractions.
Note that the interquartile range is also known as the IQR, the midspread, or H-spread.
In terms of percentiles, the interquartile range is the 75th percentile minus the 25th percentile.
The IQR helps to identify outliers because data points that lie far outside the first and third quartiles are considered unusually large or small.
In particular, a data point is an outlier if it is greater than the third quartile value plus $1.5\times$IQR or if it is less than the first quartile value minus $1.5\times$IQR.
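The outlier rule above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function names `outlier_fences` and `is_outlier`, the parameter `k`, and the quartile values in the usage lines are all assumptions made for this example.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3, k)
    return x < lower or x > upper


# Hypothetical quartiles chosen only for illustration.
lower, upper = outlier_fences(q1=10, q3=20)
print(lower, upper)             # -5.0 35.0
print(is_outlier(40, 10, 20))   # True
print(is_outlier(30, 10, 20))   # False
```

Here the IQR is $20-10=10$, so the cutoffs sit $15$ units below the first quartile and $15$ units above the third quartile.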
How to Find Interquartile Range?
To find the interquartile range, find the difference between the third and first quartile of a data set.
Recall that quartiles are one-fourth sections of the data. The first quartile is the 25th percentile, the second quartile is the 50th percentile, and the third quartile is the 75th percentile. The second quartile is equal to the median.
Additionally, recall that a percentile is a sort of ranking of data. If something is in the 60th percentile, it means that 60% of the data is below it and 40% is above it.
To find the quartiles, begin by finding the median. Then, find the “median” of the upper half of the data. This is the third quartile. Similarly, the “median” of the lower half of the data is the first quartile.
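The median-of-halves procedure can be sketched in Python as follows. The helper name `quartiles` is hypothetical, and the sketch assumes that when the number of values is odd, the median is counted in both halves, which is how the worked examples in this article treat it. Other conventions exist and can give different quartiles for the same data.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: when the number of values is odd, the median is
    included in both the lower half and the upper half.
    """
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2        # size of each half (median shared if n is odd)
    lower = values[:half]      # minimum up to the median
    upper = values[n - half:]  # median up to the maximum
    return median(lower), median(values), median(upper)


# Illustrative data, not taken from the article.
q1, q2, q3 = quartiles([2, 4, 6, 8, 10, 12, 14])
print(q1, q2, q3)   # 5.0 8 11.0
print(q3 - q1)      # 6.0
```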
Finding the IQR and constructing a box and whisker plot, which shows the spread of a set of data, involve finding quartiles.
Interquartile Range Definition
The interquartile range is the middle $50\%$ of a data set. This range typically excludes outliers. It is equal to the third quartile minus the first quartile, or the 75th percentile minus the 25th percentile.
Interquartile Range Formula
The formula for the interquartile range is $Q_3-Q_1$ where $Q_3$ is the third quartile or 75th percentile and $Q_1$ is the first quartile or 25th percentile.
Recall that $75\%$ of the data is less than $Q_3$ and $25\%$ of the data is less than $Q_1$. Therefore, the interval from $Q_1$ to $Q_3$ contains $50\%$ of the total data, and $Q_3-Q_1$ is the width of that interval. To find these values, find the “median” of the values from the median to the maximum value and the “median” of the values from the minimum value to the median. These will be the third and first quartiles respectively.
Examples
This section covers common examples of problems involving the interquartile range and their step-by-step solutions.
The first quartile of a data set is $6$, the median is $8$, and the third quartile is $12$. What is the interquartile range of the data set?
Recall that the interquartile range is the difference between the third and first quartiles in a data set.
In this data set, the third quartile is $12$ and the first is $6$. Therefore, the interquartile range is $12-6 = 6$.
In this problem, the median does not matter and is a distractor.
Consider the data set:
$(100, 200, 200, 400, 500, 500, 500, 600, 700, 700, 700, 800, 800, 900, 900, 1000, 1000, 1200)$.
What is the interquartile range for this data set?
To find the IQR, begin by finding the first and third quartiles. This requires first finding the median.
Since there are $18$ values in the data set, the median is the average of the ninth and tenth values. In this case, however, both values are $700$, so the median and second quartile is $700$.
Then, the first quartile is the median of the first nine values. This equals the fifth value, which is $500$.
Similarly, the third quartile is the median of the last nine values. This is equal to the fourteenth value, which is equal to $900$.
Therefore, the interquartile range equals $Q_3-Q_1 = 900-500 = 400$.
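The steps of this solution can be checked with a short Python sketch. It uses only `statistics.median` from the standard library, and the variable names are chosen for this illustration. Because the data set has an even number of values, the two halves do not overlap.

```python
from statistics import median

data = [100, 200, 200, 400, 500, 500, 500, 600, 700,
        700, 700, 800, 800, 900, 900, 1000, 1000, 1200]

values = sorted(data)
n = len(values)                 # 18 values, so the halves do not overlap
lower_half = values[: n // 2]   # first nine values
upper_half = values[n // 2:]    # last nine values

q1 = median(lower_half)
q2 = median(values)
q3 = median(upper_half)
iqr = q3 - q1

print(q1, q2, q3)  # 500 700.0 900
print(iqr)         # 400
```

Library functions that compute quartiles may use interpolation conventions that differ from the median-of-halves method, so their results for the same data can differ.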
Consider the data set:
$(1, 1, 1, 4, 4, 5, 9, 10, 12, 14, 21, 27, 51)$.
What is the IQR of the data set? Is the value of $51$ an outlier?
Again, begin by finding the median or second quartile. Since this data set has thirteen data points, the median is the seventh data point, $9$.
Then, the first quartile is equal to the median of the first seven data points. This is equal to $4$. Likewise, the third quartile equals the median of the last seven data points. In this case, the third quartile is equal to $14$.
Since the interquartile range equals the third quartile minus the first quartile, the IQR is equal to $14-4=10$.
Recall now that an upper outlier is any value greater than $Q_3+1.5\times$IQR. In this case, that is $14+1.5\times10$ or $14+15=29$.
Since $51$ exceeds $29$, it is an outlier. The value $27$ does not exceed $29$, so it is not an outlier.
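The arithmetic for this data set can be verified with a short Python sketch. It assumes the same convention as the solution, where the median is counted in both halves because the number of values is odd. The variable names are illustrative.

```python
from statistics import median

data = [1, 1, 1, 4, 4, 5, 9, 10, 12, 14, 21, 27, 51]

values = sorted(data)
n = len(values)                  # 13 values, so the median is shared
half = (n + 1) // 2              # 7
q1 = median(values[:half])       # median of the first seven values
q3 = median(values[n - half:])   # median of the last seven values
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in values if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)               # 4 14 10
print(lower_fence, upper_fence)  # -11.0 29.0
print(outliers)                  # [51]
```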
Suppose a data set contains at least five values.
How does changing the upper or lower value affect the IQR? What happens when the data set has fewer than five values?
The median or second quartile of a data set containing five values is the third value.
Then, the first quartile will be equal to the second value while the third quartile will be equal to the fourth value.
In this case, therefore, changing the least or greatest value will not change the IQR. The second term does not depend on the first. Likewise, the fourth term does not depend on the fifth. Since the IQR is the difference of the fourth and second terms, its value does not depend on the least or greatest values of the data set.
If the data set has fewer than five values, say four values, a change in the largest and smallest data values affects the IQR.
In such a data set, the middle value is the average of the second and third values. Then, the first quartile will be the average of the first and second points while the third quartile will be the average of the third and fourth points.
Clearly, then, the first quartile value depends on the lowest value while the third quartile value depends on the greatest value. Therefore, if there are fewer than five data points, the greatest and least values affect the IQR. Otherwise, however, they do not matter in the same way.
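A small Python experiment illustrates this behavior. The helper `iqr` and the sample data are hypothetical, and the sketch assumes the median-of-halves convention with the median shared between halves when the count is odd. The changed value remains the greatest value in each case.

```python
from statistics import median


def iqr(data):
    """IQR via median of halves (median shared when the count is odd)."""
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2
    return median(values[n - half:]) - median(values[:half])


# Five values: replacing the greatest value leaves the IQR unchanged.
print(iqr([1, 3, 5, 7, 9]))     # 4
print(iqr([1, 3, 5, 7, 900]))   # 4

# Four values: the same kind of change moves the third quartile.
print(iqr([1, 3, 5, 7]))        # 4.0
print(iqr([1, 3, 5, 700]))      # 350.5
```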
The first quartile of a data set is $50$ and the third quartile of the data set is $126$. Is the value $160$ an outlier? Why or why not?
Recall that an outlier is a number that is greater than the third quartile plus $1.5\times$IQR or less than the first quartile minus $1.5\times$IQR.
In this case, the value $160$ is greater than the third quartile, $126$. Therefore, one should test whether or not it is an upper outlier.
The interquartile range equals $126-50=76$.
$1.5\times76 = 114$. Therefore, in order for a number to be an upper outlier, it must be greater than $126+114=240$.
Since $160$ does not exceed $240$, it is not an outlier.
- The first quartile of a data set is $49$, the median is $65$, and the third quartile is $81$. What is the IQR of this data set?
- Find the IQR for the data set $(15, 16, 19, 19, 21, 21, 21, 25, 29)$
- Construct a data set where the IQR is equal to zero, but not all of the terms in the data set are the same.
- Are there any outliers in the data set $(-16, -2, 3, 4, 5, 9, 12, 24, 25)$? Why or why not?
- A data set has a first quartile of $126$, a median of $452$, and a third quartile of $924$. Numbers under which value are considered lower outliers? Numbers above which value are considered upper outliers?
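- The IQR is $81-49 = 32$.
- The median is $21$. The first quartile is the median of the first five values, $19$, and the third quartile is the median of the last five values, $21$. The IQR is $21-19 = 2$.
- (One option) $(-1, 0, 0, 0, 0, 0, 1)$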
- IQR is $9$. $3-1.5(9) = -10.5$. $12+1.5(9) = 25.5$. Therefore, only $-16$ is an outlier.
- The IQR is $924-126 = 798$. $126-1.5(798) = -1071$. Lower outliers are less than this. $924+1.5(798) = 2121$. Upper outliers are greater than this.