# Class width – Explanation & Examples

The definition of class width is:

“The class width is the difference between the upper or lower class limits of consecutive classes in a bin frequency table”.

In this topic, we will discuss the class width from the following aspects:

• What is the class width?
• How to find the class width?
• Class width formula.
• Role of class width.
• Practical questions.

## What is the class width?

The class width is the difference between the upper or lower class limits of consecutive classes in a bin frequency table.

A bin frequency table groups values into equal-sized bins, or classes, and each class covers a range of values.

The frequency of each class is the number of data points it contains.

The boundaries of each class are called the lower class limit and the upper class limit, and the class width is the difference between the lower (or upper) limits of successive classes.

All classes should have the same width.

## How to find the class width?

We will go through an example for illustration.

Example 1

The following table gives the age (in years) of each of 50 participants from a certain survey.

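| Participant | Age |
|---|---|
| 1 | 70 |
| 2 | 56 |
| 3 | 37 |
| 4 | 69 |
| 5 | 70 |
| 6 | 40 |
| 7 | 66 |
| 8 | 53 |
| 9 | 43 |
| 10 | 70 |
| 11 | 54 |
| 12 | 42 |
| 13 | 54 |
| 14 | 48 |
| 15 | 68 |
| 16 | 48 |
| 17 | 42 |
| 18 | 35 |
| 19 | 72 |
| 20 | 70 |
| 21 | 70 |
| 22 | 48 |
| 23 | 56 |
| 24 | 74 |
| 25 | 57 |
| 26 | 52 |
| 27 | 58 |
| 28 | 62 |
| 29 | 56 |
| 30 | 68 |
| 31 | 70 |
| 32 | 46 |
| 33 | 35 |
| 34 | 56 |
| 35 | 50 |
| 36 | 48 |
| 37 | 47 |
| 38 | 60 |
| 39 | 63 |
| 40 | 71 |
| 41 | 43 |
| 42 | 65 |
| 43 | 38 |
| 44 | 64 |
| 45 | 73 |
| 46 | 54 |
| 47 | 67 |
| 48 | 58 |
| 49 | 62 |
| 50 | 70 |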

What is the proper class width for a bin frequency table of this data?

1. Determine the number of bins or classes you need.

There are no hard rules about how many bins to pick, but there are some general guidelines:

• Pick between 5 and 20 classes.
• Make sure you have a few items in each bin. For example, if you have 40 data points, you can choose 5 bins (8 data points per bin), but not 20 bins (which would give you only 2 data points per bin).
• Use a formula to choose the number of classes.

The formula is: number of classes = log(number of observations)/log(2), rounded up to the next integer.

For this data, log(50)/log(2) ≈ 5.64, which rounds up to 6, so the number of classes should be 6.
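As a minimal Python sketch of this rule (`number_of_classes` is a hypothetical helper name), `math.log2(n)` computes log(n)/log(2) directly and `math.ceil` does the rounding up:

```python
import math

def number_of_classes(n_observations: int) -> int:
    """Suggested number of classes: log(n)/log(2), rounded up to the next integer."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    # math.log2(n) equals log(n)/log(2); math.ceil rounds up.
    # max(1, ...) keeps a single observation from producing 0 classes.
    return max(1, math.ceil(math.log2(n_observations)))

print(math.log2(50))          # about 5.64
print(number_of_classes(50))  # 6
```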

2. Sort the data and subtract the minimum value from the maximum value to get the data range.

35 35 37 38 40 42 42 43 43 46 47 48 48 48 48 50 52 53 54 54 54 56 56 56 56 57 58 58 60 62 62 63 64 65 66 67 68 68 69 70 70 70 70 70 70 70 71 72 73 74.

In our age list, the minimum value is 35 and the maximum value is 74, so the data range = 74 – 35 = 39.
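The same sorting and range calculation in Python, as a sketch; the list `ages` is typed in from the participant table above, in participant order:

```python
# Ages of the 50 survey participants, participants 1 to 50.
ages = [70, 56, 37, 69, 70, 40, 66, 53, 43, 70,
        54, 42, 54, 48, 68, 48, 42, 35, 72, 70,
        70, 48, 56, 74, 57, 52, 58, 62, 56, 68,
        70, 46, 35, 56, 50, 48, 47, 60, 63, 71,
        43, 65, 38, 64, 73, 54, 67, 58, 62, 70]

sorted_ages = sorted(ages)
# After sorting, the first element is the minimum and the last is the maximum.
data_range = sorted_ages[-1] - sorted_ages[0]
print(sorted_ages[0], sorted_ages[-1], data_range)  # 35 74 39
```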

3. Divide the data range from Step 2 by the number of classes from Step 1.

Round the result up to a whole number to get the class width.

Class width = 39 / 6 = 6.5, rounded up to 7.
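A sketch of this step; `class_width` is a hypothetical helper, and its `unit` parameter is an assumption added so the same function can round up to a whole number (the default, as here) or to a finer precision such as two decimal places for non-integer data:

```python
import math

def class_width(data_range: float, n_classes: int, unit: float = 1) -> float:
    """Data range divided by the number of classes, rounded UP to a multiple of `unit`."""
    # unit=1 rounds up to a whole number; unit=0.01 would round up to two decimals.
    return math.ceil(data_range / n_classes / unit) * unit

print(39 / 6)              # 6.5
print(class_width(39, 6))  # 7
```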

4. Add the class width, 7, to the minimum value repeatedly (6 times, because we have 6 bins) to create the 6 classes.

35 + 7 = 42, so the first class is 35-42.

42 + 7 = 49, so the next bin is 42-49.

49 + 7 = 56, so the next bin is 49-56.

56 + 7 = 63, so the next bin is 56-63.

63 + 7 = 70, so the next bin is 63-70.

70 + 7 = 77, so the last bin is 70-77.
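Step 4 amounts to stepping up from the minimum in increments of the class width. A sketch, with `class_limits` as a hypothetical helper:

```python
def class_limits(minimum: float, width: float, n_classes: int) -> list[tuple[float, float]]:
    """(lower, upper) limits of each class, starting at the minimum value."""
    # Class i runs from minimum + i*width to minimum + (i+1)*width,
    # so each class starts where the previous one ended.
    return [(minimum + i * width, minimum + (i + 1) * width) for i in range(n_classes)]

for lower, upper in class_limits(35, 7, 6):
    print(f"{lower}-{upper}")
```

This prints the six classes 35-42 through 70-77 listed above.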

5. Draw a table with 2 columns. The first column lists the classes created in Step 4.

The second column contains the frequency of age values in each class.

| Age range (years) | Frequency |
|---|---|
| 35 – 42 | 7 |
| 42 – 49 | 8 |
| 49 – 56 | 10 |
| 56 – 63 | 7 |
| 63 – 70 | 14 |
| 70 – 77 | 4 |

We see that:

• The age bin “35-42” contains ages from 35 to 42, inclusive.
• The next age bin “42-49” contains ages greater than 42 up to and including 49, and so on.
• The class width is 7 for any two consecutive classes.
• For example, the first class is 35-42 with 35 as the lower limit and 42 as the upper limit. The next class is 42-49 with 42 as the lower limit and 49 as the upper limit. The class width = 42-35 = 49-42 = 7.
• The frequencies sum to 50, the total number of data points: 7 + 8 + 10 + 7 + 14 + 4 = 50.
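The counting rule in the bullets above (the first class includes both of its limits; every later class includes its upper limit but not its lower one) can be sketched as follows. `bin_frequencies` is a hypothetical helper; note that `numpy.histogram` uses the opposite convention (bins include their lower edge), so it would give different counts for this data.

```python
def bin_frequencies(values, limits):
    """Count how many values fall in each (lower, upper) class.

    First class: lower <= v <= upper. Later classes: lower < v <= upper.
    Values outside all classes are not counted.
    """
    counts = [0] * len(limits)
    for v in values:
        for i, (lower, upper) in enumerate(limits):
            if i == 0:
                in_class = lower <= v <= upper
            else:
                in_class = lower < v <= upper
            if in_class:
                counts[i] += 1
                break  # each value belongs to exactly one class
    return counts

# Ages of the 50 participants (from the table above).
ages = [70, 56, 37, 69, 70, 40, 66, 53, 43, 70,
        54, 42, 54, 48, 68, 48, 42, 35, 72, 70,
        70, 48, 56, 74, 57, 52, 58, 62, 56, 68,
        70, 46, 35, 56, 50, 48, 47, 60, 63, 71,
        43, 65, 38, 64, 73, 54, 67, 58, 62, 70]
limits = [(35, 42), (42, 49), (49, 56), (56, 63), (63, 70), (70, 77)]

counts = bin_frequencies(ages, limits)
for (lower, upper), freq in zip(limits, counts):
    print(f"{lower}-{upper}: {freq}")
print("total:", sum(counts))  # total: 50
```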

We can then use this bin frequency table to plot a histogram of the data, with the bins on one axis and their frequencies on the other. The most frequent bin is 63-70, with 14 occurrences.
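Without a plotting library, a plain-text bar chart is enough to see the shape of this table and pick out the most frequent bin. This sketch hard-codes the frequency table above and draws one `#` per observation:

```python
# Frequency table from Example 1 (age ranges in years).
table = [("35-42", 7), ("42-49", 8), ("49-56", 10),
         ("56-63", 7), ("63-70", 14), ("70-77", 4)]

# Text histogram: one '#' per observation in each bin.
for label, freq in table:
    print(f"{label:>6} | {'#' * freq} {freq}")

# The most frequent (modal) bin has the largest frequency.
modal_label, modal_freq = max(table, key=lambda row: row[1])
print("most frequent bin:", modal_label, modal_freq)  # most frequent bin: 63-70 14
```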

## Class width formula

From the example above, the class width formula is:

class width = data range/number of classes = (maximum – minimum)/number of classes, rounded up
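Written compactly, with n the number of observations, k the number of classes, and ⌈ ⌉ meaning "round up", the two rules used in this article are:

$$
k = \left\lceil \frac{\log n}{\log 2} \right\rceil,
\qquad
\text{class width} = \left\lceil \frac{\max - \min}{k} \right\rceil
$$

For the 50 ages in Example 1:

$$
k = \left\lceil \frac{\log 50}{\log 2} \right\rceil = \lceil 5.64 \rceil = 6,
\qquad
\text{class width} = \left\lceil \frac{74 - 35}{6} \right\rceil = \lceil 6.5 \rceil = 7
$$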

## Role of class width

Choosing a suitable class width according to the guidelines above lets us see the shape of the data distribution.

A class width that is too narrow or too wide gives a poor picture of the data distribution.

Example 1

The following bin frequency table is for the ages (in years) of 21407 participants from a certain survey.

The suitable number of classes = log(21407)/log(2) = 14.39, rounded up to 15.

Data range = 89 - 18 = 71.

Class width = 71/15 = 4.73, rounded up to 5.

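| Age range (years) | Frequency |
|---|---|
| 18 – 23 | 1528 |
| 23 – 28 | 1912 |
| 28 – 33 | 2086 |
| 33 – 38 | 2134 |
| 38 – 43 | 2154 |
| 43 – 48 | 2117 |
| 48 – 53 | 2033 |
| 53 – 58 | 1783 |
| 58 – 63 | 1570 |
| 63 – 68 | 1219 |
| 68 – 73 | 961 |
| 73 – 78 | 817 |
| 78 – 83 | 585 |
| 83 – 88 | 360 |
| 88 – 93 | 148 |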

When this table is plotted as a histogram, the most frequent bin is 38-43, with 2154 occurrences.
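The three calculations above (number of classes, data range, class width) can be collected into one small helper. This is a sketch: `suggest_bins` is a hypothetical name, and the `unit` argument (the precision to round the width up to) is an assumption added for non-integer data.

```python
import math

def suggest_bins(n_observations: int, minimum: float, maximum: float, unit: float = 1):
    """Return (number of classes, class width) using the rules in this article."""
    # Number of classes: log(n)/log(2), rounded up (at least 1).
    k = max(1, math.ceil(math.log2(n_observations)))
    # Class width: data range / k, rounded up to a multiple of `unit`.
    width = math.ceil((maximum - minimum) / k / unit) * unit
    return k, width

print(suggest_bins(21407, 18, 89))  # (15, 5)
```

The same helper gives `(12, 424)` for the 2206 physical-activity values in Example 2 (minimum 0, maximum 5083.2) and `(16, 1157)` for the price data in practical question 1.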

If we use a class width that is too narrow, such as 2, we get the following frequency table.

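| Age range (years) | Frequency |
|---|---|
| 18 – 20 | 591 |
| 20 – 22 | 576 |
| 22 – 24 | 705 |
| 24 – 26 | 796 |
| 26 – 28 | 772 |
| 28 – 30 | 809 |
| 30 – 32 | 852 |
| 32 – 34 | 850 |
| 34 – 36 | 845 |
| 36 – 38 | 864 |
| 38 – 40 | 867 |
| 40 – 42 | 839 |
| 42 – 44 | 880 |
| 44 – 46 | 826 |
| 46 – 48 | 859 |
| 48 – 50 | 847 |
| 50 – 52 | 790 |
| 52 – 54 | 783 |
| 54 – 56 | 749 |
| 56 – 58 | 647 |
| 58 – 60 | 661 |
| 60 – 62 | 617 |
| 62 – 64 | 545 |
| 64 – 66 | 490 |
| 66 – 68 | 476 |
| 68 – 70 | 414 |
| 70 – 72 | 395 |
| 72 – 74 | 332 |
| 74 – 76 | 350 |
| 76 – 78 | 287 |
| 78 – 80 | 262 |
| 80 – 82 | 224 |
| 82 – 84 | 199 |
| 84 – 86 | 149 |
| 86 – 88 | 111 |
| 88 – 90 | 148 |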

The table is now too long (36 bins, well over 20), and the shape of the data distribution is hard to grasp.

Plotted as a histogram, it has so many bins or classes that the data distribution is hard to see.

If we use a class width that is too wide, such as 36, we get the following frequency table.

| Age range (years) | Frequency |
|---|---|
| 18 – 54 | 14351 |
| 54 – 90 | 7056 |

The table now has only 2 bins, which hides the shape of the data distribution.

A histogram with only two bins tells us almost nothing about the data distribution.
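To see why widths of 2 and 36 are poor choices for these ages while 5 is reasonable, a short sketch can count how many classes a given width produces over the data range (71) and compare that with the 5-20 guideline. `check_width` is a hypothetical helper, and it assumes the classes start at the minimum value, as in the tables above.

```python
import math

def check_width(data_range: float, width: float, low: int = 5, high: int = 20):
    """Number of classes a given width produces, and whether it fits the 5-20 guideline."""
    # Classes start at the minimum value, so ceil(range / width) classes cover the data.
    k = math.ceil(data_range / width)
    if k < low:
        verdict = "too few classes"
    elif k > high:
        verdict = "too many classes"
    else:
        verdict = "ok"
    return k, verdict

for width in (2, 5, 36):
    print(width, check_width(71, width))
# 2 (36, 'too many classes')
# 5 (15, 'ok')
# 36 (2, 'too few classes')
```

For Example 2 below (data range 5083.2), widths of 100, 424, and 2600 give 51, 12, and 2 classes, matching the three tables there.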

Example 2

The following bin frequency table is for the physical activity (in kcal/week) of 2206 participants from a certain survey.

The suitable number of classes = log(2206)/log(2) = 11.1, rounded up to 12.

Data range = 5083.2 - 0 = 5083.2.

Class width = 5083.2/12 = 423.6, rounded up to 424.

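| Physical activity (kcal/week) | Frequency |
|---|---|
| 0 – 424 | 1442 |
| 424 – 848 | 563 |
| 848 – 1272 | 145 |
| 1272 – 1696 | 26 |
| 1696 – 2120 | 19 |
| 2120 – 2544 | 2 |
| 2544 – 2968 | 2 |
| 2968 – 3392 | 2 |
| 3392 – 3816 | 2 |
| 3816 – 4240 | 2 |
| 4240 – 4664 | 0 |
| 4664 – 5088 | 1 |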

When this table is plotted as a histogram, the most frequent bin is 0-424, with 1442 occurrences.

If we use a class width that is too narrow, such as 100, we get the following frequency table.

| Physical activity (kcal/week) | Frequency |
|---|---|
| 0 – 100 | 335 |
| 100 – 200 | 373 |
| 200 – 300 | 380 |
| 300 – 400 | 288 |
| 400 – 500 | 239 |
| 500 – 600 | 155 |
| 600 – 700 | 121 |
| 700 – 800 | 84 |
| 800 – 900 | 57 |
| 900 – 1000 | 48 |
| 1000 – 1100 | 33 |
| 1100 – 1200 | 30 |
| 1200 – 1300 | 9 |
| 1300 – 1400 | 9 |
| 1400 – 1500 | 4 |
| 1500 – 1600 | 7 |
| 1600 – 1700 | 4 |
| 1700 – 1800 | 9 |
| 1800 – 1900 | 6 |
| 1900 – 2000 | 3 |
| 2000 – 2100 | 1 |
| 2100 – 2200 | 0 |
| 2200 – 2300 | 1 |
| 2300 – 2400 | 0 |
| 2400 – 2500 | 1 |
| 2500 – 2600 | 0 |
| 2600 – 2700 | 1 |
| 2700 – 2800 | 1 |
| 2800 – 2900 | 0 |
| 2900 – 3000 | 0 |
| 3000 – 3100 | 0 |
| 3100 – 3200 | 1 |
| 3200 – 3300 | 1 |
| 3300 – 3400 | 0 |
| 3400 – 3500 | 1 |
| 3500 – 3600 | 0 |
| 3600 – 3700 | 0 |
| 3700 – 3800 | 1 |
| 3800 – 3900 | 0 |
| 3900 – 4000 | 0 |
| 4000 – 4100 | 0 |
| 4100 – 4200 | 0 |
| 4200 – 4300 | 2 |
| 4300 – 4400 | 0 |
| 4400 – 4500 | 0 |
| 4500 – 4600 | 0 |
| 4600 – 4700 | 0 |
| 4700 – 4800 | 0 |
| 4800 – 4900 | 0 |
| 4900 – 5000 | 0 |
| 5000 – 5100 | 1 |

The table is now too long (51 bins, far more than 20), which makes the data distribution hard to interpret.

Plotted as a histogram, it has so many bins or classes that the shape of the data distribution is hard to see.

If we use a class width that is too wide, such as 2600, we get the following frequency table.

| Physical activity (kcal/week) | Frequency |
|---|---|
| 0 – 2600 | 2197 |
| 2600 – 5200 | 9 |

The table now has only 2 bins, which hides the shape of the data distribution.

A histogram with only two bins tells us almost nothing about the data distribution.

## Practical questions

1. The following information relates to some price data.

The number of observations = 53940.

Minimum price = 326 dollars. Maximum price = 18823 dollars.

What is the suitable class width for this data?

2. The following information relates to some diamond weights.

The number of observations = 53940.

Minimum = 0.2 grams.

Maximum = 5.01 grams.

What is the suitable class width for this data?

3. The following bin frequency table is for the wind speed of some storms (in knots).

| Wind speed (knots) | Frequency |
|---|---|
| 10 – 21 | 287 |
| 21 – 32 | 2258 |
| 32 – 43 | 1727 |
| 43 – 54 | 1575 |
| 54 – 65 | 1678 |
| 65 – 76 | 812 |
| 76 – 87 | 492 |
| 87 – 98 | 402 |
| 98 – 109 | 242 |
| 109 – 120 | 329 |
| 120 – 131 | 117 |
| 131 – 142 | 52 |
| 142 – 153 | 32 |
| 153 – 164 | 7 |

What is the most frequent bin?

Is this data skewed?

4. The following is the bin frequency table for some ozone measurements.

| Range | Frequency |
|---|---|
| 1 – 57 | 83 |
| 57 – 113 | 28 |
| 113 – 169 | 5 |

Is the class width suitable for this data?

Can you determine a more suitable number of classes for this data?

5. The following is the bin frequency table for some solar radiation measurements.

| Range | Frequency |
|---|---|
| 0 – 100 | 34 |
| 100 – 200 | 37 |
| 200 – 300 | 66 |
| 300 – 400 | 9 |
| 400 – 500 | 0 |
| 500 – 600 | 0 |
| 600 – 700 | 0 |
| 700 – 800 | 0 |
| 800 – 900 | 0 |
| 900 – 1000 | 0 |

What is wrong with this table?

Can you determine a more appropriate class width if you know that the data range is 327?

### Answers

1. The recommended number of bins or classes = log(53940)/log(2) = 15.7, rounded up to 16.

The data range = 18823-326 = 18497.

The class width = 18497/16 = 1156.0625, rounded up to 1157.

2. The recommended number of bins or classes = log(53940)/log(2) = 15.7, rounded up to 16.

The data range = 5.01-0.2 = 4.81.

The class width = 4.81/16 = 0.300625, rounded up to 0.31 (two decimal places, the precision of the data).

3. The most frequent bin is “21-32”, with 2258 occurrences.

This data is right-skewed: it is concentrated at small values, and large values have a much lower frequency.

4. There are only 3 classes, while there should be 5 to 20 classes.

The suitable number of classes = log(number of observations)/log(2) = log(83+28+5)/log(2) = 6.86 rounded up to 7.

5. The bin frequency table has many empty bins at its end. These can be removed so they do not confuse the reader, and the table becomes:

| Range | Frequency |
|---|---|
| 0 – 100 | 34 |
| 100 – 200 | 37 |
| 200 – 300 | 66 |
| 300 – 400 | 9 |

The recommended number of bins or classes = log(34+37+66+9)/log(2) = 7.19 rounded up to 8.

The data range = 327.

The suitable class width = 327/8 = 40.875, rounded up to 41.
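Answer 5 can be checked with a short sketch that drops the empty bins at the end of the table and then reapplies the rules. The data range of 327 is taken from the question, since it cannot be recovered from the binned table itself.

```python
import math

# Solar radiation table from question 5: (lower, upper, frequency).
table = [(0, 100, 34), (100, 200, 37), (200, 300, 66), (300, 400, 9),
         (400, 500, 0), (500, 600, 0), (600, 700, 0), (700, 800, 0),
         (800, 900, 0), (900, 1000, 0)]

# Remove empty bins from the end of the table.
while table and table[-1][2] == 0:
    table.pop()

n = sum(freq for _, _, freq in table)  # number of observations
k = math.ceil(math.log2(n))            # log(n)/log(2), rounded up
width = math.ceil(327 / k)             # data range 327 (given in the question)
print(len(table), n, k, width)         # 4 146 8 41
```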