Mode statistics – Explanation & Examples

The definition of mode is: “Mode is the most frequent value in a set of data values.”

In this topic, we will discuss the mode from the following aspects:

  • What is the mode in statistics?
  • The role of mode value in statistics
  • How to find the mode of a set of numbers?
  • How to find the mode of a set of strings or characters?
  • Exercises
  • Answers

What is the mode in statistics?

The mode is the value that appears most frequently in a set of data values.

If the data values are a set of numbers, the mode is the number with the highest number of occurrences. For example, for the set of numbers 1,1,2,2,3,3,4,4,4,5,6,7,8,9,9,10, the mode is 4 because 4 has the highest number of occurrences, which is 3 times.

This can be easily shown if we plot a simple dot plot of this data.

[Figure: simple dot plot of the data]

Here, we see that 4 has occurred 3 times, 1,2,3, and 9 have occurred 2 times, and all other values have occurred only 1 time. Therefore, the mode of this data is 4.
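The same count can be reproduced in R without any extra package. The following is a minimal sketch using the base R table function; the variable names (x, counts, mode_value) are chosen here for illustration only.

```r
# The example data from the text
x <- c(1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 6, 7, 8, 9, 9, 10)

# Count how many times each distinct value occurs
counts <- table(x)

# Keep the value(s) whose count equals the largest count
mode_value <- as.numeric(names(counts)[counts == max(counts)])

mode_value    # 4
max(counts)   # 3 occurrences
```

Comparing every count against the maximum, rather than picking one position, means that tied values would all be reported.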

Let’s look at another example. Suppose we have a data set of salaries (in $1,000) for a number of managers in the USA. These salaries are:

100,200,300,150,200,250,300,350,400,400,500,550,600,100,150,300,300

By plotting the data as a dotplot, we could easily see that the mode is 300.

[Figure: dot plot of the salary data]

Here we see that the most frequent number is 300 (or $300,000) as it has occurred 4 times in this data.
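A dot plot like this one can be drawn with the base R stripchart function. This is only a sketch of one possible way to draw it, not necessarily how the figure above was produced; the variable names are illustrative.

```r
# Salaries from the text, in $1,000
salary <- c(100, 200, 300, 150, 200, 250, 300, 350, 400, 400,
            500, 550, 600, 100, 150, 300, 300)

# Stacked dot plot: one dot per observation, repeated values pile up
stripchart(salary, method = "stack", pch = 16, offset = 0.5,
           xlab = "Salary (in $1,000)")

# The tallest stack corresponds to the largest count
counts <- sort(table(salary), decreasing = TRUE)
head(counts, 1)   # 300 occurs 4 times
```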

But what about strings, categories, or character data sets? The same rule applies. In that case, the string or category with the highest number of occurrences will be the mode of that data.

For example, we have a set of student names in a certain statistical class. These names are: “John”, “Jan”, “Sam”, “Ali”, “Alice”, “Emmy”, “Ann”, “John”, “Ali”, “John”.

[Figure: mode of the names data]

Here, we see that the mode of this data is the name “John” as it has occurred 3 times which is the maximum number of occurrences in this data.
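The same counting works for character data in R. The sketch below uses base R only; the variable names are illustrative.

```r
# Student names from the text
students <- c("John", "Jan", "Sam", "Ali", "Alice", "Emmy", "Ann",
              "John", "Ali", "John")

# Count the occurrences of each name
counts <- table(students)

# which.max() gives the position of the first largest count,
# so it reports a single name even if several names were tied
names(which.max(counts))   # "John"
```

Because which.max stops at the first maximum, it is only a safe shortcut when the mode is known to be unique, as it is here.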

The role of mode value in statistics

The mode is a summary statistic that gives important information about a certain data set or population.

In the salaries example, the mode is 300, so we know that $300,000 is the most frequent salary for these managers. In the student names example, knowing that the mode is “John” tells us that “John” is the most frequent name in this class.

The mode is not necessarily unique for a given data set, since several numbers or categories may occur the same maximum number of times. In that case, the data is called multimodal, as opposed to unimodal data with only one mode.

A common example of multimodal data arises when you have a mixed population. For example, if you have data on individual heights from a certain school, the data will most likely be bimodal, with one mode for students and the other for teachers.
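To make the distinction concrete, here is a small sketch of a helper that returns every value tied for the highest count. The function name all_modes is hypothetical (it is not part of any package), and the input vectors are made-up toy data.

```r
# Hypothetical helper (not from any package): all most-frequent values of x
all_modes <- function(x) {
  x <- x[!is.na(x)]                      # ignore missing values
  values <- unique(x)                    # distinct values, in order of first appearance
  counts <- tabulate(match(x, values))   # occurrences of each distinct value
  values[counts == max(counts)]
}

all_modes(c(1, 1, 2, 2, 3))   # two values tie: 1 and 2 (bimodal)
all_modes(c(4, 5, 5, 6))      # a single mode: 5 (unimodal)
```

If the returned vector has length 1 the data is unimodal; a longer result means the data is multimodal.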

How to find the mode of a set of numbers?

The mode of a set of numbers can be found graphically, using a frequency table, or with the mlv (most likely value) function from the modeest package of the R programming language.

Example 1

The following are the ages (in years) of 100 different individuals from a certain survey in Spain:

70 56 37 69 70 40 66 53 43 70 54 42 54 48 68 48 42 35 72 70 70 48 56 74 57

52 58 62 56 68 70 46 35 56 50 48 47 60 63 71 43 65 38 64 73 54 67 58 62 70

58 49 67 52 47 44 59 67 47 70 35 43 66 68 59 61 35 73 58 36 50 67 58 67 72

52 68 38 61 50 59 35 39 43 61 43 68 47 63 65 59 72 74 70 48 40 37 53 57 38

What is the mode of this data?

1. Graphical method

We plot the data values on one axis against their frequencies on the other axis.

[Figure: graphical method for finding the mode of a set of numbers]

[Figure: frequency plot of the data]

The different plots show that the mode is 70 because it has the maximum occurrences in this data (9 times).

2. Frequency table

We tabulate the data values in one column and their frequencies in another column.

Age   Frequency
35    5
36    1
37    2
38    3
39    1
40    2
42    2
43    5
44    1
46    1
47    4
48    5
49    1
50    3
52    3
53    2
54    3
56    4
57    2
58    5
59    4
60    1
61    3
62    2
63    2
64    1
65    2
66    2
67    5
68    5
69    1
70    9
71    1
72    3
73    2
74    2

The frequency table also shows that the mode is 70 because it has the maximum number of occurrences in this data (9 times).
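Both the frequency table and a frequency plot can be produced with base R. The following is a sketch under the assumption that the 100 ages are typed in by hand; the names age and freq are illustrative, and the plot is one possible rendering rather than the figure shown above.

```r
# The 100 ages from Example 1
age <- c(70, 56, 37, 69, 70, 40, 66, 53, 43, 70, 54, 42, 54, 48, 68, 48, 42, 35, 72, 70, 70, 48, 56, 74, 57,
         52, 58, 62, 56, 68, 70, 46, 35, 56, 50, 48, 47, 60, 63, 71, 43, 65, 38, 64, 73, 54, 67, 58, 62, 70,
         58, 49, 67, 52, 47, 44, 59, 67, 47, 70, 35, 43, 66, 68, 59, 61, 35, 73, 58, 36, 50, 67, 58, 67, 72,
         52, 68, 38, 61, 50, 59, 35, 39, 43, 61, 43, 68, 47, 63, 65, 59, 72, 74, 70, 48, 40, 37, 53, 57, 38)

# Frequency table as a two-column data frame (Age, Frequency)
freq <- as.data.frame(table(Age = age), responseName = "Frequency")

# Row(s) with the highest frequency: age 70, which occurs 9 times
freq[freq$Frequency == max(freq$Frequency), ]

# Bar plot of the same counts; the tallest bar marks the mode
barplot(table(age), xlab = "Age (years)", ylab = "Frequency")
```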

3. mlv function of R

Both graphical and tabular methods can be problematic when we have a large number of unique data values. The mlv function, from the modeest package, solves this by giving the mode of large data using only one line of code.

These 100 numbers are the first 100 age values of the regicor dataset, which ships with the compareGroups package.

We begin our R session by loading the modeest and compareGroups packages. Then, we use the data function to import the regicor data into our session.

Finally, we create a vector called x that holds the first 100 values of the age column (using the head function) from the regicor data, and then use the mlv function to obtain the mode of these 100 numbers, which is 70.

# loading the modeest and compareGroups packages
library(modeest)
library(compareGroups)

# importing the regicor data into the session
data("regicor")

# creating a vector that holds the first 100 age values
x <- head(regicor$age, 100)

x

##   [1] 70 56 37 69 70 40 66 53 43 70 54 42 54 48 68 48 42 35 72 70 70 48 56 74 57
##  [26] 52 58 62 56 68 70 46 35 56 50 48 47 60 63 71 43 65 38 64 73 54 67 58 62 70
##  [51] 58 49 67 52 47 44 59 67 47 70 35 43 66 68 59 61 35 73 58 36 50 67 58 67 72
##  [76] 52 68 38 61 50 59 35 39 43 61 43 68 47 63 65 59 72 74 70 48 40 37 53 57 38

mlv(x)

## [1] 70

Example 2

The following are the first 100 systolic blood pressure (sbp) values (in mmHg) from the regicor data:

138 139 132 168 NA 108 120 132 95 142 130 99 117 105 158 114 128 111 155

195 132 112 124 164 146 158 139 94 129 132 160 104 110 118 110 114 147 119

184 132 106 147 118 126 140 152 145 116 139 142 150 121 130 158 108 116 135

147 110 146 100 132 138 142 136 98 122 164 112 122 126 131 113 120 132 111

142 132 148 158 134 122 132 129 134 110 126 133 182 108 150 150 114 138 150

126 107 145 142 140

  • NA stands for “not available”

What is the mode of this data?

1. Graphical method

[Figures: two frequency plots of the blood pressure data]

2. Frequency table

Blood pressure (mmHg)   Frequency
94                      1
95                      1
98                      1
99                      1
100                     1
104                     1
105                     1
106                     1
107                     1
108                     3
110                     4
111                     2
112                     2
113                     1
114                     3
116                     2
117                     1
118                     2
119                     1
120                     2
121                     1
122                     3
124                     1
126                     4
128                     1
129                     2
130                     2
131                     1
132                     9
133                     1
134                     2
135                     1
136                     1
138                     3
139                     3
140                     2
142                     5
145                     2
146                     2
147                     3
148                     1
150                     4
152                     1
155                     1
158                     4
160                     1
164                     2
168                     1
182                     1
184                     1
195                     1

3. mlv function of R

# reading the data into R by creating a vector that holds these values

x<- head(regicor$sbp,100)

x

##   [1] 138 139 132 168  NA 108 120 132  95 142 130  99 117 105 158 114 128 111
##  [19] 155 195 132 112 124 164 146 158 139  94 129 132 160 104 110 118 110 114
##  [37] 147 119 184 132 106 147 118 126 140 152 145 116 139 142 150 121 130 158
##  [55] 108 116 135 147 110 146 100 132 138 142 136  98 122 164 112 122 126 131
##  [73] 113 120 132 111 142 132 148 158 134 122 132 129 134 110 126 133 182 108
##  [91] 150 150 114 138 150 126 107 145 142 140

mlv(x)

## [1] 132

All three methods give the same mode, 132 mmHg.
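This data contains one missing value. When counting by hand in base R, it helps to know that the table function leaves NA out of the counts unless asked otherwise. The sketch below uses only the first ten blood pressure values from the data above, so the result applies to that excerpt only; the variable names are illustrative.

```r
# First ten blood pressure values from the data above (an excerpt, for brevity)
sbp <- c(138, 139, 132, 168, NA, 108, 120, 132, 95, 142)

table(sbp)                    # NA is left out of the counts by default
table(sbp, useNA = "ifany")   # adds a separate count for NA

# Most frequent value among the non-missing observations
counts <- table(sbp)
names(counts)[counts == max(counts)]   # "132" within this excerpt
```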

How to find the mode of a set of strings or characters?

Similarly, the mode of a set of characters can be found graphically, using a frequency table, or with the mlv (most likely value) function from the modeest package of the R programming language.

Example 1:

You have the following baby names:

“Linda” “Linda” “James” “Robert” “Robert” “James” “John” “James”

“James” “James” “James” “Robert” “Robert” “James” “Robert” “David”

“James” “Robert” “James” “David” “Robert” “James” “David” “James”

“James” “Robert” “David” “Robert” “Robert” “Robert” “Robert” “John”

“John” “David” “John”

What is the mode of this data?

1. Graphical method

[Figures: two frequency plots of the names data]

2. Frequency table

Name     Frequency
David    5
James    12
John     4
Linda    2
Robert   12

3. mlv function of R

# reading the data into R by creating a vector that holds these values

x <- c("Linda",  "Linda",  "James",  "Robert", "Robert", "James",  "John",
       "James",  "James",  "James",  "James",  "Robert", "Robert", "James",
       "Robert", "David",  "James",  "Robert", "James",  "David",  "Robert",
       "James",  "David",  "James",  "James",  "Robert", "David",  "Robert",
       "Robert", "Robert", "Robert", "John",   "John",   "David",  "John")

x

##  [1] "Linda"  "Linda"  "James"  "Robert" "Robert" "James"  "John"   "James"
##  [9] "James"  "James"  "James"  "Robert" "Robert" "James"  "Robert" "David"
## [17] "James"  "Robert" "James"  "David"  "Robert" "James"  "David"  "James"
## [25] "James"  "Robert" "David"  "Robert" "Robert" "Robert" "Robert" "John"
## [33] "John"   "David"  "John"

mlv(x)

## [1] "James"  "Robert"

The modes of this data are “James” and “Robert”, as both occur 12 times, which is the maximum number of occurrences. This is an example of multimodal (here, bimodal) data.

Exercises

1. The air quality data contains some daily measurements of ozone (ppb) in New York on certain days of 1973. What is the mode of these measurements?

2. The air quality data also contains some daily measurements of solar radiation (lang). What is the mode of these measurements?

3. These air quality measurements were made in specific months. What is the mode of the month values?

4. Which of these examples (1, 2, or 3) are unimodal and which are multimodal?

5. The regicor data contains some age values (in years) from certain Spanish individuals. What is the mode of these values?

Answers

1. The air quality data is built into R. We import the data using the data function, then create a vector to hold the ozone measurements, and then use the mlv function. Here, we add another argument to the function, na.rm, to remove NA values from the data before the mode is computed.

data("airquality")

x <- airquality$Ozone

mlv(x, na.rm = TRUE)

## [1] 23

So the mode is 23 ppb.

2. The same steps apply.

x<-airquality$Solar.R

mlv(x, na.rm = TRUE)

## [1] 238 259

So the modes are 238 and 259 lang.

3. The same steps apply.

x<-airquality$Month

mlv(x,na.rm = TRUE)

## [1] 5 7 8

So the modes are 5, 7, and 8, that is, May, July, and August.

4. Ozone is an example of unimodal data as it has only 1 mode. Solar radiation and month data are examples of multimodal data as they have 2 modes and 3 modes, respectively.
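This classification can also be checked with base R alone, since airquality is a built-in dataset. The following is a sketch; the helper name modes_of is hypothetical and is not part of modeest.

```r
# Hypothetical helper (not part of modeest): all most-frequent values
modes_of <- function(x) {
  counts <- table(x)   # table() leaves NA out of the counts by default
  as.numeric(names(counts)[counts == max(counts)])
}

# Report the mode(s) of each column and whether there is one or several
for (column in c("Ozone", "Solar.R", "Month")) {
  m <- modes_of(airquality[[column]])
  kind <- if (length(m) == 1) "unimodal" else "multimodal"
  cat(column, ":", paste(m, collapse = ", "), "->", kind, "\n")
}
```

For the Month column, the tie is easy to confirm by hand: May, July, and August each have 31 daily measurements, while June and September have 30.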

5. The same steps apply.

x<-regicor$age

mlv(x, na.rm = TRUE)

## [1] 58

So the mode is 58 years.
