banner

Bar Graph – Explanation & Examples

The definition of the bar graph is:

“The bar graph is a chart used to represent categorical data using bars’ heights”

In this topic, we will discuss the bar graph from the following aspects:

  • What is a bar graph?
  • How to make a bar graph?
  • How to read bar graphs?
  • Vertical bar graph
  • Horizontal bar graph
  • Creating bar graphs with R
  • Practical questions
  • Answers

What is a bar graph?

The bar graph is a graph used to represent categorical data using bars of different heights.

The heights of the bars are proportional to the values or the frequencies of these categorical data.

How to make a bar graph?

The bar graph is made by plotting the categorical data on one axis and the values of these categorical data on the other axis.

Example 1, A survey of smoking habits for 10 individuals has shown the following table

Smoking habit

Count

Never smoker

5

Current smoker

2

Former smoker

3

By plotting this data as a bar graph, we will get.

The x-axis or the horizontal axis has the categorical data and the y axis or the vertical axis has the counts of these categories.

The length of the Never smoker bar is 5, the length of the former smoker bar is 3, and the length of the current smoker bar is 2.

Each bar has a height that corresponds to the count of these smoking habits.

Example 2, the following table is the landmass area of 4 continents (Africa, Antarctica, Asia, and Australia) in thousands of square miles.

Location

Area

Africa

11506

Antarctica

5500

Asia

16988

Australia

2968

If we plot this data as a bar graph, we will get.

We see that the bar for Asia is the longest one followed by the bar for Africa and Antarctica. The bar corresponding to Australia has the lowest height.

In the second bar plot, we see that each bar’s height corresponds to the area of each continent.

How to read bar graphs?

we read the bar graph by looking at the bars’ heights to determine the category with highest and lowest values.

In the example of smoking habits, the Never smoker category has the longest bar so this category has the highest count in our survey.

The current smoker has the lowest height so this category has the lowest count in our survey.

In the example of continents’ areas, Asia has the longest bar followed by Africa, Antarctica, Australia. Therefore, we can arrange these continents according to their area in the following descending order

Asia > Africa > Antarctica > Australia

If we want the exact value of each category, we can extrapolate a line from the top of each bar to its value on the y axis.

We see that the line from the never smoker bar is extrapolated to 5, so the count of never smokers in our survey is 5.

Similarly, the count of former smokers is 3 and the count of current smokers is only 2.

In the plot of continents’ areas.

By extrapolating the lines from each bar top, we see that:

The area of Asia = 16,988,000 square miles.

The area of Africa = 11,506,000 square miles.

The area of Antarctica = 5,500,000 square miles.

The area of Australia = 2,968,000 square miles.

Vertical bar graph

All the above examples are examples of vertical bar plots where we have the categories on the x-axis or the horizontal axis and the categories’ values on the y- axis or the vertical axis.

We use vertical bar graphs when we have a low number of categories.

For example, we have the following table of the landmass area of different locations in thousands of square miles.

Location

Area

Africa

11506

Antarctica

5500

Asia

16988

Australia

2968

Axel Heiberg

16

Baffin

184

Banks

23

Borneo

280

Britain

84

Celebes

73

Celon

25

Cuba

43

Devon

21

Ellesmere

82

Europe

3745

Greenland

840

Hainan

13

Hispaniola

30

Hokkaido

30

Honshu

89

Iceland

40

Ireland

33

Java

49

Kyushu

14

Luzon

42

Madagascar

227

Melville

16

Mindanao

36

Moluccas

29

New Britain

15

New Guinea

306

New Zealand (N)

44

New Zealand (S)

58

Newfoundland

43

North America

9390

Novaya Zemlya

32

Prince of Wales

13

Sakhalin

29

South America

6795

Southampton

16

Spitsbergen

15

Sumatra

183

Taiwan

14

Tasmania

26

Tierra del Fuego

19

Timor

13

Vancouver

12

Victoria

82

We have 48 different locations. If we plot this data as a vertical bar graph, we will get.

The categories are crowded together and difficult to discern.

One solution to that is using a horizontal bar graph.

Horizontal bar graph

We make the horizontal bar graph by reversing the positions of the categories and their values.

The categories are on the y axis and their values on the x-axis.

The horizontal bar graph for the 48 different locations.

The categories are now more discerned than before.

Let’s look at another example.

The following is a table for the maximum wind speed for 30 storms.

name

maximum wind speed

Opal

130

Ophelia

120

Oscar

45

Otto

75

Pablo

50

Paloma

125

Patty

40

Paula

90

Peter

60

Philippe

80

Rafael

80

Richard

85

Rina

100

Rita

155

Roxanne

100

Sandy

100

Sean

55

Sebastien

55

Shary

65

Sixteen

25

Stan

70

Tammy

45

Tanya

75

Ten

30

Tomas

85

Tony

45

Two

30

Vince

65

Wilma

160

Zeta

55

We can plot this data as a vertical bar graph

or, more clearly, as a horizontal bar graph

A more informative graph would be by arranging the different storms according to their maximum wind speed.

From this, we see that the storm with the highest maximum speed is Wilma and Sixteen has the lowest maximum wind speed.

Creating bar graphs with R

R has an excellent package called tidyverse that contains many packages for data visualization (as ggplot2) and data analysis (as dplyr).

These packages allow us to draw different versions of bar graphs for large datasets.

However, they require the supplied data to be a data frame which is a tabular form to store data in R.

Example: The relig_income data frame is part of the tidyverse package and contains data related to the Pew religion and income survey.

We begin our session by activating the tidyverse package using the library function.

Then, we load the relig_income data using the data function and examine it by typing its name.

The data is composed of 11 columns, 1 column for 18 religion categories, and 10 columns for different income categories.

Finally, we use the ggplot function with argument data = relig_income, and religion on the x-axis and <$10k on the y-axis plus geom_col function to draw the bar graph for this income category.

This will plot a vertical bar graph showing the number of persons in this survey who earn <$10k for each religion.

library(tidyverse)

data(“relig_income”)

relig_income

## # A tibble: 18 x 11
##    religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k`
##                                        
##  1 Agnostic      27        34        60        81        76       137        122
##  2 Atheist       12        27        37        52        35        70         73
##  3 Buddhist      27        21        30        34        33        58         62
##  4 Catholic     418       617       732       670       638      1116        949
##  5 Don’t k~      15        14        15        11        10        35         21
##  6 Evangel~     575       869      1064       982       881      1486        949
##  7 Hindu          1         9         7         9        11        34         47
##  8 Histori~     228       244       236       238       197       223        131
##  9 Jehovah~      20        27        24        24        21        30         15
## 10 Jewish        19        19        25        25        30        95         69
## 11 Mainlin~     289       495       619       655       651      1107        939
## 12 Mormon        29        40        48        51        56       112         85
## 13 Muslim         6         7         9        10         9        23         16
## 14 Orthodox      13        17        23        32        32        47         38
## 15 Other C~       9         7        11        13        13        14         18
## 16 Other F~      20        33        40        46        49        63         46
## 17 Other W~       5         2         3         4         2         7          3
## 18 Unaffil~     217       299       374       365       341       528        407
## # … with 3 more variables: `$100-150k` , `>150k` , `Don’t
## #   know/refused`

ggplot(data = relig_income, aes(x = religion, y = `<$10k`))+

  geom_col()

The different religions are crowded together so we draw horizontal bar graph by adding the coord_flip function.

ggplot(data = relig_income, aes(x = religion, y = `<$10k`))+

  geom_col()+ coord_flip()

An important information can be added by using geom_label function with argument, aes(label = income category).

This function will add the number of persons that corresponds to each religion at the top of each bar.

ggplot(data = relig_income, aes(x = religion, y = `<$10k`))+

  geom_col()+ coord_flip()+ geom_label(aes(label = `<$10k`))

For the persons earning <$10k, the Evangelical Prot religion has the highest number of persons (575), while the Hindu religion has the lowest number of persons (only 1).

If we plot the highest income category (>150k)

ggplot(data = relig_income, aes(x = religion, y = `>150k`))+

  geom_col()+ coord_flip()+ geom_label(aes(label = `>150k`))

For the persons earning >$150k, the Mainline Prot religion has the highest number of persons (634), while the Other World Religions category has the lowest number of persons (only 4).

Practical questions

1. For the relig_income data, plot the $75-100k column, and determine which religion has the highest number of persons earning this amount?

2. For the relig_income data, plot the $30-40k column, and determine which religion has the lowest number of persons earning this amount?

3. The mtcars data contains some properties of 32 automobiles of 1973-1974 models.

We use the rownames_to_column to add another column containing the model names.

Plot this data and determine which model has the highest weight (wt column).

dat<- mtcars %>% rownames_to_column(var = “model”)

4. For the same mtcars data, plot the data as a bar graph and determine which model has the lowest number of carburetors (carb column)

5. The state.x77 is a matrix containing some data about the 50 states of the USA in the 1970s.

We use this function to convert it to a data frame and add a column for the state name

dat2<- state.x77 %>% data.frame() %>% rownames_to_column(var = “state”)

Use this data and plot it as a bar graph to determine which state has the lowest and highest murder rate (Murder column)

Answers

1. As before, we begin our session by activating the tidyverse package using the library function.

Then, we load the relig_income data using the data function and plotting the bar graph using the $75-100k column as the y argument, and label the bars using the same column.

library(tidyverse)

data(“relig_income”)

ggplot(data = relig_income, aes(x = religion, y = `$75-100k`))+

  geom_col()+ coord_flip()+ geom_label(aes(label = `$75-100k`))

We see that both the Evangelical Prot and Catholic religions have the highest number of persons earning this income or 949 persons.

2. As before, but we use $30-40k as the y argument and for labeling the bars.

library(tidyverse)

data(“relig_income”)

ggplot(data = relig_income, aes(x = religion, y = `$30-40k`))+

  geom_col()+ coord_flip()+ geom_label(aes(label = `$30-40k`))

We see that the other world religions category has the lowest number of persons earning this amount (4 persons only).

3. We use the created dat data frame with model as x argument and wt as y argument and for labeling the bars.

ggplot(data = dat, aes(x = model, y = wt))+

  geom_col()+ coord_flip()+ geom_label(aes(label = wt))

We see that the model “Lincoln Continental” has the largest weight or 5.424.

4. We use the created dat data frame with model as x argument and carb as y argument and for labeling the bars.

ggplot(data = dat, aes(x = model, y = carb))+

  geom_col()+ coord_flip()+ geom_label(aes(label = carb))

We see that different models have the lowest number of carburetors or 1 carburetor only. These models are “Datsun 710”, “Hornet 4 Drive”, “Valiant”, “Fiat 128”, “Toyota Corolla”, “Toyota Corona”, and “Fiat X1-9”.

5. We use the created dat2 data frame with state as x argument and Murder as y argument and for labeling the bars.

ggplot(data = dat2, aes(x = state, y = Murder))+

  geom_col()+ coord_flip()+ geom_label(aes(label = Murder))

We see that the state with the highest murder rate was Alabama (15.1), and North Dakota was the state with the lowest murder rate (1.4).

Previous Lesson | Main Page | Next Lesson