Bar Graph – Explanation & Examples
The definition of the bar graph is:
“The bar graph is a chart used to represent categorical data using bars’ heights”
In this topic, we will discuss the bar graph from the following aspects:
- What is a bar graph?
- How to make a bar graph?
- How to read bar graphs?
- Vertical bar graph
- Horizontal bar graph
- Creating bar graphs with R
- Practical questions
- Answers
What is a bar graph?
The bar graph is a graph used to represent categorical data using bars of different heights.
The heights of the bars are proportional to the values or the frequencies of these categorical data.
How to make a bar graph?
The bar graph is made by plotting the categorical data on one axis and the values of these categorical data on the other axis.
Example 1: A survey of the smoking habits of 10 individuals produced the following table.
Smoking habit | Count |
Never smoker | 5 |
Current smoker | 2 |
Former smoker | 3 |
Plotting this data as a bar graph gives one bar per smoking habit.
The x-axis (the horizontal axis) shows the categories, and the y-axis (the vertical axis) shows the count of each category.
The height of the never smoker bar is 5, the height of the former smoker bar is 3, and the height of the current smoker bar is 2.
Each bar has a height that corresponds to the count of these smoking habits.
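For readers who want to reproduce this chart, here is a minimal sketch in R (the language used later in this topic), assuming the ggplot2 package is installed. The data frame name smoking and its column names are made up for illustration.

```r
library(ggplot2)

# Hypothetical data frame holding the survey counts from Example 1
smoking <- data.frame(
  habit = c("Never smoker", "Current smoker", "Former smoker"),
  count = c(5, 2, 3)
)

# Categories go on the x-axis and counts on the y-axis;
# geom_col() draws one bar per row with height equal to count
p_smoking <- ggplot(smoking, aes(x = habit, y = count)) +
  geom_col()
p_smoking
```

By default ggplot2 orders the bars alphabetically by category, so the order on the axis may differ from the order of the rows in the table.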
Example 2: The following table lists the landmass area of 4 continents (Africa, Antarctica, Asia, and Australia) in thousands of square miles.
Location | Area |
Africa | 11506 |
Antarctica | 5500 |
Asia | 16988 |
Australia | 2968 |
Plotting this data as a bar graph gives one bar per continent.
The bar for Asia is the tallest, followed by the bars for Africa and Antarctica. The bar for Australia is the shortest.
In this second bar plot, each bar’s height corresponds to the area of the continent it represents.
How to read bar graphs?
We read a bar graph by comparing the bars’ heights to find the categories with the highest and lowest values.
In the example of smoking habits, the never smoker category has the tallest bar, so this category has the highest count in our survey.
The current smoker category has the shortest bar, so this category has the lowest count in our survey.
In the example of continents’ areas, Asia has the tallest bar, followed by Africa, Antarctica, and Australia. Therefore, we can arrange these continents by area in the following descending order:
Asia > Africa > Antarctica > Australia
If we want the exact value of each category, we can extend a horizontal line from the top of each bar to the y-axis and read the value there.
The line from the never smoker bar meets the y-axis at 5, so the count of never smokers in our survey is 5.
Similarly, the count of former smokers is 3 and the count of current smokers is only 2.
In the plot of continents’ areas, extending a line from the top of each bar to the y-axis shows that:
The area of Asia = 16,988,000 square miles.
The area of Africa = 11,506,000 square miles.
The area of Antarctica = 5,500,000 square miles.
The area of Australia = 2,968,000 square miles.
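The same reading can be checked numerically. The following is a small sketch in base R, using the values from the Example 2 table; the vector names are chosen for illustration only.

```r
# Areas from Example 2, in thousands of square miles
area <- c(Africa = 11506, Antarctica = 5500, Asia = 16988, Australia = 2968)

# Sort from largest to smallest to get the descending order of the bars
ranked <- sort(area, decreasing = TRUE)
names(ranked)

# The table is in thousands, so multiply by 1000 to get square miles
area_sq_miles <- area * 1000
area_sq_miles["Asia"]
```

Here names(ranked) lists the continents from the tallest bar to the shortest, and area_sq_miles holds the values a reader would obtain from the y-axis after converting the units.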
Vertical bar graph
All the examples above are vertical bar graphs, where the categories are on the x-axis (the horizontal axis) and the categories’ values are on the y-axis (the vertical axis).
Vertical bar graphs work well when the number of categories is small.
For example, the following table lists the landmass area of different locations in thousands of square miles.
Location | Area |
Africa | 11506 |
Antarctica | 5500 |
Asia | 16988 |
Australia | 2968 |
Axel Heiberg | 16 |
Baffin | 184 |
Banks | 23 |
Borneo | 280 |
Britain | 84 |
Celebes | 73 |
Celon | 25 |
Cuba | 43 |
Devon | 21 |
Ellesmere | 82 |
Europe | 3745 |
Greenland | 840 |
Hainan | 13 |
Hispaniola | 30 |
Hokkaido | 30 |
Honshu | 89 |
Iceland | 40 |
Ireland | 33 |
Java | 49 |
Kyushu | 14 |
Luzon | 42 |
Madagascar | 227 |
Melville | 16 |
Mindanao | 36 |
Moluccas | 29 |
New Britain | 15 |
New Guinea | 306 |
New Zealand (N) | 44 |
New Zealand (S) | 58 |
Newfoundland | 43 |
North America | 9390 |
Novaya Zemlya | 32 |
Prince of Wales | 13 |
Sakhalin | 29 |
South America | 6795 |
Southampton | 16 |
Spitsbergen | 15 |
Sumatra | 183 |
Taiwan | 14 |
Tasmania | 26 |
Tierra del Fuego | 19 |
Timor | 13 |
Vancouver | 12 |
Victoria | 82 |
We have 48 different locations. If we plot this data as a vertical bar graph, the x-axis has to hold 48 labels.
The categories are crowded together and difficult to discern.
One solution is to use a horizontal bar graph.
Horizontal bar graph
We make the horizontal bar graph by swapping the positions of the categories and their values.
The categories go on the y-axis and their values on the x-axis.
In the horizontal bar graph for the 48 locations, each location gets its own row.
The categories are now easier to discern than before.
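As a rough sketch of the two versions in R: the 48 values in the table appear to match R's built-in islands dataset, so the code below assumes that dataset is the source. The data frame name land and its column names are hypothetical, and the ggplot2 package is assumed to be installed.

```r
library(ggplot2)

# Assumption: the built-in islands vector holds the same 48 landmasses
land <- data.frame(location = names(islands), area = as.numeric(islands))

# Vertical version: 48 labels have to share the x-axis
p_vertical <- ggplot(land, aes(x = location, y = area)) +
  geom_col()

# Horizontal version: coord_flip() swaps the axes,
# so every location label gets its own row
p_horizontal <- p_vertical + coord_flip()
p_horizontal
```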
Let’s look at another example.
The following table gives the maximum wind speed of 30 storms.
name | maximum wind speed |
Opal | 130 |
Ophelia | 120 |
Oscar | 45 |
Otto | 75 |
Pablo | 50 |
Paloma | 125 |
Patty | 40 |
Paula | 90 |
Peter | 60 |
Philippe | 80 |
Rafael | 80 |
Richard | 85 |
Rina | 100 |
Rita | 155 |
Roxanne | 100 |
Sandy | 100 |
Sean | 55 |
Sebastien | 55 |
Shary | 65 |
Sixteen | 25 |
Stan | 70 |
Tammy | 45 |
Tanya | 75 |
Ten | 30 |
Tomas | 85 |
Tony | 45 |
Two | 30 |
Vince | 65 |
Wilma | 160 |
Zeta | 55 |
We can plot this data as a vertical bar graph
or, more clearly, as a horizontal bar graph.
An even more informative graph arranges the storms in order of their maximum wind speed.
From this, we see that Wilma has the highest maximum wind speed and Sixteen has the lowest.
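One way to arrange the bars by value in R is the reorder function. The sketch below uses only six rows copied from the table to keep it short; the data frame name storm_wind and its column names are hypothetical, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Six rows copied from the storm table (hypothetical data frame name)
storm_wind <- data.frame(
  name = c("Opal", "Oscar", "Rita", "Sixteen", "Wilma", "Zeta"),
  max_wind = c(130, 45, 155, 25, 160, 55)
)

# reorder() sorts the levels of name by max_wind,
# so the bars are drawn from the slowest to the fastest storm
p_sorted <- ggplot(storm_wind, aes(x = reorder(name, max_wind), y = max_wind)) +
  geom_col() +
  coord_flip() +
  labs(x = "name", y = "maximum wind speed")
p_sorted

# The resulting order of the categories
sorted_names <- levels(reorder(storm_wind$name, storm_wind$max_wind))
sorted_names
```

With the axes flipped, the first level is drawn at the bottom, so the storm with the highest maximum wind speed ends up at the top of the graph.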
Creating bar graphs with R
R has an excellent package called tidyverse that bundles many packages for data visualization (such as ggplot2) and data analysis (such as dplyr).
These packages allow us to draw different versions of bar graphs for large datasets.
However, they require the supplied data to be a data frame, which is R's tabular structure for storing data.
Example: The relig_income data frame ships with tidyr, one of the tidyverse packages, and contains data from the Pew religion and income survey.
We begin our session by loading the tidyverse package with the library function.
Then, we load the relig_income data with the data function and examine it by typing its name.
The data has 18 rows, one per religion category, and 11 columns: 1 column for the religion and 10 columns for the income categories.
Finally, we call the ggplot function with the argument data = relig_income, map religion to the x-axis and <$10k to the y-axis, and add the geom_col function to draw the bar graph for this income category. Because <$10k is not a syntactic R name, it is wrapped in backticks in the code.
This plots a vertical bar graph showing the number of persons in this survey who earn <$10k for each religion.
library(tidyverse)
data("relig_income")
relig_income
## # A tibble: 18 x 11
## religion `<$10k` `$10-20k` `$20-30k` `$30-40k` `$40-50k` `$50-75k` `$75-100k`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Agnostic 27 34 60 81 76 137 122
## 2 Atheist 12 27 37 52 35 70 73
## 3 Buddhist 27 21 30 34 33 58 62
## 4 Catholic 418 617 732 670 638 1116 949
## 5 Don’t k~ 15 14 15 11 10 35 21
## 6 Evangel~ 575 869 1064 982 881 1486 949
## 7 Hindu 1 9 7 9 11 34 47
## 8 Histori~ 228 244 236 238 197 223 131
## 9 Jehovah~ 20 27 24 24 21 30 15
## 10 Jewish 19 19 25 25 30 95 69
## 11 Mainlin~ 289 495 619 655 651 1107 939
## 12 Mormon 29 40 48 51 56 112 85
## 13 Muslim 6 7 9 10 9 23 16
## 14 Orthodox 13 17 23 32 32 47 38
## 15 Other C~ 9 7 11 13 13 14 18
## 16 Other F~ 20 33 40 46 49 63 46
## 17 Other W~ 5 2 3 4 2 7 3
## 18 Unaffil~ 217 299 374 365 341 528 407
## # … with 3 more variables: `$100-150k` <dbl>, `>150k` <dbl>, `Don’t
## # know/refused` <dbl>
ggplot(data = relig_income, aes(x = religion, y = `<$10k`))+
geom_col()
The religion labels are crowded together, so we draw a horizontal bar graph by adding the coord_flip function.
ggplot(data = relig_income, aes(x = religion, y = `<$10k`))+
geom_col()+ coord_flip()
We can add useful information with the geom_label function and the argument aes(label = income column), where the income column is the same one we plot on the y-axis.
This adds the number of persons for each religion as a label at the end of each bar.
ggplot(data = relig_income, aes(x = religion, y = `<$10k`))+
geom_col()+ coord_flip()+ geom_label(aes(label = `<$10k`))
For the persons earning <$10k, the Evangelical Prot religion has the highest number of persons (575), while the Hindu religion has the lowest number of persons (only 1).
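The highest and lowest categories can also be looked up directly, without reading them off the graph. This is a minimal sketch using the base R functions which.max and which.min; the variable names are chosen for illustration, and it assumes the tidyr package (which provides relig_income) is installed.

```r
library(tidyr)  # relig_income ships with tidyr, which tidyverse attaches

# Pull out the <$10k column; [[ ]] avoids the need for backticks
low_income <- relig_income[["<$10k"]]

# Religion with the most and the fewest respondents in this column
most <- relig_income$religion[which.max(low_income)]
fewest <- relig_income$religion[which.min(low_income)]
most
fewest
```

Note that which.max and which.min return only the first position when several categories tie, so a tie has to be checked separately.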
If we plot the highest income category (>150k), we use the same code with a different column:
ggplot(data = relig_income, aes(x = religion, y = `>150k`))+
geom_col()+ coord_flip()+ geom_label(aes(label = `>150k`))
For the persons earning >$150k, the Mainline Prot religion has the highest number of persons (634), while the Other World Religions category has the lowest number of persons (only 4).
Practical questions
1. For the relig_income data, plot the $75-100k column and determine which religion has the highest number of persons earning this amount.
2. For the relig_income data, plot the $30-40k column and determine which religion has the lowest number of persons earning this amount.
3. The mtcars data contains some properties of 32 automobiles of 1973-1974 models.
We use the rownames_to_column function to add a column containing the model names.
Plot this data and determine which model has the highest weight (wt column).
dat <- mtcars %>% rownames_to_column(var = "model")
4. For the same mtcars data, plot the data as a bar graph and determine which model has the lowest number of carburetors (carb column)
5. The state.x77 is a matrix containing some data about the 50 states of the USA in the 1970s.
We use the following code to convert it to a data frame and add a column for the state name.
dat2 <- state.x77 %>% data.frame() %>% rownames_to_column(var = "state")
Use this data and plot it as a bar graph to determine which state has the lowest and highest murder rate (Murder column)
Answers
1. As before, we begin our session by loading the tidyverse package with the library function.
Then, we load the relig_income data with the data function, plot the bar graph using the $75-100k column as the y argument, and label the bars using the same column.
library(tidyverse)
data("relig_income")
ggplot(data = relig_income, aes(x = religion, y = `$75-100k`))+
geom_col()+ coord_flip()+ geom_label(aes(label = `$75-100k`))
We see that the Evangelical Prot and Catholic religions tie for the highest number of persons earning this income, with 949 persons each.
2. As before, but we use $30-40k as the y argument and for labeling the bars.
library(tidyverse)
data("relig_income")
ggplot(data = relig_income, aes(x = religion, y = `$30-40k`))+
geom_col()+ coord_flip()+ geom_label(aes(label = `$30-40k`))
We see that the other world religions category has the lowest number of persons earning this amount (4 persons only).
3. We use the created dat data frame with model as x argument and wt as y argument and for labeling the bars.
ggplot(data = dat, aes(x = model, y = wt))+
geom_col()+ coord_flip()+ geom_label(aes(label = wt))
We see that the model “Lincoln Continental” has the largest weight, 5.424 (the wt column is in units of 1,000 lbs).
4. We use the created dat data frame with model as x argument and carb as y argument and for labeling the bars.
ggplot(data = dat, aes(x = model, y = carb))+
geom_col()+ coord_flip()+ geom_label(aes(label = carb))
We see that several models share the lowest number of carburetors, which is 1. These models are “Datsun 710”, “Hornet 4 Drive”, “Valiant”, “Fiat 128”, “Toyota Corolla”, “Toyota Corona”, and “Fiat X1-9”.
5. We use the created dat2 data frame with state as x argument and Murder as y argument and for labeling the bars.
ggplot(data = dat2, aes(x = state, y = Murder))+
geom_col()+ coord_flip()+ geom_label(aes(label = Murder))
We see that the state with the highest murder rate was Alabama (15.1), and North Dakota was the state with the lowest murder rate (1.4).
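The answers to questions 3 to 5 can be cross-checked without a plot. The following is a small sketch in base R that works on the built-in mtcars and state.x77 data directly; the variable names are chosen for illustration.

```r
# Question 3: heaviest model in mtcars
heaviest <- rownames(mtcars)[which.max(mtcars$wt)]

# Question 4: every model tied for the fewest carburetors
# (a comparison with min() keeps all ties, unlike which.min())
fewest_carb <- rownames(mtcars)[mtcars$carb == min(mtcars$carb)]

# Question 5: states with the highest and lowest murder rate
murder <- state.x77[, "Murder"]
highest_state <- names(murder)[which.max(murder)]
lowest_state <- names(murder)[which.min(murder)]

heaviest
fewest_carb
highest_state
lowest_state
```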