Chi square – Explanation & Examples

The definition of the chi-square test is:

“The chi-square test compares two variables in a contingency table to see if they are related.”

In this topic, we will discuss the chi-square test from the following aspects:

What is the chi-square test?
Hypothesis testing using the chi-square test.
Steps of hypothesis testing performed by the chi-square test.
How to calculate the chi-square test?
Practice questions.
Answer key.

1. What is the chi-square test?

The chi-square test of independence, also called the χ^2 test, is used to analyze the contingency table formed by two categorical variables.

The chi-square test evaluates whether there is a significant association between the categories of the two variables.

An R × C contingency table is a table with R rows and C columns. It displays the relationship between two variables, where the variable in the rows has R categories and the variable in the columns has C categories.

– Example of 2 X 2 contingency table

A study in 1994 examined 491 dogs that had developed cancer and 945 dogs as a control group (without cancer) to determine whether there is an increased risk of cancer in dogs that are exposed to the herbicide 2,4-Dichlorophenoxyacetic acid (2,4-D).

Reference:

Hayes HM, Tarone RE, Cantor KP, Jessen CR, McCurnin DM, and Richardson RC. 1991. Case-Control Study of Canine Malignant Lymphoma: Positive Association With Dog Owner’s Use of 2,4-Dichlorophenoxyacetic Acid Herbicides. Journal of the National Cancer Institute 83(17):1226-1231.

The results of this study are shown in the following table.

	cancer	no cancer	Sum
2,4-D	191	304	495
no 2,4-D	300	641	941
Sum	491	945	1436

We see that:

The 2,4-D exposure categories are indicated along the rows and the cancer status categories are indicated along the columns.
The data are arranged in the form of a 2 × 2 contingency table because both the 2,4-D exposure and the cancer status have 2 categories.
The “cancer” and “no cancer” columns are for dogs that developed and did not develop cancer, respectively.
The “2,4-D” and “no 2,4-D” rows are for dogs that were exposed and were not exposed to 2,4-D, respectively.
191 dogs were exposed to 2,4-D and developed cancer.
304 dogs were exposed to 2,4-D and did not develop cancer.
300 dogs were not exposed to 2,4-D and developed cancer.
641 dogs were not exposed to 2,4-D and did not develop cancer.

We want to test for the relationship between exposure to 2,4-D and developing cancer. We use the χ^2 test to test if this relationship truly exists.
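As a minimal sketch, the table above can be held as a list of rows, with the "Sum" margins recomputed from the four cells. The variable names are illustrative choices, not part of the study.

```python
# Observed counts from the dog study table above.
# Rows: 2,4-D exposure; columns: cancer status.
row_labels = ["2,4-D", "no 2,4-D"]
col_labels = ["cancer", "no cancer"]
observed = [
    [191, 304],
    [300, 641],
]

# Margins: one total per row, one per column, and the table total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

print(row_totals)   # [495, 941]
print(col_totals)   # [491, 945]
print(grand_total)  # 1436
```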

– Example of 2 X 5 contingency table

This example uses 20,000 survey responses to the Behavioral Risk Factor Surveillance System (BRFSS), recording each person’s health coverage and health status.

Source: Office of Surveillance, Epidemiology, and Laboratory Services, Behavioral Risk Factor Surveillance System, BRFSS 2010 Survey Data.

The results of this study are shown in the following table.

	Excellent	Fair	Good	Poor	Very good	Sum
No	459	385	854	99	727	2524
Yes	4198	1634	4821	578	6245	17476
Sum	4657	2019	5675	677	6972	20000

We see that:

The health coverage categories are indicated along the rows and the health status categories are indicated along the columns.
The data are arranged in the form of a 2 × 5 contingency table because the health coverage has 2 categories and the health status has 5 categories.
The “Excellent”, “Fair”, “Good”, “Poor”, and “Very good” columns are for the person’s health status.
The “No” and “Yes” rows are for whether the person had health coverage or not.
459 persons had excellent health status and did not have health coverage.
4198 persons had excellent health status and had health coverage.
385 persons had fair health status and did not have health coverage.
1634 persons had fair health status and had health coverage.
854 persons had good health status and did not have health coverage.
4821 persons had good health status and had health coverage.
99 persons had poor health status and did not have health coverage.
578 persons had poor health status and had health coverage.
727 persons had very good health status and did not have health coverage.
6245 persons had very good health status and had health coverage.

We want to test for the relationship between health status and health coverage. We use the χ^2 test to test if this relationship truly exists.

– Example of 4 X 3 contingency table

A 2010 Pew Research poll asked 1,306 Americans, “From what you’ve read and heard, is there solid evidence that the average temperature on earth has been getting warmer over the past few decades, or not?”

Source: Pew Research Center, Majority of Republicans No Longer See Evidence of Global Warming, data collected on October 27, 2010.

The results of this study are shown in the following table.

	Don’t know / refuse to answer	Earth is warming	Not warming	Sum
Conservative Republican	45	248	450	743
Liberal Democrat	23	405	23	451
Mod/Cons Democrat	45	563	158	766
Mod/Lib Republican	23	135	135	293
Sum	136	1351	766	2253

We see that:

The party categories are indicated along the rows and the response categories are indicated along the columns.
The data are arranged in the form of a 4 × 3 contingency table because the party has 4 categories and the response has 3 categories.
The “Don’t know / refuse to answer”, “Earth is warming”, and “Not warming” columns are the response categories.
The “Conservative Republican”, “Liberal Democrat”, “Mod/Cons Democrat”, and “Mod/Lib Republican” rows are the party categories.
45 Conservative Republicans responded “Don’t know / refuse to answer”, compared to 23 Liberal Democrats, 45 Mod/Cons Democrats, and 23 Mod/Lib Republicans.
248 Conservative Republicans responded “Earth is warming”, compared to 405 Liberal Democrats, 563 Mod/Cons Democrats, and 135 Mod/Lib Republicans.
450 Conservative Republicans responded “Not warming”, compared to 23 Liberal Democrats, 158 Mod/Cons Democrats, and 135 Mod/Lib Republicans.

We want to test for the relation between the party categories and the response categories. We use the χ^2 test to test if this relationship truly exists.

2. Hypothesis testing using the chi-square test

Hypothesis testing starts with two mutually exclusive possibilities for the unknown truth and then uses the sample to choose between them. The two possibilities are the null hypothesis, Ho, and the alternative hypothesis, Ha.

The null hypothesis, Ho: There is no difference between the two populations or no association between the two categorical variables, so the difference = zero.
The alternative hypothesis, Ha: There is a difference between the two populations, so the difference ≠ zero.

The hypotheses are written as:

Ho: p_1=p_2 or p_1-p_2=0. The proportions of one variable are the same for different values of the other variable.

In testing the relation between exposure to 2,4-D and developing cancer, this means that the proportion of dogs developing cancer is the same for dogs exposed and not exposed to 2,4-D.

Ha: p_1≠p_2 or p_1-p_2≠0. In testing the relation between exposure to 2,4-D and developing cancer, this means that the proportion of developing cancer is different for dogs exposed and not exposed to 2,4-D.

Note: Although the hypotheses for the chi-square test are stated in terms of proportions, the test itself is calculated from the actual counts.

– Steps of hypothesis testing performed by the chi-square test

The chi-square test uses the contingency table of data to calculate an expected table.

The expected table contains the theoretical data counts that would be expected when there is no relation between the rows and the columns i.e. the null hypothesis is true, Ho: p_1=p_2.

The test calculates the discrepancies between each observed and expected value and aggregates them.
If the null hypothesis is true, the aggregated value, called the χ^2 statistic, follows a chi-square distribution. Find the probability of obtaining a χ^2 statistic at least this large under that chi-square distribution. This is the p-value.

The p-value is the probability of obtaining sample results at least as extreme as ours if the null hypothesis is true.

In the dog example, the null hypothesis means that the proportion of dogs developing cancer is the same for dogs exposed and not exposed to 2,4-D.

Generally in research, the cut-off used is 0.05. This 0.05 is called the rejection level, α level, or significance level.

Make a decision: reject Ho in favor of Ha, or fail to reject Ho.

If the p-value < 0.05, the result is statistically significant at the 0.05 level. We reject the null hypothesis and conclude that data like ours are unlikely under Ho: results at least this extreme have a probability of less than 0.05.

If the p-value >= 0.05, the result is not statistically significant at the 0.05 level and we fail to reject the null hypothesis.

We say “fail to reject” rather than “accept” the null hypothesis because a large p-value does not prove that Ho is true. For example, a p-value of 0.25 means that results at least as extreme as ours have a probability of 25% under the null hypothesis, which is conventionally considered large. The cut-off is a convention, however, and a researcher using a different significance level could reach a different decision.

Note: The chi-square test requires that no expected value in the expected table is less than 5 (sometimes known as “the rule of five”). If any expected value is less than 5, the chi-square test is not applicable and another test, such as Fisher’s exact test, is used instead.
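In symbols, the steps above can be summarized as follows. The notation is introduced here only for illustration: O_{ij} and E_{ij} are the observed and expected counts in row i and column j, n_{i·} and n_{·j} are the row and column totals, and N is the table total. The degrees of freedom, (R − 1) × (C − 1), are the ones used in the worked examples of the next section.

```latex
E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{N}, \qquad
\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{p-value} = P\!\left(\chi^2_{(R-1)(C-1)} \ge \chi^2_{\text{observed}}\right)
```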

3. How to calculate the chi-square test?

– Example of 2 X 2 contingency table

A study in 1994 examined 491 dogs that had developed cancer and 945 dogs as a control group to determine whether there is an increased risk of cancer in dogs that are exposed to the herbicide 2,4-Dichlorophenoxyacetic acid (2,4-D).

The following 2 X 2 contingency table is obtained:

	cancer	no cancer	Sum
2,4-D	191	304	495
no 2,4-D	300	641	941
Sum	491	945	1436

To see the different proportions of cancer development per 2,4-D exposure, we can use the following table:

	cancer	no cancer
2,4-D	0.39	0.61
no 2,4-D	0.32	0.68

The sum of every row is 1.00 or 100%.

We see that 0.39 or 39% of dogs exposed to 2,4-D developed cancer compared to 0.32 or 32% of dogs not exposed to 2,4-D.
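The row proportions can be reproduced by dividing each count by its row total. The following is a small sketch with illustrative variable names; it rounds to two decimals as in the table above.

```python
observed = {
    "2,4-D": {"cancer": 191, "no cancer": 304},
    "no 2,4-D": {"cancer": 300, "no cancer": 641},
}

row_proportions = {}
for exposure, counts in observed.items():
    row_total = sum(counts.values())
    # Each count divided by its row total, rounded to 2 decimals.
    row_proportions[exposure] = {
        status: round(count / row_total, 2) for status, count in counts.items()
    }

print(row_proportions["2,4-D"])     # {'cancer': 0.39, 'no cancer': 0.61}
print(row_proportions["no 2,4-D"])  # {'cancer': 0.32, 'no cancer': 0.68}
```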

We can plot these proportions in the following bar plot.

To test for the relationship between exposure to 2,4-D and developing cancer, we follow these steps:

Use the 2 X 2 table to calculate the expected count of each cell.

The expected count in the (i, j) cell =
(the total count in the ith row X the total count in the jth column)/ the total count in the table.

The expected counts are the counts we would expect if there were no association between the rows and columns. In other words, they describe a table with no association between exposure to 2,4-D and cancer development.

The sum of the expected values across any row or column must equal the corresponding row or column total.

The expected count for dogs exposed to 2,4-D and developed cancer = (total in 1st row X total in 1st column)/table total = (495X491)/1436 = 169.2514.

The expected count for dogs not exposed to 2,4-D and developed cancer = (total in 2nd row X total in 1st column)/table total = (941X491)/1436 = 321.7486.

The expected count for dogs exposed to 2,4-D and did not develop cancer = (total in 1st row X total in 2nd column)/table total = (495X945)/1436 = 325.7486.

The expected count for dogs not exposed to 2,4-D and did not develop cancer = (total in 2nd row X total in 2nd column)/table total = (941X945)/1436 = 619.2514.

The following table will be produced:

	cancer	no cancer	Sum
2,4-D	169.2514	325.7486	495
no 2,4-D	321.7486	619.2514	941
Sum	491.0000	945.0000	1436

We see that all expected values are larger than 5 so the chi-square test can be used.
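The expected table and the rule-of-five check can be sketched in a few lines. The function name `expected_counts` is a hypothetical helper chosen for this illustration.

```python
def expected_counts(observed):
    """Expected count for cell (i, j) = row i total * column j total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [[191, 304], [300, 641]]
expected = expected_counts(observed)
for row in expected:
    print([round(value, 4) for value in row])
# [169.2514, 325.7486]
# [321.7486, 619.2514]

# "Rule of five": every expected count should be at least 5.
rule_of_five_ok = all(value >= 5 for row in expected for value in row)
print(rule_of_five_ok)  # True
```

Because each expected count uses only the margins, the rows and columns of the expected table add up to the same totals as the observed table.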

To see the different proportions of cancer development per 2,4-D exposure in the expected table:

	cancer	no cancer
2,4-D	0.34	0.66
no 2,4-D	0.34	0.66

The sum of every row is 1.00 or 100%.

We see that 0.34 or 34% of dogs exposed to 2,4-D developed cancer and also 0.34 or 34% of dogs not exposed to 2,4-D.

Make a table with 2 columns for different cells and their observed counts.

We have 4 cells in this 2 X 2 table:

A cell for dogs that were exposed to 2,4-D and had cancer.
A cell for dogs that were exposed to 2,4-D and did not have cancer.
A cell for dogs that were not exposed to 2,4-D and had cancer.
A cell for dogs that were not exposed to 2,4-D and did not have cancer.

category	observed
2,4-D, cancer	191
no 2,4-D,cancer	300
2,4-D,no cancer	304
no 2,4-D, no cancer	641

Add a column for the expected count of each cell.

category	observed	expected
2,4-D,cancer	191	169.2514
no 2,4-D,cancer	300	321.7486
2,4-D,no cancer	304	325.7486
no 2,4-D,no cancer	641	619.2514

Subtract the expected value from the Observed value and place the result in the “obs-exp” column.

category	observed	expected	obs-exp
2,4-D,cancer	191	169.2514	21.75
no 2,4-D,cancer	300	321.7486	-21.75
2,4-D,no cancer	304	325.7486	-21.75
no 2,4-D,no cancer	641	619.2514	21.75

Square the differences from the previous step and place the result in the “(obs-exp)^2” column.

category	observed	expected	obs-exp	(obs-exp)^2
2,4-D, cancer	191	169.2514	21.75	473.06
no 2,4-D,cancer	300	321.7486	-21.75	473.06
2,4-D,no cancer	304	325.7486	-21.75	473.06
no 2,4-D,no cancer	641	619.2514	21.75	473.06

Divide the squared differences by their respective expected value and place the result in the “(obs-exp)^2/exp” column.

category	observed	expected	obs-exp	(obs-exp)^2	(obs-exp)^2/exp
2,4-D,cancer	191	169.2514	21.75	473.06	2.80
no 2,4-D,cancer	300	321.7486	-21.75	473.06	1.47
2,4-D,no cancer	304	325.7486	-21.75	473.06	1.45
no 2,4-D,no cancer	641	619.2514	21.75	473.06	0.76

Sum all the values in the last column to get the chi-square statistic:

The χ^2 statistic = 2.80+1.47+1.45+0.76 = 6.48.

The last column is summed to get an overall measure of agreement between the observed and expected tables.
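The column-by-column calculation above can be sketched as a loop over the four cells. This sketch keeps the unrounded expected counts throughout, so individual entries can differ in the last digit from the tables above, which round the differences before squaring them.

```python
# (category, observed count, expected count) for each of the 4 cells.
cells = [
    ("2,4-D, cancer", 191, 495 * 491 / 1436),
    ("no 2,4-D, cancer", 300, 941 * 491 / 1436),
    ("2,4-D, no cancer", 304, 495 * 945 / 1436),
    ("no 2,4-D, no cancer", 641, 941 * 945 / 1436),
]

chi_square = 0.0
for category, observed, expected in cells:
    difference = observed - expected     # obs-exp
    squared = difference ** 2            # (obs-exp)^2
    contribution = squared / expected    # (obs-exp)^2/exp
    chi_square += contribution
    print(f"{category:22s} {difference:8.2f} {squared:8.2f} {contribution:6.2f}")

print(round(chi_square, 2))  # 6.48
```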

If the null hypothesis is true, the χ^2 statistic has a chi-square distribution with (R − 1) × (C − 1) degrees of freedom or df.

Find the probability (or the p-value) of the χ^2 statistic under this chi-square distribution.

The p-value is given by the area to the right of the χ^2 statistic under this chi-square distribution.

In our 2X2 contingency table, the df = (2-1)X(2-1) = 1.

The following is the chi-square distribution with 1 df.

The total area under the curve is 1.00 or 100%.

In the first plot, we see that when the χ^2 value = 3.84, the area to the right or the p-value = 0.05.

In our contingency table, the χ^2 value = 6.48 (plotted as a vertical line in the second plot), so the p-value is smaller than 0.05.

Make a decision: reject Ho in favor of Ha, or fail to reject Ho.

The p-value < 0.05, so it is a statistically significant result. We reject the null hypothesis and conclude that data like ours are unlikely under Ho: results at least this extreme have a probability of less than 0.05.

We conclude that there is a significant relationship between exposure to 2,4-D and cancer development in dogs.
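For 1 degree of freedom, the area to the right of the statistic has a simple closed form, so the p-value can be checked with the Python standard library alone. This is a sketch for the df = 1 case only; the identity used, p = erfc(sqrt(χ^2 / 2)), does not apply to other degrees of freedom.

```python
import math

chi_square = 6.48          # statistic computed above
rows, columns = 2, 2
df = (rows - 1) * (columns - 1)

# For df = 1 only, the right-tail area of the chi-square distribution
# equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

alpha = 0.05
print(df)                  # 1
print(round(p_value, 4))   # 0.0109
print("reject Ho" if p_value < alpha else "fail to reject Ho")  # reject Ho
```

The same expression evaluated at 3.84 gives an area of about 0.05, which matches the cut-off quoted above for 1 df.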

– Example of 2 X 5 contingency table

This example uses the 20,000 survey responses to the Behavioral Risk Factor Surveillance System.

The results of this study are shown in the following table.

	Excellent	Fair	Good	Poor	Very good	Sum
No	459	385	854	99	727	2524
Yes	4198	1634	4821	578	6245	17476
Sum	4657	2019	5675	677	6972	20000

To see the different proportions of health status per health coverage, we can use the following table:

	Excellent	Fair	Good	Poor	Very good
No	0.18	0.15	0.34	0.04	0.29
Yes	0.24	0.09	0.28	0.03	0.36

The sum of every row is 1.00 or 100%.

We see that 0.18 or 18% of persons who do not have health coverage had excellent health status compared to 0.24 or 24% of persons who do have health coverage.

We see that 0.15 or 15% of persons who do not have health coverage had fair health status compared to 0.09 or 9% of persons who do have health coverage, and so on.

We can plot these proportions in the following bar plot.

To test for the relationship between health coverage and health status, we follow these steps:

Use the 2 X 5 table to calculate the expected count of each cell.

The expected counts are the counts we would expect if there were no association between the rows and columns. In other words, they describe a table with no association between health coverage and health status.

The expected count in the (i, j) cell =
(the total count in the ith row X the total count in the jth column)/ the total count in the table.

For example, the expected count for persons who do not have health coverage and have excellent health status = (row total X column total) / table total = (2524 X 4657)/20000 = 587.7134.

The following table will be produced:

	Excellent	Fair	Good	Poor	Very good	Sum
No	587.7134	254.7978	716.185	85.4374	879.8664	2524
Yes	4069.2866	1764.2022	4958.815	591.5626	6092.1336	17476
Sum	4657.0000	2019.0000	5675.000	677.0000	6972.0000	20000

We see that all expected values are larger than 5 so the chi-square test can be used.

To see the different proportions of health statuses per health coverage in the expected table:

	Excellent	Fair	Good	Poor	Very good
No	0.23	0.1	0.28	0.03	0.35
Yes	0.23	0.1	0.28	0.03	0.35

We see that 0.23 or 23% of persons who do or do not have health coverage had excellent health status.

All other proportions of different health statuses across the health coverage are equal.
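The reason both rows of the expected table show the same proportions is that each expected row proportion reduces to column total / table total. A short sketch, with illustrative variable names, that checks this for the health coverage data:

```python
observed = [
    [459, 385, 854, 99, 727],       # No health coverage
    [4198, 1634, 4821, 578, 6245],  # Yes, health coverage
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Row proportions of the expected table.
expected_proportions = [
    [round(value / total, 2) for value in row]
    for row, total in zip(expected, row_totals)
]
for row in expected_proportions:
    print(row)
# [0.23, 0.1, 0.28, 0.03, 0.35]
# [0.23, 0.1, 0.28, 0.03, 0.35]

# They equal the overall column proportions (column total / table total).
overall = [round(c / grand_total, 2) for c in col_totals]
print(overall)  # [0.23, 0.1, 0.28, 0.03, 0.35]
```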

Make a table with 2 columns for different cells and their observed counts.

We have 10 cells in this 2 X 5 table which are shown in the following table:

category	observed
No, Excellent	459
Yes, Excellent	4198
No, Fair	385
Yes, Fair	1634
No, Good	854
Yes, Good	4821
No, Poor	99
Yes, Poor	578
No, Very good	727
Yes, Very good	6245

For example, the “No, Excellent” category means persons without health coverage and with excellent health status.

Add a column for the expected count of each cell.

category	observed	expected
No, Excellent	459	587.7134
Yes, Excellent	4198	4069.2866
No, Fair	385	254.7978
Yes, Fair	1634	1764.2022
No, Good	854	716.1850
Yes, Good	4821	4958.8150
No,Poor	99	85.4374
Yes, Poor	578	591.5626
No, Very good	727	879.8664
Yes, Very good	6245	6092.1336

Subtract the expected value from the Observed value and place the result in the “obs-exp” column.

category	observed	expected	obs-exp
No,Excellent	459	587.7134	-128.71
Yes,Excellent	4198	4069.2866	128.71
No,Fair	385	254.7978	130.20
Yes,Fair	1634	1764.2022	-130.20
No,Good	854	716.1850	137.82
Yes,Good	4821	4958.8150	-137.81
No,Poor	99	85.4374	13.56
Yes,Poor	578	591.5626	-13.56
No,Very good	727	879.8664	-152.87
Yes, Very good	6245	6092.1336	152.87

Square the differences from the previous step and place the result in the “(obs-exp)^2” column.

category	observed	expected	obs-exp	(obs-exp)^2
No,Excellent	459	587.7134	-128.71	16566.26
Yes,Excellent	4198	4069.2866	128.71	16566.26
No,Fair	385	254.7978	130.20	16952.04
Yes,Fair	1634	1764.2022	-130.20	16952.04
No,Good	854	716.1850	137.82	18994.35
Yes,Good	4821	4958.8150	-137.81	18991.60
No,Poor	99	85.4374	13.56	183.87
Yes,Poor	578	591.5626	-13.56	183.87
No,Very good	727	879.8664	-152.87	23369.24
Yes,Very good	6245	6092.1336	152.87	23369.24

Divide the squared differences by their respective expected value and place the result in the “(obs-exp)^2/exp” column.

category	observed	expected	obs-exp	(obs-exp)^2	(obs-exp)^2/exp
No,Excellent	459	587.7134	-128.71	16566.26	28.19
Yes,Excellent	4198	4069.2866	128.71	16566.26	4.07
No,Fair	385	254.7978	130.20	16952.04	66.53
Yes,Fair	1634	1764.2022	-130.20	16952.04	9.61
No,Good	854	716.1850	137.82	18994.35	26.52
Yes,Good	4821	4958.8150	-137.81	18991.60	3.83
No,Poor	99	85.4374	13.56	183.87	2.15
Yes,Poor	578	591.5626	-13.56	183.87	0.31
No,Very good	727	879.8664	-152.87	23369.24	26.56
Yes,Very good	6245	6092.1336	152.87	23369.24	3.84

Sum all the values in the last column to get the chi-square statistic:

The χ^2 statistic = 28.19+ 4.07+ 66.53+ 9.61+ 26.52+ 3.83+ 2.15+ 0.31+ 26.56+ 3.84 = 171.61.
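For a table of any size, the same steps can be wrapped in one function that works directly on the R × C table of observed counts. The name `chi_square_statistic` is a hypothetical helper for this sketch, and the function keeps unrounded intermediate values.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic

health = [
    [459, 385, 854, 99, 727],       # No health coverage
    [4198, 1634, 4821, 578, 6245],  # Yes, health coverage
]
print(round(chi_square_statistic(health), 2))  # 171.61
```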

If the null hypothesis is true, the χ^2 statistic has a chi-square distribution with (R − 1) × (C − 1) degrees of freedom.

In our 2X5 contingency table, the df = (2-1)X(5-1) = 4.

The following is the chi-square distribution with 4 df.

In the first plot, we see that when the χ^2 value = 9.49, the area to the right or the p-value = 0.05.

In our contingency table, the χ^2 value = 171.61 (plotted as a vertical line in the second plot), so the p-value is far smaller than 0.05.

Make a decision: reject Ho in favor of Ha, or fail to reject Ho.

The p-value < 0.05, so it is a statistically significant result. We reject the null hypothesis and conclude that data like ours are unlikely under Ho, the null hypothesis.

We conclude that there is a significant relationship between health statuses and health coverage in the persons surveyed.
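When the degrees of freedom are even, the right-tail area has a finite closed form: exp(-x/2) multiplied by the sum of (x/2)^k / k! for k below df/2. The sketch below uses that identity to check the cut-off and the p-value for 4 df; the function name is a hypothetical helper, and it deliberately refuses odd df.

```python
import math

def chi_square_p_value_even_df(statistic, df):
    """Right-tail area of the chi-square distribution; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = statistic / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k      # builds (x/2)^k / k! step by step
        total += term
    return math.exp(-half) * total

df = (2 - 1) * (5 - 1)
p_value = chi_square_p_value_even_df(171.61, df)

print(df)                                               # 4
print(round(chi_square_p_value_even_df(9.49, df), 3))   # 0.05
print(p_value < 0.05)                                   # True
```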

– Example of 4 X 3 contingency table

The results of the 2010 Pew Research poll are shown in the following table.

	Don’t know / refuse to answer	Earth is warming	Not warming	Sum
Conservative Republican	45	248	450	743
Liberal Democrat	23	405	23	451
Mod/Cons Democrat	45	563	158	766
Mod/Lib Republican	23	135	135	293
Sum	136	1351	766	2253

To see the different proportions of responses per different parties, we can use the following table:

	Don’t know / refuse to answer	Earth is warming	Not warming
Conservative Republican	0.06	0.33	0.61
Liberal Democrat	0.05	0.90	0.05
Mod/Cons Democrat	0.06	0.73	0.21
Mod/Lib Republican	0.08	0.46	0.46

The sum of every row is 1.00 or 100%.

We see that:

0.33 or 33% of “Conservative Republican” respondents said that Earth is warming, compared to 0.90 or 90% of “Liberal Democrat” respondents, 0.73 or 73% of “Mod/Cons Democrat” respondents, and 0.46 or 46% of “Mod/Lib Republican” respondents.
0.61 or 61% of “Conservative Republican” respondents said that Earth is not warming, compared to only 0.05 or 5% of “Liberal Democrat” respondents, 0.21 or 21% of “Mod/Cons Democrat” respondents, and 0.46 or 46% of “Mod/Lib Republican” respondents.

We can plot these proportions in the following bar plot.

To test for the relationship between parties and responses, we follow these steps:

Use the 4 X 3 table to calculate the expected count of each cell.

The expected counts are the counts we would expect if there were no association between the rows and columns. In other words, they describe a table with no association between the parties and the responses.

The expected count in the (i, j) cell =
(the total count in the ith row X the total count in the jth column)/ the total count in the table.

For example, the expected count for “Conservative Republican” respondents who said that Earth is warming = (row total X column total) / table total = (743 X 1351)/2253 = 445.5362.

The following table will be produced:

	Don’t know / refuse to answer	Earth is warming	Not warming	Sum
Conservative Republican	44.85042	445.5362	252.6134	743
Liberal Democrat	27.22415	270.4399	153.3360	451
Mod/Cons Democrat	46.23879	459.3280	260.4332	766
Mod/Lib Republican	17.68664	175.6960	99.6174	293
Sum	136.00000	1351.0000	766.0000	2253

We see that all expected values are larger than 5 so the chi-square test can be used.

To see the different proportions of responses per parties in the expected table:

	Don’t know / refuse to answer	Earth is warming	Not warming
Conservative Republican	0.06	0.6	0.34
Liberal Democrat	0.06	0.6	0.34
Mod/Cons Democrat	0.06	0.6	0.34
Mod/Lib Republican	0.06	0.6	0.34

All proportions of different responses across the different parties are the same.

Make a table with 2 columns for different cells and their observed counts.

We have 12 cells in this 4 X 3 table which are shown in the following table:

category	observed
Conservative Republican,Don’t know / refuse to answer	45
Liberal Democrat,Don’t know / refuse to answer	23
Mod/Cons Democrat,Don’t know / refuse to answer	45
Mod/Lib Republican,Don’t know / refuse to answer	23
Conservative Republican,Earth is warming	248
Liberal Democrat,Earth is warming	405
Mod/Cons Democrat,Earth is warming	563
Mod/Lib Republican,Earth is warming	135
Conservative Republican,Not warming	450
Liberal Democrat,Not warming	23
Mod/Cons Democrat,Not warming	158
Mod/Lib Republican,Not warming	135

For example, the “Conservative Republican,Earth is warming” category means “Conservative Republican” respondents who said that Earth is warming.

Add a column for the expected count of each cell.

category	observed	expected
Conservative Republican,Don’t know / refuse to answer	45	44.85042
Liberal Democrat,Don’t know / refuse to answer	23	27.22415
Mod/Cons Democrat,Don’t know / refuse to answer	45	46.23879
Mod/Lib Republican,Don’t know / refuse to answer	23	17.68664
Conservative Republican,Earth is warming	248	445.53617
Liberal Democrat,Earth is warming	405	270.43986
Mod/Cons Democrat,Earth is warming	563	459.32801
Mod/Lib Republican,Earth is warming	135	175.69596
Conservative Republican,Not warming	450	252.61340
Liberal Democrat,Not warming	23	153.33600
Mod/Cons Democrat,Not warming	158	260.43320
Mod/Lib Republican,Not warming	135	99.61740

Subtract the expected value from the Observed value and place the result in the “obs-exp” column.

category	observed	expected	obs-exp
Conservative Republican,Don’t know / refuse to answer	45	44.85042	0.15
Liberal Democrat,Don’t know / refuse to answer	23	27.22415	-4.22
Mod/Cons Democrat,Don’t know / refuse to answer	45	46.23879	-1.24
Mod/Lib Republican,Don’t know / refuse to answer	23	17.68664	5.31
Conservative Republican,Earth is warming	248	445.53617	-197.54
Liberal Democrat,Earth is warming	405	270.43986	134.56
Mod/Cons Democrat,Earth is warming	563	459.32801	103.67
Mod/Lib Republican,Earth is warming	135	175.69596	-40.70
Conservative Republican,Not warming	450	252.61340	197.39
Liberal Democrat,Not warming	23	153.33600	-130.34
Mod/Cons Democrat,Not warming	158	260.43320	-102.43
Mod/Lib Republican,Not warming	135	99.61740	35.38

Square the differences from the previous step and place the result in the “(obs-exp)^2” column.

category	observed	expected	obs-exp	(obs-exp)^2
Conservative Republican,Don’t know / refuse to answer	45	44.85042	0.15	0.02
Liberal Democrat,Don’t know / refuse to answer	23	27.22415	-4.22	17.81
Mod/Cons Democrat,Don’t know / refuse to answer	45	46.23879	-1.24	1.54
Mod/Lib Republican,Don’t know / refuse to answer	23	17.68664	5.31	28.20
Conservative Republican,Earth is warming	248	445.53617	-197.54	39022.05
Liberal Democrat,Earth is warming	405	270.43986	134.56	18106.39
Mod/Cons Democrat,Earth is warming	563	459.32801	103.67	10747.47
Mod/Lib Republican,Earth is warming	135	175.69596	-40.70	1656.49
Conservative Republican,Not warming	450	252.61340	197.39	38962.81
Liberal Democrat,Not warming	23	153.33600	-130.34	16988.52
Mod/Cons Democrat,Not warming	158	260.43320	-102.43	10491.90
Mod/Lib Republican,Not warming	135	99.61740	35.38	1251.74

Divide the squared differences by their respective expected value and place the result in the “(obs-exp)^2/exp” column.

category	observed	expected	obs-exp	(obs-exp)^2	(obs-exp)^2/exp
Conservative Republican,Don’t know / refuse to answer	45	44.85042	0.15	0.02	0.00
Liberal Democrat,Don’t know / refuse to answer	23	27.22415	-4.22	17.81	0.65
Mod/Cons Democrat,Don’t know / refuse to answer	45	46.23879	-1.24	1.54	0.03
Mod/Lib Republican,Don’t know / refuse to answer	23	17.68664	5.31	28.20	1.59
Conservative Republican,Earth is warming	248	445.53617	-197.54	39022.05	87.58
Liberal Democrat,Earth is warming	405	270.43986	134.56	18106.39	66.95
Mod/Cons Democrat,Earth is warming	563	459.32801	103.67	10747.47	23.40
Mod/Lib Republican,Earth is warming	135	175.69596	-40.70	1656.49	9.43
Conservative Republican,Not warming	450	252.61340	197.39	38962.81	154.24
Liberal Democrat,Not warming	23	153.33600	-130.34	16988.52	110.79
Mod/Cons Democrat,Not warming	158	260.43320	-102.43	10491.90	40.29
Mod/Lib Republican,Not warming	135	99.61740	35.38	1251.74	12.57

Sum all the values in the last column to get the chi-square statistic:

The χ^2 statistic = 507.52.
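With 12 cells, it helps to see which cells contribute most to the statistic. The following sketch recomputes each cell's (obs-exp)^2/exp term from the observed table and lists the three largest; it uses unrounded intermediate values, so the total can differ slightly in the last digit from the sum of the rounded table entries.

```python
parties = ["Conservative Republican", "Liberal Democrat",
           "Mod/Cons Democrat", "Mod/Lib Republican"]
responses = ["Don't know / refuse to answer", "Earth is warming", "Not warming"]
observed = [
    [45, 248, 450],
    [23, 405, 23],
    [45, 563, 158],
    [23, 135, 135],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# One (contribution, party, response) tuple per cell.
contributions = []
for i, party in enumerate(parties):
    for j, response in enumerate(responses):
        expected = row_totals[i] * col_totals[j] / grand_total
        contributions.append(
            ((observed[i][j] - expected) ** 2 / expected, party, response)
        )

chi_square = sum(value for value, _, _ in contributions)
print(round(chi_square, 1))  # 507.5

# Largest contributors first.
for value, party, response in sorted(contributions, reverse=True)[:3]:
    print(f"{value:7.2f}  {party}, {response}")
```

The largest terms come from the “Not warming” and “Earth is warming” columns, in line with the last column of the table above.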

If the null hypothesis is true, the χ^2 statistic has a chi-square distribution with (R − 1) × (C − 1) degrees of freedom.

In our 4X3 contingency table, the df = (4-1)X(3-1) = 6.

The following is the chi-square distribution with 6 df.

In the first plot, we see that when the χ^2 value = 12.59, the area to the right or the p-value = 0.05.

In our contingency table, the χ^2 value = 507.52 (plotted as a vertical line in the second plot), so the p-value is far smaller than 0.05.

Make a decision: reject Ho in favor of Ha, or fail to reject Ho.

The p-value < 0.05, so it is a statistically significant result. We reject the null hypothesis and conclude that data like ours are unlikely under the null hypothesis.

We conclude that there is a significant relationship between the different parties and the response type in the persons surveyed.
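Putting the pieces together, the whole procedure (expected counts, χ^2 statistic, degrees of freedom, p-value) can be sketched as one function using only the standard library. The names `chi_square_sf` and `chi_square_test` are hypothetical helpers for this illustration. The tail-area function relies on the finite series for integer degrees of freedom (one form for even df, one for odd df); a statistics library would normally be used instead.

```python
import math

def chi_square_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * k + 1)
    return tail

def chi_square_test(observed):
    """Return (statistic, df, p_value) for an R x C table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, chi_square_sf(statistic, df)

climate = [
    [45, 248, 450],   # Conservative Republican
    [23, 405, 23],    # Liberal Democrat
    [45, 563, 158],   # Mod/Cons Democrat
    [23, 135, 135],   # Mod/Lib Republican
]
statistic, df, p_value = chi_square_test(climate)
print(df)              # 6
print(p_value < 0.05)  # True
```

Evaluating `chi_square_sf` at the cut-offs quoted in this section (3.84 with 1 df, 9.49 with 4 df, 12.59 with 6 df) gives an area of about 0.05 in each case.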

5. Practice questions

1. Data from the 2010 General Social Survey are shown in the following table.

	LEGAL	NOT LEGAL	Sum
BACHELOR	119	112	231
GRADUATE	73	63	136
HIGH SCHOOL	304	307	611
JUNIOR COLLEGE	42	44	86
LT HIGH SCHOOL	65	130	195
Sum	603	656	1259

The rows show the educational degree and the columns show the answer to the question “Do you think the use of marijuana should be made legal, or not?”.

Is there a relationship between educational degree and the answer type?

2. A sample from the General Social Survey gave the following table.

	Other	Black	White	Sum
$25000 or more	621	886	5856	7363
$20000 – 24999	112	220	951	1283
$15000 – 19999	134	180	734	1048
$10000 – 14999	126	210	832	1168
$8000 to 9999	41	56	243	340
$7000 to 7999	24	27	137	188
$6000 to 6999	26	35	154	215
$5000 to 5999	27	40	160	227
$4000 to 4999	34	38	154	226
$3000 to 3999	35	59	182	276
$1000 to 2999	47	71	277	395
Lt $1000	36	51	199	286
Sum	1263	1873	9879	13015

The rows are for the reported income and the columns for race categories.

Is there a relationship between race and the reported income?

3. A study from the 1970s about whether gender influences hiring recommendations showed the following table.

	not	promoted	Sum
female	10	14	24
male	3	21	24
Sum	13	35	48

The rows are for the gender and the columns show whether the person was promoted (“promoted”) or not (“not”).

All expected values are larger than 5 so the chi-square test can be used.

The χ^2 statistic = 5.1692.

The following is the chi-square distribution with 1 df.

Is that a significant result?

4. Demographic information on a random sample of 1,000 members of the US armed forces gave the following table.

	female	male	Sum
air force	40	175	215
army	54	351	405
marine corps	6	148	154
navy	34	192	226
Sum	134	866	1000

The rows are for the branch of the armed forces: air force, army, marine corps, or navy, and the columns for the gender.

All expected values are larger than 5 so the chi-square test can be used.

The χ^2 statistic = 17.534.

The following is the chi-square distribution with 3 df.

Is that a significant result?

5. Demographic information on a random sample of 2,000 members of the US armed forces gave the following table.

	asian	black	white	Sum
air force	11	91	364	466
army	38	176	586	800
marine corps	6	39	250	295
navy	30	99	310	439
Sum	85	405	1510	2000

The rows are for the branch of the armed forces: air force, army, marine corps, or navy, and the columns for the race.

All expected values are larger than 5 so the chi-square test can be used.

The χ^2 statistic = 30.051.

The following is the chi-square distribution with 6 df.

Is that a significant result?

6. Answer key

1. We follow the same steps above to calculate the expected table.

	LEGAL	NOT LEGAL	Sum
BACHELOR	110.63781	120.36219	231
GRADUATE	65.13741	70.86259	136
HIGH SCHOOL	292.63940	318.36060	611
JUNIOR COLLEGE	41.18983	44.81017	86
LT HIGH SCHOOL	93.39555	101.60445	195
Sum	603.00000	656.00000	1259

We see that all expected values are larger than 5 so the chi-square test can be used.

Then, we follow the above steps to get the chi-square statistic.

The χ^2 statistic = 20.48.

In our 5X2 contingency table, the df = (5-1)X(2-1) = 4.

The following is the chi-square distribution with 4 df.

In the first plot, we see that when the χ^2 value = 9.49, the area to the right (the p-value) is 0.05.

In our contingency table, the χ^2 value = 20.48 (plotted as a vertical line in the second plot), so the p-value is much smaller than 0.05.

The p-value < 0.05, so it is a statistically significant result.

We reject the null hypothesis and conclude that there is a significant relationship between the different educational degrees and the response type in the persons surveyed.

This means that the proportions of response type are different across different educational degrees.
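The p-values read from the plots can be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the helper name `chi2_sf_even_df` is hypothetical. It relies on the fact that, when the df is even, the right-tail area of the chi-square distribution has the closed form e^(-x/2) multiplied by the sum of (x/2)^k / k! for k from 0 to df/2 - 1, so it does not cover odd df.

```python
import math

def chi2_sf_even_df(x, df):
    """Right-tail area P(X >= x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # update to (x/2)^k / k!
        total += term
    return math.exp(-half) * total

# Critical value quoted above for 4 df: the tail area should be close to 0.05
p_critical = chi2_sf_even_df(9.49, 4)
# Observed statistic for question 1: the tail area should be far below 0.05
p_observed = chi2_sf_even_df(20.48, 4)
print(p_critical, p_observed)
```

The same helper can be applied to any other even df, for example the 22 df and 6 df cases in the answers that follow.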

2. We follow the same steps above to calculate the expected table.

	Other	Black	White	Sum
$25000 or more	714.51932	1059.61575	5588.8649	7363
$20000 – 24999	124.50473	184.63765	973.8576	1283
$15000 – 19999	101.69988	150.81859	795.4815	1048
$10000 – 14999	113.34491	168.08790	886.5672	1168
$8000 to 9999	32.99424	48.92970	258.0761	340
$7000 to 7999	18.24387	27.05524	142.7009	188
$6000 to 6999	20.86400	30.94084	163.1952	215
$5000 to 5999	22.02851	32.66777	172.3037	227
$4000 to 4999	21.93146	32.52386	171.5447	226
$3000 to 3999	26.78356	39.71940	209.4970	276
$1000 to 2999	38.33154	56.84479	299.8237	395
Lt $1000	27.75398	41.15851	217.0875	286
Sum	1263.00000	1873.00000	9879.0000	13015

We see that all expected values are larger than 5 so the chi-square test can be used.
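As a check on the expected table, each expected count is the row total multiplied by the column total and divided by the grand total. The sketch below, in plain Python, rebuilds the expected counts from the row and column sums shown in the table above and finds the smallest one; the variable names are illustrative only.

```python
# Row sums (income groups, top to bottom) and column sums from the table above
row_totals = [7363, 1283, 1048, 1168, 340, 188, 215, 227, 226, 276, 395, 286]
col_totals = [1263, 1873, 9879]  # Other, Black, White
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# The chi-square test assumption is checked against the smallest expected count
smallest = min(min(row) for row in expected)
print(round(expected[0][0], 5), round(smallest, 5))
```

The first value printed corresponds to the top-left cell of the table, and the second to the smallest expected count, which is the one that matters for the "larger than 5" check.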

Then, we follow the above steps to get the chi-square statistic.

The χ^2 statistic = 148.13.

In our 12X3 contingency table, the df = (12-1)X(3-1) = 22.

The following is the chi-square distribution with 22 df.

In the first plot, we see that when the χ^2 value = 33.92, the area to the right (the p-value) is 0.05.

In our contingency table, the χ^2 value = 148.13 (plotted as a vertical line in the second plot), so the p-value is much smaller than 0.05.

The p-value < 0.05, so it is a statistically significant result.

We reject the null hypothesis and conclude that there is a significant relationship between race and the reported income in the persons surveyed.

This means that the proportions of incomes are different across different races.

3. We see from the plot that when the χ^2 value = 3.84, the area to the right (the p-value) is 0.05.

In our contingency table, the χ^2 value = 5.1692, so the p-value is smaller than 0.05.

The p-value < 0.05, so it is a statistically significant result.

We reject the null hypothesis and conclude that there is a significant relationship between gender and being promoted.

This means that the proportions of promotions are different across the 2 sexes.
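The statistic and p-value for this 2 X 2 table can be reproduced with a short sketch in plain Python. It assumes no continuity correction is applied, which is what reproduces the value 5.1692 quoted in the question. For 1 df, the right-tail area equals erfc(sqrt(χ^2 / 2)), because a chi-square variable with 1 df is the square of a standard normal variable.

```python
import math

# Observed counts from question 3
observed = [[10, 14],   # female: not promoted, promoted
            [3, 21]]    # male:   not promoted, promoted

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all four cells
statistic = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        statistic += (obs - expected) ** 2 / expected

# Right-tail area for 1 df
p_value = math.erfc(math.sqrt(statistic / 2))
print(round(statistic, 4), p_value < 0.05)
```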

4. We see from the plot that when the χ^2 value = 7.81, the area to the right (the p-value) is 0.05.

In our contingency table, the χ^2 value = 17.534, so the p-value is smaller than 0.05.

The p-value < 0.05, so it is a statistically significant result.

We reject the null hypothesis and conclude that there is a significant relationship between gender and the branch of the armed forces.

This means that the proportions of branches from the US armed forces are different across the 2 sexes.
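To make the differing proportions concrete, the sketch below (plain Python, with illustrative variable names) computes the share of each branch within each sex from the observed table in question 4, then recomputes the χ^2 statistic and compares it with the critical value 7.81 quoted above for 3 df.

```python
# Observed counts from question 4: branch -> (female, male)
observed = {
    "air force": (40, 175),
    "army": (54, 351),
    "marine corps": (6, 148),
    "navy": (34, 192),
}

col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
n = sum(col_totals)

# Share of each branch within each sex (column proportions)
female_share = {b: c[0] / col_totals[0] for b, c in observed.items()}
male_share = {b: c[1] / col_totals[1] for b, c in observed.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected
statistic = 0.0
for counts in observed.values():
    row_total = sum(counts)
    for j, obs in enumerate(counts):
        expected = row_total * col_totals[j] / n
        statistic += (obs - expected) ** 2 / expected

for branch in observed:
    print(f"{branch:13s} female {female_share[branch]:.3f}  male {male_share[branch]:.3f}")
print(round(statistic, 3), statistic > 7.81)
```

If the two variables were independent, the female and male shares for each branch would be roughly equal; the size of the gaps between them is what the statistic summarizes.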

5. We see from the plot that when the χ^2 value = 12.59, the area to the right (the p-value) is 0.05.

In our contingency table, the χ^2 value = 30.051, so the p-value is smaller than 0.05.

The p-value < 0.05, so it is a statistically significant result.

We reject the null hypothesis and conclude that there is a significant relationship between race and the branch of the armed forces.

This means that the proportions of branches are different across the different races.
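The whole procedure for this 4 X 3 table (expected counts, the check that they exceed 5, the statistic, the df, and the decision) can be collected in one small function. This is a sketch in plain Python; the function name `chi_square_test` is hypothetical, and the decision uses the critical value 12.59 quoted above rather than an exact p-value.

```python
def chi_square_test(table):
    """Return (statistic, df, smallest expected count) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            statistic += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, min_expected

# Observed counts from question 5 (columns: asian, black, white)
observed = [
    [11, 91, 364],   # air force
    [38, 176, 586],  # army
    [6, 39, 250],    # marine corps
    [30, 99, 310],   # navy
]

statistic, df, min_expected = chi_square_test(observed)
CRITICAL_6_DF = 12.59  # critical value at 0.05 for 6 df, as quoted above
reject_null = min_expected > 5 and statistic > CRITICAL_6_DF
print(round(statistic, 3), df, round(min_expected, 4), reject_null)
```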
