Normal Probability Plot – Explanation & Examples
The definition of the normal probability plot is:
“The normal probability plot is a plot used to assess whether numerical data are normally distributed.”
In this topic, we will discuss the normal probability plot from the following aspects:
- What is a normal probability plot?
- How to make a normal probability plot?
- How to read a normal probability plot?
- Practice questions.
- Answer key.
1. What is a normal probability plot?
The normal probability plot is a plot used to assess whether a set of numerical data is normally distributed.
A histogram can help you decide whether a set of data is normal, but there is a more specialized type of plot for this purpose, called a normal probability plot.
If the data follow a normal distribution, then a plot of the theoretical percentiles of the normal distribution (x-axis) against the observed sample percentiles (y-axis) should be approximately linear.
The theoretical pth percentile of a normal distribution is the value such that p% of the distribution lies below it.
The sample pth percentile of any numerical data is the value such that p% of the measurements fall below it.
For example, the 50th percentile, or median, is the value such that 50% (half) of your measurements fall below it.
As another example, the 27th percentile is the value such that 27% of your data points fall below it.
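As a rough illustration of the two kinds of percentile, the sketch below uses R's qnorm for theoretical percentiles of the standard normal distribution and quantile for sample percentiles. The vector x is made-up data, not part of any survey in this article. Note that quantile interpolates between neighbouring observations, so its result can differ slightly from the simple "p% of the measurements fall below" description.

```r
# Theoretical percentiles of the standard normal distribution (mean 0, sd 1).
theo_median <- qnorm(0.50)  # 50th percentile
theo_p27    <- qnorm(0.27)  # 27th percentile; negative because 0.27 < 0.5

# Sample percentiles of a small made-up data set (hypothetical values).
x <- c(4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 8.0, 9.5)
sample_median <- quantile(x, probs = 0.50, names = FALSE)
sample_p27    <- quantile(x, probs = 0.27, names = FALSE)

print(c(theo_median = theo_median, theo_p27 = theo_p27))
print(c(sample_median = sample_median, sample_p27 = sample_p27))
```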
2. How to make a normal probability plot?
We will go through several examples.
– Example 1
The following are the weights (in kg) of 100 persons from a certain survey.
52.44 52.77 54.56 53.07 53.13 54.72 53.46 51.73 52.31 52.55 54.22 53.36 53.40 53.11 52.44 54.79 53.50 51.03 53.70 52.53 51.93 52.78 51.97 52.27 52.37 51.31 53.84 53.15 51.86 54.25 53.43 52.70 53.90 53.88 53.82 53.69 53.55 52.94 52.69 52.62 52.31 52.79 51.73 55.17 54.21 51.88 52.60 52.53 53.78 52.92 53.25 52.97 52.96 54.37 52.77 54.52 51.45 53.58 53.12 53.22 53.38 52.50 52.67 51.98 51.93 53.30 53.45 53.05 53.92 55.05 52.51 50.69 54.01 52.29 52.31 54.03 52.72 51.78 53.18 52.86 53.01 53.39 52.63 53.64 52.78 53.33 54.10 53.44 52.67 54.15 53.99 53.55 53.24 52.37 54.36 52.40 55.19 54.53 52.76 51.97.
Draw a normal probability plot of this data.
1. Order the numbers from smallest to largest.
50.69 51.03 51.31 51.45 51.73 51.73 51.78 51.86 51.88 51.93 51.93 51.97 51.97 51.98 52.27 52.29 52.31 52.31 52.31 52.37 52.37 52.40 52.44 52.44 52.50 52.51 52.53 52.53 52.55 52.60 52.62 52.63 52.67 52.67 52.69 52.70 52.72 52.76 52.77 52.77 52.78 52.78 52.79 52.86 52.92 52.94 52.96 52.97 53.01 53.05 53.07 53.11 53.12 53.13 53.15 53.18 53.22 53.24 53.25 53.30 53.33 53.36 53.38 53.39 53.40 53.43 53.44 53.45 53.46 53.50 53.55 53.55 53.58 53.64 53.69 53.70 53.78 53.82 53.84 53.88 53.90 53.92 53.99 54.01 54.03 54.10 54.15 54.21 54.22 54.25 54.36 54.37 54.52 54.53 54.56 54.72 54.79 55.05 55.17 55.19.
2. Assign a rank to each value of your data.
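| weight (kg) | rank |
|---|---|
| 50.69 | 1 |
| 51.03 | 2 |
| 51.31 | 3 |
| 51.45 | 4 |
| 51.73 | 5 |
| 51.73 | 6 |
| 51.78 | 7 |
| 51.86 | 8 |
| 51.88 | 9 |
| 51.93 | 10 |
| 51.93 | 11 |
| 51.97 | 12 |
| 51.97 | 13 |
| 51.98 | 14 |
| 52.27 | 15 |
| 52.29 | 16 |
| 52.31 | 17 |
| 52.31 | 18 |
| 52.31 | 19 |
| 52.37 | 20 |
| 52.37 | 21 |
| 52.40 | 22 |
| 52.44 | 23 |
| 52.44 | 24 |
| 52.50 | 25 |
| 52.51 | 26 |
| 52.53 | 27 |
| 52.53 | 28 |
| 52.55 | 29 |
| 52.60 | 30 |
| 52.62 | 31 |
| 52.63 | 32 |
| 52.67 | 33 |
| 52.67 | 34 |
| 52.69 | 35 |
| 52.70 | 36 |
| 52.72 | 37 |
| 52.76 | 38 |
| 52.77 | 39 |
| 52.77 | 40 |
| 52.78 | 41 |
| 52.78 | 42 |
| 52.79 | 43 |
| 52.86 | 44 |
| 52.92 | 45 |
| 52.94 | 46 |
| 52.96 | 47 |
| 52.97 | 48 |
| 53.01 | 49 |
| 53.05 | 50 |
| 53.07 | 51 |
| 53.11 | 52 |
| 53.12 | 53 |
| 53.13 | 54 |
| 53.15 | 55 |
| 53.18 | 56 |
| 53.22 | 57 |
| 53.24 | 58 |
| 53.25 | 59 |
| 53.30 | 60 |
| 53.33 | 61 |
| 53.36 | 62 |
| 53.38 | 63 |
| 53.39 | 64 |
| 53.40 | 65 |
| 53.43 | 66 |
| 53.44 | 67 |
| 53.45 | 68 |
| 53.46 | 69 |
| 53.50 | 70 |
| 53.55 | 71 |
| 53.55 | 72 |
| 53.58 | 73 |
| 53.64 | 74 |
| 53.69 | 75 |
| 53.70 | 76 |
| 53.78 | 77 |
| 53.82 | 78 |
| 53.84 | 79 |
| 53.88 | 80 |
| 53.90 | 81 |
| 53.92 | 82 |
| 53.99 | 83 |
| 54.01 | 84 |
| 54.03 | 85 |
| 54.10 | 86 |
| 54.15 | 87 |
| 54.21 | 88 |
| 54.22 | 89 |
| 54.25 | 90 |
| 54.36 | 91 |
| 54.37 | 92 |
| 54.52 | 93 |
| 54.53 | 94 |
| 54.56 | 95 |
| 54.72 | 96 |
| 54.79 | 97 |
| 55.05 | 98 |
| 55.17 | 99 |
| 55.19 | 100 |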
Note that tied (repeated) values receive consecutive ranks rather than a shared rank. For example, the two values of 51.73 kg are ranked 5 and 6.
The first (smallest) value is 50.69 kg, so its rank is 1; the next value is 51.03 kg, so its rank is 2.
The last (largest) value is 55.19 kg, so its rank is 100.
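The ordering and ranking steps can be sketched in R as follows. This is only an illustration on the six smallest weights, deliberately shuffled; the choice of ties.method = "first" is an assumption made here to mirror the consecutive ranking of ties used in the table.

```r
# Six of the smallest weights, shuffled; 51.73 appears twice (a tie).
x <- c(51.73, 50.69, 51.45, 51.03, 51.73, 51.31)

sorted <- sort(x)                              # step 1: smallest to largest
ranks  <- rank(sorted, ties.method = "first")  # step 2: ties get consecutive ranks

print(data.frame(weight = sorted, rank = ranks))

# For comparison, R's default gives tied values their average rank (5.5 here).
avg_ranks <- rank(sorted)
print(avg_ranks)
```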
3. Calculate the cumulative probability (p_i) associated with each rank (i) using the following formula:
p_i = (i - a) / (n + 1 - 2a)
Where:
i = 1, 2, 3, …, n is the rank, and n is the number of data points.
a = 3/8 for n ≤ 10, and a = 0.5 for n > 10.
Since the number of data points is 100, which is larger than 10, a = 0.5 and the denominator becomes n + 1 - 2(0.5) = n. The formula therefore reduces to:
p_i = (i - 0.5) / n
The following table will be produced:
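| weight (kg) | rank | p_i |
|---|---|---|
| 50.69 | 1 | 0.005 |
| 51.03 | 2 | 0.015 |
| 51.31 | 3 | 0.025 |
| 51.45 | 4 | 0.035 |
| 51.73 | 5 | 0.045 |
| 51.73 | 6 | 0.055 |
| 51.78 | 7 | 0.065 |
| 51.86 | 8 | 0.075 |
| 51.88 | 9 | 0.085 |
| 51.93 | 10 | 0.095 |
| 51.93 | 11 | 0.105 |
| 51.97 | 12 | 0.115 |
| 51.97 | 13 | 0.125 |
| 51.98 | 14 | 0.135 |
| 52.27 | 15 | 0.145 |
| 52.29 | 16 | 0.155 |
| 52.31 | 17 | 0.165 |
| 52.31 | 18 | 0.175 |
| 52.31 | 19 | 0.185 |
| 52.37 | 20 | 0.195 |
| 52.37 | 21 | 0.205 |
| 52.40 | 22 | 0.215 |
| 52.44 | 23 | 0.225 |
| 52.44 | 24 | 0.235 |
| 52.50 | 25 | 0.245 |
| 52.51 | 26 | 0.255 |
| 52.53 | 27 | 0.265 |
| 52.53 | 28 | 0.275 |
| 52.55 | 29 | 0.285 |
| 52.60 | 30 | 0.295 |
| 52.62 | 31 | 0.305 |
| 52.63 | 32 | 0.315 |
| 52.67 | 33 | 0.325 |
| 52.67 | 34 | 0.335 |
| 52.69 | 35 | 0.345 |
| 52.70 | 36 | 0.355 |
| 52.72 | 37 | 0.365 |
| 52.76 | 38 | 0.375 |
| 52.77 | 39 | 0.385 |
| 52.77 | 40 | 0.395 |
| 52.78 | 41 | 0.405 |
| 52.78 | 42 | 0.415 |
| 52.79 | 43 | 0.425 |
| 52.86 | 44 | 0.435 |
| 52.92 | 45 | 0.445 |
| 52.94 | 46 | 0.455 |
| 52.96 | 47 | 0.465 |
| 52.97 | 48 | 0.475 |
| 53.01 | 49 | 0.485 |
| 53.05 | 50 | 0.495 |
| 53.07 | 51 | 0.505 |
| 53.11 | 52 | 0.515 |
| 53.12 | 53 | 0.525 |
| 53.13 | 54 | 0.535 |
| 53.15 | 55 | 0.545 |
| 53.18 | 56 | 0.555 |
| 53.22 | 57 | 0.565 |
| 53.24 | 58 | 0.575 |
| 53.25 | 59 | 0.585 |
| 53.30 | 60 | 0.595 |
| 53.33 | 61 | 0.605 |
| 53.36 | 62 | 0.615 |
| 53.38 | 63 | 0.625 |
| 53.39 | 64 | 0.635 |
| 53.40 | 65 | 0.645 |
| 53.43 | 66 | 0.655 |
| 53.44 | 67 | 0.665 |
| 53.45 | 68 | 0.675 |
| 53.46 | 69 | 0.685 |
| 53.50 | 70 | 0.695 |
| 53.55 | 71 | 0.705 |
| 53.55 | 72 | 0.715 |
| 53.58 | 73 | 0.725 |
| 53.64 | 74 | 0.735 |
| 53.69 | 75 | 0.745 |
| 53.70 | 76 | 0.755 |
| 53.78 | 77 | 0.765 |
| 53.82 | 78 | 0.775 |
| 53.84 | 79 | 0.785 |
| 53.88 | 80 | 0.795 |
| 53.90 | 81 | 0.805 |
| 53.92 | 82 | 0.815 |
| 53.99 | 83 | 0.825 |
| 54.01 | 84 | 0.835 |
| 54.03 | 85 | 0.845 |
| 54.10 | 86 | 0.855 |
| 54.15 | 87 | 0.865 |
| 54.21 | 88 | 0.875 |
| 54.22 | 89 | 0.885 |
| 54.25 | 90 | 0.895 |
| 54.36 | 91 | 0.905 |
| 54.37 | 92 | 0.915 |
| 54.52 | 93 | 0.925 |
| 54.53 | 94 | 0.935 |
| 54.56 | 95 | 0.945 |
| 54.72 | 96 | 0.955 |
| 54.79 | 97 | 0.965 |
| 55.05 | 98 | 0.975 |
| 55.17 | 99 | 0.985 |
| 55.19 | 100 | 0.995 |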
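The cumulative probabilities in the table can be reproduced with a few lines of R. The function name plotting_positions is a hypothetical helper introduced only for this sketch; it implements the formula from step 3 and is compared against base R's ppoints, which applies the same rule by default.

```r
# Hypothetical helper: p_i = (i - a) / (n + 1 - 2a),
# with a = 3/8 for n <= 10 and a = 0.5 otherwise, as in step 3.
plotting_positions <- function(n) {
  a <- if (n <= 10) 3 / 8 else 0.5
  i <- seq_len(n)
  (i - a) / (n + 1 - 2 * a)
}

p <- plotting_positions(100)
print(head(p, 3))   # 0.005 0.015 0.025
print(tail(p, 1))   # 0.995

# Base R's ppoints() uses the same rule by default.
same_as_ppoints <- isTRUE(all.equal(p, ppoints(100)))
print(same_as_ppoints)
```

For a small sample such as n = 8, the same helper switches to a = 3/8 automatically.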
4. Calculate the Z-score (z_i) for each p_i value. The R function qnorm returns the Z-score, that is, the quantile of the standard normal distribution, associated with a given probability.
For example, when p_i = 0.5, the Z-score is 0.
qnorm(0.5)
## [1] 0
This is because the Z-score refers to the standard normal distribution, which has mean = 0 and standard deviation = 1.
The normal distribution is symmetric about its mean, so the probability of a value below 0 equals the probability of a value above 0, and both are 0.5.
As a result, the Z-score is negative for every data point whose p_i is less than 0.5 and positive for every data point whose p_i is greater than 0.5.
The following table will be produced.
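| weight (kg) | rank | p_i | z_i |
|---|---|---|---|
| 50.69 | 1 | 0.005 | -2.58 |
| 51.03 | 2 | 0.015 | -2.17 |
| 51.31 | 3 | 0.025 | -1.96 |
| 51.45 | 4 | 0.035 | -1.81 |
| 51.73 | 5 | 0.045 | -1.70 |
| 51.73 | 6 | 0.055 | -1.60 |
| 51.78 | 7 | 0.065 | -1.51 |
| 51.86 | 8 | 0.075 | -1.44 |
| 51.88 | 9 | 0.085 | -1.37 |
| 51.93 | 10 | 0.095 | -1.31 |
| 51.93 | 11 | 0.105 | -1.25 |
| 51.97 | 12 | 0.115 | -1.20 |
| 51.97 | 13 | 0.125 | -1.15 |
| 51.98 | 14 | 0.135 | -1.10 |
| 52.27 | 15 | 0.145 | -1.06 |
| 52.29 | 16 | 0.155 | -1.02 |
| 52.31 | 17 | 0.165 | -0.97 |
| 52.31 | 18 | 0.175 | -0.93 |
| 52.31 | 19 | 0.185 | -0.90 |
| 52.37 | 20 | 0.195 | -0.86 |
| 52.37 | 21 | 0.205 | -0.82 |
| 52.40 | 22 | 0.215 | -0.79 |
| 52.44 | 23 | 0.225 | -0.76 |
| 52.44 | 24 | 0.235 | -0.72 |
| 52.50 | 25 | 0.245 | -0.69 |
| 52.51 | 26 | 0.255 | -0.66 |
| 52.53 | 27 | 0.265 | -0.63 |
| 52.53 | 28 | 0.275 | -0.60 |
| 52.55 | 29 | 0.285 | -0.57 |
| 52.60 | 30 | 0.295 | -0.54 |
| 52.62 | 31 | 0.305 | -0.51 |
| 52.63 | 32 | 0.315 | -0.48 |
| 52.67 | 33 | 0.325 | -0.45 |
| 52.67 | 34 | 0.335 | -0.43 |
| 52.69 | 35 | 0.345 | -0.40 |
| 52.70 | 36 | 0.355 | -0.37 |
| 52.72 | 37 | 0.365 | -0.35 |
| 52.76 | 38 | 0.375 | -0.32 |
| 52.77 | 39 | 0.385 | -0.29 |
| 52.77 | 40 | 0.395 | -0.27 |
| 52.78 | 41 | 0.405 | -0.24 |
| 52.78 | 42 | 0.415 | -0.21 |
| 52.79 | 43 | 0.425 | -0.19 |
| 52.86 | 44 | 0.435 | -0.16 |
| 52.92 | 45 | 0.445 | -0.14 |
| 52.94 | 46 | 0.455 | -0.11 |
| 52.96 | 47 | 0.465 | -0.09 |
| 52.97 | 48 | 0.475 | -0.06 |
| 53.01 | 49 | 0.485 | -0.04 |
| 53.05 | 50 | 0.495 | -0.01 |
| 53.07 | 51 | 0.505 | 0.01 |
| 53.11 | 52 | 0.515 | 0.04 |
| 53.12 | 53 | 0.525 | 0.06 |
| 53.13 | 54 | 0.535 | 0.09 |
| 53.15 | 55 | 0.545 | 0.11 |
| 53.18 | 56 | 0.555 | 0.14 |
| 53.22 | 57 | 0.565 | 0.16 |
| 53.24 | 58 | 0.575 | 0.19 |
| 53.25 | 59 | 0.585 | 0.21 |
| 53.30 | 60 | 0.595 | 0.24 |
| 53.33 | 61 | 0.605 | 0.27 |
| 53.36 | 62 | 0.615 | 0.29 |
| 53.38 | 63 | 0.625 | 0.32 |
| 53.39 | 64 | 0.635 | 0.35 |
| 53.40 | 65 | 0.645 | 0.37 |
| 53.43 | 66 | 0.655 | 0.40 |
| 53.44 | 67 | 0.665 | 0.43 |
| 53.45 | 68 | 0.675 | 0.45 |
| 53.46 | 69 | 0.685 | 0.48 |
| 53.50 | 70 | 0.695 | 0.51 |
| 53.55 | 71 | 0.705 | 0.54 |
| 53.55 | 72 | 0.715 | 0.57 |
| 53.58 | 73 | 0.725 | 0.60 |
| 53.64 | 74 | 0.735 | 0.63 |
| 53.69 | 75 | 0.745 | 0.66 |
| 53.70 | 76 | 0.755 | 0.69 |
| 53.78 | 77 | 0.765 | 0.72 |
| 53.82 | 78 | 0.775 | 0.76 |
| 53.84 | 79 | 0.785 | 0.79 |
| 53.88 | 80 | 0.795 | 0.82 |
| 53.90 | 81 | 0.805 | 0.86 |
| 53.92 | 82 | 0.815 | 0.90 |
| 53.99 | 83 | 0.825 | 0.93 |
| 54.01 | 84 | 0.835 | 0.97 |
| 54.03 | 85 | 0.845 | 1.02 |
| 54.10 | 86 | 0.855 | 1.06 |
| 54.15 | 87 | 0.865 | 1.10 |
| 54.21 | 88 | 0.875 | 1.15 |
| 54.22 | 89 | 0.885 | 1.20 |
| 54.25 | 90 | 0.895 | 1.25 |
| 54.36 | 91 | 0.905 | 1.31 |
| 54.37 | 92 | 0.915 | 1.37 |
| 54.52 | 93 | 0.925 | 1.44 |
| 54.53 | 94 | 0.935 | 1.51 |
| 54.56 | 95 | 0.945 | 1.60 |
| 54.72 | 96 | 0.955 | 1.70 |
| 54.79 | 97 | 0.965 | 1.81 |
| 55.05 | 98 | 0.975 | 1.96 |
| 55.17 | 99 | 0.985 | 2.17 |
| 55.19 | 100 | 0.995 | 2.58 |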
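The Z-score column can be generated in one call, since qnorm accepts a whole vector of probabilities. The sketch below assumes the probabilities come from ppoints, which for n = 100 gives (i - 0.5)/n, and spot-checks a few ranks against the table.

```r
n <- 100
p <- ppoints(n)   # (i - 0.5) / n for n > 10
z <- qnorm(p)     # standard normal quantile for each p_i

# Spot-check a few ranks against the table.
check <- c(1, 2, 25, 50, 51, 75, 100)
print(round(z[check], 2))   # -2.58 -2.17 -0.69 -0.01 0.01 0.66 2.58
```

Because the probabilities are symmetric around 0.5, the Z-scores are symmetric around 0: the value at rank i is the negative of the value at rank n + 1 - i.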
5. Create an x-y scatter plot with the Z-scores on the x-axis and the corresponding data points on the y-axis.
6. If the weight data are consistent with a normal distribution, the points should lie close to a straight line.
As a reference, a straight line that passes through the first and third quartiles can be added to the plot.
From the table, the first quartile is about 52.50 kg (rank 25, p_i = 0.245, z_i = -0.69) and the third quartile is about 53.69 kg (rank 75, p_i = 0.745, z_i = 0.66).
The further the points deviate from this line, the stronger the indication of a departure from normality.
Nearly all the points lie on the straight line, so the data are consistent with a normal distribution.
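Steps 5 and 6 can be sketched in base R as shown below. This is one possible way to draw the plot, not necessarily how the original figure was produced. The reference line here uses quantile, which interpolates, so the quartiles come out as 52.5075 and 53.6925 kg rather than the table readings of 52.50 and 53.69 kg, and they are paired with the exact normal quartiles (about ±0.674). The axis labels and title are assumptions.

```r
weights <- c(
  52.44, 52.77, 54.56, 53.07, 53.13, 54.72, 53.46, 51.73, 52.31, 52.55,
  54.22, 53.36, 53.40, 53.11, 52.44, 54.79, 53.50, 51.03, 53.70, 52.53,
  51.93, 52.78, 51.97, 52.27, 52.37, 51.31, 53.84, 53.15, 51.86, 54.25,
  53.43, 52.70, 53.90, 53.88, 53.82, 53.69, 53.55, 52.94, 52.69, 52.62,
  52.31, 52.79, 51.73, 55.17, 54.21, 51.88, 52.60, 52.53, 53.78, 52.92,
  53.25, 52.97, 52.96, 54.37, 52.77, 54.52, 51.45, 53.58, 53.12, 53.22,
  53.38, 52.50, 52.67, 51.98, 51.93, 53.30, 53.45, 53.05, 53.92, 55.05,
  52.51, 50.69, 54.01, 52.29, 52.31, 54.03, 52.72, 51.78, 53.18, 52.86,
  53.01, 53.39, 52.63, 53.64, 52.78, 53.33, 54.10, 53.44, 52.67, 54.15,
  53.99, 53.55, 53.24, 52.37, 54.36, 52.40, 55.19, 54.53, 52.76, 51.97
)

sorted <- sort(weights)
z <- qnorm(ppoints(length(weights)))   # Z-scores from steps 3 and 4

# Reference line through the first and third quartiles.
qy <- quantile(weights, probs = c(0.25, 0.75), names = FALSE)  # sample quartiles
qx <- qnorm(c(0.25, 0.75))                                     # normal quartiles
slope     <- diff(qy) / diff(qx)
intercept <- qy[1] - slope * qx[1]

plot(z, sorted, xlab = "Z-score", ylab = "Weight (kg)",
     main = "Normal probability plot of weights")
abline(a = intercept, b = slope)
```

Calling qqline(weights) after the plot would draw the same line, since by default it also passes through the first and third quartiles.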
– Example 2
The following are the ankle diameters (in cm), each measured as the sum of the two ankles, of 60 physically active individuals from a survey.
14.1 15.1 14.1 15.0 14.9 13.9 15.6 14.6 13.2 15.0 14.5 16.0 15.4 13.2 14.0 14.0 16.0 14.7 14.8 15.5 13.9 14.4 13.8 14.1 14.7 14.9 15.3 14.5 13.2 13.2 15.8 14.0 15.1 15.0 12.9 14.0 13.0 14.0 15.4 16.4 15.2 13.8 14.9 16.0 16.0 16.3 15.3 16.5 14.4 13.4 14.4 14.2 15.4 15.0 13.0 13.0 14.8 16.2 15.4 14.4.
Reference:
Heinz G, Peterson LJ, Johnson RW, Kerk CJ. 2003. Exploring Relationships in Body Dimensions. Journal of Statistics Education 11(2).
Draw a normal probability plot of this data.
1. Order the numbers from smallest to largest.
12.9 13.0 13.0 13.0 13.2 13.2 13.2 13.2 13.4 13.8 13.8 13.9 13.9 14.0 14.0 14.0 14.0 14.0 14.1 14.1 14.1 14.2 14.4 14.4 14.4 14.4 14.5 14.5 14.6 14.7 14.7 14.8 14.8 14.9 14.9 14.9 15.0 15.0 15.0 15.0 15.1 15.1 15.2 15.3 15.3 15.4 15.4 15.4 15.4 15.5 15.6 15.8 16.0 16.0 16.0 16.0 16.2 16.3 16.4 16.5.
2. Assign a rank to each value of your data.
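| diameter (cm) | rank |
|---|---|
| 12.9 | 1 |
| 13.0 | 2 |
| 13.0 | 3 |
| 13.0 | 4 |
| 13.2 | 5 |
| 13.2 | 6 |
| 13.2 | 7 |
| 13.2 | 8 |
| 13.4 | 9 |
| 13.8 | 10 |
| 13.8 | 11 |
| 13.9 | 12 |
| 13.9 | 13 |
| 14.0 | 14 |
| 14.0 | 15 |
| 14.0 | 16 |
| 14.0 | 17 |
| 14.0 | 18 |
| 14.1 | 19 |
| 14.1 | 20 |
| 14.1 | 21 |
| 14.2 | 22 |
| 14.4 | 23 |
| 14.4 | 24 |
| 14.4 | 25 |
| 14.4 | 26 |
| 14.5 | 27 |
| 14.5 | 28 |
| 14.6 | 29 |
| 14.7 | 30 |
| 14.7 | 31 |
| 14.8 | 32 |
| 14.8 | 33 |
| 14.9 | 34 |
| 14.9 | 35 |
| 14.9 | 36 |
| 15.0 | 37 |
| 15.0 | 38 |
| 15.0 | 39 |
| 15.0 | 40 |
| 15.1 | 41 |
| 15.1 | 42 |
| 15.2 | 43 |
| 15.3 | 44 |
| 15.3 | 45 |
| 15.4 | 46 |
| 15.4 | 47 |
| 15.4 | 48 |
| 15.4 | 49 |
| 15.5 | 50 |
| 15.6 | 51 |
| 15.8 | 52 |
| 16.0 | 53 |
| 16.0 | 54 |
| 16.0 | 55 |
| 16.0 | 56 |
| 16.2 | 57 |
| 16.3 | 58 |
| 16.4 | 59 |
| 16.5 | 60 |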
Note that tied (repeated) values receive consecutive ranks rather than a shared rank. For example, the three values of 13.0 cm are ranked 2, 3, and 4.
The first (smallest) value is 12.9 cm, so its rank is 1; the next value is 13.0 cm, so its rank is 2.
The last (largest) value is 16.5 cm, so its rank is 60.
3. Calculate the cumulative probability (p_i) associated with each rank (i).
Since the number of data points is 60, which is larger than 10, a = 0.5 and the formula reduces to:
p_i = (i - 0.5) / n
The following table will be produced:
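| diameter (cm) | rank | p_i |
|---|---|---|
| 12.9 | 1 | 0.008 |
| 13.0 | 2 | 0.025 |
| 13.0 | 3 | 0.042 |
| 13.0 | 4 | 0.058 |
| 13.2 | 5 | 0.075 |
| 13.2 | 6 | 0.092 |
| 13.2 | 7 | 0.108 |
| 13.2 | 8 | 0.125 |
| 13.4 | 9 | 0.142 |
| 13.8 | 10 | 0.158 |
| 13.8 | 11 | 0.175 |
| 13.9 | 12 | 0.192 |
| 13.9 | 13 | 0.208 |
| 14.0 | 14 | 0.225 |
| 14.0 | 15 | 0.242 |
| 14.0 | 16 | 0.258 |
| 14.0 | 17 | 0.275 |
| 14.0 | 18 | 0.292 |
| 14.1 | 19 | 0.308 |
| 14.1 | 20 | 0.325 |
| 14.1 | 21 | 0.342 |
| 14.2 | 22 | 0.358 |
| 14.4 | 23 | 0.375 |
| 14.4 | 24 | 0.392 |
| 14.4 | 25 | 0.408 |
| 14.4 | 26 | 0.425 |
| 14.5 | 27 | 0.442 |
| 14.5 | 28 | 0.458 |
| 14.6 | 29 | 0.475 |
| 14.7 | 30 | 0.492 |
| 14.7 | 31 | 0.508 |
| 14.8 | 32 | 0.525 |
| 14.8 | 33 | 0.542 |
| 14.9 | 34 | 0.558 |
| 14.9 | 35 | 0.575 |
| 14.9 | 36 | 0.592 |
| 15.0 | 37 | 0.608 |
| 15.0 | 38 | 0.625 |
| 15.0 | 39 | 0.642 |
| 15.0 | 40 | 0.658 |
| 15.1 | 41 | 0.675 |
| 15.1 | 42 | 0.692 |
| 15.2 | 43 | 0.708 |
| 15.3 | 44 | 0.725 |
| 15.3 | 45 | 0.742 |
| 15.4 | 46 | 0.758 |
| 15.4 | 47 | 0.775 |
| 15.4 | 48 | 0.792 |
| 15.4 | 49 | 0.808 |
| 15.5 | 50 | 0.825 |
| 15.6 | 51 | 0.842 |
| 15.8 | 52 | 0.858 |
| 16.0 | 53 | 0.875 |
| 16.0 | 54 | 0.892 |
| 16.0 | 55 | 0.908 |
| 16.0 | 56 | 0.925 |
| 16.2 | 57 | 0.942 |
| 16.3 | 58 | 0.958 |
| 16.4 | 59 | 0.975 |
| 16.5 | 60 | 0.992 |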
4. Calculate the Z-score (z_i) for each p_i value using the R function qnorm.
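| diameter (cm) | rank | p_i | z_i |
|---|---|---|---|
| 12.9 | 1 | 0.008 | -2.41 |
| 13.0 | 2 | 0.025 | -1.96 |
| 13.0 | 3 | 0.042 | -1.73 |
| 13.0 | 4 | 0.058 | -1.57 |
| 13.2 | 5 | 0.075 | -1.44 |
| 13.2 | 6 | 0.092 | -1.33 |
| 13.2 | 7 | 0.108 | -1.24 |
| 13.2 | 8 | 0.125 | -1.15 |
| 13.4 | 9 | 0.142 | -1.07 |
| 13.8 | 10 | 0.158 | -1.00 |
| 13.8 | 11 | 0.175 | -0.93 |
| 13.9 | 12 | 0.192 | -0.87 |
| 13.9 | 13 | 0.208 | -0.81 |
| 14.0 | 14 | 0.225 | -0.76 |
| 14.0 | 15 | 0.242 | -0.70 |
| 14.0 | 16 | 0.258 | -0.65 |
| 14.0 | 17 | 0.275 | -0.60 |
| 14.0 | 18 | 0.292 | -0.55 |
| 14.1 | 19 | 0.308 | -0.50 |
| 14.1 | 20 | 0.325 | -0.45 |
| 14.1 | 21 | 0.342 | -0.41 |
| 14.2 | 22 | 0.358 | -0.36 |
| 14.4 | 23 | 0.375 | -0.32 |
| 14.4 | 24 | 0.392 | -0.27 |
| 14.4 | 25 | 0.408 | -0.23 |
| 14.4 | 26 | 0.425 | -0.19 |
| 14.5 | 27 | 0.442 | -0.15 |
| 14.5 | 28 | 0.458 | -0.11 |
| 14.6 | 29 | 0.475 | -0.06 |
| 14.7 | 30 | 0.492 | -0.02 |
| 14.7 | 31 | 0.508 | 0.02 |
| 14.8 | 32 | 0.525 | 0.06 |
| 14.8 | 33 | 0.542 | 0.11 |
| 14.9 | 34 | 0.558 | 0.15 |
| 14.9 | 35 | 0.575 | 0.19 |
| 14.9 | 36 | 0.592 | 0.23 |
| 15.0 | 37 | 0.608 | 0.27 |
| 15.0 | 38 | 0.625 | 0.32 |
| 15.0 | 39 | 0.642 | 0.36 |
| 15.0 | 40 | 0.658 | 0.41 |
| 15.1 | 41 | 0.675 | 0.45 |
| 15.1 | 42 | 0.692 | 0.50 |
| 15.2 | 43 | 0.708 | 0.55 |
| 15.3 | 44 | 0.725 | 0.60 |
| 15.3 | 45 | 0.742 | 0.65 |
| 15.4 | 46 | 0.758 | 0.70 |
| 15.4 | 47 | 0.775 | 0.76 |
| 15.4 | 48 | 0.792 | 0.81 |
| 15.4 | 49 | 0.808 | 0.87 |
| 15.5 | 50 | 0.825 | 0.93 |
| 15.6 | 51 | 0.842 | 1.00 |
| 15.8 | 52 | 0.858 | 1.07 |
| 16.0 | 53 | 0.875 | 1.15 |
| 16.0 | 54 | 0.892 | 1.24 |
| 16.0 | 55 | 0.908 | 1.33 |
| 16.0 | 56 | 0.925 | 1.44 |
| 16.2 | 57 | 0.942 | 1.57 |
| 16.3 | 58 | 0.958 | 1.73 |
| 16.4 | 59 | 0.975 | 1.96 |
| 16.5 | 60 | 0.992 | 2.41 |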
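Steps 1 to 4 for the ankle data can be combined into a single data frame, as in the sketch below. The column names are chosen here for illustration. One caveat: this code applies qnorm to the unrounded probabilities, so the two extreme Z-scores come out near -2.39 and 2.39 rather than the -2.41 and 2.41 shown in the table, which appear to have been computed from the rounded probabilities 0.008 and 0.992.

```r
diameter <- c(
  14.1, 15.1, 14.1, 15.0, 14.9, 13.9, 15.6, 14.6, 13.2, 15.0,
  14.5, 16.0, 15.4, 13.2, 14.0, 14.0, 16.0, 14.7, 14.8, 15.5,
  13.9, 14.4, 13.8, 14.1, 14.7, 14.9, 15.3, 14.5, 13.2, 13.2,
  15.8, 14.0, 15.1, 15.0, 12.9, 14.0, 13.0, 14.0, 15.4, 16.4,
  15.2, 13.8, 14.9, 16.0, 16.0, 16.3, 15.3, 16.5, 14.4, 13.4,
  14.4, 14.2, 15.4, 15.0, 13.0, 13.0, 14.8, 16.2, 15.4, 14.4
)

n <- length(diameter)
tab <- data.frame(
  diameter = sort(diameter),   # step 1: ordered values
  rank     = seq_len(n),       # step 2: consecutive ranks
  p        = ppoints(n)        # step 3: (i - 0.5) / n because n > 10
)
tab$z <- qnorm(tab$p)          # step 4: Z-scores

print(head(round(tab, 3)))
```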
5. Create an x-y scatter plot with the Z-scores on the x-axis and the corresponding data points on the y-axis.
6. If the diameter data are consistent with a normal distribution, the points should lie close to a straight line.
As a reference, a straight line is plotted that passes through the first and third quartiles.
From the table, the first quartile is about 14.0 cm (rank 16, p_i = 0.258, z_i = -0.65) and the third quartile is about 15.4 cm (rank 46, p_i = 0.758, z_i = 0.70).
Nearly all the points lie on the straight line, so the data are consistent with a normal distribution.
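Base R also offers a shortcut for this kind of plot: qqnorm computes the same coordinates as the manual steps, and qqline adds a line through the first and third quartiles. The sketch below is one way to use them on the ankle data; the axis labels are assumptions, and the check at the top simply confirms that the shortcut and the manual Z-scores agree.

```r
diameter <- c(
  14.1, 15.1, 14.1, 15.0, 14.9, 13.9, 15.6, 14.6, 13.2, 15.0,
  14.5, 16.0, 15.4, 13.2, 14.0, 14.0, 16.0, 14.7, 14.8, 15.5,
  13.9, 14.4, 13.8, 14.1, 14.7, 14.9, 15.3, 14.5, 13.2, 13.2,
  15.8, 14.0, 15.1, 15.0, 12.9, 14.0, 13.0, 14.0, 15.4, 16.4,
  15.2, 13.8, 14.9, 16.0, 16.0, 16.3, 15.3, 16.5, 14.4, 13.4,
  14.4, 14.2, 15.4, 15.0, 13.0, 13.0, 14.8, 16.2, 15.4, 14.4
)

# plot.it = FALSE returns the plot coordinates without drawing anything.
qq <- qqnorm(diameter, plot.it = FALSE)
manual_z <- qnorm(ppoints(length(diameter)))
same_coordinates <- isTRUE(all.equal(sort(qq$x), manual_z))
print(same_coordinates)

# Draw the plot and the quartile-based reference line.
qqnorm(diameter, xlab = "Z-score", ylab = "Ankle diameter (cm)")
qqline(diameter)
```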
3. How to read a normal probability plot?
The shape of a normal probability plot shows whether your data are approximately normal and, if not, how they depart from normality.
– Example 1: normally-distributed variable
The following plot is the histogram and normal probability plot for heights in cm of 100 individuals.
When the data are normally distributed, the histogram is nearly symmetric, unimodal, and bell-shaped.
The normal probability plot of normally distributed data will show nearly all the points on the reference straight line, at least when the few large and small values are ignored.
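A comparable pair of plots can be produced from simulated data. In the sketch below, the seed, the mean of 170 cm and the standard deviation of 10 cm are all assumptions chosen for illustration; they are not the values behind the original figure. The correlation between the Z-scores and the sorted data is added as a rough numerical summary of how straight the plot is.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility
heights <- rnorm(100, mean = 170, sd = 10)   # hypothetical heights in cm

hist(heights, main = "Histogram", xlab = "Height (cm)")
qqnorm(heights, ylab = "Height (cm)")
qqline(heights)

# Correlation between Z-scores and sorted data: close to 1 for normal data.
r <- cor(qnorm(ppoints(length(heights))), sort(heights))
print(r)
```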
– Example 2: normally-distributed variable with one outlier
The following plot is the histogram and normal probability plot for heights in cm of 100 individuals.
The histogram looks the same as for normally distributed data, except for one isolated bin far from the rest that holds the outlier.
The normal probability plot will show that nearly all the points are near the straight line except the far away outlier point.
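The effect of a single outlier can be sketched by planting one extreme value in simulated data. The simulated heights and the outlier of 250 cm are hypothetical. The code measures each point's vertical distance from the quartile-based reference line and reports the point that lies furthest from it.

```r
set.seed(1)                                  # arbitrary seed
heights <- rnorm(100, mean = 170, sd = 10)   # hypothetical heights in cm
heights[100] <- 250                          # one artificial outlier

z <- qnorm(ppoints(length(heights)))
sorted <- sort(heights)

# Quartile-based reference line (the same line qqline() draws by default).
qy <- quantile(heights, c(0.25, 0.75), names = FALSE)
qx <- qnorm(c(0.25, 0.75))
slope <- diff(qy) / diff(qx)
intercept <- qy[1] - slope * qx[1]

# Vertical distance of each point from the line; the outlier stands out.
distance <- sorted - (intercept + slope * z)
worst <- which.max(abs(distance))
print(sorted[worst])

plot(z, sorted, xlab = "Z-score", ylab = "Height (cm)")
abline(intercept, slope)
```

Because the line is anchored at the quartiles, a single extreme value barely moves it, which is why the remaining points still follow it closely.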
– Example 3: Right-skewed variable
The following plot is the histogram and normal probability plot for the annual income of 100 individuals.
The histogram of right-skewed data is unimodal with a long right tail, because large values are less frequent.
The normal probability plot of right-skewed data has an inverted C shape.
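To see how right-skewed data bend away from the reference line, the sketch below uses an idealised right-skewed "sample": the quantiles of an exponential distribution at the plotting positions. This is an illustrative assumption and not the income data. With Z-scores on the x-axis, both ends of the data lie above the quartile-based line while the middle lies below it, so the points trace a curve rather than a straight line.

```r
# Idealised right-skewed "sample": exponential quantiles at the plotting
# positions (an illustrative assumption, not the income data).
n <- 100
z <- qnorm(ppoints(n))
x <- qexp(ppoints(n))    # already ordered from smallest to largest

# Quartile-based reference line.
qy <- quantile(x, c(0.25, 0.75), names = FALSE)
qx <- qnorm(c(0.25, 0.75))
slope <- diff(qy) / diff(qx)
intercept <- qy[1] - slope * qx[1]
line_at <- function(zv) intercept + slope * zv

# Where the points sit relative to the reference line.
ends_above   <- x[1] > line_at(z[1]) && x[n] > line_at(z[n])
middle_below <- x[50] < line_at(z[50])
print(c(ends_above = ends_above, middle_below = middle_below))

plot(z, x, xlab = "Z-score", ylab = "Value")
abline(intercept, slope)
```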
– Example 4: Left-skewed variable
The following plot is the histogram and normal probability plot for the physical ability scores from lawyers’ ratings of state judges in the US Superior Court.
The histogram of left-skewed data l