Z-Score – Explanation & Examples

The definition of the Z-score is:

“The Z-score is the number of standard deviations by which an observed value is above or below the mean value.”

In this topic, we will discuss the Z-score from the following aspects:

  1. What is Z-score?
  2. Z-score formula.
  3. Z-score properties.
  4. How to calculate Z-score?
  5. The role of Z-score.
  6. Practice questions.
  7. Answer key.

1. What is Z-score?

The Z-score (standard score) is the number of standard deviations by which an observed value is above or below the mean value.

The Z-score is positive if the value lies above (greater than) the mean, and negative if the value lies below (smaller than) the mean.

For example, if the Z-score for an individual's height is +1, that person's height is 1 standard deviation above the mean height of the population.

On the other hand, if the Z-score for the same individual's weight is -1, that person's weight is 1 standard deviation below the mean weight of the population.

The Z-score is 0 if the observed value exactly equals the mean.

The Z-score is used when the distribution of data, plotted as a histogram, nearly follows a normal distribution curve (a bell-shaped symmetrical curve centered around the mean).

– Example 1

The following are the histograms of heights, physical activity, and mental component summary from a certain population.

For each variable, the mean value is plotted as a red dashed vertical line.

We see that:

  • The histogram of height nearly follows a normal distribution curve (a bell-shaped symmetrical curve centered around the mean).
  • The histogram of the mental component shows a left-skewed distribution (small values occur infrequently, forming a long tail on the left).
  • The histogram of physical activity shows a right-skewed distribution (large values occur infrequently, forming a long tail on the right).

The Z-score can therefore be applied to an individual's height, but it is not appropriate for an individual's mental component or physical activity, because those distributions are not approximately normal.

However, normal distributions can differ from one another in their means and standard deviations.

– Example 2

The following are the histograms of heights and weights from a certain population.

For each variable, the mean value is plotted as a red dashed vertical line.

We see that:

  • The histograms of heights and weights both nearly follow a normal distribution curve (a bell-shaped curve).
  • However, the mean value for heights is about 165 cm, while the mean value for weights is about 75 kg.

– Example 3

In the above example, the mean height was 163 cm with a standard deviation of 9.22 cm, while the mean weight was 73.4 kg with a standard deviation of 13.7 kg.

Assuming that heights and weights from this population follow the normal distribution, we can plot the normal distribution curves for heights and weights as follows:

We see that:

  • Each normal distribution curve is bell-shaped, peaked, and symmetric about its mean.
  • When the standard deviation is larger, as for weights, the curve becomes flatter and wider.

The Z-score converts any normal distribution to the standard normal distribution, which has mean = 0 and standard deviation = 1.

We see that:

  • The two curves are superimposed over each other.
  • Both heights and weights now have mean = 0 and standard deviation = 1.
  • The Z-score allows the comparison of values (such as heights and weights) from different normal distributions by standardizing their distributions.
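
As a rough illustration of this standardization, the sketch below simulates heights and weights from normal distributions with the means and standard deviations quoted in Example 3, and then converts each sample to Z-scores. The simulated data, the seed, and the helper name standardize are assumptions made for illustration only; they are not the population data behind the histograms.

```python
import random
import statistics


def standardize(values):
    """Return the Z-score of every value, using the sample's own mean and SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


random.seed(0)  # arbitrary seed, only so that repeated runs give the same numbers

# Simulated (not real) data, using the parameters quoted in Example 3.
heights = [random.gauss(163, 9.22) for _ in range(1000)]   # cm
weights = [random.gauss(73.4, 13.7) for _ in range(1000)]  # kg

z_heights = standardize(heights)
z_weights = standardize(weights)

# After standardization both samples have mean 0 and SD 1
# (up to floating-point rounding), whatever their original units were.
for name, z in (("heights", z_heights), ("weights", z_weights)):
    print(name, round(statistics.mean(z), 6), round(statistics.stdev(z), 6))
```

Because each sample is standardized with its own mean and standard deviation, the two sets of Z-scores share the same scale and can be compared directly.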

2. Z-score formula

The Z-score formula is:

Z=(x-μ)/σ

where:

x is the data point.

μ is the population mean.

σ is the population standard deviation.

When the population mean and the population standard deviation are unknown, the Z-score can be calculated using the sample mean (x̄) and the sample standard deviation (s) as estimates of the population values.
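
The formula can be sketched in a few lines of Python. The function name z_score, the individual height and weight values, and the small sample below are hypothetical choices made for illustration; only the means and standard deviations in the first part come from Example 3.

```python
import statistics


def z_score(x, mean, sd):
    """Z = (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Case 1: population parameters are known (values quoted in Example 3).
z_height = z_score(172.22, 163, 9.22)  # hypothetical height, 1 SD above the mean
z_weight = z_score(59.7, 73.4, 13.7)   # hypothetical weight, 1 SD below the mean
print(round(z_height, 2), round(z_weight, 2))

# Case 2: population parameters are unknown, so they are estimated
# from a (hypothetical) sample with the sample mean and sample SD.
sample = [160, 165, 170, 175, 180]
x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)
z_sample = z_score(180, x_bar, s)
print(round(z_sample, 2))
```

Note that statistics.stdev uses the n - 1 denominator (sample standard deviation), whereas statistics.pstdev would be the choice when the data are the whole population.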

3. Z-score properties

Because the Z-scores of normally distributed data follow a standard normal distribution with mean = 0 and standard deviation = 1, they obey the properties of the normal distribution, such as the 68-95-99.7% rule.
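
The percentages in this rule can be checked numerically. The following is a minimal sketch that assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; the helper name prob_within is a hypothetical name chosen for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def prob_within(k):
    """Probability that a Z-score lies between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"P(-{k} < Z < +{k}) = {prob_within(k):.4f}")

# Symmetry: half of the distribution lies above the mean (Z > 0).
prob_above_mean = 1 - std_normal.cdf(0)
print(f"P(Z > 0) = {prob_above_mean:.2f}")
```

The computed probabilities are about 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.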

The following is the normal distribution curve for any Z-score:

The important properties of the normal distribution that the Z-score follows are:

  • 68% of the data are within 1 standard deviation from the mean.

This means that 68% of the population has a Z-score between -1 and +1. In other words, the probability that a value from this population has a Z-score between -1 and +1 is 68%.

Because the normal distribution is symmetric around its mean, 34% (68%/2) of this population has a Z-score between 0 (the mean) and +1, and 34% has a Z-score between -1 and 0.

Suppose we shade the area within 1 standard deviation of the mean, that is, between Z = -1 and Z = +1.

Without integrating to find this green area under the curve (AUC), we know that the green shaded area represents 68% of the total area. Equivalently, the data with Z-scores between -1 and +1 make up 68% of the total data.

  • 95% of the data are within 2 standard deviations from the mean.

This means that 95% of the population has a Z-score between -2 and +2. In other words, the probability that a value from this population has a Z-score between -2 and +2 is 95%.

Because the normal distribution is symmetric around its mean, 47.5% (95%/2) of this population has a Z-score between 0 (the mean) and +2, and 47.5% has a Z-score between -2 and 0.

Suppose we shade the area within 2 standard deviations of the mean, that is, between Z = -2 and Z = +2.

Without integrating to find this red area under the curve (AUC), we know that the red shaded area represents 95% of the total area. Equivalently, the data with Z-scores between -2 and +2 make up 95% of the total data.

  • 99.7% of the data are within 3 standard deviations from the mean.

This means that 99.7% of the population has a Z-score between -3 and +3. In other words, the probability that a value from this population has a Z-score between -3 and +3 is 99.7%.

Because the normal distribution is symmetric around its mean, 49.85% (99.7%/2) of this population has a Z-score between 0 (the mean) and +3, and 49.85% has a Z-score between -3 and 0.

Suppose we shade the area within 3 standard deviations of the mean, that is, between Z = -3 and Z = +3.

Without integrating to find this blue area under the curve (AUC), we know that the blue shaded area represents 99.7% of the total area. Equivalently, the data with Z-scores between -3 and +3 make up 99.7% of the total data.

  • The proportion (probability) of data that are larger than the mean equals the proportion of data that are smaller than the mean, and both are 0.50 or 50%.

This means that 50% of the population has a Z-score greater than 0 and the other half has a Z-score smaller than 0.

In other words, the probability that a value from this population has a Z-score above 0 equals the probability that it has a Z-score below 0, and both are 50%.

This is plotted as follows: