Sampling Distribution – Explanation & Examples

The definition of a sampling distribution is:

“The sampling distribution is a probability distribution of a statistic obtained from a large number of samples of the same size, randomly drawn from a specific population.”

In this topic, we will discuss the sampling distribution from the following aspects:

  1. What is the sampling distribution?
  2. Sampling distribution formula for the mean.
  3. How to calculate the sampling distribution for the mean?
  4. Sampling distribution formula for proportion.
  5. How to calculate the sampling distribution for proportion?
  6. Practice questions.
  7. Answer key.

1. What is the sampling distribution?

The sampling distribution is a theoretical distribution that we cannot observe directly. It describes all the possible values of a sample statistic (such as a mean or a proportion) across random samples of the same size taken from the same population.

In real-life research, usually only one sample of a certain size is taken from a specific population. This sample is one of the many possible samples that we could have obtained by chance.

There are many types of sample statistics that we can estimate from our samples:

  • The sample mean for continuous variables.
  • The sample proportion for categorical variables.
  • The sample mean difference for comparing 2 continuous variables.
  • The sample proportion difference for comparing 2 categorical variables.

These sample statistics vary from one sample to another, even when the samples have the same size. The variability of a sample statistic across samples is measured by the standard error (SE). It is different from the variability of the individual values within a single sample, which is measured by the standard deviation (s).
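The difference between s and SE can be sketched with a short simulation. This is only an illustration: the population below (roughly normal, mean 50, standard deviation 10), the sample size of 40, and all variable names are assumptions chosen for the example and do not come from the data discussed in this topic.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 roughly normal values (mean 50, SD 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]

n = 40  # illustrative sample size

# Standard deviation (s): spread of the individual values inside ONE sample.
one_sample = rng.sample(population, n)
s = statistics.stdev(one_sample)

# Standard error (SE): spread of the sample MEANS across many samples of size n.
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(1000)]
se = statistics.stdev(sample_means)

print(f"s  (individual values in one sample) = {s:.2f}")
print(f"SE (means of 1000 samples)           = {se:.2f}")
```

In this sketch the SE should come out much smaller than s, because a mean of 40 values moves around far less than a single value does.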

– Example of the sampling distribution for the sample mean

We have population data for individual body mass index (BMI) values. We know that the population mean BMI is 29.97.

The distribution of BMI in this population is normal (bell-shaped), as we see from the histogram below.

The x-axis shows the individual BMI values. The histogram has a normal shape that is symmetric around the population mean (plotted as a vertical dashed line).

Using a computer program, we will take 1000 random samples from this population data at each of three sample sizes (30, 100, and 200), calculate the mean of every sample, and plot the sample means for each sample size as a histogram to see their (sampling) distribution.

We see that:

  • The x-axis is the mean value from each sample.
  • We have 3 histograms: one for the sample means based on a sample size of 30 (means_30), one for a sample size of 100 (means_100), and one for a sample size of 200 (means_200).
  • The (sampling) distribution of the sample means is approximately normal (bell-shaped) for all sample sizes (30, 100, and 200) and is centered around the population mean, which is plotted as a black dashed line.
  • The variability of the sampling distribution of the sample means decreases as the sample size increases.

The following table lists the mean and the standard deviation (that is, the standard error) of each set of 1000 sample means:

means        mean     SE
means_30     29.95    0.69
means_100    29.96    0.37
means_200    29.98    0.26

We see that:

  • The mean of each set of 1000 sample means (for sample size 30, 100, or 200) is nearly equal to the true population mean (29.97).
  • The standard deviation (or standard error, SE) of the 1000 sample means decreases as the sample size increases.
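A simulation of this kind can be sketched in a few lines of Python. The population here is a hypothetical stand-in for the BMI data: the mean of 29.97 comes from the text, but the standard deviation of 3.8, the population size of 20,000, and the function name simulate_means are assumptions made for the example, so the printed numbers will only be close to the table above, not identical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the BMI population.
# Mean 29.97 is from the text; SD 3.8 and size 20,000 are assumptions.
population = [rng.gauss(29.97, 3.8) for _ in range(20_000)]
pop_mean = statistics.fmean(population)

def simulate_means(pop, n, n_samples=1000):
    """Draw n_samples random samples of size n and return their means."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(n_samples)]

results = {}
for n in (30, 100, 200):
    means = simulate_means(population, n)
    # Mean of the sample means, and their standard deviation (the SE).
    results[n] = (statistics.fmean(means), statistics.stdev(means))

print(f"population mean = {pop_mean:.2f}")
for n, (m, se) in results.items():
    print(f"means_{n:<4} mean = {m:.2f}  SE = {se:.2f}")
```

Each row should show a mean close to the population mean and an SE that shrinks as n grows.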

– Example of the sampling distribution for sample means from skewed data

We have population data for individual physical activity (Kcal/week). We know that the population mean physical activity is 398.83 Kcal/week.

The distribution of physical activity in this population is right-skewed, as we see from the histogram below.

The x-axis shows the individual physical activity values. The histogram has a right-skewed shape, with a long tail of infrequent large values.

The histogram is not symmetric around the population mean (plotted as a vertical dashed line).

Using a computer program, we will take 1000 random samples from this population data at each of three sample sizes (30, 100, and 200), calculate the mean of every sample, and plot the sample means for each sample size as a histogram to see their (sampling) distribution.

We see that:

  • The x-axis is the mean value from each sample.
  • We have 3 histograms: one for the sample means based on a sample size of 30 (means_30), one for a sample size of 100 (means_100), and one for a sample size of 200 (means_200).
  • Although the population is right-skewed, the (sampling) distribution of the sample means is approximately normal (bell-shaped) for all sample sizes (30, 100, and 200) and is centered around the population mean, which is plotted as a black dashed line.
  • The variability of the sampling distribution of the sample means decreases as the sample size increases.

The following table lists the mean and the standard deviation (that is, the standard error) of each set of 1000 sample means:

means        mean      SE
means_30     400.16    74.00
means_100    400.67    37.83
means_200    399.00    24.81

We see that:

  • The mean of each set of 1000 sample means (for sample size 30, 100, or 200) is nearly equal to the true population mean (398.83).
  • The standard deviation (or standard error, SE) of the 1000 sample means decreases as the sample size increases.
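The same experiment can be sketched for a skewed population. The code below assumes an exponential population with mean 398.83 as a stand-in for the physical activity data; the exponential shape, the population size of 20,000, and the skewness helper are assumptions added for illustration and are not part of the original analysis. The skewness value (0 for a perfectly symmetric distribution) gives a rough numeric check of how bell-shaped the sample means are.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed population: exponential with mean 398.83.
# The mean is from the text; the shape and size are assumptions.
population = [rng.expovariate(1 / 398.83) for _ in range(20_000)]

def skewness(values):
    """Mean of cubed standardized values; near 0 for a symmetric distribution."""
    values = list(values)
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

pop_mean = statistics.fmean(population)
pop_skew = skewness(population)
print(f"population: mean = {pop_mean:.2f}  skewness = {pop_skew:.2f}")

results = {}
for n in (30, 100, 200):
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(1000)]
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    m, se, skew = results[n]
    print(f"means_{n:<4} mean = {m:.2f}  SE = {se:.2f}  skewness = {skew:.2f}")
```

The sample means should be far less skewed than the individual values. Under these assumptions, some right skew may still be visible at n = 30.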

– Example of the sampling distribution for sample proportions

We have population data for individual ethnicities. We know that the true population proportion for White persons is 0.763 or 76.3%.

We can see the percentage of White and non-White individuals from the following bar plot.

We see that the percentage of White individuals is 76.3% and the percentage of Other individuals is 23.7%.

Using a computer program, we will take 1000 random samples from this population data at each of three sample sizes (50, 100, and 200), calculate the proportion of White individuals in every sample, and plot the sample proportions for each sample size as a histogram to see their sampling distribution.

We see that:

  • The x-axis is the proportion value from each sample.
  • We have 3 histograms: one for the sample proportions based on a sample size of 50 (proportions_50), one for a sample size of 100 (proportions_100), and one for a sample size of 200 (proportions_200).
  • The (sampling) distribution of the sample proportions is approximately normal (bell-shaped) for all sample sizes (50, 100, and 200) and is centered around the population proportion, which is plotted as a black dashed line.
  • The variability of the sampling distribution of the sample proportions decreases as the sample size increases.

The following table lists the mean and the standard deviation (that is, the standard error) of each set of 1000 sample proportions:

proportions        mean     SE
proportions_50     0.765    0.058
proportions_100    0.762    0.044
proportions_200    0.763    0.030

We see that:

  • The mean of each set of 1000 sample proportions (for sample size 50, 100, or 200) is nearly equal to the true population proportion (0.763).
  • The standard deviation (or standard error, SE) of the 1000 sample proportions decreases as the sample size increases.
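The proportion experiment can be sketched in the same way. The population proportion of 0.763 comes from the text. Treating each sampled person as an independent draw with that probability is an assumption (reasonable when the population is much larger than the sample), and the name sample_proportion is hypothetical.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is reproducible

P_TRUE = 0.763  # population proportion of White individuals (from the text)

def sample_proportion(n):
    """Proportion of 'White' in one random sample of size n.

    Each draw is an independent trial with success probability P_TRUE,
    which assumes the population is large relative to the sample.
    """
    return sum(rng.random() < P_TRUE for _ in range(n)) / n

results = {}
for n in (50, 100, 200):
    props = [sample_proportion(n) for _ in range(1000)]
    # Mean of the sample proportions, and their standard deviation (the SE).
    results[n] = (statistics.fmean(props), statistics.stdev(props))

for n, (m, se) in results.items():
    print(f"proportions_{n:<4} mean = {m:.3f}  SE = {se:.3f}")
```

The means should sit close to 0.763, and the SE should fall as the sample size rises, in line with the table above.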

The variability of the sampling distribution decreases as the sample size increases because, in a larger sample, each individual observation has less influence on the sample estimate (mean or proportion), so unusually high or low values are averaged out.

– Sampling distribution formula for the mean

For a large sample of n ≥ 30 independent observations, the sampling distribution of the sample mean $\bar{x}$ will be nearly normal with:

$\mu_{\bar{x}} = \mu$

and

$SE = \sigma / \sqrt{n}$

Where:

$\mu_{\bar{x}}$ is the mean of the sample means from samples of the same size (n).

μ is the population mean.

SE is the standard error, which measures the variability in the sample means.

σ is the population standard deviation. It can be replaced by the sample standard deviation (s) when the sample size is ≥ 30.
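As a rough check, the formula SE = σ/√n can be compared with the simulated standard errors reported in the BMI table earlier. The text does not state σ for the BMI population, so the sketch below backs out an implied σ from the n = 30 row; that implied value and the function name standard_error are assumptions made for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Simulated SEs reported in the BMI table (1000 sample means per size).
reported = {30: 0.69, 100: 0.37, 200: 0.26}

# sigma is not given in the text, so back it out from the n = 30 row.
implied_sigma = reported[30] * math.sqrt(30)
print(f"implied sigma = {implied_sigma:.2f}")

for n, se_reported in reported.items():
    se_formula = standard_error(implied_sigma, n)
    print(f"n = {n:<4} formula SE = {se_formula:.2f}  reported SE = {se_reported:.2f}")
```

The formula values for n = 100 and n = 200 land within about 0.01 of the simulated ones, which is the kind of small gap we expect from a simulation with 1000 samples and rounded table entries.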

2. How to calculate the sampling distribution for the mean?

We use the rules of the normal distribution to define the sampling distribution for a sample mean.

For any normal distribution, about 95% of the values lie within 1.96 standard deviations of the mean, and about 99% lie within 2.58 standard deviations of the mean.
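These two cutoffs can be verified with the NormalDist class from Python's standard statistics module (available in Python 3.8 and later). The helper name central_area is an assumption introduced for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal distribution

def central_area(z):
    """Probability that a normal value lies within z standard deviations of the mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(f"within 1.96 SD: {central_area(1.96):.4f}")  # 0.9500
print(f"within 2.58 SD: {central_area(2.58):.4f}")  # 0.9901

# Going the other way: the cutoff that leaves 2.5% (or 0.5%) in each tail.
print(f"z for 95%: {std_normal.inv_cdf(0.975):.3f}")  # 1.960
print(f"z for 99%: {std_normal.inv_cdf(0.995):.3f}")  # 2.576
```

The value 2.58 is the usual rounding of 2.576, which is why the area within 2.58 standard deviations is slightly above 99%.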

We follow these steps:

1. Check for the needed sam