 # The Poisson Distribution – Explanation & Examples

The definition of the Poisson distribution is:

“The Poisson distribution is a discrete probability distribution that describes the probability of the number of events occurring in a fixed interval.”

In this topic, we will discuss the Poisson distribution from the following aspects:

• What is a Poisson distribution?
• When to use Poisson distribution?
• Poisson distribution formula.
• How to do the Poisson distribution?
• Practice questions.

## What is a Poisson distribution?

The Poisson distribution is a discrete probability distribution that describes the probability of the number of events (discrete random variable) from a random process in a fixed interval.

Discrete random variables take a countable number of values and are usually counts. Counts take whole-number values (0, 1, 2, and so on) and cannot take decimal values.

The fixed interval can be:

• Time, as in the number of calls received per hour in a call center or the number of goals per football match.
• Distance, as in the number of mutations per unit length of a DNA strand.
• Area, as in the number of bacteria found per unit area of an agar plate.
• Volume, as in the number of bacteria found per milliliter of a liquid.

The Poisson distribution is named after the French mathematician Siméon Denis Poisson.

## When to use Poisson distribution?

You can apply the Poisson distribution to random processes that have many opportunities for an event to occur, where each individual event is rare.

However, the average rate (the average number of events per interval) can be any number and does not always have to be small.

For the Poisson distribution to describe a random process, the process must satisfy these conditions:

1. The number of events occurring in an interval can take the values 0, 1, 2, and so on. Decimal numbers are not allowed because the Poisson distribution is a discrete (count) distribution.
2. The occurrence of one event does not affect the probability that a second event will occur. That is, events occur independently.
3. The average rate (the average number of events per interval) is constant and does not change over time.
4. Two events cannot occur at exactly the same time. That is, each very small sub-interval contains either one event or none.
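Condition 4, together with the idea of many rare opportunities, can be made concrete. Split the interval into n tiny sub-intervals, each holding one event with probability λ/n, independently of the others. The count is then binomial, and as n grows it approaches the Poisson distribution (the Poisson limit of the binomial). The Python sketch below is our own illustration, not part of the original article. The helper names `binomial_pmf` and `poisson_pmf` are hypothetical.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events in n independent sub-intervals,
    each holding one event with probability p (and none otherwise)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events when the average is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 10   # average number of events per interval
k = 10     # number of events we ask about

# Split the interval into n sub-intervals, each with event probability lam / n.
for n in (20, 100, 1_000, 10_000):
    print(f"n = {n:>6}: binomial = {binomial_pmf(k, n, lam / n):.5f}, "
          f"Poisson = {poisson_pmf(k, lam):.5f}")
```

As n increases, the binomial column moves toward the fixed Poisson value. This is why a process with many tiny, independent, one-at-a-time opportunities is well described by the Poisson distribution.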

### – Example 1

Data from a certain call center shows a historical average of 10 calls received per hour. What is the probability of receiving 0, 10, 20, or 30 calls per hour in this center?

We can use the Poisson distribution to describe this process because:

1. The number of calls per hour can take the values 0, 1, 2, and so on. No decimal numbers can occur.
2. The occurrence of one event does not affect the probability that a second event will occur. There is no reason to expect one caller to affect the chances of another person calling, so the events occur independently.
3. We may assume the average rate (the number of calls per hour) to be constant.
4. Two calls cannot occur at exactly the same time. That is, each small sub-interval, such as a second, contains either one call or none.

This process is not a perfect fit for the Poisson distribution. For example, the average rate of calls per hour may drop during the night.

Practically speaking, however, the number of calls per hour is close enough to a Poisson process that the Poisson distribution can be used to describe its behavior.

The Poisson distribution lets us calculate the probability of 0, 10, 20, or 30 calls per hour:

• The probability of zero calls per hour ≈ 0% (about 0.005%).
• The probability of 10 calls per hour = 0.125 or 12.5%.
• The probability of 20 calls per hour = 0.002 or 0.2%.
• The probability of 30 calls per hour ≈ 0%.

Ten calls is the most likely count (tied with 9 calls), and the probability fades as we move away from 10. If we plot these probabilities and connect the points, the curve peaks at the average rate of 10 calls per hour.

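The average rate (the average number of events per interval) can also take a decimal value. In that case, the number of events with the highest probability is the average rate rounded down to a whole number, as we will see in the following example. When the average rate is a whole number λ, the counts λ − 1 and λ are equally likely; for example, 9 and 10 calls are equally likely when the average is 10 calls per hour.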
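One way to check which count is most likely for a given average is to compute the Poisson probabilities and pick the largest. The minimal Python sketch below does this for the non-integer averages used in this article's examples (6.5 babies, 2.5 goals, 2.1 and 1.4 mutated cells). It prints each result next to the average rounded down. `most_likely_count` is a hypothetical helper, and the search assumes the peak lies below `k_max`.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def most_likely_count(lam: float, k_max: int = 100) -> int:
    """Return the count k in 0..k_max with the largest Poisson probability."""
    return max(range(k_max + 1), key=lambda k: poisson_pmf(k, lam))

for lam in (6.5, 2.5, 2.1, 1.4):
    # Compare the most likely count with the average rounded down
    print(f"average {lam}: most likely count {most_likely_count(lam)}, "
          f"rounded down {math.floor(lam)}")
```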

### – Example 2

Data from the maternity ward of a certain hospital show that 2372 babies were born there last year. The average per day = 2372/365 ≈ 6.5.

What is the probability that 10 babies will be born in this hospital tomorrow?

On how many days next year should we expect 10 babies to be born in this hospital?

The number of babies born per day in this hospital can be described using the Poisson distribution because:

1. The number of babies born per day can take the values 0, 1, 2, and so on. No decimal numbers can occur.
2. The occurrence of one event does not affect the probability that a second event will occur. We do not expect one newborn to affect another baby's chances of being born in that hospital (unless the hospital is full), so the events occur independently.
3. The average rate (the number of babies born per day) may be assumed to be constant.
4. Two babies cannot be born at exactly the same time. That is, each small sub-interval, such as a second, contains either one birth or none.

The number of babies born per day approximately follows the Poisson distribution, so we can use it to describe the process's behavior.

The Poisson distribution lets us calculate the probability of 10 babies being born per day: 0.056 or 5.6%.

Six babies per day has the highest probability. When the number of babies is larger than 16, the probability is so small that it can be considered zero. If we plot the probabilities and connect the points, the curve peaks at 6 babies per day, and the probability fades as we move away from 6.

To estimate on how many days next year this hospital can expect each number of births:

1. We construct a table with each outcome (number of babies) and its probability.

| babies | probability |
|---|---|
| 0 | 0.002 |
| 1 | 0.010 |
| 2 | 0.032 |
| 3 | 0.069 |
| 4 | 0.112 |
| 5 | 0.145 |
| 6 | 0.157 |
| 7 | 0.146 |
| 8 | 0.119 |
| 9 | 0.086 |
| 10 | 0.056 |
| 11 | 0.033 |
| 12 | 0.018 |
| 13 | 0.009 |
| 14 | 0.004 |
| 15 | 0.002 |
| 16 | 0.001 |
| 17 | 0.000 |
| 18 | 0.000 |
| 19 | 0.000 |
| 20 | 0.000 |

2. Add another column for the expected days. Fill that column by multiplying each probability value by the number of days in a year (365).

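| babies | probability | expected days |
|---|---|---|
| 0 | 0.002 | 0.730 |
| 1 | 0.010 | 3.650 |
| 2 | 0.032 | 11.680 |
| 3 | 0.069 | 25.185 |
| 4 | 0.112 | 40.880 |
| 5 | 0.145 | 52.925 |
| 6 | 0.157 | 57.305 |
| 7 | 0.146 | 53.290 |
| 8 | 0.119 | 43.435 |
| 9 | 0.086 | 31.390 |
| 10 | 0.056 | 20.440 |
| 11 | 0.033 | 12.045 |
| 12 | 0.018 | 6.570 |
| 13 | 0.009 | 3.285 |
| 14 | 0.004 | 1.460 |
| 15 | 0.002 | 0.730 |
| 16 | 0.001 | 0.365 |
| 17 | 0.000 | 0.000 |
| 18 | 0.000 | 0.000 |
| 19 | 0.000 | 0.000 |
| 20 | 0.000 | 0.000 |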

We expect that on about 20 of the 365 days next year, this hospital will deliver 10 babies.
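The two table-building steps above can be sketched in Python. This is a minimal sketch, assuming λ = 6.5 births per day (2372/365 rounded to one decimal, as in the text). Because it multiplies the unrounded probabilities by 365, the expected-days column can differ a little from the table. For 10 babies it gives roughly 20.4 days instead of 20.44.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

births_last_year = 2372
lam = round(births_last_year / 365, 1)   # 6.5 births per day, rounded as in the text
days_per_year = 365

print("babies  probability  expected days")
for k in range(21):
    p = poisson_pmf(k, lam)
    print(f"{k:>6}  {p:>11.3f}  {p * days_per_year:>13.2f}")
```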

### – Example 3

The average number of goals in a World Cup soccer match is approximately 2.5.

The number of goals per football match can be described using the Poisson distribution because:

1. The number of goals per football match can take the values 0, 1, 2, and so on. No decimal numbers can occur.
2. The occurrence of one event (a goal) does not affect the probability that a second event will occur, so the events occur independently.
3. The average rate (the number of goals per match) may be assumed to be constant.
4. Two goals cannot occur at exactly the same time. That is, each small sub-interval of the match, such as a second, contains either one goal or none.

The number of goals per match approximately follows the Poisson distribution, so we can use it to describe the process's behavior.

The Poisson distribution lets us calculate the probability of each number of goals in a football match. Two goals per match has the highest probability, 0.257 or 25.7% (for example, a score of 2-0 or 1-1).

When the number of goals is larger than 9, the probability is so small that it can be considered zero. If we plot the probabilities and connect the points, the curve peaks at 2 goals per match, and the probability fades as we move away from 2.

64 matches are played in a World Cup tournament. We can use the Poisson distribution to estimate how many matches are likely to contain each number of goals:

1. We construct a table with each outcome (number of goals) and its probability.
| goals | probability |
|---|---|
| 0 | 0.082 |
| 1 | 0.205 |
| 2 | 0.257 |
| 3 | 0.214 |
| 4 | 0.134 |
| 5 | 0.067 |
| 6 | 0.028 |
| 7 | 0.010 |
| 8 | 0.003 |
| 9 | 0.001 |
| 10 | 0.000 |

2. Add another column for the expected matches.

Fill that column by multiplying each probability value by the number of matches in World Cup soccer (64).

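| goals | probability | expected matches |
|---|---|---|
| 0 | 0.082 | 5.248 |
| 1 | 0.205 | 13.120 |
| 2 | 0.257 | 16.448 |
| 3 | 0.214 | 13.696 |
| 4 | 0.134 | 8.576 |
| 5 | 0.067 | 4.288 |
| 6 | 0.028 | 1.792 |
| 7 | 0.010 | 0.640 |
| 8 | 0.003 | 0.192 |
| 9 | 0.001 | 0.064 |
| 10 | 0.000 | 0.000 |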

We expect:

About 5 matches to contain no goals.

About 13 matches to contain 1 goal.

About 16 matches to contain 2 goals.

About 14 matches to contain 3 goals, and so on.

3. We can add another column with the observed number of matches having each number of goals at the 2018 World Cup in Russia, to see how closely the Poisson distribution predicts them:

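| goals | probability | expected matches | observed matches (2018) |
|---|---|---|---|
| 0 | 0.082 | 5.248 | 1 |
| 1 | 0.205 | 13.120 | 15 |
| 2 | 0.257 | 16.448 | 17 |
| 3 | 0.214 | 13.696 | 19 |
| 4 | 0.134 | 8.576 | 5 |
| 5 | 0.067 | 4.288 | 2 |
| 6 | 0.028 | 1.792 | 2 |
| 7 | 0.010 | 0.640 | 3 |
| 8 | 0.003 | 0.192 | 0 |
| 9 | 0.001 | 0.064 | 0 |
| 10 | 0.000 | 0.000 | 0 |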

We see that the expected numbers of matches from the Poisson distribution are reasonably close to the observed numbers.

The Poisson distribution describes this process well. Similarly, you can use it to predict the number of goals per match at the 2022 World Cup.

## Poisson distribution formula

If the random variable X follows the Poisson distribution with an average of λ events per fixed interval, the probability of getting exactly k events in this interval is given by:

$$f(k,\lambda) = P(k \text{ events in the interval}) = \frac{\lambda^{k}\, e^{-\lambda}}{k!}$$

where:

f(k,λ) is the probability of k events per fixed interval.

λ is the average number of events per fixed interval.

e is a mathematical constant approximately equal to 2.71828.

k! is the factorial of k: k! = k × (k − 1) × (k − 2) × … × 1, with 0! = 1 by definition.
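The formula translates almost word for word into code. Below is a minimal Python sketch using only the standard library. `poisson_pmf` is a hypothetical name chosen for this illustration. The two prints check that the probabilities over all counts add up to essentially 1 and that the function reproduces the 0.12511 value for 10 events when λ = 10.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """f(k, λ) = λ^k · e^(-λ) / k!: the probability of exactly k events
    in an interval whose average number of events is lam."""
    if k < 0:
        return 0.0                      # counts are never negative
    return lam**k * math.exp(-lam) / math.factorial(k)

# Probabilities over all k sum to 1; k = 0..99 is plenty when lam = 10.
print(sum(poisson_pmf(k, 10) for k in range(100)))
print(poisson_pmf(10, 10))
```

If SciPy is installed, `scipy.stats.poisson.pmf(k, mu)` computes the same probability.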

## How to do the Poisson distribution?

To calculate Poisson probabilities for the number of events in a fixed interval, we only need the average number of events per interval (λ).

### – Example 1

Data from a certain call center shows a historical average of 10 calls received per hour. Assuming that this process follows the Poisson distribution, what is the probability that the call center will receive 0, 10, 20, or 30 calls per hour?

1. Construct a table for the different numbers of events:

| calls |
|---|
| 0 |
| 10 |
| 20 |
| 30 |

2. Add another column named “average^calls” for the λ^k term, where λ is the average number of events (10) and k = 0, 10, 20, 30.

| calls | average^calls |
|---|---|
| 0 | 1e+00 |
| 10 | 1e+10 |
| 20 | 1e+20 |
| 30 | 1e+30 |

The first value is 10^0 = 1.

The second value is 10^10 = 1 X 10^10, written 1e+10 in scientific notation.

The third value is 10^20 = 1 X 10^20, written 1e+20 in scientific notation.

The fourth value is 10^30 = 1 X 10^30, written 1e+30 in scientific notation.

3. Add another column named “multiplied average^calls” by multiplying each average^calls value by e^(-λ) = 2.71828^-10.

| calls | average^calls | multiplied average^calls |
|---|---|---|
| 0 | 1e+00 | 4.540024e-05 |
| 10 | 1e+10 | 4.540024e+05 |
| 20 | 1e+20 | 4.540024e+15 |
| 30 | 1e+30 | 4.540024e+25 |

4. Add another column named “probability” by dividing each “multiplied average^calls” value by the factorial of the number of calls.

For 0 calls, the factorial = 1.

For 10 calls, the factorial = 10X9X8X7X6X5X4X3X2X1 = 3628800.

For 20 calls, the factorial = 20X19X18X17X16X15X14X13X12X11X10X9X8X7X6X5X4X3X2X1 = 2.432902e+18, and so on.

| calls | average^calls | multiplied average^calls | probability |
|---|---|---|---|
| 0 | 1e+00 | 4.540024e-05 | 0.00005 |
| 10 | 1e+10 | 4.540024e+05 | 0.12511 |
| 20 | 1e+20 | 4.540024e+15 | 0.00187 |
| 30 | 1e+30 | 4.540024e+25 | 0.00000 |

5. With similar calculations, we can calculate the probability of the different number of calls per hour, from 0 to 30, as we see in the following table and plot:

| calls | probability |
|---|---|
| 0 | 0.00005 |
| 1 | 0.00045 |
| 2 | 0.00227 |
| 3 | 0.00757 |
| 4 | 0.01892 |
| 5 | 0.03783 |
| 6 | 0.06306 |
| 7 | 0.09008 |
| 8 | 0.11260 |
| 9 | 0.12511 |
| 10 | 0.12511 |
| 11 | 0.11374 |
| 12 | 0.09478 |
| 13 | 0.07291 |
| 14 | 0.05208 |
| 15 | 0.03472 |
| 16 | 0.02170 |
| 17 | 0.01276 |
| 18 | 0.00709 |
| 19 | 0.00373 |
| 20 | 0.00187 |
| 21 | 0.00089 |
| 22 | 0.00040 |
| 23 | 0.00018 |
| 24 | 0.00007 |
| 25 | 0.00003 |
| 26 | 0.00001 |
| 27 | 0.00000 |
| 28 | 0.00000 |
| 29 | 0.00000 |
| 30 | 0.00000 |

The probability of zero calls per hour = 0.00005 or 0.005%.

The probability of 10 calls per hour = 0.12511 or 12.511%.

The probability of 20 calls per hour = 0.00187 or 0.187%.

The probability of 30 calls per hour = 0%.

We see that 9 and 10 calls share the highest probability, and as we move away from them, the probability fades away.

We can use these probabilities to calculate how many hours per day are expected to receive each number of calls.

We multiply each probability by 24 as the day contains 24 hours.

| calls | probability | hours/day |
|---|---|---|
| 0 | 0.00005 | 0.00 |
| 1 | 0.00045 | 0.01 |
| 2 | 0.00227 | 0.05 |
| 3 | 0.00757 | 0.18 |
| 4 | 0.01892 | 0.45 |
| 5 | 0.03783 | 0.91 |
| 6 | 0.06306 | 1.51 |
| 7 | 0.09008 | 2.16 |
| 8 | 0.11260 | 2.70 |
| 9 | 0.12511 | 3.00 |
| 10 | 0.12511 | 3.00 |
| 11 | 0.11374 | 2.73 |
| 12 | 0.09478 | 2.27 |
| 13 | 0.07291 | 1.75 |
| 14 | 0.05208 | 1.25 |
| 15 | 0.03472 | 0.83 |
| 16 | 0.02170 | 0.52 |
| 17 | 0.01276 | 0.31 |
| 18 | 0.00709 | 0.17 |
| 19 | 0.00373 | 0.09 |
| 20 | 0.00187 | 0.04 |
| 21 | 0.00089 | 0.02 |
| 22 | 0.00040 | 0.01 |
| 23 | 0.00018 | 0.00 |
| 24 | 0.00007 | 0.00 |
| 25 | 0.00003 | 0.00 |
| 26 | 0.00001 | 0.00 |
| 27 | 0.00000 | 0.00 |
| 28 | 0.00000 | 0.00 |
| 29 | 0.00000 | 0.00 |
| 30 | 0.00000 | 0.00 |

We expect about 3 hours of the day to receive 10 calls per hour.
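The five steps above map directly onto a short loop. Below is a minimal Python sketch, assuming λ = 10 calls per hour as in the example. It uses `math.exp` (the full-precision value of e) instead of 2.71828. As a result, the “multiplied” column shows 4.539993e-05 rather than 4.540024e-05 for 0 calls, while the rounded probabilities should match the table.

```python
import math

lam = 10                     # average calls per hour (from the example)
e_neg_lam = math.exp(-lam)   # e^(-λ), using the full-precision value of e
hours_per_day = 24

probabilities = {}
print("calls  average^calls    multiplied  probability  hours/day")
for k in range(31):
    power = float(lam) ** k                        # step 2: λ^k
    multiplied = power * e_neg_lam                 # step 3: λ^k · e^(-λ)
    probability = multiplied / math.factorial(k)   # step 4: divide by k!
    probabilities[k] = probability
    # step 5 plus the hours/day column (probability × 24)
    print(f"{k:>5}  {power:>13.0e}  {multiplied:>12.6e}  "
          f"{probability:>11.5f}  {probability * hours_per_day:>9.2f}")
```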

### – Example 2

In the following table and plot, we use the Poisson distribution to calculate the probability of each number of calls per hour, from 0 to 30, when the average is 2 calls/hour, 10 calls/hour, or 20 calls/hour:

| calls | 10 calls/hour | 2 calls/hour | 20 calls/hour |
|---|---|---|---|
| 0 | 0.00005 | 0.13534 | 0.00000 |
| 1 | 0.00045 | 0.27067 | 0.00000 |
| 2 | 0.00227 | 0.27067 | 0.00000 |
| 3 | 0.00757 | 0.18045 | 0.00000 |
| 4 | 0.01892 | 0.09022 | 0.00001 |
| 5 | 0.03783 | 0.03609 | 0.00005 |
| 6 | 0.06306 | 0.01203 | 0.00018 |
| 7 | 0.09008 | 0.00344 | 0.00052 |
| 8 | 0.11260 | 0.00086 | 0.00131 |
| 9 | 0.12511 | 0.00019 | 0.00291 |
| 10 | 0.12511 | 0.00004 | 0.00582 |
| 11 | 0.11374 | 0.00001 | 0.01058 |
| 12 | 0.09478 | 0.00000 | 0.01763 |
| 13 | 0.07291 | 0.00000 | 0.02712 |
| 14 | 0.05208 | 0.00000 | 0.03874 |
| 15 | 0.03472 | 0.00000 | 0.05165 |
| 16 | 0.02170 | 0.00000 | 0.06456 |
| 17 | 0.01276 | 0.00000 | 0.07595 |
| 18 | 0.00709 | 0.00000 | 0.08439 |
| 19 | 0.00373 | 0.00000 | 0.08884 |
| 20 | 0.00187 | 0.00000 | 0.08884 |
| 21 | 0.00089 | 0.00000 | 0.08461 |
| 22 | 0.00040 | 0.00000 | 0.07691 |
| 23 | 0.00018 | 0.00000 | 0.06688 |
| 24 | 0.00007 | 0.00000 | 0.05573 |
| 25 | 0.00003 | 0.00000 | 0.04459 |
| 26 | 0.00001 | 0.00000 | 0.03430 |
| 27 | 0.00000 | 0.00000 | 0.02541 |
| 28 | 0.00000 | 0.00000 | 0.01815 |
| 29 | 0.00000 | 0.00000 | 0.01252 |
| 30 | 0.00000 | 0.00000 | 0.00834 |

Each curve peaks at its average value. Because these averages are whole numbers, the count just below the average is equally likely; for example, 19 and 20 calls tie when the average is 20 calls/hour.

The curve for the average 2 calls/hour (green curve) has a peak at 2.

The curve for the average 10 calls/hour (red curve) has a peak at 10.

The curve for the average 20 calls/hour (blue curve) has a peak at 20.

We can use these probabilities to calculate how many hours per day are expected to receive these calls when the average is 2 calls/hour, 10 calls/hour, or 20 calls/hour.

We multiply each probability by 24 as the day contains 24 hours. For example:

• We expect about 2 hours of the day to receive 4 calls per hour when the average is 2 calls/hour.
• We expect only about half an hour of the day to receive 4 calls per hour when the average is 10 calls/hour.
• We expect essentially no time during the day with 4 calls per hour when the average is 20 calls/hour.
• We expect essentially no time during the day with 10 calls per hour when the average is 2 calls/hour.
• We expect about 3 hours of the day to receive 10 calls per hour when the average is 10 calls/hour.
• We expect well under an hour of the day (about 0.14 hours) to receive 10 calls per hour when the average is 20 calls/hour.
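The bullet points can be checked with a few lines of Python. This is a minimal sketch that multiplies each Poisson probability by 24 hours for the three averages. For an average of 20 calls/hour, 10 calls is expected in about 0.14 hours (under ten minutes) per day.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

hours_per_day = 24
expected_hours = {}
for lam in (2, 10, 20):                 # average calls per hour
    for calls in (4, 10):
        hours = hours_per_day * poisson_pmf(calls, lam)
        expected_hours[(lam, calls)] = hours
        print(f"average {lam:>2} calls/hour: {calls:>2} calls in about {hours:.2f} hours/day")
```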

### – Example 3

When cells are exposed to cosmic rays for a week, the average number of mutated cells is 2.1; when they are exposed to X-rays for a week, the average is 1.4.

Assuming that this process follows the Poisson distribution, what is the probability that 0, 1, 2, 3, 4, or 5 cells will be mutated this week by each type of ray?

For cosmic rays:

1. Construct a table for the different number of events (mutated cells):

| mutated cells |
|---|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |

2. Add another column named “average^cells” for the λ^k term. λ is the average events number = 2.1 and k = 0,1,2,3,4,5.

| mutated cells | average^cells |
|---|---|
| 0 | 1.00 |
| 1 | 2.10 |
| 2 | 4.41 |
| 3 | 9.26 |
| 4 | 19.45 |
| 5 | 40.84 |

The first value is 2.1^0 = 1.

The second value is 2.1^1 = 2.1.

The third value is 2.1^2 = 4.41, and so on.

3. Add another column named “multiplied average^cells” for the multiplication of average^cells by e^(-λ) = 2.71828^-2.1.

| mutated cells | average^cells | multiplied average^cells |
|---|---|---|
| 0 | 1.00 | 0.1224566 |
| 1 | 2.10 | 0.2571589 |
| 2 | 4.41 | 0.5400336 |
| 3 | 9.26 | 1.1339481 |
| 4 | 19.45 | 2.3817809 |
| 5 | 40.84 | 5.0011276 |

4. Add another column named “probability” by dividing each value of the “multiplied average^cells” by factorial cells.

For 0 cells, the factorial = 1.

For 1 cell, the factorial = 1.

For 2 cells, the factorial = 2X1 = 2.

For 3 cells, the factorial = 3X2X1 = 6, and so on.

| mutated cells | average^cells | multiplied average^cells | probability |
|---|---|---|---|
| 0 | 1.00 | 0.1224566 | 0.12246 |
| 1 | 2.10 | 0.2571589 | 0.25716 |
| 2 | 4.41 | 0.5400336 | 0.27002 |
| 3 | 9.26 | 1.1339481 | 0.18899 |
| 4 | 19.45 | 2.3817809 | 0.09924 |
| 5 | 40.84 | 5.0011276 | 0.04168 |

5. We can plot the probabilities for the different number of mutated cells, from 0 to 5. The curve peak is at 2 mutated cells.

For X-rays:

1. Construct a table for the different number of events (mutated cells):

| mutated cells |
|---|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |

2. Add another column named “average^cells” for the λ^k term. λ is the average events number = 1.4 and k = 0,1,2,3,4,5.

| mutated cells | average^cells |
|---|---|
| 0 | 1.00 |
| 1 | 1.40 |
| 2 | 1.96 |
| 3 | 2.74 |
| 4 | 3.84 |
| 5 | 5.38 |

The first value is 1.4^0 = 1.

The second value is 1.4^1 = 1.4.

The third value is 1.4^2 = 1.96, and so on.

3. Add another column named “multiplied average^cells” for the multiplication of average^cells by e^(-λ) = 2.71828^-1.4.

| mutated cells | average^cells | multiplied average^cells |
|---|---|---|
| 0 | 1.00 | 0.2465972 |
| 1 | 1.40 | 0.3452361 |
| 2 | 1.96 | 0.4833305 |
| 3 | 2.74 | 0.6756763 |
| 4 | 3.84 | 0.9469332 |
| 5 | 5.38 | 1.3266929 |

4. Add another column named “probability” by dividing each value of the “multiplied average^cells” by factorial cells.

For 0 cells, the factorial = 1.

For 1 cell, the factorial = 1.

For 2 cells, the factorial = 2X1 = 2.

For 3 cells, the factorial = 3X2X1 = 6, and so on.

| mutated cells | average^cells | multiplied average^cells | probability |
|---|---|---|---|
| 0 | 1.00 | 0.2465972 | 0.24660 |
| 1 | 1.40 | 0.3452361 | 0.34524 |
| 2 | 1.96 | 0.4833305 | 0.24167 |
| 3 | 2.74 | 0.6756763 | 0.11261 |
| 4 | 3.84 | 0.9469332 | 0.03946 |
| 5 | 5.38 | 1.3266929 | 0.01106 |

5. We can plot the probabilities for the different number of mutated cells, from 0 to 5. The curve peak is at 1 mutated cell.
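The cosmic-ray and X-ray calculations follow the same steps, so one function can handle both. This is a minimal Python sketch. It uses exact powers (for example 2.1^3 = 9.261 rather than the rounded 9.26) and the full-precision e, so the last printed digit may differ slightly from the tables. For example, it gives about 0.18901 instead of 0.18899 for 3 cells under cosmic rays.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Average number of mutated cells per week for each type of ray
rates = {"cosmic rays": 2.1, "X-rays": 1.4}
table = {name: [poisson_pmf(k, lam) for k in range(6)] for name, lam in rates.items()}

print("mutated cells  cosmic rays  X-rays")
for k in range(6):
    print(f"{k:>13}  {table['cosmic rays'][k]:>11.5f}  {table['X-rays'][k]:>6.5f}")

# The curve peak: the count with the largest probability for each ray
for name, probs in table.items():
    print(f"{name}: peak at {probs.index(max(probs))} mutated cell(s)")
```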

### Practice questions

1. In the following plots, we show the probability of the different number of mutated cells when we subject them to different types of rays for a week.

Which are the most dangerous rays?

2. In the following plots, we show the probability of the different numbers of rejected tablets per hour from 3 different machines.

Which is the best machine?

3. The bacterial count average for a certain product is 10 CFU/ml (colony-forming units/ml). Assuming that the Poisson distribution conditions are met, what is the probability of finding less than 10 CFU/ml?

4. William Feller (1968) modeled Nazi bombing raids on London during World War II using a Poisson distribution. The city was divided into 576 small areas of 1/4 km² each. There were a total of 537 bomb hits, so the average number of hits per area was 537/576 = 0.9323.

How many areas do we expect to be hit by 1 or 2 bombs?

5. The average count of Zanthoxylum panamense trees in 1-hectare square plots on Barro Colorado Island is 1.34, and the counts follow a Poisson distribution. The total area of this forest is 50 hectares.

How many hectares do we expect to have no trees of this species?

### Answers

1. The most dangerous rays are ray2 because they have a higher probability of producing more mutated cells.

For example, the probability of 3 mutated cells in a week is nearly 0.1 or 10% for ray2, while for the other rays it is nearly zero.

2. The best machine is machine1 because it has the lowest probability of producing many rejected tablets.

For example, the probability of 4 rejected tablets in an hour (solid vertical line) is higher for machine2 than for machine3, and higher for machine3 than for machine1.

3. The probability of finding less than 10 CFU/ml = probability of 9 CFU/ml + probability of 8 CFU/ml + probability of 7 CFU/ml + … + probability of 0 CFU/ml.

• Construct a table for the different numbers of events (CFU/ml) and add another column named “average^cfu/ml” for the λ^k term, where λ is the average bacterial count = 10 CFU/ml and k = 0, 1, 2, …, 9.

| CFU/ml | average^cfu/ml |
|---|---|
| 0 | 1e+00 |
| 1 | 1e+01 |
| 2 | 1e+02 |
| 3 | 1e+03 |
| 4 | 1e+04 |
| 5 | 1e+05 |
| 6 | 1e+06 |
| 7 | 1e+07 |
| 8 | 1e+08 |
| 9 | 1e+09 |

• Add another column named “multiplied average^cfu/ml” by multiplying each average^cfu/ml value by e^(-λ) = 2.71828^-10.

| CFU/ml | average^cfu/ml | multiplied average^cfu/ml |
|---|---|---|
| 0 | 1e+00 | 4.540024e-05 |
| 1 | 1e+01 | 4.540024e-04 |
| 2 | 1e+02 | 4.540024e-03 |
| 3 | 1e+03 | 4.540024e-02 |
| 4 | 1e+04 | 4.540024e-01 |
| 5 | 1e+05 | 4.540024e+00 |
| 6 | 1e+06 | 4.540024e+01 |
| 7 | 1e+07 | 4.540024e+02 |
| 8 | 1e+08 | 4.540024e+03 |
| 9 | 1e+09 | 4.540024e+04 |

• Add another column named “probability” by dividing each “multiplied average^cfu/ml” value by the factorial of the CFU/ml count.

For 0 CFU/ml, the factorial = 1.

For 1 CFU/ml, the factorial = 1.

For 2 CFU/ml, the factorial = 2X1 = 2, and so on.

| CFU/ml | average^cfu/ml | multiplied average^cfu/ml | probability |
|---|---|---|---|
| 0 | 1e+00 | 4.540024e-05 | 0.00005 |
| 1 | 1e+01 | 4.540024e-04 | 0.00045 |
| 2 | 1e+02 | 4.540024e-03 | 0.00227 |
| 3 | 1e+03 | 4.540024e-02 | 0.00757 |
| 4 | 1e+04 | 4.540024e-01 | 0.01892 |
| 5 | 1e+05 | 4.540024e+00 | 0.03783 |
| 6 | 1e+06 | 4.540024e+01 | 0.06306 |
| 7 | 1e+07 | 4.540024e+02 | 0.09008 |
| 8 | 1e+08 | 4.540024e+03 | 0.11260 |
| 9 | 1e+09 | 4.540024e+04 | 0.12511 |
• We sum the probability column to get the probability of finding less than 10 CFU/ml.

0.00005+ 0.00045+ 0.00227+ 0.00757+ 0.01892+ 0.03783+ 0.06306+ 0.09008+ 0.11260+ 0.12511 = 0.45794 or 45.8%.

• We can plot the probabilities for the different numbers of CFU/ml, from 0 to 9.

4. We calculate the probability of an area being hit by 1 or 2 bombs:

• Construct a table for the different numbers of events:

| hits |
|---|
| 1 |
| 2 |

• Add another column named “average^hits” for the λ^k term, where λ is the average number of events = 0.9323 and k = 1 or 2.

| hits | average^hits |
|---|---|
| 1 | 0.9323000 |
| 2 | 0.8691833 |

The first value is 0.9323^1 = 0.9323.

The second value is 0.9323^2 = 0.8691833.

• Add another column named “multiplied average^hits” by multiplying each average^hits value by e^(-λ) = 2.71828^-0.9323.

| hits | average^hits | multiplied average^hits |
|---|---|---|
| 1 | 0.9323000 | 0.3669976 |
| 2 | 0.8691833 | 0.3421519 |

• Add another column named “probability” by dividing each “multiplied average^hits” value by the factorial of the number of hits.

For 1 hit, the factorial = 1.

For 2 hits, the factorial = 2X1 = 2.

| hits | average^hits | multiplied average^hits | probability |
|---|---|---|---|
| 1 | 0.9323000 | 0.3669976 | 0.36700 |
| 2 | 0.8691833 | 0.3421519 | 0.17108 |

The probability of an area being hit by exactly 1 bomb = 0.367 or 36.7%.

The probability of an area being hit by exactly 2 bombs = 0.17108 or 17.1%.

The probability of an area being hit by 1 or 2 bombs = 0.367 + 0.17108 = 0.538 or 53.8%.

• We can use these probabilities to calculate the number of areas that are expected to receive these hits.

We multiply each probability by 576 as we have 576 small areas of London.

| hits | average^hits | multiplied average^hits | probability | expected areas |
|---|---|---|---|---|
| 1 | 0.9323000 | 0.3669976 | 0.36700 | 211.39 |
| 2 | 0.8691833 | 0.3421519 | 0.17108 | 98.54 |

Out of the 576 areas of London, we expect about 211 areas to receive 1 bomb and about 99 areas to receive 2 bombs.
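The hand calculation for the bombing data can be checked in Python. This is a minimal sketch that uses the unrounded average 537/576 rather than 0.9323. The results agree with the table to the digits shown (about 211.4 and 98.5 areas).

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

areas = 576              # small areas of 1/4 km² each
hits = 537               # total bomb hits
lam = hits / areas       # about 0.9323 hits per area

for k in (1, 2):
    p = poisson_pmf(k, lam)
    print(f"{k} hit(s): probability {p:.5f}, expected areas {areas * p:.1f}")

p_one_or_two = poisson_pmf(1, lam) + poisson_pmf(2, lam)
print(f"1 or 2 hits: probability {p_one_or_two:.3f}")
```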

5. We calculate the probability that a hectare contains zero trees of this species:

• Calculate “average^trees” for the λ^k term. λ is the average events number = 1.34 and k = 0.

λ^k = 1.34^0 = 1.

• Multiply the value you get by e^(-λ) = 2.71828^-1.34.

1 X 2.71828^-1.34 = 0.2618459.

• Calculate the probability by dividing the value from the previous step by the factorial of the number of trees.

For 0 trees, the factorial = 1.

probability = 0.2618459/1 = 0.2618459.

The probability of seeing no trees of this species = 0.262 or 26.2%.

• We can use this probability to calculate the number of hectares expected to contain no trees of this species.

We multiply the probability by 50 because this forest has 50 hectares.

Expected hectares = 50 X 0.2618459 = 13.0923.

Out of the 50 hectares of this forest, we expect about 13 hectares to contain no trees of this species.
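Answers 3 and 5 can be checked the same way. This is a minimal Python sketch. For answer 3 it sums the Poisson probabilities for 0 to 9 CFU/ml with an average of 10. For answer 5 it evaluates the zero-tree probability with an average of 1.34 and scales it to 50 hectares.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Answer 3: P(fewer than 10 CFU/ml) when the average is 10 CFU/ml
p_below_10 = sum(poisson_pmf(k, 10) for k in range(10))   # k = 0, 1, ..., 9
print(f"P(X < 10) = {p_below_10:.5f}")

# Answer 5: expected number of the 50 hectares with no Zanthoxylum panamense trees
p_zero = poisson_pmf(0, 1.34)
print(f"P(0 trees) = {p_zero:.4f}, expected hectares = {50 * p_zero:.1f}")
```

If SciPy is installed, `scipy.stats.poisson.cdf(9, 10)` gives the answer-3 probability directly as a cumulative probability.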