 # Logistic Equation – Explanation & Examples The definition of the logistic equation is:

“The logistic equation is a sigmoid function, which takes any real number and outputs a value between zero and certain positive number.”

In this topic, we will discuss the logistic equation from the following aspects:

1. What is the logistic equation?
2. Logistic equation formula.
3. How to solve the logistic equation?
4. Application of logistic function in ecology.
5. Application of logistic function in statistics.
6. Practice questions.

## 1. What is the logistic equation?

The logistic equation is a sigmoid function, which takes any real number from negative infinity -∞ to positive infinity +∞ and outputs a value between zero and a certain positive number.

### Logistic equation formula

The equation is :

f(x)=L/(1+e^(-k(x-x_0)) )

where:

f(x) is the logistic equation or function.

L is the logistic function or curve maximum value.

e is a mathematical constant approximately equal to 2.71828.

k is the logistic growth rate or steepness of the curve.

x_0 is the value of x at the sigmoid curve midpoint.

For values of x between -∞ to +∞, the logistic equation draws an S-curve with the curve f(x) approaching L as x approaches +∞ and approaching zero as x approaches -∞.

#### – Example 1

For the x values:

-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6.

Draw the logistic curve when L = 1, k = 1, and x_0=0.

1. plot a table of values.

 x -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6

2. Using the above formula, calculate the logistic function for each value.

 x f(x) -6 0.002472623 -5 0.006692851 -4 0.017986210 -3 0.047425873 -2 0.119202922 -1 0.268941421 0 0.500000000 1 0.731058579 2 0.880797078 3 0.952574127 4 0.982013790 5 0.993307149 6 0.997527377

3. Plot the x values on the x-axis and the logistic function value on the y-axis.

Connect the intersecting points with a line to draw the sigmoid curve. For comparison, we can add two other equations with the same parameters except that L = 5 and L =10 respectively.

We update the table.

 x f(x)_1 f(x)_5 f(x)_10 -6 0.002472623 0.01236312 0.02472623 -5 0.006692851 0.03346425 0.06692851 -4 0.017986210 0.08993105 0.17986210 -3 0.047425873 0.23712937 0.47425873 -2 0.119202922 0.59601461 1.19202922 -1 0.268941421 1.34470711 2.68941421 0 0.500000000 2.50000000 5.00000000 1 0.731058579 3.65529289 7.31058579 2 0.880797078 4.40398539 8.80797078 3 0.952574127 4.76287063 9.52574127 4 0.982013790 4.91006895 9.82013790 5 0.993307149 4.96653575 9.93307149 6 0.997527377 4.98763688 9.97527377

Where f(x)_1 is the logistic function with L = 1, f(x)_5 is the logistic function with L = 5, and f(x)_10 is the logistic function with L = 10.

and plot the 3 different sigmoid curves. We see that the maximum of each logistic function is its L value.

#### – Example 2

For the x values:

-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6.

Draw the logistic curve when L = 1, k = 1, 2, or 3, and x_0=0.

1. plot a table of values.

 x -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6

2. Using the above formula, calculate the logistic function for each value and each k value.

We add 3 other columns, one for each logistic function.

 x f(x)_1 f(x)_2 f(x)_3 -6 0.002472623 6.144175e-06 1.522998e-08 -5 0.006692851 4.539787e-05 3.059022e-07 -4 0.017986210 3.353501e-04 6.144175e-06 -3 0.047425873 2.472623e-03 1.233946e-04 -2 0.119202922 1.798621e-02 2.472623e-03 -1 0.268941421 1.192029e-01 4.742587e-02 0 0.500000000 5.000000e-01 5.000000e-01 1 0.731058579 8.807971e-01 9.525741e-01 2 0.880797078 9.820138e-01 9.975274e-01 3 0.952574127 9.975274e-01 9.998766e-01 4 0.982013790 9.996646e-01 9.999939e-01 5 0.993307149 9.999546e-01 9.999997e-01 6 0.997527377 9.999939e-01 1.000000e+00

Where f(x)_1 is the logistic function with k = 1, f(x)_2 is the logistic function with L = 2, and f(x)_3 is the logistic function with L = 3.

and plot the 3 different sigmoid curves. With increasing the k value, the sigmoid curve becomes steeper in its growth.

## 2. How to solve the logistic equation?

The logistic function finds applications in many fields, including ecology, chemistry, economics, sociology, political science, linguistics, and statistics.

We will focus on the application and the solving of logistic function in ecology and statistics.

### – Application of logistic function in ecology

A typical application of the logistic equation is to model population growth, where the rate of reproduction is proportional to both the existing population and the amount of available resources.

The logistic differential equation for the population growth is:

dP/dt=rP(1-P/K)

Where:

P is the population size.

t is the time. The units of time can be hours, days, weeks, months, or years.

dP/dt is the instantaneous rate of change of the population as a function of time.
r is the growth rate.

K is the carrying capacity. The carrying capacity of an organism in a given environment is defined to be the maximum population of that organism that the environment can sustain indefinitely. It has the same unit as the population size.

The population growth rate changes over time. Biologists have found that in many biological systems, the population grows until a certain steady-state population is reached.

The concept of carrying capacity allows for the possibility that in a given area, only a certain number of a given organism or animal can thrive without running into resource issues.

Suppose that the initial population is small relative to the carrying capacity. Then P/K is small, possibly close to zero. Thus, the quantity in parentheses on the right-hand side of the logistic equation is close to 1, and the right-hand side of this equation is close to rP. The value of the rate r represents the proportional increase of the population P in one unit of time. So, the population grows rapidly.

However, as the population grows, some members of the population interfere with each other by competing for some critical resource, such as food or living space. The ratio P/K also grows, because K is constant. If the population remains below the carrying capacity, then P/K is less than 1, so 1-(P/K)>0 but less than 1. Therefore the growth rate decreases as a result.

If P=K then the right-hand side is equal to zero, and the population does not change (this is called maturity of the population).

The solution to the equation, with P_0 being the initial population is:

P(t)=K/(1+((K-P_0)/P_0 )e^(-rt) )

Note that K is the limiting value of P:

If the P_0 < K, then population grows till reaching K.

If P_0 >K, then population decreases till approach K.

#### – Example 1

A population of rabbits in a meadow is observed to be 200 rabbits at time t=0. Using an initial population of 200 and a growth rate of 0.04 per month, with a carrying capacity of 750 rabbits.

Draw the logistic curve of growth for this population.

1. The growth rate is per month so the x-axis will be in months. The 0 value will represent the current month, and 1 is the next month, and so on.

We can also plot negative values on the x-axis to represent the previous month and so on.

In a table, we write the next 12 values (next year) and the previous 12 values (previous year).

So the x values will range from -12 to 12.

In a table:

 t_months -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 11 12

2. We know that the population at the current month or t = 0 is 200. We use the above equation to know the population size for all these months:

P(t)=K/(1+((K-P_0)/P_0 )e^(-rt) )=750/(1+((750-200)/200)e^(-0.04t) )

For example, at time = 0:

The population size at time = 0 = P(0) = 750/1+(750-200/200)Xe^(-0.04X0) = 750/(1+((750-200)/200))= 200.

Using the above formula, calculate the population size for each time value and update the table.

 t_months population -12 137.7612 -11 142.3165 -10 146.9863 -9 151.7710 -8 156.6710 -7 161.6865 -6 166.8174 -5 172.0634 -4 177.4243 -3 182.8993 -2 188.4877 -1 194.1884 0 200.0000 1 205.9211 2 211.9500 3 218.0847 4 224.3229 5 230.6621 6 237.0997 7 243.6326 8 250.2578 9 256.9716 10 263.7706 11 270.6506 12 277.6077

3. Plot the t values on the x-axis and the population on the y-axis.

Connect the intersecting points with a line to draw the sigmoid curve. • We see that:
The expected number of rabbits after 12 months or 1 year= P(12) = 278 rabbits approximately.
• It is a part of the Sigmoid curve and so not perfectly S-shaped.

We can see the full sigmoid curve if we extend the time boundaries to between -100 and 100. ### – Application of logistic function in statistics

The logistic function can be used in logistic regression.

Logistic regression is used to model the probability of a dependent variable with two possible values, such as pass/fail given a set of one or more independent variables (predictors).

If we assume that the first level value is “fail” and the second level value is “pass” of the dependent variable.

The probability of the second level value (“pass”), p(Y = “pass”) can vary between 0 (we are certain it is a “fail” event) and 1 (certainly a “pass” event).

The formula of the logistic function that model the probability is:

p(Y)=1/(1+e^(-(β_0+β_1 x)) )

Where:

p(Y) is the probability of the second level value.

x is the independent variable value.

The quantity β_0+β_1 x is the log(odds).

We estimate the values of β_0 and β_1 such that plugging these estimates into the model for p(Y) yields a probability close to 1 for all data that have the second level value (“pass”) and a probability close to 0 for all other data that have the first level value (“fail”).

The odds of an event is the probability of an event occurring divided by the probability of not occurring.

odds = p/(1-p).

While the probability of an event can range between 0 and 1, the odds of an event can range between 0 and +∞.

When p = 0, odds = 0.

when p = 0.25, odds = 0.25/0.75 = 0.33.

when p = 0.5, odds = 0.5/0.5 = 1.

when p = 0.75, odds = 0.75*0.25 = 3.

when p = 0.95, odds = 0.95/0.05 = 19.

when p = 0.99, odds = 0.99/0.01 = 99.

when p = 1, odds = 1/0 = +∞.

The following plot plots the different probabilities on the x-axis and the resulting odds on the y-axis. If we plot the log(odds) on the y-axis, the log(odds) have a range from -∞ to +∞. Because the log(odds) range from -∞ to +∞, they can now be put in the linear equation for a single predictor (x):

log(odds) = β_0+β_1 x

where:

β_0 is the baseline log(odds) when the predictor(x) is zero.

β_1 is the amount of log(odds) increase for each one-unit increase in x.

If β_1 is positive then increasing x will be associated with increasing the log(odds) or p(Y), and if β_1 is negative then increasing x will be associated with decreasing the log(odds) or p(Y).

For different n predictors,x_1,x_2,…..x_n:

log(odds) = β_0+β_1 x+β_2 x_2+…….+β_n x_n

and the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x+β_2 x_2+…….+β_n x_n)) )

p(Y)=1/(1+e^(-(log(odds))) )

p(Y)=odds/(1+odds)

#### – Example 1 for one predictor

The following table shows the data of 25 students illustrating the number of hours per week each student spent watching TV (tv column) and whether they passed or failed a certain exam (pass column).

The pass column has 2 values: 1 for passing and 0 for failure.

 student tv pass 1 0 1 2 0 1 3 0 0 4 5 1 5 7 1 6 10 1 7 11 1 8 12 1 9 14 1 10 15 0 11 15 0 12 16 1 13 17 0 14 19 0 15 20 1 16 20 0 17 20 0 18 23 0 19 24 0 20 25 0 21 27 0 22 28 0 23 30 0 24 30 0 25 32 1

We can see the relation between TV hours and pass in the following plot: We see that increasing the number of hours watching TV is associated with more failures or less passing.

We estimate a logistic regression model for this data, assuming that pass or 1 is the second level value, and found that:

β_0 = 1.90717

β_1 for TV hours (predictor) = -0.13078.

Calculate the probability of passing for all students in this data.

1. Estimate the log(odds) of passing for each student.

log(odds) = β_0+β_1 x = 1.90717 + -0.13078 X TV hours.

The β_1 for TV hours is negative, so increasing the number of hours watching TV will be associated with decreasing the log(odds) or probability of passing.

Update the table with a column for log(odds) of passing.

 student tv pass log(odds) 1 0 1 1.90717 2 0 1 1.90717 3 0 0 1.90717 4 5 1 1.25327 5 7 1 0.99171 6 10 1 0.59937 7 11 1 0.46859 8 12 1 0.33781 9 14 1 0.07625 10 15 0 -0.05453 11 15 0 -0.05453 12 16 1 -0.18531 13 17 0 -0.31609 14 19 0 -0.57765 15 20 1 -0.70843 16 20 0 -0.70843 17 20 0 -0.70843 18 23 0 -1.10077 19 24 0 -1.23155 20 25 0 -1.36233 21 27 0 -1.62389 22 28 0 -1.75467 23 30 0 -2.01623 24 30 0 -2.01623 25 32 1 -2.27779

2., Calculate the probability of passing for each student using the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x)) )=1/(1+e^(-(1.90717+-0.13078x)) )=1/(1+e^(-(log(odds))) )

Update the table with a column for probability of passing for each student.

 student tv pass log(odds) p(pass) 1 0 1 1.90717 0.87070088 2 0 1 1.90717 0.87070088 3 0 0 1.90717 0.87070088 4 5 1 1.25327 0.77786540 5 7 1 0.99171 0.72942555 6 10 1 0.59937 0.64551216 7 11 1 0.46859 0.61504997 8 12 1 0.33781 0.58365845 9 14 1 0.07625 0.51905327 10 15 0 -0.05453 0.48637088 11 15 0 -0.05453 0.48637088 12 16 1 -0.18531 0.45380462 13 17 0 -0.31609 0.42162894 14 19 0 -0.57765 0.35947351 15 20 1 -0.70843 0.32994585 16 20 0 -0.70843 0.32994585 17 20 0 -0.70843 0.32994585 18 23 0 -1.10077 0.24959565 19 24 0 -1.23155 0.22591025 20 25 0 -1.36233 0.20386188 21 27 0 -1.62389 0.16466909 22 28 0 -1.75467 0.14745914 23 30 0 -2.01623 0.11750938 24 30 0 -2.01623 0.11750938 25 32 1 -2.27779 0.09297916

We see that increasing the number of TV hours is associated with decreased probability of passing this exam.

We can plot the tv hours on the x-axis and the probability on the y-axis to see the sigmoid curve of the logistic equation.

The passing and failing students are plotted as black points. We see that the probability of passing the exam decreases with increasing the number of TV hours.

#### – Example 2 for two predictor

The following table shows the data of 100 persons from a certain survey.

The table shows the age of each person (age, in years), body mass index (bmi), and whether they had hypertension (hypertension, No or Yes).

 age bmi hypertension 67 29.02 Yes 63 33.86 Yes 75 34.03 Yes 59 27.73 Yes 68 28.60 Yes 70 29.27 Yes 76 30.08 Yes 74 27.46 No 69 34.31 Yes 70 36.10 Yes 68 24.13 No 67 31.44 Yes 62 34.25 Yes 72 25.22 Yes 77 30.08 Yes 63 25.38 Yes 78 29.90 Yes 58 35.78 Yes 68 39.51 Yes 67 29.90 Yes 65 29.93 Yes 65 34.47 Yes 70 27.26 No 65 34.41 Yes 77 28.89 No 60 33.62 Yes 62 33.24 Yes 62 28.17 Yes 63 31.67 Yes 68 23.57 Yes 55 29.49 Yes 74 31.16 Yes 56 30.80 Yes 67 29.41 Yes 65 30.13 Yes 60 27.77 Yes 55 32.10 Yes 68 29.03 Yes 77 25.47 No 67 29.48 Yes 66 25.39 Yes 72 23.23 Yes 61 33.99 No 58 29.83 Yes 75 25.19 Yes 57 31.64 Yes 65 27.81 Yes 70 24.07 Yes 68 30.32 Yes 55 26.40 Yes 62 34.45 Yes 77 34.62 Yes 56 30.07 Yes 69 27.29 Yes 59 27.30 No 68 30.00 Yes 61 29.08 No 61 29.86 Yes 70 33.87 Yes 75 31.74 Yes 71 32.38 Yes 77 24.03 Yes 65 28.83 Yes 76 37.04 Yes 75 24.18 Yes 67 28.73 Yes 59 33.76 Yes 65 37.04 Yes 64 32.59 Yes 71 27.01 Yes 57 27.73 No 62 32.79 No 68 27.06 Yes 76 39.95 Yes 61 28.35 Yes 67 29.47 Yes 55 23.60 No 64 39.97 Yes 72 30.36 Yes 60 27.79 Yes 65 27.94 Yes 66 27.81 Yes 69 25.61 Yes 66 30.67 Yes 68 26.40 Yes 65 30.52 No 60 33.51 Yes 76 27.20 Yes 57 30.85 Yes 69 27.12 Yes 67 34.65 No 62 25.72 Yes 72 28.15 Yes 70 26.90 Yes 58 25.78 Yes 68 31.48 No 61 42.53 Yes 76 31.45 Yes 64 25.27 Yes 59 34.19 Yes

We can see the relation between age and hypertension in the following box plot: • We see that:
Persons with hypertension are plotted as blue dots while the normotensive persons are plotted as red dots.
• The median age for hypertensive persons (central line of the blue box) is higher than the median age for the normotensive persons (central line of the red box).
• This may indicate that increasing age is associated with an increased probability of hypertension.

We can also see the relation between bmi and hypertension in the following box plot: • We see that:
The median bmi for hypertensive persons (central line of the blue box) is higher than the median bmi for the normotensive persons (central line of the red box).
• This may indicate that increasing body mass index is associated with increased hypertension probability.

We estimate a logistic regression model for this data, assuming that “Yes” is the second level value, and found that:

β_0 = -2.74974.

β_1 for age (predictor1) = 0.02155.

β_2 for bmi (predictor2) = 0.10631.

We will calculate the probability of developing hypertension for each person in this data by following these steps:

1. Estimate the log(odds) of hypertension for each person.

log(odds) = β_0+β_1 x_1+β_2 x_2 = -2.74974 + (0.02155 X age) + (0.10631 X bmi).

The β_1 for age is positive, so increasing age will be associated with increasing the log(odds) or the probability of hypertension.

The β_2 for bmi is positive also, so increasing the body mass index will be associated with increasing the log(odds) or the probability of hypertension.

Update the table with a column for log(odds) of hypertension.

 age bmi hypertension log(odds) 67 29.02 Yes 1.779226 63 33.86 Yes 2.207567 75 34.03 Yes 2.484239 59 27.73 Yes 1.469686 68 28.60 Yes 1.756126 70 29.27 Yes 1.870454 76 30.08 Yes 2.085865 74 27.46 No 1.764233 69 34.31 Yes 2.384706 70 36.10 Yes 2.596551 68 24.13 No 1.280920 67 31.44 Yes 2.036496 62 34.25 Yes 2.227477 72 25.22 Yes 1.482998 77 30.08 Yes 2.107415 63 25.38 Yes 1.306058 78 29.90 Yes 2.109829 58 35.78 Yes 2.303932 68 39.51 Yes 2.915968 67 29.90 Yes 1.872779 65 29.93 Yes 1.832868 65 34.47 Yes 2.315516 70 27.26 No 1.656771 65 34.41 Yes 2.309137 77 28.89 No 1.980906 60 33.62 Yes 2.117402 62 33.24 Yes 2.120104 62 28.17 Yes 1.581113 63 31.67 Yes 1.974748 68 23.57 Yes 1.221387 55 29.49 Yes 1.570592 74 31.16 Yes 2.157580 56 30.80 Yes 1.731408 67 29.41 Yes 1.820687 65 30.13 Yes 1.854130 60 27.77 Yes 1.495489 55 32.10 Yes 1.848061 68 29.03 Yes 1.801839 77 25.47 No 1.617326 67 29.48 Yes 1.828129 66 25.39 Yes 1.371771 72 23.23 Yes 1.271441 61 33.99 No 2.178287 58 29.83 Yes 1.671387 75 25.19 Yes 1.544459 57 31.64 Yes 1.842258 65 27.81 Yes 1.607491 70 24.07 Yes 1.317642 68 30.32 Yes 1.938979 55 26.40 Yes 1.242094 62 34.45 Yes 2.248740 77 34.62 Yes 2.590062 56 30.07 Yes 1.653802 69 27.29 Yes 1.638410 59 27.30 No 1.423973 68 30.00 Yes 1.904960 61 29.08 No 1.656305 61 29.86 Yes 1.739227 70 33.87 Yes 2.359480 75 31.74 Yes 2.240789 71 32.38 Yes 2.222628 77 24.03 Yes 1.464239 65 28.83 Yes 1.715927 76 37.04 Yes 2.825782 75 24.18 Yes 1.437086 67 28.73 Yes 1.748396 59 33.76 Yes 2.110736 65 37.04 Yes 2.588732 64 32.59 Yes 2.094103 71 27.01 Yes 1.651743 57 27.73 No 1.426586 62 32.79 No 2.072265 68 27.06 Yes 1.592409 76 39.95 Yes 3.135145 61 28.35 Yes 1.578699 67 29.47 Yes 1.827066 55 23.60 No 0.944426 64 39.97 Yes 2.878671 72 30.36 Yes 2.029432 60 27.79 Yes 1.497615 65 27.94 Yes 1.621311 66 27.81 Yes 1.629041 69 25.61 Yes 1.459809 66 30.67 Yes 1.933088 68 26.40 Yes 1.522244 65 30.52 No 1.895591 60 33.51 Yes 2.105708 76 27.20 Yes 1.779692 57 30.85 Yes 1.758274 69 27.12 Yes 1.620337 67 34.65 No 2.377751 62 25.72 Yes 1.320653 72 28.15 Yes 1.794487 70 26.90 Yes 1.618499 58 25.78 Yes 1.240832 68 31.48 No 2.062299 61 42.53 Yes 3.086174 76 31.45 Yes 2.231510 64 25.27 Yes 1.315914 59 34.19 Yes 2.156449

For example, the first person has an age of 67 years and 29.02 bmi so:

log(odds) = -2.74974 + (0.02155 X67) + (0.10631X 29.02) = 1.779226.

2. Calculate the probability of hypertension for each person using the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x_1+β_2 x_2)) )=1/(1+e^(-(-2.74974+(0.02155Xage)+(0.10631Xbmi))) )=1/(1+e^(-(log(odds))) )

Update the table with a column for the probability of hypertension for each person.

 age bmi hypertension log(odds) p(hypertension) 67 29.02 Yes 1.779226 0.8556013 63 33.86 Yes 2.207567 0.9009269 75 34.03 Yes 2.484239 0.9230295 59 27.73 Yes 1.469686 0.8130097 68 28.60 Yes 1.756126 0.8527238 70 29.27 Yes 1.870454 0.8665108 76 30.08 Yes 2.085865 0.8895217 74 27.46 No 1.764233 0.8537390 69 34.31 Yes 2.384706 0.9156536 70 36.10 Yes 2.596551 0.9306393 68 24.13 No 1.280920 0.7826064 67 31.44 Yes 2.036496 0.8845760 62 34.25 Yes 2.227477 0.9026900 72 25.22 Yes 1.482998 0.8150250 77 30.08 Yes 2.107415 0.8916218 63 25.38 Yes 1.306058 0.7868527 78 29.90 Yes 2.109829 0.8918548 58 35.78 Yes 2.303932 0.9092021 68 39.51 Yes 2.915968 0.9486302 67 29.90 Yes 1.872779 0.8667795 65 29.93 Yes 1.832868 0.8621031 65 34.47 Yes 2.315516 0.9101539 70 27.26 No 1.656771 0.8398040 65 34.41 Yes 2.309137 0.9096309 77 28.89 No 1.980906 0.8787777 60 33.62 Yes 2.117402 0.8925831 62 33.24 Yes 2.120104 0.8928419 62 28.17 Yes 1.581113 0.8293620 63 31.67 Yes 1.974748 0.8781201 68 23.57 Yes 1.221387 0.7723075 55 29.49 Yes 1.570592 0.8278680 74 31.16 Yes 2.157580 0.8963749 56 30.80 Yes 1.731408 0.8495924 67 29.41 Yes 1.820687 0.8606486 65 30.13 Yes 1.854130 0.8646113 60 27.77 Yes 1.495489 0.8169007 55 32.10 Yes 1.848061 0.8638993 68 29.03 Yes 1.801839 0.8583727 77 25.47 No 1.617326 0.8344260 67 29.48 Yes 1.828129 0.8615387 66 25.39 Yes 1.371771 0.7976661 72 23.23 Yes 1.271441 0.7809894 61 33.99 No 2.178287 0.8982827 58 29.83 Yes 1.671387 0.8417607 75 25.19 Yes 1.544459 0.8241120 57 31.64 Yes 1.842258 0.8632156 65 27.81 Yes 1.607491 0.8330628 70 24.07 Yes 1.317642 0.7887891 68 30.32 Yes 1.938979 0.8742400 55 26.40 Yes 1.242094 0.7759283 62 34.45 Yes 2.248740 0.9045418 77 34.62 Yes 2.590062 0.9302193 56 30.07 Yes 1.653802 0.8394042 69 27.29 Yes 1.638410 0.8373185 59 27.30 No 1.423973 0.8059605 68 30.00 Yes 1.904960 0.8704519 61 29.08 No 1.656305 0.8397413 61 29.86 Yes 1.739227 0.8505888 70 33.87 Yes 2.359480 0.9136848 75 31.74 Yes 2.240789 0.9038531 71 32.38 Yes 2.222628 0.9022632 77 24.03 Yes 1.464239 0.8121802 65 28.83 Yes 1.715927 0.8476035 76 37.04 Yes 2.825782 0.9440533 75 24.18 Yes 1.437086 0.8080030 67 28.73 Yes 1.748396 0.8517504 59 33.76 Yes 2.110736 0.8919423 65 37.04 Yes 2.588732 0.9301329 64 32.59 Yes 2.094103 0.8903287 71 27.01 Yes 1.651743 0.8391265 57 27.73 No 1.426586 0.8063689 62 32.79 No 2.072265 0.8881781 68 27.06 Yes 1.592409 0.8309547 76 39.95 Yes 3.135145 0.9583194 61 28.35 Yes 1.578699 0.8290201 67 29.47 Yes 1.827066 0.8614118 55 23.60 No 0.944426 0.7199928 64 39.97 Yes 2.878671 0.9467819 72 30.36 Yes 2.029432 0.8838527 60 27.79 Yes 1.497615 0.8172185 65 27.94 Yes 1.621311 0.8349759 66 27.81 Yes 1.629041 0.8360382 69 25.61 Yes 1.459809 0.8115035 66 30.67 Yes 1.933088 0.8735908 68 26.40 Yes 1.522244 0.8208687 65 30.52 No 1.895591 0.8693917 60 33.51 Yes 2.105708 0.8914567 76 27.20 Yes 1.779692 0.8556588 57 30.85 Yes 1.758274 0.8529933 69 27.12 Yes 1.620337 0.8348416 67 34.65 No 2.377751 0.9151149 62 25.72 Yes 1.320653 0.7892904 72 28.15 Yes 1.794487 0.8574765 70 26.90 Yes 1.618499 0.8345880 58 25.78 Yes 1.240832 0.7757088 68 31.48 No 2.062299 0.8871845 61 42.53 Yes 3.086174 0.9563188 76 31.45 Yes 2.231510 0.9030436 64 25.27 Yes 1.315914 0.7885010 59 34.19 Yes 2.156449 0.8962699

If we plot the age on the x-axis and the hypertension probability on the y-axis, we will not see the sigmoid curve of the logistic equation.

We see a zipped curve because our logistic equation also takes account of the bmi. We can plot the bmi on the x-axis and the probability on the y-axis with a separate line for each age to see the sigmoid curve of the logistic equation. We see that increasing age (light blue lines) or bmi is associated with an increased probability of hypertension.

## 4. Practice questions

1. A population of tigers in a certain forest has a growth rate of 0.06 or 6% per year, with a carrying capacity of 136 tigers.

If the initial population was 30 tigers, draw the logistic curve of growth for this population.

2. In the above example, if the initial population was 200 tigers, draw the logistic curve of growth for this population.

3. From certain data, we estimate a logistic regression model for the effect of age on developing cardiovascular (cv) events, assuming that the presence of cv is the second level value, and found that:

β_0 = -4.426704.

β_1 for age (predictor) = 0.023421.

Calculate the probability of developing cv events for the age range 20-80 years.

4. From certain data, we estimate a logistic regression model for the effect of age on developing Type-2 diabetes (diab), assuming that the presence of Type-2 diabetes is the second level value, and found that:

β_0 = -1.67193.

β_1 for age (predictor) = 0.02343.

Calculate the probability of developing Type-2 diabetes for the age range 20-80 years.

5. The following plot shows the logistic regression curves that determine the effect of the number of characters in the email, in thousands, and the number of times “password” appeared in the email on the probability of spam email. How do these 2 predictors affect the probability of an email being spam?

1. The growth rate is per year so the x-axis will be in years.

We know that the population at the current year or t = 0 is 30. We use the logistic equation to know the population size for any year:

P(t)=K/(1+((K-P_0)/P_0 )e^(-rt) )=136/(1+((136-30)/30)e^(-0.06t) )

For example, at time = 0:

The population size at time = 0 = P(0) = 136/1+(136-30/30)Xe^(-0.06X0) = 136/(1+((136-30)/30))= 30.

Using the above formula, we can calculate the population size for the past and next 20 years and produce that table.

 year population -20 10.68252 -19 11.28826 -18 11.92508 -17 12.59420 -16 13.29684 -15 14.03422 -14 14.80756 -13 15.61806 -12 16.46689 -11 17.35520 -10 18.28411 -9 19.25466 -8 20.26787 -7 21.32464 -6 22.42585 -5 23.57223 -4 24.76443 -3 26.00299 -2 27.28829 -1 28.62060 0 30.00000 1 31.42643 2 32.89963 3 34.41916 4 35.98437 5 37.59442 6 39.24825 7 40.94455 8 42.68182 9 44.45834 10 46.27213 11 48.12102 12 50.00261 13 51.91432 14 53.85334 15 55.81671 16 57.80130 17 59.80382 18 61.82087 19 63.84895 20 65.88446

Plot the year values on the x-axis and the population size on the y-axis.

Connect the intersecting points with a line to draw the sigmoid curve. For example, the expected number of tigers after 20 years= P(20) = 66 tigers approximately.

2. The initial population is larger than the carrying capacity so the population will decrease with years.

We know that the population at the current year or t = 0 is 200. We use the logistic equation to know the population size for any year:

P(t)=K/(1+((K-P_0)/P_0 )e^(-rt) )=136/(1+((136-200)/200)e^(-0.06t) )

For example, at time = 0:

The population size at time = 0 = P(0) = 136/1+(136-200/200)Xe^(-0.06X0) = 136/(1+((136-200)/200))= 200.

Using the above formula, we can calculate the population size for the past and next 10 years and produce that table.

 year population -10 326.2001 -9 301.6338 -8 281.6574 -7 265.1215 -6 251.2309 -5 239.4176 -4 229.2649 -3 220.4605 -2 212.7656 -1 205.9943 0 200.0000 1 194.6652 2 189.8949 3 185.6114 4 181.7504 5 178.2582 6 175.0900 7 172.2075 8 169.5783 9 167.1746 10 164.9724

Plot the year values on the x-axis and the population size on the y-axis.

Connect the intersecting points with a line to draw the sigmoid curve. For example, the expected number of tigers after 10 years= P(10) = 165 tigers approximately.

3. Estimate the log(odds) of developing cv event for each age value.

log(odds) = β_0+β_1 x = -4.426704 + 0.023421 X age.

The β_1 for age is positive, so increasing age will be associated with increasing the log(odds) or probability of cv event.

The following table will be produced.

 age log(odds) 20 -3.958284 21 -3.934863 22 -3.911442 23 -3.888021 24 -3.864600 25 -3.841179 26 -3.817758 27 -3.794337 28 -3.770916 29 -3.747495 30 -3.724074 31 -3.700653 32 -3.677232 33 -3.653811 34 -3.630390 35 -3.606969 36 -3.583548 37 -3.560127 38 -3.536706 39 -3.513285 40 -3.489864 41 -3.466443 42 -3.443022 43 -3.419601 44 -3.396180 45 -3.372759 46 -3.349338 47 -3.325917 48 -3.302496 49 -3.279075 50 -3.255654 51 -3.232233 52 -3.208812 53 -3.185391 54 -3.161970 55 -3.138549 56 -3.115128 57 -3.091707 58 -3.068286 59 -3.044865 60 -3.021444 61 -2.998023 62 -2.974602 63 -2.951181 64 -2.927760 65 -2.904339 66 -2.880918 67 -2.857497 68 -2.834076 69 -2.810655 70 -2.787234 71 -2.763813 72 -2.740392 73 -2.716971 74 -2.693550 75 -2.670129 76 -2.646708 77 -2.623287 78 -2.599866 79 -2.576445 80 -2.553024

Calculate the probability of cv developing for each age using the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x) )=1/(1+e^(-(-4.426704+0.023421x)) )=1/(1+e^(-(log(odds))) )

Update the table with a column for probability of cv developing for each age value.

 age log(odds) p(cv) 20 -3.958284 0.01873804 21 -3.934863 0.01917357 22 -3.911442 0.01961902 23 -3.888021 0.02007460 24 -3.864600 0.02054055 25 -3.841179 0.02101707 26 -3.817758 0.02150442 27 -3.794337 0.02200280 28 -3.770916 0.02251247 29 -3.747495 0.02303367 30 -3.724074 0.02356665 31 -3.700653 0.02411165 32 -3.677232 0.02466894 33 -3.653811 0.02523878 34 -3.630390 0.02582143 35 -3.606969 0.02641716 36 -3.583548 0.02702626 37 -3.560127 0.02764901 38 -3.536706 0.02828569 39 -3.513285 0.02893659 40 -3.489864 0.02960201 41 -3.466443 0.03028226 42 -3.443022 0.03097764 43 -3.419601 0.03168847 44 -3.396180 0.03241506 45 -3.372759 0.03315775 46 -3.349338 0.03391685 47 -3.325917 0.03469271 48 -3.302496 0.03548566 49 -3.279075 0.03629606 50 -3.255654 0.03712425 51 -3.232233 0.03797059 52 -3.208812 0.03883545 53 -3.185391 0.03971920 54 -3.161970 0.04062221 55 -3.138549 0.04154486 56 -3.115128 0.04248754 57 -3.091707 0.04345063 58 -3.068286 0.04443455 59 -3.044865 0.04543968 60 -3.021444 0.04646645 61 -2.998023 0.04751527 62 -2.974602 0.04858655 63 -2.951181 0.04968072 64 -2.927760 0.05079822 65 -2.904339 0.05193949 66 -2.880918 0.05310496 67 -2.857497 0.05429508 68 -2.834076 0.05551031 69 -2.810655 0.05675111 70 -2.787234 0.05801794 71 -2.763813 0.05931127 72 -2.740392 0.06063157 73 -2.716971 0.06197933 74 -2.693550 0.06335503 75 -2.670129 0.06475916 76 -2.646708 0.06619220 77 -2.623287 0.06765466 78 -2.599866 0.06914704 79 -2.576445 0.07066985 80 -2.553024 0.07222359

We see that increasing age is associated with an increased probability of developing cv.

We can plot the age on the x-axis and the probability on the y-axis to see the sigmoid curve of the logistic equation. 4. Estimate the log(odds) of developing Type-2 diabetes for each age value.

log(odds) = β_0+β_1 x = -1.67193 + 0.02343 X age.

The β_1 for age is positive, so increasing age will be associated with increasing the log(odds) or probability of Type-2 diabetes.

The following table will be produced.

 age log(odds) 20 -1.20333 21 -1.17990 22 -1.15647 23 -1.13304 24 -1.10961 25 -1.08618 26 -1.06275 27 -1.03932 28 -1.01589 29 -0.99246 30 -0.96903 31 -0.94560 32 -0.92217 33 -0.89874 34 -0.87531 35 -0.85188 36 -0.82845 37 -0.80502 38 -0.78159 39 -0.75816 40 -0.73473 41 -0.71130 42 -0.68787 43 -0.66444 44 -0.64101 45 -0.61758 46 -0.59415 47 -0.57072 48 -0.54729 49 -0.52386 50 -0.50043 51 -0.47700 52 -0.45357 53 -0.43014 54 -0.40671 55 -0.38328 56 -0.35985 57 -0.33642 58 -0.31299 59 -0.28956 60 -0.26613 61 -0.24270 62 -0.21927 63 -0.19584 64 -0.17241 65 -0.14898 66 -0.12555 67 -0.10212 68 -0.07869 69 -0.05526 70 -0.03183 71 -0.00840 72 0.01503 73 0.03846 74 0.06189 75 0.08532 76 0.10875 77 0.13218 78 0.15561 79 0.17904 80 0.20247

Calculate the probability of developing Type-2 diabetes for each age value using the logistic equation:

p(Y)=1/(1+e^(-(β_0+β_1 x)) )=1/(1+e^(-(-1.67193+0.02343x)) )=1/(1+e^(-(log(odds))) )

Update the table with a column for probability of developing Type-2 diabetes for each age value.

 age log(odds) p(diab) 20 -1.20333 0.2308834 21 -1.17990 0.2350702 22 -1.15647 0.2393093 23 -1.13304 0.2436005 24 -1.10961 0.2479436 25 -1.08618 0.2523383 26 -1.06275 0.2567843 27 -1.03932 0.2612812 28 -1.01589 0.2658288 29 -0.99246 0.2704265 30 -0.96903 0.2750739 31 -0.94560 0.2797706 32 -0.92217 0.2845159 33 -0.89874 0.2893095 34 -0.87531 0.2941506 35 -0.85188 0.2990386 36 -0.82845 0.3039729 37 -0.80502 0.3089527 38 -0.78159 0.3139773 39 -0.75816 0.3190459 40 -0.73473 0.3241576 41 -0.71130 0.3293117 42 -0.68787 0.3345071 43 -0.66444 0.3397429 44 -0.64101 0.3450183 45 -0.61758 0.3503320 46 -0.59415 0.3556832 47 -0.57072 0.3610707 48 -0.54729 0.3664934 49 -0.52386 0.3719501 50 -0.50043 0.3774396 51 -0.47700 0.3829608 52 -0.45357 0.3885123 53 -0.43014 0.3940929 54 -0.40671 0.3997013 55 -0.38328 0.4053360 56 -0.35985 0.4109959 57 -0.33642 0.4166794 58 -0.31299 0.4223851 59 -0.28956 0.4281116 60 -0.26613 0.4338574 61 -0.24270 0.4396211 62 -0.21927 0.4454011 63 -0.19584 0.4511959 64 -0.17241 0.4570040 65 -0.14898 0.4628237 66 -0.12555 0.4686537 67 -0.10212 0.4744922 68 -0.07869 0.4803376 69 -0.05526 0.4861885 70 -0.03183 0.4920432 71 -0.00840 0.4979000 72 0.01503 0.5037574 73 0.03846 0.5096138 74 0.06189 0.5154676 75 0.08532 0.5213171 76 0.10875 0.5271607 77 0.13218 0.5329970 78 0.15561 0.5388242 79 0.17904 0.5446408 80 0.20247 0.5504453

We see that increasing age is associated with an increased probability of developing Type-2 diabetes.

We can plot the age on the x-axis and the probability on the y-axis to see the sigmoid curve of the logistic equation. 5. Increasing the number of characters in the email is associated with decreased probability of spam email.

Also, increasing the number of times “password” appeared in the email (from 0 to 4) is associated with decreased probability of spam email for a constant small number of characters.