
8.4 How does logistic regression work?


A logistic regression model seeks to determine the chance of some outcome occurring, given a set of predictor variables, which may be continuous, categorical, or a mixture of both. The outcome is binary (e.g. will a family renew their membership?) and is expressed in terms of the log odds.

Whereas a linear regression model’s coefficients are determined through a fitting process known as ordinary least squares, a logistic regression model’s coefficients are determined through a process known as maximum likelihood estimation, or MLE. MLE assigns parameters to the model by answering the question, “Given this set of data, what parameter values make the observed outcomes most likely?”
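The idea behind MLE can be sketched in a few lines of Python. In the minimal example below, the data and the two candidate parameter pairs are hypothetical, chosen only to illustrate how one set of coefficients can make the observed outcomes more likely than another:

```python
import math

# Hypothetical toy data: x is a single predictor, y is a 0/1 outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 1, 1]

def log_likelihood(b0, b1):
    """Sum of log-probabilities the model assigns to the observed outcomes."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))  # logistic transformation
        total += math.log(p) if yi == 1 else math.log(1 - p)
    return total

# MLE seeks the coefficients under which the observed data are most likely.
# Here we simply compare two candidate parameter pairs:
print(log_likelihood(-2.0, 1.0) > log_likelihood(0.0, 0.0))  # True here
```

A real fitting routine searches over all possible coefficient values rather than comparing two candidates, but the quantity being maximized is exactly this log-likelihood.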

A logistic regression model generates an equation that looks quite a bit like the linear regression equation.  Before undergoing any transformations, this equation takes the form of:

   p = β0 + β1X1 + β2X2 + … + βqXq

However, that equation then undergoes the transformation shown below:

   p = 1 / (1 + e^-(β0 + β1X1 + β2X2 + … + βqXq))

This transformation ensures that the outcome p must fall between 0 and 1.7  In the formula above, e stands for Euler’s number.  For the demonstration example in this section, we will approximate the value of e as 2.71828.
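This transformation is easy to sketch in Python. In the minimal example below, the function name logistic is our own, and math.e stands in for Euler’s number:

```python
import math

def logistic(logit):
    """Map a log-odds value onto the (0, 1) probability scale."""
    return 1 / (1 + math.e ** -logit)

# A very large positive logit pushes the probability toward 1,
# and a large-magnitude negative logit pushes it toward 0.
print(logistic(10))   # close to 1
print(logistic(-10))  # close to 0
print(logistic(0))    # exactly 0.5
```

Whatever value the logit takes, the result stays strictly between 0 and 1, which is what lets us read it as a probability.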

To see this in action, let’s revisit a question from the previous section:  Will the night sky light up over Lobster Land?  

For this example, let’s suppose that for one season, Lobster Land management decides to keep the visiting public on its toes regarding the question of whether the park will have fireworks on a given night.  Rather than announce the fireworks in advance, Lobster Land will launch its show just prior to the 9:00 p.m. closing time on some nights, but not on others. Upper management stays tight-lipped about its decision-making process, but this does not stop the speculation among park employees and die-hard fans regarding the park’s decision-making process.  One of the managers at Lobsterama, Tony M, starts to notice a pattern.

He builds a logistic regression equation, using a single input (total daily Lobsterama sales, as of 8 p.m.).  The logit takes these coefficients:

β0 = -4.12, and β1 = 0.00019  

In Tony’s model, -4.12 is the constant term, whereas 0.00019 is the coefficient associated with Lobsterama sales, in dollars.  To use this model, we multiply the day’s total sales receipts by 0.00019, and then add the constant term, -4.12, to arrive at a value that gives us the log-odds that our observation will land in the “1” class.

So what would his model predict?  For a day on which Lobsterama revenue is $40,000, the log-odds would come to:

-4.12 + 0.00019 * 40,000 = 3.48

This value, 3.48, is not immediately interpretable, though.  To make it interpretable, we can apply the transformation shown above to convert it into a probability.

Inserting 3.48 into the transformation equation as the logit, we arrive at:

1 / (1 + 2.71828^-3.48) = 0.97

According to Tony’s model, there is a 97 percent chance that fireworks will occur in the evening when daily Lobsterama sales hit $40,000.

In case you are wondering, the answer is “yes” – we can do this much more quickly in Python!
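As a quick sketch of what that Python might look like, the snippet below wraps Tony’s coefficients in a helper function (the name fireworks_probability is our own invention, not part of any library):

```python
import math

# Tony's fitted coefficients from the example above
b0 = -4.12
b1 = 0.00019

def fireworks_probability(sales):
    """Convert a day's Lobsterama sales into a fireworks probability."""
    log_odds = b0 + b1 * sales          # the logit
    return 1 / (1 + math.exp(-log_odds))  # logistic transformation

print(round(fireworks_probability(40_000), 2))  # 0.97
```

Because the logit and the transformation live in one function, we can feed in any day’s sales total and get a probability back directly.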

On a much slower day for Lobsterama, however, Lobster Land’s on-site restaurant might only gross $12,000 in total receipts.   Let’s see what the model would predict on such a day:

-4.12 + 0.00019 * 12,000 = -1.84

1 / (1 + 2.71828^1.84) = 0.137

On a day like that, there would be just a 13.7 percent chance of seeing fireworks.

At this point, you might wonder about the connection between odds and probabilities. They are two ways of expressing the same information.

For instance, as the summer crowd builds up at Lobster Land, we might observe that the probability of selling out the lobster rolls is 80%. That same event can be described in terms of odds. Since odds = probability of success / probability of failure, the odds of a lobster roll sell-out would be 0.8/0.2 = 4.  Since odds are often expressed as a ratio, we might call this “4 to 1.”

Finally, if we want the natural log of the odds, also known as the logit, we simply apply the natural logarithm to the odds: ln(4) = 1.386.
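The lobster roll conversions above take only a few lines of Python:

```python
import math

p = 0.8  # probability of selling out the lobster rolls

odds = p / (1 - p)      # probability of success / probability of failure
logit = math.log(odds)  # natural log of the odds

print(round(odds, 2))    # 4.0, i.e. "4 to 1"
print(round(logit, 3))   # 1.386
```

Running the same arithmetic in reverse (exponentiate the logit, then divide the odds by one plus the odds) takes us back from log odds to a probability.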

The outcome variable of a logistic regression model is expressed in terms of log odds because the logistic transformation produces an S-shaped curve regardless of the value of X,8 ensuring a sensible prediction between 0 and 1.


6 Grace-Martin, K. (n.d.). ‘What is a logit function and why use logistic regression?’ The Analysis Factor. https://www.theanalysisfactor.com/what-is-logit-function/

7 To see why the values are bound by 0 and 1, we encourage you to try this experiment:  first, substitute some very large, positive value for the logit in the equation. What happens?  Now, substitute a negative value with a large magnitude.  What happens?

8 James, G. et al. (2013). ‘Classification’. An Introduction to Statistical Learning. New York: Springer.