8.3 Why do we use logistic regression instead of linear regression?


[Figure: logistic regression and linear regression charts]

A linear regression model seeks to determine the strength and nature of the relationship between x and y. It uses categorical or numeric inputs to predict a continuous numeric outcome. If x represents the number of day passes sold at Lobster Land and y represents the park’s revenue from merchandise sold, then we can tell from the chart above that Lobster Land sells more merchandise when more people visit the park. A linear regression model helps us answer the question, “Is there a relationship between Lobster Land’s visitor numbers and merchandise revenue?”

However, suppose we asked, “Will there be fireworks at Lobster Land?” A linear regression model would be unable to answer this question because:

  • The answer is a probability, which a linear regression model is not built to produce.
  • The predicted value of the dependent variable in a linear regression model can fall below 0 or rise above 1, which violates the definition of a probability.
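
The second point can be illustrated with a minimal sketch. The data below are hypothetical (invented day-pass counts and a 0/1 fireworks flag, not actual Lobster Land figures), and the helper names `fit_line` and `predict` are made up for this example. The sketch fits an ordinary least-squares line to the yes/no outcome and shows that its predictions drift outside the 0-to-1 range.

```python
# Hypothetical data (assumption, for illustration only):
# day passes sold, and whether fireworks were held (1 = yes, 0 = no).
passes = [100, 120, 150, 180, 200, 220, 250, 280]
fireworks = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(passes, fireworks)


def predict(x):
    """Linear prediction for a given number of day passes."""
    return intercept + slope * x


low = predict(50)    # a very quiet day
high = predict(350)  # a very busy day

print(f"prediction at  50 passes: {low:.3f}")   # negative, so not a valid probability
print(f"prediction at 350 passes: {high:.3f}")  # above 1, so not a valid probability
```

On a very quiet day the line predicts a negative value, and on a very busy day it predicts a value above 1. Neither can be read as a probability.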

To classify the chances of a fireworks display into ‘yes’ or ‘no’, we need a function whose output always stays between 0 and 1. Many functions are capable of achieving this [6], but the logistic function is the one used in a logistic regression model.
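
As a rough sketch of why the logistic function suits this task, the snippet below evaluates the standard logistic function, 1 / (1 + e^(-z)), at a few inputs. The 0.5 cut-off used to turn a probability into ‘yes’ or ‘no’ is an assumption chosen for illustration. The input z stands for any linear combination of predictors, and the values tried here are arbitrary.

```python
import math


def logistic(z):
    """Standard logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Arbitrary example inputs; in a fitted model z would be
# intercept + coefficient * x for some predictor x.
for z in (-10, -2, 0, 2, 10):
    p = logistic(z)
    # Assumed decision rule: predict 'yes' when p is at least 0.5.
    label = "yes" if p >= 0.5 else "no"
    print(f"z = {z:>3}: p = {p:.4f} -> {label}")
```

However large or small z becomes, the output stays strictly between 0 and 1, so it can be interpreted as a probability and then thresholded into a class.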