
16.2 Polynomial Regression


Lobster Land has come to us with a fairly straightforward request: they want to know whether we can model the total number of ice-cold lemonades sold each day at a concession stand near the park entrance, using a single input, the daily high temperature.

After reading in the dataset and examining the relationship between the two variables, we can see that sales tend to rise as temperature increases, but only up to a point. Once daily sales reach somewhere just below 200, they level off, even as temperatures climb higher. Why is this the case? It's hard to say. Perhaps on very hot days some people spend more time indoors, or simply head to the beach instead.

Since the relationship is mostly linear, we can model it with an ordinary ("regular") linear regression model and still achieve a decent r-squared value of 0.680. In other words, the model explains about 68% of the variation in daily cold lemonade sales.
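As a rough sketch of how a straight-line fit and its r-squared can be computed, the Python below uses NumPy on synthetic data. The real Lobster Land dataset is not reproduced here, so make_synthetic_data is a hypothetical helper whose generating formula is an assumption chosen only to mimic the shape described above (sales rise, then level off a little below 200). Its r-squared will not match the 0.680 reported for the real data.

```python
import numpy as np


def make_synthetic_data(n=100, seed=0):
    """Hypothetical stand-in for the Lobster Land data; NOT the real dataset."""
    rng = np.random.default_rng(seed)
    temp = rng.uniform(60, 100, size=n)              # daily high, deg F
    sales = 195 * (1 - np.exp(-(temp - 55) / 12))    # assumed shape: rises, then levels off
    sales = sales + rng.normal(0, 8, size=n)         # assumed day-to-day noise
    return temp, sales


def r_squared(y, y_hat):
    """Share of the variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot


temp, sales = make_synthetic_data()

# np.polyfit with deg=1 returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(temp, sales, deg=1)
linear_pred = intercept + slope * temp
print(f"linear R^2 = {r_squared(sales, linear_pred):.3f}")
```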

Given the curved shape of the relationship, however, a polynomial regression model may capture it even more closely. To do this, we add a polynomial term, the temperature raised to a power. We start with a degree-2 curve, adding temperature squared (temperature ** 2) as a second predictor.
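One way to see that a degree-2 model is ordinary linear regression with one extra column is to build the design matrix by hand. This sketch, again on synthetic stand-in data (the generating formula is an assumption, not Lobster Land's data), appends temperature ** 2 as a third column and solves both models by least squares with np.linalg.lstsq.

```python
import numpy as np

# Synthetic stand-in data (assumed shape: rises, then levels off a little below 200)
rng = np.random.default_rng(0)
temp = rng.uniform(60, 100, size=100)                                        # daily high, deg F
sales = 195 * (1 - np.exp(-(temp - 55) / 12)) + rng.normal(0, 8, size=100)


def r_squared(y, y_hat):
    """Share of the variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot


# Linear model columns: intercept, temp
X_lin = np.column_stack([np.ones_like(temp), temp])
# Quadratic model adds one more column, temp ** 2; it is still linear in the coefficients
X_quad = np.column_stack([np.ones_like(temp), temp, temp ** 2])

# Ordinary least squares for each design matrix
coef_lin, *_ = np.linalg.lstsq(X_lin, sales, rcond=None)
coef_quad, *_ = np.linalg.lstsq(X_quad, sales, rcond=None)

r2_lin = r_squared(sales, X_lin @ coef_lin)
r2_quad = r_squared(sales, X_quad @ coef_quad)
print(f"linear R^2 = {r2_lin:.3f}, quadratic R^2 = {r2_quad:.3f}")
```

Because the quadratic model contains every column of the linear one, its in-sample r-squared can never be lower; whether the improvement justifies the extra term is a separate question.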

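The polynomial model's r-squared is considerably higher than that of the linear model. Note that this is still a linear model: it is linear in its coefficients, even though the fitted line is a curve. To generate a predicted y value, we multiply each coefficient by its term (temperature or temperature squared) and add the results to the intercept.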
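Written out, the degree-2 model's prediction is a weighted sum. Here x is the daily high temperature, and the β symbols are generic placeholders for the fitted intercept and coefficients, not the values from the Lobster Land model:

```latex
\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2
        = \begin{bmatrix} 1 & x & x^2 \end{bmatrix}
          \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}
```

Because the prediction is a weighted sum of the columns 1, x, and x², least squares estimates the β values exactly as it does for a straight line; only the set of columns changes.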

For the first observation in the dataset, the temperature was 87 degrees Fahrenheit.  What would the model predict for such a day?  We will show the model’s prediction below, along with a demonstration of how the regression equation generates this value.  
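The sketch below mirrors that hand calculation for a day with a high of 87 degrees Fahrenheit. The coefficients come from running np.polyfit on synthetic data (an assumed stand-in, not the Lobster Land model), so the predicted number itself is illustrative; the point is that adding the intercept to each coefficient times its term reproduces what np.polyval returns.

```python
import numpy as np

# Synthetic stand-in data (assumed shape, not the real Lobster Land dataset)
rng = np.random.default_rng(0)
temp = rng.uniform(60, 100, size=100)
sales = 195 * (1 - np.exp(-(temp - 55) / 12)) + rng.normal(0, 8, size=100)

# np.polyfit returns coefficients from the highest power down: [b2, b1, b0]
b2, b1, b0 = np.polyfit(temp, sales, deg=2)

x = 87  # daily high of the first observation mentioned in the text, deg F
by_hand = b0 + b1 * x + b2 * x ** 2      # intercept + each coefficient times its term
by_numpy = np.polyval([b2, b1, b0], x)   # NumPy evaluates the same polynomial

print(f"by hand: {by_hand:.2f}, np.polyval: {by_numpy:.2f}")
```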

A comparison of the two plots shown below visually demonstrates the superior fit of the polynomial model.  

In theory, we could fit a curve of degree 3, 4, or even 20, but higher-degree curves tend to bend and oscillate in ways that do not reflect the real relationship, especially near the edges of the data, and they can overfit by chasing the random noise in the data. A model like this would not perform well on unseen data.
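A quick way to see overfitting is to hold out some days, fit polynomials of increasing degree on the rest, and compare r-squared on both parts. This sketch uses NumPy's Polynomial.fit on synthetic data; the generating formula, the 40/20 split, and the degrees tried are arbitrary assumptions. Training r-squared can only rise as the degree grows, so the held-out r-squared is the one to watch. With high degrees it typically stalls or falls, though the exact numbers depend on the random draw.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic stand-in data (assumed shape, not the real Lobster Land dataset)
rng = np.random.default_rng(1)
temp = rng.uniform(60, 100, size=60)
sales = 195 * (1 - np.exp(-(temp - 55) / 12)) + rng.normal(0, 8, size=60)

# Hold out one third of the days as "unseen" data
idx = rng.permutation(len(temp))
train, test = idx[:40], idx[40:]


def r_squared(y, y_hat):
    """Share of the variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot


results = {}
for degree in (1, 2, 3, 10):
    # Polynomial.fit maps x onto [-1, 1] internally, keeping high degrees numerically stable
    p = Polynomial.fit(temp[train], sales[train], deg=degree)
    results[degree] = (
        r_squared(sales[train], p(temp[train])),   # fit on data the model has seen
        r_squared(sales[test], p(temp[test])),     # fit on held-out days
    )
    print(f"degree {degree:2d}: train R^2 = {results[degree][0]:.3f}, "
          f"test R^2 = {results[degree][1]:.3f}")
```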

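We should also bear in mind that polynomial regression models are sensitive to outliers, just like their linear regression cousins: the coefficients are still estimated by least squares, so a single extreme point can pull the fitted curve toward it.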
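To illustrate, the sketch below refits a degree-2 model after adding a single hypothetical outlier, a 98-degree day with only 5 lemonades sold (imagine the stand closed early). Both the synthetic data and the outlier are invented purely for illustration; comparing predictions before and after shows how far one point can move the curve.

```python
import numpy as np

# Synthetic stand-in data (assumed shape, not the real Lobster Land dataset)
rng = np.random.default_rng(0)
temp = rng.uniform(60, 100, size=100)
sales = 195 * (1 - np.exp(-(temp - 55) / 12)) + rng.normal(0, 8, size=100)

clean_fit = np.polyfit(temp, sales, deg=2)

# Add one hypothetical outlier: a very hot day with almost no sales
temp_out = np.append(temp, 98.0)
sales_out = np.append(sales, 5.0)
outlier_fit = np.polyfit(temp_out, sales_out, deg=2)

# Compare predictions from the two fits at a few temperatures
for t in (70, 85, 98):
    before = np.polyval(clean_fit, t)
    after = np.polyval(outlier_fit, t)
    print(f"{t} F: {before:.1f} -> {after:.1f}")
```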