
11.12 Exponential Smoothing Methods


Single Exponential Smoothing

Single Exponential Smoothing (SES), also sometimes known as Simple Exponential Smoothing, is used for time series forecasting when the data shows neither trend nor seasonality.  The advantage of SES is that it lets the modeler use a constant, usually referred to as alpha, to place more weight on recent observations when making a prediction.  This weighting adds a dynamic element that simple moving averages lack: a simple moving average uses a fixed window size and weights every observation inside that window equally.

The equation shown below this paragraph is how we will represent our single exponential smoothing forecast for a given period.  In this equation, Ft+1 is the forecasted value for the next period, and Ft is the forecasted value for the current period.  Alpha is the adjusting constant, and E represents the error, which is found by taking the observed value (i.e. the actual value) for this period and subtracting the predicted value.

Ft+1 = Ft  + (α)E

Naturally, a question arises here about selecting the best alpha parameter.  Alpha must fall somewhere between 0 and 1.  The closer alpha is to 1, the more weight we will be placing on the most recent observations when determining our forecast value.  If alpha were 0, we would be ignoring the recent data, and making no adjustment based on our forecast error.  Often, alpha is set through trial and error (although, as we will see below, we can use Python to arrive at an optimal alpha value for our known data points).  

To forecast the next period using SES, we only need two pieces of information:  the most recent forecast, and the most recent forecast error.  

To walk through an example, let’s say that we have captured some data from one of our rival theme parks, Turtle Town.  Turtle Town’s annual revenue for each of the past 2 years is shown in the table below, along with our forecast and our forecast error for the 2nd year.  We’d like to predict Turtle Town’s revenue for Year 3.  

With an alpha value of 0.2, we can find the forecasted value for year three by taking the forecasted value for year 2 and adding (0.2)*(0.9).  11.2 + (0.2)*(0.9) = 11.38.  
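This single-step update is easy to express in code.  Here is a minimal sketch of that arithmetic (the Year 2 actual of 12.1 is implied by the 11.2 forecast and the 0.9 error; the function name is ours):

```python
def ses_next_forecast(current_forecast, actual, alpha=0.2):
    """One SES step: F_{t+1} = F_t + alpha * E, where E = actual - F_t."""
    error = actual - current_forecast
    return current_forecast + alpha * error

# Year 2 forecast was 11.2; the actual came in at 12.1, so the error is 0.9
year_3_forecast = ses_next_forecast(11.2, 12.1)
```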

Now the year 3 results have come in — Turtle Town’s actual revenue dipped a bit, and it’s $10.5 million.  Can we make a Year 4 forecast now? 

We will solve this problem by again using the formula shown above.  Now, however, there’s one key difference from the previous example — the error term is negative.  

We’ll find the Year 4 forecasted value by starting with the Year 3 forecast and adding alpha times the error term:  11.38 + (0.2)(10.5 – 11.38) = 11.38 + (0.2)(–0.88) = 11.204, or roughly $11.2 million.

Note how the negative error term led to a downward adjustment in our estimate.  We could continue to generate new predictions this way indefinitely.  Along the way, we might want to re-examine our choice of alpha.  

Now that we have worked through the process manually, let’s explore a Python implementation of this Single Exponential Smoothing model.

Note that in the turtle_fit model, we simply selected the smoothing parameter ourselves.  We can instead let the software find the best smoothing parameter, based on the known observations – but to do so, we must pass at least 10 known data points.  We will pass a slightly bigger list of Turtle Town revenue numbers now, and then call the fit() function as shown below:

In the automatically fit SES model, the alpha coefficient is very tiny – it is reported in scientific notation as 1.4901e-08, which is 1.4901 with the decimal point moved eight places to the left: 0.000000014901.

Remember that while we can call the automatically generated alpha value the “best” for this data, the important limitation to bear in mind is that it’s only optimal for the past, known values.  This does not necessarily mean it will be the best smoothing constant to use going forward.

Double Exponential Smoothing:  Holt’s Linear Trend

As noted in the section above, SES is appropriate only in instances for which we do not have trend or seasonality.  So what about times when we do have such things?  

To generate predictions using data that includes a trend, but does not have seasonality, we will use Holt’s Linear Trend method, also known as double exponential smoothing.  

We will use Holt’s Linear Trend to predict sales of New Coke at Lobster Land in the summer of 1985.  That year, Coca-Cola released this product amidst much hype and fanfare.  However, New Coke quickly proved to be a flop; consumers did not enjoy the taste, and they rejected this “innovation” in favor of the familiar taste of the classic drink that they had enjoyed for many years prior.  

Lobster Land’s experience with New Coke seems to have paralleled that of other contemporary retailers.  At first, amidst all the buzz of a new product release, New Coke sold pretty well at Lobster Land.  However, as we can see here from the data below,  weekly sales of this new drink declined pretty steadily throughout that summer.  

We can also depict this visually, with the plot below.  

This plot indicates a very clear overall trend in weekly sales across that summer.  Since we have a clear trend in the data, we will predict New Coke sales at Lobster Land in the summer of 1985 using a double exponential smoothing model.  

Again in this section, we’ll demonstrate the process manually, before jumping right into a statsmodels solution.  

Before we go any further, let’s take a moment to go through the notation below:

Ft+1 = Lt + bt

Lt = (α)yt + (1-α) (Lt-1 + bt-1 )

bt = (β)(Lt – Lt-1) + (1-β)bt-1

Trend in a time series can be either additive or multiplicative.  Since the trend here with the New Coke data appears to be linear, we can say this is additive.  If the trend were multiplicative, then the changes would be occurring in similar proportions, rather than in similar absolute amounts – and we would use a slightly different process than the one shown above.  

To generate the predicted value calculations, we need some information to start with.  This includes two initial state values – one for the level, and another for the trend.  Here, we will set the initial level value (Lt) to 875, and the initial trend term (bt) to -50.  The smoothing parameters alpha and beta will be 0.4 and 0.15, respectively.

The forecast value for period 1 will be the sum of Lt and bt:  875 + (-50) = 825.00.

Next, we will find the forecast for period 2.  To do so, we must first update the level equation.

Lt = (α)yt + (1-α) (Lt-1 + bt-1 )

Since the actual number of cans sold in that first week was 875, that will be our yt value here.  Therefore, the new Lt will be:   (0.4 * 875) + (0.6 * 825) = 845.00.

Now, we will need to find the new value for the trend equation.  

bt = (β)(Lt – Lt-1) + (1-β)bt-1

Therefore, the new bt will be:   0.15*(-30) + (0.85 * -50) = -47

The forecast value for period 2 will be the sum of Lt and bt:  845 + (-47) = 798.00.  We will do one more manual prediction before going to a (much faster!) Python solution.  To get the predicted value for period 3, we start by again updating the level equation.  Noting that the true week 2 sales number, yt, was 791, and that the forecast value for this period was 798, we’ll arrive at Lt as follows:

                (0.4*791) + (0.6 * 798) = 795.2

We will update the trend equation as follows:  

                (0.15 * -49.8) + (0.85 * -47) = -47.42

The prediction for period 3 will be the sum of the updated level and trend terms:

                    795.2 – 47.42 = 747.78
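The manual walk-through above can be condensed into a short helper function.  This is just a sketch of the arithmetic we performed by hand (the function name is ours):

```python
def holt_update(level, trend, actual, alpha=0.4, beta=0.15):
    """One step of additive Holt's method: update the level and trend
    equations, then return (new_level, new_trend, next_forecast)."""
    new_level = alpha * actual + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    return new_level, new_trend, new_level + new_trend

# Reproduce the walk-through: start at L = 875, b = -50,
# then feed in the week 1 and week 2 actuals
level, trend = 875, -50
for actual in [875, 791]:
    level, trend, forecast = holt_update(level, trend, actual)
```

After the two updates, `forecast` holds the period 3 prediction of 747.78.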

As promised, though, we’ll stop with the manual predictions here.  Using the Holt() function from statsmodels, we can instantiate such a model using the code below.  

Having built fit2, we can then view its in-sample predictions for each of the weeks in the period, using the fittedvalues attribute, shown below.

But wait a minute – how do we know this is a good model?  After all, we offered up those smoothing_trend and smoothing_level values.  Could there be other values that fit the data even better?

To compare the parameters generated by the function against the ones that we used originally, we can set up a plot that shows both sets of predictions, as well as the true values from the dataset:

Visual inspection of this plot does not seem to indicate a clear, runaway winner between the models.  Another benchmark we can use is the Akaike Information Criterion (AIC) value.  When comparing models built with the same data using AIC, we look to the smaller value as the better one.  

With these results in mind, we can say that fit3 is the better model for this data.

What about seeing into the future, though?  Using the forecast() function, we can carry these predictions forward, towards new, yet-unseen periods.  

Here, we only went three periods into the future.  What if we took this a bit further, though?  Let’s go out to 5 periods:

Of course, with domain knowledge, we might see a couple of problems here.  First, the Lobster Land season has already concluded before Labor Day (so even the three-period forecast earlier wouldn’t be directly applicable).  Of much more fundamental importance, though, is the fact that sales of a beverage – even a complete bust like New Coke – cannot actually become negative.

Thankfully, we can correct for this by using something called a damping parameter. 

The damping parameter “taps the brakes” on the decreasing trend, so to speak.  Adding a damping parameter will prevent a forecasted trend from simply “running away” in a particular direction.  Even with a damping parameter in place, though, the forecasted values would eventually cross the x-axis and turn negative if we try to look too many periods into the future.

Triple Exponential Smoothing:  The Holt-Winters Method

Finally, we will use a third type of exponential smoothing for cases in which we have both trend and seasonality.  

Again in this section, the formulas shown below apply when the trend and seasonality are additive.  The process works in a way that’s similar to the double exponential model, but now we have one extra component to consider, s, for seasonality.  The seasonality smoothing parameter is also known by its Greek name, “gamma.”

Ft+1 = Lt + bt + st+1-w

Lt = (α)(yt – st-w) + (1-α)(Lt-1 + bt-1)

bt = (β)(Lt – Lt-1) + (1-β)bt-1

st = (γ)(yt – Lt) + (1-γ)st-w

The st-w component indicates the previous value for the seasonal component, taken one full seasonal cycle (w periods) back.  Since this data has weekly seasonality, we can say that if we’re updating st for a Friday, we would use the previous Friday’s st value as st-w.

In 1978, Lobster Land introduced the SuperFrappe, a delicious mix of espresso, ice cream, and whipped cream.  While the SuperFrappe’s exact origins are disputed, the drink may have originated in Rhode Island.  

Throughout that summer, as word-of-mouth spread across New England, more and more Lobster Land visitors tried out this new drink.  Its daily sales totals can be seen in super_frappe_sales.csv, which is available at lobsterland.net/datasets.  

From this graph, we can see clear evidence of both trend and seasonality.  The rising trend appears here because the drink’s popularity caught on among visitors throughout the summer.  The seasonality here just reflects the day-to-day activity patterns at Lobster Land.  Sales tend to pick up heading into the weekend, where they peak before falling off again during the early part of the week.  

We can set up a Holt-Winters Exponential Smoothing model as follows:

Note that when we look at the model summary, below, we can see that the alpha, beta, and gamma components have been selected for us by the function.  Note also that since we have weekly seasonality, there are 7 different values for initial_seasons – these reflect the different starting points that the model uses for generating forecasts on each day of the week.

The graph below shows us that the fithw model fits the actual data quite well.