11.8 Resampling a Time Series

Time series data does not always exist at the frequency, or “periodicity”, that the modeler needs.

For example, let’s imagine that Lobster Land installs a sensor that automatically collects air temperature and atmospheric humidity data at the park’s main entrance every five minutes. The sensor records these two data points continuously, regardless of whether the park is open – so for every 24-hour period, it generates 288 measurements for each of these variables. In the moment, perhaps such frequency serves a purpose – maybe if the heat and humidity measurements exceed some particular threshold, the park can issue an advisory to guests and staff, asking them to drink more water.

Later on, a meteorological analyst studying general temperature trends for southern Maine may wish to use this data. However, rather than have 288 values per variable, per day, he would prefer just a single number for each. This would make the data not only much easier to work with, but also more efficient from a storage and processing perspective.
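The sensor scenario above can be sketched in pandas as follows. This is only an illustration under stated assumptions: the readings are randomly generated, and the names sensor, temp_f, and humidity_pct are hypothetical rather than taken from any real Lobster Land dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor log: one reading every 5 minutes for 3 full days
idx = pd.date_range("2024-07-01", periods=288 * 3, freq="5min")
rng = np.random.default_rng(0)
sensor = pd.DataFrame(
    {
        "temp_f": 75 + rng.normal(0, 3, len(idx)),        # made-up temperatures
        "humidity_pct": 60 + rng.normal(0, 5, len(idx)),  # made-up humidity values
    },
    index=idx,
)

# Collapse each calendar day's 288 readings into a single mean per variable
daily = sensor.resample("D").mean()

print(len(sensor), "five-minute rows ->", len(daily), "daily rows")
```

Here the mean is used as the daily summary, but the analyst could just as easily call .max() or .min() after resample() if daily extremes were more relevant.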

A similar situation could arise with Lobster Land’s logistics team. This team makes weekly orders to its supplier for most of the foodstuffs needed at Lobsterama and the Snack Shack. Lobster Land’s analytics team typically captures and stores the data on a daily basis, though. While the daily data may serve some important analytics needs, it does not provide the weekly picture that the logistics people need to see. Again, resampling the data will help to serve the purpose at hand.
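A weekly roll-up of this kind might look like the sketch below. The series daily_units and its values are invented for illustration; a sum is assumed to be the right aggregation because a weekly order depends on total usage rather than average usage.

```python
import pandas as pd

# Hypothetical daily usage of one ingredient over four full weeks
# (2024-07-01 is a Monday)
days = pd.date_range("2024-07-01", periods=28, freq="D")
daily_units = pd.Series(range(1, 29), index=days, name="units_used")

# "W" bins end on Sunday by default, so each bin covers Monday through Sunday
weekly_units = daily_units.resample("W").sum()

print(weekly_units)
```

Each row of the result is labeled with the Sunday that closes the week. If the ordering week ends on a different day, an anchored alias such as "W-FRI" shifts the bin boundaries accordingly.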

In pandas, the resample() method makes it easy to perform this action. The code below takes the original turtle town data and converts it into monthly mean values, rather than daily ones.
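As a minimal sketch of a daily-to-monthly conversion, the example here uses a synthetic daily series as a stand-in. The DataFrame daily_df, its visitors column, and all of its values are assumptions made for illustration, so the row counts it prints will not match figures reported for any other dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a daily dataset: one row per day for a full year
dates = pd.date_range("2023-01-01", "2023-12-31", freq="D")
rng = np.random.default_rng(42)
daily_df = pd.DataFrame(
    {"visitors": rng.integers(100, 500, len(dates))},  # made-up values
    index=dates,
)

# "MS" groups rows by calendar month and labels each group
# with the first day of that month
monthly_df = daily_df.resample("MS").mean()

print(monthly_df.head())
print(len(daily_df), "daily rows ->", len(monthly_df), "monthly rows")
```

Note that resample() requires a datetime-like index (or a datetime column passed through the on parameter), and that it must be followed by an aggregation such as .mean() before it returns a regular DataFrame.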

As the output below indicates, the resampled data contains 95% fewer rows than the original data.

More information about the pandas resample() method can be found in the official documentation.