Enhanced week over week models

Week over week model is a useful tool in business applications, where time series exhibits strong weekly seasonality. It’s fast and somewhat accurate. Typical drawbacks of week over week models include not adapting to seasonality (e.g. year-end), fast growth and holiday effects. Also, week over week model is vulnerable to corrupted data such as outliers on last week.

Using aggregated lags such like week over 3 weeks median is more robust to data corruption, but the growth/seasonality/holiday issue is not resolved.

The enhanced version of week over week model fits a two-step model with the MultistageForecast method in Greykite. It first uses a Silverkite model to learn the growth, yearly seasonality and holiday effects. Then it uses a week over week or other lag-based models to model the residual weekly patterns.

In this example, we will learn how to do the original week over week type models and how to use the enhanced versions.

The regular week over week models

Greykite supports the regular lag-based models through LagBasedTemplate. To see a general introduction of how to use model templates, see model templates.

Lag-based methods are invoked by specifying the LAG_BASED model template.

37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
 import warnings

 import pandas as pd
 from greykite.common.data_loader import DataLoader
 from greykite.common.aggregation_function_enum import AggregationFunctionEnum
 from greykite.common import constants as cst
 from greykite.framework.templates.forecaster import Forecaster
 from greykite.framework.templates.model_templates import ModelTemplateEnum
 from greykite.framework.templates.autogen.forecast_config import ForecastConfig
 from greykite.framework.templates.autogen.forecast_config import MetadataParam
 from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
 from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
 from greykite.framework.templates.multistage_forecast_template import MultistageForecastTemplateConfig
 from greykite.sklearn.estimator.lag_based_estimator import LagUnitEnum

 warnings.filterwarnings("ignore")

 df = DataLoader().load_peyton_manning()
 df[cst.TIME_COL] = pd.to_datetime(df[cst.TIME_COL])

We specify the data set and evaluation parameters below. First, we don’t specify model components. In this case, the default behavior for LAG_BASED model template is the week over week model. If the forecast horizon is longer than a week, the model will use the forecasted value to generate further forecasts.

65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
 metadata = MetadataParam(
     time_col=cst.TIME_COL,
     value_col=cst.VALUE_COL,
     freq="D"
 )

 # Turn off cv and test for faster run.
 evaluation = EvaluationPeriodParam(
     cv_max_splits=0,
     test_horizon=0
 )

 config = ForecastConfig(
     forecast_horizon=7,
     model_template=ModelTemplateEnum.LAG_BASED.name,
     metadata_param=metadata,
     evaluation_period_param=evaluation
 )

 forecaster = Forecaster()
 result = forecaster.run_forecast_config(
     df=df,
     config=config
 )

This is the simple week over week estimation. If we print the results, we can see that the predictions are exactly the same as the last week’s observations.

ts actual forecast
2950 2016-01-07 8.295798 8.004700
2951 2016-01-08 8.290293 7.589336
2952 2016-01-09 7.785721 7.825245
2953 2016-01-10 8.281724 8.249314
2954 2016-01-11 8.470730 9.295141
2955 2016-01-12 8.135054 8.568266
2956 2016-01-13 8.067149 8.352554
2957 2016-01-14 8.023552 8.295798
2958 2016-01-15 8.021913 8.290293
2959 2016-01-16 7.817223 7.785721
2960 2016-01-17 9.273878 8.281724
2961 2016-01-18 10.333775 8.470730
2962 2016-01-19 9.125871 8.135054
2963 2016-01-20 8.891374 8.067149


In general, the lag-based method supports any aggregation of any lag combinations. Now let’s use an example to demonstrate how to do a week-over-3-week median estimation. We override the parameters in ModelComponentsParam.custom dictionary. The parameters that can be customized are

  • lag_unit: the unit of the lags. Options are in LagUnitEnum.

  • lags: a list of integers indicating the lags in lag_unit.

  • agg_func: the aggregation function name. Options are in AggregationFunctionEnum.

  • agg_func_params: a dictionary of parameters to be passed to the aggregation function.

Specifying the following, the forecasts will become week-over-3-week median.

114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
 model_components = ModelComponentsParam(
     custom=dict(
         lag_unit=LagUnitEnum.week.name,                 # unit is "week"
         lags=[1, 2, 3],                                 # lags are 1 week, 2 weeks and 3 weeks
         agg_func=AggregationFunctionEnum.median.name    # aggregation function is "median"
     )
 )

 config = ForecastConfig(
     forecast_horizon=7,
     model_template=ModelTemplateEnum.LAG_BASED.name,
     metadata_param=metadata,
     evaluation_period_param=evaluation,
     model_components_param=model_components
 )

 forecaster = Forecaster()
 result = forecaster.run_forecast_config(
     df=df,
     config=config
 )

 result.forecast.df_train.tail(14)
ts actual forecast
2950 2016-01-07 8.295798 7.591862
2951 2016-01-08 8.290293 7.528869
2952 2016-01-09 7.785721 7.171657
2953 2016-01-10 8.281724 8.249314
2954 2016-01-11 8.470730 9.250618
2955 2016-01-12 8.135054 8.568266
2956 2016-01-13 8.067149 8.352554
2957 2016-01-14 8.023552 8.004700
2958 2016-01-15 8.021913 7.589336
2959 2016-01-16 7.817223 7.785721
2960 2016-01-17 9.273878 8.281724
2961 2016-01-18 10.333775 9.250618
2962 2016-01-19 9.125871 8.568266
2963 2016-01-20 8.891374 8.352554


The enhanced week over week model

The enhanced week over week model consists of a two-stage model:

  • "Silverkite model": the first stage uses a Silverkite model to learn the yearly seasonality, growth and holiday effects.

  • "Lag-based model": the second stage uses a lag-based model to learn the residual effects including weekly seasonality.

The model is available through the MultistageForecastTemplate. For details about the multistage forecast model, see multistage forecast.

To use this two-stage enhanced lag model, specify the model template as SILVERKITE_WOW. The default behavior is to model growth, yearly seasonality and holidays with the automatically inferred parameters from the time series. Then it models the residual with a week over week model.

159
160
161
162
163
164
165
166
167
168
169
170
171
172
 config = ForecastConfig(
     forecast_horizon=7,
     model_template=ModelTemplateEnum.SILVERKITE_WOW.name,
     metadata_param=metadata,
     evaluation_period_param=evaluation
 )

 forecaster = Forecaster()
 result = forecaster.run_forecast_config(
     df=df,
     config=config
 )

 result.forecast.df_train.tail(14)
ts actual forecast
2950 2016-01-07 8.295798 8.217895
2951 2016-01-08 8.290293 8.348070
2952 2016-01-09 7.785721 8.383166
2953 2016-01-10 8.281724 8.438601
2954 2016-01-11 8.470730 9.474756
2955 2016-01-12 8.135054 8.737663
2956 2016-01-13 8.067149 8.511306
2957 2016-01-14 8.023552 8.443560
2958 2016-01-15 8.021913 8.426756
2959 2016-01-16 7.817223 7.401877
2960 2016-01-17 9.273878 8.982437
2961 2016-01-18 10.333775 10.008520
2962 2016-01-19 9.125871 8.558125
2963 2016-01-20 8.891374 8.179056


You may notice that the forecast is not exactly the observations a week ago, because the Silverkite model did some adjustments on the growth, yearly seasonality and holidays.

To override the model parameters, we will follow the rules mentioned in multistage forecast. For each stage of model, if you would like to just change one parameter and keep the other parameters the same, you can specify the same model template for the stage as in SILVERKITE_WOW (they are SILVERKITE_EMPTY and LAG_BASED), and specify a model components object to override the specific parameter. Otherwise, you can specify a new model template. The code below overrides both the Silverkite model and the lag model. In the first stage, it keeps the original configuration but forces turning yearly seasonality off. In the second stage, it uses week-over-3-week median instead of wow model.

191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
 model_components = ModelComponentsParam(
     custom=dict(
         multistage_forecast_configs=[
             MultistageForecastTemplateConfig(
                 train_length="1096D",
                 fit_length=None,
                 agg_func="nanmean",
                 agg_freq="D",
                 # Keeps it the same as the model template in `SILVERKITE_WOW` to override selected parameters below
                 model_template=ModelTemplateEnum.SILVERKITE_EMPTY.name,
                 # Since the model template in this stage is the same as the model template in `SILVERKITE_WOW`,
                 # the parameter below will be applied on top of the existing parameters.
                 model_components=ModelComponentsParam(
                     seasonality={
                         "yearly_seasonality": False  # force turning off yearly seasonality
                     }
                 )
             ),
             MultistageForecastTemplateConfig(
                 train_length="28D",  # any value longer than the lags (21D here)
                 fit_length=None,  # keep as None
                 agg_func="nanmean",
                 agg_freq=None,
                 # Keeps it the same as the model template in `SILVERKITE_WOW` to override selected parameters below
                 model_template=ModelTemplateEnum.LAG_BASED.name,
                 # Since the model template in this stage is the same as the model template in `SILVERKITE_WOW`,
                 # the parameter below will be applied on top of the existing parameters.
                 model_components=ModelComponentsParam(
                     custom={
                         "lags": [1, 2, 3],  # changes to 3 weeks' median, default unit is "week",
                         "lag_unit": LagUnitEnum.week.name,
                         "agg_func": AggregationFunctionEnum.median.name,  # changes to 3 weeks' median
                     }
                 )
             )
         ]
     )
 )

 config = ForecastConfig(
     forecast_horizon=7,
     model_template=ModelTemplateEnum.LAG_BASED.name,
     metadata_param=metadata,
     evaluation_period_param=evaluation,
     model_components_param=model_components
 )

 forecaster = Forecaster()
 result = forecaster.run_forecast_config(
     df=df,
     config=config
 )

 result.forecast.df_train.tail(14)
ts actual forecast
2950 2016-01-07 8.295798 8.004700
2951 2016-01-08 8.290293 7.589336
2952 2016-01-09 7.785721 7.825245
2953 2016-01-10 8.281724 8.249314
2954 2016-01-11 8.470730 9.295141
2955 2016-01-12 8.135054 8.568266
2956 2016-01-13 8.067149 8.352554
2957 2016-01-14 8.023552 8.295798
2958 2016-01-15 8.021913 8.290293
2959 2016-01-16 7.817223 7.785721
2960 2016-01-17 9.273878 8.281724
2961 2016-01-18 10.333775 8.470730
2962 2016-01-19 9.125871 8.135054
2963 2016-01-20 8.891374 8.067149


Gallery generated by Sphinx-Gallery