Enhanced week over week models

A week over week model is a useful tool in business applications where the time series exhibits strong weekly seasonality. It is fast and reasonably accurate. Its typical drawbacks are that it does not adapt to longer-term seasonality (e.g. year-end), fast growth, or holiday effects. It is also vulnerable to corrupted data, such as outliers in the last week.

Using aggregated lags, such as the week-over-3-weeks median, is more robust to data corruption, but it does not resolve the growth, seasonality and holiday issues.
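To make the robustness point concrete, here is a minimal plain-Python sketch. It is not Greykite code: the helper `lag_forecast` and the toy series are hypothetical. It compares a week over week forecast with a week-over-3-weeks median when last week's observation is corrupted.

```python
from statistics import median


def lag_forecast(history, lags_in_weeks, agg=median, period=7):
    """Forecast the next value by aggregating observations at the given weekly lags.

    Hypothetical helper for illustration only.
    """
    next_index = len(history)
    # A lag of k weeks points back k * period steps from the next time step.
    lagged = [history[next_index - k * period] for k in lags_in_weeks]
    return agg(lagged)


# Four weeks of a made-up daily series with a stable weekly pattern.
week = [10.0, 11.0, 12.0, 13.0, 14.0, 20.0, 21.0]
history = week * 4
# Corrupt the observation exactly one week before the next time step.
history[-7] = 500.0

wow = lag_forecast(history, [1])          # week over week: copies the outlier
wo3w = lag_forecast(history, [1, 2, 3])   # median over 3 weeks: ignores it
print(wow, wo3w)
```

The week over week forecast reproduces the outlier, while the median over three weekly lags returns the uncorrupted value. Neither forecast would follow a trend if the series were growing.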

The enhanced version of the week over week model fits a two-step model with the MultistageForecast method in Greykite. It first uses a Silverkite model to learn the growth, yearly seasonality and holiday effects. It then uses a week over week or other lag-based model to capture the residual weekly patterns.

In this example, we will learn how to fit the original week over week type models and how to use the enhanced versions.

The regular week over week models

Greykite supports the regular lag-based models through LagBasedTemplate. For a general introduction to model templates, see model templates.

Lag-based methods are invoked by specifying the LAG_BASED model template.

import warnings

import pandas as pd
from greykite.common.data_loader import DataLoader
from greykite.common.aggregation_function_enum import AggregationFunctionEnum
from greykite.common import constants as cst
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
from greykite.framework.templates.multistage_forecast_template import MultistageForecastTemplateConfig
from greykite.sklearn.estimator.lag_based_estimator import LagUnitEnum

warnings.filterwarnings("ignore")

df = DataLoader().load_peyton_manning()
df[cst.TIME_COL] = pd.to_datetime(df[cst.TIME_COL])

We specify the data set and evaluation parameters below. First, we don’t specify model components. In this case, the default behavior of the LAG_BASED model template is the week over week model. If the forecast horizon is longer than a week, the model uses its own forecasted values to generate further forecasts.

metadata = MetadataParam(
    time_col=cst.TIME_COL,
    value_col=cst.VALUE_COL,
    freq="D"
)

# Turn off cv and test for faster run.
evaluation = EvaluationPeriodParam(
    cv_max_splits=0,
    test_horizon=0
)

config = ForecastConfig(
    forecast_horizon=7,
    model_template=ModelTemplateEnum.LAG_BASED.name,
    metadata_param=metadata,
    evaluation_period_param=evaluation
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config
)

This is the simple week over week estimation. If we print the last 14 rows of result.forecast.df_train, we can see that the predictions are exactly the same as the previous week’s observations.

             ts     actual  forecast
2950 2016-01-07   8.295798  8.004700
2951 2016-01-08   8.290293  7.589336
2952 2016-01-09   7.785721  7.825245
2953 2016-01-10   8.281724  8.249314
2954 2016-01-11   8.470730  9.295141
2955 2016-01-12   8.135054  8.568266
2956 2016-01-13   8.067149  8.352554
2957 2016-01-14   8.023552  8.295798
2958 2016-01-15   8.021913  8.290293
2959 2016-01-16   7.817223  7.785721
2960 2016-01-17   9.273878  8.281724
2961 2016-01-18  10.333775  8.470730
2962 2016-01-19   9.125871  8.135054
2963 2016-01-20   8.891374  8.067149
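As noted earlier, when the horizon is longer than a week the model feeds its own forecasts back in as lagged values. The following is a minimal plain-Python sketch of that recursion, not Greykite's implementation. The helper name `week_over_week_forecast` is hypothetical, and the history is the last week of actuals copied from the printed rows.

```python
def week_over_week_forecast(history, horizon, period=7):
    """Recursively forecast `horizon` steps by copying the value `period` steps back."""
    extended = list(history)
    for _ in range(horizon):
        # Past the first forecast week, extended[-period] is itself a forecast.
        extended.append(extended[-period])
    return extended[len(history):]


# Actuals for 2016-01-14 through 2016-01-20, copied from the printed rows.
last_week = [8.023552, 8.021913, 7.817223, 9.273878, 10.333775, 9.125871, 8.891374]

forecast = week_over_week_forecast(last_week, horizon=14)
first_week, second_week = forecast[:7], forecast[7:]
print(first_week == last_week, second_week == last_week)
```

Both forecast weeks equal the last observed week, so any outlier in that week is carried through the entire horizon.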


In general, the lag-based method supports any aggregation of any combination of lags. Now let’s use an example to demonstrate how to set up a week-over-3-week median estimation. We override the parameters in the ModelComponentsParam.custom dictionary. The parameters that can be customized are:

  • lag_unit: the unit of the lags. Options are in LagUnitEnum.

  • lags: a list of integers indicating the lags in lag_unit.

  • agg_func: the aggregation function name. Options are in AggregationFunctionEnum.

  • agg_func_params: a dictionary of parameters to be passed to the aggregation function.

With the following specification, the forecasts become the week-over-3-week median.

model_components = ModelComponentsParam(
    custom=dict(
        lag_unit=LagUnitEnum.week.name,                 # unit is "week"
        lags=[1, 2, 3],                                 # lags are 1 week, 2 weeks and 3 weeks
        agg_func=AggregationFunctionEnum.median.name    # aggregation function is "median"
    )
)

config = ForecastConfig(
    forecast_horizon=7,
    model_template=ModelTemplateEnum.LAG_BASED.name,
    metadata_param=metadata,
    evaluation_period_param=evaluation,
    model_components_param=model_components
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config
)

result.forecast.df_train.tail(14)
             ts     actual  forecast
2950 2016-01-07   8.295798  7.591862
2951 2016-01-08   8.290293  7.528869
2952 2016-01-09   7.785721  7.171657
2953 2016-01-10   8.281724  8.249314
2954 2016-01-11   8.470730  9.250618
2955 2016-01-12   8.135054  8.568266
2956 2016-01-13   8.067149  8.352554
2957 2016-01-14   8.023552  8.004700
2958 2016-01-15   8.021913  7.589336
2959 2016-01-16   7.817223  7.785721
2960 2016-01-17   9.273878  8.281724
2961 2016-01-18  10.333775  9.250618
2962 2016-01-19   9.125871  8.568266
2963 2016-01-20   8.891374  8.352554


The enhanced week over week model

The enhanced week over week model consists of a two-stage model:

  • "Silverkite model": the first stage uses a Silverkite model to learn the yearly seasonality, growth and holiday effects.

  • "Lag-based model": the second stage uses a lag-based model to learn the residual effects including weekly seasonality.
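The two stages can be sketched in plain Python. This is only an illustration under strong assumptions: a least-squares straight line stands in for the Silverkite stage (which in Greykite also models yearly seasonality and holidays), and the helper names and synthetic series are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (intercept, slope). Stand-in for the first-stage model.
    """
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return mean_y - slope * mean_t, slope


def two_stage_forecast(history, horizon, period=7):
    # Stage 1: learn the growth and remove it from the series.
    intercept, slope = fit_linear_trend(history)
    residuals = [y - (intercept + slope * t) for t, y in enumerate(history)]
    # Stage 2: week over week on the residuals, applied recursively.
    for _ in range(horizon):
        residuals.append(residuals[-period])
    n = len(history)
    # Final forecast: extrapolated trend plus forecasted residual.
    return [intercept + slope * t + residuals[t] for t in range(n, n + horizon)]


# Synthetic series: linear growth of 0.5 per day plus a weekly pattern.
weekly = [0.0, 1.0, 2.0, 1.0, 0.0, -2.0, -2.0]
history = [100.0 + 0.5 * t + weekly[t % 7] for t in range(56)]
truth = [100.0 + 0.5 * t + weekly[t % 7] for t in range(56, 63)]

plain_wow = history[-7:]                  # copies last week, ignores growth
enhanced = two_stage_forecast(history, 7)

plain_error = max(abs(f - y) for f, y in zip(plain_wow, truth))
enhanced_error = max(abs(f - y) for f, y in zip(enhanced, truth))
print(plain_error, enhanced_error)
```

On this synthetic series the plain week over week forecast lags the truth by a full week of growth (3.5), while the two-stage forecast is much closer because the trend is extrapolated before the weekly residual is added back.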

The model is available through the MultistageForecastTemplate. For details about the multistage forecast model, see multistage forecast.

To use this two-stage enhanced lag model, specify the model template as SILVERKITE_WOW. By default, it models growth, yearly seasonality and holidays with parameters automatically inferred from the time series. It then models the residuals with a week over week model.

config = ForecastConfig(
    forecast_horizon=7,
    model_template=ModelTemplateEnum.SILVERKITE_WOW.name,
    metadata_param=metadata,
    evaluation_period_param=evaluation
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config
)

result.forecast.df_train.tail(14)
             ts     actual  forecast
2950 2016-01-07   8.295798  8.232482
2951 2016-01-08   8.290293  7.812007
2952 2016-01-09   7.785721  8.041193
2953 2016-01-10   8.281724  8.457185
2954 2016-01-11   8.470730  9.493694
2955 2016-01-12   8.135054  8.756363
2956 2016-01-13   8.067149  8.529145
2957 2016-01-14   8.023552  8.459905
2958 2016-01-15   8.021913  8.440993
2959 2016-01-16   7.817223  7.922130
2960 2016-01-17   9.273878  8.402977
2961 2016-01-18  10.333775  9.993372
2962 2016-01-19   9.125871  8.223415
2963 2016-01-20   8.891374  8.137760


You may notice that the forecast no longer equals the observations from a week ago, because the Silverkite model adjusts for growth, yearly seasonality and holidays.
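The size of that adjustment can be read off the printed rows. The sketch below subtracts the actual value one week earlier from each SILVERKITE_WOW forecast. The numbers are copied from the output in this example, and the variable names are only illustrative.

```python
# Values copied from the printed SILVERKITE_WOW output in this example.
dates = [
    "2016-01-14", "2016-01-15", "2016-01-16", "2016-01-17",
    "2016-01-18", "2016-01-19", "2016-01-20",
]
# Actuals one week earlier (2016-01-07 through 2016-01-13).
actual_week_before = [8.295798, 8.290293, 7.785721, 8.281724, 8.470730, 8.135054, 8.067149]
# SILVERKITE_WOW forecasts for the dates above.
enhanced_forecast = [8.459905, 8.440993, 7.922130, 8.402977, 9.993372, 8.223415, 8.137760]

# Difference between the enhanced forecast and a plain week over week forecast.
adjustments = {
    date: round(forecast - lagged, 6)
    for date, forecast, lagged in zip(dates, enhanced_forecast, actual_week_before)
}
largest = max(adjustments, key=adjustments.get)

for date, value in adjustments.items():
    print(date, value)
print("largest adjustment on", largest)
```

Every adjustment in this week is positive, and by far the largest one falls on 2016-01-18. That date was Martin Luther King Jr. Day in the US, so it plausibly reflects a holiday effect learned in the first stage.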

To override the model parameters, we follow the rules described in multistage forecast. For each stage, if you would like to change just one parameter and keep the others the same, specify the same model template for that stage as in SILVERKITE_WOW (SILVERKITE_EMPTY for the first stage and LAG_BASED for the second), and pass a model components object that overrides the specific parameter. Otherwise, you can specify a new model template. The code below overrides both the Silverkite model and the lag model. In the first stage, it keeps the original configuration but forces yearly seasonality off. In the second stage, it uses the week-over-3-week median instead of the week over week model.

model_components = ModelComponentsParam(
    custom=dict(
        multistage_forecast_configs=[
            MultistageForecastTemplateConfig(
                train_length="1096D",
                fit_length=None,
                agg_func="nanmean",
                agg_freq="D",
                # Keeps it the same as the model template in `SILVERKITE_WOW` to override selected parameters below
                model_template=ModelTemplateEnum.SILVERKITE_EMPTY.name,
                # Since the model template in this stage is the same as the model template in `SILVERKITE_WOW`,
                # the parameter below will be applied on top of the existing parameters.
                model_components=ModelComponentsParam(
                    seasonality={
                        "yearly_seasonality": False  # force turning off yearly seasonality
                    }
                )
            ),
            MultistageForecastTemplateConfig(
                train_length="28D",  # any value longer than the lags (21D here)
                fit_length=None,  # keep as None
                agg_func="nanmean",
                agg_freq=None,
                # Keeps it the same as the model template in `SILVERKITE_WOW` to override selected parameters below
                model_template=ModelTemplateEnum.LAG_BASED.name,
                # Since the model template in this stage is the same as the model template in `SILVERKITE_WOW`,
                # the parameter below will be applied on top of the existing parameters.
                model_components=ModelComponentsParam(
                    custom={
                        "lags": [1, 2, 3],  # changes to 3 weeks' median, default unit is "week",
                        "lag_unit": LagUnitEnum.week.name,
                        "agg_func": AggregationFunctionEnum.median.name,  # changes to 3 weeks' median
                    }
                )
            )
        ]
    )
)

config = ForecastConfig(
    forecast_horizon=7,
    # The stage configs above override `SILVERKITE_WOW`, so that template is named here
    # (not `LAG_BASED`, which is the single-stage week over week template).
    model_template=ModelTemplateEnum.SILVERKITE_WOW.name,
    metadata_param=metadata,
    evaluation_period_param=evaluation,
    model_components_param=model_components
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config
)

result.forecast.df_train.tail(14)

