Enhanced week over week models
The week over week model is a useful tool in business applications where the time series exhibits strong weekly seasonality. It is fast and somewhat accurate. Typical drawbacks of week over week models include not adapting to longer-term seasonality (e.g. year-end), fast growth, and holiday effects. The week over week model is also vulnerable to corrupted data, such as outliers in the last week.
Using aggregated lags, such as the week-over-3-week median, is more robust to data corruption, but it does not resolve the growth, seasonality, and holiday issues.
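To make the robustness claim concrete, here is a minimal sketch in plain Python. It is not Greykite code: the toy series and the `lag_forecast` helper are made up for illustration. It compares a week over week forecast with a week-over-3-week median when last week contains an outlier.

```python
from statistics import median

# Hypothetical daily series: 4 weeks with a clean weekly pattern.
weekly_pattern = [10, 12, 11, 13, 15, 20, 18]
history = weekly_pattern * 4

# Corrupt the observation that sits exactly one week before the next point.
history[-7] = 500


def lag_forecast(series, lags, period=7, agg=median):
    """Forecast the next point by aggregating the values at the given weekly lags."""
    return agg(series[len(series) - lag * period] for lag in lags)


wow = lag_forecast(history, lags=[1])         # week over week
wo3w = lag_forecast(history, lags=[1, 2, 3])  # week-over-3-week median

print(wow)   # 500: the outlier is copied straight into the forecast
print(wo3w)  # 10: the median ignores the single corrupted week
```

Neither forecast reacts to growth or holidays, which is the gap the enhanced model is meant to close.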
The enhanced version of the week over week model fits a two-step model with the MultistageForecast method in Greykite. It first uses a Silverkite model to learn the growth, yearly seasonality and holiday effects. Then it uses a week over week or other lag-based model to model the residual weekly patterns.
In this example, we will learn how to use the original week over week type models and their enhanced versions.
The regular week over week models
Greykite supports the regular lag-based models through LagBasedTemplate. To see a general introduction of how to use model templates, see model templates. Lag-based methods are invoked by specifying the LAG_BASED model template.
import warnings

import pandas as pd
from greykite.common.data_loader import DataLoader
from greykite.common.aggregation_function_enum import AggregationFunctionEnum
from greykite.common import constants as cst
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
from greykite.framework.templates.multistage_forecast_template import MultistageForecastTemplateConfig
from greykite.sklearn.estimator.lag_based_estimator import LagUnitEnum

warnings.filterwarnings("ignore")

# Load the example data and parse the time column as datetimes.
df = DataLoader().load_peyton_manning()
df[cst.TIME_COL] = pd.to_datetime(df[cst.TIME_COL])
We specify the data set and evaluation parameters below. First, we don’t specify model components. In this case, the default behavior of the LAG_BASED model template is the week over week model. If the forecast horizon is longer than a week, the model uses its own forecasted values to generate further forecasts.
metadata = MetadataParam(
    time_col=cst.TIME_COL,
    value_col=cst.VALUE_COL,
    freq="D"
)

# Turn off cv and test for faster run.
evaluation = EvaluationPeriodParam(
    cv_max_splits=0,
    test_horizon=0
)

config = ForecastConfig(
    forecast_horizon=7,
    model_template=ModelTemplateEnum.LAG_BASED.name,
    metadata_param=metadata,
    evaluation_period_param=evaluation
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config
)
This is the simple week over week estimation. If we print the results, we can see that the predictions are exactly the same as last week’s observations.
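As noted earlier, a horizon longer than a week forces the model to reuse its own forecasts as inputs. The recursion can be sketched in plain Python. This illustrates the idea under simple assumptions and is not Greykite's implementation; `wow_forecast` is a hypothetical helper.

```python
def wow_forecast(history, horizon, period=7):
    """Week over week forecast: each step copies the value from `period` steps earlier.

    Beyond one period, the value `period` steps earlier is itself a forecast,
    so forecasts are fed back in as inputs.
    """
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    extended = list(history)
    for _ in range(horizon):
        extended.append(extended[-period])
    return extended[len(history):]


last_week = [10, 12, 11, 13, 15, 20, 18]
forecast = wow_forecast(last_week, horizon=10)
print(forecast)  # [10, 12, 11, 13, 15, 20, 18, 10, 12, 11]
```

The first seven values copy the observed week. The last three copy forecasts, so any error in the observed week repeats in every later week of the horizon.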
In general, the lag-based method supports any aggregation of any lag combinations. Now let’s use an example to demonstrate how to do a week-over-3-week median estimation. We override the parameters in the ModelComponentsParam.custom dictionary.
The parameters that can be customized are:
lag_unit: the unit of the lags. Options are in LagUnitEnum.
lags: a list of integers indicating the lags in lag_unit.
agg_func: the aggregation function name. Options are in AggregationFunctionEnum.
agg_func_params: a dictionary of parameters to be passed to the aggregation function.
With the following specification, the forecast becomes the week-over-3-week median.
model_components = ModelComponentsParam(
    custom=dict(
        lag_unit=LagUnitEnum.week.name,  # unit is "week"
        lags=[1, 2, 3],  # lags are 1 week, 2 weeks and 3 weeks
        agg_func=AggregationFunctionEnum.median.name  # aggregation function is "median"
    )
)

config = ForecastConfig(
    forecast_horizon=7,
    model_template=ModelTemplateEnum.LAG_BASED.name,
    metadata_param=metadata,
    evaluation_period_param=evaluation,
    model_components_param=model_components
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config
)

result.forecast.df_train.tail(14)
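To illustrate how the four customizable parameters interact, the sketch below mimics the idea in plain Python. It is not Greykite's implementation: `AGG_FUNCS`, `UNIT_TO_DAYS`, `lag_based_forecast` and the `weights` argument are hypothetical names chosen for this example, and the data is assumed to be daily.

```python
from statistics import mean, median


def weighted_average(values, weights):
    """Weighted average; `weights` plays the role of an aggregation parameter."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical registry mapping aggregation names to functions.
AGG_FUNCS = {"mean": mean, "median": median, "weighted_average": weighted_average}
# Hypothetical mapping from lag unit to number of daily observations.
UNIT_TO_DAYS = {"day": 1, "week": 7}


def lag_based_forecast(series, lag_unit, lags, agg_func, agg_func_params=None):
    """One-step forecast for daily data: aggregate the observations at the given lags."""
    step = UNIT_TO_DAYS[lag_unit]
    n = len(series)
    lagged = [series[n - lag * step] for lag in lags]
    return AGG_FUNCS[agg_func](lagged, **(agg_func_params or {}))


# Three weeks of daily data; the target weekday was 10, 20, 60 (oldest to newest).
series = [10] + [0] * 6 + [20] + [0] * 6 + [60] + [0] * 6

median_fc = lag_based_forecast(series, "week", [1, 2, 3], "median")
weighted_fc = lag_based_forecast(
    series, "week", [1, 2, 3], "weighted_average",
    agg_func_params={"weights": [3, 2, 1]},  # heavier weight on the most recent week
)
print(median_fc, weighted_fc)
```

The lags select which past observations are used, the aggregation function combines them, and the extra parameters only matter for aggregations that accept them.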
The enhanced week over week model
The enhanced week over week model is a two-stage model:
"Silverkite model": the first stage uses a Silverkite model to learn the yearly seasonality, growth and holiday effects.
"Lag-based model": the second stage uses a lag-based model to learn the residual effects, including weekly seasonality.
The model is available through the MultistageForecastTemplate. For details about the multistage forecast model, see multistage forecast. To use this two-stage enhanced lag model, specify the model template as SILVERKITE_WOW. The default behavior is to model growth, yearly seasonality and holidays with parameters automatically inferred from the time series. It then models the residual with a week over week model.
config = ForecastConfig(
    forecast_horizon=7,
    model_template=ModelTemplateEnum.SILVERKITE_WOW.name,
    metadata_param=metadata,
    evaluation_period_param=evaluation
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config
)

result.forecast.df_train.tail(14)
Notice that the forecast is not exactly equal to the observations from a week ago, because the Silverkite model adjusts for growth, yearly seasonality and holidays.
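This adjustment can be reproduced with a toy two-stage model in plain Python. The sketch assumes a straight-line trend as a stand-in for the first stage (Silverkite fits a far richer model) and applies week over week to the residuals. All names are hypothetical.

```python
def fit_linear_trend(y):
    """Ordinary least squares fit of y = intercept + slope * t for t = 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return y_mean - slope * t_mean, slope


# Toy series: growth of 1 per day plus a fixed weekly pattern.
weekly_pattern = [0, 2, 1, 3, 5, 10, 8]
y = [t + weekly_pattern[t % 7] for t in range(28)]

# Stage 1: learn the growth and remove it.
intercept, slope = fit_linear_trend(y)
residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]

# Stage 2: week over week on the residuals, then add the extrapolated trend back.
horizon = 7
two_stage = [
    intercept + slope * t + residuals[t - 7] for t in range(28, 28 + horizon)
]
plain_wow = y[-7:]  # regular week over week: copy last week's observations

# Difference between the two forecasts for each day of the horizon.
print([round(f - p, 6) for f, p in zip(two_stage, plain_wow)])
```

In this toy setup each two-stage forecast equals last week's observation plus seven days of estimated growth, which is why it no longer matches the plain week over week forecast.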
To override the model parameters, we follow the rules described in multistage forecast. For each stage, if you would like to change just one parameter and keep the others the same, specify the same model template for the stage as in SILVERKITE_WOW (they are SILVERKITE_EMPTY and LAG_BASED), and specify a model components object to override the specific parameter. Otherwise, you can specify a new model template. The code below overrides both the Silverkite model and the lag model. In the first stage, it keeps the original configuration but forces yearly seasonality off. In the second stage, it uses the week-over-3-week median instead of the week over week model.
model_components = ModelComponentsParam(
    custom=dict(
        multistage_forecast_configs=[
            MultistageForecastTemplateConfig(
                train_length="1096D",
                fit_length=None,
                agg_func="nanmean",
                agg_freq="D",
                # Keeps it the same as the model template in `SILVERKITE_WOW` to override selected parameters below
                model_template=ModelTemplateEnum.SILVERKITE_EMPTY.name,
                # Since the model template in this stage is the same as the model template in `SILVERKITE_WOW`,
                # the parameter below will be applied on top of the existing parameters.
                model_components=ModelComponentsParam(
                    seasonality={
                        "yearly_seasonality": False  # force turning off yearly seasonality
                    }
                )
            ),
            MultistageForecastTemplateConfig(
                train_length="28D",  # any value longer than the lags (21D here)
                fit_length=None,  # keep as None
                agg_func="nanmean",
                agg_freq=None,
                # Keeps it the same as the model template in `SILVERKITE_WOW` to override selected parameters below
                model_template=ModelTemplateEnum.LAG_BASED.name,
                # Since the model template in this stage is the same as the model template in `SILVERKITE_WOW`,
                # the parameter below will be applied on top of the existing parameters.
                model_components=ModelComponentsParam(
                    custom={
                        "lags": [1, 2, 3],  # lags of 1, 2 and 3 weeks
                        "lag_unit": LagUnitEnum.week.name,  # unit is "week"
                        "agg_func": AggregationFunctionEnum.median.name,  # changes to 3 weeks' median
                    }
                )
            )
        ]
    )
)

config = ForecastConfig(
    forecast_horizon=7,
    # The multistage overrides above apply to the two-stage template, so use `SILVERKITE_WOW` here.
    model_template=ModelTemplateEnum.SILVERKITE_WOW.name,
    metadata_param=metadata,
    evaluation_period_param=evaluation,
    model_components_param=model_components
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config
)

result.forecast.df_train.tail(14)