Example for monthly data

This is a basic example for monthly data using Silverkite. Note that we only fit a few simple models here; the goal is not to fully optimize the results.

 import warnings
 from collections import defaultdict

 import plotly
 import pandas as pd

 from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
 from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
 from greykite.framework.templates.autogen.forecast_config import ForecastConfig
 from greykite.framework.templates.autogen.forecast_config import MetadataParam
 from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
 from greykite.framework.templates.forecaster import Forecaster
 from greykite.framework.templates.model_templates import ModelTemplateEnum
 from greykite.framework.utils.result_summary import summarize_grid_search_results
 from greykite.framework.input.univariate_time_series import UnivariateTimeSeries

 warnings.filterwarnings("ignore")

Loads the dataset into a UnivariateTimeSeries.

 dl = DataLoaderTS()
 agg_func = {"count": "sum"}
 df = dl.load_bikesharing(agg_freq="monthly", agg_func=agg_func)
 # The last month of this monthly dataset is incomplete, therefore we drop it
 df.drop(df.tail(1).index, inplace=True)
 df = df.reset_index(drop=True)
 ts = UnivariateTimeSeries()
 ts.load_data(
     df=df,
     time_col="ts",
     value_col="count",
     freq="MS")

Out:

<greykite.framework.input.univariate_time_series.UnivariateTimeSeries object at 0x19ed3f3d0>

Exploratory data analysis (EDA)

After reading in a time series, we can first do some exploratory data analysis. The UnivariateTimeSeries class is used to store a time series and perform EDA.

A quick description of the data can be obtained as follows.

 print(ts.describe_time_col())
 print(ts.describe_value_col())
 print(df.head())

Out:

{'data_points': 108, 'mean_increment_secs': 2629143.925233645, 'min_timestamp': Timestamp('2010-09-01 00:00:00'), 'max_timestamp': Timestamp('2019-08-01 00:00:00')}
count       108.000000
mean     231254.101852
std      106017.804606
min        4001.000000
25%      144661.750000
50%      227332.000000
75%      327851.250000
max      404811.000000
Name: y, dtype: float64
          ts  count
0 2010-09-01   4001
1 2010-10-01  35949
2 2010-11-01  47391
3 2010-12-01  28253
4 2011-01-01  37499

Let’s plot the original time series. (The interactive plot is generated by plotly: click to zoom!)

 fig = ts.plot()
 plotly.io.show(fig)

Exploratory plots can reveal the properties of the time series. A monthly overlay plot can be used to inspect the annual patterns; it overlays the various years on top of each other.

 fig = ts.plot_quantiles_and_overlays(
      groupby_time_feature="month",
      show_mean=False,
      show_quantiles=False,
      show_overlays=True,
      overlay_label_time_feature="year",
      overlay_style={"line": {"width": 1}, "opacity": 0.5},
      center_values=False,
      xlabel="month of year",
      ylabel=ts.original_value_col,
      title="yearly seasonality for each year (centered)",)
 plotly.io.show(fig)
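
For comparison, the same plot_quantiles_and_overlays call can summarize the seasonal shape instead of emphasizing individual years. The following sketch is a variation on the call above, using only parameters already shown there; it draws the across-year mean and centers the values so the seasonal effect is easier to read.

 # Sketch: summarize the seasonal pattern rather than each individual year.
 fig = ts.plot_quantiles_and_overlays(
      groupby_time_feature="month",
      show_mean=True,        # add the mean across years for each month
      show_quantiles=False,
      show_overlays=True,
      overlay_label_time_feature="year",
      overlay_style={"line": {"width": 1}, "opacity": 0.5},
      center_values=True,    # center values to remove the overall level
      xlabel="month of year",
      ylabel=ts.original_value_col,
      title="centered yearly seasonality by month")
 plotly.io.show(fig)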

Specify common metadata.

 forecast_horizon = 4
 time_col = "ts"
 value_col = "count"
 meta_data_params = MetadataParam(
     time_col=time_col,
     value_col=value_col,
     freq="MS",
 )

Specify common evaluation parameters. Set minimum input data for training.

 cv_min_train_periods = 24
 # Let CV use most recent splits for cross-validation.
 cv_use_most_recent_splits = True
 # Determine the maximum number of validations.
 cv_max_splits = 5
 evaluation_period_param = EvaluationPeriodParam(
     test_horizon=forecast_horizon,
     cv_horizon=forecast_horizon,
     periods_between_train_test=0,
     cv_min_train_periods=cv_min_train_periods,
     cv_expanding_window=True,
     cv_use_most_recent_splits=cv_use_most_recent_splits,
     cv_periods_between_splits=None,
     cv_periods_between_train_test=0,
     cv_max_splits=cv_max_splits,
 )

Fit a simple model without autoregression. The important modeling parameters for monthly data are as follows; these are plugged into ModelComponentsParam. The extra_pred_cols argument specifies the growth and annual seasonality terms. Growth is modeled with both "ct_sqrt" and "ct1" for extra flexibility, since we have long-term data, and ridge regularization will avoid over-fitting the trend. The annual seasonality is modeled categorically with "C(month)" instead of a Fourier series: monthly data has only 12 points per year, as opposed to daily data where there are many points within a year, so a categorical month effect is feasible and a Fourier series is unnecessary. The categorical representation of month is also more explainable/interpretable in the model summary.
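
To see what the categorical term contributes to the design matrix, here is a small sketch outside the example; it assumes patsy-style formula semantics for extra_pred_cols, and the toy frame df_demo is made up for illustration. The term expands into one indicator column per month beyond the reference level, which is why the model summary below has coefficients for months 2 through 12.

 import pandas as pd
 from patsy import dmatrix

 # Toy frame with a month-number column, only to illustrate the expansion.
 df_demo = pd.DataFrame({"month": [1, 2, 6, 12]})
 # Expand the categorical term the same way a model formula would.
 X = dmatrix("C(month, levels=list(range(1, 13)))", df_demo, return_type="dataframe")
 print(X.columns.tolist())  # Intercept plus 11 month indicator columns (levels 2-12)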

 extra_pred_cols = ["ct_sqrt", "ct1", "C(month, levels=list(range(1, 13)))"]
 autoregression = None

 # Specify the model parameters
 model_components = ModelComponentsParam(
     growth=dict(growth_term=None),
     seasonality=dict(
         yearly_seasonality=[False],
         quarterly_seasonality=[False],
         monthly_seasonality=[False],
         weekly_seasonality=[False],
         daily_seasonality=[False]
     ),
     custom=dict(
         fit_algorithm_dict=dict(fit_algorithm="ridge"),
         extra_pred_cols=extra_pred_cols
     ),
     regressors=dict(regressor_cols=None),
     autoregression=autoregression,
     uncertainty=dict(uncertainty_dict=None),
     events=dict(holiday_lookup_countries=None),
 )

 # Run the forecast model
 forecaster = Forecaster()
 result = forecaster.run_forecast_config(
     df=df,
     config=ForecastConfig(
         model_template=ModelTemplateEnum.SILVERKITE.name,
         coverage=0.95,
         forecast_horizon=forecast_horizon,
         metadata_param=meta_data_params,
         evaluation_period_param=evaluation_period_param,
         model_components_param=model_components
     )
 )

 # Get the useful fields from the forecast result
 model = result.model[-1]
 backtest = result.backtest
 forecast = result.forecast
 grid_search = result.grid_search

 # Check model coefficients / variables
 # Get model summary with p-values
 print(model.summary())

 # Get cross-validation results
 cv_results = summarize_grid_search_results(
     grid_search=grid_search,
     decimals=2,
     cv_report_metrics=None,
     column_order=[
         "rank", "mean_test", "split_test", "mean_train", "split_train",
         "mean_fit_time", "mean_score_time", "params"])
 # Transposes to save space in the printed output
 print(cv_results.transpose())

 # Check historical evaluation metrics (on the historical training/test set).
 backtest_eval = defaultdict(list)
 for metric, value in backtest.train_evaluation.items():
     backtest_eval[metric].append(value)
     backtest_eval[metric].append(backtest.test_evaluation[metric])
 metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
 print(metrics)

Out:

Fitting 5 folds for each of 1 candidates, totalling 5 fits
================================ Model Summary =================================

Number of observations: 108,   Number of features: 21
Method: Ridge regression
Number of nonzero features: 21
Regularization parameter: 0.01269

Residuals:
         Min           1Q       Median           3Q          Max
  -5.631e+04   -2.219e+04       2946.0    2.172e+04    6.649e+04

            Pred_col    Estimate   Std. Err Pr(>)_boot sig. code                     95%CI
           Intercept  -9.460e+04  3.439e+04      0.010         *  (-1.464e+05, -1.203e+04)
 C(month,... 13)))_2      5660.0  1.875e+04      0.740             (-3.299e+04, 4.029e+04)
 C(month,... 13)))_3   6.530e+04  1.754e+04     <2e-16       ***    (3.438e+04, 1.028e+05)
 C(month,... 13)))_4   1.362e+05  1.590e+04     <2e-16       ***    (1.045e+05, 1.677e+05)
 C(month,... 13)))_5   1.534e+05  1.657e+04     <2e-16       ***    (1.215e+05, 1.872e+05)
 C(month,... 13)))_6   1.675e+05  1.782e+04      0.002        **    (1.370e+05, 2.002e+05)
 C(month,... 13)))_7   1.756e+05  1.671e+04     <2e-16       ***    (1.417e+05, 2.069e+05)
 C(month,... 13)))_8   1.758e+05  1.689e+04     <2e-16       ***    (1.427e+05, 2.092e+05)
 C(month,... 13)))_9   1.477e+05  1.749e+04     <2e-16       ***    (1.112e+05, 1.828e+05)
 C(month,...13)))_10   1.345e+05  1.645e+04     <2e-16       ***    (1.019e+05, 1.675e+05)
 C(month,...13)))_11   6.066e+04  1.500e+04     <2e-16       ***    (3.115e+04, 8.971e+04)
 C(month,...13)))_12   1.422e+04  1.748e+04      0.404             (-1.796e+04, 4.928e+04)
             ct_sqrt   3.313e+05  1.175e+05      0.004        **    (3.869e+04, 4.618e+05)
                 ct1   3.895e+04  1.280e+05      0.782             (-1.761e+05, 2.901e+05)
   cp0_2011_12_31_00   2.954e+04  8.431e+04      0.726             (-1.244e+05, 2.166e+05)
   cp1_2012_01_30_00   1.218e+04  8.109e+04      0.880             (-1.390e+05, 1.885e+05)
   cp2_2012_12_31_00  -7.390e+04  1.024e+05      0.472             (-2.949e+05, 1.051e+05)
   cp3_2014_12_30_00  -1.254e+04  6.107e+04      0.822             (-1.265e+05, 1.166e+05)
   cp4_2015_02_01_00   4.932e+04  4.751e+04      0.310             (-4.634e+04, 1.330e+05)
   cp5_2015_04_29_00  -3.631e+04  9.086e+04      0.708             (-2.161e+05, 1.553e+05)
   cp6_2017_08_31_00  -7.053e+04  2.199e+04      0.002        **  (-1.126e+05, -2.873e+04)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.9248,   Adjusted R-squared: 0.9113
F-statistic: 68.337 on 16 and 90 DF,   p-value: 1.110e-16
Model AIC: 2759.1,   model BIC: 2805.3

WARNING: the condition number is large, 2.44e+04. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.

                                                   0
rank_test_MAPE                                     1
mean_test_MAPE                                 17.95
split_test_MAPE   (16.97, 21.68, 5.09, 23.25, 22.77)
mean_train_MAPE                                30.74
split_train_MAPE  (34.41, 28.6, 31.42, 29.18, 30.07)
mean_fit_time                                    1.5
mean_score_time                                 0.24
params                                            []
                                                          train         test
CORR                                                   0.959601     0.959809
R2                                                     0.920783     -2.06113
MSE                                                 8.70384e+08  2.24026e+08
RMSE                                                    29502.3      14967.5
MAE                                                     25057.5      14721.3
MedAE                                                   23885.3      13428.2
MAPE                                                    31.1795      4.18228
MedAPE                                                  9.40904      3.79933
sMAPE                                                   10.5786      2.13708
Q80                                                     12528.8        11777
Q95                                                     12528.8      13985.2
Q99                                                     12528.8      14574.1
OutsideTolerance1p                                     0.980769            1
OutsideTolerance2p                                     0.894231            1
OutsideTolerance3p                                     0.836538            1
OutsideTolerance4p                                     0.826923         0.25
OutsideTolerance5p                                     0.740385         0.25
Outside Tolerance (fraction)                               None         None
R2_null_model_score                                        None         None
Prediction Band Width (%)                               98.5155      33.1166
Prediction Band Coverage (fraction)                    0.980769            1
Coverage: Lower Band                                        0.5            0
Coverage: Upper Band                                   0.480769            1
Coverage Diff: Actual_Coverage - Intended_Coverage    0.0307692         0.05

Fit/backtest plot:

 fig = backtest.plot()
 plotly.io.show(fig)

Forecast plot:

 fig = forecast.plot()
 plotly.io.show(fig)

The components plot:

 fig = forecast.plot_components()
 plotly.io.show(fig)

Fit a simple model with autoregression. This is done by specifying the autoregression parameter in ModelComponentsParam. Note that the autoregressive structure can be customized further depending on your data; one possible customization is sketched below.
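
For reference, the sketch below shows one possible richer specification for monthly data; it is not used in this example, and it assumes the autoreg_dict structure accepted by Silverkite, with individual lags under lag_dict and aggregated lags under agg_lag_dict.

 # Sketch of a more customized autoregression spec (not used below).
 autoregression_custom = {
     "autoreg_dict": {
         # Individual lags: the previous month and the same month last year.
         "lag_dict": {"orders": [1, 12]},
         "agg_lag_dict": {
             # Average of the last three months.
             "orders_list": [[1, 2, 3]],
             # Average over the interval of lags 1 through 12 (the past year).
             "interval_list": [(1, 12)],
         },
     }
 }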

 extra_pred_cols = ["ct_sqrt", "ct1", "C(month, levels=list(range(1, 13)))"]
 autoregression = {
     "autoreg_dict": {
         "lag_dict": {"orders": [1]},
         "agg_lag_dict": None
     }
 }

 # Specify the model parameters
 model_components = ModelComponentsParam(
     growth=dict(growth_term=None),
     seasonality=dict(
         yearly_seasonality=[False],
         quarterly_seasonality=[False],
         monthly_seasonality=[False],
         weekly_seasonality=[False],
         daily_seasonality=[False]
     ),
     custom=dict(
         fit_algorithm_dict=dict(fit_algorithm="ridge"),
         extra_pred_cols=extra_pred_cols
     ),
     regressors=dict(regressor_cols=None),
     autoregression=autoregression,
     uncertainty=dict(uncertainty_dict=None),
     events=dict(holiday_lookup_countries=None),
 )

 # Run the forecast model
 forecaster = Forecaster()
 result = forecaster.run_forecast_config(
     df=df,
     config=ForecastConfig(
         model_template=ModelTemplateEnum.SILVERKITE.name,
         coverage=0.95,
         forecast_horizon=forecast_horizon,
         metadata_param=meta_data_params,
         evaluation_period_param=evaluation_period_param,
         model_components_param=model_components
     )
 )

 # Get the useful fields from the forecast result
 model = result.model[-1]
 backtest = result.backtest
 forecast = result.forecast
 grid_search = result.grid_search

 # Check model coefficients / variables
 # Get model summary with p-values
 print(model.summary())

 # Get cross-validation results
 cv_results = summarize_grid_search_results(
     grid_search=grid_search,
     decimals=2,
     cv_report_metrics=None,
     column_order=[
         "rank", "mean_test", "split_test", "mean_train", "split_train",
         "mean_fit_time", "mean_score_time", "params"])
 # Transposes to save space in the printed output
 print(cv_results.transpose())

 # Check historical evaluation metrics (on the historical training/test set).
 backtest_eval = defaultdict(list)
 for metric, value in backtest.train_evaluation.items():
     backtest_eval[metric].append(value)
     backtest_eval[metric].append(backtest.test_evaluation[metric])
 metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
 print(metrics)

Out:

Fitting 5 folds for each of 1 candidates, totalling 5 fits
================================ Model Summary =================================

Number of observations: 108,   Number of features: 22
Method: Ridge regression
Number of nonzero features: 22
Regularization parameter: 0.0621

Residuals:
         Min           1Q       Median           3Q          Max
  -5.655e+04   -1.618e+04      -1849.0    1.957e+04    6.007e+04

            Pred_col    Estimate   Std. Err Pr(>)_boot sig. code                    95%CI
           Intercept  -2.605e+04  1.765e+04      0.128               (-6.082e+04, 5663.0)
 C(month,... 13)))_2   1.142e+04  1.253e+04      0.336            (-1.417e+04, 3.417e+04)
 C(month,... 13)))_3   6.686e+04  1.407e+04     <2e-16       ***   (4.060e+04, 9.746e+04)
 C(month,... 13)))_4   1.060e+05  1.553e+04     <2e-16       ***   (7.367e+04, 1.327e+05)
 C(month,... 13)))_5   8.563e+04  1.535e+04     <2e-16       ***   (5.626e+04, 1.173e+05)
 C(month,... 13)))_6   9.056e+04  1.626e+04     <2e-16       ***   (5.808e+04, 1.204e+05)
 C(month,... 13)))_7   9.126e+04  1.611e+04     <2e-16       ***   (5.848e+04, 1.234e+05)
 C(month,... 13)))_8   8.720e+04  1.615e+04     <2e-16       ***   (5.631e+04, 1.183e+05)
 C(month,... 13)))_9   6.215e+04  1.638e+04     <2e-16       ***   (3.267e+04, 9.527e+04)
 C(month,...13)))_10   6.108e+04  1.480e+04     <2e-16       ***   (3.271e+04, 9.215e+04)
 C(month,...13)))_11     -6119.0  1.718e+04      0.688            (-4.439e+04, 2.556e+04)
 C(month,...13)))_12  -1.324e+04  1.319e+04      0.286            (-4.037e+04, 1.257e+04)
             ct_sqrt   9.290e+04  3.920e+04      0.012         *      (8616.0, 1.634e+05)
                 ct1   4.863e+04  2.159e+04      0.024         *      (7906.0, 9.485e+04)
   cp0_2011_12_31_00   2.021e+04  2.720e+04      0.430            (-2.641e+04, 7.852e+04)
   cp1_2012_01_30_00   1.920e+04  2.499e+04      0.412            (-2.329e+04, 7.171e+04)
   cp2_2012_12_31_00  -3.002e+04  3.298e+04      0.370            (-9.062e+04, 3.715e+04)
   cp3_2014_12_30_00      -945.8  1.874e+04      0.966            (-3.848e+04, 3.266e+04)
   cp4_2015_02_01_00      1769.0  1.357e+04      0.892            (-2.739e+04, 2.510e+04)
   cp5_2015_04_29_00  -1.569e+04  3.222e+04      0.630            (-7.719e+04, 5.183e+04)
   cp6_2017_08_31_00  -3.195e+04  1.898e+04      0.090         .     (-6.886e+04, 7482.0)
              y_lag1   2.133e+05  2.997e+04     <2e-16       ***   (1.530e+05, 2.681e+05)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.9451,   Adjusted R-squared: 0.9355
F-statistic: 97.446 on 15 and 91 DF,   p-value: 1.110e-16
Model AIC: 2724.5,   model BIC: 2769.8

WARNING: the condition number is large, 5.60e+03. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.

                                                    0
rank_test_MAPE                                      1
mean_test_MAPE                                  16.81
split_test_MAPE     (14.18, 11.56, 9.9, 29.35, 19.04)
mean_train_MAPE                                 22.43
split_train_MAPE  (23.53, 22.22, 22.94, 22.04, 21.44)
mean_fit_time                                    1.34
mean_score_time                                  1.84
params                                             []
                                                          train         test
CORR                                                   0.970891     0.810576
R2                                                     0.942621     0.171875
MSE                                                 6.30447e+08  6.06056e+07
RMSE                                                    25108.7      7784.96
MAE                                                       20697      6872.87
MedAE                                                   18654.1      6918.05
MAPE                                                    20.9768      1.95055
MedAPE                                                  8.37506      1.99794
sMAPE                                                   8.81036     0.983418
Q80                                                     10348.5      4526.72
Q95                                                     10348.5      5071.85
Q99                                                     10348.5      5217.22
OutsideTolerance1p                                     0.932692         0.75
OutsideTolerance2p                                        0.875          0.5
OutsideTolerance3p                                     0.788462         0.25
OutsideTolerance4p                                         0.75            0
OutsideTolerance5p                                     0.673077            0
Outside Tolerance (fraction)                               None         None
R2_null_model_score                                        None         None
Prediction Band Width (%)                               83.8443      18.4221
Prediction Band Coverage (fraction)                    0.951923            1
Coverage: Lower Band                                   0.490385         0.25
Coverage: Upper Band                                   0.461538         0.75
Coverage Diff: Actual_Coverage - Intended_Coverage   0.00192308         0.05

Fit/backtest plot:

 fig = backtest.plot()
 plotly.io.show(fig)

Forecast plot:

 fig = forecast.plot()
 plotly.io.show(fig)

The components plot:

 fig = forecast.plot_components()
 plotly.io.show(fig)

Fit a model with time-varying seasonality (month effect). This is achieved by adding "ct1*C(month)" to extra_pred_cols in ModelComponentsParam. Note that this feature may or may not be useful in your use case; we have included it for demonstration purposes only. In this example, while the fit has improved, the backtest is inferior to the previous setting.
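
As with the plain categorical term, a small sketch (again assuming patsy-style formula semantics, with a made-up toy frame) shows what the interaction adds to the design matrix: the month indicators are multiplied by the linear growth term ct1, which is where the ct1:C(month, ...) rows in the model summary below come from.

 import pandas as pd
 from patsy import dmatrix

 # Toy frame with a continuous time index ct1 and a month-number column.
 df_demo = pd.DataFrame({"ct1": [0.0, 0.5, 1.0], "month": [1, 6, 12]})
 # "a * b" expands to the main effects plus the interaction a:b.
 X = dmatrix("ct1 * C(month, levels=list(range(1, 13)))", df_demo, return_type="dataframe")
 print(X.columns.tolist())  # month dummies, ct1, and ct1:month interaction columns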

 extra_pred_cols = ["ct_sqrt", "ct1", "C(month, levels=list(range(1, 13)))",
                    "ct1*C(month, levels=list(range(1, 13)))"]
 autoregression = {
     "autoreg_dict": {
         "lag_dict": {"orders": [1]},
         "agg_lag_dict": None
     }
 }

 # Specify the model parameters
 model_components = ModelComponentsParam(
     growth=dict(growth_term=None),
     seasonality=dict(
         yearly_seasonality=[False],
         quarterly_seasonality=[False],
         monthly_seasonality=[False],
         weekly_seasonality=[False],
         daily_seasonality=[False]
     ),
     custom=dict(
         fit_algorithm_dict=dict(fit_algorithm="ridge"),
         extra_pred_cols=extra_pred_cols
     ),
     regressors=dict(regressor_cols=None),
     autoregression=autoregression,
     uncertainty=dict(uncertainty_dict=None),
     events=dict(holiday_lookup_countries=None),
 )

 # Run the forecast model
 forecaster = Forecaster()
 result = forecaster.run_forecast_config(
     df=df,
     config=ForecastConfig(
         model_template=ModelTemplateEnum.SILVERKITE.name,
         coverage=0.95,
         forecast_horizon=forecast_horizon,
         metadata_param=meta_data_params,
         evaluation_period_param=evaluation_period_param,
         model_components_param=model_components
     )
 )

 # Get the useful fields from the forecast result
 model = result.model[-1]
 backtest = result.backtest
 forecast = result.forecast
 grid_search = result.grid_search

 # Check model coefficients / variables
 # Get model summary with p-values
 print(model.summary())

 # Get cross-validation results
 cv_results = summarize_grid_search_results(
     grid_search=grid_search,
     decimals=2,
     cv_report_metrics=None,
     column_order=[
         "rank", "mean_test", "split_test", "mean_train", "split_train",
         "mean_fit_time", "mean_score_time", "params"])
 # Transposes to save space in the printed output
 print(cv_results.transpose())

 # Check historical evaluation metrics (on the historical training/test set).
 backtest_eval = defaultdict(list)
 for metric, value in backtest.train_evaluation.items():
     backtest_eval[metric].append(value)
     backtest_eval[metric].append(backtest.test_evaluation[metric])
 metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
 print(metrics)

Out:

Fitting 5 folds for each of 1 candidates, totalling 5 fits
================================ Model Summary =================================

Number of observations: 108,   Number of features: 33
Method: Ridge regression
Number of nonzero features: 33
Regularization parameter: 0.01269

Residuals:
         Min           1Q       Median           3Q          Max
  -5.127e+04   -1.256e+04        752.4    1.392e+04    5.073e+04

            Pred_col    Estimate   Std. Err Pr(>)_boot sig. code                     95%CI
           Intercept  -2.220e+04  1.954e+04      0.250                (-6.576e+04, 9544.0)
 C(month,... 13)))_2     -1857.0  2.435e+04      0.916             (-5.949e+04, 4.069e+04)
 C(month,... 13)))_3   3.125e+04  2.271e+04      0.134                (-6432.0, 8.940e+04)
 C(month,... 13)))_4   5.244e+04  2.191e+04      0.024         *    (2.614e+04, 1.088e+05)
 C(month,... 13)))_5   7.419e+04  2.033e+04      0.010         *    (4.666e+04, 1.221e+05)
 C(month,... 13)))_6   5.570e+04  2.270e+04      0.034         *    (2.342e+04, 1.120e+05)
 C(month,... 13)))_7   5.992e+04  2.234e+04      0.012         *    (2.880e+04, 1.150e+05)
 C(month,... 13)))_8   5.781e+04  2.159e+04      0.016         *    (2.876e+04, 1.103e+05)
 C(month,... 13)))_9   4.858e+04  2.831e+04      0.076         .    (1.729e+04, 1.255e+05)
 C(month,...13)))_10   3.069e+04  1.817e+04      0.090         .       (1147.0, 7.843e+04)
 C(month,...13)))_11   2.508e+04  1.637e+04      0.110                (-1479.0, 6.628e+04)
 C(month,...13)))_12     -1322.0  1.694e+04      0.942             (-3.032e+04, 4.180e+04)
             ct_sqrt   1.757e+05  6.048e+04     <2e-16       ***    (4.610e+04, 2.765e+05)
                 ct1   3.731e+04  4.906e+04      0.494             (-3.618e+04, 1.449e+05)
 ct1:C(mo... 13)))_2   2.775e+04  3.884e+04      0.352             (-4.481e+04, 1.132e+05)
 ct1:C(mo... 13)))_3   7.465e+04  3.912e+04      0.062         .   (-1.808e+04, 1.414e+05)
 ct1:C(mo... 13)))_4   1.332e+05  3.916e+04      0.008        **    (4.876e+04, 1.982e+05)
 ct1:C(mo... 13)))_5   8.293e+04  3.897e+04      0.038         *      (-1101.0, 1.576e+05)
 ct1:C(mo... 13)))_6   1.336e+05  3.438e+04      0.006        **    (6.264e+04, 1.912e+05)
 ct1:C(mo... 13)))_7   1.330e+05  3.510e+04     <2e-16       ***    (6.143e+04, 2.012e+05)
 ct1:C(mo... 13)))_8   1.329e+05  3.458e+04     <2e-16       ***    (5.925e+04, 1.964e+05)
 ct1:C(mo... 13)))_9   9.543e+04  4.620e+04      0.042         *       (3490.0, 1.782e+05)
 ct1:C(mo...13)))_10   1.198e+05  2.909e+04      0.002        **    (5.748e+04, 1.782e+05)
 ct1:C(mo...13)))_11     -6389.0  3.095e+04      0.844             (-6.537e+04, 5.810e+04)
 ct1:C(mo...13)))_12      1972.0  3.133e+04      0.952             (-5.922e+04, 5.842e+04)
   cp0_2011_12_31_00      6598.0  3.495e+04      0.870             (-6.051e+04, 7.260e+04)
   cp1_2012_01_30_00     -9129.0  3.703e+04      0.834             (-7.853e+04, 6.467e+04)
   cp2_2012_12_31_00  -6.343e+04  5.699e+04      0.286             (-1.746e+05, 3.395e+04)
   cp3_2014_12_30_00     -6911.0  5.326e+04      0.900             (-1.081e+05, 9.562e+04)
   cp4_2015_02_01_00   3.416e+04  3.812e+04      0.366             (-5.045e+04, 1.023e+05)
   cp5_2015_04_29_00  -2.803e+04  8.469e+04      0.698             (-1.832e+05, 1.527e+05)
   cp6_2017_08_31_00  -5.819e+04  2.060e+04      0.012         *  (-9.439e+04, -1.231e+04)
              y_lag1   1.281e+05  4.927e+04      0.012         *    (4.335e+04, 2.308e+05)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.9678,   Adjusted R-squared: 0.9566
F-statistic: 85.908 on 27 and 79 DF,   p-value: 1.110e-16
Model AIC: 2690.2,   model BIC: 2767.1

WARNING: the condition number is large, 2.75e+04. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.

                                                  0
rank_test_MAPE                                    1
mean_test_MAPE                                 8.19
split_test_MAPE    (3.45, 11.02, 6.03, 16.43, 4.02)
mean_train_MAPE                               12.42
split_train_MAPE  (15.4, 11.02, 11.4, 11.17, 13.12)
mean_fit_time                                  1.29
mean_score_time                                1.75
params                                           []
                                                          train         test
CORR                                                   0.983665     0.942998
R2                                                     0.967592     -28.7164
MSE                                                 3.56083e+08  2.17477e+09
RMSE                                                    18870.2      46634.4
MAE                                                     15056.5      42831.5
MedAE                                                   13205.8      44422.5
MAPE                                                    13.8329      12.0911
MedAPE                                                  6.51192      12.4955
sMAPE                                                   5.01372      5.64679
Q80                                                     7528.25      8566.29
Q95                                                     7528.25      2141.57
Q99                                                     7528.25      428.315
OutsideTolerance1p                                     0.913462            1
OutsideTolerance2p                                     0.798077            1
OutsideTolerance3p                                     0.759615            1
OutsideTolerance4p                                     0.701923            1
OutsideTolerance5p                                        0.625         0.75
Outside Tolerance (fraction)                               None         None
R2_null_model_score                                        None         None
Prediction Band Width (%)                               63.0122      19.5714
Prediction Band Coverage (fraction)                    0.923077         0.25
Coverage: Lower Band                                   0.442308         0.25
Coverage: Upper Band                                   0.480769            0
Coverage Diff: Actual_Coverage - Intended_Coverage   -0.0269231         -0.7

Fit/backtest plot:

 fig = backtest.plot()
 plotly.io.show(fig)

Forecast plot:

 fig = forecast.plot()
 plotly.io.show(fig)

The components plot:

 fig = forecast.plot_components()
 plotly.io.show(fig)

Total running time of the script: ( 1 minutes 0.759 seconds)

Gallery generated by Sphinx-Gallery