Example for monthly data

This is a basic example of forecasting monthly data with Silverkite. We fit only a few simple models here; the goal is not to optimize the results as much as possible.

import warnings
from collections import defaultdict

import plotly
import pandas as pd

from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.utils.result_summary import summarize_grid_search_results
from greykite.framework.input.univariate_time_series import UnivariateTimeSeries

warnings.filterwarnings("ignore")

Load the dataset into a UnivariateTimeSeries.

dl = DataLoaderTS()
agg_func = {"count": "sum"}
df = dl.load_bikesharing(agg_freq="monthly", agg_func=agg_func)
# The last month in this monthly data is incomplete, so we drop it
df.drop(df.tail(1).index, inplace=True)
# reset_index returns a new DataFrame, so assign the result back
df = df.reset_index(drop=True)
ts = UnivariateTimeSeries()
ts.load_data(
    df=df,
    time_col="ts",
    value_col="count",
    freq="MS")

Out:

<greykite.framework.input.univariate_time_series.UnivariateTimeSeries object at 0x18592c040>
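
The comment in the loading code says the last month is dropped because it is incomplete. A minimal sketch of one way to check this, assuming a hypothetical date for the last raw observation (the actual date is not given here), compares that date with the last calendar day of its month. The helper name `is_month_complete` is illustrative and not part of Greykite.

```python
import calendar
from datetime import date


def is_month_complete(last_observed: date) -> bool:
    """Return True if last_observed is the final calendar day of its month."""
    days_in_month = calendar.monthrange(last_observed.year, last_observed.month)[1]
    return last_observed.day == days_in_month


# Hypothetical last raw observation date, chosen only for illustration.
last_raw_date = date(2019, 9, 15)
if not is_month_complete(last_raw_date):
    print("Last month is partial; drop it before aggregating to monthly totals.")
```

A partial month matters here because the aggregation function is a sum, so an incomplete month would look like a sharp drop in the series.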

Exploratory data analysis (EDA)

After reading in a time series, we can first do some exploratory data analysis. The UnivariateTimeSeries class is used to store a time series and perform EDA.

A quick description of the data can be obtained as follows.

print(ts.describe_time_col())
print(ts.describe_value_col())
print(df.head())

Out:

{'data_points': 108, 'mean_increment_secs': 2629143.925233645, 'min_timestamp': Timestamp('2010-09-01 00:00:00'), 'max_timestamp': Timestamp('2019-08-01 00:00:00')}
count       108.000000
mean     231254.101852
std      106017.804606
min        4001.000000
25%      144661.750000
50%      227332.000000
75%      327851.250000
max      404811.000000
Name: y, dtype: float64
          ts  count
0 2010-09-01   4001
1 2010-10-01  35949
2 2010-11-01  47391
3 2010-12-01  28253
4 2011-01-01  37499
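
The `mean_increment_secs` value in the output above can be reproduced by hand. The sketch below assumes it is the average gap between consecutive timestamps, that is, the total span divided by the number of increments; this reading is an assumption, but it matches the printed value.

```python
from datetime import datetime

# Values copied from the describe_time_col() output above.
min_timestamp = datetime(2010, 9, 1)
max_timestamp = datetime(2019, 8, 1)
data_points = 108

# 108 points are separated by 107 increments.
span_secs = (max_timestamp - min_timestamp).total_seconds()
mean_increment_secs = span_secs / (data_points - 1)

print(mean_increment_secs)          # seconds per increment
print(mean_increment_secs / 86400)  # the same gap in days, about 30.43
```

An average gap of about 30.43 days is what one would expect for month-start ("MS") data with no missing months.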

Let’s plot the original time series. (The interactive plot is generated by plotly: click to zoom!)

fig = ts.plot()
plotly.io.show(fig)

Exploratory plots can reveal the properties of the time series. A monthly overlay plot, which overlays the individual years on top of each other, can be used to inspect annual patterns.

fig = ts.plot_quantiles_and_overlays(
    groupby_time_feature="month",
    show_mean=False,
    show_quantiles=False,
    show_overlays=True,
    overlay_label_time_feature="year",
    overlay_style={"line": {"width": 1}, "opacity": 0.5},
    center_values=False,
    xlabel="month of year",
    ylabel=ts.original_value_col,
    # Values are not centered (center_values=False), so the title does not say "centered"
    title="yearly seasonality for each year",)
plotly.io.show(fig)
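
To make the overlay idea concrete, here is a small sketch of the grouping behind such a plot: values are indexed by month of year (the x-axis), with one line per year (the overlay). It uses only the five rows shown by `df.head()` earlier and plain Python; it is not how Greykite builds the figure internally.

```python
from collections import defaultdict

# First five rows of the monthly data, copied from the df.head() output.
rows = [
    ("2010-09-01", 4001),
    ("2010-10-01", 35949),
    ("2010-11-01", 47391),
    ("2010-12-01", 28253),
    ("2011-01-01", 37499),
]

# One overlay line per year, keyed by month of year.
overlays = defaultdict(dict)
for ts_str, count in rows:
    year, month, _ = (int(part) for part in ts_str.split("-"))
    overlays[year][month] = count

for year, by_month in sorted(overlays.items()):
    print(year, by_month)
```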

Specify common metadata.

forecast_horizon = 4
time_col = "ts"
value_col = "count"
meta_data_params = MetadataParam(
    time_col=time_col,
    value_col=value_col,
    freq="MS",
)

Specify common evaluation parameters. Set minimum input data for training.

cv_min_train_periods = 24
# Let CV use most recent splits for cross-validation.
cv_use_most_recent_splits = True
# Determine the maximum number of validations.
cv_max_splits = 5
evaluation_period_param = EvaluationPeriodParam(
    test_horizon=forecast_horizon,
    cv_horizon=forecast_horizon,
    periods_between_train_test=0,
    cv_min_train_periods=cv_min_train_periods,
    cv_expanding_window=True,
    cv_use_most_recent_splits=cv_use_most_recent_splits,
    cv_periods_between_splits=None,
    cv_periods_between_train_test=0,
    cv_max_splits=cv_max_splits,
)
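
To see what these evaluation settings imply, the following sketch enumerates expanding-window splits in plain Python. It is an illustration, not Greykite's splitter. It assumes that the last `test_horizon` points are held out for the backtest (so cross-validation sees 104 of the 108 months) and that `cv_periods_between_splits=None` means the splits are spaced one horizon apart. The function name `expanding_window_splits` is hypothetical.

```python
def expanding_window_splits(n_obs, horizon, min_train, max_splits, step=None):
    """Return (train_indices, test_indices) pairs, keeping the most recent splits."""
    step = horizon if step is None else step  # assumption: default spacing = horizon
    splits = []
    test_end = n_obs
    # Walk backwards from the end so the most recent splits are kept first.
    while len(splits) < max_splits and test_end - horizon >= min_train:
        train_end = test_end - horizon
        # Expanding window: training always starts at index 0.
        splits.append((range(train_end), range(train_end, test_end)))
        test_end -= step
    return splits[::-1]  # oldest split first


# Assumption: 4 of the 108 months are reserved for the backtest.
splits = expanding_window_splits(n_obs=108 - 4, horizon=4, min_train=24, max_splits=5)
for train_idx, test_idx in splits:
    print(f"train on {len(train_idx)} points, validate on {list(test_idx)}")
```

Under these assumptions there are five folds, which agrees with the "Fitting 5 folds" line printed when the models are fit.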

Fit a simple model without autoregression. The important modeling parameters for monthly data are as follows; they are plugged into ModelComponentsParam. The extra_pred_cols parameter is used to specify growth and annual seasonality. Growth is modelled with both “ct_sqrt” and “ct1” for extra flexibility, since we have long-term data and ridge regularization helps avoid over-fitting the trend. The annual seasonality is modelled categorically with “C(month)” instead of a Fourier series. This is because monthly data has only 12 data points per year, whereas daily data has many points per year, which makes a categorical representation infeasible for daily data. The categorical representation of month is also more interpretable in the model summary.
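
As an illustration of the categorical representation, the sketch below builds treatment-coded dummy columns for month in plain Python. It assumes the first level (January) is the reference category and gets no column of its own, which is consistent with the `_2` to `_12` coefficients in the model summary. The helper `month_dummies` is hypothetical and is not part of Greykite.

```python
def month_dummies(months, levels=tuple(range(1, 13))):
    """Treatment-code months: the first level is the reference and gets no column."""
    non_reference = levels[1:]
    names = [f"month_{level}" for level in non_reference]
    rows = [[1 if month == level else 0 for level in non_reference] for month in months]
    return names, rows


names, rows = month_dummies([1, 2, 12])
print(len(names))  # one column per non-reference month
print(rows[0])     # January (reference): all zeros
print(rows[2])     # December: a single 1 in the last column
```

Twelve months therefore cost only 11 columns, each of which can be read directly as "the effect of this month relative to January".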

extra_pred_cols = ["ct_sqrt", "ct1", "C(month, levels=list(range(1, 13)))"]
autoregression = None

# Specify the model parameters
model_components = ModelComponentsParam(
    growth=dict(growth_term=None),
    seasonality=dict(
        yearly_seasonality=[False],
        quarterly_seasonality=[False],
        monthly_seasonality=[False],
        weekly_seasonality=[False],
        daily_seasonality=[False]
    ),
    custom=dict(
        fit_algorithm_dict=dict(fit_algorithm="ridge"),
        extra_pred_cols=extra_pred_cols
    ),
    regressors=dict(regressor_cols=None),
    autoregression=autoregression,
    uncertainty=dict(uncertainty_dict=None),
    events=dict(holiday_lookup_countries=None),
)

# Run the forecast model
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        coverage=0.95,
        forecast_horizon=forecast_horizon,
        metadata_param=meta_data_params,
        evaluation_period_param=evaluation_period_param,
        model_components_param=model_components
    )
)

# Get the useful fields from the forecast result
model = result.model[-1]
backtest = result.backtest
forecast = result.forecast
grid_search = result.grid_search

# Check model coefficients / variables
# Get model summary with p-values
print(model.summary())

# Get cross-validation results
cv_results = summarize_grid_search_results(
    grid_search=grid_search,
    decimals=2,
    cv_report_metrics=None,
    column_order=[
        "rank", "mean_test", "split_test", "mean_train", "split_train",
        "mean_fit_time", "mean_score_time", "params"])
# Transposes to save space in the printed output
print(cv_results.transpose())

# Check historical evaluation metrics (on the historical training/test set).
backtest_eval = defaultdict(list)
for metric, value in backtest.train_evaluation.items():
    backtest_eval[metric].append(value)
    backtest_eval[metric].append(backtest.test_evaluation[metric])
metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
print(metrics)

Out:

Fitting 5 folds for each of 1 candidates, totalling 5 fits
================================ Model Summary =================================

Number of observations: 108,   Number of features: 21
Method: Ridge regression
Number of nonzero features: 21
Regularization parameter: 0.01269

Residuals:
         Min           1Q       Median           3Q          Max
  -5.631e+04   -2.219e+04       2946.0    2.172e+04    6.649e+04

           Pred_col   Estimate  Std. Err Pr(>)_boot sig. code                    95%CI
          Intercept -9.460e+04 3.435e+04      0.010         * (-1.465e+05, -1.473e+04)
C(month,... 13)))_2     5660.0 1.972e+04      0.744            (-3.247e+04, 4.688e+04)
C(month,... 13)))_3  6.530e+04 1.697e+04     <2e-16       ***   (3.605e+04, 1.018e+05)
C(month,... 13)))_4  1.362e+05 1.641e+04     <2e-16       ***   (1.079e+05, 1.703e+05)
C(month,... 13)))_5  1.534e+05 1.803e+04     <2e-16       ***   (1.180e+05, 1.888e+05)
C(month,... 13)))_6  1.675e+05 1.568e+04     <2e-16       ***   (1.366e+05, 1.988e+05)
C(month,... 13)))_7  1.756e+05 1.652e+04     <2e-16       ***   (1.432e+05, 2.084e+05)
C(month,... 13)))_8  1.758e+05 1.740e+04     <2e-16       ***   (1.423e+05, 2.097e+05)
C(month,... 13)))_9  1.477e+05 1.721e+04     <2e-16       ***   (1.188e+05, 1.824e+05)
C(month,...13)))_10  1.345e+05 1.678e+04     <2e-16       ***   (1.030e+05, 1.691e+05)
C(month,...13)))_11  6.066e+04 1.653e+04      0.004        **   (2.614e+04, 9.191e+04)
C(month,...13)))_12  1.422e+04 1.813e+04      0.418            (-1.592e+04, 5.303e+04)
            ct_sqrt  3.313e+05 1.122e+05      0.010         *   (4.492e+04, 4.749e+05)
                ct1  3.895e+04 1.324e+05      0.818            (-1.951e+05, 3.008e+05)
  cp0_2011_12_31_00  2.954e+04 8.497e+04      0.728            (-1.268e+05, 1.991e+05)
  cp1_2012_01_30_00  1.218e+04 8.100e+04      0.882            (-1.386e+05, 1.818e+05)
  cp2_2012_12_31_00 -7.390e+04 1.017e+05      0.488            (-2.857e+05, 1.108e+05)
  cp3_2014_12_30_00 -1.254e+04 5.829e+04      0.822            (-1.251e+05, 9.673e+04)
  cp4_2015_02_01_00  4.932e+04 4.555e+04      0.292            (-4.064e+04, 1.325e+05)
  cp5_2015_04_29_00 -3.631e+04 8.881e+04      0.694            (-2.009e+05, 1.464e+05)
  cp6_2017_08_31_00 -7.053e+04 2.232e+04      0.006        ** (-1.143e+05, -2.355e+04)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.9248,   Adjusted R-squared: 0.9113
F-statistic: 68.337 on 16 and 90 DF,   p-value: 1.110e-16
Model AIC: 2759.1,   model BIC: 2805.3

WARNING: the condition number is large, 2.44e+04. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.

                                                   0
rank_test_MAPE                                     1
mean_test_MAPE                                 17.95
split_test_MAPE   (16.97, 21.68, 5.09, 23.25, 22.77)
mean_train_MAPE                                30.74
split_train_MAPE  (34.41, 28.6, 31.42, 29.18, 30.07)
mean_fit_time                                   1.15
mean_score_time                                 0.18
params                                            []
                                                               train              test
CORR                                                        0.959601          0.959809
R2                                                          0.920783          -2.06113
MSE                                                 870383619.075692  224026154.952861
RMSE                                                    29502.264643      14967.503297
MAE                                                     25057.504616      14721.300357
MedAE                                                   23885.308137      13428.221312
MAPE                                                       31.179462          4.182279
MedAPE                                                      9.409044          3.799328
sMAPE                                                      10.578591          2.137084
Q80                                                     12528.752308      11777.040286
Q95                                                     12528.752308      13985.235339
Q99                                                     12528.752308      14574.087354
OutsideTolerance1p                                          0.980769               1.0
OutsideTolerance2p                                          0.894231               1.0
OutsideTolerance3p                                          0.836538               1.0
OutsideTolerance4p                                          0.826923              0.25
OutsideTolerance5p                                          0.740385              0.25
Outside Tolerance (fraction)                                    None              None
R2_null_model_score                                             None              None
Prediction Band Width (%)                                 117.482609         41.759833
Prediction Band Coverage (fraction)                              1.0               1.0
Coverage: Lower Band                                             0.5               0.0
Coverage: Upper Band                                             0.5               1.0
Coverage Diff: Actual_Coverage - Intended_Coverage              0.05              0.05
MIS                                                    135585.622803     146570.265182
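
In the backtest table above, the test R2 is negative (-2.06) even though the test MAPE is only about 4.18%. The sketch below shows, on hypothetical numbers that are not taken from this dataset, how that combination can arise: when a short test window varies little around its own mean, a small constant offset in the predictions is enough to push R2 well below zero. This is one possible explanation, not a diagnosis of this particular fit.

```python
def r2_score(actual, predicted):
    """R2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of actual."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical four-point test window: nearly flat actuals, predictions offset by 5.
actual = [100.0, 102.0, 101.0, 103.0]
predicted = [105.0, 107.0, 106.0, 108.0]

print(r2_score(actual, predicted))  # strongly negative
print(mape(actual, predicted))      # just under 5 percent
```

With a test horizon of only four months, scale-relative metrics such as MAPE are usually easier to interpret than R2.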

Fit/backtest plot:

fig = backtest.plot()
plotly.io.show(fig)

Forecast plot:

fig = forecast.plot()
plotly.io.show(fig)

The components plot:

fig = forecast.plot_components()
plotly.io.show(fig)

Fit a simple model with autoregression. This is done by specifying the autoregression parameter in ModelComponentsParam. Note that the auto-regressive structure can be customized further depending on your data.
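
As background for the lag-1 setting used next, this sketch shows what a lag feature looks like and why a lag shorter than the forecast horizon needs care: beyond the first step, the previous value is itself a forecast. The recursive scheme and the coefficients below are hypothetical and only illustrate the idea; they do not describe how Silverkite produces its forecasts.

```python
def make_lag(values, order=1):
    """Shift a series by `order` steps; the first `order` entries have no lag value."""
    return [None] * order + list(values[:-order])


def recursive_forecast(last_value, horizon, intercept, coef):
    """Forecast step by step, feeding each prediction back in as the next lag."""
    predictions = []
    previous = last_value
    for _ in range(horizon):
        previous = intercept + coef * previous
        predictions.append(previous)
    return predictions


print(make_lag([4001, 35949, 47391]))  # [None, 4001, 35949]
# Hypothetical AR(1) coefficients, chosen so the arithmetic is easy to follow.
print(recursive_forecast(last_value=10.0, horizon=3, intercept=1.0, coef=0.5))
```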

extra_pred_cols = ["ct_sqrt", "ct1", "C(month, levels=list(range(1, 13)))"]
autoregression = {
    "autoreg_dict": {
        "lag_dict": {"orders": [1]},
        "agg_lag_dict": None
    }
}

# Specify the model parameters
model_components = ModelComponentsParam(
    growth=dict(growth_term=None),
    seasonality=dict(
        yearly_seasonality=[False],
        quarterly_seasonality=[False],
        monthly_seasonality=[False],
        weekly_seasonality=[False],
        daily_seasonality=[False]
    ),
    custom=dict(
        fit_algorithm_dict=dict(fit_algorithm="ridge"),
        extra_pred_cols=extra_pred_cols
    ),
    regressors=dict(regressor_cols=None),
    autoregression=autoregression,
    uncertainty=dict(uncertainty_dict=None),
    events=dict(holiday_lookup_countries=None),
)

# Run the forecast model
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        coverage=0.95,
        forecast_horizon=forecast_horizon,
        metadata_param=meta_data_params,
        evaluation_period_param=evaluation_period_param,
        model_components_param=model_components
    )
)

# Get the useful fields from the forecast result
model = result.model[-1]
backtest = result.backtest
forecast = result.forecast
grid_search = result.grid_search

# Check model coefficients / variables
# Get model summary with p-values
print(model.summary())

# Get cross-validation results
cv_results = summarize_grid_search_results(
    grid_search=grid_search,
    decimals=2,
    cv_report_metrics=None,
    column_order=[
        "rank", "mean_test", "split_test", "mean_train", "split_train",
        "mean_fit_time", "mean_score_time", "params"])
# Transposes to save space in the printed output
print(cv_results.transpose())

# Check historical evaluation metrics (on the historical training/test set).
backtest_eval = defaultdict(list)
for metric, value in backtest.train_evaluation.items():
    backtest_eval[metric].append(value)
    backtest_eval[metric].append(backtest.test_evaluation[metric])
metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
print(metrics)

Out:

Fitting 5 folds for each of 1 candidates, totalling 5 fits
================================ Model Summary =================================

Number of observations: 108,   Number of features: 22
Method: Ridge regression
Number of nonzero features: 22
Regularization parameter: 0.0621

Residuals:
         Min           1Q       Median           3Q          Max
  -5.655e+04   -1.618e+04      -1849.0    1.957e+04    6.007e+04

           Pred_col   Estimate  Std. Err Pr(>)_boot sig. code                   95%CI
          Intercept -2.605e+04 1.775e+04      0.140           (-6.200e+04, 1.034e+04)
C(month,... 13)))_2  1.142e+04 1.256e+04      0.312           (-1.458e+04, 3.203e+04)
C(month,... 13)))_3  6.686e+04 1.395e+04     <2e-16       ***  (3.867e+04, 9.241e+04)
C(month,... 13)))_4  1.060e+05 1.579e+04     <2e-16       ***  (7.340e+04, 1.341e+05)
C(month,... 13)))_5  8.563e+04 1.547e+04     <2e-16       ***  (5.997e+04, 1.188e+05)
C(month,... 13)))_6  9.056e+04 1.666e+04     <2e-16       ***  (6.158e+04, 1.240e+05)
C(month,... 13)))_7  9.126e+04 1.686e+04     <2e-16       ***  (6.038e+04, 1.269e+05)
C(month,... 13)))_8  8.720e+04 1.665e+04     <2e-16       ***  (5.678e+04, 1.242e+05)
C(month,... 13)))_9  6.215e+04 1.698e+04     <2e-16       ***  (3.256e+04, 9.816e+04)
C(month,...13)))_10  6.108e+04 1.454e+04     <2e-16       ***  (3.565e+04, 9.005e+04)
C(month,...13)))_11    -6119.0 1.705e+04      0.676           (-3.883e+04, 2.795e+04)
C(month,...13)))_12 -1.324e+04 1.361e+04      0.342           (-3.924e+04, 1.331e+04)
            ct_sqrt  9.290e+04 3.905e+04      0.014         *     (7934.0, 1.561e+05)
                ct1  4.863e+04 2.277e+04      0.024         *     (7442.0, 9.384e+04)
  cp0_2011_12_31_00  2.021e+04 2.612e+04      0.456           (-2.618e+04, 7.280e+04)
  cp1_2012_01_30_00  1.920e+04 2.416e+04      0.440           (-2.125e+04, 6.781e+04)
  cp2_2012_12_31_00 -3.002e+04 3.405e+04      0.370           (-9.800e+04, 4.523e+04)
  cp3_2014_12_30_00     -945.8 1.830e+04      0.962           (-3.608e+04, 3.037e+04)
  cp4_2015_02_01_00     1769.0 1.379e+04      0.890           (-2.779e+04, 2.447e+04)
  cp5_2015_04_29_00 -1.569e+04 3.309e+04      0.634           (-7.615e+04, 5.063e+04)
  cp6_2017_08_31_00 -3.195e+04 1.899e+04      0.092         .    (-7.166e+04, 8231.0)
             y_lag1  2.133e+05 3.008e+04     <2e-16       ***  (1.480e+05, 2.662e+05)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.9451,   Adjusted R-squared: 0.9355
F-statistic: 97.446 on 15 and 91 DF,   p-value: 1.110e-16
Model AIC: 2724.5,   model BIC: 2769.8

WARNING: the condition number is large, 5.60e+03. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.

                                                    0
rank_test_MAPE                                      1
mean_test_MAPE                                  19.83
split_test_MAPE   (17.11, 22.78, 10.41, 28.12, 20.74)
mean_train_MAPE                                 22.43
split_train_MAPE  (23.53, 22.22, 22.94, 22.04, 21.44)
mean_fit_time                                    0.98
mean_score_time                                  1.63
params                                             []
                                                              train              test
CORR                                                       0.970891          0.695871
R2                                                         0.942621         -0.664472
MSE                                                 630447348.36414  121812957.346216
RMSE                                                   25108.710607      11036.890746
MAE                                                    20696.988688       8770.558524
MedAE                                                  18654.125854        6009.20302
MAPE                                                      20.976826          2.484444
MedAPE                                                      8.37506          1.739458
sMAPE                                                      8.810362          1.266087
Q80                                                    10348.494344       6559.678408
Q95                                                    10348.494344        7646.87798
Q99                                                    10348.494344       7936.797866
OutsideTolerance1p                                         0.932692              0.75
OutsideTolerance2p                                            0.875               0.5
OutsideTolerance3p                                         0.788462              0.25
OutsideTolerance4p                                             0.75              0.25
OutsideTolerance5p                                         0.673077              0.25
Outside Tolerance (fraction)                                   None              None
R2_null_model_score                                            None              None
Prediction Band Width (%)                                101.082202         29.838175
Prediction Band Coverage (fraction)                        0.990385               1.0
Coverage: Lower Band                                       0.519231              0.25
Coverage: Upper Band                                       0.471154              0.75
Coverage Diff: Actual_Coverage - Intended_Coverage         0.040385              0.05
MIS                                                   117076.015393     105298.326977
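
The coverage rows in the table above can be read as simple fractions. The sketch below computes prediction band coverage on hypothetical values and then checks one number from the table: a train coverage of 0.990385 is consistent with 103 of 104 training points falling inside the band, which gives the printed difference of 0.040385 from the intended 0.95. The count of 104 assumes the last 4 of the 108 months are held out, and inclusive band edges are also an assumption.

```python
def band_coverage(actuals, lowers, uppers):
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = sum(lo <= y <= up for y, lo, up in zip(actuals, lowers, uppers))
    return inside / len(actuals)


# Hypothetical values for illustration: 3 of the 4 points are inside the band.
actuals = [10.0, 12.0, 9.0, 15.0]
lowers = [8.0, 9.0, 10.0, 11.0]
uppers = [12.0, 13.0, 14.0, 16.0]
print(band_coverage(actuals, lowers, uppers))

# Check against the table: 103 of 104 points inside, intended coverage 0.95.
intended_coverage = 0.95
train_coverage = 103 / 104
print(round(train_coverage, 6), round(train_coverage - intended_coverage, 6))
```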

Fit/backtest plot:

fig = backtest.plot()
plotly.io.show(fig)

Forecast plot:

fig = forecast.plot()
plotly.io.show(fig)

The components plot:

fig = forecast.plot_components()
plotly.io.show(fig)

Fit a model with time-varying seasonality (month effect). This is achieved by adding "ct1*C(month)" to extra_pred_cols in ModelComponentsParam. Note that this feature may or may not be useful in your use case; we have included it for demonstration purposes only. In this example, the fit improves (train MAPE of about 13.8 versus 21.0), but the backtest is inferior to the previous setting (test MAPE of about 11.6 versus 2.5).
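
The interaction term can be pictured as the trend multiplied by each month dummy, so every non-reference month gets its own slope over time. The sketch below builds those columns in plain Python. It assumes treatment coding with January as the reference, and the trend values are hypothetical; the helper names are illustrative and not part of Greykite.

```python
def month_dummies(months, levels=tuple(range(1, 13))):
    """Treatment-coded month dummies; the first level is the reference."""
    non_reference = levels[1:]
    return [[1 if month == level else 0 for level in non_reference] for month in months]


def interact(trend, dummy_rows):
    """Multiply each row of dummies by the trend value for that row."""
    return [[t * d for d in row] for t, row in zip(trend, dummy_rows)]


months = [3, 3]   # March in two different years
ct1 = [0.5, 1.5]  # hypothetical linear-trend values for those two rows
rows = interact(ct1, month_dummies(months))

print(len(rows[0]))            # number of interaction columns
print(rows[0][1], rows[1][1])  # the March column carries the trend value
```

These 11 interaction columns account for the increase from 22 features in the previous model summary to 33 in the summary for this model.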

extra_pred_cols = ["ct_sqrt", "ct1", "C(month, levels=list(range(1, 13)))",
                   "ct1*C(month, levels=list(range(1, 13)))"]
autoregression = {
    "autoreg_dict": {
        "lag_dict": {"orders": [1]},
        "agg_lag_dict": None
    }
}

# Specify the model parameters
model_components = ModelComponentsParam(
    growth=dict(growth_term=None),
    seasonality=dict(
        yearly_seasonality=[False],
        quarterly_seasonality=[False],
        monthly_seasonality=[False],
        weekly_seasonality=[False],
        daily_seasonality=[False]
    ),
    custom=dict(
        fit_algorithm_dict=dict(fit_algorithm="ridge"),
        extra_pred_cols=extra_pred_cols
    ),
    regressors=dict(regressor_cols=None),
    autoregression=autoregression,
    uncertainty=dict(uncertainty_dict=None),
    events=dict(holiday_lookup_countries=None),
)

# Run the forecast model
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        coverage=0.95,
        forecast_horizon=forecast_horizon,
        metadata_param=meta_data_params,
        evaluation_period_param=evaluation_period_param,
        model_components_param=model_components
    )
)

# Get the useful fields from the forecast result
model = result.model[-1]
backtest = result.backtest
forecast = result.forecast
grid_search = result.grid_search

# Check model coefficients / variables
# Get model summary with p-values
print(model.summary())

# Get cross-validation results
cv_results = summarize_grid_search_results(
    grid_search=grid_search,
    decimals=2,
    cv_report_metrics=None,
    column_order=[
        "rank", "mean_test", "split_test", "mean_train", "split_train",
        "mean_fit_time", "mean_score_time", "params"])
# Transposes to save space in the printed output
print(cv_results.transpose())

# Check historical evaluation metrics (on the historical training/test set).
backtest_eval = defaultdict(list)
for metric, value in backtest.train_evaluation.items():
    backtest_eval[metric].append(value)
    backtest_eval[metric].append(backtest.test_evaluation[metric])
metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
print(metrics)

Out:

Fitting 5 folds for each of 1 candidates, totalling 5 fits
================================ Model Summary =================================

Number of observations: 108,   Number of features: 33
Method: Ridge regression
Number of nonzero features: 33
Regularization parameter: 0.01269

Residuals:
         Min           1Q       Median           3Q          Max
  -5.127e+04   -1.256e+04        752.4    1.392e+04    5.073e+04

           Pred_col   Estimate  Std. Err Pr(>)_boot sig. code                    95%CI
          Intercept -2.220e+04 2.148e+04      0.250               (-7.674e+04, 9245.0)
C(month,... 13)))_2    -1857.0 2.658e+04      0.920            (-6.166e+04, 4.752e+04)
C(month,... 13)))_3  3.125e+04 2.626e+04      0.158            (-1.154e+04, 9.033e+04)
C(month,... 13)))_4  5.244e+04 2.502e+04      0.040         *   (1.939e+04, 1.156e+05)
C(month,... 13)))_5  7.419e+04 2.350e+04      0.012         *   (4.820e+04, 1.421e+05)
C(month,... 13)))_6  5.570e+04 2.196e+04      0.022         *   (2.728e+04, 1.104e+05)
C(month,... 13)))_7  5.992e+04 2.383e+04      0.028         *   (3.061e+04, 1.294e+05)
C(month,... 13)))_8  5.781e+04 2.284e+04      0.016         *   (2.714e+04, 1.160e+05)
C(month,... 13)))_9  4.858e+04 2.505e+04      0.046         *   (1.699e+04, 1.145e+05)
C(month,...13)))_10  3.069e+04 1.948e+04      0.104                (-390.0, 8.172e+04)
C(month,...13)))_11  2.508e+04 1.854e+04      0.134               (-2367.0, 7.676e+04)
C(month,...13)))_12    -1322.0 1.898e+04      0.922            (-3.195e+04, 4.938e+04)
            ct_sqrt  1.757e+05 6.094e+04      0.006        **   (4.648e+04, 2.875e+05)
                ct1  3.731e+04 5.078e+04      0.508            (-4.089e+04, 1.403e+05)
ct1:C(mo... 13)))_2  2.775e+04 4.363e+04      0.372            (-8.118e+04, 1.170e+05)
ct1:C(mo... 13)))_3  7.465e+04 4.548e+04      0.086         .  (-1.543e+04, 1.665e+05)
ct1:C(mo... 13)))_4  1.332e+05 4.228e+04      0.016         *   (3.994e+04, 2.174e+05)
ct1:C(mo... 13)))_5  8.293e+04 3.941e+04      0.038         *     (-1569.0, 1.506e+05)
ct1:C(mo... 13)))_6  1.336e+05 3.381e+04      0.004        **   (6.684e+04, 2.042e+05)
ct1:C(mo... 13)))_7  1.330e+05 3.719e+04     <2e-16       ***   (5.292e+04, 2.050e+05)
ct1:C(mo... 13)))_8  1.329e+05 3.535e+04     <2e-16       ***   (5.568e+04, 1.974e+05)
ct1:C(mo... 13)))_9  9.543e+04 4.436e+04      0.034         *      (3558.0, 1.769e+05)
ct1:C(mo...13)))_10  1.198e+05 3.048e+04     <2e-16       ***   (5.392e+04, 1.723e+05)
ct1:C(mo...13)))_11    -6389.0 3.218e+04      0.842            (-6.712e+04, 5.681e+04)
ct1:C(mo...13)))_12     1972.0 3.225e+04      0.958            (-7.000e+04, 6.390e+04)
  cp0_2011_12_31_00     6598.0 3.477e+04      0.840            (-6.437e+04, 7.330e+04)
  cp1_2012_01_30_00    -9129.0 3.628e+04      0.802            (-7.798e+04, 6.929e+04)
  cp2_2012_12_31_00 -6.343e+04 5.610e+04      0.262            (-1.748e+05, 4.424e+04)
  cp3_2014_12_30_00    -6911.0 5.068e+04      0.896            (-1.033e+05, 8.457e+04)
  cp4_2015_02_01_00  3.416e+04 3.760e+04      0.384            (-4.905e+04, 9.823e+04)
  cp5_2015_04_29_00 -2.803e+04 7.950e+04      0.716            (-1.710e+05, 1.383e+05)
  cp6_2017_08_31_00 -5.819e+04 2.157e+04      0.004        ** (-9.676e+04, -1.510e+04)
             y_lag1  1.281e+05 5.206e+04      0.012         *   (3.716e+04, 2.337e+05)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.9678,   Adjusted R-squared: 0.9566
F-statistic: 85.908 on 27 and 79 DF,   p-value: 1.110e-16
Model AIC: 2690.2,   model BIC: 2767.1

WARNING: the condition number is large, 2.75e+04. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.

                                                  0
rank_test_MAPE                                    1
mean_test_MAPE                                10.98
split_test_MAPE   (1.69, 16.74, 10.22, 18.02, 8.24)
mean_train_MAPE                               12.42
split_train_MAPE  (15.4, 11.02, 11.4, 11.17, 13.12)
mean_fit_time                                  0.86
mean_score_time                                1.46
params                                           []
                                                               train               test
CORR                                                        0.983665           0.843115
R2                                                          0.967592         -22.623624
MSE                                                 356082880.527499  1728874340.802915
RMSE                                                    18870.158466       41579.734737
MAE                                                      15056.50024       40880.699047
MedAE                                                    13205.77968       39562.603905
MAPE                                                       13.832901          11.630886
MedAPE                                                       6.51192          11.043292
sMAPE                                                       5.013717           5.486557
Q80                                                       7528.25012        8176.139809
Q95                                                       7528.25012        2044.034952
Q99                                                       7528.25012          408.80699
OutsideTolerance1p                                          0.913462                1.0
OutsideTolerance2p                                          0.798077                1.0
OutsideTolerance3p                                          0.759615                1.0
OutsideTolerance4p                                          0.701923                1.0
OutsideTolerance5p                                             0.625                1.0
Outside Tolerance (fraction)                                    None               None
R2_null_model_score                                             None               None
Prediction Band Width (%)                                  85.852501          28.732234
Prediction Band Coverage (fraction)                         0.980769                1.0
Coverage: Lower Band                                        0.480769                1.0
Coverage: Upper Band                                             0.5                0.0
Coverage Diff: Actual_Coverage - Intended_Coverage          0.030769               0.05
MIS                                                     99811.825941      101049.526246
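
To compare the three fits side by side, the sketch below collects the MAPE values printed in the three output blocks and ranks the models by each criterion. The dictionary keys and model labels are illustrative names chosen here; the numbers are copied from the outputs.

```python
# MAPE values (percent) copied from the three output blocks above.
results = {
    "no autoregression": {"cv_mean_test_mape": 17.95, "backtest_test_mape": 4.182279},
    "lag-1 autoregression": {"cv_mean_test_mape": 19.83, "backtest_test_mape": 2.484444},
    "lag-1 + ct1*C(month)": {"cv_mean_test_mape": 10.98, "backtest_test_mape": 11.630886},
}


def best_by(criterion):
    """Return the model label with the lowest value for the given criterion."""
    return min(results, key=lambda name: results[name][criterion])


for criterion in ("cv_mean_test_mape", "backtest_test_mape"):
    print(f"{criterion}: best = {best_by(criterion)}")
```

The two criteria favor different models here. With only five CV folds and a four-month backtest, neither ranking should be read as conclusive.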

Fit/backtest plot:

fig = backtest.plot()
plotly.io.show(fig)

Forecast plot:

fig = forecast.plot()
plotly.io.show(fig)

The components plot:

fig = forecast.plot_components()
plotly.io.show(fig)

Total running time of the script: ( 0 minutes 47.884 seconds)
