Example for monthly data
This is a basic example for monthly data using Silverkite. Note that we fit only a few simple models here; the goal is not to squeeze out the best possible results.
import warnings
from collections import defaultdict

import plotly
import pandas as pd

from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.utils.result_summary import summarize_grid_search_results
from greykite.framework.input.univariate_time_series import UnivariateTimeSeries

warnings.filterwarnings("ignore")
Load the dataset into UnivariateTimeSeries.
dl = DataLoaderTS()
agg_func = {"count": "sum"}
df = dl.load_bikesharing(agg_freq="monthly", agg_func=agg_func)
# In this monthly data the last month is incomplete, therefore we drop it
df.drop(df.tail(1).index, inplace=True)
df = df.reset_index(drop=True)  # reset_index returns a copy; reassign it
ts = UnivariateTimeSeries()
ts.load_data(
    df=df,
    time_col="ts",
    value_col="count",
    freq="MS")
Out:
<greykite.framework.input.univariate_time_series.UnivariateTimeSeries object at 0x7f33a9d17970>
Exploratory data analysis (EDA)
After reading in a time series, we can first do some exploratory data analysis. The UnivariateTimeSeries class is used to store a time series and perform EDA. A quick description of the data can be obtained as follows.
print(ts.describe_time_col())
print(ts.describe_value_col())
print(df.head())
Out:
{'data_points': 108, 'mean_increment_secs': 2629143.925233645, 'min_timestamp': Timestamp('2010-09-01 00:00:00'), 'max_timestamp': Timestamp('2019-08-01 00:00:00')}
count 108.000000
mean 231254.101852
std 106017.804606
min 4001.000000
25% 144661.750000
50% 227332.000000
75% 327851.250000
max 404811.000000
Name: y, dtype: float64
ts count
0 2010-09-01 4001
1 2010-10-01 35949
2 2010-11-01 47391
3 2010-12-01 28253
4 2011-01-01 37499
Let’s plot the original time series. (The interactive plot is generated by plotly: click to zoom!)
fig = ts.plot()
plotly.io.show(fig)
Exploratory plots can reveal the time series’s properties. A monthly overlay plot can be used to inspect the annual patterns; it overlays the values from different years on top of each other.
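The overlay is essentially a pivot of the series into a year-by-month grid, one line per year. A stdlib sketch of that regrouping (an illustration only, not greykite internals; the values are the first rows printed above):

```python
from collections import defaultdict
from datetime import date

series = {date(2010, 9, 1): 4001, date(2010, 10, 1): 35949,
          date(2010, 11, 1): 47391, date(2010, 12, 1): 28253,
          date(2011, 1, 1): 37499}  # first rows of the monthly counts

by_year = defaultdict(dict)  # year -> {month -> value}; one overlay line per year
for ts_, value in series.items():
    by_year[ts_.year][ts_.month] = value
# by_year[2010] covers months 9-12; by_year[2011] starts at month 1
```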
fig = ts.plot_quantiles_and_overlays(
    groupby_time_feature="month",
    show_mean=False,
    show_quantiles=False,
    show_overlays=True,
    overlay_label_time_feature="year",
    overlay_style={"line": {"width": 1}, "opacity": 0.5},
    center_values=False,
    xlabel="month of year",
    ylabel=ts.original_value_col,
    title="yearly seasonality for each year")
plotly.io.show(fig)
Specify common metadata.
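A minimal sketch of this step, assuming the time/value columns loaded above; the later configuration refers to `meta_data_params` and `forecast_horizon`, and the 4-month horizon here is an assumption consistent with the 4-point test set in the evaluation below:

```python
from greykite.framework.templates.autogen.forecast_config import MetadataParam

forecast_horizon = 4  # number of months to forecast (assumed value)
meta_data_params = MetadataParam(
    time_col="ts",      # time column in `df`
    value_col="count",  # value column in `df`
    freq="MS",          # monthly frequency, anchored at month start
)
```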
Specify common evaluation parameters. Set minimum input data for training.
cv_min_train_periods = 24
# Let CV use most recent splits for cross-validation.
cv_use_most_recent_splits = True
# Determine the maximum number of validations.
cv_max_splits = 5
evaluation_period_param = EvaluationPeriodParam(
    test_horizon=forecast_horizon,
    cv_horizon=forecast_horizon,
    periods_between_train_test=0,
    cv_min_train_periods=cv_min_train_periods,
    cv_expanding_window=True,
    cv_use_most_recent_splits=cv_use_most_recent_splits,
    cv_periods_between_splits=None,
    cv_periods_between_train_test=0,
    cv_max_splits=cv_max_splits,
)
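The expanding-window behavior can be illustrated with a small stdlib sketch. This is an illustration of the idea only, not greykite's actual splitter; the helper name and the spacing-splits-by-the-horizon rule are assumptions:

```python
def expanding_window_splits(n_points, horizon, min_train, max_splits):
    """List (train_size, test_indices) pairs for expanding-window CV.

    Splits are spaced by `horizon` in this sketch, and only the most
    recent `max_splits` are kept, mirroring cv_use_most_recent_splits=True
    and cv_max_splits above.
    """
    splits = []
    train_end = min_train
    while train_end + horizon <= n_points:
        splits.append((train_end, list(range(train_end, train_end + horizon))))
        train_end += horizon  # assumed spacing between consecutive splits
    return splits[-max_splits:]

# 104 = 108 observations minus the 4-point holdout test set
splits = expanding_window_splits(n_points=104, horizon=4, min_train=24, max_splits=5)
# 5 splits; the last one trains on 100 points and tests on indices 100..103
```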
Fit a simple model without autoregression.
The important modeling parameters for monthly data are as follows; these are plugged into ModelComponentsParam.
The extra_pred_cols parameter is used to specify growth and annual seasonality. Growth is modeled with both "ct_sqrt" and "ct1" for extra flexibility, since we have long-term data, and ridge regularization avoids over-fitting the trend. The annual seasonality is modeled categorically with "C(month)" instead of a Fourier series: monthly data has only 12 points per year, so a categorical representation is feasible, whereas for daily data the many points per year would make it impractical. The categorical representation of month is also more explainable/interpretable in the model summary.
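To make the design matrix concrete, here is a hand-rolled sketch of the features these columns imply. This is an illustration only; silverkite builds these features internally, and its exact time scaling and column naming may differ:

```python
import math
from datetime import date

def monthly_features(ts, origin=date(2010, 9, 1)):
    """Toy version of the trend + month-dummy features."""
    ct1 = (ts - origin).days / 365.25             # continuous time, in years
    row = {"ct1": ct1, "ct_sqrt": math.sqrt(ct1)}
    # C(month, levels=1..12): one dummy per month, January as the baseline
    for m in range(2, 13):
        row[f"month_{m}"] = 1.0 if ts.month == m else 0.0
    return row

row = monthly_features(date(2011, 3, 1))  # only month_3 is 1.0
```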
extra_pred_cols = ["ct_sqrt", "ct1", "C(month, levels=list(range(1, 13)))"]
autoregression = None

# Specify the model parameters
model_components = ModelComponentsParam(
    growth=dict(growth_term=None),
    seasonality=dict(
        yearly_seasonality=[False],
        quarterly_seasonality=[False],
        monthly_seasonality=[False],
        weekly_seasonality=[False],
        daily_seasonality=[False]
    ),
    custom=dict(
        fit_algorithm_dict=dict(fit_algorithm="ridge"),
        extra_pred_cols=extra_pred_cols
    ),
    regressors=dict(regressor_cols=None),
    autoregression=autoregression,
    uncertainty=dict(uncertainty_dict=None),
    events=dict(holiday_lookup_countries=None),
)

# Run the forecast model
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        coverage=0.95,
        forecast_horizon=forecast_horizon,
        metadata_param=meta_data_params,
        evaluation_period_param=evaluation_period_param,
        model_components_param=model_components
    )
)

# Get the useful fields from the forecast result
model = result.model[-1]
backtest = result.backtest
forecast = result.forecast
grid_search = result.grid_search

# Check model coefficients / variables
# Get model summary with p-values
print(model.summary())

# Get cross-validation results
cv_results = summarize_grid_search_results(
    grid_search=grid_search,
    decimals=2,
    cv_report_metrics=None,
    column_order=[
        "rank", "mean_test", "split_test", "mean_train", "split_train",
        "mean_fit_time", "mean_score_time", "params"])
# Transpose to save space in the printed output
print(cv_results.transpose())

# Check historical evaluation metrics (on the historical training/test set).
backtest_eval = defaultdict(list)
for metric, value in backtest.train_evaluation.items():
    backtest_eval[metric].append(value)
    backtest_eval[metric].append(backtest.test_evaluation[metric])
metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
print(metrics)
Out:
Fitting 5 folds for each of 1 candidates, totalling 5 fits
============================ Forecast Model Summary ============================
Number of observations: 108, Number of features: 21
Method: Ridge regression
Number of nonzero features: 21
Regularization parameter: 0.01269
Residuals:
Min 1Q Median 3Q Max
-5.631e+04 -2.219e+04 2946.0 2.172e+04 6.649e+04
Pred_col Estimate Std. Err Pr(>)_boot sig. code 95%CI
Intercept -9.460e+04 3.409e+04 0.010 * (-1.434e+05, -1.253e+04)
C(month,... 13)))_2 5660.0 2.061e+04 0.762 (-3.709e+04, 5.027e+04)
C(month,... 13)))_3 6.530e+04 1.818e+04 <2e-16 *** (2.972e+04, 1.009e+05)
C(month,... 13)))_4 1.362e+05 1.720e+04 <2e-16 *** (9.867e+04, 1.690e+05)
C(month,... 13)))_5 1.534e+05 1.933e+04 <2e-16 *** (1.143e+05, 1.927e+05)
C(month,... 13)))_6 1.675e+05 1.715e+04 <2e-16 *** (1.344e+05, 2.014e+05)
C(month,... 13)))_7 1.756e+05 1.697e+04 <2e-16 *** (1.407e+05, 2.075e+05)
C(month,... 13)))_8 1.758e+05 1.777e+04 <2e-16 *** (1.414e+05, 2.083e+05)
C(month,... 13)))_9 1.477e+05 1.848e+04 <2e-16 *** (1.135e+05, 1.818e+05)
C(month,...13)))_10 1.345e+05 1.767e+04 <2e-16 *** (9.636e+04, 1.688e+05)
C(month,...13)))_11 6.066e+04 1.747e+04 <2e-16 *** (2.344e+04, 9.388e+04)
C(month,...13)))_12 1.422e+04 1.840e+04 0.424 (-2.463e+04, 4.871e+04)
ct_sqrt 3.313e+05 1.135e+05 0.006 ** (1.785e+04, 4.547e+05)
ct1 3.895e+04 1.219e+05 0.798 (-1.812e+05, 2.819e+05)
cp0_2011_12_31_00 2.954e+04 8.122e+04 0.716 (-1.163e+05, 2.138e+05)
cp1_2012_01_30_00 1.218e+04 7.376e+04 0.872 (-1.319e+05, 1.611e+05)
cp2_2012_12_31_00 -7.390e+04 1.008e+05 0.452 (-2.810e+05, 1.155e+05)
cp3_2014_12_30_00 -1.254e+04 5.737e+04 0.806 (-1.298e+05, 9.590e+04)
cp4_2015_02_01_00 4.932e+04 4.764e+04 0.298 (-4.476e+04, 1.379e+05)
cp5_2015_04_29_00 -3.631e+04 8.895e+04 0.700 (-1.923e+05, 1.410e+05)
cp6_2017_08_31_00 -7.053e+04 2.190e+04 0.006 ** (-1.148e+05, -2.587e+04)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.9248, Adjusted R-squared: 0.9113
F-statistic: 68.337 on 16 and 90 DF, p-value: 1.110e-16
Model AIC: 2759.1, model BIC: 2805.3
WARNING: the condition number is large, 2.44e+04. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.
0
rank_test_MAPE 1
mean_test_MAPE 17.95
split_test_MAPE (16.97, 21.68, 5.09, 23.25, 22.77)
mean_train_MAPE 30.74
split_train_MAPE (34.41, 28.6, 31.42, 29.18, 30.07)
mean_fit_time 1.52
mean_score_time 0.21
param_estimator__auto_holiday_params None
params []
train test
CORR 0.959601 0.959809
R2 0.920783 -2.06113
MSE 870383619.075692 224026154.952861
RMSE 29502.264643 14967.503297
MAE 25057.504616 14721.300357
MedAE 23885.308137 13428.221312
MAPE 31.179462 4.182279
MedAPE 9.409044 3.799328
sMAPE 10.578591 2.137084
Q80 12528.752308 11777.040286
Q95 12528.752308 13985.235339
Q99 12528.752308 14574.087354
OutsideTolerance1p 0.980769 1.0
OutsideTolerance2p 0.894231 1.0
OutsideTolerance3p 0.836538 1.0
OutsideTolerance4p 0.826923 0.25
OutsideTolerance5p 0.740385 0.25
Outside Tolerance (fraction) None None
R2_null_model_score None None
Prediction Band Width (%) 117.482609 41.759833
Prediction Band Coverage (fraction) 1.0 1.0
Coverage: Lower Band 0.5 0.0
Coverage: Upper Band 0.5 1.0
Coverage Diff: Actual_Coverage - Intended_Coverage 0.05 0.05
MIS 135585.622803 146570.265182
Fit/backtest plot:
fig = backtest.plot()
plotly.io.show(fig)
Forecast plot:
fig = forecast.plot()
plotly.io.show(fig)
The components plot:
fig = forecast.plot_components()
plotly.io.show(fig)
Fit a simple model with autoregression.
This is done by specifying the autoregression parameter in ModelComponentsParam. Note that the autoregressive structure can be customized further depending on your data.
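What `lag_dict={"orders": [1]}` adds is a single lag-1 column: each month's prediction can use the previous month's observed value. A stdlib sketch of that column (illustrative only; greykite constructs the lag features internally):

```python
def lagged(values, order=1):
    """Shift a series forward by `order` steps; the leading entries are missing."""
    return [None] * order + values[:-order]

y = [4001, 35949, 47391, 28253]  # first months of the bikesharing counts
y_lag1 = lagged(y, order=1)      # [None, 4001, 35949, 47391]
```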
extra_pred_cols = ["ct_sqrt", "ct1", "C(month, levels=list(range(1, 13)))"]
autoregression = {
    "autoreg_dict": {
        "lag_dict": {"orders": [1]},
        "agg_lag_dict": None
    }
}

# Specify the model parameters
model_components = ModelComponentsParam(
    growth=dict(growth_term=None),
    seasonality=dict(
        yearly_seasonality=[False],
        quarterly_seasonality=[False],
        monthly_seasonality=[False],
        weekly_seasonality=[False],
        daily_seasonality=[False]
    ),
    custom=dict(
        fit_algorithm_dict=dict(fit_algorithm="ridge"),
        extra_pred_cols=extra_pred_cols
    ),
    regressors=dict(regressor_cols=None),
    autoregression=autoregression,
    uncertainty=dict(uncertainty_dict=None),
    events=dict(holiday_lookup_countries=None),
)

# Run the forecast model
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        coverage=0.95,
        forecast_horizon=forecast_horizon,
        metadata_param=meta_data_params,
        evaluation_period_param=evaluation_period_param,
        model_components_param=model_components
    )
)

# Get the useful fields from the forecast result
model = result.model[-1]
backtest = result.backtest
forecast = result.forecast
grid_search = result.grid_search

# Check model coefficients / variables
# Get model summary with p-values
print(model.summary())

# Get cross-validation results
cv_results = summarize_grid_search_results(
    grid_search=grid_search,
    decimals=2,
    cv_report_metrics=None,
    column_order=[
        "rank", "mean_test", "split_test", "mean_train", "split_train",
        "mean_fit_time", "mean_score_time", "params"])
# Transpose to save space in the printed output
print(cv_results.transpose())

# Check historical evaluation metrics (on the historical training/test set).
backtest_eval = defaultdict(list)
for metric, value in backtest.train_evaluation.items():
    backtest_eval[metric].append(value)
    backtest_eval[metric].append(backtest.test_evaluation[metric])
metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
print(metrics)
Out:
Fitting 5 folds for each of 1 candidates, totalling 5 fits
============================ Forecast Model Summary ============================
Number of observations: 108, Number of features: 22
Method: Ridge regression
Number of nonzero features: 22
Regularization parameter: 0.0621
Residuals:
Min 1Q Median 3Q Max
-5.655e+04 -1.618e+04 -1849.0 1.957e+04 6.007e+04
Pred_col Estimate Std. Err Pr(>)_boot sig. code 95%CI
Intercept -2.605e+04 1.715e+04 0.136 (-6.200e+04, 7222.0)
C(month,... 13)))_2 1.142e+04 1.266e+04 0.374 (-1.351e+04, 3.409e+04)
C(month,... 13)))_3 6.686e+04 1.321e+04 <2e-16 *** (4.238e+04, 9.340e+04)
C(month,... 13)))_4 1.060e+05 1.559e+04 <2e-16 *** (6.996e+04, 1.340e+05)
C(month,... 13)))_5 8.563e+04 1.450e+04 <2e-16 *** (5.937e+04, 1.130e+05)
C(month,... 13)))_6 9.056e+04 1.571e+04 <2e-16 *** (5.812e+04, 1.202e+05)
C(month,... 13)))_7 9.126e+04 1.504e+04 <2e-16 *** (6.233e+04, 1.206e+05)
C(month,... 13)))_8 8.720e+04 1.561e+04 <2e-16 *** (5.815e+04, 1.166e+05)
C(month,... 13)))_9 6.215e+04 1.592e+04 <2e-16 *** (3.276e+04, 9.404e+04)
C(month,...13)))_10 6.108e+04 1.332e+04 <2e-16 *** (3.539e+04, 8.957e+04)
C(month,...13)))_11 -6119.0 1.621e+04 0.724 (-3.837e+04, 2.401e+04)
C(month,...13)))_12 -1.324e+04 1.269e+04 0.296 (-3.694e+04, 1.192e+04)
ct_sqrt 9.290e+04 3.981e+04 0.024 * (1.445e+04, 1.738e+05)
ct1 4.863e+04 2.302e+04 0.036 * (1578.0, 9.009e+04)
cp0_2011_12_31_00 2.021e+04 2.622e+04 0.448 (-2.782e+04, 7.215e+04)
cp1_2012_01_30_00 1.920e+04 2.403e+04 0.446 (-2.574e+04, 6.653e+04)
cp2_2012_12_31_00 -3.002e+04 3.455e+04 0.394 (-8.821e+04, 4.370e+04)
cp3_2014_12_30_00 -945.8 1.701e+04 0.954 (-3.553e+04, 3.390e+04)
cp4_2015_02_01_00 1769.0 1.338e+04 0.872 (-2.584e+04, 2.755e+04)
cp5_2015_04_29_00 -1.569e+04 3.195e+04 0.644 (-7.442e+04, 4.624e+04)
cp6_2017_08_31_00 -3.195e+04 1.826e+04 0.076 . (-6.455e+04, 4787.0)
y_lag1 2.133e+05 2.941e+04 <2e-16 *** (1.570e+05, 2.693e+05)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.9451, Adjusted R-squared: 0.9355
F-statistic: 97.446 on 15 and 91 DF, p-value: 1.110e-16
Model AIC: 2724.5, model BIC: 2769.8
WARNING: the condition number is large, 5.60e+03. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.
0
rank_test_MAPE 1
mean_test_MAPE 15.01
split_test_MAPE (17.46, 18.54, 6.4, 22.12, 10.53)
mean_train_MAPE 22.43
split_train_MAPE (23.53, 22.22, 22.94, 22.04, 21.44)
mean_fit_time 1.48
mean_score_time 2.3
param_estimator__auto_holiday_params None
params []
train test
CORR 0.970891 -0.179218
R2 0.942621 -0.530915
MSE 630447348.36414 112038658.953697
RMSE 25108.710607 10584.831551
MAE 20696.988688 8598.010439
MedAE 18654.125854 6850.313966
MAPE 20.976826 2.444577
MedAPE 8.37506 2.004225
sMAPE 8.810362 1.232123
Q80 10348.494344 4869.999814
Q95 10348.494344 5155.497111
Q99 10348.494344 5231.629724
OutsideTolerance1p 0.932692 0.5
OutsideTolerance2p 0.875 0.5
OutsideTolerance3p 0.788462 0.5
OutsideTolerance4p 0.75 0.25
OutsideTolerance5p 0.673077 0.0
Outside Tolerance (fraction) None None
R2_null_model_score None None
Prediction Band Width (%) 101.082202 30.502409
Prediction Band Coverage (fraction) 0.990385 1.0
Coverage: Lower Band 0.519231 0.5
Coverage: Upper Band 0.471154 0.5
Coverage Diff: Actual_Coverage - Intended_Coverage 0.040385 0.05
MIS 117076.015393 107169.752382
Fit/backtest plot:
fig = backtest.plot()
plotly.io.show(fig)
Forecast plot:
fig = forecast.plot()
plotly.io.show(fig)
The components plot:
fig = forecast.plot_components()
plotly.io.show(fig)
Fit a model with time-varying seasonality (month effect).
This is achieved by adding "ct1*C(month)" to ModelComponentsParam. Note that this feature may or may not be useful in your use case; we have included it for demonstration purposes only. In this example, while the fit has improved, the backtest is inferior to the previous setting.
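The interaction term multiplies the linear trend into each month dummy, so every month gets its own slope. A sketch of the extra columns this adds (illustrative only; the column naming here is an assumption):

```python
def month_trend_interactions(ct1, month):
    """ct1 * C(month): each month dummy scaled by the trend value."""
    return {f"ct1:month_{m}": (ct1 if month == m else 0.0)
            for m in range(2, 13)}

row = month_trend_interactions(ct1=2.5, month=4)
# row["ct1:month_4"] == 2.5; all other entries are 0.0
```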
extra_pred_cols = ["ct_sqrt", "ct1", "C(month, levels=list(range(1, 13)))",
                   "ct1*C(month, levels=list(range(1, 13)))"]
autoregression = {
    "autoreg_dict": {
        "lag_dict": {"orders": [1]},
        "agg_lag_dict": None
    }
}

# Specify the model parameters
model_components = ModelComponentsParam(
    growth=dict(growth_term=None),
    seasonality=dict(
        yearly_seasonality=[False],
        quarterly_seasonality=[False],
        monthly_seasonality=[False],
        weekly_seasonality=[False],
        daily_seasonality=[False]
    ),
    custom=dict(
        fit_algorithm_dict=dict(fit_algorithm="ridge"),
        extra_pred_cols=extra_pred_cols
    ),
    regressors=dict(regressor_cols=None),
    autoregression=autoregression,
    uncertainty=dict(uncertainty_dict=None),
    events=dict(holiday_lookup_countries=None),
)

# Run the forecast model
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        coverage=0.95,
        forecast_horizon=forecast_horizon,
        metadata_param=meta_data_params,
        evaluation_period_param=evaluation_period_param,
        model_components_param=model_components
    )
)

# Get the useful fields from the forecast result
model = result.model[-1]
backtest = result.backtest
forecast = result.forecast
grid_search = result.grid_search

# Check model coefficients / variables
# Get model summary with p-values
print(model.summary())

# Get cross-validation results
cv_results = summarize_grid_search_results(
    grid_search=grid_search,
    decimals=2,
    cv_report_metrics=None,
    column_order=[
        "rank", "mean_test", "split_test", "mean_train", "split_train",
        "mean_fit_time", "mean_score_time", "params"])
# Transpose to save space in the printed output
print(cv_results.transpose())

# Check historical evaluation metrics (on the historical training/test set).
backtest_eval = defaultdict(list)
for metric, value in backtest.train_evaluation.items():
    backtest_eval[metric].append(value)
    backtest_eval[metric].append(backtest.test_evaluation[metric])
metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
print(metrics)
Out:
Fitting 5 folds for each of 1 candidates, totalling 5 fits
============================ Forecast Model Summary ============================
Number of observations: 108, Number of features: 33
Method: Ridge regression
Number of nonzero features: 33
Regularization parameter: 0.01269
Residuals:
Min 1Q Median 3Q Max
-5.127e+04 -1.256e+04 752.4 1.392e+04 5.073e+04
Pred_col Estimate Std. Err Pr(>)_boot sig. code 95%CI
Intercept -2.220e+04 2.130e+04 0.238 (-7.564e+04, 1.149e+04)
C(month,... 13)))_2 -1857.0 2.646e+04 0.920 (-6.688e+04, 4.786e+04)
C(month,... 13)))_3 3.125e+04 2.301e+04 0.144 (-3173.0, 8.565e+04)
C(month,... 13)))_4 5.244e+04 2.626e+04 0.046 * (2.464e+04, 1.178e+05)
C(month,... 13)))_5 7.419e+04 2.424e+04 0.018 * (4.792e+04, 1.336e+05)
C(month,... 13)))_6 5.570e+04 2.221e+04 0.030 * (2.310e+04, 1.151e+05)
C(month,... 13)))_7 5.992e+04 2.446e+04 0.026 * (3.027e+04, 1.287e+05)
C(month,... 13)))_8 5.781e+04 2.216e+04 0.010 * (3.024e+04, 1.152e+05)
C(month,... 13)))_9 4.858e+04 2.647e+04 0.068 . (1.883e+04, 1.197e+05)
C(month,...13)))_10 3.069e+04 1.860e+04 0.080 . (5225.0, 7.923e+04)
C(month,...13)))_11 2.508e+04 1.821e+04 0.112 (972.3, 7.268e+04)
C(month,...13)))_12 -1322.0 1.801e+04 0.926 (-3.062e+04, 4.328e+04)
ct_sqrt 1.757e+05 5.780e+04 0.004 ** (5.089e+04, 2.730e+05)
ct1 3.731e+04 5.211e+04 0.506 (-4.376e+04, 1.460e+05)
ct1:C(mo... 13)))_2 2.775e+04 4.287e+04 0.396 (-5.849e+04, 1.190e+05)
ct1:C(mo... 13)))_3 7.465e+04 3.701e+04 0.050 . (-5378.0, 1.433e+05)
ct1:C(mo... 13)))_4 1.332e+05 3.981e+04 0.008 ** (4.167e+04, 1.833e+05)
ct1:C(mo... 13)))_5 8.293e+04 4.323e+04 0.054 . (-2495.0, 1.742e+05)
ct1:C(mo... 13)))_6 1.336e+05 3.183e+04 0.004 ** (6.376e+04, 1.914e+05)
ct1:C(mo... 13)))_7 1.330e+05 3.490e+04 <2e-16 *** (5.261e+04, 1.926e+05)
ct1:C(mo... 13)))_8 1.329e+05 3.295e+04 <2e-16 *** (5.574e+04, 1.932e+05)
ct1:C(mo... 13)))_9 9.543e+04 4.594e+04 0.050 . (2917.0, 1.872e+05)
ct1:C(mo...13)))_10 1.198e+05 2.940e+04 0.002 ** (4.842e+04, 1.658e+05)
ct1:C(mo...13)))_11 -6389.0 2.975e+04 0.792 (-7.002e+04, 5.051e+04)
ct1:C(mo...13)))_12 1972.0 2.950e+04 0.960 (-6.472e+04, 5.189e+04)
cp0_2011_12_31_00 6598.0 3.660e+04 0.836 (-5.905e+04, 8.383e+04)
cp1_2012_01_30_00 -9129.0 3.889e+04 0.828 (-7.437e+04, 7.405e+04)
cp2_2012_12_31_00 -6.343e+04 5.903e+04 0.284 (-1.849e+05, 4.453e+04)
cp3_2014_12_30_00 -6911.0 5.294e+04 0.920 (-1.147e+05, 9.819e+04)
cp4_2015_02_01_00 3.416e+04 3.916e+04 0.396 (-4.742e+04, 1.019e+05)
cp5_2015_04_29_00 -2.803e+04 8.388e+04 0.720 (-1.790e+05, 1.468e+05)
cp6_2017_08_31_00 -5.819e+04 2.044e+04 0.004 ** (-9.714e+04, -1.625e+04)
y_lag1 1.281e+05 5.048e+04 0.010 * (3.538e+04, 2.244e+05)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.9678, Adjusted R-squared: 0.9566
F-statistic: 85.908 on 27 and 79 DF, p-value: 1.110e-16
Model AIC: 2690.2, model BIC: 2767.1
WARNING: the condition number is large, 2.75e+04. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.
0
rank_test_MAPE 1
mean_test_MAPE 9.72
split_test_MAPE (4.47, 10.77, 4.61, 16.91, 11.82)
mean_train_MAPE 12.42
split_train_MAPE (15.4, 11.02, 11.4, 11.17, 13.12)
mean_fit_time 1.53
mean_score_time 2.49
param_estimator__auto_holiday_params None
params []
train test
CORR 0.983665 0.912135
R2 0.967592 -26.885277
MSE 356082880.527499 2040759674.813501
RMSE 18870.158466 45174.768121
MAE 15056.50024 41832.26242
MedAE 13205.77968 45919.500594
MAPE 13.832901 11.823066
MedAPE 6.51192 12.947484
sMAPE 5.013717 5.534011
Q80 7528.25012 8366.452484
Q95 7528.25012 2091.613121
Q99 7528.25012 418.322624
OutsideTolerance1p 0.913462 1.0
OutsideTolerance2p 0.798077 1.0
OutsideTolerance3p 0.759615 1.0
OutsideTolerance4p 0.701923 1.0
OutsideTolerance5p 0.625 0.75
Outside Tolerance (fraction) None None
R2_null_model_score None None
Prediction Band Width (%) 85.852501 29.154853
Prediction Band Coverage (fraction) 0.980769 0.5
Coverage: Lower Band 0.480769 0.5
Coverage: Upper Band 0.5 0.0
Coverage Diff: Actual_Coverage - Intended_Coverage 0.030769 -0.45
MIS 99811.825941 401578.201277
Fit/backtest plot:
fig = backtest.plot()
plotly.io.show(fig)
Forecast plot:
fig = forecast.plot()
plotly.io.show(fig)
The components plot:
fig = forecast.plot_components()
plotly.io.show(fig)
Total running time of the script: 1 minute 13.650 seconds.