Interpretability

Silverkite generates easily interpretable forecasting models when using its default ML algorithms (e.g., Ridge). This is because, after transforming the raw features into basis functions (transformed features), the model uses an additive structure. Silverkite can therefore break down each forecast into summable components, e.g., long-term growth (trend), seasonality, holidays/events, short-term growth (auto-regression), regressor effects, etc.

The approach to generate these breakdowns consists of two steps:

  1. Group the transformed variables into various meaningful groups.

  2. Calculate the sum of the features multiplied by their regression coefficients within each group.
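The two steps above can be sketched with a toy example (the feature names, coefficient values, and group assignments below are purely illustrative, not taken from a fitted Silverkite model):

```python
import numpy as np
import pandas as pd

# Hypothetical design matrix: rows are time points, columns are transformed features
x_mat = pd.DataFrame({
    "ct1": [0.1, 0.2, 0.3],               # trend basis
    "sin1_tow_weekly": [0.0, 0.5, -0.5],  # seasonality basis
    "y_lag1": [1.0, 1.2, 0.9],            # autoregression
})
# Hypothetical fitted regression coefficients
coefs = pd.Series({"ct1": 2.0, "sin1_tow_weekly": 0.5, "y_lag1": 3.0})

# Step 1: group the transformed variables into meaningful groups
groups = {"trend": ["ct1"], "seasonality": ["sin1_tow_weekly"], "AR": ["y_lag1"]}

# Step 2: within each group, sum (feature * coefficient) over the group's columns
breakdown = pd.DataFrame({
    name: (x_mat[cols] * coefs[cols]).sum(axis=1)
    for name, cols in groups.items()})

# Because the model is additive, the components sum to the full prediction
assert np.allclose(breakdown.sum(axis=1), x_mat @ coefs)
```

This additivity is exactly what makes the breakdown plots exact rather than approximate: nothing is lost when the prediction is split into components.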

These breakdowns then can be used to answer questions such as:

  • Question 1: How is the forecast value generated?

  • Question 2: What is driving the change of the forecast as new data comes in?

Forecast components can also help us analyze model behavior and sensitivity. This is because while it is not feasible to compare a large set of features across two model settings, it can be quite practical and informative to compare a few well-defined components.

This tutorial discusses in detail the usage of forecast_breakdown and how to estimate forecast components using custom component dictionaries. Some of this functionality has been built into the estimators via the method plot_components(...). An example of this usage is in the “Simple Forecast” tutorial in the Quick Start.

# required imports
import plotly
import warnings
import pandas as pd
from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.utils.result_summary import summarize_grid_search_results
from greykite.common.viz.timeseries_plotting import plot_multivariate

warnings.filterwarnings("ignore")

Function to load and prepare data

This is the code to load and prepare the daily bike-sharing data in Washington DC.

def prepare_bikesharing_data():
    """Loads bike-sharing data and adds proper regressors."""
    dl = DataLoaderTS()
    agg_func = {"count": "sum", "tmin": "mean", "tmax": "mean", "pn": "mean"}
    df = dl.load_bikesharing(agg_freq="daily", agg_func=agg_func)

    # There are some zero values which cause issues for MAPE
    # Adding a small number to all data avoids that issue
    value_col = "count"
    df[value_col] += 10
    # Drops the last value, which might be incomplete since the original data is hourly
    df.drop(df.tail(1).index, inplace=True)
    # We only use data after 2018-01-01 for demonstration purposes (shorter run time)
    df = df.loc[df["ts"] > "2018-01-01"]
    df.reset_index(drop=True, inplace=True)

    print(f"\n df.tail(): \n {df.tail()}")

    # Creates useful regressors from existing raw regressors
    df["bin_pn"] = (df["pn"] > 5).map(float)
    df["bin_heavy_pn"] = (df["pn"] > 20).map(float)
    df.columns = [
        "ts",
        value_col,
        "regressor_tmin",
        "regressor_tmax",
        "regressor_pn",
        "regressor_bin_pn",
        "regressor_bin_heavy_pn"]

    forecast_horizon = 7
    train_df = df.copy()
    test_df = df.tail(forecast_horizon).reset_index(drop=True)
    # When using the pipeline (as done in ``fit_forecast`` below),
    # fitting and prediction are done in one step.
    # Therefore, for demonstration purposes, we remove the response values
    # of the last 7 days. This is needed because we are using regressors,
    # and future regressor data must be appended to ``df``.
    # We mimic that by removing the values of the response.
    train_df.loc[(len(train_df) - forecast_horizon):len(train_df), value_col] = None

    print(f"train_df shape: \n {train_df.shape}")
    print(f"test_df shape: \n {test_df.shape}")
    print(f"train_df.tail(14): \n {train_df.tail(14)}")
    print(f"test_df: \n {test_df}")

    return {
        "train_df": train_df,
        "test_df": test_df}

Function to fit Silverkite

This is the code for fitting a Silverkite model to the data.

def fit_forecast(
        df,
        time_col,
        value_col):
    """Fits a daily model for this use case.
    The daily model is a generic Silverkite model with regressors."""

    meta_data_params = MetadataParam(
        time_col=time_col,
        value_col=value_col,
        freq="D",
    )

    # Autoregression configuration to be used in the model
    autoregression = {
        "autoreg_dict": {
            "lag_dict": {"orders": [1, 2, 3]},
            "agg_lag_dict": {
                "orders_list": [[7, 7*2, 7*3]],
                "interval_list": [(1, 7), (8, 7*2)]},
            "series_na_fill_func": lambda s: s.bfill().ffill()},
        "fast_simulation": True
    }

    # Changepoints configuration
    # The config includes changepoints both in trend and seasonality
    changepoints = {
        "changepoints_dict": {
            "method": "auto",
            "yearly_seasonality_order": 15,
            "resample_freq": "2D",
            "actual_changepoint_min_distance": "100D",
            "potential_changepoint_distance": "50D",
            "no_changepoint_distance_from_end": "50D"},
        "seasonality_changepoints_dict": {
            "method": "auto",
            "yearly_seasonality_order": 15,
            "resample_freq": "2D",
            "actual_changepoint_min_distance": "100D",
            "potential_changepoint_distance": "50D",
            "no_changepoint_distance_from_end": "50D"}
    }

    regressor_cols = [
        "regressor_tmin",
        "regressor_bin_pn",
        "regressor_bin_heavy_pn",
    ]

    # Model parameters
    model_components = ModelComponentsParam(
        growth=dict(growth_term="linear"),
        seasonality=dict(
            yearly_seasonality=[15],
            quarterly_seasonality=[False],
            monthly_seasonality=[False],
            weekly_seasonality=[7],
            daily_seasonality=[False]
        ),
        custom=dict(
            fit_algorithm_dict=dict(fit_algorithm="ridge"),
            extra_pred_cols=None,
            normalize_method="statistical"
        ),
        regressors=dict(regressor_cols=regressor_cols),
        autoregression=autoregression,
        uncertainty=dict(uncertainty_dict=None),
        events=dict(holiday_lookup_countries=["US"]),
        changepoints=changepoints
    )

    # Evaluation is done on the same ``forecast_horizon`` as desired for the output.
    # Note: ``forecast_horizon`` is defined at the module level before this function is called.
    evaluation_period_param = EvaluationPeriodParam(
        test_horizon=None,
        cv_horizon=forecast_horizon,
        cv_min_train_periods=365*2,
        cv_expanding_window=True,
        cv_use_most_recent_splits=False,
        cv_periods_between_splits=None,
        cv_periods_between_train_test=0,
        cv_max_splits=5,
    )

    # Runs the forecast model using the "SILVERKITE" template
    forecaster = Forecaster()
    result = forecaster.run_forecast_config(
        df=df,
        config=ForecastConfig(
            model_template=ModelTemplateEnum.SILVERKITE.name,
            coverage=0.95,
            forecast_horizon=forecast_horizon,
            metadata_param=meta_data_params,
            evaluation_period_param=evaluation_period_param,
            model_components_param=model_components
        )
    )

    # Gets cross-validation results
    grid_search = result.grid_search
    cv_results = summarize_grid_search_results(
        grid_search=grid_search,
        decimals=2,
        cv_report_metrics=None)
    cv_results = cv_results.transpose()
    cv_results = pd.DataFrame(cv_results)
    cv_results.columns = ["err_value"]
    cv_results["err_name"] = cv_results.index
    cv_results = cv_results.reset_index(drop=True)
    cv_results = cv_results[["err_name", "err_value"]]

    print(f"\n cv_results: \n {cv_results}")

    return result

Loads and prepares data

The data is loaded and some information about the input data is printed. We use the number of daily rented bikes in Washington DC over time. The data is augmented with weather data (precipitation, min/max daily temperature).

data = prepare_bikesharing_data()

Out:

 df.tail():
             ts  count  tmin  tmax   pn
602 2019-08-27  12216  17.2  26.7  0.0
603 2019-08-28  11401  18.3  27.8  0.0
604 2019-08-29  12685  16.7  28.9  0.0
605 2019-08-30  12097  14.4  32.8  0.0
606 2019-08-31  11281  17.8  31.1  0.0
train_df shape:
 (607, 7)
test_df shape:
 (7, 7)
train_df.tail(14):
             ts    count  regressor_tmin  regressor_tmax  regressor_pn  regressor_bin_pn  regressor_bin_heavy_pn
593 2019-08-18   9655.0            22.2            35.6           0.3               0.0                     0.0
594 2019-08-19  10579.0            21.1            37.2           0.0               0.0                     0.0
595 2019-08-20   8898.0            22.2            36.1           0.0               0.0                     0.0
596 2019-08-21  11648.0            21.7            35.0           1.8               0.0                     0.0
597 2019-08-22  11724.0            21.7            35.0          30.7               1.0                     1.0
598 2019-08-23   8158.0            17.8            23.3           1.8               0.0                     0.0
599 2019-08-24  12475.0            16.7            26.1           0.0               0.0                     0.0
600 2019-08-25      NaN            15.6            26.7           0.0               0.0                     0.0
601 2019-08-26      NaN            17.2            25.0           0.0               0.0                     0.0
602 2019-08-27      NaN            17.2            26.7           0.0               0.0                     0.0
603 2019-08-28      NaN            18.3            27.8           0.0               0.0                     0.0
604 2019-08-29      NaN            16.7            28.9           0.0               0.0                     0.0
605 2019-08-30      NaN            14.4            32.8           0.0               0.0                     0.0
606 2019-08-31      NaN            17.8            31.1           0.0               0.0                     0.0
test_df:
           ts  count  regressor_tmin  regressor_tmax  regressor_pn  regressor_bin_pn  regressor_bin_heavy_pn
0 2019-08-25  11634            15.6            26.7           0.0               0.0                     0.0
1 2019-08-26  11747            17.2            25.0           0.0               0.0                     0.0
2 2019-08-27  12216            17.2            26.7           0.0               0.0                     0.0
3 2019-08-28  11401            18.3            27.8           0.0               0.0                     0.0
4 2019-08-29  12685            16.7            28.9           0.0               0.0                     0.0
5 2019-08-30  12097            14.4            32.8           0.0               0.0                     0.0
6 2019-08-31  11281            17.8            31.1           0.0               0.0                     0.0

Fits model to daily data

In this step we fit a Silverkite model to the data. The model uses weather regressors, holidays, auto-regression, etc.

df = data["train_df"]
time_col = "ts"
value_col = "count"
forecast_horizon = 7

result = fit_forecast(
    df=df,
    time_col=time_col,
    value_col=value_col)
trained_estimator = result.model[-1]
# Checks model coefficients and p-values
print("\n Model Summary:")
print(trained_estimator.summary())

Out:

Fitting 1 folds for each of 1 candidates, totalling 1 fits

 cv_results:
                                              err_name                                          err_value
0                                      rank_test_MAPE                                                  1
1                                      mean_test_MAPE                                              10.28
2                                     split_test_MAPE                                           (10.28,)
3                                     mean_train_MAPE                                              21.71
4                                              params                                                 []
5                 param_estimator__yearly_seasonality                                                 15
6                 param_estimator__weekly_seasonality                                                  7
7                   param_estimator__uncertainty_dict                                               None
8                  param_estimator__training_fraction                                               None
9                  param_estimator__train_test_thresh                                               None
10                   param_estimator__time_properties  {'period': 86400, 'simple_freq': SimpleTimeFre...
11                    param_estimator__simulation_num                                                 10
12     param_estimator__seasonality_changepoints_dict  {'method': 'auto', 'yearly_seasonality_order':...
13                  param_estimator__remove_intercept                                              False
14                    param_estimator__regressor_cols  [regressor_tmin, regressor_bin_pn, regressor_b...
15             param_estimator__regression_weight_col                                               None
16             param_estimator__quarterly_seasonality                                              False
17              param_estimator__origin_for_time_vars                                               None
18                  param_estimator__normalize_method                                        statistical
19               param_estimator__monthly_seasonality                                              False
20              param_estimator__min_admissible_value                                               None
21  param_estimator__max_weekly_seas_interaction_o...                                                  2
22  param_estimator__max_daily_seas_interaction_order                                                  5
23              param_estimator__max_admissible_value                                               None
24             param_estimator__lagged_regressor_dict                                               None
25      param_estimator__holidays_to_model_separately                                               auto
26         param_estimator__holiday_pre_post_num_dict                                               None
27              param_estimator__holiday_pre_num_days                                                  2
28             param_estimator__holiday_post_num_days                                                  2
29          param_estimator__holiday_lookup_countries                                               [US]
30                       param_estimator__growth_term                                             linear
31                param_estimator__fit_algorithm_dict                         {'fit_algorithm': 'ridge'}
32              param_estimator__feature_sets_enabled                                               auto
33                   param_estimator__fast_simulation                                               True
34                   param_estimator__extra_pred_cols                                               None
35                param_estimator__explicit_pred_cols                                               None
36                    param_estimator__drop_pred_cols                                               None
37                 param_estimator__daily_seasonality                                              False
38        param_estimator__daily_event_shifted_effect                                               None
39       param_estimator__daily_event_neighbor_impact                                               None
40               param_estimator__daily_event_df_dict                                               None
41                 param_estimator__changepoints_dict  {'method': 'auto', 'yearly_seasonality_order':...
42                      param_estimator__autoreg_dict  {'lag_dict': {'orders': [1, 2, 3]}, 'agg_lag_d...
43                  param_estimator__auto_seasonality                                              False
44                      param_estimator__auto_holiday                                              False
45                       param_estimator__auto_growth                                              False
46                                   split_train_MAPE                                           (21.71,)
47                                      mean_fit_time                                              11.44
48                                       std_fit_time                                                0.0
49                                    mean_score_time                                              22.95
50                                     std_score_time                                                0.0
51                                   split0_test_MAPE                                              10.28
52                                      std_test_MAPE                                                0.0
53                                  split0_train_MAPE                                              21.71
54                                     std_train_MAPE                                                0.0

 Model Summary:
================================ Model Summary =================================

Number of observations: 600,   Number of features: 134
Method: Ridge regression
Number of nonzero features: 133
Regularization parameter: 174.3

Residuals:
         Min           1Q       Median           3Q          Max
     -7532.0       -907.2        85.25        986.3       7618.0

            Pred_col Estimate Std. Err Pr(>)_boot sig. code               95%CI
           Intercept   9633.0    73.89     <2e-16       ***    (9518.0, 9788.0)
events_Christmas Day   -144.7    74.95     <2e-16       *** (-184.8, 9.867e-28)
 events_C...as Day-1   -135.8    70.13     <2e-16       *** (-175.9, 8.573e-28)
 events_C...as Day-2    -51.6    29.47      0.010         * (-82.11, 8.971e-28)
 events_C...as Day+1   -72.73    40.14     <2e-16       *** (-105.0, 9.241e-28)
 events_C...as Day+2   -23.44    17.99      0.110           (-54.61, 1.070e-27)
 events_I...ence Day    45.49    22.73      0.016         * (-1.097e-27, 78.67)
 events_I...ce Day-1    -27.7    20.49      0.142               (-64.51, 11.58)
 events_I...ce Day-2   -14.33     29.2      0.582               (-66.92, 42.99)
 events_I...ce Day+1   -16.08    15.31      0.248               (-42.61, 16.35)
 events_I...ce Day+2   -65.19     44.9      0.096         .     (-137.0, 12.05)
    events_Labor Day    -61.5    33.47      0.006        ** (-87.95, 9.708e-28)
  events_Labor Day-1    92.35     48.5     <2e-16       *** (-8.479e-28, 127.7)
  events_Labor Day-2   -59.02    32.73      0.006        ** (-91.87, 9.757e-28)
  events_Labor Day+1   -51.11    28.74      0.020         * (-83.07, 8.825e-28)
  events_Labor Day+2   -3.722    11.14      0.490               (-30.07, 18.41)
 events_Memorial Day   -41.91    20.48      0.024         * (-73.42, 1.147e-27)
 events_M...al Day-1    125.5    76.81      0.048         * (-1.136e-27, 240.3)
 events_M...al Day-2   -28.66    20.23      0.134                (-63.1, 9.638)
 events_M...al Day+1   -57.03    52.58      0.326               (-139.8, 31.93)
 events_M...al Day+2   -35.61    19.67      0.030         * (-64.91, 1.310e-27)
events_New Years Day   -46.77    27.12      0.014         * (-74.62, 9.757e-28)
 events_N...rs Day-1   -42.77    25.87      0.032         * (-73.62, 9.662e-28)
 events_N...rs Day-2    7.616    11.58      0.378               (-18.38, 30.21)
 events_N...rs Day+1   -23.72    33.53      0.600               (-88.15, 30.91)
 events_N...rs Day+2    33.46    33.29      0.330               (-33.38, 87.92)
        events_Other   -109.0    52.65      0.048         *    (-215.9, -3.222)
      events_Other-1    43.86    43.91      0.326               (-58.32, 113.1)
      events_Other-2   -93.26    42.67      0.032         *    (-170.1, -5.821)
      events_Other+1    48.48    43.58      0.294               (-33.27, 133.0)
      events_Other+2    -31.5    64.32      0.624               (-164.3, 89.07)
 events_Thanksgiving   -184.1    94.95     <2e-16       *** (-229.3, 8.093e-28)
 events_T...giving-1    -46.3    26.67      0.008        ** (-70.37, 7.000e-28)
 events_T...giving-2    2.582    10.16      0.570                (-21.51, 23.0)
 events_T...giving+1   -128.7    68.18     <2e-16       *** (-168.7, 8.405e-28)
 events_T...giving+2   -52.92     32.2      0.030         * (-89.96, 7.698e-28)
 events_Veterans Day   -31.03    20.53      0.090         . (-58.95, 1.294e-27)
 events_V...ns Day-1   -42.18    25.97      0.036         * (-75.49, 8.799e-28)
 events_V...ns Day-2   -77.52    40.11     <2e-16       *** (-109.6, 7.602e-28)
 events_V...ns Day+1     24.4    18.54      0.156                (-12.5, 54.45)
 events_V...ns Day+2    1.288    13.84      0.626                (-34.22, 26.1)
       str_dow_2-Tue    19.98    28.02      0.500               (-32.04, 72.93)
       str_dow_3-Wed    20.59    23.84      0.412               (-27.51, 64.33)
       str_dow_4-Thu    28.45    27.59      0.278                (-29.2, 81.47)
       str_dow_5-Fri    42.23    32.73      0.218               (-21.38, 98.27)
       str_dow_6-Sat   -9.575    36.02      0.778               (-75.66, 63.68)
       str_dow_7-Sun   -105.6    30.48      0.002        **    (-164.1, -45.57)
      regressor_tmin    599.7    62.55     <2e-16       ***      (446.7, 697.5)
    regressor_bin_pn   -836.8     62.6     <2e-16       ***    (-934.8, -687.2)
 regresso...heavy_pn   -363.7    87.44     <2e-16       ***    (-539.3, -192.1)
                 ct1   -7.363    30.67      0.840               (-76.78, 48.19)
      is_weekend:ct1   -13.75    24.55      0.572               (-68.44, 29.78)
   str_dow_2-Tue:ct1    20.77     24.2      0.368               (-28.55, 71.35)
   str_dow_3-Wed:ct1    16.21    20.44      0.428               (-23.91, 58.72)
   str_dow_4-Thu:ct1    3.699    24.25      0.882               (-40.43, 52.44)
   str_dow_5-Fri:ct1    8.268    27.81      0.786               (-44.75, 65.17)
   str_dow_6-Sat:ct1    17.78    34.23      0.620               (-53.82, 75.82)
   str_dow_7-Sun:ct1   -36.41    28.85      0.216                (-89.24, 19.7)
   cp0_2018_07_21_00   -155.3    26.11     <2e-16       ***    (-200.6, -98.09)
 is_weeke...07_21_00   -26.32     28.9      0.366                (-81.6, 33.33)
 str_dow_...07_21_00   -38.36    34.63      0.246               (-104.2, 31.29)
 str_dow_...07_21_00   -34.16    22.91      0.138                (-77.5, 13.91)
 str_dow_...07_21_00   -13.82    33.41      0.690               (-78.71, 47.62)
 str_dow_...07_21_00   -88.42    41.78      0.036         *      (-165.3, 1.37)
 str_dow_...07_21_00    16.16    42.66      0.684               (-75.19, 104.3)
 str_dow_...07_21_00   -52.69    42.95      0.232               (-131.6, 35.51)
 ct1:sin1_tow_weekly     20.6    21.44      0.328                (-17.0, 66.06)
 ct1:cos1_tow_weekly   -34.91    23.95      0.146               (-79.08, 10.08)
 ct1:sin2_tow_weekly    30.22    22.03      0.166                (-8.901, 72.1)
 ct1:cos2_tow_weekly   -30.71    24.11      0.180               (-80.64, 18.59)
 cp0_2018...w_weekly   -2.956    27.37      0.922               (-56.38, 46.37)
 cp0_2018...w_weekly   -25.93    33.53      0.424               (-87.89, 39.71)
 cp0_2018...w_weekly   -12.03    33.01      0.704               (-73.65, 50.75)
 cp0_2018...w_weekly   -61.05    31.05      0.054         .    (-124.4, -2.265)
     sin1_tow_weekly    60.01     26.8      0.024         *      (9.316, 117.1)
     cos1_tow_weekly   -57.42    30.95      0.058         .     (-113.2, 4.766)
     sin2_tow_weekly    59.25    30.36      0.052         .     (-2.357, 117.3)
     cos2_tow_weekly    27.97    30.99      0.352               (-37.52, 85.31)
     sin3_tow_weekly    8.528    29.24      0.770               (-45.01, 65.81)
     cos3_tow_weekly    35.31    31.45      0.282               (-26.83, 96.46)
     sin4_tow_weekly   -8.528    29.24      0.770               (-65.81, 45.01)
     cos4_tow_weekly    35.31    31.45      0.282               (-26.83, 96.46)
     sin5_tow_weekly   -59.25    30.36      0.052         .     (-117.3, 2.357)
     cos5_tow_weekly    27.97    30.99      0.352               (-37.52, 85.31)
     sin6_tow_weekly   -60.01     26.8      0.024         *    (-117.1, -9.316)
     cos6_tow_weekly   -57.42    30.95      0.058         .     (-113.2, 4.766)
     sin7_tow_weekly    63.68    24.77      0.016         *      (19.59, 117.5)
     cos7_tow_weekly       0.       0.      1.000                      (0., 0.)
     sin1_ct1_yearly     14.3    48.56      0.768               (-86.86, 107.8)
     cos1_ct1_yearly   -524.4    39.51     <2e-16       ***    (-589.7, -436.9)
     sin2_ct1_yearly   -206.7    51.84     <2e-16       ***    (-313.3, -106.7)
     cos2_ct1_yearly   -85.62     53.0      0.112               (-182.6, 21.98)
     sin3_ct1_yearly   -73.14    52.12      0.172               (-179.9, 21.69)
     cos3_ct1_yearly   -41.59    51.32      0.438               (-125.3, 64.83)
     sin4_ct1_yearly    33.59    49.09      0.532               (-65.17, 124.1)
     cos4_ct1_yearly     36.9    52.16      0.482               (-52.18, 146.3)
     sin5_ct1_yearly   -54.65    53.42      0.294               (-151.5, 59.96)
     cos5_ct1_yearly   -57.41    56.79      0.316               (-168.5, 53.99)
     sin6_ct1_yearly   -17.45    53.74      0.774               (-120.6, 86.91)
     cos6_ct1_yearly   -194.4    58.25     <2e-16       ***    (-296.1, -72.46)
     sin7_ct1_yearly   -29.26     55.9      0.582               (-135.8, 86.51)
     cos7_ct1_yearly    59.93    60.79      0.352               (-49.52, 175.3)
     sin8_ct1_yearly    11.46    57.74      0.836               (-103.7, 121.9)
     cos8_ct1_yearly    22.87    60.04      0.686               (-84.53, 137.4)
     sin9_ct1_yearly   -20.75    60.91      0.748               (-142.4, 95.79)
     cos9_ct1_yearly    -36.6    56.24      0.514               (-138.7, 79.24)
    sin10_ct1_yearly    98.24    57.25      0.094         .     (-18.03, 208.3)
    cos10_ct1_yearly   -19.53    52.35      0.696               (-108.9, 105.4)
    sin11_ct1_yearly   -15.28    55.27      0.780               (-132.0, 81.42)
    cos11_ct1_yearly    1.071    61.27      0.986               (-124.6, 120.5)
    sin12_ct1_yearly   -46.26    53.78      0.392                (-152.5, 54.0)
    cos12_ct1_yearly    126.5    65.47      0.062         .     (-15.14, 246.2)
    sin13_ct1_yearly   -78.99    55.71      0.142               (-192.3, 25.18)
    cos13_ct1_yearly   -50.87     58.4      0.378               (-167.9, 61.57)
    sin14_ct1_yearly   -50.02    57.75      0.378               (-173.3, 56.79)
    cos14_ct1_yearly    -25.4    56.43      0.680               (-128.3, 79.32)
    sin15_ct1_yearly   -161.4    57.85      0.008        **    (-272.7, -53.63)
    cos15_ct1_yearly   -38.67    56.92      0.504               (-133.4, 86.56)
 sin1_con...07_21_00    37.09    48.08      0.438               (-65.83, 124.0)
 cos1_con...07_21_00   -136.3    51.51      0.012         *    (-246.0, -46.07)
 sin2_con...07_21_00    30.48    44.87      0.500               (-57.98, 113.8)
 cos2_con...07_21_00   -172.3    60.19      0.002        **    (-277.0, -46.76)
 sin3_con...07_21_00   -29.37    53.95      0.534               (-136.2, 81.71)
 cos3_con...07_21_00    -4.81    49.64      0.896               (-97.93, 93.85)
 sin4_con...07_21_00   -18.05    48.79      0.698               (-116.7, 72.59)
 cos4_con...07_21_00     42.7    54.46      0.432               (-78.26, 130.6)
 sin5_con...07_21_00    46.26    59.68      0.412               (-77.74, 166.1)
 cos5_con...07_21_00    69.93    49.19      0.160               (-23.68, 171.8)
              y_lag1    608.2    82.55     <2e-16       ***      (421.6, 749.2)
              y_lag2    87.56    64.72      0.178               (-13.06, 223.5)
              y_lag3    157.6    71.12      0.028         *       (31.9, 303.0)
    y_avglag_7_14_21    333.6    55.86     <2e-16       ***      (220.2, 448.8)
     y_avglag_1_to_7    235.1    44.19     <2e-16       ***      (157.0, 331.9)
    y_avglag_8_to_14    337.7    52.37     <2e-16       ***      (233.5, 447.6)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.7793,   Adjusted R-squared: 0.7498
F-statistic: 22.607 on 70 and 528 DF,   p-value: 1.110e-16
Model AIC: 12945.0,   model BIC: 13260.0

WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.
WARNING: the following columns have estimated coefficients equal to zero, while ridge is not supposed to have zero estimates. This is probably because these columns are degenerate in the design matrix. Make sure these columns do not have constant values.
['cos7_tow_weekly']
WARNING: the following columns are degenerate, do you really want to include them in your model? This may cause some of them to show unrealistic significance. Consider using the `drop_degenerate` transformer.
['Intercept', 'cos7_tow_weekly']

Grouping of variables

Regex expressions are used to group variables in the breakdown plot. Each group corresponds to one key of this dictionary. The grouping is done on variable names, and multiple regexes can be given per group: a variable is assigned to a group if it matches ANY of that group’s regexes. Note that this grouping assumes regressor variables start with “regressor_”. Also note that the order of the groups matters (Python dictionaries preserve insertion order in 3.6+): variables captured by an earlier group will not be picked up again by later groups. Variables that match none of the groups are collected into “OTHER”. The following breakdown dictionary should work for many use cases; however, users can customize it as needed.

Two alternative dictionaries are available as the constants DEFAULT_COMPONENTS_REGEX_DICT and DETAILED_SEASONALITY_COMPONENTS_REGEX_DICT.

grouping_regex_patterns_dict = {
    "regressors": "regressor_.*",  # regressor effects
    "AR": ".*lag",  # autoregression component
    "events": ".*events_.*",  # events and holidays
    "seasonality": r".*quarter.*|.*month.*|.*C\(dow.*|.*C\(dow_hr.*|sin.*|cos.*|.*doq.*|.*dom.*|.*str_dow.*|.*is_weekend.*|.*tow_weekly.*",  # seasonality
    "trend": "ct1|ct2|ct_sqrt|ct3|ct_root3|.*changepoint.*",  # long-term trend (includes changepoints)
}
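The first-match-wins grouping described above can be sketched with a small helper. This is a simplified stand-in for what forecast_breakdown does internally, and the sample regex dictionary and column names below are illustrative, not taken from the fitted model:

```python
import re

def group_columns(pred_cols, regex_dict):
    """Assigns each column to the first group whose regex matches it.

    Earlier groups take precedence (order matters); columns matching
    no group fall into "OTHER".
    """
    groups = {name: [] for name in regex_dict}
    groups["OTHER"] = []
    for col in pred_cols:
        for name, pattern in regex_dict.items():
            if re.match(pattern, col):
                groups[name].append(col)
                break  # first match wins; later groups never see this column
        else:
            groups["OTHER"].append(col)
    return groups

# Illustrative subset of groups and column names
sample_regex_dict = {
    "regressors": "regressor_.*",
    "AR": ".*lag",
    "seasonality": "sin.*|cos.*",
}
cols = ["regressor_tmin", "y_lag1", "sin1_tow_weekly", "ct1", "Intercept"]
grouping = group_columns(cols, sample_regex_dict)
# "ct1" and "Intercept" match no group, so they land in "OTHER"
```

Reordering the dictionary can change the result whenever a column matches more than one group's regex, which is why the ordering note above matters.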

Creates forecast breakdown

This is generated for the observed data plus the prediction period (available in df). Each component is centered around zero, and the sum of all components equals the forecast.

breakdown_result = trained_estimator.forecast_breakdown(
    grouping_regex_patterns_dict=grouping_regex_patterns_dict,
    center_components=True,
    plt_title="forecast breakdowns")
forecast_breakdown_df = breakdown_result["breakdown_df_with_index_col"]
forecast_components_fig = breakdown_result["breakdown_fig"]
plotly.io.show(forecast_components_fig)
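The centering bookkeeping can be illustrated with a minimal numpy sketch (toy design matrix and coefficients, not Greykite's actual implementation): each component is the dot product of its columns with their coefficients, the component means are folded into the intercept when centering, and the centered components still sum exactly to the forecast.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy design matrix: an intercept column plus two hypothetical feature groups.
X = np.column_stack([np.ones(10), rng.normal(size=(10, 2)), rng.normal(size=(10, 2))])
beta = np.array([5.0, 1.0, -2.0, 0.5, 3.0])
forecast = X @ beta

# Each component is (columns in group) @ (their coefficients).
intercept = X[:, :1] @ beta[:1]
group1 = X[:, 1:3] @ beta[1:3]
group2 = X[:, 3:] @ beta[3:]

# Center each non-intercept component around zero and fold the means into
# the intercept, mirroring the effect of `center_components=True`.
intercept_centered = intercept + group1.mean() + group2.mean()
group1_centered = group1 - group1.mean()
group2_centered = group2 - group2.mean()

# The centered components still sum to the forecast.
total = intercept_centered + group1_centered + group2_centered
assert np.allclose(total, forecast)
```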

Standardization of the components

Next we provide a more "standardized" view of the breakdown. This is achieved by dividing all components by the mean of the absolute observed values of the metric. By doing so, the intercept is mapped to approximately 1, and changes on the y-axis can be viewed relative to the average magnitude of the series. The sum of all components at each time point will be equal to "forecast / obs_abs_mean".

from greykite.common.viz.timeseries_plotting import plot_multivariate

column_grouping_result = breakdown_result["column_grouping_result"]
component_cols = list(grouping_regex_patterns_dict.keys())
forecast_breakdown_stdzd_df = forecast_breakdown_df.copy()
obs_abs_mean = abs(df[value_col]).mean()
for col in component_cols + ["Intercept", "OTHER"]:
    if col in forecast_breakdown_stdzd_df.columns:
        forecast_breakdown_stdzd_df[col] /= obs_abs_mean
forecast_breakdown_stdzd_fig = plot_multivariate(
    df=forecast_breakdown_stdzd_df,
    x_col=time_col,
    title="forecast breakdowns divided by mean of abs value of response",
    ylabel="component")
forecast_breakdown_stdzd_fig.update_layout(yaxis_range=[-1.1, 1.1])
plotly.io.show(forecast_breakdown_stdzd_fig)

Breaking down the predictions

Next we perform a prediction and generate a breakdown plot for that prediction.

test_df = data["test_df"].reset_index()
test_df[value_col] = None
print(f"\n test_df: \n {test_df}")
pred_df = trained_estimator.predict(test_df)
forecast_x_mat = trained_estimator.forecast_x_mat
# Generate the breakdown plot
breakdown_result = trained_estimator.forecast_breakdown(
    grouping_regex_patterns_dict=grouping_regex_patterns_dict,
    forecast_x_mat=forecast_x_mat,
    time_values=pred_df[time_col])

breakdown_fig = breakdown_result["breakdown_fig"]
plotly.io.show(breakdown_fig)

Out:

 test_df:
    index         ts count  ...  regressor_pn  regressor_bin_pn  regressor_bin_heavy_pn
0      0 2019-08-25  None  ...           0.0               0.0                     0.0
1      1 2019-08-26  None  ...           0.0               0.0                     0.0
2      2 2019-08-27  None  ...           0.0               0.0                     0.0
3      3 2019-08-28  None  ...           0.0               0.0                     0.0
4      4 2019-08-29  None  ...           0.0               0.0                     0.0
5      5 2019-08-30  None  ...           0.0               0.0                     0.0
6      6 2019-08-31  None  ...           0.0               0.0                     0.0

[7 rows x 8 columns]

Demonstrating a scenario-based breakdown

We artificially inject a "bad weather" day into the test data on the second day of the prediction period, to check whether the breakdown plot captures a decrease in the collective regressors' effect. The impact of the change in the regressor values can be clearly seen in the updated breakdown.

# Altering the test data.
# We alter the normal weather conditions on the second day to heavy precipitation and low temperature.
test_df["regressor_bin_pn"] = [0, 1, 0, 0, 0, 0, 0]
test_df["regressor_bin_heavy_pn"] = [0, 1, 0, 0, 0, 0, 0]
test_df["regressor_tmin"] = [15, 0, 15, 15, 15, 15, 15]
print(f"altered test_df: \n {test_df}")

# Gets predictions and the design matrix used during predictions.
pred_df = trained_estimator.predict(test_df.reset_index())
forecast_x_mat = trained_estimator.forecast_x_mat

# Generates the breakdown plot.
breakdown_result = trained_estimator.forecast_breakdown(
    grouping_regex_patterns_dict=grouping_regex_patterns_dict,
    forecast_x_mat=forecast_x_mat,
    time_values=pred_df[time_col])
breakdown_fig = breakdown_result["breakdown_fig"]
plotly.io.show(breakdown_fig)

Out:

altered test_df:
    index         ts count  ...  regressor_pn  regressor_bin_pn  regressor_bin_heavy_pn
0      0 2019-08-25  None  ...           0.0                 0                       0
1      1 2019-08-26  None  ...           0.0                 1                       1
2      2 2019-08-27  None  ...           0.0                 0                       0
3      3 2019-08-28  None  ...           0.0                 0                       0
4      4 2019-08-29  None  ...           0.0                 0                       0
5      5 2019-08-30  None  ...           0.0                 0                       0
6      6 2019-08-31  None  ...           0.0                 0                       0

[7 rows x 8 columns]

Total running time of the script: (1 minute 8.661 seconds)
