Interpretability

Silverkite generates easily interpretable forecasting models when using its default ML algorithms (e.g., Ridge). This is because, after transforming the raw features into basis functions (transformed features), the model uses an additive structure. Silverkite can therefore break down each forecast into summable components, e.g., long-term growth (trend), seasonality, holidays/events, short-term growth (auto-regression), regressor effects, etc.

The approach to generate these breakdowns consists of two steps:

  1. Group the transformed variables into various meaningful groups.

  2. Calculate the sum of the features multiplied by their regression coefficients within each group.
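The two steps above can be sketched with a toy example (the feature names, coefficient values, and group assignments below are purely illustrative, not taken from a fitted Silverkite model):

```python
import numpy as np
import pandas as pd

# Hypothetical design matrix: rows are time points, columns are transformed features
x_mat = pd.DataFrame({
    "ct1": [0.1, 0.2, 0.3],               # trend basis
    "sin1_tow_weekly": [0.0, 0.5, -0.5],  # seasonality basis
    "y_lag1": [1.0, 1.2, 0.9],            # autoregression
})
# Hypothetical fitted regression coefficients
coefs = pd.Series({"ct1": 2.0, "sin1_tow_weekly": 0.5, "y_lag1": 3.0})

# Step 1: group the transformed variables into meaningful groups
groups = {"trend": ["ct1"], "seasonality": ["sin1_tow_weekly"], "AR": ["y_lag1"]}

# Step 2: within each group, sum (feature * coefficient) over the group's columns
breakdown = pd.DataFrame({
    name: (x_mat[cols] * coefs[cols]).sum(axis=1)
    for name, cols in groups.items()})

# Because the model is additive, the components sum to the full prediction
assert np.allclose(breakdown.sum(axis=1), x_mat @ coefs)
```

This additivity is exactly what makes the breakdown plots exact rather than approximate: nothing is lost when the prediction is split into components.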

These breakdowns then can be used to answer questions such as:

  • Question 1: How is the forecast value generated?

  • Question 2: What is driving the change of the forecast as new data comes in?

Forecast components can also help us analyze model behavior and sensitivity. This is because while it is not feasible to compare a large set of features across two model settings, it can be quite practical and informative to compare a few well-defined components.

This tutorial discusses in detail the usage of forecast_breakdown and how to estimate forecast components using custom component dictionaries. Some of this functionality has been built into the estimators via the method plot_components(...). An example of this usage is in the “Simple Forecast” tutorial in the Quick Start.

# required imports
import plotly
import warnings
import pandas as pd
from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.utils.result_summary import summarize_grid_search_results
from greykite.common.viz.timeseries_plotting import plot_multivariate

warnings.filterwarnings("ignore")

Function to load and prepare data

This is the code to load and prepare the daily bike-sharing data in Washington DC.

def prepare_bikesharing_data():
    """Loads bike-sharing data and adds proper regressors."""
    dl = DataLoaderTS()
    agg_func = {"count": "sum", "tmin": "mean", "tmax": "mean", "pn": "mean"}
    df = dl.load_bikesharing(agg_freq="daily", agg_func=agg_func)

    # There are some zero values which cause issues for MAPE
    # Adding a small number to all data avoids that issue
    value_col = "count"
    df[value_col] += 10
    # Drops the last value, which might be incomplete since the original data is hourly
    df.drop(df.tail(1).index, inplace=True)
    # We only use data after 2018-01-01 for demonstration purposes (shorter run time)
    df = df.loc[df["ts"] > "2018-01-01"]
    df.reset_index(drop=True, inplace=True)

    print(f"\n df.tail(): \n {df.tail()}")

    # Creates useful regressors from existing raw regressors
    df["bin_pn"] = (df["pn"] > 5).map(float)
    df["bin_heavy_pn"] = (df["pn"] > 20).map(float)
    df.columns = [
        "ts",
        value_col,
        "regressor_tmin",
        "regressor_tmax",
        "regressor_pn",
        "regressor_bin_pn",
        "regressor_bin_heavy_pn"]

    forecast_horizon = 7
    train_df = df.copy()
    test_df = df.tail(forecast_horizon).reset_index(drop=True)
    # When using the pipeline (as done in ``fit_forecast`` below),
    # fitting and prediction are done in one step.
    # Therefore, for demonstration purposes, we remove the response values
    # of the last 7 days. This is needed because we are using regressors,
    # and future regressor data must be appended to ``df``.
    # We mimic that by removing the values of the response.
    train_df.loc[(len(train_df) - forecast_horizon):len(train_df), value_col] = None

    print(f"train_df shape: \n {train_df.shape}")
    print(f"test_df shape: \n {test_df.shape}")
    print(f"train_df.tail(14): \n {train_df.tail(14)}")
    print(f"test_df: \n {test_df}")

    return {
        "train_df": train_df,
        "test_df": test_df}

Function to fit Silverkite

This is the code for fitting a Silverkite model to the data.

def fit_forecast(
        df,
        time_col,
        value_col):
    """Fits a daily model for this use case.
    The daily model is a generic Silverkite model with regressors."""

    meta_data_params = MetadataParam(
        time_col=time_col,
        value_col=value_col,
        freq="D",
    )

    # Autoregression configuration to be used in the model
    autoregression = {
        "autoreg_dict": {
            "lag_dict": {"orders": [1, 2, 3]},
            "agg_lag_dict": {
                "orders_list": [[7, 7*2, 7*3]],
                "interval_list": [(1, 7), (8, 7*2)]},
            "series_na_fill_func": lambda s: s.bfill().ffill()},
        "fast_simulation": True
    }

    # Changepoints configuration
    # The config includes changepoints both in trend and seasonality
    changepoints = {
        "changepoints_dict": {
            "method": "auto",
            "yearly_seasonality_order": 15,
            "resample_freq": "2D",
            "actual_changepoint_min_distance": "100D",
            "potential_changepoint_distance": "50D",
            "no_changepoint_distance_from_end": "50D"},
        "seasonality_changepoints_dict": {
            "method": "auto",
            "yearly_seasonality_order": 15,
            "resample_freq": "2D",
            "actual_changepoint_min_distance": "100D",
            "potential_changepoint_distance": "50D",
            "no_changepoint_distance_from_end": "50D"}
    }

    regressor_cols = [
        "regressor_tmin",
        "regressor_bin_pn",
        "regressor_bin_heavy_pn",
    ]

    # Model parameters
    model_components = ModelComponentsParam(
        growth=dict(growth_term="linear"),
        seasonality=dict(
            yearly_seasonality=[15],
            quarterly_seasonality=[False],
            monthly_seasonality=[False],
            weekly_seasonality=[7],
            daily_seasonality=[False]
        ),
        custom=dict(
            fit_algorithm_dict=dict(fit_algorithm="ridge"),
            extra_pred_cols=None,
            normalize_method="statistical"
        ),
        regressors=dict(regressor_cols=regressor_cols),
        autoregression=autoregression,
        uncertainty=dict(uncertainty_dict=None),
        events=dict(holiday_lookup_countries=["US"]),
        changepoints=changepoints
    )

    # Evaluation is done on the same ``forecast_horizon`` as desired for the output.
    # Note: ``forecast_horizon`` is defined at the module level before this function is called.
    evaluation_period_param = EvaluationPeriodParam(
        test_horizon=None,
        cv_horizon=forecast_horizon,
        cv_min_train_periods=365*2,
        cv_expanding_window=True,
        cv_use_most_recent_splits=False,
        cv_periods_between_splits=None,
        cv_periods_between_train_test=0,
        cv_max_splits=5,
    )

    # Runs the forecast model using the "SILVERKITE" template
    forecaster = Forecaster()
    result = forecaster.run_forecast_config(
        df=df,
        config=ForecastConfig(
            model_template=ModelTemplateEnum.SILVERKITE.name,
            coverage=0.95,
            forecast_horizon=forecast_horizon,
            metadata_param=meta_data_params,
            evaluation_period_param=evaluation_period_param,
            model_components_param=model_components
        )
    )

    # Gets cross-validation results
    grid_search = result.grid_search
    cv_results = summarize_grid_search_results(
        grid_search=grid_search,
        decimals=2,
        cv_report_metrics=None)
    cv_results = cv_results.transpose()
    cv_results = pd.DataFrame(cv_results)
    cv_results.columns = ["err_value"]
    cv_results["err_name"] = cv_results.index
    cv_results = cv_results.reset_index(drop=True)
    cv_results = cv_results[["err_name", "err_value"]]

    print(f"\n cv_results: \n {cv_results}")

    return result

Loads and prepares data

The data is loaded and some information about the input data is printed. We use the number of daily rented bikes in Washington DC over time. The data is augmented with weather data (precipitation, min/max daily temperature).

data = prepare_bikesharing_data()

Out:

 df.tail():
             ts  count  tmin  tmax   pn
602 2019-08-27  12216  17.2  26.7  0.0
603 2019-08-28  11401  18.3  27.8  0.0
604 2019-08-29  12685  16.7  28.9  0.0
605 2019-08-30  12097  14.4  32.8  0.0
606 2019-08-31  11281  17.8  31.1  0.0
train_df shape:
 (607, 7)
test_df shape:
 (7, 7)
train_df.tail(14):
             ts    count  regressor_tmin  regressor_tmax  regressor_pn  regressor_bin_pn  regressor_bin_heavy_pn
593 2019-08-18   9655.0            22.2            35.6           0.3               0.0                     0.0
594 2019-08-19  10579.0            21.1            37.2           0.0               0.0                     0.0
595 2019-08-20   8898.0            22.2            36.1           0.0               0.0                     0.0
596 2019-08-21  11648.0            21.7            35.0           1.8               0.0                     0.0
597 2019-08-22  11724.0            21.7            35.0          30.7               1.0                     1.0
598 2019-08-23   8158.0            17.8            23.3           1.8               0.0                     0.0
599 2019-08-24  12475.0            16.7            26.1           0.0               0.0                     0.0
600 2019-08-25      NaN            15.6            26.7           0.0               0.0                     0.0
601 2019-08-26      NaN            17.2            25.0           0.0               0.0                     0.0
602 2019-08-27      NaN            17.2            26.7           0.0               0.0                     0.0
603 2019-08-28      NaN            18.3            27.8           0.0               0.0                     0.0
604 2019-08-29      NaN            16.7            28.9           0.0               0.0                     0.0
605 2019-08-30      NaN            14.4            32.8           0.0               0.0                     0.0
606 2019-08-31      NaN            17.8            31.1           0.0               0.0                     0.0
test_df:
           ts  count  regressor_tmin  regressor_tmax  regressor_pn  regressor_bin_pn  regressor_bin_heavy_pn
0 2019-08-25  11634            15.6            26.7           0.0               0.0                     0.0
1 2019-08-26  11747            17.2            25.0           0.0               0.0                     0.0
2 2019-08-27  12216            17.2            26.7           0.0               0.0                     0.0
3 2019-08-28  11401            18.3            27.8           0.0               0.0                     0.0
4 2019-08-29  12685            16.7            28.9           0.0               0.0                     0.0
5 2019-08-30  12097            14.4            32.8           0.0               0.0                     0.0
6 2019-08-31  11281            17.8            31.1           0.0               0.0                     0.0

Fits model to daily data

In this step we fit a Silverkite model to the data. The model uses weather regressors, holidays, auto-regression, etc.

df = data["train_df"]
time_col = "ts"
value_col = "count"
forecast_horizon = 7

result = fit_forecast(
    df=df,
    time_col=time_col,
    value_col=value_col)
trained_estimator = result.model[-1]
# Checks model coefficients and p-values
print("\n Model Summary:")
print(trained_estimator.summary())

Out:

Fitting 1 folds for each of 1 candidates, totalling 1 fits

 cv_results:
                                              err_name                                          err_value
0                                      rank_test_MAPE                                                  1
1                                      mean_test_MAPE                                              10.28
2                                     split_test_MAPE                                           (10.28,)
3                                     mean_train_MAPE                                              21.71
4                                              params                                                 []
5                 param_estimator__yearly_seasonality                                                 15
6                 param_estimator__weekly_seasonality                                                  7
7                   param_estimator__uncertainty_dict                                               None
8                  param_estimator__training_fraction                                               None
9                  param_estimator__train_test_thresh                                               None
10                   param_estimator__time_properties  {'period': 86400, 'simple_freq': SimpleTimeFre...
11                    param_estimator__simulation_num                                                 10
12     param_estimator__seasonality_changepoints_dict  {'method': 'auto', 'yearly_seasonality_order':...
13                  param_estimator__remove_intercept                                              False
14                    param_estimator__regressor_cols  [regressor_tmin, regressor_bin_pn, regressor_b...
15             param_estimator__regression_weight_col                                               None
16             param_estimator__quarterly_seasonality                                              False
17              param_estimator__origin_for_time_vars                                               None
18                  param_estimator__normalize_method                                        statistical
19               param_estimator__monthly_seasonality                                              False
20              param_estimator__min_admissible_value                                               None
21  param_estimator__max_weekly_seas_interaction_o...                                                  2
22  param_estimator__max_daily_seas_interaction_order                                                  5
23              param_estimator__max_admissible_value                                               None
24             param_estimator__lagged_regressor_dict                                               None
25      param_estimator__holidays_to_model_separately                                               auto
26         param_estimator__holiday_pre_post_num_dict                                               None
27              param_estimator__holiday_pre_num_days                                                  2
28             param_estimator__holiday_post_num_days                                                  2
29          param_estimator__holiday_lookup_countries                                               [US]
30                       param_estimator__growth_term                                             linear
31                param_estimator__fit_algorithm_dict                         {'fit_algorithm': 'ridge'}
32              param_estimator__feature_sets_enabled                                               auto
33                   param_estimator__fast_simulation                                               True
34                   param_estimator__extra_pred_cols                                               None
35                param_estimator__explicit_pred_cols                                               None
36                    param_estimator__drop_pred_cols                                               None
37                 param_estimator__daily_seasonality                                              False
38        param_estimator__daily_event_shifted_effect                                               None
39       param_estimator__daily_event_neighbor_impact                                               None
40               param_estimator__daily_event_df_dict                                               None
41                 param_estimator__changepoints_dict  {'method': 'auto', 'yearly_seasonality_order':...
42                      param_estimator__autoreg_dict  {'lag_dict': {'orders': [1, 2, 3]}, 'agg_lag_d...
43                  param_estimator__auto_seasonality                                              False
44                      param_estimator__auto_holiday                                              False
45                       param_estimator__auto_growth                                              False
46                                   split_train_MAPE                                           (21.71,)
47                                      mean_fit_time                                              11.44
48                                       std_fit_time                                                0.0
49                                    mean_score_time                                              22.95
50                                     std_score_time                                                0.0
51                                   split0_test_MAPE                                              10.28
52                                      std_test_MAPE                                                0.0
53                                  split0_train_MAPE                                              21.71
54                                     std_train_MAPE                                                0.0

 Model Summary:
================================ Model Summary =================================

Number of observations: 600,   Number of features: 134
Method: Ridge regression
Number of nonzero features: 133
Regularization parameter: 174.3

Residuals:
         Min           1Q       Median           3Q          Max
     -7532.0       -907.2        85.25        986.3       7618.0

            Pred_col Estimate Std. Err Pr(>)_boot sig. code               95%CI
           Intercept   9633.0    73.89     <2e-16       ***    (9518.0, 9788.0)
events_Christmas Day   -144.7    74.95     <2e-16       *** (-184.8, 9.867e-28)
 events_C...as Day-1   -135.8    70.13     <2e-16       *** (-175.9, 8.573e-28)
 events_C...as Day-2    -51.6    29.47      0.010         * (-82.11, 8.971e-28)
 events_C...as Day+1   -72.73    40.14     <2e-16       *** (-105.0, 9.241e-28)
 events_C...as Day+2   -23.44    17.99      0.110           (-54.61, 1.070e-27)
 events_I...ence Day    45.49    22.73      0.016         * (-1.097e-27, 78.67)
 events_I...ce Day-1    -27.7    20.49      0.142               (-64.51, 11.58)
 events_I...ce Day-2   -14.33     29.2      0.582               (-66.92, 42.99)
 events_I...ce Day+1   -16.08    15.31      0.248               (-42.61, 16.35)
 events_I...ce Day+2   -65.19     44.9      0.096         .     (-137.0, 12.05)
    events_Labor Day    -61.5    33.47      0.006        ** (-87.95, 9.708e-28)
  events_Labor Day-1    92.35     48.5     <2e-16       *** (-8.479e-28, 127.7)
  events_Labor Day-2   -59.02    32.73      0.006        ** (-91.87, 9.757e-28)
  events_Labor Day+1   -51.11    28.74      0.020         * (-83.07, 8.825e-28)
  events_Labor Day+2   -3.722    11.14      0.490               (-30.07, 18.41)
 events_Memorial Day   -41.91    20.48      0.024         * (-73.42, 1.147e-27)
 events_M...al Day-1    125.5    76.81      0.048         * (-1.136e-27, 240.3)
 events_M...al Day-2   -28.66    20.23      0.134                (-63.1, 9.638)
 events_M...al Day+1   -57.03    52.58      0.326               (-139.8, 31.93)
 events_M...al Day+2   -35.61    19.67      0.030         * (-64.91, 1.310e-27)
events_New Years Day   -46.77    27.12      0.014         * (-74.62, 9.757e-28)
 events_N...rs Day-1   -42.77    25.87      0.032         * (-73.62, 9.662e-28)
 events_N...rs Day-2    7.616    11.58      0.378               (-18.38, 30.21)
 events_N...rs Day+1   -23.72    33.53      0.600               (-88.15, 30.91)
 events_N...rs Day+2    33.46    33.29      0.330               (-33.38, 87.92)
        events_Other   -109.0    52.65      0.048         *    (-215.9, -3.222)
      events_Other-1    43.86    43.91      0.326               (-58.32, 113.1)
      events_Other-2   -93.26    42.67      0.032         *    (-170.1, -5.821)
      events_Other+1    48.48    43.58      0.294               (-33.27, 133.0)
      events_Other+2    -31.5    64.32      0.624               (-164.3, 89.07)
 events_Thanksgiving   -184.1    94.95     <2e-16       *** (-229.3, 8.093e-28)
 events_T...giving-1    -46.3    26.67      0.008        ** (-70.37, 7.000e-28)
 events_T...giving-2    2.582    10.16      0.570                (-21.51, 23.0)
 events_T...giving+1   -128.7    68.18     <2e-16       *** (-168.7, 8.405e-28)
 events_T...giving+2   -52.92     32.2      0.030         * (-89.96, 7.698e-28)
 events_Veterans Day   -31.03    20.53      0.090         . (-58.95, 1.294e-27)
 events_V...ns Day-1   -42.18    25.97      0.036         * (-75.49, 8.799e-28)
 events_V...ns Day-2   -77.52    40.11     <2e-16       *** (-109.6, 7.602e-28)
 events_V...ns Day+1     24.4    18.54      0.156                (-12.5, 54.45)
 events_V...ns Day+2    1.288    13.84      0.626                (-34.22, 26.1)
       str_dow_2-Tue    19.98    28.02      0.500               (-32.04, 72.93)
       str_dow_3-Wed    20.59    23.84      0.412               (-27.51, 64.33)
       str_dow_4-Thu    28.45    27.59      0.278                (-29.2, 81.47)
       str_dow_5-Fri    42.23    32.73      0.218               (-21.38, 98.27)
       str_dow_6-Sat   -9.575    36.02      0.778               (-75.66, 63.68)
       str_dow_7-Sun   -105.6    30.48      0.002        **    (-164.1, -45.57)
      regressor_tmin    599.7    62.55     <2e-16       ***      (446.7, 697.5)
    regressor_bin_pn   -836.8     62.6     <2e-16       ***    (-934.8, -687.2)
 regresso...heavy_pn   -363.7    87.44     <2e-16       ***    (-539.3, -192.1)
                 ct1   -7.363    30.67      0.840               (-76.78, 48.19)
      is_weekend:ct1   -13.75    24.55      0.572               (-68.44, 29.78)
   str_dow_2-Tue:ct1    20.77     24.2      0.368               (-28.55, 71.35)
   str_dow_3-Wed:ct1    16.21    20.44      0.428               (-23.91, 58.72)
   str_dow_4-Thu:ct1    3.699    24.25      0.882               (-40.43, 52.44)
   str_dow_5-Fri:ct1    8.268    27.81      0.786               (-44.75, 65.17)
   str_dow_6-Sat:ct1    17.78    34.23      0.620               (-53.82, 75.82)
   str_dow_7-Sun:ct1   -36.41    28.85      0.216                (-89.24, 19.7)
   cp0_2018_07_21_00   -155.3    26.11     <2e-16       ***    (-200.6, -98.09)
 is_weeke...07_21_00   -26.32     28.9      0.366                (-81.6, 33.33)
 str_dow_...07_21_00   -38.36    34.63      0.246               (-104.2, 31.29)
 str_dow_...07_21_00   -34.16    22.91      0.138                (-77.5, 13.91)
 str_dow_...07_21_00   -13.82    33.41      0.690               (-78.71, 47.62)
 str_dow_...07_21_00   -88.42    41.78      0.036         *      (-165.3, 1.37)
 str_dow_...07_21_00    16.16    42.66      0.684               (-75.19, 104.3)
 str_dow_...07_21_00   -52.69    42.95      0.232               (-131.6, 35.51)
 ct1:sin1_tow_weekly     20.6    21.44      0.328                (-17.0, 66.06)
 ct1:cos1_tow_weekly   -34.91    23.95      0.146               (-79.08, 10.08)
 ct1:sin2_tow_weekly    30.22    22.03      0.166                (-8.901, 72.1)
 ct1:cos2_tow_weekly   -30.71    24.11      0.180               (-80.64, 18.59)
 cp0_2018...w_weekly   -2.956    27.37      0.922               (-56.38, 46.37)
 cp0_2018...w_weekly   -25.93    33.53      0.424               (-87.89, 39.71)
 cp0_2018...w_weekly   -12.03    33.01      0.704               (-73.65, 50.75)
 cp0_2018...w_weekly   -61.05    31.05      0.054         .    (-124.4, -2.265)
     sin1_tow_weekly    60.01     26.8      0.024         *      (9.316, 117.1)
     cos1_tow_weekly   -57.42    30.95      0.058         .     (-113.2, 4.766)
     sin2_tow_weekly    59.25    30.36      0.052         .     (-2.357, 117.3)
     cos2_tow_weekly    27.97    30.99      0.352               (-37.52, 85.31)
     sin3_tow_weekly    8.528    29.24      0.770               (-45.01, 65.81)
     cos3_tow_weekly    35.31    31.45      0.282               (-26.83, 96.46)
     sin4_tow_weekly   -8.528    29.24      0.770               (-65.81, 45.01)
     cos4_tow_weekly    35.31    31.45      0.282               (-26.83, 96.46)
     sin5_tow_weekly   -59.25    30.36      0.052         .     (-117.3, 2.357)
     cos5_tow_weekly    27.97    30.99      0.352               (-37.52, 85.31)
     sin6_tow_weekly   -60.01     26.8      0.024         *    (-117.1, -9.316)
     cos6_tow_weekly   -57.42    30.95      0.058         .     (-113.2, 4.766)
     sin7_tow_weekly    63.68    24.77      0.016         *      (19.59, 117.5)
     cos7_tow_weekly       0.       0.      1.000                      (0., 0.)
     sin1_ct1_yearly     14.3    48.56      0.768               (-86.86, 107.8)
     cos1_ct1_yearly   -524.4    39.51     <2e-16       ***    (-589.7, -436.9)
     sin2_ct1_yearly   -206.7    51.84     <2e-16       ***    (-313.3, -106.7)
     cos2_ct1_yearly   -85.62     53.0      0.112               (-182.6, 21.98)
     sin3_ct1_yearly   -73.14    52.12      0.172               (-179.9, 21.69)
     cos3_ct1_yearly   -41.59    51.32      0.438               (-125.3, 64.83)
     sin4_ct1_yearly    33.59    49.09      0.532               (-65.17, 124.1)
     cos4_ct1_yearly     36.9    52.16      0.482               (-52.18, 146.3)
     sin5_ct1_yearly   -54.65    53.42      0.294               (-151.5, 59.96)
     cos5_ct1_yearly   -57.41    56.79      0.316               (-168.5, 53.99)
     sin6_ct1_yearly   -17.45    53.74      0.774               (-120.6, 86.91)
     cos6_ct1_yearly   -194.4    58.25     <2e-16       ***    (-296.1, -72.46)
     sin7_ct1_yearly   -29.26     55.9      0.582               (-135.8, 86.51)
     cos7_ct1_yearly    59.93    60.79      0.352               (-49.52, 175.3)
     sin8_ct1_yearly    11.46    57.74      0.836               (-103.7, 121.9)
     cos8_ct1_yearly    22.87    60.04      0.686               (-84.53, 137.4)
     sin9_ct1_yearly   -20.75    60.91      0.748               (-142.4, 95.79)
     cos9_ct1_yearly    -36.6    56.24      0.514               (-138.7, 79.24)
    sin10_ct1_yearly    98.24    57.25      0.094         .     (-18.03, 208.3)
    cos10_ct1_yearly   -19.53    52.35      0.696               (-108.9, 105.4)
    sin11_ct1_yearly   -15.28    55.27      0.780               (-132.0, 81.42)
    cos11_ct1_yearly    1.071    61.27      0.986               (-124.6, 120.5)
    sin12_ct1_yearly   -46.26    53.78      0.392                (-152.5, 54.0)
    cos12_ct1_yearly    126.5    65.47      0.062         .     (-15.14, 246.2)
    sin13_ct1_yearly   -78.99    55.71      0.142               (-192.3, 25.18)
    cos13_ct1_yearly   -50.87     58.4      0.378               (-167.9, 61.57)
    sin14_ct1_yearly   -50.02    57.75      0.378               (-173.3, 56.79)
    cos14_ct1_yearly    -25.4    56.43      0.680               (-128.3, 79.32)
    sin15_ct1_yearly   -161.4    57.85      0.008        **    (-272.7, -53.63)
    cos15_ct1_yearly   -38.67    56.92      0.504               (-133.4, 86.56)
 sin1_con...07_21_00    37.09    48.08      0.438               (-65.83, 124.0)
 cos1_con...07_21_00   -136.3    51.51      0.012         *    (-246.0, -46.07)
 sin2_con...07_21_00    30.48    44.87      0.500               (-57.98, 113.8)
 cos2_con...07_21_00   -172.3    60.19      0.002        **    (-277.0, -46.76)
 sin3_con...07_21_00   -29.37    53.95      0.534               (-136.2, 81.71)
 cos3_con...07_21_00    -4.81    49.64      0.896               (-97.93, 93.85)
 sin4_con...07_21_00   -18.05    48.79      0.698               (-116.7, 72.59)
 cos4_con...07_21_00     42.7    54.46      0.432               (-78.26, 130.6)
 sin5_con...07_21_00    46.26    59.68      0.412               (-77.74, 166.1)
 cos5_con...07_21_00    69.93    49.19      0.160               (-23.68, 171.8)
              y_lag1    608.2    82.55     <2e-16       ***      (421.6, 749.2)
              y_lag2    87.56    64.72      0.178               (-13.06, 223.5)
              y_lag3    157.6    71.12      0.028         *       (31.9, 303.0)
    y_avglag_7_14_21    333.6    55.86     <2e-16       ***      (220.2, 448.8)
     y_avglag_1_to_7    235.1    44.19     <2e-16       ***      (157.0, 331.9)
    y_avglag_8_to_14    337.7    52.37     <2e-16       ***      (233.5, 447.6)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.7793,   Adjusted R-squared: 0.7498
F-statistic: 22.607 on 70 and 528 DF,   p-value: 1.110e-16
Model AIC: 12945.0,   model BIC: 13260.0

WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.
WARNING: the following columns have estimated coefficients equal to zero, while ridge is not supposed to have zero estimates. This is probably because these columns are degenerate in the design matrix. Make sure these columns do not have constant values.
['cos7_tow_weekly']
WARNING: the following columns are degenerate, do you really want to include them in your model? This may cause some of them to show unrealistic significance. Consider using the `drop_degenerate` transformer.
['Intercept', 'cos7_tow_weekly']

Grouping of variables

Regex expressions are used to group variables in the breakdown plot. Each group corresponds to one key of this dictionary. The grouping is done on variable names, and multiple regexes can be given per group: a variable is assigned to a group if it matches ANY of that group’s regexes. Note that this grouping assumes regressor variables start with “regressor_”. Also note that the order of the groups matters (Python dictionaries preserve insertion order in 3.6+): variables captured by an earlier group will not be picked up again by later groups. Variables that match none of the groups are collected into “OTHER”. The following breakdown dictionary should work for many use cases; however, users can customize it as needed.

Two alternative dictionaries are available as the constants DEFAULT_COMPONENTS_REGEX_DICT and DETAILED_SEASONALITY_COMPONENTS_REGEX_DICT.

grouping_regex_patterns_dict = {
    "regressors": "regressor_.*",  # regressor effects
    "AR": ".*lag",  # autoregression component
    "events": ".*events_.*",  # events and holidays
    "seasonality": r".*quarter.*|.*month.*|.*C\(dow.*|.*C\(dow_hr.*|sin.*|cos.*|.*doq.*|.*dom.*|.*str_dow.*|.*is_weekend.*|.*tow_weekly.*",  # seasonality
    "trend": "ct1|ct2|ct_sqrt|ct3|ct_root3|.*changepoint.*",  # long-term trend (includes changepoints)
}
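The first-match-wins grouping described above can be sketched with a small helper. This is a simplified stand-in for what forecast_breakdown does internally, and the sample regex dictionary and column names below are illustrative, not taken from the fitted model:

```python
import re

def group_columns(pred_cols, regex_dict):
    """Assigns each column to the first group whose regex matches it.

    Earlier groups take precedence (order matters); columns matching
    no group fall into "OTHER".
    """
    groups = {name: [] for name in regex_dict}
    groups["OTHER"] = []
    for col in pred_cols:
        for name, pattern in regex_dict.items():
            if re.match(pattern, col):
                groups[name].append(col)
                break  # first match wins; later groups never see this column
        else:
            groups["OTHER"].append(col)
    return groups

# Illustrative subset of groups and column names
sample_regex_dict = {
    "regressors": "regressor_.*",
    "AR": ".*lag",
    "seasonality": "sin.*|cos.*",
}
cols = ["regressor_tmin", "y_lag1", "sin1_tow_weekly", "ct1", "Intercept"]
grouping = group_columns(cols, sample_regex_dict)
# "ct1" and "Intercept" match no group, so they land in "OTHER"
```

Reordering the dictionary can change the result whenever a column matches more than one group's regex, which is why the ordering note above matters.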

Creates forecast breakdown

This is generated for the observed data plus the prediction period (available in df). Each component is centered around zero, and the sum of all components equals the forecast.

breakdown_result = trained_estimator.forecast_breakdown(
    grouping_regex_patterns_dict=grouping_regex_patterns_dict,
    center_components=True,
    plt_title="forecast breakdowns")
forecast_breakdown_df = breakdown_result["breakdown_df_with_index_col"]
forecast_components_fig = breakdown_result["breakdown_fig"]
plotly.io.show(forecast_components_fig)
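The centering bookkeeping can be illustrated with a minimal numpy sketch (toy design matrix and coefficients, not Greykite's actual implementation): each component is the dot product of its columns with their coefficients, the component means are folded into the intercept when centering, and the centered components still sum exactly to the forecast.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy design matrix: an intercept column plus two hypothetical feature groups.
X = np.column_stack([np.ones(10), rng.normal(size=(10, 2)), rng.normal(size=(10, 2))])
beta = np.array([5.0, 1.0, -2.0, 0.5, 3.0])
forecast = X @ beta

# Each component is (columns in group) @ (their coefficients).
intercept = X[:, :1] @ beta[:1]
group1 = X[:, 1:3] @ beta[1:3]
group2 = X[:, 3:] @ beta[3:]

# Center each non-intercept component around zero and fold the means into
# the intercept, mirroring the effect of `center_components=True`.
intercept_centered = intercept + group1.mean() + group2.mean()
group1_centered = group1 - group1.mean()
group2_centered = group2 - group2.mean()

# The centered components still sum to the forecast.
total = intercept_centered + group1_centered + group2_centered
assert np.allclose(total, forecast)
```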

Standardization of the components

Next we provide a more "standardized" view of the breakdown. This is achieved by dividing all components by the mean of the absolute observed values of the metric. By doing so, the intercept is mapped to approximately 1, and changes on the y-axis can be viewed relative to the average magnitude of the series. The sum of all components at each time point will be equal to "forecast / obs_abs_mean".

from greykite.common.viz.timeseries_plotting import plot_multivariate

column_grouping_result = breakdown_result["column_grouping_result"]
component_cols = list(grouping_regex_patterns_dict.keys())
forecast_breakdown_stdzd_df = forecast_breakdown_df.copy()
obs_abs_mean = abs(df[value_col]).mean()
for col in component_cols + ["Intercept", "OTHER"]:
    if col in forecast_breakdown_stdzd_df.columns:
        forecast_breakdown_stdzd_df[col] /= obs_abs_mean
forecast_breakdown_stdzd_fig = plot_multivariate(
    df=forecast_breakdown_stdzd_df,
    x_col=time_col,
    title="forecast breakdowns divided by mean of abs value of response",
    ylabel="component")
forecast_breakdown_stdzd_fig.update_layout(yaxis_range=[-1.1, 1.1])
plotly.io.show(forecast_breakdown_stdzd_fig)

Breaking down the predictions

Next we perform a prediction and generate a breakdown plot for that prediction.

test_df = data["test_df"].reset_index()
test_df[value_col] = None
print(f"\n test_df: \n {test_df}")
pred_df = trained_estimator.predict(test_df)
forecast_x_mat = trained_estimator.forecast_x_mat
# Generate the breakdown plot
breakdown_result = trained_estimator.forecast_breakdown(
    grouping_regex_patterns_dict=grouping_regex_patterns_dict,
    forecast_x_mat=forecast_x_mat,
    time_values=pred_df[time_col])

breakdown_fig = breakdown_result["breakdown_fig"]
plotly.io.show(breakdown_fig)

Out:

 test_df:
    index         ts count  ...  regressor_pn  regressor_bin_pn  regressor_bin_heavy_pn
0      0 2019-08-25  None  ...           0.0               0.0                     0.0
1      1 2019-08-26  None  ...           0.0               0.0                     0.0
2      2 2019-08-27  None  ...           0.0               0.0                     0.0
3      3 2019-08-28  None  ...           0.0               0.0                     0.0
4      4 2019-08-29  None  ...           0.0               0.0                     0.0
5      5 2019-08-30  None  ...           0.0               0.0                     0.0
6      6 2019-08-31  None  ...           0.0               0.0                     0.0

[7 rows x 8 columns]

Demonstrating a scenario-based breakdown

We artificially inject a "bad weather" day into the test data on the second day of the prediction period, to check whether the breakdown plot captures a decrease in the collective regressors' effect. The impact of the change in the regressor values can be clearly seen in the updated breakdown.

# Altering the test data.
# We alter the normal weather conditions on the second day to heavy precipitation and low temperature.
test_df["regressor_bin_pn"] = [0, 1, 0, 0, 0, 0, 0]
test_df["regressor_bin_heavy_pn"] = [0, 1, 0, 0, 0, 0, 0]
test_df["regressor_tmin"] = [15, 0, 15, 15, 15, 15, 15]
print(f"altered test_df: \n {test_df}")

# Gets predictions and the design matrix used during predictions.
pred_df = trained_estimator.predict(test_df.reset_index())
forecast_x_mat = trained_estimator.forecast_x_mat

# Generates the breakdown plot.
breakdown_result = trained_estimator.forecast_breakdown(
    grouping_regex_patterns_dict=grouping_regex_patterns_dict,
    forecast_x_mat=forecast_x_mat,
    time_values=pred_df[time_col])
breakdown_fig = breakdown_result["breakdown_fig"]
plotly.io.show(breakdown_fig)

Out:

altered test_df:
    index         ts count  ...  regressor_pn  regressor_bin_pn  regressor_bin_heavy_pn
0      0 2019-08-25  None  ...           0.0                 0                       0
1      1 2019-08-26  None  ...           0.0                 1                       1
2      2 2019-08-27  None  ...           0.0                 0                       0
3      3 2019-08-28  None  ...           0.0                 0                       0
4      4 2019-08-29  None  ...           0.0                 0                       0
5      5 2019-08-30  None  ...           0.0                 0                       0
6      6 2019-08-31  None  ...           0.0                 0                       0

[7 rows x 8 columns]

Total running time of the script: (1 minute 8.661 seconds)
