Simple Forecast

You can create and evaluate a forecast with just a few lines of code.

Provide your timeseries as a pandas DataFrame with a timestamp column and a value column.

For example, to forecast daily sessions data, your dataframe could look like this:

import pandas as pd
df = pd.DataFrame({
    "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "sessions": [10231.0, 12309.0, 12104.0]
})

The time column can be any format recognized by pandas.to_datetime.
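As a minimal sketch, the sample dataframe above can be converted with pandas.to_datetime before use. The explicit format string is an assumption about how the sample strings (such as "2020-01-08-00") are meant to be read, namely year-month-day-hour; it is not something the library is known to require.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "sessions": [10231.0, 12309.0, 12104.0],
})

# Assumed layout of the sample strings: year-month-day-hour.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d-%H")

# Check that consecutive rows are exactly one day apart.
is_daily = bool((df["date"].diff().dropna() == pd.Timedelta(days=1)).all())
```

Parsing the column up front makes it easy to confirm the spacing of the data before choosing a frequency.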

In this example, we’ll load a dataset representing log(daily page views) on the Wikipedia page for Peyton Manning. It contains values from 2007-12-10 to 2016-01-20. More dataset info here.

 from collections import defaultdict
 import warnings

 warnings.filterwarnings("ignore")

 import pandas as pd
 import plotly

 from greykite.common.data_loader import DataLoader
 from greykite.framework.templates.autogen.forecast_config import ForecastConfig
 from greykite.framework.templates.autogen.forecast_config import MetadataParam
 from greykite.framework.templates.forecaster import Forecaster
 from greykite.framework.templates.model_templates import ModelTemplateEnum
 from greykite.framework.utils.result_summary import summarize_grid_search_results

 # Loads dataset into pandas DataFrame
 dl = DataLoader()
 df = dl.load_peyton_manning()

 # specify dataset information
 metadata = MetadataParam(
     time_col="ts",  # name of the time column ("date" in example above)
     value_col="y",  # name of the value column ("sessions" in example above)
     freq="D"  # "H" for hourly, "D" for daily, "W" for weekly, etc.
               # Any format accepted by `pandas.date_range`
 )
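The freq strings are ordinary pandas frequency aliases, so they can be tried out with pandas.date_range directly. The sketch below is independent of the forecasting library; the start date is the first date of the dataset described above, and the variable names are illustrative only.

```python
import pandas as pd

# Three consecutive timestamps at daily and weekly frequency.
daily = pd.date_range(start="2007-12-10", periods=3, freq="D")
weekly = pd.date_range(start="2007-12-10", periods=3, freq="W")  # "W" anchors on Sundays

# pandas can also infer the alias from existing timestamps,
# which is a quick way to check what to pass as `freq`.
inferred = pd.infer_freq(daily)
```

If the inferred alias comes back as None, the timestamps are probably irregular and worth inspecting before forecasting.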

Create a forecast

You can pick the PROPHET or SILVERKITE forecasting model template (see Choose a Model).

In this example, we use SILVERKITE. You may also use PROPHET to see how a third-party library is leveraged in the same framework.

 forecaster = Forecaster()  # Creates forecasts and stores the result
 result = forecaster.run_forecast_config(  # result is also stored as `forecaster.forecast_result`.
     df=df,
     config=ForecastConfig(
         model_template=ModelTemplateEnum.SILVERKITE.name,
         forecast_horizon=365,  # forecasts 365 steps ahead
         coverage=0.95,         # 95% prediction intervals
         metadata_param=metadata
     )
 )

Out:

Fitting 3 folds for each of 1 candidates, totalling 3 fits

Check results

The output of run_forecast_config is a result object whose attributes contain the future forecast, historical forecast performance, and the original timeseries.

Timeseries

Let’s plot the original timeseries. run_forecast_config returns this as the timeseries attribute of the result.

(The interactive plot is generated by plotly: click to zoom!)

 ts = result.timeseries
 fig = ts.plot()
 plotly.io.show(fig)

Cross-validation

By default, run_forecast_config provides historical evaluation, so you can see how the forecast performs on past data. This is stored in grid_search (cross-validation splits) and backtest (holdout test set).
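To make the idea of cross-validation splits concrete, here is a generic sketch of rolling train/test splits on a time-ordered series. The function rolling_splits is hypothetical and written for illustration; it is not the splitter the library uses, and the library's defaults (window type, gaps between train and test, number of splits) may differ.

```python
def rolling_splits(n_obs, horizon, n_splits):
    """Return (train_indices, test_indices) pairs, oldest split first.

    Each test window has `horizon` points and the training window is
    everything before it (an expanding window).
    """
    splits = []
    for k in range(n_splits, 0, -1):
        test_end = n_obs - (k - 1) * horizon
        test_start = test_end - horizon
        if test_start <= 0:
            raise ValueError("not enough observations for the requested splits")
        splits.append((range(0, test_start), range(test_start, test_end)))
    return splits


# Ten observations, test windows of two points, three splits.
cv_splits = rolling_splits(n_obs=10, horizon=2, n_splits=3)
test_windows = [list(test) for _, test in cv_splits]
train_sizes = [len(train) for train, _ in cv_splits]
```

A holdout backtest can be thought of as the special case with a single split at the end of the series.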

Let’s check the cross-validation results. By default, all metrics in ElementwiseEvaluationMetricEnum are computed on each CV train/test split. The configuration of CV evaluation metrics can be found at Evaluation Metric. Below, we show the Mean Absolute Percentage Error (MAPE) across splits (see summarize_grid_search_results to control what to show and for details on the output columns).

 grid_search = result.grid_search
 cv_results = summarize_grid_search_results(
     grid_search=grid_search,
     decimals=2,
     # The below saves space in the printed output. Remove to show all available metrics and columns.
     cv_report_metrics=None,
     column_order=["rank", "mean_test", "split_test", "mean_train", "split_train", "mean_fit_time", "mean_score_time", "params"])
 # Transposes to save space in the printed output
 cv_results["params"] = cv_results["params"].astype(str)
 cv_results.set_index("params", drop=True, inplace=True)
 cv_results.transpose()
params                            []
rank_test_MAPE                     1
mean_test_MAPE                  7.31
split_test_MAPE   (5.01, 8.53, 8.39)
mean_train_MAPE                  4.2
split_train_MAPE  (3.82, 4.25, 4.54)
mean_fit_time                  10.09
mean_score_time                 1.42
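The sketch below shows the usual definition of MAPE and checks that the printed mean values are consistent with a simple average of the per-split values. The mape function is a textbook definition written for illustration, not the library's implementation.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (textbook definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * float(np.mean(np.abs((actual - predicted) / actual)))


# Both predictions are off by 10% of the actual value.
example_mape = mape([100.0, 200.0], [110.0, 180.0])

# Per-split values copied from the printed output above.
split_test_mape = (5.01, 8.53, 8.39)
split_train_mape = (3.82, 4.25, 4.54)
mean_test = round(float(np.mean(split_test_mape)), 2)
mean_train = round(float(np.mean(split_train_mape)), 2)
```

Here the test MAPE varies noticeably between splits (about 5% to 8.5%), which is a reminder to look at the split-level values and not only the mean.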


Backtest

Let’s plot the historical forecast on the holdout test set. You can zoom in to see how it performed in any given period.

 backtest = result.backtest
 fig = backtest.plot()
 plotly.io.show(fig)

You can also check historical evaluation metrics (on the historical training/test set).

 backtest_eval = defaultdict(list)
 for metric, value in backtest.train_evaluation.items():
     backtest_eval[metric].append(value)
     backtest_eval[metric].append(backtest.test_evaluation[metric])
 metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
 metrics
                                                           train        test
CORR                                                    0.754324    0.757055
R2                                                      0.556393   -0.694746
MSE                                                     0.317129    0.864868
RMSE                                                    0.563142    0.929983
MAE                                                     0.401152    0.856552
MedAE                                                   0.301442    0.840986
MAPE                                                     4.75639     11.3051
MedAPE                                                   3.76305     11.2075
sMAPE                                                    2.38659     5.31705
Q80                                                      0.20089    0.186995
Q95                                                     0.201047   0.0663538
Q99                                                     0.201089   0.0341829
OutsideTolerance1p                                      0.849331    0.986226
OutsideTolerance2p                                      0.710464    0.972452
OutsideTolerance3p                                      0.583792    0.961433
OutsideTolerance4p                                      0.479937    0.931129
OutsideTolerance5p                                      0.387884    0.895317
Outside Tolerance (fraction)                                None        None
R2_null_model_score                                         None        None
Prediction Band Width (%)                                26.6351     28.2734
Prediction Band Coverage (fraction)                      0.95358    0.782369
Coverage: Lower Band                                    0.565303    0.752066
Coverage: Upper Band                                    0.388277    0.030303
Coverage Diff: Actual_Coverage - Intended_Coverage    0.00357986   -0.167631
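Several rows of this table are related to each other, which helps when reading it. The sketch below computes prediction band coverage on toy data and then checks a few relationships using the test-column numbers printed above. The band_coverage helper is hypothetical (not a library function), and the relationships are only what the printed numbers are consistent with.

```python
import math

import numpy as np


def band_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    actual, lower, upper = (np.asarray(a, dtype=float) for a in (actual, lower, upper))
    return float(np.mean((actual >= lower) & (actual <= upper)))


# Toy data: the 2nd point is below the band, the 4th is above it.
toy_coverage = band_coverage(
    actual=[1.0, 2.0, 3.0, 4.0],
    lower=[0.5, 2.5, 2.0, 3.0],
    upper=[1.5, 3.5, 4.0, 3.5],
)

# Consistency checks on the printed test column.
test_rmse = math.sqrt(0.864868)           # printed RMSE: 0.929983
test_coverage = 0.752066 + 0.030303       # lower + upper band; printed coverage: 0.782369
test_coverage_diff = test_coverage - 0.95  # printed diff: -0.167631
```

On this dataset the test coverage (about 78%) falls well short of the intended 95%, while the training coverage is close to target.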


Forecast

The forecast attribute contains the forecasted result. Just as for backtest, you can plot the result or see the evaluation metrics.

Let’s plot the forecast (trained on all data):

 forecast = result.forecast
 fig = forecast.plot()
 plotly.io.show(fig)

The forecasted values are available in forecast.df.

 forecast.df.head().round(2)
          ts  actual  forecast  forecast_lower  forecast_upper
0 2007-12-10    9.59      8.71            7.17           10.24
1 2007-12-11    8.52      8.57            7.46            9.68
2 2007-12-12    8.18      8.45            7.49            9.41
3 2007-12-13    8.07      8.38            7.44            9.33
4 2007-12-14    7.89      8.36            7.32            9.40
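Because the result is a plain dataframe, follow-up analysis is ordinary pandas. The sketch below rebuilds the five printed rows by hand (so it runs without fitting a model) and derives the forecast error, an in-band flag, and the forecast on the original page-view scale. The back-transform assumes the dataset uses the natural logarithm, which the text above does not state explicitly.

```python
import numpy as np
import pandas as pd

# The five rows printed above, re-entered by hand for illustration.
head = pd.DataFrame({
    "ts": pd.to_datetime(["2007-12-10", "2007-12-11", "2007-12-12",
                          "2007-12-13", "2007-12-14"]),
    "actual": [9.59, 8.52, 8.18, 8.07, 7.89],
    "forecast": [8.71, 8.57, 8.45, 8.38, 8.36],
    "forecast_lower": [7.17, 7.46, 7.49, 7.44, 7.32],
    "forecast_upper": [10.24, 9.68, 9.41, 9.33, 9.40],
})

# Forecast error and whether the actual value lies inside the band.
head["error"] = head["actual"] - head["forecast"]
head["in_band"] = head["actual"].between(head["forecast_lower"], head["forecast_upper"])

# Back to page views, ASSUMING the values are natural logs.
head["forecast_views"] = np.exp(head["forecast"])
```

With a real run, the same column operations can be applied to forecast.df directly.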


Model Diagnostics

The component plot shows how your dataset’s trend, seasonality, and event / holiday patterns are handled in the model:

 fig = forecast.plot_components()
 plotly.io.show(fig)     # fig.show() if you are using "PROPHET" template

Model summary allows inspection of individual model terms. Check parameter estimates and their significance for insights on how the model works and what can be further improved.

 summary = result.model[-1].summary()  # -1 retrieves the estimator from the pipeline
 print(summary)

Out:

================================ Model Summary =================================

Number of observations: 2964,   Number of features: 122
Method: Ridge regression
Number of nonzero features: 122
Regularization parameter: 148.5

Residuals:
         Min           1Q       Median           3Q          Max
      -2.344      -0.3605     -0.06506       0.2601        3.756

             Pred_col    Estimate  Std. Err Pr(>)_boot sig. code                   95%CI
            Intercept       8.047   0.02101     <2e-16       ***          (8.009, 8.089)
  events_C...New Year     0.01506   0.02485      0.534               (-0.03307, 0.06385)
  events_C...w Year-1     0.00384   0.02308      0.860               (-0.04038, 0.04911)
  events_C...w Year-2     0.01009   0.02407      0.686               (-0.03902, 0.05626)
  events_C...w Year+1     0.01914   0.02461      0.450               (-0.02535, 0.07034)
  events_C...w Year+2     0.02713   0.02497      0.272               (-0.02117, 0.07874)
 events_Christmas Day    -0.02256    0.0105      0.018         *   (-0.04268, -0.002969)
  events_C...as Day-1    -0.00858   0.01095      0.404               (-0.02953, 0.01393)
  events_C...as Day-2    0.002987   0.01439      0.850               (-0.02732, 0.02762)
  events_C...as Day+1    -0.01571   0.01149      0.194              (-0.03972, 0.004073)
  events_C...as Day+2     0.01042  0.009371      0.254              (-0.007131, 0.02948)
  events_E...Ireland]     -0.0159  0.008193      0.056         .  (-0.03347, -0.0002433)
  events_E...eland]-1    -0.01331  0.007317      0.054         .     (-0.0259, 0.001062)
  events_E...eland]-2   -0.005977  0.006795      0.398              (-0.01924, 0.007108)
  events_E...eland]+1   -0.007653  0.006674      0.248              (-0.02143, 0.004774)
  events_E...eland]+2   0.0005814  0.005574      0.928               (-0.01168, 0.01082)
   events_Good Friday   -0.008391  0.006316      0.178              (-0.02216, 0.003205)
 events_Good Friday-1     -0.0037  0.006199      0.538               (-0.01505, 0.00871)
 events_Good Friday-2   1.059e-05  0.006007      0.996               (-0.01038, 0.01203)
 events_Good Friday+1   -0.005977  0.006795      0.398              (-0.01924, 0.007108)
 events_Good Friday+2    -0.01331  0.007317      0.054         .     (-0.0259, 0.001062)
  events_I...ence Day   -0.004932  0.007861      0.534              (-0.02163, 0.009346)
  events_I...ce Day-1   -0.005151  0.007827      0.510              (-0.02123, 0.009892)
  events_I...ce Day-2   -0.009598  0.006906      0.172               (-0.0226, 0.003493)
  events_I...ce Day+1    -0.01448  0.009392      0.122              (-0.03368, 0.003319)
  events_I...ce Day+2     -0.0113   0.01125      0.326              (-0.03354, 0.009941)
     events_Labor Day    -0.02688   0.01152      0.018         *   (-0.04846, -0.003684)
   events_Labor Day-1   -0.006057   0.01001      0.528                (-0.02546, 0.0131)
   events_Labor Day-2    0.002674  0.008414      0.750                (-0.0129, 0.02031)
   events_Labor Day+1    -0.01792  0.009758      0.072         .    (-0.03795, 0.001037)
   events_Labor Day+2    -0.02103    0.0112      0.054         .   (-0.04219, 0.0005373)
  events_Memorial Day    -0.02911  0.009159     <2e-16       ***    (-0.04682, -0.01205)
  events_M...al Day-1    -0.02007  0.007268      0.008        **    (-0.0346, -0.005458)
  events_M...al Day-2   -0.003899  0.004373      0.352               (-0.0134, 0.003958)
  events_M...al Day+1   -0.008451  0.004944      0.080         .    (-0.01887, 0.000241)
  events_M...al Day+2    0.009339  0.005965      0.104              (-0.001037, 0.02221)
 events_New Years Day   -0.009661  0.009936      0.338              (-0.02912, 0.009435)
  events_N...rs Day-1    0.003618  0.009536      0.698               (-0.01209, 0.02557)
  events_N...rs Day-2     0.01685   0.01114      0.126              (-0.003654, 0.03981)
  events_N...rs Day+1     0.01288  0.009819      0.184               (-0.004855, 0.0335)
  events_N...rs Day+2      0.0151   0.01011      0.128               (-0.00496, 0.03436)
         events_Other    0.001542   0.02351      0.952               (-0.04367, 0.04647)
       events_Other-1    0.007689   0.02403      0.776               (-0.03591, 0.05863)
       events_Other-2     0.02989   0.02386      0.218               (-0.01469, 0.08077)
       events_Other+1     0.02127   0.02255      0.326               (-0.02616, 0.06508)
       events_Other+2     0.03077   0.02146      0.160              (-0.008273, 0.07453)
  events_Thanksgiving    -0.01128  0.008161      0.166               (-0.03079, 0.00338)
  events_T...giving-1    -0.01995  0.007956      0.020         *   (-0.03722, -0.007006)
  events_T...giving-2   -0.008147  0.005624      0.158              (-0.01872, 0.001944)
  events_T...giving+1   -0.005862  0.008201      0.492              (-0.02327, 0.008526)
  events_T...giving+2    -0.01105  0.007755      0.150              (-0.02774, 0.002144)
  events_Veterans Day    0.003172  0.007991      0.664               (-0.01134, 0.01992)
  events_V...ns Day-1    0.002042  0.004163      0.634               (-0.0067, 0.009568)
  events_V...ns Day-2    0.002412   0.00536      0.652              (-0.007463, 0.01199)
  events_V...ns Day+1   -0.001305   0.00547      0.812              (-0.01295, 0.008321)
  events_V...ns Day+2   -0.003438  0.007949      0.670                (-0.0194, 0.01141)
        str_dow_2-Tue     0.01552  0.006814      0.020         *       (0.00227, 0.0293)
        str_dow_3-Wed    -0.01166  0.006308      0.072         .   (-0.02345, 0.0008072)
        str_dow_4-Thu    -0.01425  0.006601      0.034         *   (-0.02752, -0.001066)
        str_dow_5-Fri     -0.0201  0.006864      0.008        **    (-0.03281, -0.00645)
        str_dow_6-Sat    -0.03904  0.007343     <2e-16       ***    (-0.05342, -0.02474)
        str_dow_7-Sun     0.01574  0.007359      0.030         *     (0.001659, 0.02981)
                  ct1     0.01989  0.004693     <2e-16       ***      (0.01077, 0.02919)
       is_weekend:ct1   -0.004719  0.002798      0.096         .   (-0.01027, 0.0006222)
    str_dow_2-Tue:ct1   -0.001557  0.007438      0.868                (-0.01625, 0.0124)
    str_dow_3-Wed:ct1    0.004955  0.005303      0.378              (-0.005164, 0.01496)
    str_dow_4-Thu:ct1  -0.0004004  0.004751      0.922             (-0.009081, 0.008707)
    str_dow_5-Fri:ct1    0.002644  0.005218      0.638              (-0.007209, 0.01219)
    str_dow_6-Sat:ct1   -0.001431   0.00635      0.828                (-0.0135, 0.01035)
    str_dow_7-Sun:ct1   -0.003287  0.007423      0.650                (-0.0178, 0.01071)
  ct1:sin1_tow_weekly    0.006258  0.003163      0.050         .    (-7.310e-05, 0.0125)
  ct1:cos1_tow_weekly     0.01315  0.005949      0.024         *     (0.001933, 0.02542)
  ct1:sin2_tow_weekly    0.001296  0.003778      0.760             (-0.005421, 0.008845)
  ct1:cos2_tow_weekly     0.01827   0.00537     <2e-16       ***     (0.007979, 0.02906)
      sin1_tow_weekly     0.02906   0.01488      0.052         .     (0.001758, 0.06131)
      cos1_tow_weekly      0.1155   0.01883     <2e-16       ***        (0.0808, 0.1521)
      sin2_tow_weekly    -0.01667   0.01565      0.308               (-0.04935, 0.01204)
      cos2_tow_weekly     0.07109    0.0184     <2e-16       ***       (0.03492, 0.1071)
      sin3_tow_weekly     -0.0158  0.008862      0.060         .   (-0.03293, 0.0003409)
      cos3_tow_weekly     0.00166   0.01149      0.890               (-0.02012, 0.02332)
      sin4_tow_weekly      0.0158  0.008862      0.060         .   (-0.0003409, 0.03293)
      cos4_tow_weekly     0.00166   0.01149      0.890               (-0.02012, 0.02332)
   sin1_toq_quarterly     0.05557   0.00805     <2e-16       ***      (0.04004, 0.07042)
   cos1_toq_quarterly   -0.006499  0.006733      0.350              (-0.01998, 0.006078)
   sin2_toq_quarterly    -0.02632  0.007981     <2e-16       ***    (-0.04357, -0.01253)
   cos2_toq_quarterly    -0.05193  0.007588     <2e-16       ***    (-0.06608, -0.03778)
   sin3_toq_quarterly     0.01976   0.00807      0.012         *     (0.004168, 0.03644)
   cos3_toq_quarterly    0.005142  0.007568      0.508              (-0.009108, 0.01953)
   sin4_toq_quarterly  -0.0009807   0.01341      0.926               (-0.02457, 0.02697)
   cos4_toq_quarterly     -0.0219   0.01482      0.140              (-0.04941, 0.006772)
   sin5_toq_quarterly    -0.02648   0.01396      0.068         .    (-0.05637, 0.001071)
   cos5_toq_quarterly     0.01722   0.01412      0.216               (-0.01119, 0.04502)
      sin1_ct1_yearly     -0.1001   0.01444     <2e-16       ***     (-0.1262, -0.07094)
      cos1_ct1_yearly      0.6413   0.01254     <2e-16       ***          (0.615, 0.666)
      sin2_ct1_yearly     0.06836   0.01361     <2e-16       ***      (0.04204, 0.09455)
      cos2_ct1_yearly     -0.1141   0.01408     <2e-16       ***     (-0.1399, -0.08718)
      sin3_ct1_yearly      0.2189   0.01428     <2e-16       ***        (0.1897, 0.2446)
      cos3_ct1_yearly    -0.06467   0.01326     <2e-16       ***    (-0.09107, -0.04083)
      sin4_ct1_yearly    0.003636   0.00672      0.598                (-0.0103, 0.01598)
      cos4_ct1_yearly    -0.06083  0.007891     <2e-16       ***     (-0.07668, -0.0454)
      sin5_ct1_yearly    -0.08743   0.01485     <2e-16       ***     (-0.1145, -0.05626)
      cos5_ct1_yearly    -0.01482   0.01313      0.258               (-0.04009, 0.01287)
      sin6_ct1_yearly     -0.1122     0.013     <2e-16       ***      (-0.1369, -0.0858)
      cos6_ct1_yearly    -0.02006   0.01419      0.162              (-0.04569, 0.007774)
      sin7_ct1_yearly    -0.05091   0.01366     <2e-16       ***    (-0.07687, -0.02542)
      cos7_ct1_yearly     0.04102   0.01481     <2e-16       ***      (0.01435, 0.07077)
      sin8_ct1_yearly     0.02753  0.007713      0.002        **      (0.01333, 0.04251)
      cos8_ct1_yearly     0.04938  0.007712     <2e-16       ***       (0.03475, 0.0647)
      sin9_ct1_yearly    0.007008   0.01271      0.594                (-0.01619, 0.0337)
      cos9_ct1_yearly    -0.01374   0.01475      0.372               (-0.04239, 0.01403)
     sin10_ct1_yearly    -0.06552   0.01392     <2e-16       ***    (-0.09354, -0.03706)
     cos10_ct1_yearly    -0.04828   0.01467     <2e-16       ***    (-0.07584, -0.02105)
     sin11_ct1_yearly    -0.02851   0.01354      0.040         *   (-0.05418, 0.0004238)
     cos11_ct1_yearly   -0.006983    0.0129      0.570               (-0.03227, 0.01754)
     sin12_ct1_yearly    0.001446  0.007786      0.850               (-0.01417, 0.01708)
     cos12_ct1_yearly     0.01336  0.008133      0.102              (-0.002231, 0.02944)
     sin13_ct1_yearly    -0.01186   0.01252      0.350               (-0.03691, 0.01235)
     cos13_ct1_yearly     0.05476   0.01389     <2e-16       ***      (0.02855, 0.08261)
     sin14_ct1_yearly     0.02379   0.01345      0.060         .     (0.0002346, 0.0491)
     cos14_ct1_yearly     0.01026   0.01473      0.484               (-0.01886, 0.04189)
     sin15_ct1_yearly     0.02415   0.01432      0.082         .    (-0.002438, 0.05267)
     cos15_ct1_yearly    -0.02237   0.01395      0.112              (-0.04879, 0.004978)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.5231,   Adjusted R-squared: 0.5151
F-statistic: 55.206 on 48 and 2914 DF,   p-value: 1.110e-16
Model AIC: 20626.0,   model BIC: 20926.0

WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.
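The "sig. code" column follows the legend printed at the bottom of the summary. A small sketch of that mapping is below; sig_code is a hypothetical helper, and the use of strict inequalities at the thresholds is an assumption (it agrees with the printed row where a p-value of 0.050 is marked with ".").

```python
def sig_code(p_value):
    """Map a p-value to the significance code used in the summary legend."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.1:
        return "."
    return ""


# A few p-values taken from the table above.
codes = {p: sig_code(p) for p in (0.018, 0.008, 0.056, 0.534, 0.050)}
```

For example, events_Christmas Day (p = 0.018) is marked "*", while most of the holiday neighbor-day terms carry no code at all.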

Apply the model

The trained model is available as a fitted sklearn.pipeline.Pipeline in result.model.

Out:

Pipeline(steps=[('input',
                 PandasFeatureUnion(transformer_list=[('date',
                                                       Pipeline(steps=[('select_date',
                                                                        ColumnSelector(column_names=['ts']))])),
                                                      ('response',
                                                       Pipeline(steps=[('select_val',
                                                                        ColumnSelector(column_names=['y'])),
                                                                       ('outlier',
                                                                        ZscoreOutlierTransformer()),
                                                                       ('null',
                                                                        NullTransformer(impute_algorithm='interpolate',
                                                                                        impute_params={'axis': 0,
                                                                                                       'limit_direct...
                                                            'simple_freq': <SimpleTimeFrequencyEnum.DAY: Frequency(default_horizon=30, seconds_per_observation=86400, valid_seas={'QUARTERLY_SEASONALITY', 'YEARLY_SEASONALITY', 'WEEKLY_SEASONALITY', 'MONTHLY_SEASONALITY'})>,
                                                            'start_year': 2007},
                                           uncertainty_dict={'params': {'conditional_cols': ['dow_hr'],
                                                                        'quantile_estimation_method': 'normal_fit',
                                                                        'quantiles': [0.025000000000000022,
                                                                                      0.975],
                                                                        'sample_size_thresh': 5,
                                                                        'small_sample_size_method': 'std_quantiles',
                                                                        'small_sample_size_quantile': 0.98},
                                                             'uncertainty_method': 'simple_conditional_residuals'}))])

You can take this model and forecast on any date range by passing a new dataframe to predict on. The make_future_dataframe convenience function can be used to create this dataframe. Here, we predict the next 4 periods after the model’s train end date.

Note

The dataframe passed to .predict() must have the same columns as the df passed to run_forecast_config above, including any regressors needed for prediction. The value_col column should be included with values set to np.nan.

 future_df = result.timeseries.make_future_dataframe(
     periods=4,
     include_history=False)
 future_df
                   ts   y
2016-01-21 2016-01-21 NaN
2016-01-22 2016-01-22 NaN
2016-01-23 2016-01-23 NaN
2016-01-24 2016-01-24 NaN
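For reference, a dataframe with the same shape can be built by hand with pandas. This is only a sketch of what the printed output looks like, not how make_future_dataframe is implemented; the train end date is taken from the dataset description above, and the name manual_future_df is illustrative.

```python
import numpy as np
import pandas as pd

train_end = pd.Timestamp("2016-01-20")  # last date in the dataset described above

# Four daily timestamps starting the day after the training data ends.
future_dates = pd.date_range(start=train_end + pd.Timedelta(days=1), periods=4, freq="D")

# Same columns as the training df; the value column is left as NaN.
manual_future_df = pd.DataFrame({"ts": future_dates, "y": np.nan}, index=future_dates)
```

Building the frame manually can be handy when regressor columns need to be filled in for the future dates.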


Call .predict() to compute predictions

 model = result.model  # the fitted sklearn Pipeline
 model.predict(future_df)
          ts  forecast  forecast_lower  forecast_upper                         y_quantile_summary   err_std
0 2016-01-21  8.964736        8.016267        9.913206     (8.016266568612478, 9.913205705277191)  0.483922
1 2016-01-22  8.964897        7.924127       10.005667   (7.9241270009730345, 10.005667160080467)  0.531015
2 2016-01-23  8.603602        7.559956        9.647248     (7.559956355690183, 9.647248225095886)  0.532482
3 2016-01-24  9.083030        7.789760       10.376301    (7.789759626997768, 10.376300700410866)  0.659844
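The printed bounds are consistent with a normal-approximation interval around the forecast, which matches the 'normal_fit' setting and the 0.025 / 0.975 quantiles shown in the pipeline output above. The sketch below rebuilds the bounds from the forecast and err_std columns; it is a check on the printed numbers under that assumption, not a description of the library's internal code.

```python
from statistics import NormalDist

coverage = 0.95
# Two-sided normal quantile for the requested coverage (about 1.96).
z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)

# (forecast, err_std, forecast_lower, forecast_upper) copied from the output above.
rows = [
    (8.964736, 0.483922, 8.016267, 9.913206),
    (8.964897, 0.531015, 7.924127, 10.005667),
    (8.603602, 0.532482, 7.559956, 9.647248),
    (9.083030, 0.659844, 7.789760, 10.376301),
]

# Rebuild the interval as forecast -/+ z * err_std and record the largest
# absolute difference from the printed bounds.
max_diff = max(
    max(abs((f - z * s) - lo), abs((f + z * s) - hi))
    for f, s, lo, hi in rows
)
```

The rebuilt bounds agree with the printed ones up to rounding of the displayed values.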


What’s next?

If you’re satisfied with the forecast performance, you’re done!

For a complete example of how to tune this forecast, see Tune your first forecast model.

Besides the component plot, we offer additional tools to help you improve your forecast and understand the result.

See the following guides:

For example, for this dataset, you could add changepoints to handle the change in trend around 2014 and avoid the overprediction issue seen in the backtest plot.

Or you might want to try a different model template. Model templates bundle an algorithm with recommended hyperparameters. The template that works best for you depends on the data characteristics and forecast requirements (e.g. short / long forecast horizon). We recommend trying a few and tuning the ones that look promising. All model templates are available through the same forecasting and tuning interface shown here.

For details about the model templates and how to set model components, see the following guides:
