Simple Forecast

You can create and evaluate a forecast with just a few lines of code.

Provide your timeseries as a pandas dataframe with a timestamp column and a value column.

For example, to forecast daily sessions data, your dataframe could look like this:

import pandas as pd
df = pd.DataFrame({
    "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "sessions": [10231.0, 12309.0, 12104.0]
})

The time column can be any format recognized by pandas.to_datetime.
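The sample strings above carry an hour suffix (2020-01-08-00). As a minimal sketch, the standard-library parse below spells out that layout with strftime-style codes. The format string is an assumption about how the sample values are meant to be read, not something stated here. pandas.to_datetime accepts the same codes through its format argument, which may help if automatic inference does not pick the layout you intend.

```python
from datetime import datetime

# Sample timestamps from the example dataframe above.
raw_dates = ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"]

# Assumed layout: year-month-day-hour.
DATE_FORMAT = "%Y-%m-%d-%H"

parsed = [datetime.strptime(value, DATE_FORMAT) for value in raw_dates]
for original, timestamp in zip(raw_dates, parsed):
    print(original, "->", timestamp.isoformat())
```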

In this example, we’ll load a dataset representing log(daily page views) on the Wikipedia page for Peyton Manning. It contains values from 2007-12-10 to 2016-01-20.

 from collections import defaultdict
 import warnings

 warnings.filterwarnings("ignore")

 import pandas as pd
 import plotly

 from greykite.common.data_loader import DataLoader
 from greykite.framework.templates.autogen.forecast_config import ForecastConfig
 from greykite.framework.templates.autogen.forecast_config import MetadataParam
 from greykite.framework.templates.forecaster import Forecaster
 from greykite.framework.templates.model_templates import ModelTemplateEnum
 from greykite.framework.utils.result_summary import summarize_grid_search_results

 # Loads dataset into pandas DataFrame
 dl = DataLoader()
 df = dl.load_peyton_manning()

 # Specifies dataset information
 metadata = MetadataParam(
     time_col="ts",  # name of the time column ("date" in example above)
     value_col="y",  # name of the value column ("sessions" in example above)
     freq="D"  # "H" for hourly, "D" for daily, "W" for weekly, etc.
               # Any frequency string accepted by `pandas.date_range`
 )

Create a forecast

You can pick the PROPHET or SILVERKITE forecasting model template (see Choose a Model).

In this example, we use SILVERKITE. You may also use PROPHET to see how a third-party library is leveraged in the same framework.

 forecaster = Forecaster()  # Creates forecasts and stores the result
 result = forecaster.run_forecast_config(  # result is also stored as `forecaster.forecast_result`.
     df=df,
     config=ForecastConfig(
         model_template=ModelTemplateEnum.SILVERKITE.name,
         forecast_horizon=365,  # forecasts 365 steps ahead
         coverage=0.95,         # 95% prediction intervals
         metadata_param=metadata
     )
 )

Out:

Fitting 3 folds for each of 1 candidates, totalling 3 fits

Check results

The output of run_forecast_config is a result object whose attributes hold the future forecast, historical forecast performance, and the original timeseries.

Timeseries

Let’s plot the original timeseries. The result exposes it as the timeseries attribute, which we assign to ts.

(The interactive plot is generated by plotly: click to zoom!)

 ts = result.timeseries
 fig = ts.plot()
 plotly.io.show(fig)

Cross-validation

By default, run_forecast_config provides historical evaluation, so you can see how the forecast performs on past data. This is stored in grid_search (cross-validation splits) and backtest (holdout test set).

Let’s check the cross-validation results. By default, all metrics in ElementwiseEvaluationMetricEnum are computed on each CV train/test split. The configuration of CV evaluation metrics can be found at Evaluation Metric. Below, we show the Mean Absolute Percentage Error (MAPE) across splits (see summarize_grid_search_results to control what to show and for details on the output columns).

 grid_search = result.grid_search
 cv_results = summarize_grid_search_results(
     grid_search=grid_search,
     decimals=2,
     # The below saves space in the printed output. Remove to show all available metrics and columns.
     cv_report_metrics=None,
     column_order=["rank", "mean_test", "split_test", "mean_train", "split_train", "mean_fit_time", "mean_score_time", "params"])
 # Transposes to save space in the printed output
 cv_results["params"] = cv_results["params"].astype(str)
 cv_results.set_index("params", drop=True, inplace=True)
 cv_results.transpose()

params	[]
rank_test_MAPE	1
mean_test_MAPE	7.31
split_test_MAPE	(5.01, 8.53, 8.39)
mean_train_MAPE	4.2
split_train_MAPE	(3.82, 4.25, 4.54)
mean_fit_time	10.09
mean_score_time	1.42
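To make the table concrete, here is a minimal sketch of how MAPE is commonly defined and how the mean_test_MAPE entry relates to the per-split values. The mape helper is illustrative only and is not Greykite's implementation. The split values are copied from the output above, and the two-point series is made up purely to exercise the helper.

```python
from statistics import mean


def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent (illustrative helper)."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return 100.0 * mean(errors)


# Per-split test MAPE values copied from the table above.
split_test_mape = (5.01, 8.53, 8.39)

# The reported mean is the average over the three CV splits.
mean_test_mape = round(mean(split_test_mape), 2)
print("mean over splits:", mean_test_mape)

# Made-up series: both forecasts are off by 10% of the actual value.
toy_mape = mape([100.0, 200.0], [110.0, 180.0])
print("toy MAPE:", toy_mape)
```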

Backtest

Let’s plot the historical forecast on the holdout test set. You can zoom in to see how it performed in any given period.

 backtest = result.backtest
 fig = backtest.plot()
 plotly.io.show(fig)

You can also check historical evaluation metrics (on the historical training/test set).

 backtest_eval = defaultdict(list)
 for metric, value in backtest.train_evaluation.items():
     backtest_eval[metric].append(value)
     backtest_eval[metric].append(backtest.test_evaluation[metric])
 metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
 metrics

	train	test
CORR	0.754324	0.757055
R2	0.556393	-0.694746
MSE	0.317129	0.864868
RMSE	0.563142	0.929983
MAE	0.401152	0.856552
MedAE	0.301442	0.840986
MAPE	4.75639	11.3051
MedAPE	3.76305	11.2075
sMAPE	2.38659	5.31705
Q80	0.20089	0.186995
Q95	0.201047	0.0663538
Q99	0.201089	0.0341829
OutsideTolerance1p	0.849331	0.986226
OutsideTolerance2p	0.710464	0.972452
OutsideTolerance3p	0.583792	0.961433
OutsideTolerance4p	0.479937	0.931129
OutsideTolerance5p	0.387884	0.895317
Outside Tolerance (fraction)	None	None
R2_null_model_score	None	None
Prediction Band Width (%)	26.6351	28.2734
Prediction Band Coverage (fraction)	0.95358	0.782369
Coverage: Lower Band	0.565303	0.752066
Coverage: Upper Band	0.388277	0.030303
Coverage Diff: Actual_Coverage - Intended_Coverage	0.00357986	-0.167631
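Several rows in this table are linked by simple identities, which gives a quick sanity check when reading it. The sketch below uses only numbers copied from the test column above. The 0.95 intended coverage is taken from the coverage setting in the forecast config. These relationships are read off the printed values and are not a description of how the library computes them.

```python
import math

# Values copied from the "test" column of the table above.
mse, rmse = 0.864868, 0.929983
coverage, lower_band, upper_band = 0.782369, 0.752066, 0.030303
coverage_diff = -0.167631
intended_coverage = 0.95  # from coverage=0.95 in ForecastConfig

# RMSE is the square root of MSE.
rmse_from_mse = math.sqrt(mse)

# The lower-band and upper-band rows add up to the overall band coverage.
coverage_from_parts = lower_band + upper_band

# Coverage diff is actual coverage minus intended coverage.
diff_from_coverage = coverage - intended_coverage

print(round(rmse_from_mse, 6))
print(round(coverage_from_parts, 6))
print(round(diff_from_coverage, 6))
```

A negative coverage diff on the test set means the 95% band contained fewer actual values than intended during the holdout period.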

Forecast

The forecast attribute contains the forecasted result. Just as for backtest, you can plot the result or see the evaluation metrics.

Let’s plot the forecast (trained on all data):

 forecast = result.forecast
 fig = forecast.plot()
 plotly.io.show(fig)

The forecasted values are available in forecast.df.

 forecast.df.head().round(2)

	ts	actual	forecast	forecast_lower	forecast_upper
0	2007-12-10	9.59	8.71	7.17	10.24
1	2007-12-11	8.52	8.57	7.46	9.68
2	2007-12-12	8.18	8.45	7.49	9.41
3	2007-12-13	8.07	8.38	7.44	9.33
4	2007-12-14	7.89	8.36	7.32	9.40
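As a small worked check on these rows, the sketch below computes the absolute percentage error of each forecast and whether each actual value falls inside its prediction band. The inputs are the rounded values printed above, so the results are approximate, and the variable names are illustrative.

```python
# (actual, forecast, forecast_lower, forecast_upper) copied from the rows above.
rows = [
    (9.59, 8.71, 7.17, 10.24),
    (8.52, 8.57, 7.46, 9.68),
    (8.18, 8.45, 7.49, 9.41),
    (8.07, 8.38, 7.44, 9.33),
    (7.89, 8.36, 7.32, 9.40),
]

# Absolute percentage error per row, in percent.
abs_pct_errors = [100.0 * abs(a - f) / abs(a) for a, f, _, _ in rows]

# True when the actual value lies within [forecast_lower, forecast_upper].
inside_band = [lo <= a <= hi for a, _, lo, hi in rows]

mape_pct = sum(abs_pct_errors) / len(rows)
band_coverage = sum(inside_band) / len(rows)

print("MAPE over these rows (%):", round(mape_pct, 2))
print("fraction inside band:", band_coverage)
```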

Model Diagnostics

The component plot shows how your dataset’s trend, seasonality, and event / holiday patterns are handled in the model:

 fig = forecast.plot_components()
 plotly.io.show(fig)     # fig.show() if you are using "PROPHET" template

The model summary lets you inspect individual model terms. Check the parameter estimates and their significance for insight into how the model works and what could be improved.

 summary = result.model[-1].summary()  # -1 retrieves the estimator from the pipeline
 print(summary)

Out:

================================ Model Summary =================================

Number of observations: 2964,   Number of features: 122
Method: Ridge regression
Number of nonzero features: 122
Regularization parameter: 148.5

Residuals:
         Min           1Q       Median           3Q          Max
      -2.344      -0.3605     -0.06506       0.2601        3.756

             Pred_col    Estimate  Std. Err Pr(>)_boot sig. code                   95%CI
            Intercept       8.047   0.02101     <2e-16       ***          (8.009, 8.089)
  events_C...New Year     0.01506   0.02485      0.534               (-0.03307, 0.06385)
  events_C...w Year-1     0.00384   0.02308      0.860               (-0.04038, 0.04911)
  events_C...w Year-2     0.01009   0.02407      0.686               (-0.03902, 0.05626)
  events_C...w Year+1     0.01914   0.02461      0.450               (-0.02535, 0.07034)
  events_C...w Year+2     0.02713   0.02497      0.272               (-0.02117, 0.07874)
 events_Christmas Day    -0.02256    0.0105      0.018         *   (-0.04268, -0.002969)
  events_C...as Day-1    -0.00858   0.01095      0.404               (-0.02953, 0.01393)
  events_C...as Day-2    0.002987   0.01439      0.850               (-0.02732, 0.02762)
  events_C...as Day+1    -0.01571   0.01149      0.194              (-0.03972, 0.004073)
  events_C...as Day+2     0.01042  0.009371      0.254              (-0.007131, 0.02948)
  events_E...Ireland]     -0.0159  0.008193      0.056         .  (-0.03347, -0.0002433)
  events_E...eland]-1    -0.01331  0.007317      0.054         .     (-0.0259, 0.001062)
  events_E...eland]-2   -0.005977  0.006795      0.398              (-0.01924, 0.007108)
  events_E...eland]+1   -0.007653  0.006674      0.248              (-0.02143, 0.004774)
  events_E...eland]+2   0.0005814  0.005574      0.928               (-0.01168, 0.01082)
   events_Good Friday   -0.008391  0.006316      0.178              (-0.02216, 0.003205)
 events_Good Friday-1     -0.0037  0.006199      0.538               (-0.01505, 0.00871)
 events_Good Friday-2   1.059e-05  0.006007      0.996               (-0.01038, 0.01203)
 events_Good Friday+1   -0.005977  0.006795      0.398              (-0.01924, 0.007108)
 events_Good Friday+2    -0.01331  0.007317      0.054         .     (-0.0259, 0.001062)
  events_I...ence Day   -0.004932  0.007861      0.534              (-0.02163, 0.009346)
  events_I...ce Day-1   -0.005151  0.007827      0.510              (-0.02123, 0.009892)
  events_I...ce Day-2   -0.009598  0.006906      0.172               (-0.0226, 0.003493)
  events_I...ce Day+1    -0.01448  0.009392      0.122              (-0.03368, 0.003319)
  events_I...ce Day+2     -0.0113   0.01125      0.326              (-0.03354, 0.009941)
     events_Labor Day    -0.02688   0.01152      0.018         *   (-0.04846, -0.003684)
   events_Labor Day-1   -0.006057   0.01001      0.528                (-0.02546, 0.0131)
   events_Labor Day-2    0.002674  0.008414      0.750                (-0.0129, 0.02031)
   events_Labor Day+1    -0.01792  0.009758      0.072         .    (-0.03795, 0.001037)
   events_Labor Day+2    -0.02103    0.0112      0.054         .   (-0.04219, 0.0005373)
  events_Memorial Day    -0.02911  0.009159     <2e-16       ***    (-0.04682, -0.01205)
  events_M...al Day-1    -0.02007  0.007268      0.008        **    (-0.0346, -0.005458)
  events_M...al Day-2   -0.003899  0.004373      0.352               (-0.0134, 0.003958)
  events_M...al Day+1   -0.008451  0.004944      0.080         .    (-0.01887, 0.000241)
  events_M...al Day+2    0.009339  0.005965      0.104              (-0.001037, 0.02221)
 events_New Years Day   -0.009661  0.009936      0.338              (-0.02912, 0.009435)
  events_N...rs Day-1    0.003618  0.009536      0.698               (-0.01209, 0.02557)
  events_N...rs Day-2     0.01685   0.01114      0.126              (-0.003654, 0.03981)
  events_N...rs Day+1     0.01288  0.009819      0.184               (-0.004855, 0.0335)
  events_N...rs Day+2      0.0151   0.01011      0.128               (-0.00496, 0.03436)
         events_Other    0.001542   0.02351      0.952               (-0.04367, 0.04647)
       events_Other-1    0.007689   0.02403      0.776               (-0.03591, 0.05863)
       events_Other-2     0.02989   0.02386      0.218               (-0.01469, 0.08077)
       events_Other+1     0.02127   0.02255      0.326               (-0.02616, 0.06508)
       events_Other+2     0.03077   0.02146      0.160              (-0.008273, 0.07453)
  events_Thanksgiving    -0.01128  0.008161      0.166               (-0.03079, 0.00338)
  events_T...giving-1    -0.01995  0.007956      0.020         *   (-0.03722, -0.007006)
  events_T...giving-2   -0.008147  0.005624      0.158              (-0.01872, 0.001944)
  events_T...giving+1   -0.005862  0.008201      0.492              (-0.02327, 0.008526)
  events_T...giving+2    -0.01105  0.007755      0.150              (-0.02774, 0.002144)
  events_Veterans Day    0.003172  0.007991      0.664               (-0.01134, 0.01992)
  events_V...ns Day-1    0.002042  0.004163      0.634               (-0.0067, 0.009568)
  events_V...ns Day-2    0.002412   0.00536      0.652              (-0.007463, 0.01199)
  events_V...ns Day+1   -0.001305   0.00547      0.812              (-0.01295, 0.008321)
  events_V...ns Day+2   -0.003438  0.007949      0.670                (-0.0194, 0.01141)
        str_dow_2-Tue     0.01552  0.006814      0.020         *       (0.00227, 0.0293)
        str_dow_3-Wed    -0.01166  0.006308      0.072         .   (-0.02345, 0.0008072)
        str_dow_4-Thu    -0.01425  0.006601      0.034         *   (-0.02752, -0.001066)
        str_dow_5-Fri     -0.0201  0.006864      0.008        **    (-0.03281, -0.00645)
        str_dow_6-Sat    -0.03904  0.007343     <2e-16       ***    (-0.05342, -0.02474)
        str_dow_7-Sun     0.01574  0.007359      0.030         *     (0.001659, 0.02981)
                  ct1     0.01989  0.004693     <2e-16       ***      (0.01077, 0.02919)
       is_weekend:ct1   -0.004719  0.002798      0.096         .   (-0.01027, 0.0006222)
    str_dow_2-Tue:ct1   -0.001557  0.007438      0.868                (-0.01625, 0.0124)
    str_dow_3-Wed:ct1    0.004955  0.005303      0.378              (-0.005164, 0.01496)
    str_dow_4-Thu:ct1  -0.0004004  0.004751      0.922             (-0.009081, 0.008707)
    str_dow_5-Fri:ct1    0.002644  0.005218      0.638              (-0.007209, 0.01219)
    str_dow_6-Sat:ct1   -0.001431   0.00635      0.828                (-0.0135, 0.01035)
    str_dow_7-Sun:ct1   -0.003287  0.007423      0.650                (-0.0178, 0.01071)
  ct1:sin1_tow_weekly    0.006258  0.003163      0.050         .    (-7.310e-05, 0.0125)
  ct1:cos1_tow_weekly     0.01315  0.005949      0.024         *     (0.001933, 0.02542)
  ct1:sin2_tow_weekly    0.001296  0.003778      0.760             (-0.005421, 0.008845)
  ct1:cos2_tow_weekly     0.01827   0.00537     <2e-16       ***     (0.007979, 0.02906)
      sin1_tow_weekly     0.02906   0.01488      0.052         .     (0.001758, 0.06131)
      cos1_tow_weekly      0.1155   0.01883     <2e-16       ***        (0.0808, 0.1521)
      sin2_tow_weekly    -0.01667   0.01565      0.308               (-0.04935, 0.01204)
      cos2_tow_weekly     0.07109    0.0184     <2e-16       ***       (0.03492, 0.1071)
      sin3_tow_weekly     -0.0158  0.008862      0.060         .   (-0.03293, 0.0003409)
      cos3_tow_weekly     0.00166   0.01149      0.890               (-0.02012, 0.02332)
      sin4_tow_weekly      0.0158  0.008862      0.060         .   (-0.0003409, 0.03293)
      cos4_tow_weekly     0.00166   0.01149      0.890               (-0.02012, 0.02332)
   sin1_toq_quarterly     0.05557   0.00805     <2e-16       ***      (0.04004, 0.07042)
   cos1_toq_quarterly   -0.006499  0.006733      0.350              (-0.01998, 0.006078)
   sin2_toq_quarterly    -0.02632  0.007981     <2e-16       ***    (-0.04357, -0.01253)
   cos2_toq_quarterly    -0.05193  0.007588     <2e-16       ***    (-0.06608, -0.03778)
   sin3_toq_quarterly     0.01976   0.00807      0.012         *     (0.004168, 0.03644)
   cos3_toq_quarterly    0.005142  0.007568      0.508              (-0.009108, 0.01953)
   sin4_toq_quarterly  -0.0009807   0.01341      0.926               (-0.02457, 0.02697)
   cos4_toq_quarterly     -0.0219   0.01482      0.140              (-0.04941, 0.006772)
   sin5_toq_quarterly    -0.02648   0.01396      0.068         .    (-0.05637, 0.001071)
   cos5_toq_quarterly     0.01722   0.01412      0.216               (-0.01119, 0.04502)
      sin1_ct1_yearly     -0.1001   0.01444     <2e-16       ***     (-0.1262, -0.07094)
      cos1_ct1_yearly      0.6413   0.01254     <2e-16       ***          (0.615, 0.666)
      sin2_ct1_yearly     0.06836   0.01361     <2e-16       ***      (0.04204, 0.09455)
      cos2_ct1_yearly     -0.1141   0.01408     <2e-16       ***     (-0.1399, -0.08718)
      sin3_ct1_yearly      0.2189   0.01428     <2e-16       ***        (0.1897, 0.2446)
      cos3_ct1_yearly    -0.06467   0.01326     <2e-16       ***    (-0.09107, -0.04083)
      sin4_ct1_yearly    0.003636   0.00672      0.598                (-0.0103, 0.01598)
      cos4_ct1_yearly    -0.06083  0.007891     <2e-16       ***     (-0.07668, -0.0454)
      sin5_ct1_yearly    -0.08743   0.01485     <2e-16       ***     (-0.1145, -0.05626)
      cos5_ct1_yearly    -0.01482   0.01313      0.258               (-0.04009, 0.01287)
      sin6_ct1_yearly     -0.1122     0.013     <2e-16       ***      (-0.1369, -0.0858)
      cos6_ct1_yearly    -0.02006   0.01419      0.162              (-0.04569, 0.007774)
      sin7_ct1_yearly    -0.05091   0.01366     <2e-16       ***    (-0.07687, -0.02542)
      cos7_ct1_yearly     0.04102   0.01481     <2e-16       ***      (0.01435, 0.07077)
      sin8_ct1_yearly     0.02753  0.007713      0.002        **      (0.01333, 0.04251)
      cos8_ct1_yearly     0.04938  0.007712     <2e-16       ***       (0.03475, 0.0647)
      sin9_ct1_yearly    0.007008   0.01271      0.594                (-0.01619, 0.0337)
      cos9_ct1_yearly    -0.01374   0.01475      0.372               (-0.04239, 0.01403)
     sin10_ct1_yearly    -0.06552   0.01392     <2e-16       ***    (-0.09354, -0.03706)
     cos10_ct1_yearly    -0.04828   0.01467     <2e-16       ***    (-0.07584, -0.02105)
     sin11_ct1_yearly    -0.02851   0.01354      0.040         *   (-0.05418, 0.0004238)
     cos11_ct1_yearly   -0.006983    0.0129      0.570               (-0.03227, 0.01754)
     sin12_ct1_yearly    0.001446  0.007786      0.850               (-0.01417, 0.01708)
     cos12_ct1_yearly     0.01336  0.008133      0.102              (-0.002231, 0.02944)
     sin13_ct1_yearly    -0.01186   0.01252      0.350               (-0.03691, 0.01235)
     cos13_ct1_yearly     0.05476   0.01389     <2e-16       ***      (0.02855, 0.08261)
     sin14_ct1_yearly     0.02379   0.01345      0.060         .     (0.0002346, 0.0491)
     cos14_ct1_yearly     0.01026   0.01473      0.484               (-0.01886, 0.04189)
     sin15_ct1_yearly     0.02415   0.01432      0.082         .    (-0.002438, 0.05267)
     cos15_ct1_yearly    -0.02237   0.01395      0.112              (-0.04879, 0.004978)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.5231,   Adjusted R-squared: 0.5151
F-statistic: 55.206 on 48 and 2914 DF,   p-value: 1.110e-16
Model AIC: 20626.0,   model BIC: 20926.0

WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.
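To read the sig. code column, it can help to spell out the legend as a lookup. The sketch below is an illustrative helper, not part of Greykite, and it assumes strict less-than comparisons at each threshold. The p-values are copied from rows of the summary above.

```python
def significance_code(p_value):
    """Maps a p-value to the legend marker (strict thresholds assumed)."""
    thresholds = [(0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")]
    for cutoff, marker in thresholds:
        if p_value < cutoff:
            return marker
    return ""


# p-values copied from the summary above.
examples = {
    "events_Christmas Day": 0.018,
    "str_dow_5-Fri": 0.008,
    "str_dow_3-Wed": 0.072,
    "events_New Years Day": 0.338,
}

codes = {name: significance_code(p) for name, p in examples.items()}
for name, code in codes.items():
    print(f"{name}: {code!r}")
```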

Apply the model

The trained model is available as a fitted sklearn.pipeline.Pipeline.

 model = result.model
 model

Out:

Pipeline(steps=[('input',
                 PandasFeatureUnion(transformer_list=[('date',
                                                       Pipeline(steps=[('select_date',
                                                                        ColumnSelector(column_names=['ts']))])),
                                                      ('response',
                                                       Pipeline(steps=[('select_val',
                                                                        ColumnSelector(column_names=['y'])),
                                                                       ('outlier',
                                                                        ZscoreOutlierTransformer()),
                                                                       ('null',
                                                                        NullTransformer(impute_algorithm='interpolate',
                                                                                        impute_params={'axis': 0,
                                                                                                       'limit_direct...
                                                            'simple_freq': <SimpleTimeFrequencyEnum.DAY: Frequency(default_horizon=30, seconds_per_observation=86400, valid_seas={'QUARTERLY_SEASONALITY', 'YEARLY_SEASONALITY', 'WEEKLY_SEASONALITY', 'MONTHLY_SEASONALITY'})>,
                                                            'start_year': 2007},
                                           uncertainty_dict={'params': {'conditional_cols': ['dow_hr'],
                                                                        'quantile_estimation_method': 'normal_fit',
                                                                        'quantiles': [0.025000000000000022,
                                                                                      0.975],
                                                                        'sample_size_thresh': 5,
                                                                        'small_sample_size_method': 'std_quantiles',
                                                                        'small_sample_size_quantile': 0.98},
                                                             'uncertainty_method': 'simple_conditional_residuals'}))])

You can take this model and forecast on any date range by passing a new dataframe to predict on. The make_future_dataframe convenience function can be used to create this dataframe. Here, we predict the next 4 periods after the model’s train end date.

Note

The dataframe passed to .predict() must have the same columns as the df passed to run_forecast_config above, including any regressors needed for prediction. The value_col column should be included with values set to np.nan.

 future_df = result.timeseries.make_future_dataframe(
     periods=4,
     include_history=False)
 future_df

	ts	y
2016-01-21	2016-01-21	NaN
2016-01-22	2016-01-22	NaN
2016-01-23	2016-01-23	NaN
2016-01-24	2016-01-24	NaN
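For a daily series, the table above amounts to the next four calendar dates after the training end date, each with a missing value. The standard-library sketch below reproduces that shape as a list of rows. It is an illustration of the expected layout, not the implementation of make_future_dataframe, and the train_end value is taken from the dataset's stated end date.

```python
from datetime import date, timedelta

train_end = date(2016, 1, 20)  # last date in the dataset, per the description above
periods = 4

# One row per future day; the value column is left as NaN for prediction.
future_rows = [
    {"ts": train_end + timedelta(days=step), "y": float("nan")}
    for step in range(1, periods + 1)
]

for row in future_rows:
    print(row["ts"].isoformat(), row["y"])
```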

Call .predict() to compute predictions.

 model.predict(future_df)

	ts	forecast	forecast_lower	forecast_upper	y_quantile_summary	err_std
0	2016-01-21	8.964736	8.016267	9.913206	(8.016266568612478, 9.913205705277191)	0.483922
1	2016-01-22	8.964897	7.924127	10.005667	(7.9241270009730345, 10.005667160080467)	0.531015
2	2016-01-23	8.603602	7.559956	9.647248	(7.559956355690183, 9.647248225095886)	0.532482
3	2016-01-24	9.083030	7.789760	10.376301	(7.789759626997768, 10.376300700410866)	0.659844
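The printed bounds are consistent with a symmetric normal interval around the forecast, which matches the 'normal_fit' quantile method and the 0.025 / 0.975 quantiles shown in the pipeline output. The sketch below checks this on the first row using the standard library. It is an observation about the printed numbers under a normal assumption, not a description of the library's internal code.

```python
from statistics import NormalDist

coverage = 0.95

# Two-sided normal quantile for the requested coverage (about 1.96).
z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)

# forecast and err_std copied from the first row of the table above.
forecast, err_std = 8.964736, 0.483922

lower = forecast - z * err_std
upper = forecast + z * err_std

print(round(lower, 4), round(upper, 4))
```

The same arithmetic can be applied to the other rows; wider intervals correspond to larger err_std values.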

What’s next?

If you’re satisfied with the forecast performance, you’re done!

For a complete example of how to tune this forecast, see Tune your first forecast model.

Besides the component plot, we offer additional tools to help you improve your forecast and understand the result; see the related guides in the documentation.

For example, for this dataset, you could add changepoints to handle the change in trend around 2014 and avoid the overprediction issue seen in the backtest plot.

Or you might want to try a different model template. Model templates bundle an algorithm with recommended hyperparameters. The template that works best for you depends on the data characteristics and forecast requirements (e.g. short / long forecast horizon). We recommend trying a few and tuning the ones that look promising. All model templates are available through the same forecasting and tuning interface shown here.

For details about the model templates and how to set model components, see the model template guides in the documentation.
