Simple Forecast
You can create and evaluate a forecast with just a few lines of code.
Provide your timeseries as a pandas dataframe with a timestamp column and a value column.
For example, to forecast daily sessions data, your dataframe could look like this:
```python
import pandas as pd
df = pd.DataFrame({
    "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "sessions": [10231.0, 12309.0, 12104.0]
})
```
The time column can be in any format recognized by pandas.to_datetime.
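For example, the hour-suffixed strings in the sample dataframe above can be parsed with an explicit format string, which avoids relying on pandas' format inference. This is a plain pandas sketch, not a greykite API: the format "%Y-%m-%d-%H" is our reading of those strings, and pd.infer_freq is used only as a quick sanity check of the spacing you would later declare as freq.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "sessions": [10231.0, 12309.0, 12104.0],
})

# Assumed layout: year-month-day followed by a two-digit hour
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d-%H")

# infer_freq needs at least 3 timestamps and returns a pandas frequency string
inferred_freq = pd.infer_freq(df["date"])
print(df["date"].tolist())
print(inferred_freq)  # D
```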
In this example, we’ll load a dataset representing log(daily page views) on the Wikipedia page for Peyton Manning. It contains values from 2007-12-10 to 2016-01-20. More dataset info here.
```python
from collections import defaultdict
import warnings

warnings.filterwarnings("ignore")

import pandas as pd
import plotly

from greykite.common.data_loader import DataLoader
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.utils.result_summary import summarize_grid_search_results

# Loads dataset into pandas DataFrame
dl = DataLoader()
df = dl.load_peyton_manning()

# specify dataset information
metadata = MetadataParam(
    time_col="ts",  # name of the time column ("date" in example above)
    value_col="y",  # name of the value column ("sessions" in example above)
    freq="D"  # "H" for hourly, "D" for daily, "W" for weekly, etc.
              # Any format accepted by `pandas.date_range`
)
```
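Before running a forecast, it can help to confirm that the time column has no gaps at the frequency you declare in MetadataParam. A minimal pandas-only sketch follows; find_missing_timestamps is a hypothetical helper, not part of greykite, and how the framework itself handles gaps is not covered here.

```python
import pandas as pd


def find_missing_timestamps(df, time_col, freq):
    """Return timestamps expected at `freq` between the first and last
    observation that do not appear in df[time_col]."""
    ts = pd.DatetimeIndex(pd.to_datetime(df[time_col]))
    expected = pd.date_range(ts.min(), ts.max(), freq=freq)
    return expected.difference(ts)


# Hypothetical daily data with 2020-01-03 missing
toy = pd.DataFrame({
    "ts": ["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05"],
    "y": [1.0, 2.0, 4.0, 5.0],
})
missing = find_missing_timestamps(toy, time_col="ts", freq="D")
print(missing)
```

On the tutorial data, the same check would be find_missing_timestamps(df, time_col="ts", freq="D").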
Create a forecast
You can pick the PROPHET or SILVERKITE forecasting model template (see Choose a Model). In this example, we use SILVERKITE. You may also use PROPHET to see how a third-party library is leveraged in the same framework.
```python
forecaster = Forecaster()  # Creates forecasts and stores the result
result = forecaster.run_forecast_config(  # result is also stored as `forecaster.forecast_result`.
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecasts 365 steps ahead
        coverage=0.95,  # 95% prediction intervals
        metadata_param=metadata
    )
)
```
Out:
Fitting 3 folds for each of 1 candidates, totalling 3 fits
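The coverage parameter sets the nominal probability of the prediction interval. A common convention, and one consistent with the quantiles listed in the fitted pipeline printed under Apply the model ([0.025000000000000022, 0.975]), is to split the remaining probability evenly between the two tails. The sketch below only reproduces that arithmetic; it does not call greykite.

```python
coverage = 0.95  # same value passed to ForecastConfig above

# Split the non-covered probability evenly between the lower and upper tails
lower_q = (1 - coverage) / 2
upper_q = 1 - lower_q
print(lower_q)  # 0.025000000000000022 (floating-point rounding of 0.025)
print(upper_q)
```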
Check results
The output of run_forecast_config is a result object (also stored as forecaster.forecast_result) whose attributes contain the future forecast, the historical forecast performance, the original timeseries, and the fitted model.
Timeseries
Let’s plot the original timeseries. run_forecast_config returns this as the timeseries attribute. (The interactive plot is generated by plotly: click to zoom!)
```python
ts = result.timeseries
fig = ts.plot()
plotly.io.show(fig)
```
Cross-validation
By default, run_forecast_config provides historical evaluation, so you can see how the forecast performs on past data. This is stored in the grid_search (cross-validation splits) and backtest (holdout test set) attributes.
Let’s check the cross-validation results. By default, all metrics in ElementwiseEvaluationMetricEnum are computed on each CV train/test split. The configuration of CV evaluation metrics is described in Evaluation Metric. Below, we show the Mean Absolute Percentage Error (MAPE) across splits (see summarize_grid_search_results to control what to show and for details on the output columns).
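Cross-validation for time series has to respect time order: each fold trains on earlier data and tests on the period right after it. The sketch below generates expanding-window folds over row positions; expanding_window_splits is a hypothetical helper illustrating the idea, and greykite's own splitter (fold spacing, minimum training size, number of folds) may behave differently.

```python
def expanding_window_splits(n_obs, horizon, n_splits):
    """Return (train, test) index ranges for time-ordered CV.

    The last fold tests on the final `horizon` rows; each earlier fold is
    shifted back by `horizon`. Training always starts at row 0 and ends
    right before the fold's test window (an expanding window).
    """
    splits = []
    for k in range(n_splits, 0, -1):
        test_start = n_obs - k * horizon
        if test_start <= 0:
            raise ValueError("Not enough observations for the requested folds")
        splits.append((range(0, test_start), range(test_start, test_start + horizon)))
    return splits


# Hypothetical small example: 10 observations, 2-step horizon, 3 folds
for train, test in expanding_window_splits(n_obs=10, horizon=2, n_splits=3):
    print(f"train rows {train.start}-{train.stop - 1}, test rows {test.start}-{test.stop - 1}")
```

Each fold's training window grows, so later folds see more history, which is one reason per-split scores such as split_test_MAPE can vary noticeably.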
```python
grid_search = result.grid_search
cv_results = summarize_grid_search_results(
    grid_search=grid_search,
    decimals=2,
    # The below saves space in the printed output. Remove to show all available metrics and columns.
    cv_report_metrics=None,
    column_order=["rank", "mean_test", "split_test", "mean_train", "split_train", "mean_fit_time", "mean_score_time", "params"])
# Transposes to save space in the printed output
cv_results["params"] = cv_results["params"].astype(str)
cv_results.set_index("params", drop=True, inplace=True)
cv_results.transpose()
```
params | [] |
---|---|
rank_test_MAPE | 1 |
mean_test_MAPE | 7.31 |
split_test_MAPE | (5.02, 8.53, 8.39) |
mean_train_MAPE | 4.2 |
split_train_MAPE | (3.82, 4.25, 4.54) |
mean_fit_time | 6.32 |
mean_score_time | 0.76 |
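The mean_* rows are averages of the corresponding split_* tuples. Recomputing them from the rounded values in the table is a quick way to read the output; the gap between mean train MAPE and mean test MAPE indicates how much worse the model does on unseen periods than on the data it was fit to.

```python
from statistics import mean

# Values copied from the cross-validation table above (already rounded to 2 decimals)
split_test_mape = (5.02, 8.53, 8.39)
split_train_mape = (3.82, 4.25, 4.54)

mean_test_mape = round(mean(split_test_mape), 2)
mean_train_mape = round(mean(split_train_mape), 2)
print(mean_test_mape, mean_train_mape)  # 7.31 4.2
```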
Backtest
Let’s plot the historical forecast on the holdout test set. You can zoom in to see how it performed in any given period.
```python
backtest = result.backtest
fig = backtest.plot()
plotly.io.show(fig)
```
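A backtest is a single time-ordered split: the model is fit on earlier data and evaluated on the most recent period, which it never saw during fitting. A minimal sketch of that split with pandas follows; holdout_split is a hypothetical helper, and the holdout length greykite uses in this run is not shown in this tutorial.

```python
import pandas as pd


def holdout_split(df, test_horizon):
    """Split a time-ordered frame into train (all but the last `test_horizon`
    rows) and test (the last `test_horizon` rows)."""
    if not 0 < test_horizon < len(df):
        raise ValueError("test_horizon must be between 1 and len(df) - 1")
    return df.iloc[:-test_horizon], df.iloc[-test_horizon:]


# Hypothetical daily series with 10 rows; hold out the last 3
toy = pd.DataFrame({
    "ts": pd.date_range("2020-01-01", periods=10, freq="D"),
    "y": list(range(10)),
})
train, test = holdout_split(toy.sort_values("ts"), test_horizon=3)
print(train["ts"].max(), test["ts"].min())
```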
You can also check historical evaluation metrics (on the historical training/test set).
```python
backtest_eval = defaultdict(list)
for metric, value in backtest.train_evaluation.items():
    backtest_eval[metric].append(value)
    backtest_eval[metric].append(backtest.test_evaluation[metric])
metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
metrics
```
| | train | test |
|---|---|---|
CORR | 0.754233 | 0.756897 |
R2 | 0.556228 | -0.695154 |
MSE | 0.317248 | 0.865076 |
RMSE | 0.563247 | 0.930095 |
MAE | 0.401251 | 0.856716 |
MedAE | 0.300722 | 0.840022 |
MAPE | 4.75745 | 11.3071 |
MedAPE | 3.775 | 11.2497 |
sMAPE | 2.38715 | 5.318 |
Q80 | 0.200944 | 0.187063 |
Q95 | 0.201104 | 0.0664152 |
Q99 | 0.201146 | 0.0342425 |
OutsideTolerance1p | 0.849725 | 0.986226 |
OutsideTolerance2p | 0.711251 | 0.972452 |
OutsideTolerance3p | 0.583006 | 0.961433 |
OutsideTolerance4p | 0.479937 | 0.933884 |
OutsideTolerance5p | 0.384736 | 0.892562 |
Outside Tolerance (fraction) | None | None |
R2_null_model_score | None | None |
Prediction Band Width (%) | 26.6374 | 28.276 |
Prediction Band Coverage (fraction) | 0.95358 | 0.785124 |
Coverage: Lower Band | 0.56491 | 0.754821 |
Coverage: Upper Band | 0.38867 | 0.030303 |
Coverage Diff: Actual_Coverage - Intended_Coverage | 0.00357986 | -0.164876 |
Forecast
The forecast attribute contains the forecasted result. Just as for backtest, you can plot the result or see the evaluation metrics.
Let’s plot the forecast (trained on all data):
```python
forecast = result.forecast
fig = forecast.plot()
plotly.io.show(fig)
```
The forecasted values are available in the df attribute.
```python
forecast.df.head().round(2)
```
| | ts | actual | forecast | forecast_lower | forecast_upper |
|---|---|---|---|---|---|
0 | 2007-12-10 | 9.59 | 8.71 | 7.17 | 10.24 |
1 | 2007-12-11 | 8.52 | 8.57 | 7.46 | 9.68 |
2 | 2007-12-12 | 8.18 | 8.45 | 7.49 | 9.41 |
3 | 2007-12-13 | 8.07 | 8.38 | 7.44 | 9.33 |
4 | 2007-12-14 | 7.89 | 8.36 | 7.32 | 9.40 |
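Several of the backtest metrics can be computed by hand. The sketch below applies textbook definitions of MAPE and interval coverage to the five rows of forecast.df printed above (values are on the log scale and rounded, so results are approximate); greykite's exact formulas may differ in details. The lower/upper band split is inferred from the backtest table, where the two columns add up to Prediction Band Coverage (0.56491 + 0.38867 = 0.95358 for train, 0.754821 + 0.030303 = 0.785124 for test); reading them as the share of covered points at or below and above the forecast is our assumption.

```python
import numpy as np
import pandas as pd

# First five rows of forecast.df as printed above (log scale, rounded to 2 decimals)
rows = pd.DataFrame({
    "actual": [9.59, 8.52, 8.18, 8.07, 7.89],
    "forecast": [8.71, 8.57, 8.45, 8.38, 8.36],
    "forecast_lower": [7.17, 7.46, 7.49, 7.44, 7.32],
    "forecast_upper": [10.24, 9.68, 9.41, 9.33, 9.40],
})
a = rows["actual"].to_numpy()
f = rows["forecast"].to_numpy()
lo = rows["forecast_lower"].to_numpy()
hi = rows["forecast_upper"].to_numpy()

mape = np.mean(np.abs(a - f) / np.abs(a)) * 100  # mean absolute percentage error
inside = (a >= lo) & (a <= hi)                   # actual falls inside the interval
coverage = inside.mean()
lower_band = (inside & (a <= f)).mean()          # covered, at or below the forecast
upper_band = (inside & (a > f)).mean()           # covered, above the forecast
print(round(mape, 2), coverage, lower_band, upper_band)
```

Under that reading, only about 3% of backtest test points were covered above the forecast, i.e. actuals mostly sat below the forecast in the holdout period, which matches the overprediction mentioned under What’s next?.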
Model Diagnostics
The component plot shows how your dataset’s trend, seasonality, and event / holiday patterns are handled in the model:
```python
fig = forecast.plot_components()
plotly.io.show(fig)  # fig.show() if you are using "PROPHET" template
```
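The seasonality terms in the model summary below carry names such as sin1_tow_weekly and cos1_tow_weekly, i.e. Fourier series terms: sine and cosine pairs of increasing order k over a seasonal period, each with its own coefficient. A minimal sketch of building such columns for weekly seasonality from daily timestamps; fourier_terms is hypothetical, and using day of week as the time variable is our assumption, not greykite's exact definition of tow.

```python
import numpy as np
import pandas as pd


def fourier_terms(t, period, order, name):
    """Sine/cosine pairs sin{k}_{name}, cos{k}_{name} for k = 1..order."""
    t = np.asarray(t, dtype=float)
    cols = {}
    for k in range(1, order + 1):
        angle = 2 * np.pi * k * t / period
        cols[f"sin{k}_{name}"] = np.sin(angle)
        cols[f"cos{k}_{name}"] = np.cos(angle)
    return pd.DataFrame(cols)


# Two weeks of daily timestamps; day of week (0 = Monday) is the time variable
dates = pd.date_range("2016-01-04", periods=14, freq="D")
weekly = fourier_terms(dates.dayofweek, period=7, order=2, name="tow_weekly")
print(weekly.round(3).head(7))
```

Higher orders let the fitted weekly shape have sharper day-to-day changes, at the cost of more coefficients to estimate.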
Model summary allows inspection of individual model terms. Check parameter estimates and their significance for insights on how the model works and what can be further improved.
```python
summary = result.model[-1].summary()  # -1 retrieves the estimator from the pipeline
print(summary)
```
Out:
================================ Model Summary =================================
Number of observations: 2964, Number of features: 122
Method: Ridge regression
Number of nonzero features: 122
Regularization parameter: 148.5
Residuals:
Min 1Q Median 3Q Max
-2.342 -0.3604 -0.06554 0.26 3.759
Pred_col Estimate Std. Err Pr(>)_boot sig. code 95%CI
Intercept 8.049 0.02171 <2e-16 *** (8.006, 8.09)
events_C...New Year 0.007008 0.02008 0.726 (-0.03046, 0.04992)
events_C...w Year-1 0.002321 0.01514 0.862 (-0.02628, 0.03172)
events_C...w Year-2 0.002074 0.01611 0.894 (-0.03322, 0.03272)
events_C...w Year+1 -0.002421 0.01505 0.868 (-0.03079, 0.025)
events_C...w Year+2 0.01539 0.01793 0.398 (-0.01892, 0.05069)
events_Christmas Day -0.02248 0.01086 0.034 * (-0.04476, -0.003264)
events_C...as Day-1 -0.008372 0.01211 0.486 (-0.03351, 0.01449)
events_C...as Day-2 0.003023 0.01392 0.792 (-0.02751, 0.02876)
events_C...as Day+1 -0.01577 0.01087 0.150 (-0.03813, 0.004481)
events_C...as Day+2 0.01047 0.009134 0.268 (-0.007371, 0.02844)
events_E...Ireland] -0.01598 0.008269 0.046 * (-0.03396, -0.000512)
events_E...eland]-1 -0.01323 0.007251 0.066 . (-0.02749, 0.001314)
events_E...eland]-2 -0.005662 0.006406 0.368 (-0.01904, 0.006625)
events_E...eland]+1 -0.007572 0.006471 0.248 (-0.02051, 0.003161)
events_E...eland]+2 0.0006749 0.005251 0.910 (-0.008711, 0.01153)
events_Good Friday -0.008316 0.006618 0.204 (-0.02165, 0.004865)
events_Good Friday-1 -0.003783 0.005843 0.530 (-0.01656, 0.0066)
events_Good Friday-2 0.0001563 0.006248 0.988 (-0.01226, 0.01354)
events_Good Friday+1 -0.005662 0.006406 0.368 (-0.01904, 0.006625)
events_Good Friday+2 -0.01323 0.007251 0.066 . (-0.02749, 0.001314)
events_I...ence Day -0.005032 0.007659 0.510 (-0.01972, 0.008768)
events_I...ce Day-1 -0.005018 0.007737 0.506 (-0.01964, 0.00966)
events_I...ce Day-2 -0.009542 0.006892 0.154 (-0.02267, 0.004394)
events_I...ce Day+1 -0.01452 0.009738 0.150 (-0.03475, 0.002893)
events_I...ce Day+2 -0.01124 0.01089 0.318 (-0.03298, 0.009538)
events_Labor Day -0.02696 0.01174 0.020 * (-0.05198, -0.006319)
events_Labor Day-1 -0.005985 0.009626 0.534 (-0.02494, 0.01049)
events_Labor Day-2 0.002608 0.008583 0.762 (-0.01318, 0.01932)
events_Labor Day+1 -0.01799 0.01002 0.066 . (-0.03769, 0.001578)
events_Labor Day+2 -0.02099 0.01126 0.074 . (-0.04395, 0.000563)
events_Memorial Day -0.02921 0.009702 <2e-16 *** (-0.04786, -0.00999)
events_M...al Day-1 -0.01995 0.00739 0.004 ** (-0.03453, -0.005988)
events_M...al Day-2 -0.003833 0.004554 0.386 (-0.01312, 0.005242)
events_M...al Day+1 -0.008439 0.005106 0.088 . (-0.01864, 0.0007558)
events_M...al Day+2 0.009471 0.006017 0.104 (-0.001008, 0.02243)
events_New Years Day -0.009475 0.01044 0.358 (-0.02956, 0.01119)
events_N...rs Day-1 0.003805 0.009645 0.702 (-0.01301, 0.02499)
events_N...rs Day-2 0.01679 0.01144 0.136 (-0.00381, 0.04019)
events_N...rs Day+1 0.01278 0.009712 0.188 (-0.004023, 0.03279)
events_N...rs Day+2 0.01514 0.01021 0.126 (-0.002275, 0.037)
events_Other 0.002726 0.02517 0.912 (-0.04736, 0.05238)
events_Other-1 0.003401 0.02315 0.898 (-0.04147, 0.0473)
events_Other-2 0.02828 0.02208 0.222 (-0.01736, 0.06764)
events_Other+1 0.02108 0.0219 0.316 (-0.02675, 0.06122)
events_Other+2 0.0284 0.0218 0.172 (-0.01416, 0.07155)
events_Thanksgiving -0.0113 0.00824 0.172 (-0.02877, 0.003817)
events_T...giving-1 -0.02003 0.008155 0.010 * (-0.03533, -0.005981)
events_T...giving-2 -0.008212 0.006092 0.186 (-0.02139, 0.002288)
events_T...giving+1 -0.005816 0.008623 0.514 (-0.02274, 0.009103)
events_T...giving+2 -0.01107 0.008157 0.166 (-0.02938, 0.002088)
events_Veterans Day 0.003134 0.008162 0.704 (-0.01206, 0.02081)
events_V...ns Day-1 0.00224 0.004158 0.556 (-0.006862, 0.009685)
events_V...ns Day-2 0.002455 0.005516 0.656 (-0.008434, 0.01323)
events_V...ns Day+1 -0.001266 0.006285 0.842 (-0.01423, 0.01023)
events_V...ns Day+2 -0.003289 0.007549 0.694 (-0.01932, 0.01)
str_dow_2-Tue 0.01553 0.007015 0.030 * (0.003088, 0.0297)
str_dow_3-Wed -0.01157 0.006675 0.088 . (-0.02434, 0.001833)
str_dow_4-Thu -0.0143 0.006281 0.024 * (-0.02611, -0.001377)
str_dow_5-Fri -0.02014 0.00653 <2e-16 *** (-0.03346, -0.006914)
str_dow_6-Sat -0.03898 0.007444 <2e-16 *** (-0.0537, -0.02411)
str_dow_7-Sun 0.01577 0.00783 0.048 * (0.0005994, 0.03112)
ct1 0.01991 0.004792 0.002 ** (0.0108, 0.02943)
is_weekend:ct1 -0.004701 0.002951 0.114 (-0.01062, 0.001198)
str_dow_2-Tue:ct1 -0.001563 0.007579 0.838 (-0.01631, 0.01282)
str_dow_3-Wed:ct1 0.004952 0.005421 0.352 (-0.005125, 0.01612)
str_dow_4-Thu:ct1 -0.0003992 0.004733 0.934 (-0.009116, 0.008228)
str_dow_5-Fri:ct1 0.002644 0.005019 0.644 (-0.005935, 0.01366)
str_dow_6-Sat:ct1 -0.00142 0.00634 0.814 (-0.01444, 0.01036)
str_dow_7-Sun:ct1 -0.003282 0.007635 0.672 (-0.01823, 0.0121)
ct1:sin1_tow_weekly 0.006236 0.00353 0.088 . (-0.0004046, 0.01336)
ct1:cos1_tow_weekly 0.01315 0.005924 0.018 * (0.002314, 0.02486)
ct1:sin2_tow_weekly 0.00129 0.003766 0.768 (-0.006424, 0.008586)
ct1:cos2_tow_weekly 0.01827 0.005565 <2e-16 *** (0.007345, 0.02891)
sin1_tow_weekly 0.02906 0.01592 0.064 . (-0.002545, 0.05879)
cos1_tow_weekly 0.1155 0.01882 <2e-16 *** (0.07648, 0.1505)
sin2_tow_weekly -0.01668 0.01595 0.286 (-0.04771, 0.01705)
cos2_tow_weekly 0.0708 0.01879 <2e-16 *** (0.03698, 0.1089)
sin3_tow_weekly -0.01584 0.00905 0.090 . (-0.03297, 0.003206)
cos3_tow_weekly 0.001629 0.01153 0.874 (-0.02028, 0.0237)
sin4_tow_weekly 0.01584 0.00905 0.090 . (-0.003206, 0.03297)
cos4_tow_weekly 0.001629 0.01153 0.874 (-0.02028, 0.0237)
sin1_toq_quarterly 0.05602 0.008406 <2e-16 *** (0.04006, 0.07244)
cos1_toq_quarterly -0.006454 0.006881 0.356 (-0.0196, 0.006851)
sin2_toq_quarterly -0.02628 0.007226 <2e-16 *** (-0.0411, -0.01287)
cos2_toq_quarterly -0.05165 0.007978 <2e-16 *** (-0.06629, -0.03568)
sin3_toq_quarterly 0.01995 0.007823 0.006 ** (0.005842, 0.03541)
cos3_toq_quarterly 0.005298 0.008451 0.536 (-0.01074, 0.02272)
sin4_toq_quarterly -0.0009486 0.01358 0.952 (-0.02919, 0.02627)
cos4_toq_quarterly -0.02182 0.01416 0.136 (-0.05078, 0.00646)
sin5_toq_quarterly -0.02673 0.01409 0.068 . (-0.05439, 0.0008321)
cos5_toq_quarterly 0.01739 0.01344 0.196 (-0.009167, 0.0441)
sin1_ct1_yearly -0.09928 0.01505 <2e-16 *** (-0.127, -0.06951)
cos1_ct1_yearly 0.6423 0.01241 <2e-16 *** (0.6173, 0.6645)
sin2_ct1_yearly 0.06922 0.01344 <2e-16 *** (0.04464, 0.0975)
cos2_ct1_yearly -0.1146 0.01383 <2e-16 *** (-0.1414, -0.08731)
sin3_ct1_yearly 0.2197 0.01453 <2e-16 *** (0.1933, 0.2494)
cos3_ct1_yearly -0.06583 0.01379 <2e-16 *** (-0.09015, -0.03974)
sin4_ct1_yearly 0.003701 0.007103 0.590 (-0.009748, 0.01763)
cos4_ct1_yearly -0.0613 0.008278 <2e-16 *** (-0.0777, -0.04643)
sin5_ct1_yearly -0.08773 0.015 <2e-16 *** (-0.1182, -0.05769)
cos5_ct1_yearly -0.01478 0.01305 0.242 (-0.0402, 0.01251)
sin6_ct1_yearly -0.1128 0.013 <2e-16 *** (-0.1369, -0.08713)
cos6_ct1_yearly -0.01973 0.01377 0.162 (-0.04697, 0.006394)
sin7_ct1_yearly -0.05099 0.01347 <2e-16 *** (-0.078, -0.02598)
cos7_ct1_yearly 0.04107 0.01312 0.004 ** (0.01325, 0.06619)
sin8_ct1_yearly 0.02759 0.007615 <2e-16 *** (0.01243, 0.04233)
cos8_ct1_yearly 0.04907 0.007975 <2e-16 *** (0.03273, 0.0628)
sin9_ct1_yearly 0.007199 0.01407 0.598 (-0.01934, 0.03483)
cos9_ct1_yearly -0.01411 0.01374 0.296 (-0.04131, 0.01162)
sin10_ct1_yearly -0.06548 0.01285 <2e-16 *** (-0.09171, -0.04122)
cos10_ct1_yearly -0.04781 0.01361 <2e-16 *** (-0.07461, -0.02277)
sin11_ct1_yearly -0.02852 0.01457 0.060 . (-0.05823, 0.0004864)
cos11_ct1_yearly -0.006444 0.0136 0.604 (-0.03322, 0.02067)
sin12_ct1_yearly 0.00126 0.008016 0.870 (-0.01558, 0.01616)
cos12_ct1_yearly 0.01347 0.008319 0.102 (-0.002179, 0.02993)
sin13_ct1_yearly -0.01191 0.01435 0.396 (-0.03944, 0.01693)
cos13_ct1_yearly 0.05416 0.01368 <2e-16 *** (0.02545, 0.07911)
sin14_ct1_yearly 0.02405 0.01372 0.064 . (0.0008108, 0.05143)
cos14_ct1_yearly 0.009502 0.01334 0.486 (-0.017, 0.03507)
sin15_ct1_yearly 0.02422 0.01463 0.092 . (-0.002486, 0.05627)
cos15_ct1_yearly -0.02238 0.01417 0.108 (-0.04941, 0.006753)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multiple R-squared: 0.5229, Adjusted R-squared: 0.515
F-statistic: 55.601 on 48 and 2914 DF, p-value: 1.110e-16
Model AIC: 20626.0, model BIC: 20924.0
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.
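The summary reports Method: Ridge regression with a regularization parameter of 148.5. Ridge regression adds a penalty of alpha times the squared norm of the coefficients to the least-squares loss, which shrinks coefficients toward zero. A minimal numpy sketch of the closed-form fit; ridge_fit is hypothetical, it leaves the intercept unpenalized by centering, and greykite may additionally scale features or choose alpha differently.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept.

    Centering X and y removes the intercept from the penalized problem:
    coef = (Xc^T Xc + alpha * I)^-1 Xc^T yc, intercept = mean(y) - mean(X) @ coef.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef


# Hypothetical synthetic data: y = 1 + 2*x0 - 1*x1 + 0.5*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

for alpha in (0.0, 10.0, 1000.0):
    intercept, coef = ridge_fit(X, y, alpha)
    print(alpha, round(intercept, 3), np.round(coef, 3))
```

With alpha = 0 this reduces to ordinary least squares; larger alpha trades some bias for lower variance, which helps when many correlated features (such as the 122 here) are fit at once.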
Apply the model
The trained model is available as a fitted sklearn.pipeline.Pipeline.
```python
model = result.model
model
```
Out:
Pipeline(steps=[('input',
PandasFeatureUnion(transformer_list=[('date',
Pipeline(steps=[('select_date',
ColumnSelector(column_names=['ts']))])),
('response',
Pipeline(steps=[('select_val',
ColumnSelector(column_names=['y'])),
('outlier',
ZscoreOutlierTransformer()),
('null',
NullTransformer(impute_algorithm='interpolate',
impute_params={'axis': 0,
'limit_direct...
'simple_freq': <SimpleTimeFrequencyEnum.DAY: Frequency(default_horizon=30, seconds_per_observation=86400, valid_seas={'MONTHLY_SEASONALITY', 'QUARTERLY_SEASONALITY', 'WEEKLY_SEASONALITY', 'YEARLY_SEASONALITY'})>,
'start_year': 2007},
uncertainty_dict={'params': {'conditional_cols': ['dow_hr'],
'quantile_estimation_method': 'normal_fit',
'quantiles': [0.025000000000000022,
0.975],
'sample_size_thresh': 5,
'small_sample_size_method': 'std_quantiles',
'small_sample_size_quantile': 0.98},
'uncertainty_method': 'simple_conditional_residuals'}))])
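Because the model is an sklearn Pipeline, it supports the usual Pipeline operations: indexing returns individual steps (which is why result.model[-1] gives the final estimator in the summary step above), and predict runs every preprocessing step before the estimator. The toy pipeline below only illustrates these mechanics; its steps (StandardScaler, Ridge) are not the ones in greykite's pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline(steps=[
    ("scale", StandardScaler()),  # preprocessing step
    ("ridge", Ridge(alpha=1.0)),  # final estimator
])
X = np.arange(20, dtype=float).reshape(10, 2)
y = X[:, 0] * 2.0 + 1.0
pipe.fit(X, y)

last_step = pipe[-1]         # same object as pipe.named_steps["ridge"]
preds = pipe.predict(X[:3])  # runs scale.transform, then ridge.predict
print(last_step, preds)
```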
You can take this model and forecast on any date range by passing a new dataframe to predict on. The make_future_dataframe convenience function can be used to create this dataframe. Here, we predict the next 4 periods after the model’s train end date.
Note
The dataframe passed to .predict() must have the same columns as the df passed to run_forecast_config above, including any regressors needed for prediction. The value_col column should be included with values set to np.nan.
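For intuition about what the prediction dataframe contains, it can also be built by hand with pandas: the timestamps after the training end date, with the value column set to NaN. future_frame is a hypothetical helper shown for illustration only; the column names ts and y match this dataset's metadata, and a dataset with regressors would need those columns too.

```python
import numpy as np
import pandas as pd


def future_frame(train_end, periods, freq="D", time_col="ts", value_col="y"):
    """Hypothetical helper: the next `periods` timestamps after `train_end`,
    with the value column set to NaN and the timestamps also used as index."""
    dates = pd.date_range(start=pd.Timestamp(train_end), periods=periods + 1, freq=freq)[1:]
    return pd.DataFrame({time_col: dates, value_col: np.nan}, index=dates)


future_df = future_frame("2016-01-20", periods=4)
print(future_df)
```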
```python
future_df = result.timeseries.make_future_dataframe(
    periods=4,
    include_history=False)
future_df
```
| | ts | y |
|---|---|---|
2016-01-21 | 2016-01-21 | NaN |
2016-01-22 | 2016-01-22 | NaN |
2016-01-23 | 2016-01-23 | NaN |
2016-01-24 | 2016-01-24 | NaN |
Call .predict() to compute predictions:
```python
result.model.predict(future_df)
```
| | ts | forecast | forecast_lower | forecast_upper | y_quantile_summary |
|---|---|---|---|---|---|
0 | 2016-01-21 | 8.971131 | 8.023176 | 9.919087 | (8.023175764425014, 9.919086953794318) |
1 | 2016-01-22 | 8.971261 | 7.930734 | 10.011789 | (7.930733838005737, 10.011788866680934) |
2 | 2016-01-23 | 8.610398 | 7.565959 | 9.654837 | (7.565959127781375, 9.654837376620925) |
3 | 2016-01-24 | 9.087944 | 7.794432 | 10.381456 | (7.794431622702435, 10.3814559235619) |
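Because this dataset stores log(daily page views), the predictions above are on the log scale. Assuming the natural log, exponentiating maps them back to page views, and since exp is increasing, the interval bounds map to interval bounds. Under a symmetric error assumption, exp of a log-scale point forecast behaves like a median rather than a mean on the original scale.

```python
import numpy as np

# Values copied from the prediction table above (2016-01-21 to 2016-01-24), log scale
forecast = np.array([8.971131, 8.971261, 8.610398, 9.087944])
lower = np.array([8.023176, 7.930734, 7.565959, 7.794432])
upper = np.array([9.919087, 10.011789, 9.654837, 10.381456])

# Assumption: the series is the natural log of daily page views
views = np.exp(forecast)
views_lower = np.exp(lower)
views_upper = np.exp(upper)
for v, lo_v, hi_v in zip(views, views_lower, views_upper):
    print(f"{v:,.0f} views (interval {lo_v:,.0f} to {hi_v:,.0f})")
```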
What’s next?
If you’re satisfied with the forecast performance, you’re done!
For a complete example of how to tune this forecast, see Tune your first forecast model.
Besides the component plot, we offer additional tools to help you improve your forecast and understand the result.
See the following guides:
For example, for this dataset, you could add changepoints to handle the change in trend around 2014 and avoid the overprediction issue seen in the backtest plot.
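A change in trend slope at a given date is often modeled by adding a hinge feature max(0, t - c) for each changepoint c, so the slope can change after c while the trend stays continuous. A minimal numpy sketch; trend_features is hypothetical, the time scale (years since the first observation) and the changepoint location are illustrative, and greykite's own changepoint configuration is not shown here.

```python
import numpy as np


def trend_features(t, changepoints):
    """Columns: t itself plus one hinge max(0, t - c) per changepoint c."""
    t = np.asarray(t, dtype=float)
    cols = [t] + [np.maximum(0.0, t - c) for c in changepoints]
    return np.column_stack(cols)


# Hypothetical: time in years since 2007-12-10, one changepoint about 6 years in (late 2013)
t = np.linspace(0, 8, 9)  # 0, 1, ..., 8
X = trend_features(t, changepoints=[6.0])
# Slope 0.3 before the changepoint and 0.3 - 0.5 = -0.2 after it
y = 8.0 + X @ np.array([0.3, -0.5])
print(np.round(y, 2))
```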
Or you might want to try a different model template. Model templates bundle an algorithm with recommended hyperparameters. The template that works best for you depends on the data characteristics and forecast requirements (e.g. short / long forecast horizon). We recommend trying a few and tuning the ones that look promising. All model templates are available through the same forecasting and tuning interface shown here.
For details about the model templates and how to set model components, see the following guides:
Total running time of the script: ( 0 minutes 52.915 seconds)