Tune your first forecast model

This tutorial shows how to create and tune a forecast model. It is intended to give a basic sense of the forecasting process without assuming background knowledge in forecasting.

You can use the PROPHET or SILVERKITE model. This tutorial focuses on SILVERKITE, but the basic ideas of tuning are similar for both models. For detailed information about PROPHET, see Prophet.

SILVERKITE decomposes a time series into various components. It creates time-based features and autoregressive features, combines them with user-provided features (such as macro-economic features) and their interactions, then fits a machine learning regression model to learn the relationship between the time series and these features. The forecast is based on the learned relationship and the future values of these features. Therefore, including the right features is the key to success.

Common features include:

Datetime derivatives:

Features derived from the datetime, such as day of year, hour of day, weekday, and is_weekend. These features are useful for capturing special patterns. For example, weekday and weekend patterns differ for most business-related time series, and this can be modeled with is_weekend.

Growth:

The basic feature ct1 counts the time elapsed, in years (possibly fractional), since the first day of the training data. For example, if the training data starts on “2018-01-01”, then that date has ct1=0.0, “2018-01-02” has ct1=1/365, and “2019-01-01” has ct1=1.0. ct1 can be as granular as needed. A separate growth function can be applied to ct1 to support different types of growth models. For example, ct2 is defined as the square of ct1 to model quadratic growth.

Trend:

Trend describes the average tendency of the time series. It is defined through the growth term with possible changepoints. At every changepoint, the growth rate could change (faster or slower). For example, if ct1 (linear growth) is used with changepoints, the trend is modeled as piece-wise linear.

Seasonality:

Seasonality describes the periodic pattern of the time series. It can be modeled at multiple levels, including daily, weekly, monthly, quarterly and yearly seasonality. Seasonality is defined through Fourier series with different orders. The greater the order, the more detailed the periodic pattern the model can learn. However, an order that is too large can lead to overfitting.

Events:

Events include holidays and other short-term occurrences that could temporarily affect the time series, such as the Thanksgiving long weekend. Typically, events are regular and repeat at known times in the future. These features are made of indicators that cover the event day and its neighboring days.

Autoregression:

Autoregressive features include past observations of the time series and their aggregations. For example, the previous day’s observation, the observation on the same weekday of the previous week, or the average of the past 7 days can be used. Autoregressive features are very useful in short-term forecasts but should be avoided in long-term forecasts. A long-term forecast depends more on the correctness of trend, seasonality and events, and its lags and autoregressive terms have to be calculated from forecasted values. The further we forecast into the future, the more forecasted values are needed to create the autoregressive terms, making the forecast less stable.

Custom:

Extra features that are relevant to the time series, such as macro-economic features that are expected to affect it. Note that these features need to be manually provided for both the training and forecasting periods.

Interactions:

Any interaction between the features above.
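To make the datetime and growth features above concrete, here is a minimal sketch using only the Python standard library. The helper names (days_in_year, continuous_time, datetime_features) are hypothetical and chosen for illustration; SILVERKITE builds its own versions of these features internally, and its exact conventions may differ from this sketch.

```python
import datetime


def days_in_year(year: int) -> int:
    """Returns 366 for leap years and 365 otherwise."""
    return (datetime.date(year + 1, 1, 1) - datetime.date(year, 1, 1)).days


def continuous_time(date: datetime.date) -> float:
    """Converts a date to a fractional year, e.g. 2018-01-02 -> 2018 + 1/365."""
    day_index = date.timetuple().tm_yday - 1  # 0 for January 1
    return date.year + day_index / days_in_year(date.year)


def datetime_features(date: datetime.date, train_start: datetime.date) -> dict:
    """Builds a few illustrative datetime-derived features for one date."""
    ct1 = continuous_time(date) - continuous_time(train_start)
    return {
        "doy": date.timetuple().tm_yday,        # day of year, 1-based
        "dow": date.isoweekday(),               # 1 = Monday, ..., 7 = Sunday
        "is_weekend": date.isoweekday() >= 6,   # Saturday or Sunday
        "ct1": ct1,                             # linear growth term, in years
        "ct2": ct1 ** 2,                        # quadratic growth term
    }


train_start = datetime.date(2018, 1, 1)
for day in ["2018-01-01", "2018-01-02", "2018-01-06", "2019-01-01"]:
    features = datetime_features(datetime.date.fromisoformat(day), train_start)
    print(day, features)
```

With a training start of 2018-01-01, this reproduces the ct1 values quoted above: 0.0 on the first day, 1/365 on the second day, and 1.0 one year later.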

Now let’s use an example to go through the full forecasting and tuning process. In this example, we’ll load a dataset representing log(daily page views) on the Wikipedia page for Peyton Manning. It contains values from 2007-12-10 to 2016-01-20. More dataset info here.

import datetime

import numpy as np
import pandas as pd
import plotly

from greykite.algo.changepoint.adalasso.changepoint_detector import ChangepointDetector
from greykite.algo.forecast.silverkite.constants.silverkite_holiday import SilverkiteHoliday
from greykite.algo.forecast.silverkite.constants.silverkite_seasonality import SilverkiteSeasonalityEnum
from greykite.algo.forecast.silverkite.forecast_simple_silverkite_helper import cols_interact
from greykite.common import constants as cst
from greykite.common.features.timeseries_features import build_time_features_df
from greykite.common.features.timeseries_features import convert_date_to_continuous_time
from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.utils.result_summary import summarize_grid_search_results


# Loads dataset into UnivariateTimeSeries
dl = DataLoaderTS()
ts = dl.load_peyton_manning_ts()
df = ts.df  # cleaned pandas.DataFrame

Exploratory data analysis (EDA)

After reading in a time series, we could first do some exploratory data analysis. The UnivariateTimeSeries class is used to store a timeseries and perform EDA.

# Describes the time column and the value column
print(ts.describe_time_col())
print(ts.describe_value_col())

Out:

{'data_points': 2964, 'mean_increment_secs': 86400.0, 'min_timestamp': Timestamp('2007-12-10 00:00:00'), 'max_timestamp': Timestamp('2016-01-20 00:00:00')}
count    2905.000000
mean        8.138958
std         0.845957
min         5.262690
25%         7.514800
50%         7.997999
75%         8.580168
max        12.846747
Name: y, dtype: float64

The df has two columns: the time column “ts” and the value column “y”. The data is daily and ranges from 2007-12-10 to 2016-01-20. The values range from 5.26 to 12.85. Note that only 2905 of the 2964 timestamps have an observed value.

Let’s plot the original timeseries. (The interactive plot is generated by plotly: click to zoom!)

fig = ts.plot()
plotly.io.show(fig)

A few exploratory plots can reveal the properties of the time series. The UnivariateTimeSeries class provides a powerful plotting method, plot_quantiles_and_overlays. A tutorial on using it can be found at Seasonality Plots.

Baseline model

A simple forecast can be created on the data set; see details in Simple Forecast. Note that if you do not provide any extra parameters, all model parameters take their default values. The default parameters are chosen conservatively, so consider this a baseline model to assess forecast difficulty and make further improvements if necessary.

# Specifies dataset information
metadata = MetadataParam(
    time_col="ts",  # name of the time column
    value_col="y",  # name of the value column
    freq="D"  # "H" for hourly, "D" for daily, "W" for weekly, etc.
)

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecasts 365 steps ahead
        coverage=0.95,  # 95% prediction intervals
        metadata_param=metadata
    )
)

Out:

Fitting 3 folds for each of 1 candidates, totalling 3 fits

For detailed documentation of the output from run_forecast_config, see Check Forecast Result. Here we plot the forecast.

forecast = result.forecast
fig = forecast.plot()
plotly.io.show(fig)

Model performance evaluation

We can see the forecast fits the existing data well; however, we do not have a good ground truth to assess how well it predicts into the future.

Train-test-split

The typical way to evaluate model performance is to reserve part of the training data and use it to measure the model performance. Because we always predict the future in a time series forecasting problem, we reserve data from the end of training set to measure the performance of our forecasts. This is called a time series train test split.

By default, run_forecast_config creates a time series train test split and stores the test result in result.backtest. The reserved testing data has, by default, the same length as the forecast horizon. We can access the evaluation results:

pd.DataFrame(result.backtest.test_evaluation, index=["Value"]).transpose()  # formats dictionary as a pd.DataFrame

Out:

                                                    Value
CORR                                                0.754113
R2                                                  0.505202
MSE                                                 0.252507
RMSE                                                0.502501
MAE                                                 0.353449
MedAE                                               0.25443
MAPE                                                4.528096
MedAPE                                              3.261391
sMAPE                                               2.227281
Q80                                                 0.125752
Q95                                                 0.100265
Q99                                                 0.093469
OutsideTolerance1p                                  0.818182
OutsideTolerance2p                                  0.680441
OutsideTolerance3p                                  0.5427
OutsideTolerance4p                                  0.424242
OutsideTolerance5p                                  0.336088
Outside Tolerance (fraction)                        None
R2_null_model_score                                 None
Prediction Band Width (%)                           26.808794
Prediction Band Coverage (fraction)                 0.983471
Coverage: Lower Band                                0.732782
Coverage: Upper Band                                0.250689
Coverage Diff: Actual_Coverage - Intended_Coverage  0.033471
MIS                                                 2.620554


Evaluation metrics

From here we can see a list of metrics that measure the model performance on the test data. You may choose one or a few metrics to focus on. Typical metrics include:

MSE:

Mean squared error, the average squared error. Could be affected by extreme values.

RMSE:

Root mean squared error, the square root of MSE.

MAE:

Mean absolute error, the average of absolute error. Could be affected by extreme values.

MedAE:

Median absolute error, the median of absolute error. Less affected by extreme values.

MAPE:

Mean absolute percent error, which measures the error as a percentage of the true values. This is useful when you would like to consider the relative error instead of the absolute error. For example, an error of 1 is considered as 10% for a true observation of 10, but as 1% for a true observation of 100. This is our preferred default metric.

MedAPE:

Median absolute percent error, the median version of MAPE, less affected by extreme values.

Let’s use MAPE as our metric in this example. Looking at these results, you may have a basic sense of how the model is performing on the unseen test data. On average, the baseline model’s prediction is 4.5% away from the true values (the MAPE reported above).
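The metric definitions above can be sketched in a few lines of plain Python. The function name evaluation_metrics and the toy numbers are illustrative assumptions; the library computes these metrics for you in result.backtest, and percent errors as written here assume no true value is zero.

```python
import math
import statistics


def evaluation_metrics(actual, forecast):
    """Computes a few common forecast error metrics on paired observations."""
    errors = [f - a for a, f in zip(actual, forecast)]
    abs_errors = [abs(e) for e in errors]
    # Absolute percent errors are relative to the true values (must be non-zero).
    abs_pct_errors = [100.0 * abs(e) / abs(a) for e, a in zip(errors, actual)]
    mse = statistics.mean([e ** 2 for e in errors])
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": statistics.mean(abs_errors),
        "MedAE": statistics.median(abs_errors),
        "MAPE": statistics.mean(abs_pct_errors),
        "MedAPE": statistics.median(abs_pct_errors),
    }


# Toy data: the same absolute error of 1 is 10% of 10 but only 1% of 100,
# and the last point is an outlier with an absolute error of 4.
actual = [10.0, 100.0, 8.0, 8.0]
forecast = [11.0, 101.0, 8.0, 12.0]
for name, value in evaluation_metrics(actual, forecast).items():
    print(f"{name}: {value:.3f}")
```

In this toy example the single large error pulls MAE (1.5) and MAPE (15.25) well above their median counterparts MedAE (1.0) and MedAPE (5.5), which is the sensitivity to extreme values described above.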

Time series cross-validation

Forecast quality depends a lot on the evaluation time window. The evaluation window selected above might happen to be a relatively easy or hard period to predict. Thus, it is more robust to evaluate over a longer time window when dataset size allows. Let’s consider a more general way of evaluating a forecast model: time series cross-validation.

Time series cross-validation is based on a time series rolling split. Let’s say we would like to perform an evaluation with a 3-fold cross-validation. The whole training data is split in 3 different ways. Since our forecast horizon is 365 days, we do:

First fold:

Train from 2007-12-10 to 2013-01-20, forecast from 2013-01-21 to 2014-01-20, and compare the forecast with the actual.

Second fold:

Train from 2007-12-10 to 2014-01-20, forecast from 2014-01-21 to 2015-01-20, and compare the forecast with the actual.

Third fold:

Train from 2007-12-10 to 2015-01-20, forecast from 2015-01-21 to 2016-01-20, and compare the forecast with the actual.

The split could be more flexible, for example, the testing periods could have gaps. For more details about evaluation period configuration, see Evaluation Period. The forecast model’s performance will be the average of the three evaluations on the forecasts.
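The three folds listed above can be generated with a simple rolling split. The sketch below is an illustration with a hypothetical helper, rolling_cv_splits, that works backwards from the end of the data; it is not the library’s own splitter, and it ignores options such as gaps between training and test periods or a minimum training length.

```python
import datetime


def rolling_cv_splits(data_start, data_end, horizon_days, n_splits):
    """Returns (train_start, train_end, test_start, test_end) tuples, oldest fold first.

    Each test window is `horizon_days` long, test windows do not overlap,
    and the last test window ends at `data_end`. All dates are inclusive.
    """
    one_day = datetime.timedelta(days=1)
    horizon = datetime.timedelta(days=horizon_days)
    splits = []
    for k in reversed(range(n_splits)):
        test_end = data_end - k * horizon
        test_start = test_end - horizon + one_day
        train_end = test_start - one_day
        splits.append((data_start, train_end, test_start, test_end))
    return splits


splits = rolling_cv_splits(
    data_start=datetime.date(2007, 12, 10),
    data_end=datetime.date(2016, 1, 20),
    horizon_days=365,
    n_splits=3,
)
for train_start, train_end, test_start, test_end in splits:
    print(f"train {train_start} to {train_end}, forecast {test_start} to {test_end}")
```

Every fold trains from the start of the data (an expanding window), so later folds see more history than earlier ones.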

By default, run_forecast_config also runs time series cross-validation internally. You can configure the cross-validation splits, as shown below. Note that the last test_horizon periods are reserved from the end of the data and are not used for cross-validation. This test data can further evaluate the model performance beyond the cross-validation result, and is available for plotting.

# Defines the cross-validation config
evaluation_period = EvaluationPeriodParam(
    test_horizon=365,             # leaves 365 days as testing data
    cv_horizon=365,               # each cv test size is 365 days (same as forecast horizon)
    cv_max_splits=3,              # 3 folds cv
    cv_min_train_periods=365 * 4  # uses at least 4 years for training because we have 8 years of data
)

# Runs the forecast
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecasts 365 steps ahead
        coverage=0.95,  # 95% prediction intervals
        metadata_param=metadata,
        evaluation_period_param=evaluation_period
    )
)

# Summarizes the cv result
cv_results = summarize_grid_search_results(
    grid_search=result.grid_search,
    decimals=1,
    # The below saves space in the printed output. Remove to show all available metrics and columns.
    cv_report_metrics=None,
    column_order=["rank", "mean_test", "split_test", "mean_train", "split_train", "mean_fit_time", "mean_score_time", "params"])
# Transposes to save space in the printed output
cv_results["params"] = cv_results["params"].astype(str)
cv_results.set_index("params", drop=True, inplace=True)
cv_results.transpose()

Out:

Fitting 3 folds for each of 1 candidates, totalling 3 fits
params            []
rank_test_MAPE    1
mean_test_MAPE    6.5
split_test_MAPE   (5.1, 6.9, 7.4)
mean_train_MAPE   4.0
split_train_MAPE  (3.9, 4.1, 4.0)
mean_fit_time     6.6
mean_score_time   1.0


By default, all metrics in ElementwiseEvaluationMetricEnum are computed on each CV train/test split. The configuration of CV evaluation metrics can be found at Evaluation Metric. Here, we show the Mean Absolute Percentage Error (MAPE) across splits (see summarize_grid_search_results to control what to show and for details on the output columns). From the result, we see that the cross-validation mean_test_MAPE is 6.5%, which means the prediction is 6.5% away from the ground truth on average. We also see that the 3 cv folds have split_test_MAPE 5.1%, 6.9% and 7.4%, respectively.

When we have different sets of model parameters, a good way to compare them is to run a time series cross-validation on each set of parameters, and pick the set of parameters that has the best cross-validated performance.

Start tuning

Now that you know how to evaluate model performance, let’s see if we can improve the model by tuning its parameters.

Anomaly

An anomaly is a deviation in the metric that is not expected to occur again in the future. Including anomaly points will lead the model to fit the anomaly as an intrinsic property of the time series, resulting in inaccurate forecasts. These anomalies could be identified through overlay plots, see Seasonality Plots.

fig = ts.plot_quantiles_and_overlays(
    groupby_time_feature="month_dom",
    show_mean=True,
    show_quantiles=False,
    show_overlays=True,
    overlay_label_time_feature="year",
    overlay_style={"line": {"width": 1}, "opacity": 0.5},
    center_values=True,
    xlabel="day of year",
    ylabel=ts.original_value_col,
    title="yearly seasonality for each year (centered)",
)
plotly.io.show(fig)

From the yearly overlay plot above, we can see two big anomalies: one in March of 2012, and one in June of 2010. Other small anomalies could be identified as well, but they have less influence. The SILVERKITE template currently supports masking anomaly points by supplying anomaly_info as a dictionary. You can either assign adjusted values to them, or simply mask them as NA (in which case these dates will not be used in fitting). For a detailed introduction to the anomaly_info configuration, see Examine Input Data. Here we define an anomaly_df dataframe to mask them as NA, and wrap it into the anomaly_info dictionary.

anomaly_df = pd.DataFrame({
    # start and end date are inclusive
    # each row is an anomaly interval
    cst.START_TIME_COL: ["2010-06-05", "2012-03-01"],  # inclusive
    cst.END_TIME_COL: ["2010-06-20", "2012-03-20"],  # inclusive
    cst.ADJUSTMENT_DELTA_COL: [np.nan, np.nan],  # mask as NA
})
# Creates anomaly_info dictionary.
# This will be fed into the template.
anomaly_info = {
    "value_col": "y",
    "anomaly_df": anomaly_df,
    "adjustment_delta_col": cst.ADJUSTMENT_DELTA_COL,
}
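
Conceptually, masking an anomaly as NA replaces every value inside an inclusive date interval with a missing value, so that those dates do not influence the fit. The sketch below illustrates the idea on a toy series with a hypothetical helper, mask_anomalies; it is not how the template applies anomaly_info internally, and the toy values are made up.

```python
import datetime
import math


def mask_anomalies(dates, values, anomaly_intervals):
    """Replaces values whose date falls in any inclusive (start, end) interval with NaN."""
    masked = []
    for date, value in zip(dates, values):
        in_anomaly = any(start <= date <= end for start, end in anomaly_intervals)
        masked.append(math.nan if in_anomaly else value)
    return masked


# Toy series of five consecutive days around the start of the first anomaly interval.
dates = [datetime.date(2010, 6, 3) + datetime.timedelta(days=i) for i in range(5)]
values = [8.1, 8.0, 10.9, 11.2, 10.5]  # made-up values for illustration
intervals = [
    (datetime.date(2010, 6, 5), datetime.date(2010, 6, 20)),
    (datetime.date(2012, 3, 1), datetime.date(2012, 3, 20)),
]
masked = mask_anomalies(dates, values, intervals)
print(masked)
```

The first two days fall before the interval and keep their values; the remaining three days are masked.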

Adding relevant features

Growth and trend

First we look at the growth and trend. Detailed growth configuration can be found at Growth. For these two features, we care less about short-term fluctuations and more about the long-term tendency. From the original plot we see no obvious growth pattern, so we can use linear growth to fit the model. On the other hand, there could be trend changepoints, at which the linear growth changes its rate. Detailed changepoint configuration can be found at Changepoints. These points can be detected with the ChangepointDetector class. For a quickstart example, see Changepoint Detection.

Here we explore automatic changepoint detection. The parameters of the detection are customized for this data set:

  • yearly_seasonality_order is kept the same as the model’s yearly seasonality order;

  • regularization_strength controls how many changepoints are detected. 0.5 is a good choice, and you may try other numbers such as 0.4 or 0.6 to see the difference;

  • resample_freq is set to 7 days. Because we have a long training history, we keep this relatively long (the intuition is that shorter changes will be ignored);

  • potential_changepoint_n is set to 25 candidate changepoints, because we do not expect too many changes. However, this could be higher;

  • yearly_seasonality_change_freq is set to 365 days, which means we refit the yearly seasonality every year, because the time series plot shows that the yearly seasonality varies from year to year;

  • no_changepoint_distance_from_end is set to 365 days, which means we do not allow any changepoints in the last 365 days of training data. This avoids fitting the final trend with too little data. For long-term forecasts, this is typically the same as the forecast horizon, while for short-term forecasts, it could be a multiple of the forecast horizon.

model = ChangepointDetector()
res = model.find_trend_changepoints(
    df=df,  # data df
    time_col="ts",  # time column name
    value_col="y",  # value column name
    yearly_seasonality_order=10,  # yearly seasonality order, fit along with trend
    regularization_strength=0.5,  # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    resample_freq="7D",  # data aggregation frequency, eliminate small fluctuation/seasonality
    potential_changepoint_n=25,  # the number of potential changepoints
    yearly_seasonality_change_freq="365D",  # varying yearly seasonality for every year
    no_changepoint_distance_from_end="365D")  # the length of time at the end of the data where changepoints are not allowed
fig = model.plot(
    observation=True,
    trend_estimate=False,
    trend_change=True,
    yearly_seasonality_estimate=False,
    adaptive_lasso_estimate=True,
    plot=False)
plotly.io.show(fig)

From the plot we see the automatically detected trend changepoints. The results show that the time series is generally increasing until 2012, then generally decreasing. One possible explanation is that 2011 is the last year Peyton Manning was at the Indianapolis Colts before joining the Denver Broncos. If we feed the trend changepoint detection parameters to the template, these trend changepoint features will be automatically included in the model.

# The following specifies the growth and trend changepoint configurations.
growth = {
    "growth_term": "linear"
}
changepoints = {
    "changepoints_dict": dict(
        method="auto",
        yearly_seasonality_order=10,
        regularization_strength=0.5,
        resample_freq="7D",
        potential_changepoint_n=25,
        yearly_seasonality_change_freq="365D",
        no_changepoint_distance_from_end="365D"
    )
}
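
To see why linear growth plus changepoints yields a piece-wise linear trend, consider the following sketch. It uses a common formulation in which each changepoint contributes a feature max(0, ct1 - changepoint), so its coefficient is the change in slope from that point on. The function names, changepoint location and coefficients are illustrative assumptions, not values or internals taken from the fitted model.

```python
def changepoint_features(ct1, changepoints):
    """Returns [ct1, max(0, ct1 - c1), max(0, ct1 - c2), ...] for one time point."""
    return [ct1] + [max(0.0, ct1 - c) for c in changepoints]


def piecewise_linear_trend(ct1, intercept, base_slope, changepoints, slope_changes):
    """Evaluates a trend whose slope changes by slope_changes[i] after changepoints[i]."""
    features = changepoint_features(ct1, changepoints)
    coefficients = [base_slope] + list(slope_changes)
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical trend: grows 0.2 per year for 4 years, then declines 0.3 per year.
changepoints = [4.0]    # in years since the start of training (ct1 units)
slope_changes = [-0.5]  # slope goes from 0.2 to 0.2 - 0.5 = -0.3
for ct1 in [0.0, 2.0, 4.0, 6.0, 8.0]:
    trend = piecewise_linear_trend(ct1, 8.0, 0.2, changepoints, slope_changes)
    print(f"ct1={ct1:.1f}  trend={trend:.2f}")
```

Because the trend is linear in these features, a regularized regression can estimate the slope changes directly, and a coefficient shrunk to zero means no change at that candidate point.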

Seasonality

The next features we will look into are the seasonality features. Detailed seasonality configurations can be found at Seasonality. A detailed seasonality detection quickstart example on the same data set is available at Seasonality Plots. The conclusions about seasonality terms are:

  • daily seasonality is not available (because frequency is daily);

  • weekly and yearly patterns are evident (weekly will also interact with football season);

  • monthly or quarterly seasonality is not evident.

Therefore, for pure seasonality terms, we include weekly and yearly seasonality. The seasonality orders are something to be tuned; here let’s take weekly seasonality order to be 5 and yearly seasonality order to be 10. For tuning info, see Seasonality.

# Includes yearly seasonality with order 10 and weekly seasonality with order 5.
# Set the other seasonality to False to disable them.
yearly_seasonality_order = 10
weekly_seasonality_order = 5
seasonality = {
    "yearly_seasonality": yearly_seasonality_order,
    "quarterly_seasonality": False,
    "monthly_seasonality": False,
    "weekly_seasonality": weekly_seasonality_order,
    "daily_seasonality": False
}

We will add the interaction between weekly seasonality and the football season later in this tutorial. The SILVERKITE template also supports seasonality changepoints. A seasonality changepoint is a time point after which the periodic effect behaves differently. For SILVERKITE, this means the Fourier series coefficients are allowed to change. We could decide to add this feature if cross-validation performance is poor and seasonality changepoints are detected in exploratory analysis. For details, see Changepoint Detection.
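
As a rough illustration of what a seasonality order means, the sketch below builds Fourier series terms for a given period. An order of k produces k sine and k cosine terms, so a higher order can represent finer periodic detail at the cost of more features. The function fourier_features and its feature names are hypothetical; the column names SILVERKITE generates differ.

```python
import math


def fourier_features(t, period, order):
    """Returns sin/cos terms of frequencies 1..order for time t within a given period."""
    features = {}
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        features[f"sin{k}"] = math.sin(angle)
        features[f"cos{k}"] = math.cos(angle)
    return features


# Weekly seasonality with order 5: t is the day of week (0..6), period is 7 days.
weekly = fourier_features(t=3.0, period=7.0, order=5)
print(len(weekly), "weekly features")

# Yearly seasonality with order 10: t is the fraction of the year elapsed, period is 1.
yearly = fourier_features(t=0.25, period=1.0, order=10)
print(len(yearly), "yearly features")
```

Each term repeats exactly once per period or a whole number of times per period, which is what makes the fitted pattern periodic.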

Holidays and events

Then let’s look at holidays and events. Detailed holiday and event configurations can be found at Holidays and Events. Ask yourself which holidays are likely to affect the time series’ values. We expect that major United States holidays may affect Wikipedia pageviews, since most football fans are in the United States. Events such as the Super Bowl could potentially increase the pageviews. Therefore, we add United States holidays, and Super Bowl dates as custom events. Other important events that affect the time series can also be found through the yearly seasonality plots in Seasonality Plots.

# Includes major holidays and the superbowl date.
events = {
    # These holidays as well as their pre/post dates are modeled as individual events.
    "holidays_to_model_separately": SilverkiteHoliday.ALL_HOLIDAYS_IN_COUNTRIES,  # all holidays in "holiday_lookup_countries"
    "holiday_lookup_countries": ["UnitedStates"],  # only look up holidays in the United States
    "holiday_pre_num_days": 2,  # also mark the 2 days before a holiday as holiday
    "holiday_post_num_days": 2,  # also mark the 2 days after a holiday as holiday
    "daily_event_df_dict": {
        "superbowl": pd.DataFrame({
            "date": ["2008-02-03", "2009-02-01", "2010-02-07", "2011-02-06",
                     "2012-02-05", "2013-02-03", "2014-02-02", "2015-02-01", "2016-02-07"],  # dates must cover training and forecast period.
            "event_name": ["event"] * 9  # labels
        })
    },
}
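
The idea of marking an event day together with its neighboring days can be sketched as follows. The helper event_offset is hypothetical: it returns how many days a date lies from an event when it falls within the pre/post window, and None otherwise. Each distinct offset could then become its own indicator feature, but the way the template actually names and encodes these indicators is not shown here.

```python
import datetime


def event_offset(date, event_dates, pre_num_days, post_num_days):
    """Returns the signed day offset to the first event whose window contains `date`.

    A negative offset means `date` is before the event, 0 is the event day,
    and None means `date` is outside every event window.
    """
    for event in event_dates:
        offset = (date - event).days
        if -pre_num_days <= offset <= post_num_days:
            return offset
    return None


event_dates = [datetime.date(2015, 2, 1), datetime.date(2016, 2, 7)]
for day in ["2016-02-04", "2016-02-05", "2016-02-07", "2016-02-09", "2016-02-10"]:
    date = datetime.date.fromisoformat(day)
    print(day, event_offset(date, event_dates, pre_num_days=2, post_num_days=2))
```

With a window of 2 days before and after, five consecutive dates are affected by each event occurrence.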

Autoregression

The autoregressive features are very useful in short-term forecasting, but could be risky to use in long-term forecasting. Detailed autoregression configurations can be found at Auto-regression.
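
The risk in long-term forecasting comes from recursion: beyond the first step, the lagged inputs are themselves forecasts. The toy model below, a first-order autoregression with made-up coefficients, illustrates this. It is a sketch of the mechanism only, not of how SILVERKITE implements autoregression.

```python
def recursive_ar_forecast(history, horizon, intercept, lag1_coef):
    """Forecasts `horizon` steps with y[t] = intercept + lag1_coef * y[t-1]."""
    values = list(history)
    forecasts = []
    for _ in range(horizon):
        prediction = intercept + lag1_coef * values[-1]
        forecasts.append(prediction)
        values.append(prediction)  # later steps use forecasts, not observations, as lags
    return forecasts


history = [8.0, 8.2, 8.4]  # made-up observations
forecasts = recursive_ar_forecast(history, horizon=3, intercept=0.8, lag1_coef=0.9)
for step, value in enumerate(forecasts, start=1):
    source = "observed lag" if step == 1 else "forecasted lag"
    print(f"step {step}: {value:.3f} ({source})")
```

Only the first step is driven by an actual observation. With a 365-day horizon, almost every lag would be a forecast, so any early error is carried forward through the rest of the horizon.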

Custom

Now we consider some custom features that could relate to the pageviews. The documentation for extra regressors can be found at Regressors. As mentioned in Seasonality Plots, we observe that the football season heavily affects the pageviews; therefore, we need to use regressors to identify the football season. There are multiple ways to include this feature: adding an indicator for the whole season, or adding the number of days until season start (end) and the number of days since season start (end). The former puts a uniform effect over all in-season dates, while the latter quantifies the on-ramp and down-ramp. If you are not sure which effect to include, it’s ok to include both. SILVERKITE has the option to use Ridge regression as the fit algorithm to avoid over-fitting when there are many features. Note that many datetime features could also be added to the model as features. SILVERKITE calculates some of these features, which can be added to extra_pred_cols as an arbitrary patsy expression. For a full list of such features, see build_time_features_df.

If a feature is not automatically created by SILVERKITE, we need to create it beforehand and append it to the data df. Here we create the “is_football_season” feature. Note that we need to provide the customized column for the forecast horizon period as well. To do so, we first create a df with timestamps covering the forecast horizon. This can be done with the make_future_dataframe function within the UnivariateTimeSeries class. Then we create a new column for our customized regressor in this augmented df.

# Makes augmented df with forecast horizon 365 days
df_full = ts.make_future_dataframe(periods=365)
# Builds "df_features" that contains datetime information of the "df"
df_features = build_time_features_df(
    dt=df_full["ts"],
    conti_year_origin=convert_date_to_continuous_time(df_full["ts"][0])
)

# Roughly approximates the football season.
# "woy" is short for "week of year", created above.
# Football season is roughly the first 6 weeks and last 17 weeks in a year.
is_football_season = (df_features["woy"] <= 6) | (df_features["woy"] >= 36)
# Adds the new feature to the dataframe.
df_full["is_football_season"] = is_football_season.astype(int).tolist()
df_full.reset_index(drop=True, inplace=True)

# Configures regressor column.
regressors = {
    "regressor_cols": ["is_football_season"]
}

Interactions

Finally, let’s consider what possible interactions are relevant to the forecast problem. Generally speaking, if a feature behaves differently for different values of another feature, these two features could have an interaction effect. As shown in Seasonality Plots, the weekly seasonality differs between the football season and the off-season; therefore, the multiplicative term is_football_season x weekly_seasonality is able to capture this pattern.

fig = ts.plot_quantiles_and_overlays(
    groupby_time_feature="str_dow",
    show_mean=True,
    show_quantiles=False,
    show_overlays=True,
    center_values=True,
    overlay_label_time_feature="month",  # splits overlays by month
    overlay_style={"line": {"width": 1}, "opacity": 0.5},
    xlabel="day of week",
    ylabel=ts.original_value_col,
    title="weekly seasonality by month",
)
plotly.io.show(fig)

Now let’s create the interaction terms: interaction between is_football_season and weekly seasonality. The interaction terms between a feature and a seasonality feature can be created with the cols_interact function.

football_week = cols_interact(
    static_col="is_football_season",
    fs_name=SilverkiteSeasonalityEnum.WEEKLY_SEASONALITY.value.name,
    fs_order=weekly_seasonality_order,
    fs_seas_name=SilverkiteSeasonalityEnum.WEEKLY_SEASONALITY.value.seas_names
)

extra_pred_cols = football_week

Moreover, the multiplicative term month x weekly_seasonality and the dow_woy features also account for the varying weekly seasonality through the year. One could add these features, too. Here we just leave them out. You may use cols_interact again to create month x weekly_seasonality, similar to is_football_season x weekly_seasonality. dow_woy is automatically calculated by SILVERKITE; you may simply append the name to extra_pred_cols to include it in the model.
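
Numerically, an interaction such as is_football_season x weekly_seasonality multiplies each weekly Fourier term by the 0/1 indicator. The sketch below shows the resulting features; the helper names are hypothetical and the code is a plain-Python illustration of the arithmetic, not of the model terms that cols_interact generates.

```python
import math


def weekly_fourier(day_of_week, order):
    """Returns [sin1, cos1, sin2, cos2, ...] for a day of week in 0..6 (period 7)."""
    terms = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * day_of_week / 7.0
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


def football_week_interaction(is_football_season, day_of_week, order):
    """Multiplies each weekly Fourier term by the 0/1 football season indicator."""
    return [is_football_season * term for term in weekly_fourier(day_of_week, order)]


in_season = football_week_interaction(1, day_of_week=6, order=5)
off_season = football_week_interaction(0, day_of_week=6, order=5)
print("in season: ", [round(x, 3) for x in in_season])
print("off season:", [round(x, 3) for x in off_season])
```

Out of season the interaction terms are all zero, so their coefficients only adjust the weekly pattern during the football season, on top of the base weekly seasonality shared by all dates.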

Putting things together

Now let’s put everything together and produce a new forecast. Detailed template documentation can be found at Configure a Forecast. We first configure the MetadataParam class. The MetadataParam class includes basic properties of the time series itself.

metadata = MetadataParam(
    time_col="ts",              # column name of timestamps in the time series df
    value_col="y",              # column name of the time series values
    freq="D",                   # data frequency, here we have daily data
    anomaly_info=anomaly_info,  # this is the anomaly information we defined above,
    train_end_date=datetime.datetime(2016, 1, 20)
)

Next we define the ModelComponentsParam class based on the discussion of relevant features. The ModelComponentsParam class includes properties related to the model itself.

model_components = ModelComponentsParam(
    seasonality=seasonality,
    growth=growth,
    events=events,
    changepoints=changepoints,
    autoregression=None,
    regressors=regressors,  # is_football_season defined above
    uncertainty={
        "uncertainty_dict": "auto",
    },
    custom={
        # What algorithm is used to learn the relationship between the time series and the features.
        # Regularized fitting algorithms are recommended to mitigate high correlations and over-fitting.
        # If you are not sure what algorithm to use, "ridge" is a good choice.
        "fit_algorithm_dict": {
            "fit_algorithm": "ridge",
        },
        "extra_pred_cols": extra_pred_cols  # the interaction between is_football_season and weekly seasonality defined above
    }
)

Now let’s run the model with the new configuration. The evaluation config is kept the same as the previous case; this is important for a fair comparison of parameter sets.

# Runs the forecast
result = forecaster.run_forecast_config(
    df=df_full,
    config=ForecastConfig(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecasts 365 steps ahead
        coverage=0.95,  # 95% prediction intervals
        metadata_param=metadata,
        model_components_param=model_components,
        evaluation_period_param=evaluation_period
    )
)

# Summarizes the cv result
cv_results = summarize_grid_search_results(
    grid_search=result.grid_search,
    decimals=1,
    # The below saves space in the printed output. Remove to show all available metrics and columns.
    cv_report_metrics=None,
    column_order=["rank", "mean_test", "split_test", "mean_train", "split_train", "mean_fit_time", "mean_score_time", "params"])
# Transposes to save space in the printed output
cv_results["params"] = cv_results["params"].astype(str)
cv_results.set_index("params", drop=True, inplace=True)
cv_results.transpose()

Out:

Fitting 3 folds for each of 1 candidates, totalling 3 fits
params []
rank_test_MAPE 1
mean_test_MAPE 5.6
split_test_MAPE (3.9, 8.7, 4.3)
mean_train_MAPE 3.4
split_train_MAPE (3.4, 3.6, 3.3)
mean_fit_time 6.4
mean_score_time 1.1


After analyzing the problem and adding appropriate features, the cross-validation test MAPE is 5.6%, an improvement over the baseline (7.3%). The three CV folds now have test MAPE of 3.9%, 8.7% and 4.3%, respectively. The first and third folds improved significantly. With some investigation, we can see that the second fold did not improve because there is a trend changepoint right at the start of its test period.
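As a quick arithmetic check, the mean test MAPE reported in the output above is the plain average of the per-fold values. The sketch below copies the fold values from that output; `mean_mape` is a hypothetical helper written for illustration, not a library function.

```python
# Per-fold test MAPE (in percent), copied from split_test_MAPE in the output above.
split_test_mape = (3.9, 8.7, 4.3)
baseline_mean_test_mape = 7.3  # baseline value quoted in the text


def mean_mape(fold_values, decimals=1):
    """Averages per-fold MAPE values (hypothetical helper, not part of the library)."""
    return round(sum(fold_values) / len(fold_values), decimals)


mean_test_mape = mean_mape(split_test_mape)
improvement = round(baseline_mean_test_mape - mean_test_mape, 1)
# 1-based index of the fold with the largest test error.
worst_fold = max(range(len(split_test_mape)), key=lambda i: split_test_mape[i]) + 1

print(mean_test_mape, improvement, worst_fold)
```

Looking at the per-fold values rather than only the mean is what exposes the second fold as the outlier.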

A situation like this is hard to anticipate before we see it. In the cross-validation step, one way to avoid it is to set a different evaluation period. However, keeping this period also makes sense, because the same thing could happen again in the future. In the forecast period, we could monitor the forecasts against the actuals, and re-train the model to adapt to the most recent pattern if we see a deviation. In the changepoints dictionary, tune regularization_strength or no_changepoint_distance_from_end accordingly, or add manually specified changepoints to the automatically detected ones. For details, see Changepoints.
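A changepoints dictionary with these knobs might look like the sketch below. All values are illustrative, the custom date is hypothetical, and the keys for manually specified changepoints (dates, combine_changepoint_min_distance, keep_detected) are written as we recall them, so check them against the Changepoints page before relying on them.

```python
# Illustrative values only; tune them for your own time series.
changepoints = {
    "changepoints_dict": {
        "method": "auto",
        # Between 0 and 1; larger values yield fewer detected changepoints.
        "regularization_strength": 0.5,
        # No changepoints are placed within this distance of the end of the training data.
        "no_changepoint_distance_from_end": "90D",
        # Manually specified changepoints (hypothetical date), combined with the detected ones.
        "dates": ["2014-01-01"],
        # Assumed keys: how close a detected changepoint may be to a custom one,
        # and which of the two to keep when they are too close.
        "combine_changepoint_min_distance": "30D",
        "keep_detected": True,
    }
}

print(sorted(changepoints["changepoints_dict"]))
```

The dictionary would then be supplied through the changepoints argument of ModelComponentsParam, alongside the regressors and custom settings used above.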

We could also plot the forecast.

forecast = result.forecast
fig = forecast.plot()
plotly.io.show(fig)

Check model summary

To further investigate the model mechanism, it’s also helpful to see the model summary. The ModelSummary module provides model results such as coefficient estimates, significance codes, p-values and confidence intervals that can help the user understand how the model works and what can be further improved.

The model summary is available as a method of the estimator and can be used as follows.

summary = result.model[-1].summary()  # -1 retrieves the estimator from the pipeline
print(summary)

Out:

================================ Model Summary =================================

Number of observations: 2964,   Number of features: 292
Method: Ridge regression
Number of nonzero features: 292
Regularization parameter: 0.3039

Residuals:
         Min           1Q       Median           3Q          Max
      -2.119        -0.23     -0.05065       0.1619        3.204

            Pred_col  Estimate Std. Err Pr(>)_boot sig. code                 95%CI
           Intercept     6.869   0.1053     <2e-16       ***        (6.652, 7.065)
events_Christmas Day   -0.4473   0.1577      0.014         *    (-0.7306, -0.1319)
 events_C...bserved)    -0.825   0.7102      0.320                 (-1.84, 0.2645)
 events_C...erved)-1   -0.5963   0.6798      0.468                (-1.622, 0.6439)
 events_C...erved)-2   -0.4905   0.6118      0.478                (-1.476, 0.4396)
 events_C...erved)+1   0.04088   0.1666      0.686               (-0.2975, 0.3927)
 events_C...erved)+2   0.09685   0.1316      0.430                (-0.1443, 0.374)
 events_C...as Day-1   -0.1757   0.1801      0.286               (-0.5287, 0.2203)
 events_C...as Day-2  -0.08809   0.2913      0.776               (-0.7359, 0.3499)
 events_C...as Day+1   -0.2762   0.1574      0.062         .    (-0.5601, 0.03291)
 events_C...as Day+2   0.07479   0.1276      0.536               (-0.1525, 0.3561)
 events_Columbus Day   -0.1725   0.2126      0.382               (-0.5764, 0.3032)
 events_C...us Day-1    0.1012   0.1057      0.348              (-0.08758, 0.3128)
 events_C...us Day-2  -0.07059  0.08596      0.412               (-0.2249, 0.1033)
 events_C...us Day+1  0.000596   0.1419      0.998               (-0.2931, 0.2695)
 events_C...us Day+2 -0.005547   0.1388      0.976                 (-0.243, 0.277)
    events_Halloween   -0.1024   0.1262      0.388               (-0.3586, 0.1542)
  events_Halloween-1   0.02142   0.1213      0.860                (-0.2124, 0.274)
  events_Halloween-2    0.0631   0.1387      0.662               (-0.1911, 0.3389)
  events_Halloween+1  -0.09996   0.0933      0.284              (-0.2983, 0.08067)
  events_Halloween+2    0.2073   0.2292      0.334                (-0.1924, 0.705)
 events_I...ence Day  -0.04604   0.1158      0.672                (-0.263, 0.1784)
 events_I...bserved)   -0.1714   0.1165      0.118              (-0.3817, 0.06475)
 events_I...erved)-1   -0.1782    0.156      0.260               (-0.4676, 0.1094)
 events_I...erved)-2   -0.1073    0.116      0.360                (-0.295, 0.1348)
 events_I...erved)+1   0.02833    0.143      0.808               (-0.2287, 0.3581)
 events_I...erved)+2    0.1472   0.2498      0.588               (-0.1939, 0.7056)
 events_I...ce Day-1   -0.1366  0.09022      0.120              (-0.2863, 0.06995)
 events_I...ce Day-2   -0.1178  0.08849      0.174              (-0.2863, 0.06173)
 events_I...ce Day+1  -0.05492   0.1084      0.602               (-0.2342, 0.1827)
 events_I...ce Day+2  -0.04993   0.1187      0.666               (-0.2441, 0.2005)
    events_Labor Day    -1.268   0.1402     <2e-16       ***     (-1.524, -0.9859)
  events_Labor Day-1   -0.1177   0.1831      0.504               (-0.4878, 0.2873)
  events_Labor Day-2   -0.1132   0.0807      0.140              (-0.2723, 0.03841)
  events_Labor Day+1   -0.6873   0.1154     <2e-16       ***    (-0.9203, -0.4746)
  events_Labor Day+2   -0.2753   0.1172      0.018         *   (-0.5065, -0.03892)
 events_M... Jr. Day     0.356   0.2844      0.204               (-0.1845, 0.9333)
 events_M...r. Day-1    0.3368   0.3447      0.322                (-0.3078, 1.045)
 events_M...r. Day-2  -0.07114   0.1325      0.588               (-0.3113, 0.1882)
 events_M...r. Day+1   -0.1009   0.2062      0.654               (-0.4659, 0.2978)
 events_M...r. Day+2   0.07043   0.1636      0.680               (-0.2474, 0.3614)
 events_Memorial Day   -0.2282  0.05646     <2e-16       ***    (-0.3284, -0.1038)
 events_M...al Day-1    -0.165  0.09259      0.066         .    (-0.3124, 0.03807)
 events_M...al Day-2  -0.09409   0.1233      0.456               (-0.3181, 0.1734)
 events_M...al Day+1  -0.05099   0.0809      0.498               (-0.1998, 0.1231)
 events_M...al Day+2    0.1118   0.1206      0.348               (-0.1114, 0.3574)
events_New Years Day    -0.222   0.1117      0.052         .  (-0.4423, -0.001199)
 events_N...bserved)   0.01097  0.09176      0.782               (-0.1807, 0.1798)
 events_N...erved)-1   -0.2073    0.129      0.068         .    (-0.4046, 0.01509)
 events_N...erved)-2    -0.131   0.3203      0.624               (-0.7162, 0.3527)
 events_N...erved)+1    0.3919   0.2837      0.140               (-0.1129, 0.8218)
 events_N...erved)+2   0.07572   0.0809      0.270              (-0.07598, 0.2359)
 events_N...rs Day-1  -0.03653   0.1209      0.754               (-0.2717, 0.2132)
 events_N...rs Day-2    0.1355    0.174      0.398                (-0.1927, 0.504)
 events_N...rs Day+1    0.1919   0.1056      0.064         .   (0.0003645, 0.4008)
 events_N...rs Day+2    0.2059   0.1631      0.206              (-0.08964, 0.5253)
 events_Thanksgiving   -0.1139   0.1081      0.290              (-0.3191, 0.08503)
 events_T...giving-1   -0.3522  0.09293     <2e-16       ***    (-0.5342, -0.1634)
 events_T...giving-2   -0.3579   0.1046     <2e-16       ***    (-0.5596, -0.1578)
 events_T...giving+1  -0.07119   0.1206      0.568               (-0.2962, 0.1727)
 events_T...giving+2   -0.1675  0.08607      0.064         .   (-0.3391, 0.007174)
 events_Veterans Day  -0.07226  0.09416      0.438               (-0.2629, 0.1045)
 events_V...bserved)   -0.3612   0.2003      0.008        **         (-0.5589, 0.)
 events_V...erved)-1    0.1449   0.1025      0.100              (-0.005133, 0.298)
 events_V...erved)-2   0.01131  0.05712      0.540               (-0.1028, 0.1568)
 events_V...erved)+1   -0.1694   0.1039      0.048         *         (-0.3127, 0.)
 events_V...erved)+2   -0.1663  0.09324      0.024         *         (-0.2728, 0.)
 events_V...ns Day-1   -0.1161  0.08042      0.150              (-0.2791, 0.05194)
 events_V...ns Day-2  -0.02317  0.09884      0.812               (-0.2156, 0.1922)
 events_V...ns Day+1  -0.01046   0.1059      0.912                (-0.2197, 0.187)
 events_V...ns Day+2  -0.06036  0.06735      0.370              (-0.1911, 0.06657)
 events_W...Birthday  -0.03252   0.1248      0.808               (-0.2537, 0.2159)
 events_W...rthday-1   -0.3396  0.09701      0.004        **    (-0.5334, -0.1684)
 events_W...rthday-2   -0.1429  0.07614      0.064         .    (-0.2779, 0.02377)
 events_W...rthday+1   -0.0723  0.08868      0.428               (-0.2366, 0.1081)
 events_W...rthday+2   -0.1422  0.04839      0.004        **   (-0.2413, -0.04671)
    events_superbowl    0.5846   0.3444      0.072         .     (-0.07498, 1.304)
       str_dow_2-Tue   0.03985  0.02197      0.078         .  (-0.004167, 0.08278)
       str_dow_3-Wed   0.04807  0.01984      0.010         *    (0.01244, 0.08921)
       str_dow_4-Thu   0.03554  0.02039      0.066         .  (0.0001277, 0.07568)
       str_dow_5-Fri  -0.02483  0.01941      0.188             (-0.06317, 0.01569)
       str_dow_6-Sat  -0.09012  0.01893     <2e-16       ***   (-0.1275, -0.05094)
       str_dow_7-Sun  -0.03861  0.02474      0.136             (-0.09279, 0.01079)
 is_footb...w_weekly  -0.09197  0.04712      0.042         *   (-0.186, -0.009302)
 is_footb...w_weekly    0.9142  0.05081     <2e-16       ***        (0.8214, 1.02)
 is_footb...w_weekly  -0.03426  0.02585      0.188             (-0.08413, 0.01897)
 is_footb...w_weekly    0.1931  0.02646     <2e-16       ***      (0.1427, 0.2506)
 is_footb...w_weekly  -0.05447  0.02263      0.016         *   (-0.0981, -0.01213)
 is_footb...w_weekly   0.02137  0.02741      0.404             (-0.03339, 0.07348)
 is_footb...w_weekly   0.05447  0.02263      0.016         *     (0.01213, 0.0981)
 is_footb...w_weekly   0.02137  0.02741      0.404             (-0.03339, 0.07348)
 is_footb...w_weekly   0.03426  0.02585      0.188             (-0.01897, 0.08413)
 is_footb...w_weekly    0.1931  0.02646     <2e-16       ***      (0.1427, 0.2506)
  is_football_season    0.5981   0.1044     <2e-16       ***      (0.4241, 0.8188)
                 ct1    -2.117    0.285     <2e-16       ***      (-2.628, -1.523)
      is_weekend:ct1   -0.4372   0.1675      0.006        **    (-0.7738, -0.1183)
   str_dow_2-Tue:ct1   -0.2769   0.2137      0.178               (-0.7282, 0.1123)
   str_dow_3-Wed:ct1   -0.2551   0.1491      0.080         .    (-0.5607, 0.03082)
   str_dow_4-Thu:ct1   -0.2482   0.1348      0.066         .    (-0.4973, 0.01302)
   str_dow_5-Fri:ct1  -0.02465   0.1309      0.856               (-0.3105, 0.1918)
   str_dow_6-Sat:ct1  -0.07545   0.1178      0.498               (-0.3212, 0.1578)
   str_dow_7-Sun:ct1   -0.3618   0.2072      0.078         .    (-0.7568, 0.05462)
   cp0_2008_07_21_00    0.1117   0.1266      0.364               (-0.1048, 0.3941)
 is_weeke...07_21_00 -0.002811  0.07916      0.988               (-0.1496, 0.1557)
 str_dow_...07_21_00   0.01496  0.09583      0.870               (-0.1667, 0.2074)
 str_dow_...07_21_00   0.05581  0.06694      0.424              (-0.07008, 0.1876)
 str_dow_...07_21_00  -0.01644  0.07454      0.854               (-0.1547, 0.1315)
 str_dow_...07_21_00   0.08184  0.08289      0.340              (-0.06993, 0.2495)
 str_dow_...07_21_00  -0.02617  0.07744      0.774               (-0.1654, 0.1283)
 str_dow_...07_21_00   0.02335    0.079      0.766               (-0.1303, 0.1826)
   cp1_2008_11_10_00     3.172   0.2002     <2e-16       ***        (2.706, 3.517)
 is_weeke...11_10_00    0.7443   0.1193     <2e-16       ***      (0.4793, 0.9536)
 str_dow_...11_10_00    0.4904   0.1318     <2e-16       ***      (0.2271, 0.7417)
 str_dow_...11_10_00     0.412  0.09273     <2e-16       ***      (0.2218, 0.5815)
 str_dow_...11_10_00    0.3469  0.08354     <2e-16       ***      (0.1704, 0.4985)
 str_dow_...11_10_00    0.4074   0.1023     <2e-16       ***      (0.1928, 0.5988)
 str_dow_...11_10_00    0.3276   0.0856     <2e-16       ***      (0.1663, 0.4962)
 str_dow_...11_10_00    0.4169   0.1224     <2e-16       ***      (0.1684, 0.6503)
   cp2_2009_03_09_00     2.362   0.2131     <2e-16       ***         (1.91, 2.757)
 is_weeke...03_09_00    0.6354   0.1435     <2e-16       ***      (0.3543, 0.9314)
 str_dow_...03_09_00    0.3603    0.148      0.012         *       (0.048, 0.6067)
 str_dow_...03_09_00    0.2216   0.1158      0.052         .    (0.007104, 0.4413)
 str_dow_...03_09_00    0.2248  0.09759      0.026         *     (0.02929, 0.4213)
 str_dow_...03_09_00    0.2746   0.1179      0.014         *      (0.05103, 0.505)
 str_dow_...03_09_00    0.2981   0.1106      0.004        **     (0.08758, 0.5249)
 str_dow_...03_09_00    0.3374   0.1563      0.030         *     (0.03026, 0.6406)
   cp3_2009_10_19_00    -1.463   0.2086     <2e-16       ***     (-1.808, -0.9781)
 is_weeke...10_19_00   -0.4489   0.1364     <2e-16       ***    (-0.6984, -0.1803)
 str_dow_...10_19_00   -0.2825   0.1638      0.082         .     (-0.5954, 0.0295)
 str_dow_...10_19_00    -0.287   0.1329      0.028         *   (-0.5421, -0.02853)
 str_dow_...10_19_00   -0.1547   0.1172      0.192              (-0.3676, 0.08615)
 str_dow_...10_19_00   -0.2625   0.1499      0.086         .     (-0.5702, 0.0462)
 str_dow_...10_19_00   -0.2079   0.1182      0.070         .    (-0.4336, 0.01176)
 str_dow_...10_19_00   -0.2411   0.1433      0.094         .    (-0.4908, 0.04551)
   cp4_2010_02_15_00    -1.792   0.2171     <2e-16       ***      (-2.163, -1.327)
 is_weeke...02_15_00   -0.5726   0.1588     <2e-16       ***    (-0.8717, -0.2709)
 str_dow_...02_15_00   -0.3067   0.1687      0.060         .    (-0.6298, 0.02038)
 str_dow_...02_15_00   -0.2516   0.1415      0.080         .   (-0.5465, -0.01375)
 str_dow_...02_15_00   -0.1482   0.1294      0.276              (-0.3867, 0.09427)
 str_dow_...02_15_00   -0.3762   0.1683      0.034         *   (-0.7606, -0.08947)
 str_dow_...02_15_00   -0.2885   0.1466      0.044         *   (-0.5752, -0.02825)
 str_dow_...02_15_00   -0.2842   0.1737      0.108               (-0.6055, 0.0595)
   cp5_2010_06_07_00   -0.1796    0.182      0.330               (-0.589, 0.09932)
 is_weeke...06_07_00  -0.07618   0.1321      0.580               (-0.3625, 0.1548)
 str_dow_...06_07_00   -0.1179   0.1393      0.396                 (-0.3809, 0.16)
 str_dow_...06_07_00    0.1038   0.1115      0.356               (-0.1362, 0.3088)
 str_dow_...06_07_00    0.0836   0.1041      0.414               (-0.1252, 0.2877)
 str_dow_...06_07_00   -0.1792   0.1364      0.192               (-0.4347, 0.1001)
 str_dow_...06_07_00   -0.0356   0.1232      0.766               (-0.2859, 0.2158)
 str_dow_...06_07_00  -0.04061   0.1396      0.770               (-0.3139, 0.2301)
   cp6_2011_01_24_00     1.251   0.2431     <2e-16       ***       (0.8586, 1.827)
 is_weeke...01_24_00    0.3124   0.1889      0.092         .    (-0.06698, 0.6609)
 str_dow_...01_24_00    0.1816    0.211      0.404                (-0.2145, 0.585)
 str_dow_...01_24_00    0.1857   0.1503      0.204               (-0.1166, 0.4878)
 str_dow_...01_24_00    0.1055   0.1422      0.440               (-0.1748, 0.3998)
 str_dow_...01_24_00   0.07184   0.1938      0.704               (-0.2954, 0.4766)
 str_dow_...01_24_00    0.1115   0.1513      0.466               (-0.1454, 0.4501)
 str_dow_...01_24_00    0.2009   0.2251      0.382               (-0.2665, 0.6036)
   cp7_2011_05_16_00     1.529   0.2107     <2e-16       ***         (1.152, 1.97)
 is_weeke...05_16_00    0.3458   0.1712      0.040         *     (0.04088, 0.6884)
 str_dow_...05_16_00    0.2949   0.1443      0.038         *     (0.02642, 0.5849)
 str_dow_...05_16_00    0.2188   0.1518      0.118              (-0.05091, 0.5417)
 str_dow_...05_16_00    0.1317   0.1462      0.352               (-0.1518, 0.4392)
 str_dow_...05_16_00    0.2202   0.1788      0.228               (-0.1392, 0.5483)
 str_dow_...05_16_00    0.1008    0.189      0.586               (-0.2613, 0.4798)
 str_dow_...05_16_00    0.2451   0.1654      0.148              (-0.07212, 0.6045)
   cp8_2012_01_02_00   -0.8736   0.2911      0.002        **     (-1.575, -0.4698)
 is_weeke...01_02_00   0.02601   0.1751      0.884               (-0.3734, 0.3009)
 str_dow_...01_02_00   -0.1949   0.2216      0.388               (-0.6496, 0.1989)
 str_dow_...01_02_00    -0.285   0.1727      0.102              (-0.6301, 0.04926)
 str_dow_...01_02_00   -0.2876   0.1706      0.090         .     (-0.6433, 0.0311)
 str_dow_...01_02_00    0.1976   0.2349      0.404               (-0.3218, 0.6008)
 str_dow_...01_02_00  0.005263    0.173      0.974               (-0.4178, 0.2737)
 str_dow_...01_02_00   0.02075   0.1818      0.918               (-0.3508, 0.3517)
   cp9_2012_04_23_00    -2.969   0.2547     <2e-16       ***      (-3.386, -2.416)
 is_weeke...04_23_00    -0.509   0.1882      0.010         *    (-0.8878, -0.1708)
 str_dow_...04_23_00   -0.5006   0.2089      0.020         *    (-0.9622, -0.1141)
 str_dow_...04_23_00   -0.5005   0.1885      0.006        **    (-0.8387, -0.1182)
 str_dow_...04_23_00   -0.3736   0.1683      0.032         *    (-0.725, -0.05311)
 str_dow_...04_23_00   -0.3995   0.1879      0.034         *   (-0.7663, -0.03901)
 str_dow_...04_23_00   -0.2675   0.1855      0.148              (-0.6404, 0.08221)
 str_dow_...04_23_00   -0.2417    0.192      0.206               (-0.6434, 0.1058)
  cp10_2012_08_13_00   -0.2946   0.2684      0.278               (-0.8076, 0.2358)
 is_weeke...08_13_00   -0.3315   0.1438      0.026         *   (-0.5825, -0.00679)
 str_dow_...08_13_00  -0.03862   0.1951      0.822               (-0.4005, 0.3088)
 str_dow_...08_13_00    0.2506   0.1229      0.038         *    (-0.005779, 0.469)
 str_dow_...08_13_00     0.192   0.1586      0.208               (-0.1444, 0.4852)
 str_dow_...08_13_00   -0.1648   0.1579      0.304                 (-0.47, 0.1457)
 str_dow_...08_13_00  0.006809   0.1124      0.960               (-0.1899, 0.2444)
 str_dow_...08_13_00   -0.3383   0.1583      0.036         *   (-0.6588, -0.05061)
  cp11_2013_04_01_00     1.877    0.181     <2e-16       ***        (1.543, 2.218)
 is_weeke...04_01_00    0.2287   0.1776      0.190               (-0.1319, 0.5678)
 str_dow_...04_01_00     0.399   0.2012      0.044         *     (0.01422, 0.7964)
 str_dow_...04_01_00    0.2594    0.168      0.118              (-0.07879, 0.5584)
 str_dow_...04_01_00    0.2361   0.1742      0.160               (-0.1069, 0.5672)
 str_dow_...04_01_00    0.2293   0.1958      0.238               (-0.1503, 0.5751)
 str_dow_...04_01_00   0.09426   0.1949      0.614               (-0.2762, 0.4438)
 str_dow_...04_01_00    0.1346   0.2437      0.568               (-0.3602, 0.6175)
  cp12_2014_03_10_00   -0.9373   0.1493     <2e-16       ***     (-1.238, -0.6249)
 is_weeke...03_10_00  -0.06963   0.1202      0.570               (-0.2942, 0.1622)
 str_dow_...03_10_00  0.001893   0.2204      0.998                (-0.4345, 0.436)
 str_dow_...03_10_00   -0.1731   0.1169      0.118              (-0.4055, 0.04797)
 str_dow_...03_10_00  -0.09398   0.1387      0.500               (-0.3744, 0.1924)
 str_dow_...03_10_00  -0.02964    0.154      0.826                (-0.333, 0.2593)
 str_dow_...03_10_00   -0.1086   0.1425      0.460               (-0.4016, 0.1651)
 str_dow_...03_10_00    0.0388   0.1967      0.828                (-0.3183, 0.458)
 ct1:sin1_tow_weekly   -0.1056   0.1301      0.418                (-0.3388, 0.159)
 ct1:cos1_tow_weekly   -0.5032    0.257      0.058         .    (-1.005, 0.007495)
 ct1:sin2_tow_weekly    0.1719    0.156      0.284               (-0.1251, 0.4727)
 ct1:cos2_tow_weekly   -0.3191   0.2371      0.174               (-0.8111, 0.1019)
 cp0_2008...w_weekly    0.0158  0.05883      0.782               (-0.1044, 0.1281)
 cp0_2008...w_weekly  -0.03324   0.1124      0.774                (-0.264, 0.1869)
 cp0_2008...w_weekly   0.01693  0.06343      0.810               (-0.1122, 0.1357)
 cp0_2008...w_weekly -0.008468  0.09763      0.920               (-0.2084, 0.1844)
 cp1_2008...w_weekly   0.05859  0.09183      0.526               (-0.1226, 0.2301)
 cp1_2008...w_weekly    0.2615    0.142      0.080         .     (-0.02147, 0.535)
 cp1_2008...w_weekly    0.0423   0.1016      0.662               (-0.1524, 0.2394)
 cp1_2008...w_weekly    0.1977    0.126      0.116              (-0.03261, 0.4424)
 cp2_2009...w_weekly  -0.03987   0.1075      0.706               (-0.2437, 0.1526)
 cp2_2009...w_weekly    0.2728   0.1564      0.096         .    (-0.02514, 0.5741)
 cp2_2009...w_weekly   0.04848   0.1259      0.724                (-0.1947, 0.289)
 cp2_2009...w_weekly    0.1765    0.141      0.202              (-0.07455, 0.4724)
 cp3_2009...w_weekly   -0.0325   0.1119      0.812               (-0.2583, 0.1706)
 cp3_2009...w_weekly   0.06844   0.1906      0.690                (-0.304, 0.4254)
 cp3_2009...w_weekly  -0.04635   0.1235      0.740               (-0.2914, 0.1904)
 cp3_2009...w_weekly     0.144   0.1728      0.412               (-0.1774, 0.4662)
 cp4_2010...w_weekly   0.05979   0.1253      0.610               (-0.1988, 0.2956)
 cp4_2010...w_weekly    0.0446   0.1882      0.802               (-0.3071, 0.3828)
 cp4_2010...w_weekly   -0.1108   0.1334      0.408                (-0.3658, 0.132)
 cp4_2010...w_weekly   0.08032   0.1699      0.656               (-0.2465, 0.4517)
 cp5_2010...w_weekly   0.09711   0.1071      0.358               (-0.1107, 0.2949)
 cp5_2010...w_weekly  -0.01161   0.1456      0.950               (-0.2718, 0.2806)
 cp5_2010...w_weekly   -0.1749    0.112      0.130              (-0.3775, 0.05756)
 cp5_2010...w_weekly  -0.04203    0.134      0.740                (-0.311, 0.2236)
 cp6_2011...w_weekly   0.03701   0.1482      0.834               (-0.2606, 0.3216)
 cp6_2011...w_weekly    0.2149   0.2012      0.286               (-0.2081, 0.5601)
 cp6_2011...w_weekly  -0.03962   0.1745      0.834               (-0.3921, 0.3236)
 cp6_2011...w_weekly   0.08036   0.1734      0.630                (-0.2886, 0.422)
 cp7_2011...w_weekly   0.05963   0.1414      0.666               (-0.2344, 0.3253)
 cp7_2011...w_weekly    0.1416    0.173      0.440               (-0.1877, 0.4854)
 cp7_2011...w_weekly   0.03422   0.1386      0.784               (-0.2292, 0.3075)
 cp7_2011...w_weekly   0.06865   0.1534      0.670               (-0.2528, 0.3644)
 cp8_2012...w_weekly   -0.3396   0.1602      0.034         *  (-0.6181, -0.007229)
 cp8_2012...w_weekly    -0.156   0.2333      0.496               (-0.5791, 0.2966)
 cp8_2012...w_weekly    0.1509    0.193      0.432               (-0.2487, 0.4899)
 cp8_2012...w_weekly  -0.05014   0.2203      0.822                (-0.4758, 0.399)
 cp9_2012...w_weekly   -0.2153   0.1581      0.174                 (-0.48, 0.1292)
 cp9_2012...w_weekly    -0.152   0.2022      0.456               (-0.5466, 0.2385)
 cp9_2012...w_weekly  -0.08811   0.1632      0.588                (-0.4303, 0.216)
 cp9_2012...w_weekly   -0.1654   0.1889      0.408               (-0.5163, 0.1673)
 cp10_201...w_weekly    0.3213   0.1272      0.010         *     (0.04289, 0.5535)
 cp10_201...w_weekly   -0.2737   0.1855      0.142              (-0.6226, 0.09432)
 cp10_201...w_weekly  -0.04732   0.1548      0.756               (-0.3566, 0.2417)
 cp10_201...w_weekly    -0.176   0.1709      0.302               (-0.5399, 0.1328)
 cp11_201...w_weekly    0.1906   0.1453      0.214               (-0.0789, 0.4589)
 cp11_201...w_weekly    0.1923   0.2193      0.374               (-0.2231, 0.6189)
 cp11_201...w_weekly   0.09299   0.1854      0.614               (-0.2322, 0.4541)
 cp11_201...w_weekly       0.2   0.1961      0.308               (-0.1599, 0.5904)
 cp12_201...w_weekly  -0.06166  0.09664      0.504               (-0.2474, 0.1273)
 cp12_201...w_weekly   -0.1988   0.2364      0.388               (-0.6842, 0.2374)
 cp12_201...w_weekly   0.02174   0.1261      0.848               (-0.2411, 0.2402)
 cp12_201...w_weekly   -0.2141   0.2141      0.302               (-0.6592, 0.1718)
     sin1_tow_weekly     0.114  0.02721     <2e-16       ***     (0.06092, 0.1676)
     cos1_tow_weekly   0.01608  0.04073      0.694             (-0.06422, 0.09005)
     sin2_tow_weekly  -0.01573  0.01886      0.406             (-0.05346, 0.02103)
     cos2_tow_weekly   0.03913  0.02649      0.122             (-0.01114, 0.09544)
     sin3_tow_weekly -0.007762  0.01414      0.562             (-0.03547, 0.02233)
     cos3_tow_weekly 0.0002049  0.02446      0.994             (-0.04767, 0.04941)
     sin4_tow_weekly  0.007762  0.01414      0.562             (-0.02233, 0.03547)
     cos4_tow_weekly 0.0002049  0.02446      0.994             (-0.04767, 0.04941)
     sin5_tow_weekly   0.01573  0.01886      0.406             (-0.02103, 0.05346)
     cos5_tow_weekly   0.03913  0.02649      0.122             (-0.01114, 0.09544)
     sin1_ct1_yearly  -0.05446  0.04572      0.226               (-0.1361, 0.0476)
     cos1_ct1_yearly    0.7377   0.1243     <2e-16       ***       (0.476, 0.9413)
     sin2_ct1_yearly    0.2229  0.03067     <2e-16       ***      (0.1683, 0.2889)
     cos2_ct1_yearly   -0.3147  0.03203     <2e-16       ***    (-0.3857, -0.2609)
     sin3_ct1_yearly    0.3336  0.03639     <2e-16       ***      (0.2561, 0.3987)
     cos3_ct1_yearly  0.004652  0.03204      0.888             (-0.05324, 0.07006)
     sin4_ct1_yearly   -0.1049  0.03367     <2e-16       ***   (-0.1757, -0.04302)
     cos4_ct1_yearly    -0.159  0.02764     <2e-16       ***    (-0.2149, -0.1088)
     sin5_ct1_yearly    -0.117  0.02836     <2e-16       ***    (-0.172, -0.05779)
     cos5_ct1_yearly  -0.03623  0.02172      0.090         .  (-0.07745, 0.006865)
     sin6_ct1_yearly   -0.1125  0.03349     <2e-16       ***    (-0.174, -0.04592)
     cos6_ct1_yearly   -0.0199   0.0245      0.398             (-0.06665, 0.03048)
     sin7_ct1_yearly  -0.09129  0.02344     <2e-16       ***   (-0.1396, -0.04828)
     cos7_ct1_yearly   0.05714  0.02414      0.022         *     (0.008218, 0.104)
     sin8_ct1_yearly     0.026  0.02678      0.334             (-0.02517, 0.07548)
     cos8_ct1_yearly    0.1237  0.02634     <2e-16       ***     (0.06881, 0.1727)
     sin9_ct1_yearly  -0.01293  0.02506      0.598             (-0.06246, 0.03532)
     cos9_ct1_yearly  -0.07656  0.02542      0.002        **   (-0.1282, -0.02875)
    sin10_ct1_yearly   -0.1261  0.02458     <2e-16       ***   (-0.1756, -0.07837)
    cos10_ct1_yearly   -0.0503  0.02405      0.026         * (-0.09053, 0.0007198)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.7515,   Adjusted R-squared: 0.7388
F-statistic: 57.652 on 144 and 2818 DF,   p-value: 1.110e-16
Model AIC: 18794.0,   model BIC: 19665.0

WARNING: the condition number is large, 2.60e+05. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.

The model summary shows the model information, the coefficients and their significance, and a few summary statistics. For example, we can see the changepoints and how much the growth rate changes at each changepoint. Some of the holidays have a significant effect in the model, such as Christmas Day, Labor Day and the two days before Thanksgiving. We can also see the significance of the interaction between football season and weekly seasonality.
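To make the reading of the table concrete, here is a minimal sketch that reproduces the significance codes from the legend and accumulates the first few growth coefficients. It assumes each cp coefficient is the change in the slope of ct1 after that date, and it only tracks the main ct1 effect, ignoring the day-of-week interactions. `sig_code` is a hypothetical helper, and the rows are copied by hand from the summary above.

```python
from itertools import accumulate


def sig_code(p_value):
    """Maps a p-value to the code in the summary legend ('***', '**', '*', '.', '')."""
    for threshold, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p_value <= threshold:
            return code
    return ""


# (feature, estimate, p-value) copied from the summary above; "<2e-16" is entered as 0.0.
growth_terms = [
    ("ct1", -2.117, 0.0),
    ("cp0_2008_07_21_00", 0.1117, 0.364),
    ("cp1_2008_11_10_00", 3.172, 0.0),
    ("cp2_2009_03_09_00", 2.362, 0.0),
]

# Running sum of the coefficients: slope of the main ct1 effect after each changepoint.
slopes = [round(s, 4) for s in accumulate(est for _, est, _ in growth_terms)]

for (name, estimate, p_value), slope in zip(growth_terms, slopes):
    print(f"{name:<20} change={estimate:>8} sig={sig_code(p_value):<3} slope after={slope}")
```

Under these assumptions, the first changepoint barely alters the slope, while the second one flips it from negative to positive, which matches their significance codes in the table.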

For a more detailed guide on model summary, see Model Summary.

Summary in model tuning

With this example in hand, you should have some sense of how to select parameters and tune the model. Below are a few steps and tricks that may help you select the best model:

  1. Detect anomaly points with the overlay plots (plot_quantiles_and_overlays). Mask these points with NA. Do not specify the adjustment unless you are confident about how to correct the anomalies.

  2. Choose an appropriate way to model the growth (linear, quadratic, square root, etc.). If none of the typical growth shapes fits the time series, you might consider linear growth with trend changepoints. Try different changepoint detection configurations. You may also plot the detected changepoints and see if they make sense to you. The template also supports custom changepoints. If the automatic changepoint detection result does not make sense to you, you might supply your own changepoints.

  3. Choose the appropriate seasonality orders. The higher the order, the more details the model can learn. However, orders that are too large could overfit the training data. Seasonality patterns can also be detected from the overlay plots (plot_quantiles_and_overlays). There isn’t a unified way to choose seasonality, so explore different seasonality orders and compare the results.

  4. Consider what events and holidays to model. Are there any custom events we need to add? If you add a custom event, remember to also add the dates for the event in the forecast period.

  5. Add external regressors that could be related to the time series. Note that you will need to provide the values of the regressors in the forecast period as well. You may use another time series as a regressor, as long as you have a ground truth/good forecast for it that covers your forecast period.

  6. Add interaction terms. As mentioned earlier, there could be an interaction between two features if the behavior of one feature differs when the other feature takes different values. Try to detect this through the overlay plot (plot_quantiles_and_overlays), too. By default, we have a few pre-defined interaction terms, see feature_sets_enabled.

  7. Choose an appropriate fit algorithm. This is the algorithm that models the relationship between the features and the time series. See a full list of available algorithms at fit_algorithm. If you are unsure about their difference, try some of them and compare the results. If you don’t want to, choosing “ridge” is a safe option.
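As an illustration of the first step, the sketch below masks known anomaly dates with NaN in a toy series while leaving every other point untouched. The data and the `mask_anomalies` helper are hypothetical and written in plain Python for clarity. In practice the anomaly information would be passed to the library rather than applied by hand.

```python
import math
from datetime import date


def mask_anomalies(dates, values, anomaly_ranges):
    """Returns a copy of values with points inside any (start, end) range set to NaN."""
    masked = []
    for day, value in zip(dates, values):
        is_anomaly = any(start <= day <= end for start, end in anomaly_ranges)
        masked.append(float("nan") if is_anomaly else value)
    return masked


# Toy daily series; the third value is an obvious spike.
dates = [date(2020, 1, d) for d in range(1, 6)]
values = [10.0, 11.0, 95.0, 12.0, 11.5]
anomaly_ranges = [(date(2020, 1, 3), date(2020, 1, 3))]

masked = mask_anomalies(dates, values, anomaly_ranges)
print(masked)
```

Masking removes the anomalous points from fitting without asserting what the "correct" values would have been, which is why it is the safer default compared with specifying an adjustment.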

It is worth noting that the template supports automatic grid search with different sets of parameters. For each parameter, if you provide the configuration in a list, it will automatically run each combination and choose the one with the best cross-validation performance. This will save a lot of time. For details, see Grid Search.
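The idea of "each combination" can be sketched with a Cartesian product over the candidate lists. This is only an illustration of how the number of fits grows, not the library's implementation, and the search space below is hypothetical.

```python
from itertools import product

# Hypothetical search space: each parameter is given as a list of candidate values.
search_space = {
    "fit_algorithm": ["ridge", "lasso"],
    "regularization_strength": [0.3, 0.5, 0.7],
}

# Every combination of one value per parameter becomes one candidate configuration.
names = list(search_space)
candidates = [dict(zip(names, combo)) for combo in product(*search_space.values())]

n_folds = 3  # same number of CV folds as in the output above
total_fits = n_folds * len(candidates)
print(f"Fitting {n_folds} folds for each of {len(candidates)} candidates, "
      f"totalling {total_fits} fits")
```

Because the number of candidates multiplies with every list you add, it helps to search over only a few parameters at a time.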

Follow your insights and intuitions, play with the parameters, and you will get good forecasts!
