

Grid Search

Forecast models have many hyperparameters that can significantly affect accuracy. These hyperparameters control different components in the model, including trend, seasonality, events, etc. You can learn more about how to configure the components and hyperparameters in the model tuning tutorial. Here we walk through a step-by-step example of how to use the “grid search” functionality to choose the best set of hyperparameters.

All model templates support grid search. Here we continue the model tuning tutorial example to use the SILVERKITE model on the Peyton Manning data set. The mechanism of using grid search in PROPHET is similar.

 import warnings
 warnings.filterwarnings("ignore")

 from greykite.common.data_loader import DataLoader
 from greykite.common.evaluation import EvaluationMetricEnum
 from greykite.framework.templates.autogen.forecast_config import ComputationParam
 from greykite.framework.templates.autogen.forecast_config import EvaluationMetricParam
 from greykite.framework.templates.autogen.forecast_config import EvaluationPeriodParam
 from greykite.framework.templates.autogen.forecast_config import ForecastConfig
 from greykite.framework.templates.autogen.forecast_config import ModelComponentsParam
 from greykite.framework.templates.forecaster import Forecaster
 from greykite.framework.utils.result_summary import summarize_grid_search_results

 # Loads dataset into pandas DataFrame
 dl = DataLoader()
 df = dl.load_peyton_manning()
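
As loaded by DataLoader, the Peyton Manning data set has a timestamp column ts and a value column y (log-transformed daily Wikipedia page views). A quick, optional sanity check of the loaded frame:

 # Optional sanity check of the loaded data.
 # The frame has a timestamp column ``ts`` and a value column ``y``.
 print(df.shape)
 print(df.head())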

Grid search hyperparameters

In the model tuning tutorial we learned how the components affect the prediction and how to choose candidate components. We also learned how to interpret the cross-validation results for one set of hyperparameters. In this section, we go over the grid search functionality, which compares different sets of hyperparameters by running cross-validation on each of them automatically.

In the ModelComponentsParam class, each attribute contains a dictionary mapping parameter names to parameter values. You may specify either a specific parameter value to use, or a list of values to explore via grid search. Grid search is done over every possible combination of hyperparameters across the lists.

Note

Lists may only be provided as these attributes’ parameter values; they may not be nested inside a parameter value that is itself a dictionary. For example, seasonality is an attribute of ModelComponentsParam with parameter names yearly_seasonality, quarterly_seasonality, etc., and we can provide lists as the values of these names. Likewise, changepoints is an attribute with parameter names changepoints_dict and seasonality_changepoints_dict, both of which take dictionaries as values. We can provide lists of dictionaries as the values, but within each dictionary we are not allowed to further wrap parameters in lists.
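
To make this rule concrete, here is a small illustrative sketch of valid and invalid ways to specify candidates (the parameter values are examples only):

 # Valid: a list of values for a parameter name.
 seasonality = {
     "yearly_seasonality": [10, 20]
 }

 # Valid: a list of dictionaries for a parameter name that takes a dictionary.
 changepoints = {
     "changepoints_dict": [
         None,
         {"method": "auto"}
     ]
 }

 # Invalid: a list nested inside the dictionary value is not allowed.
 # changepoints = {
 #     "changepoints_dict": {"method": ["auto", "uniform"]}
 # }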

Cross-validation will be performed over these sets of hyperparameters, and the best set of hyperparameters will be selected based on the metric you pick, specified by cv_selection_metric in EvaluationMetricParam.

Now consider that we want to compare different yearly seasonalities (10 or 20), trend changepoints (None or “auto”) and fit algorithms (linear or ridge), while keeping all other model components the same. We could specify:

 seasonality = {
     "yearly_seasonality": [10, 20],  # yearly seasonality could be 10 or 20
     "quarterly_seasonality": False,
     "monthly_seasonality": False,
     "weekly_seasonality": False,
     "daily_seasonality": False
 }

 changepoints = {
     # Changepoints could be None or auto.
     "changepoints_dict": [
         None,
         {"method": "auto"}
     ]
 }

 # Specifies custom parameters
 custom = {
     "fit_algorithm_dict": [
         {"fit_algorithm": "ridge"},
         {"fit_algorithm": "linear", "fit_algorithm_params": dict(missing="drop")}
     ]
 }

 # Specifies the model components
 # Could leave the other components as default,
 # or specify them in the normal way.
 model_components = ModelComponentsParam(
     seasonality=seasonality,
     changepoints=changepoints,
     custom=custom
 )

 # Specifies the metrics
 evaluation_metric = EvaluationMetricParam(
     # The metrics in ``cv_report_metrics`` will be calculated and reported.
     cv_report_metrics=[EvaluationMetricEnum.MeanAbsolutePercentError.name,
                        EvaluationMetricEnum.MeanSquaredError.name],
     # The ``cv_selection_metric`` will be used to select the best set of hyperparameters.
     # It will be added to ``cv_report_metrics`` if it's not there.
     cv_selection_metric=EvaluationMetricEnum.MeanAbsolutePercentError.name
 )

 # Specifies the forecast configuration.
 # You could also specify ``forecast_horizon``, ``metadata_param``, etc.
 config = ForecastConfig(
     model_components_param=model_components,
     evaluation_metric_param=evaluation_metric
 )

For the configuration above, all other model component parameters are kept the same, while yearly seasonality, changepoints, and fit algorithm have two options each. The model will automatically run cross-validation over the 8 cases:

  • yearly seasonality = 10, no changepoints, fit algorithm = “linear”.

  • yearly seasonality = 20, no changepoints, fit algorithm = “linear”.

  • yearly seasonality = 10, automatic changepoints, fit algorithm = “linear”.

  • yearly seasonality = 20, automatic changepoints, fit algorithm = “linear”.

  • yearly seasonality = 10, no changepoints, fit algorithm = “ridge”.

  • yearly seasonality = 20, no changepoints, fit algorithm = “ridge”.

  • yearly seasonality = 10, automatic changepoints, fit algorithm = “ridge”.

  • yearly seasonality = 20, automatic changepoints, fit algorithm = “ridge”.

The CV test scores will be reported for all 8 cases using the metrics in cv_report_metrics, and the final model will be trained on the best set of hyperparameters according to the cv_selection_metric.
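
Once the forecast is run with this configuration (the full run is shown at the end of this page), the fitted grid search object is available on the result. A minimal sketch, assuming result comes from Forecaster.run_forecast_config; best_params_ is the standard attribute of the underlying sklearn search object stored in result.grid_search:

 # Sketch: runs the forecast with the configuration above and reports the
 # winning combination according to ``cv_selection_metric`` (MAPE here).
 # ``result.grid_search`` is the fitted sklearn search object, so the
 # standard ``best_params_`` attribute is assumed to be available.
 forecaster = Forecaster()
 result = forecaster.run_forecast_config(df=df, config=config)
 print(result.grid_search.best_params_)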

Selective grid search

Consider the case where you have 6 model components to tune, each with 3 candidates. There would be 3^6 = 729 different sets of hyperparameters to grid search over. An exhaustive search might give convincing results, but the running time piles up quickly.

It’s very common that not all 729 sets of hyperparameters make sense to us, so it would be good not to run all of them. There are two ways to do selective grid search:

  • Setting hyperparameter_budget.

  • Utilizing hyperparameter_override.

Setting hyperparameter_budget

The hyperparameter_budget parameter directly controls how many sets of hyperparameters are used in grid search. If this number is less than the number of all possible sets, the algorithm randomly picks hyperparameter_budget of them. Set hyperparameter_budget to -1 to search all possible sets. You may set the budget in the ComputationParam class. This is a simple way to search a large space of hyperparameters when you are not sure which ones are likely to succeed. After you identify parameter values with better performance, you may run a more precise grid search to fine-tune around these values.

Note

If you have a small number of time series to forecast, we recommend using the model tuning tutorial to help identify good parameter candidates. This is likely more effective than random grid search over a large grid.

 # Specifies the hyperparameter_budget.
 # Randomly picks 3 sets of hyperparameters.
 computation = ComputationParam(
     hyperparameter_budget=3
 )
 # Specifies the forecast configuration.
 # You could also specify ``forecast_horizon``, ``metadata_param``, etc.
 config = ForecastConfig(
     model_components_param=model_components,
     evaluation_metric_param=evaluation_metric,
     computation_param=computation
 )

Utilizing hyperparameter_override

The hyperparameter_override functionality allows us to customize the sets of hyperparameters to search within. To use it, specify the hyperparameter_override parameter in the ModelComponentsParam class. First, the model components are translated into the parameters of the corresponding sklearn Estimator for the template (SimpleSilverkiteEstimator or ProphetEstimator). The parameter name is usually the same as the key, prefixed with “estimator__”, for example “estimator__yearly_seasonality” and “estimator__fit_algorithm_dict” (the ModelComponentsParam attribute name, such as seasonality or custom, is not part of the parameter name). This creates a default hyperparameter_grid dictionary. Then, for each dict in hyperparameter_override, the default grid’s values are replaced by the override values, producing a list of customized grids to search over. Grid search is done across all the grids in the list. For more details, see hyperparameter override. Now assume we have the following parameter options, as above:

  • yearly seasonality orders: 10 and 20.

  • trend changepoints: None and “auto”.

  • fit algorithm: linear and ridge.

We do not want to run all 8 sets of hyperparameters. For example, suppose we think ridge is not needed when there are no changepoints because that model is simple, while linear regression should not be used when there are changepoints because that model is more complex. So we want:

  • for no changepoints we use linear regression only.

  • for automatic changepoints we use ridge regression only.

Then we can specify:

 seasonality = {
     "yearly_seasonality": [10, 20],
     "quarterly_seasonality": False,
     "monthly_seasonality": False,
     "weekly_seasonality": False,
     "daily_seasonality": False
 }

 changepoints = {
     "changepoints_dict": None
 }

 # Specifies custom parameters
 custom = {
     "fit_algorithm_dict": {"fit_algorithm": "linear"}
 }

 # Hyperparameter override can be a list of dictionaries.
 # Each dictionary overrides the default grid to create one search grid.
 override = [
     {},
     {
         "estimator__changepoints_dict": {"method": "auto"},
         "estimator__fit_algorithm_dict": {"fit_algorithm": "ridge"}
     }
 ]

 # Specifies the model components
 # Could leave the other components as default,
 # or specify them in the normal way.
 model_components = ModelComponentsParam(
     seasonality=seasonality,
     changepoints=changepoints,
     custom=custom,
     hyperparameter_override=override
 )

 # Specifies the evaluation period
 evaluation_period = EvaluationPeriodParam(
     test_horizon=365,             # leaves 365 days as testing data
     cv_horizon=365,               # each CV test size is 365 days (same as forecast horizon)
     cv_max_splits=3,              # 3 folds CV
     cv_min_train_periods=365 * 4  # uses at least 4 years for training because we have 8 years of data
 )

 config = ForecastConfig(
     forecast_horizon=365,
     model_components_param=model_components,
     evaluation_metric_param=evaluation_metric,
     evaluation_period_param=evaluation_period
 )

The forecast configuration above specifies the yearly seasonality orders in a list, so both 10 and 20 will be searched. The hyperparameter override list has two elements. The first is an empty dictionary, which keeps the original changepoint and fit algorithm settings in the configuration. The second dictionary overrides the changepoint method with automatic changepoint detection and the fit algorithm with ridge. In total, the model will run 4 different configurations (a conceptual sketch of the translated grids follows the list):

  • yearly seasonality 10, no changepoint, fit algorithm linear.

  • yearly seasonality 20, no changepoint, fit algorithm linear.

  • yearly seasonality 10, automatic changepoints, fit algorithm ridge.

  • yearly seasonality 20, automatic changepoints, fit algorithm ridge.
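
Conceptually, the override mechanism produces two hyperparameter grids, and the cross-product within each grid yields two cases apiece. The sketch below is illustrative only, showing roughly what the translated grids look like rather than the exact internal data structure:

 # Illustrative only: roughly the list of sklearn-style grids produced by
 # the configuration above. The cross-product within each grid gives
 # 2 cases, so 4 in total.
 illustrative_grids = [
     {   # default grid (the empty override changes nothing)
         "estimator__yearly_seasonality": [10, 20],
         "estimator__changepoints_dict": [None],
         "estimator__fit_algorithm_dict": [{"fit_algorithm": "linear"}],
     },
     {   # default grid with the second override applied
         "estimator__yearly_seasonality": [10, 20],
         "estimator__changepoints_dict": [{"method": "auto"}],
         "estimator__fit_algorithm_dict": [{"fit_algorithm": "ridge"}],
     },
 ]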

In this way, we search only the sets of hyperparameters we need, saving a lot of time. Note that the configuration above also specifies the CV splits via EvaluationPeriodParam. We can see the configurations and their evaluations with summarize_grid_search_results.

 # Runs the forecast
 forecaster = Forecaster()
 result = forecaster.run_forecast_config(
     df=df,
     config=config
 )

 # Summarizes the CV results
 cv_results = summarize_grid_search_results(
     grid_search=result.grid_search,
     decimals=1,
     # The below saves space in the printed output. Remove to show all available metrics and columns.
     cv_report_metrics=None,
     column_order=["rank", "mean_test", "split_test", "mean_train", "split_train", "mean_fit_time", "mean_score_time", "params"])
 cv_results["params"] = cv_results["params"].astype(str)
 cv_results.set_index("params", drop=True, inplace=True)
 cv_results

Out:

Fitting 3 folds for each of 4 candidates, totalling 12 fits

params = [('estimator__yearly_seasonality', 10), ('estimator__fit_algorithm_dict', {'fit_algorithm': 'linear'}), ('estimator__changepoints_dict', None)]
    rank_test_MAPE: 2    mean_test_MAPE: 7.3    split_test_MAPE: (5.1, 8.5, 8.3)
    mean_train_MAPE: 4.3    split_train_MAPE: (4.0, 4.3, 4.5)
    mean_fit_time: 1.5    mean_score_time: 0.6

params = [('estimator__yearly_seasonality', 20), ('estimator__fit_algorithm_dict', {'fit_algorithm': 'linear'}), ('estimator__changepoints_dict', None)]
    rank_test_MAPE: 1    mean_test_MAPE: 7.3    split_test_MAPE: (5.1, 8.5, 8.3)
    mean_train_MAPE: 4.2    split_train_MAPE: (3.9, 4.2, 4.5)
    mean_fit_time: 1.5    mean_score_time: 0.6

params = [('estimator__yearly_seasonality', 10), ('estimator__fit_algorithm_dict', {'fit_algorithm': 'ridge'}), ('estimator__changepoints_dict', {'method': 'auto'})]
    rank_test_MAPE: 3    mean_test_MAPE: 7.4    split_test_MAPE: (5.0, 8.5, 8.5)
    mean_train_MAPE: 4.1    split_train_MAPE: (3.9, 4.3, 4.0)
    mean_fit_time: 12.2    mean_score_time: 0.9

params = [('estimator__yearly_seasonality', 20), ('estimator__fit_algorithm_dict', {'fit_algorithm': 'ridge'}), ('estimator__changepoints_dict', {'method': 'auto'})]
    rank_test_MAPE: 4    mean_test_MAPE: 7.5    split_test_MAPE: (5.0, 8.5, 8.9)
    mean_train_MAPE: 4.0    split_train_MAPE: (3.9, 4.2, 3.9)
    mean_fit_time: 15.1    mean_score_time: 1.0
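
After grid search, the final model is refit with the best set of hyperparameters, so the result can be examined like any other forecast result. A minimal sketch, assuming the plotly-based plot() method used in the other quickstart examples:

 # Sketch: inspects the forecast produced by the refit best model.
 # ``result.forecast`` is the usual forecast object from the pipeline.
 import plotly

 fig = result.forecast.plot()
 plotly.io.show(fig)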


Tip

The simple silverkite templates that use SimpleSilverkiteEstimator are the easiest templates for grid search, because they support a list of model templates and a list of ModelComponentsParam. For more information, see Templates.
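
For instance, a single configuration could explore two simple silverkite templates along with a list of model components. A hedged sketch of that capability, not run in this example (SILVERKITE_EMPTY is assumed here as another simple silverkite template name):

 # Sketch: simple silverkite templates accept a list of model templates
 # and a list of ModelComponentsParam; grid search then covers all of them.
 config = ForecastConfig(
     model_template=["SILVERKITE", "SILVERKITE_EMPTY"],
     model_components_param=[model_components],
     evaluation_metric_param=evaluation_metric
 )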

Total running time of the script: (2 minutes 2.659 seconds)

Download Python source code: 0500_grid_search.py

Download Jupyter notebook: 0500_grid_search.ipynb

Gallery generated by Sphinx-Gallery
