Docs¶
All Templates¶
-
class
greykite.framework.templates.forecaster.
Forecaster
(model_template_enum: Type[enum.Enum] = <enum 'ModelTemplateEnum'>, default_model_template_name: str = 'SILVERKITE')[source]¶ The main entry point to create a forecast.
Call the run_forecast_config method to create a forecast. It takes a dataset and forecast configuration parameters.
Notes
This class can create forecasts using any of the model templates in ModelTemplateEnum. Model templates provide suitable default values for the available forecast estimators depending on the data characteristics.
The model template is selected via the config.model_template parameter to run_forecast_config.
To add your own custom algorithms or template classes in our framework, pass model_template_enum and default_model_template_name to the constructor.
-
model_template_enum
: Type[Enum]¶ The available template names. An Enum class where names are template names, and values are of type ModelTemplate.
-
default_model_template_name
: str¶ The default template name if not provided by config.model_template. Should be a name in model_template_enum. Used by __get_template_class.
-
template_class
: Optional[Type[TemplateInterface]]¶ Template class used. Must implement TemplateInterface and be one of the classes in self.model_template_enum. Available for debugging purposes. Set by run_forecast_config.
-
template
: Optional[TemplateInterface]¶ Instance of template_class used to run the forecast. See the docstring of the specific template class used. Available for debugging purposes. Set by run_forecast_config.
-
config
: Optional[ForecastConfig]¶ ForecastConfig passed to the template class. Set by run_forecast_config.
-
pipeline_params
: Optional[Dict]¶ Parameters used to call forecast_pipeline. Available for debugging purposes. Set by run_forecast_config.
-
forecast_result
: Optional[ForecastResult]¶ The forecast result, returned by forecast_pipeline. Set by run_forecast_config.
-
apply_forecast_config
(df: pandas.core.frame.DataFrame, config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → Dict[source]¶ Fetches pipeline parameters from the
df and config, but does not run the pipeline to generate a forecast. run_forecast_config calls this function and also runs the forecast pipeline.
Available for debugging purposes to check pipeline parameters before running a forecast. Sets these attributes for debugging:
pipeline_params: the parameters passed to forecast_pipeline.
template_class, template: the template class used to generate the pipeline parameters.
config: the ForecastConfig passed as input to template class, to translate into pipeline parameters.
Provides basic validation on the compatibility of config.model_template with config.model_components_param.
- Parameters
df (pandas.DataFrame) – Timeseries data to forecast. Contains columns [time_col, value_col], and optional regressor columns. Regressor columns should include future values for prediction.
config (ForecastConfig or None) – Config object for template class to use. See ForecastConfig.
- Returns
pipeline_params – Input to forecast_pipeline.
- Return type
dict [str, any]
-
run_forecast_config
(df: pandas.core.frame.DataFrame, config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → greykite.framework.pipeline.pipeline.ForecastResult[source]¶ Creates a forecast from input data and config. The result is also stored as
self.forecast_result.
- Parameters
df (pandas.DataFrame) – Timeseries data to forecast. Contains columns [time_col, value_col], and optional regressor columns. Regressor columns should include future values for prediction.
config (ForecastConfig) – Config object for template class to use. See ForecastConfig.
- Returns
forecast_result – Forecast result, an object of type ForecastResult. The output of forecast_pipeline, according to the df and config configuration parameters.
- Return type
ForecastResult
-
run_forecast_json
(df: pandas.core.frame.DataFrame, json_str: str = '{}') → greykite.framework.pipeline.pipeline.ForecastResult[source]¶ Calls
forecast_pipeline according to the json_str configuration parameters.
- Parameters
df (pandas.DataFrame) – Timeseries data to forecast. Contains columns [time_col, value_col], and optional regressor columns. Regressor columns should include future values for prediction.
json_str (str) – JSON string of the config object for Forecast to use. See ForecastConfig.
- Returns
forecast_result – Forecast result. The output of forecast_pipeline, called using the template class with specified configuration. See ForecastResult for details.
- Return type
ForecastResult
-
dump_forecast_result
(destination_dir, object_name='object', dump_design_info=True, overwrite_exist_dir=False)[source]¶ Dumps
self.forecast_result to local pickle files.
- Parameters
destination_dir (str) – The pickle destination directory.
object_name (str) – The stored file name.
dump_design_info (bool, default True) – Whether to dump design info. Design info is a patsy class that includes the design matrix information. It takes longer to dump design info.
overwrite_exist_dir (bool, default False) – What to do when destination_dir already exists. If True, the existing directory is removed first.
- Returns
This function writes to local files and does not return anything.
-
load_forecast_result
(source_dir, load_design_info=True)[source]¶ Loads
self.forecast_result from local files created by self.dump_forecast_result.
- Parameters
source_dir (str) – The source file directory.
load_design_info (bool, default True) – Whether to load design info. Design info is a patsy class that includes the design matrix information. It takes longer to load design info.
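The dump and load pair can be pictured as a pickle round trip through a directory. The following is a minimal standard-library sketch of that pattern, not the library's implementation. dump_object and load_object are hypothetical helpers, the ".pkl" file naming is invented for illustration, and raising FileExistsError when the directory exists and overwrite_exist_dir is False is an assumption, since the reference only states what happens when the flag is True.

```python
import pickle
import shutil
from pathlib import Path
from typing import Any


def dump_object(obj: Any, destination_dir: str, object_name: str = "object",
                overwrite_exist_dir: bool = False) -> Path:
    """Pickles obj into destination_dir/object_name.pkl and returns the file path."""
    directory = Path(destination_dir)
    if directory.exists():
        if not overwrite_exist_dir:
            # Assumed behavior: refuse to touch an existing directory.
            raise FileExistsError(f"{directory} already exists.")
        shutil.rmtree(directory)  # Remove the original directory, as with the True setting.
    directory.mkdir(parents=True)
    target = directory / f"{object_name}.pkl"
    with target.open("wb") as handle:
        pickle.dump(obj, handle)
    return target


def load_object(source_dir: str, object_name: str = "object") -> Any:
    """Loads the object written by dump_object from source_dir."""
    with (Path(source_dir) / f"{object_name}.pkl").open("rb") as handle:
        return pickle.load(handle)
```

As with any pickle-based storage, files should only be loaded from sources you trust, and loading generally needs the same class definitions that were available when the object was dumped.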
-
-
class
greykite.framework.templates.model_templates.
ModelTemplate
(template_class: Type[greykite.framework.templates.template_interface.TemplateInterface], description: str)[source]¶ A model template consists of a template class, a description, and a name.
This class holds the template class and description. The model template name is the member name in greykite.framework.templates.model_templates.ModelTemplateEnum.
-
template_class
: Type[greykite.framework.templates.template_interface.TemplateInterface]¶ A class that implements the template interface.
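The relationship between a template name, its ModelTemplate value, and a default name can be sketched with the standard library alone. Every class and function below is a hypothetical stand-in written for illustration (including the local TemplateInterface and ModelTemplate); none of it is imported from or copied from the library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Type


class TemplateInterface:
    """Hypothetical stand-in for the real template interface."""


class MyTemplate(TemplateInterface):
    """Hypothetical custom template class."""


@dataclass(frozen=True)
class ModelTemplate:
    """Stand-in holding the two documented fields."""
    template_class: Type[TemplateInterface]
    description: str


class MyModelTemplateEnum(Enum):
    # Member name = template name; member value = ModelTemplate.
    MY_BASELINE = ModelTemplate(MyTemplate, "Baseline configuration.")
    MY_TUNED = ModelTemplate(MyTemplate, "Tuned configuration, same template class.")


def resolve_template_class(name: Optional[str],
                           default_name: str = "MY_BASELINE") -> Type[TemplateInterface]:
    """Returns the template class for name, falling back to default_name when name is None."""
    chosen = default_name if name is None else name
    try:
        return MyModelTemplateEnum[chosen].value.template_class
    except KeyError:
        valid = ", ".join(member.name for member in MyModelTemplateEnum)
        raise ValueError(f"Unknown model template {chosen!r}. Valid names: {valid}") from None
```

One detail of Python's Enum worth knowing here: a member whose value compares equal to an earlier member's value becomes an alias of that member. In this sketch the two members share a template class, so their differing descriptions are what keep them distinct.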
-
-
class
greykite.framework.templates.model_templates.
ModelTemplateEnum
(value)[source]¶ Available model templates.
Enumerates the possible values for the model_template attribute of ForecastConfig.
The value has type ModelTemplate which contains:
the template class that recognizes the model_template. Template classes implement the TemplateInterface interface.
a plain-text description of what the model_template is for.
The description should be unique across enum members. The template class can be shared, because a template class can recognize multiple model templates. For example, the same template class may use different default values for ForecastConfig.model_components_param depending on ForecastConfig.model_template.
Notes
The template classes SilverkiteTemplate and ProphetTemplate recognize only the model templates explicitly enumerated here.
However, the SimpleSilverkiteTemplate template class allows additional model templates to be specified generically. Any object of type SimpleSilverkiteTemplateOptions can be used as the model_template. These generic model templates are valid but not enumerated here.
-
SILVERKITE
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Silverkite model with automatic growth, seasonality, holidays, and interactions. Best for hourly and daily frequencies.Uses `SimpleSilverkiteEstimator`.')¶ Silverkite model with automatic growth, seasonality, holidays, and interactions. Best for hourly and daily frequencies. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_WITH_AR
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Has the same config as ``SILVERKITE`` except for adding autoregression. Best for short-term daily forecasts. Uses `SimpleSilverkiteEstimator`.')¶ Has the same config as
SILVERKITE except for adding autoregression. Best for short-term daily forecasts. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_DAILY_1_CONFIG_1
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Config 1 in template ``SILVERKITE_DAILY_1``. Compared to ``SILVERKITE``, it adds change points and uses parameters specifically tuned for daily data and 1-day forecast.')¶ Config 1 in template
SILVERKITE_DAILY_1. Compared to SILVERKITE, it adds change points and uses parameters specifically tuned for daily data and 1-day forecast.
-
SILVERKITE_DAILY_1_CONFIG_2
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Config 2 in template ``SILVERKITE_DAILY_1``. Compared to ``SILVERKITE``, it adds change points and uses parameters specifically tuned for daily data and 1-day forecast.')¶ Config 2 in template
SILVERKITE_DAILY_1. Compared to SILVERKITE, it adds change points and uses parameters specifically tuned for daily data and 1-day forecast.
-
SILVERKITE_DAILY_1_CONFIG_3
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Config 3 in template ``SILVERKITE_DAILY_1``. Compared to ``SILVERKITE``, it adds change points and uses parameters specifically tuned for daily data and 1-day forecast.')¶ Config 3 in template
SILVERKITE_DAILY_1. Compared to SILVERKITE, it adds change points and uses parameters specifically tuned for daily data and 1-day forecast.
-
SILVERKITE_DAILY_1
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Silverkite model specifically tuned for daily data and 1-day forecast. Contains 3 candidate configs for grid search, optimized the seasonality and changepoint parameters.')¶ Silverkite model specifically tuned for daily data and 1-day forecast. Contains 3 candidate configs for grid search, optimized the seasonality and changepoint parameters.
-
SILVERKITE_DAILY_90
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Silverkite model specifically tuned for daily data with 90 days forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses `SimpleSilverkiteEstimator`.')¶ Silverkite model specifically tuned for daily data with 90 days forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_WEEKLY
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Silverkite model specifically tuned for weekly data. Contains 4 hyperparameter combinations for grid search. Uses `SimpleSilverkiteEstimator`.')¶ Silverkite model specifically tuned for weekly data. Contains 4 hyperparameter combinations for grid search. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_HOURLY_1
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Silverkite model specifically tuned for hourly data with 1 hour forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses `SimpleSilverkiteEstimator`.')¶ Silverkite model specifically tuned for hourly data with 1 hour forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_HOURLY_24
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Silverkite model specifically tuned for hourly data with 24 hours (1 day) forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses `SimpleSilverkiteEstimator`.')¶ Silverkite model specifically tuned for hourly data with 24 hours (1 day) forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_HOURLY_168
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Silverkite model specifically tuned for hourly data with 168 hours (1 week) forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses `SimpleSilverkiteEstimator`.')¶ Silverkite model specifically tuned for hourly data with 168 hours (1 week) forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_HOURLY_336
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Silverkite model specifically tuned for hourly data with 336 hours (2 weeks) forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses `SimpleSilverkiteEstimator`.')¶ Silverkite model specifically tuned for hourly data with 336 hours (2 weeks) forecast horizon. Contains 4 hyperparameter combinations for grid search. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_EMPTY
= ModelTemplate(template_class=<class 'greykite.framework.templates.simple_silverkite_template.SimpleSilverkiteTemplate'>, description='Silverkite model with no component included by default. Fits only a constant intercept. Select and customize this template to add only the terms you want. Uses `SimpleSilverkiteEstimator`.')¶ Silverkite model with no component included by default. Fits only a constant intercept. Select and customize this template to add only the terms you want. Uses SimpleSilverkiteEstimator.
-
SK
= ModelTemplate(template_class=<class 'greykite.framework.templates.silverkite_template.SilverkiteTemplate'>, description='Silverkite model with low-level interface. For flexible model tuning if SILVERKITE template is not flexible enough. Not for use out-of-the-box: customization is needed for good performance. Uses `SilverkiteEstimator`.')¶ Silverkite model with low-level interface. For flexible model tuning if SILVERKITE template is not flexible enough. Not for use out-of-the-box: customization is needed for good performance. Uses SilverkiteEstimator.
-
PROPHET
= ModelTemplate(template_class=<class 'greykite.framework.templates.prophet_template.ProphetTemplate'>, description='Prophet model with growth, seasonality, holidays, additional regressors and prediction intervals. Uses `ProphetEstimator`.')¶ Prophet model with growth, seasonality, holidays, additional regressors and prediction intervals. Uses ProphetEstimator.
-
AUTO_ARIMA
= ModelTemplate(template_class=<class 'greykite.framework.templates.auto_arima_template.AutoArimaTemplate'>, description='Auto ARIMA model with fit and prediction intervals. Uses `AutoArimaEstimator`.')¶ ARIMA model with automatic order selection. Uses AutoArimaEstimator.
-
SILVERKITE_TWO_STAGE
= ModelTemplate(template_class=<class 'greykite.framework.templates.silverkite_multistage_template.SilverkiteMultistageTemplate'>, description="SilverkiteMultistageTemplate's default model template. A two-stage model. The first step takes a longer history and learns the long-term effects, while the second step takes a shorter history and learns the short-term residuals.")¶ SilverkiteMultistage model’s default model template. A two-stage model. The first step takes a longer history and learns the long-term effects, while the second step takes a shorter history and learns the short-term residuals.
-
SILVERKITE_MULTISTAGE_EMPTY
= ModelTemplate(template_class=<class 'greykite.framework.templates.silverkite_multistage_template.SilverkiteMultistageTemplate'>, description='Empty configuration for Silverkite Multistage. All parameters will be exactly what user inputs. Not to be used without overriding.')¶ Empty configuration for Silverkite Multistage. All parameters will be exactly what user inputs. Not to be used without overriding.
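Besides the enumerated names, the SimpleSilverkiteTemplate defaults in this reference contain generic template names such as DAILY_SEAS_LTQM_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO. These appear to follow a frequency prefix followed by KEY_VALUE pairs. The sketch below decodes such a name under that assumption and looks up one component in an excerpt of the documented SEAS defaults for daily data. parse_template_name is a hypothetical helper; the library's own parsing may differ.

```python
from typing import Dict

# Excerpt of the documented 'SEAS' -> 'DAILY' defaults (two of the available options).
SEAS_DAILY = {
    "LTQM": {"yearly_seasonality": 8, "quarterly_seasonality": 3, "monthly_seasonality": 2,
             "weekly_seasonality": 3, "daily_seasonality": 0},
    "NM": {"yearly_seasonality": 15, "quarterly_seasonality": 0, "monthly_seasonality": 0,
           "weekly_seasonality": 3, "daily_seasonality": 0},
}


def parse_template_name(name: str) -> Dict[str, str]:
    """Splits 'FREQ_KEY1_VAL1_KEY2_VAL2...' into {'FREQ': ..., 'KEY1': 'VAL1', ...}.

    Assumes every key and value is a single underscore-free token.
    """
    tokens = name.split("_")
    if len(tokens) % 2 == 0:
        raise ValueError(f"Expected a frequency followed by key/value pairs: {name!r}")
    parsed = {"FREQ": tokens[0]}
    for key, value in zip(tokens[1::2], tokens[2::2]):
        parsed[key] = value
    return parsed


parsed = parse_template_name(
    "DAILY_SEAS_LTQM_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO")
seasonality = SEAS_DAILY[parsed["SEAS"]]
```

Reading the name this way makes the candidate configs of the multi-template entries easier to compare, since each differs from its neighbors in only a few key/value pairs.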
-
class
greykite.framework.templates.autogen.forecast_config.
ForecastConfig
(computation_param: Optional[greykite.framework.templates.autogen.forecast_config.ComputationParam] = None, coverage: Optional[float] = None, evaluation_metric_param: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam] = None, evaluation_period_param: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam] = None, forecast_horizon: Optional[int] = None, forecast_one_by_one: Optional[Union[bool, int, List[int]]] = None, metadata_param: Optional[greykite.framework.templates.autogen.forecast_config.MetadataParam] = None, model_components_param: Optional[Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[Optional[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]]] = None, model_template: Optional[Union[str, dataclasses.dataclass, List[Union[str, dataclasses.dataclass]]]] = None)[source]¶ Config for providing parameters to the Forecast library
-
computation_param
: Optional[greykite.framework.templates.autogen.forecast_config.ComputationParam] = None¶ How to compute the result. See
ComputationParam.
-
coverage
: Optional[float] = None¶ Intended coverage of the prediction bands (0.0 to 1.0). If None, the upper/lower predictions are not returned.
-
evaluation_metric_param
: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam] = None¶ What metrics to evaluate. See
EvaluationMetricParam.
-
evaluation_period_param
: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam] = None¶ How to split data for evaluation. See
EvaluationPeriodParam.
-
forecast_horizon
: Optional[int] = None¶ Number of periods to forecast into the future. Must be > 0. If None, default is determined from input data frequency.
-
forecast_one_by_one
: Optional[Union[bool, int, List[int]]] = None¶ The options to activate the forecast one-by-one algorithm. See
OneByOneEstimator. Can be boolean, int, or list of int. If int, it has to be less than or equal to the forecast horizon. If list of int, the sum has to be the forecast horizon.
-
metadata_param
: Optional[greykite.framework.templates.autogen.forecast_config.MetadataParam] = None¶ Information about the input data. See
MetadataParam.
-
model_components_param
: Optional[Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[Optional[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]]] = None¶ Parameters to tune the model. Typically a single ModelComponentsParam, but the SimpleSilverkiteTemplate template also allows a list of ModelComponentsParam for grid search. A single ModelComponentsParam corresponds to one grid, and a list corresponds to a list of grids. See
ModelComponentsParam.
-
model_template
: Optional[Union[str, dataclasses.dataclass, List[Union[str, dataclasses.dataclass]]]] = None¶ Name of the model template. Typically a single string, but the SimpleSilverkiteTemplate template also allows a list of strings for grid search. See ModelTemplateEnum for valid names.
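The numeric constraints stated for coverage, forecast_horizon, and forecast_one_by_one can be written down as a small check. This is a sketch with a hypothetical helper name, check_forecast_options, and is not the library's validation code. It assumes an explicit integer horizon (the reference allows None, in which case a default is derived from the data frequency) and treats the coverage bounds as inclusive.

```python
from typing import List, Optional, Union


def check_forecast_options(
        forecast_horizon: int,
        coverage: Optional[float] = None,
        forecast_one_by_one: Optional[Union[bool, int, List[int]]] = None) -> None:
    """Raises ValueError if the options violate the documented constraints."""
    if forecast_horizon <= 0:
        raise ValueError("forecast_horizon must be > 0.")
    if coverage is not None and not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0.0 and 1.0.")
    # bool is a subclass of int in Python, so handle None and bool before int.
    if forecast_one_by_one is None or isinstance(forecast_one_by_one, bool):
        return
    if isinstance(forecast_one_by_one, int):
        if forecast_one_by_one > forecast_horizon:
            raise ValueError("An int forecast_one_by_one must be <= forecast_horizon.")
        return
    if sum(forecast_one_by_one) != forecast_horizon:
        raise ValueError("A list forecast_one_by_one must sum to forecast_horizon.")
```

For example, with a horizon of 7, the list [1, 2, 4] passes because it sums to 7, while [1, 2] does not.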
-
-
class
greykite.framework.templates.autogen.forecast_config.
MetadataParam
(anomaly_info: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None, date_format: Optional[str] = None, freq: Optional[str] = None, time_col: Optional[str] = None, train_end_date: Optional[str] = None, value_col: Optional[str] = None)[source]¶ Properties of the input data
-
anomaly_info
: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None¶ Anomaly adjustment info. Anomalies in
df are corrected before any forecasting is done. If None, no adjustments are made. See forecast_pipeline.
-
train_end_date
: Optional[str] = None¶ Last date to use for fitting the model. Forecasts are generated after this date. If None, it is set to the last date with a non-null value in value_col of df. See forecast_pipeline.
-
-
class
greykite.framework.templates.autogen.forecast_config.
EvaluationMetricParam
(agg_func: Optional[Callable] = None, agg_periods: Optional[int] = None, cv_report_metrics: Optional[Union[str, List[str]]] = None, cv_selection_metric: Optional[str] = None, null_model_params: Optional[Dict[str, Any]] = None, relative_error_tolerance: Optional[float] = None)[source]¶ What metrics to evaluate
-
agg_func
: Optional[Callable] = None¶ See
forecast_pipeline.
-
agg_periods
: Optional[int] = None¶ See
forecast_pipeline.
-
cv_report_metrics
: Optional[Union[str, List[str]]] = None¶ See score_func in
forecast_pipeline.
-
cv_selection_metric
: Optional[str] = None¶ See score_func in
forecast_pipeline.
-
null_model_params
: Optional[Dict[str, Any]] = None¶ See
forecast_pipeline.
-
relative_error_tolerance
: Optional[float] = None¶ See
forecast_pipeline.
-
-
class
greykite.framework.templates.autogen.forecast_config.
EvaluationPeriodParam
(cv_expanding_window: Optional[bool] = None, cv_horizon: Optional[int] = None, cv_max_splits: Optional[int] = None, cv_min_train_periods: Optional[int] = None, cv_periods_between_splits: Optional[int] = None, cv_periods_between_train_test: Optional[int] = None, cv_use_most_recent_splits: Optional[bool] = None, periods_between_train_test: Optional[int] = None, test_horizon: Optional[int] = None)[source]¶ How to split data for evaluation.
-
cv_expanding_window
: Optional[bool] = None¶ See
forecast_pipeline.
-
cv_horizon
: Optional[int] = None¶ See
forecast_pipeline.
-
cv_max_splits
: Optional[int] = None¶ See
forecast_pipeline.
-
cv_min_train_periods
: Optional[int] = None¶ See
forecast_pipeline.
-
cv_periods_between_splits
: Optional[int] = None¶ See
forecast_pipeline.
-
cv_periods_between_train_test
: Optional[int] = None¶ See
forecast_pipeline.
-
cv_use_most_recent_splits
: Optional[bool] = None¶ See
forecast_pipeline.
-
test_horizon
: Optional[int] = None¶ See
forecast_pipeline.
-
-
class
greykite.framework.templates.autogen.forecast_config.
ModelComponentsParam
(autoregression: Optional[Dict[str, Any]] = None, changepoints: Optional[Dict[str, Any]] = None, custom: Optional[Dict[str, Any]] = None, events: Optional[Dict[str, Any]] = None, growth: Optional[Dict[str, Any]] = None, hyperparameter_override: Optional[Union[Dict, List[Optional[Dict]]]] = None, regressors: Optional[Dict[str, Any]] = None, lagged_regressors: Optional[Dict[str, Any]] = None, seasonality: Optional[Dict[str, Any]] = None, uncertainty: Optional[Dict[str, Any]] = None)[source]¶ Parameters to tune the model.
-
autoregression
: Optional[Dict[str, Any]] = None¶ For modeling autoregression, see template for details
-
custom
: Optional[Dict[str, Any]] = None¶ Additional parameters used by template, see template for details
-
hyperparameter_override
: Optional[Union[Dict, List[Optional[Dict]]]] = None¶ After the above model components are used to create a hyperparameter grid, the result is updated by this dictionary, to create new keys or override existing ones. Allows for complete customization of the grid search.
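The override step described for hyperparameter_override amounts to a dictionary update applied to the generated grid. The sketch below illustrates this with plain dictionaries. apply_override is a hypothetical helper and the grid keys are invented for illustration. Treating a list of overrides as producing one grid per entry, with None leaving the grid unchanged, is an assumption based on the documented type, not confirmed library behavior.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional, Union

Grid = Dict[str, List[Any]]


def apply_override(grid: Grid,
                   override: Optional[Union[Grid, List[Optional[Grid]]]]) -> List[Grid]:
    """Returns one updated copy of grid per override entry; the input grid is not mutated."""
    if override is None:
        return [deepcopy(grid)]
    overrides = override if isinstance(override, list) else [override]
    result = []
    for item in overrides:
        updated = deepcopy(grid)
        updated.update(item or {})  # New keys are added, existing keys are replaced.
        result.append(updated)
    return result


# Hypothetical grid keys, for illustration only.
base_grid = {"estimator__growth_term": ["linear"], "estimator__yearly_seasonality": [8]}
grids = apply_override(
    base_grid,
    [{"estimator__yearly_seasonality": [8, 15]}, None],
)
```

The first resulting grid searches over two yearly seasonality orders, and the second is an unchanged copy of the base grid.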
-
-
class
greykite.framework.templates.autogen.forecast_config.
ComputationParam
(hyperparameter_budget: Optional[int] = None, n_jobs: Optional[int] = None, verbose: Optional[int] = None)[source]¶ How to compute the result.
-
hyperparameter_budget
: Optional[int] = None¶ See
forecast_pipeline.
-
n_jobs
: Optional[int] = None¶ See
forecast_pipeline.
-
verbose
: Optional[int] = None¶ See
forecast_pipeline.
-
Silverkite Template¶
-
class
greykite.framework.templates.simple_silverkite_template.
SimpleSilverkiteTemplate
(constants: greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateConstants = SimpleSilverkiteTemplateConstants(COMMON_MODELCOMPONENTPARAM_PARAMETERS={'SEAS': {'HOURLY': {'LT': {'yearly_seasonality': 8, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 3, 'daily_seasonality': 5}, 'NM': {'yearly_seasonality': 15, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 4, 'daily_seasonality': 8}, 'HV': {'yearly_seasonality': 25, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 6, 'daily_seasonality': 12}, 'LTQM': {'yearly_seasonality': 8, 'quarterly_seasonality': 2, 'monthly_seasonality': 2, 'weekly_seasonality': 3, 'daily_seasonality': 5}, 'NMQM': {'yearly_seasonality': 15, 'quarterly_seasonality': 3, 'monthly_seasonality': 3, 'weekly_seasonality': 4, 'daily_seasonality': 8}, 'HVQM': {'yearly_seasonality': 25, 'quarterly_seasonality': 4, 'monthly_seasonality': 4, 'weekly_seasonality': 6, 'daily_seasonality': 12}, 'NONE': {'yearly_seasonality': 0, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 0, 'daily_seasonality': 0}}, 'DAILY': {'LT': {'yearly_seasonality': 8, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 3, 'daily_seasonality': 0}, 'NM': {'yearly_seasonality': 15, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 3, 'daily_seasonality': 0}, 'HV': {'yearly_seasonality': 25, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 4, 'daily_seasonality': 0}, 'LTQM': {'yearly_seasonality': 8, 'quarterly_seasonality': 3, 'monthly_seasonality': 2, 'weekly_seasonality': 3, 'daily_seasonality': 0}, 'NMQM': {'yearly_seasonality': 15, 'quarterly_seasonality': 4, 'monthly_seasonality': 4, 'weekly_seasonality': 3, 'daily_seasonality': 0}, 'HVQM': {'yearly_seasonality': 25, 'quarterly_seasonality': 6, 'monthly_seasonality': 4, 'weekly_seasonality': 4, 
'daily_seasonality': 0}, 'NONE': {'yearly_seasonality': 0, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 0, 'daily_seasonality': 0}}, 'WEEKLY': {'LT': {'yearly_seasonality': 8, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 0, 'daily_seasonality': 0}, 'NM': {'yearly_seasonality': 15, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 0, 'daily_seasonality': 0}, 'HV': {'yearly_seasonality': 25, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 0, 'daily_seasonality': 0}, 'LTQM': {'yearly_seasonality': 8, 'quarterly_seasonality': 2, 'monthly_seasonality': 2, 'weekly_seasonality': 0, 'daily_seasonality': 0}, 'NMQM': {'yearly_seasonality': 15, 'quarterly_seasonality': 3, 'monthly_seasonality': 3, 'weekly_seasonality': 0, 'daily_seasonality': 0}, 'HVQM': {'yearly_seasonality': 25, 'quarterly_seasonality': 4, 'monthly_seasonality': 4, 'weekly_seasonality': 0, 'daily_seasonality': 0}, 'NONE': {'yearly_seasonality': 0, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 0, 'daily_seasonality': 0}}}, 'GR': {'LINEAR': {'growth_term': 'linear'}, 'NONE': {'growth_term': None}}, 'CP': {'HOURLY': {'LT': {'method': 'auto', 'resample_freq': 'D', 'regularization_strength': 0.6, 'potential_changepoint_distance': '7D', 'no_changepoint_distance_from_end': '30D', 'yearly_seasonality_order': 15, 'yearly_seasonality_change_freq': None}, 'NM': {'method': 'auto', 'resample_freq': 'D', 'regularization_strength': 0.5, 'potential_changepoint_distance': '15D', 'no_changepoint_distance_from_end': '30D', 'yearly_seasonality_order': 15, 'yearly_seasonality_change_freq': '365D'}, 'HV': {'method': 'auto', 'resample_freq': 'D', 'regularization_strength': 0.3, 'potential_changepoint_distance': '15D', 'no_changepoint_distance_from_end': '30D', 'yearly_seasonality_order': 15, 'yearly_seasonality_change_freq': '365D'}, 'NONE': None}, 'DAILY': {'LT': {'method': 'auto', 
'resample_freq': '7D', 'regularization_strength': 0.6, 'potential_changepoint_distance': '15D', 'no_changepoint_distance_from_end': '90D', 'yearly_seasonality_order': 15, 'yearly_seasonality_change_freq': None}, 'NM': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.5, 'potential_changepoint_distance': '15D', 'no_changepoint_distance_from_end': '180D', 'yearly_seasonality_order': 15, 'yearly_seasonality_change_freq': '365D'}, 'HV': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.3, 'potential_changepoint_distance': '15D', 'no_changepoint_distance_from_end': '180D', 'yearly_seasonality_order': 15, 'yearly_seasonality_change_freq': '365D'}, 'NONE': None}, 'WEEKLY': {'LT': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.6, 'potential_changepoint_distance': '14D', 'no_changepoint_distance_from_end': '180D', 'yearly_seasonality_order': 15, 'yearly_seasonality_change_freq': None}, 'NM': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.5, 'potential_changepoint_distance': '14D', 'no_changepoint_distance_from_end': '180D', 'yearly_seasonality_order': 15, 'yearly_seasonality_change_freq': '365D'}, 'HV': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.3, 'potential_changepoint_distance': '14D', 'no_changepoint_distance_from_end': '180D', 'yearly_seasonality_order': 15, 'yearly_seasonality_change_freq': '365D'}, 'NONE': None}}, 'HOL': {'SP1': {'holidays_to_model_separately': 'auto', 'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 1, 'holiday_post_num_days': 1, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, 'SP2': {'holidays_to_model_separately': 'auto', 'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, 'SP4': {'holidays_to_model_separately': 'auto', 'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 4, 
'holiday_post_num_days': 4, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, 'TG': {'holidays_to_model_separately': [], 'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 3, 'holiday_post_num_days': 3, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, 'NONE': {'holidays_to_model_separately': [], 'holiday_lookup_countries': [], 'holiday_pre_num_days': 0, 'holiday_post_num_days': 0, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}}, 'FEASET': {'AUTO': 'auto', 'ON': True, 'OFF': False}, 'ALGO': {'LINEAR': {'fit_algorithm': 'linear', 'fit_algorithm_params': None}, 'RIDGE': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'SGD': {'fit_algorithm': 'sgd', 'fit_algorithm_params': None}, 'LASSO': {'fit_algorithm': 'lasso', 'fit_algorithm_params': None}}, 'AR': {'AUTO': {'autoreg_dict': 'auto', 'simulation_num': 10}, 'OFF': {'autoreg_dict': None, 'simulation_num': 10}}, 'DSI': {'HOURLY': {'AUTO': 5, 'OFF': 0}, 'DAILY': {'AUTO': 0, 'OFF': 0}, 'WEEKLY': {'AUTO': 0, 'OFF': 0}}, 'WSI': {'HOURLY': {'AUTO': 2, 'OFF': 0}, 'DAILY': {'AUTO': 2, 'OFF': 0}, 'WEEKLY': {'AUTO': 0, 'OFF': 0}}}, MULTI_TEMPLATES={'SILVERKITE_DAILY_1': ['SILVERKITE_DAILY_1_CONFIG_1', 'SILVERKITE_DAILY_1_CONFIG_2', 'SILVERKITE_DAILY_1_CONFIG_3'], 'SILVERKITE_DAILY_90': ['DAILY_SEAS_LTQM_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO', 'DAILY_SEAS_LTQM_GR_LINEAR_CP_NONE_HOL_SP2_FEASET_AUTO_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO', 'DAILY_SEAS_LTQM_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_RIDGE_AR_OFF_DSI_AUTO_WSI_AUTO', 'DAILY_SEAS_NM_GR_LINEAR_CP_LT_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_OFF_DSI_AUTO_WSI_AUTO'], 'SILVERKITE_WEEKLY': ['WEEKLY_SEAS_NM_GR_LINEAR_CP_NONE_HOL_NONE_FEASET_OFF_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO', 'WEEKLY_SEAS_NM_GR_LINEAR_CP_LT_HOL_NONE_FEASET_OFF_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO', 'WEEKLY_SEAS_HV_GR_LINEAR_CP_NM_HOL_NONE_FEASET_OFF_ALGO_RIDGE_AR_OFF_DSI_AUTO_WSI_AUTO', 
'WEEKLY_SEAS_HV_GR_LINEAR_CP_LT_HOL_NONE_FEASET_OFF_ALGO_RIDGE_AR_OFF_DSI_AUTO_WSI_AUTO'], 'SILVERKITE_HOURLY_1': ['HOURLY_SEAS_LT_GR_LINEAR_CP_NONE_HOL_TG_FEASET_AUTO_ALGO_LINEAR_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_LT_HOL_SP4_FEASET_AUTO_ALGO_LINEAR_AR_AUTO', 'HOURLY_SEAS_LT_GR_LINEAR_CP_NM_HOL_SP4_FEASET_OFF_ALGO_RIDGE_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NM_HOL_SP1_FEASET_AUTO_ALGO_RIDGE_AR_AUTO'], 'SILVERKITE_HOURLY_24': ['HOURLY_SEAS_LT_GR_LINEAR_CP_NM_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_AUTO', 'HOURLY_SEAS_LT_GR_LINEAR_CP_NONE_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_LT_HOL_SP1_FEASET_OFF_ALGO_LINEAR_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NM_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_AUTO'], 'SILVERKITE_HOURLY_168': ['HOURLY_SEAS_LT_GR_LINEAR_CP_LT_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_OFF', 'HOURLY_SEAS_LT_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_RIDGE_AR_OFF', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NONE_HOL_SP4_FEASET_OFF_ALGO_LINEAR_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NM_HOL_SP1_FEASET_AUTO_ALGO_RIDGE_AR_OFF'], 'SILVERKITE_HOURLY_336': ['HOURLY_SEAS_LT_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_RIDGE_AR_OFF', 'HOURLY_SEAS_LT_GR_LINEAR_CP_LT_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_OFF', 'HOURLY_SEAS_NM_GR_LINEAR_CP_LT_HOL_SP1_FEASET_AUTO_ALGO_LINEAR_AR_OFF', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NM_HOL_SP1_FEASET_AUTO_ALGO_LINEAR_AR_AUTO']}, SILVERKITE=ModelComponentsParam(autoregression={'autoreg_dict': None, 'simulation_num': 10}, changepoints={'changepoints_dict': None, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': 'auto', 
'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 'auto', 'quarterly_seasonality': 'auto', 'monthly_seasonality': 'auto', 'weekly_seasonality': 'auto', 'daily_seasonality': 'auto'}, uncertainty={'uncertainty_dict': None}), SILVERKITE_WITH_AR=ModelComponentsParam(autoregression={'autoreg_dict': 'auto', 'simulation_num': 10}, changepoints={'changepoints_dict': None, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': 'auto', 'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 'auto', 'quarterly_seasonality': 'auto', 'monthly_seasonality': 'auto', 'weekly_seasonality': 'auto', 'daily_seasonality': 'auto'}, uncertainty={'uncertainty_dict': None}), SILVERKITE_DAILY_1_CONFIG_1=ModelComponentsParam(autoregression={'autoreg_dict': 'auto', 'simulation_num': 10}, changepoints={'changepoints_dict': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.809, 'potential_changepoint_distance': '7D', 'no_changepoint_distance_from_end': '7D', 'yearly_seasonality_order': 8, 
'yearly_seasonality_change_freq': None}, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': ("New Year's Day", 'Chinese New Year', 'Christmas Day', 'Independence Day', 'Thanksgiving', 'Labor Day', 'Good Friday', 'Easter Monday [England, Wales, Northern Ireland]', 'Memorial Day', 'Veterans Day'), 'holiday_lookup_countries': ('UnitedStates', 'UnitedKingdom', 'India', 'France', 'China'), 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 8, 'quarterly_seasonality': 0, 'monthly_seasonality': 7, 'weekly_seasonality': 1, 'daily_seasonality': 0}, uncertainty={'uncertainty_dict': None}), SILVERKITE_DAILY_1_CONFIG_2=ModelComponentsParam(autoregression={'autoreg_dict': 'auto', 'simulation_num': 10}, changepoints={'changepoints_dict': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.624, 'potential_changepoint_distance': '7D', 'no_changepoint_distance_from_end': '17D', 'yearly_seasonality_order': 1, 'yearly_seasonality_change_freq': None}, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 
'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': ("New Year's Day", 'Chinese New Year', 'Christmas Day', 'Independence Day', 'Thanksgiving', 'Labor Day', 'Good Friday', 'Easter Monday [England, Wales, Northern Ireland]', 'Memorial Day', 'Veterans Day'), 'holiday_lookup_countries': ('UnitedStates', 'UnitedKingdom', 'India', 'France', 'China'), 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 1, 'quarterly_seasonality': 0, 'monthly_seasonality': 4, 'weekly_seasonality': 6, 'daily_seasonality': 0}, uncertainty={'uncertainty_dict': None}), SILVERKITE_DAILY_1_CONFIG_3=ModelComponentsParam(autoregression={'autoreg_dict': 'auto', 'simulation_num': 10}, changepoints={'changepoints_dict': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.59, 'potential_changepoint_distance': '7D', 'no_changepoint_distance_from_end': '8D', 'yearly_seasonality_order': 40, 'yearly_seasonality_change_freq': None}, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': ("New Year's Day", 'Chinese New Year', 'Christmas Day', 'Independence Day', 'Thanksgiving', 'Labor Day', 'Good Friday', 'Easter Monday [England, Wales, Northern Ireland]', 'Memorial Day', 'Veterans Day'), 'holiday_lookup_countries': ('UnitedStates', 'UnitedKingdom', 
'India', 'France', 'China'), 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 40, 'quarterly_seasonality': 0, 'monthly_seasonality': 0, 'weekly_seasonality': 2, 'daily_seasonality': 0}, uncertainty={'uncertainty_dict': None}), SILVERKITE_COMPONENT_KEYWORDS=<enum 'SILVERKITE_COMPONENT_KEYWORDS'>, SILVERKITE_EMPTY='DAILY_SEAS_NONE_GR_NONE_CP_NONE_HOL_NONE_FEASET_OFF_ALGO_LINEAR_AR_OFF_DSI_OFF_WSI_OFF', VALID_FREQ=['HOURLY', 'DAILY', 'WEEKLY'], SimpleSilverkiteTemplateOptions=<class 'greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateOptions'>), estimator: greykite.sklearn.estimator.base_forecast_estimator.BaseForecastEstimator = SimpleSilverkiteEstimator())[source]¶ A template for
SimpleSilverkiteEstimator.
Takes input data and optional configuration parameters to customize the model. Returns a set of parameters to call forecast_pipeline.
Notes
The attributes of a ForecastConfig for SimpleSilverkiteEstimator are:
- computation_param: ComputationParam or None, default None
How to compute the result. See ComputationParam.
- coverage: float or None, default None
Intended coverage of the prediction bands (0.0 to 1.0). Same as coverage in forecast_pipeline. You may tune how the uncertainty is computed via model_components.uncertainty[“uncertainty_dict”].
- evaluation_metric_param: EvaluationMetricParam or None, default None
What metrics to evaluate. See EvaluationMetricParam.
- evaluation_period_param: EvaluationPeriodParam or None, default None
How to split data for evaluation. See EvaluationPeriodParam.
- forecast_horizon: int or None, default None
Number of periods to forecast into the future. Must be > 0. If None, the default is determined from the input data frequency. Same as forecast_horizon in forecast_pipeline.
- metadata_param: MetadataParam or None, default None
Information about the input data. See MetadataParam.
- model_components_param: ModelComponentsParam, list [ModelComponentsParam] or None, default None
Parameters to tune the model. See ModelComponentsParam. The fields are dictionaries with the following items.
See inline comments on which values accept lists for grid search.
- seasonality: dict [str, any] or None, optional
Seasonality configuration dictionary, with the following optional keys (the keys are SilverkiteSeasonalityEnum members in lower case).
The keys are parameters of forecast_simple_silverkite. Refer to that function for more details.
"yearly_seasonality": str or bool or int or a list of such values for grid search, default ‘auto’
Determines the yearly seasonality: ‘auto’, True, False, or a number for the Fourier order.
"quarterly_seasonality": str or bool or int or a list of such values for grid search, default ‘auto’
Determines the quarterly seasonality: ‘auto’, True, False, or a number for the Fourier order.
"monthly_seasonality": str or bool or int or a list of such values for grid search, default ‘auto’
Determines the monthly seasonality: ‘auto’, True, False, or a number for the Fourier order.
"weekly_seasonality": str or bool or int or a list of such values for grid search, default ‘auto’
Determines the weekly seasonality: ‘auto’, True, False, or a number for the Fourier order.
"daily_seasonality": str or bool or int or a list of such values for grid search, default ‘auto’
Determines the daily seasonality: ‘auto’, True, False, or a number for the Fourier order.
- growth: dict [str, any] or None, optional
Growth configuration dictionary with the following optional key:
"growth_term": str or None or a list of such values for grid search
How to model the growth. Valid options are “linear”, “quadratic”, “sqrt”, “cubic”, “cuberoot”. All these terms have their origin at the train start date.
- events: dict [str, any] or None, optional
Holiday/events configuration dictionary with the following optional keys:
"holiday_lookup_countries": list [str] or “auto” or None or a list of such values for grid search, default “auto”
The countries that contain the holidays you intend to model (holidays_to_model_separately).
If “auto”, uses a default list of countries that contain the default holidays_to_model_separately. See HOLIDAY_LOOKUP_COUNTRIES_AUTO.
If a list, must be a list of country names.
If None or an empty list, no holidays are modeled.
"holidays_to_model_separately": list [str] or “auto” or ALL_HOLIDAYS_IN_COUNTRIES or None or a list of such values for grid search, default “auto”
Which holidays to include in the model. The model creates a separate key, value for each item in holidays_to_model_separately. The other holidays in the countries are grouped together as a single effect.
If “auto”, uses a default list of important holidays. See HOLIDAYS_TO_MODEL_SEPARATELY_AUTO.
If ALL_HOLIDAYS_IN_COUNTRIES, uses all available holidays in holiday_lookup_countries. This can often create a model that has too many parameters, and should typically be avoided.
If a list, must be a list of holiday names.
If None or an empty list, all holidays in holiday_lookup_countries are grouped together as a single effect.
Use holiday_lookup_countries to provide a list of countries where these holidays occur.
"holiday_pre_num_days": int or a list of such values for grid search, default 2
Model holiday effects for pre_num days before the holiday. The unit is days, not periods. It does not depend on input data frequency.
"holiday_post_num_days"
: int or a list of such values for grid search, default 2
Model holiday effects for post_num days after the holiday. The unit is days, not periods. It does not depend on input data frequency.
"holiday_pre_post_num_dict": dict [str, (int, int)] or None, default None
Overrides pre_num and post_num for each holiday in holidays_to_model_separately. For example, if holidays_to_model_separately contains “Thanksgiving” and “Labor Day”, this parameter can be set to {"Thanksgiving": [1, 3], "Labor Day": [1, 2]}, denoting that the “Thanksgiving” pre_num is 1 and post_num is 3, and “Labor Day” pre_num is 1 and post_num is 2. Holidays not specified use the default given by pre_num and post_num.
"daily_event_df_dict": dict [str, pandas.DataFrame] or None, default None
A dictionary of data frames, each representing events data for the corresponding key. Specifies additional events to include besides the holidays specified above. The format is the same as in forecast. The DataFrame has two columns:
The first column contains event dates. Must be in a format recognized by pandas.to_datetime. Must be at daily frequency for proper join. It is joined against the time in df, converted to a day: pd.to_datetime(pd.DatetimeIndex(df[time_col]).date).
The second column contains the event label for each date.
The column order is important; column names are ignored. The event dates must span their occurrences in both the training and future prediction period.
During modeling, each key in the dictionary is mapped to a categorical variable named f"{EVENT_PREFIX}_{key}", whose value at each timestamp is specified by the corresponding DataFrame.
For example, to manually specify a yearly event on September 1 during a training/forecast period that spans 2020-2022:
daily_event_df_dict = {
    "custom_event": pd.DataFrame({
        "date": ["2020-09-01", "2021-09-01", "2022-09-01"],
        "label": ["is_event", "is_event", "is_event"]
    })
}
It’s possible to specify multiple events in the same df. Two events, "sep" and "oct", are specified below for 2020-2021:
daily_event_df_dict = {
    "custom_event": pd.DataFrame({
        "date": ["2020-09-01", "2020-10-01", "2021-09-01", "2021-10-01"],
        "event_name": ["sep", "oct", "sep", "oct"]
    })
}
Use multiple keys if two events may fall on the same date. These events must be in separate DataFrames:
daily_event_df_dict = {
    "fixed_event": pd.DataFrame({
        "date": ["2020-09-01", "2021-09-01", "2022-09-01"],
        "event_name": "fixed_event"
    }),
    "moving_event": pd.DataFrame({
        "date": ["2020-09-01", "2021-08-28", "2022-09-03"],
        "event_name": "moving_event"
    }),
}
The multiple event specification can be used even if events never overlap. An equivalent specification to the second example:
daily_event_df_dict = {
    "sep": pd.DataFrame({
        "date": ["2020-09-01", "2021-09-01"],
        "event_name": "is_event"
    }),
    "oct": pd.DataFrame({
        "date": ["2020-10-01", "2021-10-01"],
        "event_name": "is_event"
    }),
}
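Because event dates must cover both the training and the forecast period, it can help to generate them rather than type them by hand. The following is a minimal sketch using only the standard library; `yearly_event_dates` is a hypothetical helper, not part of the library, and the resulting columns would still need to be wrapped in a `pandas.DataFrame` before use in `daily_event_df_dict`.

```python
from datetime import date


def yearly_event_dates(month, day, start_year, end_year):
    """Return ISO date strings for a fixed-date yearly event (hypothetical helper).

    One date is produced per year, from start_year through end_year inclusive.
    """
    return [date(year, month, day).isoformat() for year in range(start_year, end_year + 1)]


# Assume training data covers 2020-2021 and the forecast extends into 2022,
# so the event dates must run through 2022 as well.
dates = yearly_event_dates(9, 1, 2020, 2022)

# First column: event dates. Second column: event label.
# Column order matters; column names are ignored.
event_columns = {"date": dates, "label": ["is_event"] * len(dates)}
print(event_columns)
```

Passing `event_columns` to `pd.DataFrame(...)` gives a frame of the same shape as the first example above.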
Note: All these events are automatically added to the model. There is no need to specify them in extra_pred_cols as you would for forecast.
Note: Do not use EVENT_DEFAULT in the second column. This is reserved to indicate dates that do not correspond to an event.
- changepoints: dict [str, dict] or None, optional
Specifies the changepoint configuration. Dictionary with the following optional keys:
"changepoints_dict": dict or None or a list of such values for grid search
Changepoints dictionary passed to forecast_simple_silverkite. A dictionary with the following optional keys:
"method": str
The method to locate changepoints. Valid options:
“uniform”. Places n_changepoints evenly spaced changepoints to allow growth to change.
“custom”. Places changepoints at the specified dates.
“auto”. Automatically detects change points.
Additional keys to provide parameters for each particular method are described below.
"continuous_time_col": str or None
Column to apply growth_func to, to generate changepoint features. Typically, this should match the growth term in the model.
"growth_func": callable or None
Growth function (numeric -> numeric). Changepoint features are created by applying growth_func to “continuous_time_col” with offsets. If None, uses the identity function to use continuous_time_col directly as the growth term.
If changepoints_dict[“method”] == “uniform”, this other key is required:
"n_changepoints": int
Number of changepoints to evenly space across the training period.
If changepoints_dict[“method”] == “custom”, this other key is required:
"dates": list [int or float or str or datetime]
Changepoint dates. Must be parsable by pd.to_datetime. Changepoints are set at the closest time on or after these dates in the dataset.
If changepoints_dict[“method”] == “auto”, optional keys can be passed that match the parameters in find_trend_changepoints (except df, time_col and value_col, which are already known). To add manually specified changepoints to the automatically detected ones, the keys dates, combine_changepoint_min_distance and keep_detected can be specified, which correspond to the three parameters custom_changepoint_dates, min_distance and keep_detected in combine_detected_and_custom_trend_changepoints.
"seasonality_changepoints_dict": dict or None or a list of such values for grid search
Seasonality changepoints dictionary passed to forecast_simple_silverkite. The optional keys are the parameters in find_seasonality_changepoints. You don’t need to provide df, time_col, value_col or trend_changepoints, since they are passed with the class automatically.
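The per-method key requirements for changepoints_dict can be summarized in a small check. This is a sketch under stated assumptions: `REQUIRED_KEYS` and `missing_changepoint_keys` are hypothetical names for illustration, not library functions, and the check covers only the required keys named in the text above.

```python
# Required keys per method, as described above (hypothetical lookup table).
REQUIRED_KEYS = {
    "uniform": {"n_changepoints"},
    "custom": {"dates"},
    "auto": set(),  # all keys are optional for automatic detection
}


def missing_changepoint_keys(changepoints_dict):
    """Return the sorted required keys that are absent for the chosen method."""
    method = changepoints_dict.get("method")
    if method not in REQUIRED_KEYS:
        raise ValueError(f"Unknown changepoint method: {method!r}")
    return sorted(REQUIRED_KEYS[method] - set(changepoints_dict))


uniform = {"method": "uniform", "n_changepoints": 20}
custom = {"method": "custom", "dates": ["2020-03-01", "2020-09-01"]}
# "auto" with detection parameters plus manually added changepoint dates.
auto = {
    "method": "auto",
    "resample_freq": "7D",
    "regularization_strength": 0.5,
    "dates": ["2020-09-01"],
    "combine_changepoint_min_distance": "14D",
    "keep_detected": True,
}

for config in (uniform, custom, auto, {"method": "custom"}):
    print(config["method"], missing_changepoint_keys(config))
```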
- autoregression: dict [str, dict] or None, optional
Specifies the autoregression configuration. Dictionary with the following optional keys:
"autoreg_dict": dict or str or None or a list of such values for grid search
If a dict: A dictionary with arguments for build_autoreg_df. That function’s parameter value_col is inferred from the input of current function self.forecast. Other keys are:
"lag_dict": dict or None
"agg_lag_dict": dict or None
"series_na_fill_func": callable
If a str: The string represents a method, and a dictionary is constructed using that str. Currently the only implemented method is “auto”, which uses __get_default_autoreg_dict to create a dictionary. See more details for the above parameters in build_autoreg_df.
"simulation_num": int, default 10
The number of simulations to use. Applies only if any of the lags in autoreg_dict are smaller than forecast_horizon. In that case, simulations are needed to generate forecasts and prediction intervals.
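The condition under which simulation_num matters can be written as a one-line check. The sketch below is an illustration only: `needs_simulation` is a hypothetical helper, and it looks only at the plain lag orders in "lag_dict", ignoring any aggregated lags.

```python
def needs_simulation(lag_orders, forecast_horizon):
    """True if any lag is smaller than the forecast horizon.

    Such lags refer to values that are not yet observed at forecast time,
    which is the case where simulations are used, per the description above.
    """
    return any(lag < forecast_horizon for lag in lag_orders)


lag_dict = {"orders": [1, 2, 3]}

# Lags 1-3 are shorter than a 7-step horizon, so simulations apply.
print(needs_simulation(lag_dict["orders"], forecast_horizon=7))

# No lag is smaller than the horizon, so simulation_num has no effect.
print(needs_simulation([7, 14], forecast_horizon=7))
```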
- regressors: dict [str, any] or None, optional
Specifies the regressors to include in the model (e.g. macro-economic factors). Dictionary with the following optional key:
"regressor_cols": list [str] or None or a list of such values for grid search
The columns in df to use as regressors. Note that regressor values must be available in df for all prediction dates. Thus, df will contain timestamps for both training and future prediction.
Regressors must be available on all dates.
The response must be available for training dates (metadata[“value_col”]).
Use extra_pred_cols to specify interactions of any model terms with the regressors.
- lagged_regressors: dict [str, dict] or None, optional
Specifies the lagged regressors configuration. Dictionary with the following optional key:
"lagged_regressor_dict": dict or None or a list of such values for grid search
A dictionary with arguments for build_autoreg_df_multi. The keys of the dictionary are the target lagged regressor column names. It can leverage the regressors included in df. The value of each key is either a dict or str. If dict, it has the following keys:
"lag_dict": dict or None
"agg_lag_dict": dict or None
"series_na_fill_func": callable
If str, it represents a method and a dictionary will be constructed using that str. Currently the only implemented method is “auto”, which uses SilverkiteForecast’s __get_default_lagged_regressor_dict to create a dictionary for each lagged regressor. An example:
lagged_regressor_dict = {
    "regressor1": {
        "lag_dict": {"orders": [1, 2, 3]},
        "agg_lag_dict": {
            "orders_list": [[7, 7 * 2, 7 * 3]],
            "interval_list": [(8, 7 * 2)]},
        "series_na_fill_func": lambda s: s.bfill().ffill()},
    "regressor2": "auto"}
Check the docstring of build_autoreg_df_multi for more details for each argument.
- uncertainty: dict [str, dict] or None, optional
Along with coverage, specifies the uncertainty interval configuration. Use coverage to set interval size. Use uncertainty to tune the calculation.
"uncertainty_dict": str or dict or None or a list of such values for grid search
“auto” or a dictionary on how to fit the uncertainty model. If a dictionary, valid keys are:
"uncertainty_method": str
The title of the method. Only "simple_conditional_residuals" is implemented in fit_ml_model, which calculates intervals using residuals.
"params": dict
A dictionary of parameters needed for the requested uncertainty_method. For example, for uncertainty_method="simple_conditional_residuals", see parameters of conf_interval:
"conditional_cols"
"quantiles"
"quantile_estimation_method"
"sample_size_thresh"
"small_sample_size_method"
"small_sample_size_quantile"
The default value for quantiles is inferred from coverage.
If “auto”, see get_silverkite_uncertainty_dict for the default value. If coverage is not None and uncertainty_dict is not provided, then the “auto” setting is used.
If coverage is None and uncertainty_dict is None, then no intervals are returned.
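The interplay between coverage and uncertainty_dict can be sketched as a small decision function. `effective_uncertainty` is a hypothetical name used for illustration, not the library's implementation. The text above does not describe the case where uncertainty_dict is given but coverage is None, so the sketch simply passes the dictionary through in that case, which is an assumption.

```python
def effective_uncertainty(coverage, uncertainty_dict=None):
    """Return the uncertainty setting implied by the two inputs (sketch).

    - explicit uncertainty_dict: used as given (assumption when coverage is None)
    - coverage set, no uncertainty_dict: the "auto" setting
    - neither set: None, meaning no intervals are returned
    """
    if uncertainty_dict is not None:
        return uncertainty_dict
    if coverage is not None:
        return "auto"
    return None


custom = {
    "uncertainty_method": "simple_conditional_residuals",
    "params": {"conditional_cols": ["dow"], "sample_size_thresh": 5},
}

print(effective_uncertainty(0.95))          # falls back to "auto"
print(effective_uncertainty(None))          # no intervals
print(effective_uncertainty(0.95, custom))  # explicit dictionary is kept
```

The column name "dow" in `conditional_cols` is a placeholder value chosen for the example.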
- custom: dict [str, any] or None, optional
Custom parameters that don’t fit the categories above. Dictionary with the following optional keys:
"fit_algorithm_dict": dict or a list of such values for grid search
How to fit the model. A dictionary with the following optional keys.
"fit_algorithm": str, optional, default “ridge”
The type of predictive model used in fitting.
See fit_model_via_design_matrix for available options and their parameters.
"fit_algorithm_params": dict or None, optional, default None
Parameters passed to the requested fit_algorithm. If None, uses the defaults in fit_model_via_design_matrix.
"feature_sets_enabled": dict [str, bool or “auto” or None] or bool or “auto” or None; or a list of such values for grid search
Whether to include interaction terms and categorical variables to increase model flexibility.
If a dict, boolean values indicate whether to include various sets of features in the model. The following keys are recognized (from SilverkiteColumn):
"COLS_HOUR_OF_WEEK": str
Constant hour of week effect
"COLS_WEEKEND_SEAS": str
Daily seasonality interaction with is_weekend
"COLS_DAY_OF_WEEK_SEAS": str
Daily seasonality interaction with day of week
"COLS_TREND_DAILY_SEAS": str
Allow daily seasonality to change over time by is_weekend
"COLS_EVENT_SEAS": str
Allow sub-daily event effects
"COLS_EVENT_WEEKEND_SEAS": str
Allow sub-daily event effect to interact with is_weekend
"COLS_DAY_OF_WEEK": str
Constant day of week effect
"COLS_TREND_WEEKEND": str
Allow trend (growth, changepoints) to interact with is_weekend
"COLS_TREND_DAY_OF_WEEK": str
Allow trend to interact with day of week
"COLS_TREND_WEEKLY_SEAS": str
Allow weekly seasonality to change over time
The following dictionary values are recognized:
True: include the feature set in the model
False: do not include the feature set in the model
None: do not include the feature set in the model
“auto” or not provided: use the default setting based on data frequency and size
If not a dict:
if a boolean, equivalent to a dictionary with all values set to the boolean.
if None, equivalent to a dictionary with all values set to False.
if “auto”, equivalent to a dictionary with all values set to “auto”.
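The equivalence rules for feature_sets_enabled can be made concrete by expanding every accepted input into a full dictionary. This is a sketch of the rules as stated above, not the library's code; `expand_feature_sets` is a hypothetical helper, and the key list is copied from the feature sets named in this section.

```python
FEATURE_SETS = [
    "COLS_HOUR_OF_WEEK", "COLS_WEEKEND_SEAS", "COLS_DAY_OF_WEEK_SEAS",
    "COLS_TREND_DAILY_SEAS", "COLS_EVENT_SEAS", "COLS_EVENT_WEEKEND_SEAS",
    "COLS_DAY_OF_WEEK", "COLS_TREND_WEEKEND", "COLS_TREND_DAY_OF_WEEK",
    "COLS_TREND_WEEKLY_SEAS",
]


def expand_feature_sets(feature_sets_enabled):
    """Expand a feature_sets_enabled value into a dict keyed by feature set."""
    if isinstance(feature_sets_enabled, dict):
        expanded = {}
        for name in FEATURE_SETS:
            value = feature_sets_enabled.get(name, "auto")  # not provided -> "auto"
            expanded[name] = False if value is None else value  # None -> excluded
        return expanded
    if feature_sets_enabled is None:
        return {name: False for name in FEATURE_SETS}
    if isinstance(feature_sets_enabled, bool) or feature_sets_enabled == "auto":
        return {name: feature_sets_enabled for name in FEATURE_SETS}
    raise ValueError(f"Unsupported value: {feature_sets_enabled!r}")


partial = expand_feature_sets({"COLS_DAY_OF_WEEK": True, "COLS_EVENT_SEAS": None})
print(partial["COLS_DAY_OF_WEEK"], partial["COLS_EVENT_SEAS"], partial["COLS_HOUR_OF_WEEK"])
```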
"max_daily_seas_interaction_order"
: int or None or a list of such values for grid search, default 5
Max Fourier order to use for interactions with daily seasonality (COLS_EVENT_SEAS, COLS_EVENT_WEEKEND_SEAS, COLS_WEEKEND_SEAS, COLS_DAY_OF_WEEK_SEAS, COLS_TREND_DAILY_SEAS).
Model includes interaction terms specified by feature_sets_enabled up to the order limited by this value and the available order from seasonality.
"max_weekly_seas_interaction_order": int or None or a list of such values for grid search, default 2
Max Fourier order to use for interactions with weekly seasonality (COLS_TREND_WEEKLY_SEAS).
Model includes interaction terms specified by feature_sets_enabled up to the order limited by this value and the available order from seasonality.
"extra_pred_cols": list [str] or None or a list of such values for grid search, default None
Names of extra predictor columns to pass to forecast_silverkite. The standard interactions can be controlled via the feature_sets_enabled parameter. Accepts any valid patsy model formula term. Can be used to model complex interactions of time features, events, seasonality, changepoints, regressors. Columns should be generated by build_silverkite_features or included with input data. These are added to any features already included by feature_sets_enabled and terms specified by model.
"drop_pred_cols": list [str] or None, default None
Names of predictor columns to be dropped from the final model. Ignored if None.
"explicit_pred_cols": list [str] or None, default None
Names of the explicit predictor columns which will be the only variables in the final model. Note that this overwrites the generated predictors in the model and may include new terms not appearing in the predictors (e.g. interaction terms). Ignored if None.
"min_admissible_value": float or double or int or None, default None
The lowest admissible value for the forecasts and prediction intervals. Any value below this will be mapped back to this value. If None, there is no lower bound.
"max_admissible_value": float or double or int or None, default None
The highest admissible value for the forecasts and prediction intervals. Any value above this will be mapped back to this value. If None, there is no upper bound.
"normalize_method": str or None, default None
The normalization method for feature matrix. Available values are “statistical” and “min_max”.
- hyperparameter_override: dict [str, any] or None or list [dict [str, any] or None], optional
After the above model components are used to create a hyperparameter grid, the result is updated by this dictionary, to create new keys or override existing ones. Allows for complete customization of the grid search.
Keys should have format {named_step}__{parameter_name} for the named steps of the sklearn.pipeline.Pipeline returned by this function. See sklearn.pipeline.Pipeline.
For example:
hyperparameter_override={
    "estimator__silverkite": SimpleSilverkiteForecast(),
    "estimator__silverkite_diagnostics": SilverkiteDiagnostics(),
    "estimator__growth_term": "linear",
    "input__response__null__impute_algorithm": "ts_interpolate",
    "input__response__null__impute_params": {"orders": [7, 14]},
    "input__regressors_numeric__normalize__normalize_algorithm": "RobustScaler",
}
If a list of dictionaries, grid search will be done for each dictionary in the list. Each dictionary in the list overrides the defaults. This enables grid search over specific combinations of parameters to reduce the search space.
For example, the first dictionary could define combinations of parameters for a “complex” model, and the second dictionary could define combinations of parameters for a “simple” model, to prevent mixed combinations of simple and complex.
Or the first dictionary could grid search over fit algorithm, and the second dictionary could use a single fit algorithm and grid search over seasonality.
The result is passed as the param_distributions parameter to sklearn.model_selection.RandomizedSearchCV.
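How an override updates the grid, and how a list of overrides yields one grid per dictionary, can be sketched with plain dictionaries. `apply_override` is a hypothetical helper written for this illustration; the actual merging is done inside the library and may differ in details such as how non-list values are wrapped. The example assumes grid values are given as lists.

```python
def apply_override(grid, hyperparameter_override):
    """Update a hyperparameter grid with one override dict or a list of them.

    A single dict (or None) returns one grid; a list returns one grid per item.
    """
    if isinstance(hyperparameter_override, list):
        return [apply_override(grid, item) for item in hyperparameter_override]
    updated = dict(grid)  # copy, so the base grid is left untouched
    updated.update(hyperparameter_override or {})
    return updated


base_grid = {"estimator__growth_term": ["linear"]}

overrides = [
    # First grid: search over two growth terms.
    {"estimator__growth_term": ["linear", "quadratic"]},
    # Second grid: one growth term, plus a new key for the imputation step.
    {"estimator__growth_term": ["sqrt"],
     "input__response__null__impute_algorithm": ["ts_interpolate"]},
]

for grid in apply_override(base_grid, overrides):
    print(grid)
```

Keeping the two dictionaries separate prevents mixed combinations, for example "quadratic" growth paired with the imputation setting from the second grid.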
- model_template: str, list [str] or None, default None
The simple silverkite template supports single templates, multi templates or a list of single/multi templates. A valid single template must be either SILVERKITE or consist of
{FREQ}_SEAS_{VAL}_GR_{VAL}_CP_{VAL}_HOL_{VAL}_FEASET_{VAL}_ALGO_{VAL}_AR_{VAL}
For example, we have DAILY_SEAS_NM_GR_LINEAR_CP_LT_HOL_NONE_FEASET_ON_ALGO_RIDGE_AR_AUTO. The valid FREQ and VAL can be found at template_defaults. The components stand for seasonality, growth, changepoints_dict, events, feature_sets_enabled, fit_algorithm and autoregression in ModelComponentsParam, which is used in SimpleSilverkiteTemplate. Users are allowed to
Omit any number of component-value pairs, and the omitted will be filled with default values.
Switch the order of different component-value pairs.
A valid multi template must belong to MULTI_TEMPLATES or must be a list of single or multi template names.
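The naming scheme for single templates can be illustrated with a small parser. This is a sketch only: `parse_template_name` is a hypothetical helper, it assumes the frequency comes first and that no value contains an underscore, and it marks omitted components as None rather than guessing the library's defaults. The DSI and WSI keys are included because they appear in the template names listed in the constants.

```python
COMPONENT_KEYS = ("SEAS", "GR", "CP", "HOL", "FEASET", "ALGO", "AR", "DSI", "WSI")


def parse_template_name(name):
    """Split a single template name into its frequency and component values."""
    tokens = name.split("_")
    freq, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"Expected component-value pairs in {name!r}")
    parsed = {"FREQ": freq}
    parsed.update({key: None for key in COMPONENT_KEYS})  # None = use the default
    for key, value in zip(rest[0::2], rest[1::2]):
        if key not in COMPONENT_KEYS:
            raise ValueError(f"Unknown component {key!r} in {name!r}")
        parsed[key] = value
    return parsed


full = parse_template_name("DAILY_SEAS_NM_GR_LINEAR_CP_LT_HOL_NONE_FEASET_ON_ALGO_RIDGE_AR_AUTO")
print(full)

# Pairs may be omitted or reordered without changing the meaning.
print(parse_template_name("DAILY_GR_LINEAR_SEAS_NM") == parse_template_name("DAILY_SEAS_NM_GR_LINEAR"))
```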
-
DEFAULT_MODEL_TEMPLATE
= 'SILVERKITE'¶ The default model template. See
ModelTemplateEnum
. Uses a string to avoid circular imports. Overrides the value from ForecastConfigDefaults
.
-
property
allow_model_template_list
¶ SimpleSilverkiteTemplate allows config.model_template to be a list.
-
property
allow_model_components_param_list
¶ SilverkiteTemplate allows config.model_components_param to be a list.
-
property
constants
¶ Constants used by the template class. Includes the model templates and their default values.
-
get_regressor_cols
()[source]¶ Returns regressor column names from the model components.
Implements the method in BaseTemplate.
Uses these attributes:
- model_components: ModelComponentsParam, list [ModelComponentsParam] or None, default None
Configuration of model growth, seasonality, holidays, etc. See SimpleSilverkiteTemplate for details.
- Returns
regressor_cols – The names of regressor columns used in any hyperparameter set requested by model_components. None if there are no regressors.
- Return type
list [str] or None
-
get_lagged_regressor_info
()[source]¶ Returns lagged regressor column names and minimal/maximal lag order. The lag order can be used to check potential imputation in the computation of lags.
Implements the method in BaseTemplate.
- Returns
lagged_regressor_info – A dictionary that includes the lagged regressor column names and maximal/minimal lag order. The keys are:
- lagged_regressor_cols: list [str] or None
See forecast_pipeline.
- overall_min_lag_order: int or None
- overall_max_lag_order: int or None
For example:
self.config.model_components_param.lagged_regressors["lagged_regressor_dict"] = [
    {"regressor1": {
        "lag_dict": {"orders": [7]},
        "agg_lag_dict": {
            "orders_list": [[7, 7 * 2, 7 * 3]],
            "interval_list": [(8, 7 * 2)]},
        "series_na_fill_func": lambda s: s.bfill().ffill()}
    },
    {"regressor2": {
        "lag_dict": {"orders": [2]},
        "agg_lag_dict": {
            "orders_list": [[7, 7 * 2]],
            "interval_list": [(8, 7 * 2)]},
        "series_na_fill_func": lambda s: s.bfill().ffill()}
    },
    {"regressor3": "auto"}
]
Then the function returns:
lagged_regressor_info = {
    "lagged_regressor_cols": ["regressor1", "regressor2", "regressor3"],
    "overall_min_lag_order": 2,
    "overall_max_lag_order": 21
}
Note that “regressor3” is skipped when computing the lag orders, as the “auto” option makes sure the lag order is proper.
- Return type
dict
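The lag-order summary in the example above can be reproduced with a short sketch. `lag_order_range` is a hypothetical function written for illustration, not the library's implementation; it assumes that every number in "orders", "orders_list" and "interval_list" counts as a lag order, and that string specifications such as "auto" contribute a column name but no lag orders.

```python
def lag_order_range(lagged_regressor_dicts):
    """Collect column names and the overall min/max lag order (sketch)."""
    cols, orders = [], []
    for entry in lagged_regressor_dicts:
        for col, spec in entry.items():
            cols.append(col)
            if isinstance(spec, str):  # e.g. "auto": no explicit lag orders
                continue
            lag_dict = spec.get("lag_dict") or {}
            agg_lag_dict = spec.get("agg_lag_dict") or {}
            orders.extend(lag_dict.get("orders") or [])
            for order_list in agg_lag_dict.get("orders_list") or []:
                orders.extend(order_list)
            for interval in agg_lag_dict.get("interval_list") or []:
                orders.extend(interval)
    return {
        "lagged_regressor_cols": cols,
        "overall_min_lag_order": min(orders) if orders else None,
        "overall_max_lag_order": max(orders) if orders else None,
    }


example = [
    {"regressor1": {"lag_dict": {"orders": [7]},
                    "agg_lag_dict": {"orders_list": [[7, 14, 21]],
                                     "interval_list": [(8, 14)]}}},
    {"regressor2": {"lag_dict": {"orders": [2]},
                    "agg_lag_dict": {"orders_list": [[7, 14]],
                                     "interval_list": [(8, 14)]}}},
    {"regressor3": "auto"},
]
info = lag_order_range(example)
print(info)
```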
-
get_hyperparameter_grid
()[source]¶ Returns hyperparameter grid.
Implements the method in BaseTemplate.
Converts model components, time properties, and model template into SimpleSilverkiteEstimator hyperparameters.
Uses these attributes:
- model_components: ModelComponentsParam, list [ModelComponentsParam] or None, default None
Configuration of model growth, seasonality, events, etc. See SimpleSilverkiteTemplate for details.
- time_properties: dict [str, any] or None, default None
Time properties dictionary (likely produced by get_forecast_time_properties) with keys:
"period": int
Period of each observation (i.e. minimum time between observations, in seconds).
"simple_freq": SimpleTimeFrequencyEnum
SimpleTimeFrequencyEnum member corresponding to data frequency.
"num_training_points": int
Number of observations for training.
"num_training_days": int
Number of days for training.
"start_year": int
Start year of the training period.
"end_year": int
End year of the forecast period.
"origin_for_time_vars": float
Continuous time representation of the first date in df.
- model_template: str, default “SILVERKITE”
The name of model template, must be one of the valid templates defined in SimpleSilverkiteTemplate.
Notes
forecast_pipeline handles the train/test splits according to EvaluationPeriodParam, so estimator__train_test_thresh and estimator__training_fraction are always None.
Similarly, estimator__origin_for_time_vars is set to None.
- Returns
hyperparameter_grid – Hyperparameter grid for grid search in forecast_pipeline. The output dictionary values are lists, combined in grid search.
- Return type
dict [str, list [any]] or list [dict [str, list [any]]]
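To make "values are lists, combined in grid search" concrete, here is a minimal sketch of how such a grid expands into individual parameter settings. The helper `expand_grid` is hypothetical and only mimics the usual Cartesian-product behaviour of a grid search. The keys use the `estimator__` prefix described above, with parameter names taken from the SimpleSilverkiteEstimator signature.

```python
from itertools import product


def expand_grid(hyperparameter_grid):
    """Expands a dict of lists (or a list of such dicts) into settings.

    Hypothetical helper for illustration; not part of the library.
    """
    grids = hyperparameter_grid
    if isinstance(hyperparameter_grid, dict):
        grids = [hyperparameter_grid]
    combinations = []
    for grid in grids:
        keys = sorted(grid)
        # One setting per element of the Cartesian product of the value lists.
        for values in product(*(grid[key] for key in keys)):
            combinations.append(dict(zip(keys, values)))
    return combinations


grid = {
    "estimator__yearly_seasonality": ["auto", False],
    "estimator__growth_term": ["linear", None],
    "estimator__train_test_thresh": [None],  # always None, per the notes above
}
combinations = expand_grid(grid)
```

Two options for each of two parameters and a single option for the third give four settings. A list of grids is expanded grid by grid and concatenated.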
-
check_template_type
(template)[source]¶ Checks that the template name is valid and whether it is a single or multi template. Raises an error if the template is not recognized.
A valid single template must either be SILVERKITE or have the form
{FREQ}_SEAS_{VAL}_GR_{VAL}_CP_{VAL}_HOL_{VAL}_FEASET_{VAL}_ALGO_{VAL}_AR_{VAL}
For example, DAILY_SEAS_NM_GR_LINEAR_CP_LT_HOL_NONE_FEASET_ON_ALGO_RIDGE_AR_ON. The valid FREQ and VAL values can be found in template_defaults. The components stand for seasonality, growth, changepoints_dict, events, feature_sets_enabled, fit_algorithm and autoregression in ModelComponentsParam, which is used in SimpleSilverkiteTemplate. Users are allowed to:
- Omit any number of component-value pairs; the omitted components are filled with default values.
- Switch the order of different component-value pairs.
A valid multi template must belong to MULTI_TEMPLATES or must be a list of single or multi template names.
- Parameters
template (str, SimpleSilverkiteTemplateName or list [str, SimpleSilverkiteTemplateName]) – The model_template parameter fed into ForecastConfig for simple silverkite templates.
- Returns
template_type – “single” or “multi”.
- Return type
str
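The naming scheme for single templates can be illustrated with a small parser. This is a sketch under stated assumptions, not the library's validation logic: `parse_template_name` and `PLACEHOLDER_DEFAULT` are hypothetical, each value is assumed to be a single token without underscores, and the real per-component defaults and valid values live in template_defaults.

```python
COMPONENT_KEYS = ("SEAS", "GR", "CP", "HOL", "FEASET", "ALGO", "AR")
PLACEHOLDER_DEFAULT = "DEFAULT"  # stand-in; real defaults are in template_defaults


def parse_template_name(name):
    """Splits a single template name into its frequency and component values.

    Hypothetical helper for illustration; not part of the library.
    """
    tokens = name.split("_")
    freq, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"Incomplete component-value pair in {name!r}.")
    # Omitted components keep the placeholder default.
    components = {key: PLACEHOLDER_DEFAULT for key in COMPONENT_KEYS}
    seen = set()
    # Pairs may appear in any order.
    for key, value in zip(rest[0::2], rest[1::2]):
        if key not in COMPONENT_KEYS:
            raise ValueError(f"Unknown component {key!r} in {name!r}.")
        if key in seen:
            raise ValueError(f"Component {key!r} is repeated in {name!r}.")
        seen.add(key)
        components[key] = value
    return freq, components


full = parse_template_name(
    "DAILY_SEAS_NM_GR_LINEAR_CP_LT_HOL_NONE_FEASET_ON_ALGO_RIDGE_AR_ON")
partial = parse_template_name("DAILY_CP_LT_SEAS_NM")
```

`partial` omits five components and swaps the order of the remaining two, yet yields the same seasonality and changepoint values as `full`.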
-
get_model_components_from_model_template
(template)[source]¶ Gets the ModelComponentsParam class from the model template.
The template can be a name string, a SimpleSilverkiteTemplateOptions dataclass, or a list of such strings and/or dataclasses. If a list is given, a list of ModelComponentsParam is returned. If a single element is given, a list of length 1 is returned.
- Parameters
template (str, SimpleSilverkiteTemplateOptions or list [str, SimpleSilverkiteTemplateOptions]) – The model_template in ForecastConfig. Can be a name string, a SimpleSilverkiteTemplateOptions dataclass, or a list of such strings and/or dataclasses.
- Returns
model_components_param – The list of ModelComponentsParam class(es) that correspond to template.
- Return type
list [ModelComponentsParam]
-
static
apply_computation_defaults
(computation: Optional[greykite.framework.templates.autogen.forecast_config.ComputationParam] = None) → greykite.framework.templates.autogen.forecast_config.ComputationParam¶ Applies the default ComputationParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a ComputationParam object.
- Parameters
computation (ComputationParam or None) – The ComputationParam object.
- Returns
computation – Valid ComputationParam object with the provided attribute values, and the default values for attributes that were not provided.
- Return type
-
static
apply_evaluation_metric_defaults
(evaluation: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam] = None) → greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam¶ Applies the default EvaluationMetricParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates an EvaluationMetricParam object.
- Parameters
evaluation (EvaluationMetricParam or None) – The EvaluationMetricParam object.
- Returns
evaluation – Valid EvaluationMetricParam object with the provided attribute values, and the default values for attributes that were not provided.
- Return type
-
static
apply_evaluation_period_defaults
(evaluation: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam] = None) → greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam¶ Applies the default EvaluationPeriodParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates an EvaluationPeriodParam object.
- Parameters
evaluation (EvaluationPeriodParam or None) – The EvaluationPeriodParam object.
- Returns
evaluation – Valid EvaluationPeriodParam object with the provided attribute values, and the default values for attributes that were not provided.
- Return type
-
apply_forecast_config_defaults
(config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → greykite.framework.templates.autogen.forecast_config.ForecastConfig¶ Applies the default Forecast Config values to the given config. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input config is None, it creates a Forecast Config.
- Parameters
config (ForecastConfig or None) – Forecast configuration if available. See ForecastConfig.
- Returns
config – A valid ForecastConfig that contains the provided attribute values, and the default values for attributes that were not provided.
- Return type
ForecastConfig
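The `apply_*_defaults` methods above share one pattern: keep every value the caller provided, fill in defaults for the rest, and create a fresh object when the input is None. A minimal sketch of that pattern follows. `ToyParam`, `TOY_DEFAULTS` and `apply_toy_defaults` are hypothetical stand-ins, and the default values are invented for illustration only.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToyParam:
    """Hypothetical config object; None means "not provided"."""
    alpha: Optional[float] = None
    n_jobs: Optional[int] = None


# Illustrative defaults only; they do not correspond to any library values.
TOY_DEFAULTS = {"alpha": 0.5, "n_jobs": 1}


def apply_toy_defaults(param=None):
    """Fills unset attributes with defaults and leaves provided ones unchanged."""
    if param is None:
        param = ToyParam()
    for field in fields(param):
        if getattr(param, field.name) is None and field.name in TOY_DEFAULTS:
            setattr(param, field.name, TOY_DEFAULTS[field.name])
    return param


from_none = apply_toy_defaults(None)
partially_set = apply_toy_defaults(ToyParam(alpha=0.9))
```

`from_none` carries every default, while `partially_set` keeps the provided `alpha` and only receives a default for `n_jobs`.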
-
static
apply_metadata_defaults
(metadata: Optional[greykite.framework.templates.autogen.forecast_config.MetadataParam] = None) → greykite.framework.templates.autogen.forecast_config.MetadataParam¶ Applies the default MetadataParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a MetadataParam object.
- Parameters
metadata (MetadataParam or None) – The MetadataParam object.
- Returns
metadata – Valid MetadataParam object with the provided attribute values, and the default values for attributes that were not provided.
- Return type
-
static
apply_model_components_defaults
(model_components: Optional[Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[Optional[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]]] = None) → Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]¶ Applies the default ModelComponentsParam values to the given object.
Converts None to a ModelComponentsParam object. Unpacks a list of a single element to the element itself.
- Parameters
model_components (ModelComponentsParam or None or list of such items) – The ModelComponentsParam object.
- Returns
model_components – Valid ModelComponentsParam object with the provided attribute values, and the default values for attributes that were not provided.
- Return type
ModelComponentsParam
or list of such items
-
apply_model_template_defaults
(model_template: Optional[Union[str, List[Optional[str]]]] = None) → Union[str, List[str]]¶ Applies the default model template name to the given value.
Unpacks a list of a single element to the element itself. Sets the default value if None.
- Parameters
model_template (str or None or list [None, str]) – The model template name. See valid names in ModelTemplateEnum.
- Returns
model_template – The model template name, with the default value used if not provided.
- Return type
str or list [str]
-
static
apply_template_decorator
(func)¶ Decorator for the apply_template_for_pipeline_params function.
By default, this applies apply_forecast_config_defaults to config.
A subclass may override this for pre/post processing of apply_template_for_pipeline_params, such as input validation. In this case, apply_template_for_pipeline_params must also be implemented in the subclass.
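The decorator pattern described here can be sketched as follows. The names `apply_defaults_decorator`, `ToyTemplate` and `apply_config_defaults` are hypothetical, the config is a plain dict rather than a ForecastConfig, and the default horizon is an invented value. The sketch only shows how defaults are applied to `config` before the wrapped method runs.

```python
import functools


def apply_defaults_decorator(func):
    """Applies config defaults before calling the wrapped method.

    Hypothetical decorator for illustration; not part of the library.
    """
    @functools.wraps(func)
    def wrapper(self, df, config=None):
        # Pre-processing step: make sure the config is fully populated.
        config = self.apply_config_defaults(config)
        return func(self, df, config)
    return wrapper


class ToyTemplate:
    def apply_config_defaults(self, config=None):
        merged = {"forecast_horizon": 7}  # invented default, for illustration
        merged.update(config or {})
        return merged

    @apply_defaults_decorator
    def apply_template_for_pipeline_params(self, df, config=None):
        # By the time this body runs, config is never None.
        return {"df": df, "forecast_horizon": config["forecast_horizon"]}


template = ToyTemplate()
default_params = template.apply_template_for_pipeline_params([1, 2, 3])
custom_params = template.apply_template_for_pipeline_params(
    [1, 2, 3], {"forecast_horizon": 30})
```

A subclass that wants extra validation would supply its own decorator and re-declare the decorated method, which is why the text above requires both to be overridden together.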
-
apply_template_for_pipeline_params
(df: pandas.core.frame.DataFrame, config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → Dict¶ Implements template interface method. Takes input data and optional configuration parameters to customize the model. Returns a set of parameters to call
forecast_pipeline
.See template interface for parameters and return value.
Uses the methods in this class to set:
- "regressor_cols": get_regressor_cols()
- "lagged_regressor_cols": get_lagged_regressor_info()
- "pipeline": get_pipeline()
- "time_properties": get_forecast_time_properties()
- "hyperparameter_grid": get_hyperparameter_grid()
All other parameters are taken directly from config.
-
property
estimator
¶ The estimator instance to use as the final step in the pipeline. An instance of
greykite.sklearn.estimator.base_forecast_estimator.BaseForecastEstimator
.
-
get_forecast_time_properties
()¶ Returns forecast time parameters.
Uses self.df, self.config, self.regressor_cols.
Available parameters:
self.df
self.config
self.score_func
self.score_func_greater_is_better
self.regressor_cols
self.lagged_regressor_cols
self.estimator
self.pipeline
- Returns
time_properties – Time properties dictionary with keys:
"period": int
Period of each observation (i.e. minimum time between observations, in seconds).
"simple_freq": SimpleTimeFrequencyEnum
SimpleTimeFrequencyEnum member corresponding to data frequency.
"num_training_points": int
Number of observations for training.
"num_training_days": int
Number of days for training.
"start_year": int
Start year of the training period.
"end_year": int
End year of the forecast period.
"origin_for_time_vars": float
Continuous time representation of the first date in df.
- Return type
dict [str, any] or None, default None
-
get_pipeline
()¶ Returns pipeline.
Implementation may be overridden by subclass if a different pipeline is desired.
Uses self.estimator, self.score_func, self.score_func_greater_is_better, self.config, self.regressor_cols.
Available parameters:
self.df
self.config
self.score_func
self.score_func_greater_is_better
self.regressor_cols
self.estimator
- Returns
pipeline – See forecast_pipeline.
- Return type
-
class
greykite.sklearn.estimator.simple_silverkite_estimator.
SimpleSilverkiteEstimator
(silverkite: greykite.algo.forecast.silverkite.forecast_simple_silverkite.SimpleSilverkiteForecast = <greykite.algo.forecast.silverkite.forecast_simple_silverkite.SimpleSilverkiteForecast object>, silverkite_diagnostics: greykite.algo.forecast.silverkite.silverkite_diagnostics.SilverkiteDiagnostics = <greykite.algo.forecast.silverkite.silverkite_diagnostics.SilverkiteDiagnostics object>, score_func: callable = <function mean_squared_error>, coverage: float = None, null_model_params: Optional[Dict] = None, time_properties: Optional[Dict] = None, freq: Optional[str] = None, forecast_horizon: Optional[int] = None, origin_for_time_vars: Optional[float] = None, train_test_thresh: Optional[datetime.datetime] = None, training_fraction: Optional[float] = None, fit_algorithm_dict: Optional[Dict] = None, holidays_to_model_separately: Optional[Union[str, List[str]]] = 'auto', holiday_lookup_countries: Optional[Union[str, List[str]]] = 'auto', holiday_pre_num_days: int = 2, holiday_post_num_days: int = 2, holiday_pre_post_num_dict: Optional[Dict] = None, daily_event_df_dict: Optional[Dict] = None, changepoints_dict: Optional[Dict] = None, yearly_seasonality: Union[bool, str, int] = 'auto', quarterly_seasonality: Union[bool, str, int] = 'auto', monthly_seasonality: Union[bool, str, int] = 'auto', weekly_seasonality: Union[bool, str, int] = 'auto', daily_seasonality: Union[bool, str, int] = 'auto', max_daily_seas_interaction_order: Optional[int] = None, max_weekly_seas_interaction_order: Optional[int] = None, autoreg_dict: Optional[Dict] = None, past_df: Optional[pandas.core.frame.DataFrame] = None, lagged_regressor_dict: Optional[Dict] = None, seasonality_changepoints_dict: Optional[Dict] = None, min_admissible_value: Optional[float] = None, max_admissible_value: Optional[float] = None, uncertainty_dict: Optional[Dict] = None, normalize_method: Optional[str] = None, growth_term: Optional[str] = 'linear', regressor_cols: Optional[List[str]] = None, feature_sets_enabled: 
Optional[Union[bool, Dict[str, bool]]] = None, extra_pred_cols: Optional[List[str]] = None, drop_pred_cols: Optional[List[str]] = None, explicit_pred_cols: Optional[List[str]] = None, regression_weight_col: Optional[str] = None, simulation_based: Optional[bool] = False, simulation_num: int = 10)[source]¶ Wrapper for forecast_simple_silverkite.
- Parameters
score_func (callable, optional, default mean_squared_error) – See BaseForecastEstimator.
coverage (float between [0.0, 1.0] or None, optional) – See BaseForecastEstimator.
null_model_params (dict or None, optional) – Dictionary with arguments to define the DummyRegressor null model, default is None. See BaseForecastEstimator.
fit_algorithm_dict (dict or None, optional) –
How to fit the model. A dictionary with the following optional keys.
"fit_algorithm": str, optional, default “ridge”
The type of predictive model used in fitting. See fit_model_via_design_matrix for available options and their parameters.
"fit_algorithm_params": dict or None, optional, default None
Parameters passed to the requested fit_algorithm. If None, uses the defaults in fit_model_via_design_matrix.
uncertainty_dict (dict or str or None, optional) – How to fit the uncertainty model. See forecast. Note that this is allowed to be “auto”. If None or “auto”, it is set to a default value based on coverage before calling forecast_silverkite. See BaseForecastEstimator for details.
kwargs (additional parameters) –
Other parameters are the same as in forecast_simple_silverkite.
See the source code of __init__ for the parameter names, and refer to forecast_simple_silverkite for their descriptions.
If this Estimator is called from forecast_pipeline, train_test_thresh and training_fraction should almost always be None, because the train/test split is handled outside this Estimator.
Notes
Attributes match those of
BaseSilverkiteEstimator
.See also
None
For attributes and details on fit, predict, and component plots.
None
Function to transform the parameters to call
forecast_silverkite
fit.None
Functions performing the fit and predict.
-
fit
(X, y=None, time_col='ts', value_col='y', **fit_params)[source]¶ Fits the Silverkite forecast model.
- Parameters
X (pandas.DataFrame) – Input timeseries, with timestamp column, value column, and any additional regressors. The value column is the response, included in X to allow transformation by sklearn.pipeline.
y (ignored) – The original timeseries values, ignored. (The y for fitting is included in X.)
time_col (str) – Time column name in X.
value_col (str) – Value column name in X.
fit_params (dict) – Additional parameters for the null model.
- Returns
self – Fitted model is stored in self.model_dict.
- Return type
self
-
finish_fit
()¶ Makes important values of self.model_dict conveniently accessible.
To be called by subclasses at the end of their fit method. Sets {pred_cols, feature_cols, and coef_}.
-
fit_uncertainty
(df: pandas.core.frame.DataFrame, uncertainty_dict: dict, **kwargs)¶ Fits the uncertainty model with a given df and uncertainty_dict.
- Parameters
df (pandas.DataFrame) – A dataframe representing the data to fit the uncertainty model.
uncertainty_dict (dict [str, any]) –
The uncertainty model specification. It should have the following keys:
- “uncertainty_method”: a string that is in UncertaintyMethodEnum.
- “params”: a dictionary that includes any additional parameters needed by the uncertainty method.
kwargs (additional parameters to be fed into the uncertainty method) – These parameters come from the estimator attributes and are not given by the user.
- Returns
- Return type
The function sets self.uncertainty_model and does not return anything.
-
get_max_ar_order
()¶ Gets the maximum autoregression order.
- Returns
max_ar_order – The maximum autoregression order.
- Return type
int
-
get_params
(deep=True)¶ Get parameters for this estimator.
-
plot_seasonalities
(title=None)¶ Convenience function to plot the data and the seasonality components.
- Parameters
title (str, optional, default None) – Plot title.
- Returns
fig – Figure.
- Return type
-
plot_trend
(title=None)¶ Convenience function to plot the data and the trend component.
- Parameters
title (str, optional, default None) – Plot title.
- Returns
fig – Figure.
- Return type
-
plot_trend_changepoint_detection
(params=None)¶ Convenience function to plot the original trend changepoint detection results.
- Parameters
params (dict or None, default None) –
The parameters in plot. If set to None, all components will be plotted.
Note: seasonality components plotting is not supported currently. The plot parameter must be False.
- Returns
fig – Figure.
- Return type
-
property
pred_category
¶ A dictionary that stores the predictor names in each category.
This property is not initialized until used, which speeds up the fitting process. The categories include:
- “intercept”: the intercept.
- “time_features”: the predictors that include TIME_FEATURES but not SEASONALITY_REGEX.
- “event_features”: the predictors that include EVENT_PREFIX.
- “trend_features”: the predictors that include TREND_REGEX but not SEASONALITY_REGEX.
- “seasonality_features”: the predictors that include SEASONALITY_REGEX.
- “lag_features”: the predictors that include LAG_REGEX.
- “regressor_features”: external regressors and other predictors manually passed to extra_pred_cols, but not in the categories above.
- “interaction_features”: the predictors that include interaction terms, i.e., including a colon.
Note that each predictor falls into at least one category. Some “time_features” may also be “trend_features”. Predictors with an interaction are classified into all categories matched by the interaction components. Thus, “interaction_features” are already included in the other categories.
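The multi-category behaviour of interaction terms is easiest to see in a small sketch. The regular expressions and predictor names below are placeholders chosen for illustration. They are not the library's TREND_REGEX, SEASONALITY_REGEX or LAG_REGEX, and `categorize` is a hypothetical helper covering only a subset of the categories.

```python
import re

# Placeholder patterns, for illustration only.
TREND_PATTERN = re.compile(r"^ct\d")
SEASONALITY_PATTERN = re.compile(r"^(sin|cos)\d+_")
LAG_PATTERN = re.compile(r"_lag\d+$")


def categorize(pred_cols):
    """Assigns each predictor to every category matched by its components."""
    categories = {
        "trend_features": [],
        "seasonality_features": [],
        "lag_features": [],
        "interaction_features": [],
    }
    for col in pred_cols:
        parts = col.split(":")  # an interaction term contains a colon
        if len(parts) > 1:
            categories["interaction_features"].append(col)
        if any(TREND_PATTERN.search(p) and not SEASONALITY_PATTERN.search(p)
               for p in parts):
            categories["trend_features"].append(col)
        if any(SEASONALITY_PATTERN.search(p) for p in parts):
            categories["seasonality_features"].append(col)
        if any(LAG_PATTERN.search(p) for p in parts):
            categories["lag_features"].append(col)
    return categories


result = categorize(["ct1", "sin1_tow_weekly", "y_lag7", "ct1:sin1_tow_weekly"])
```

Here the interaction `ct1:sin1_tow_weekly` appears under the trend, seasonality and interaction categories at once, which is the overlap the note above describes.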
-
predict
(X, y=None)¶ Creates forecast for the dates specified in X.
- Parameters
X (pandas.DataFrame) – Input timeseries with timestamp column and any additional regressors. Timestamps are the dates for prediction. Value column, if provided in X, is ignored.
y (ignored) –
- Returns
predictions –
Forecasted values for the dates in X. Columns:
- TIME_COL: dates
- PREDICTED_COL: predictions
- PREDICTED_LOWER_COL: lower bound of predictions, optional
- PREDICTED_UPPER_COL: upper bound of predictions, optional
- [other columns], optional
PREDICTED_LOWER_COL and PREDICTED_UPPER_COL are present if self.coverage is not None.
- Return type
-
predict_uncertainty
(df: pandas.core.frame.DataFrame)¶ Computes prediction intervals for df based on the predictions and self.uncertainty_model.
- Parameters
df (pandas.DataFrame) – The dataframe to calculate prediction intervals upon. It should contain either self.value_col_ or PREDICT_COL, on which the prediction interval is based.
- Returns
result_df – The df with prediction interval columns.
- Return type
-
score
(X, y, sample_weight=None)¶ Default scorer for the estimator (used in GridSearchCV/RandomizedSearchCV if scoring=None).
Notes
If null_model_params is not None, returns R2_null_model_score of model error relative to null model, evaluated by score_func.
If null_model_params is None, returns score_func of the model itself.
By default, grid search (with no scoring parameter) optimizes improvement of score_func against the null model.
To optimize a different score function, pass scoring to GridSearchCV/RandomizedSearchCV.
- Parameters
X (pandas.DataFrame) – Input timeseries with timestamp column and any additional regressors. Value column, if provided in X, is ignored.
y (pandas.Series or numpy.array) – Actual value, used to compute error.
sample_weight (pandas.Series or numpy.array) – Ignored.
- Returns
score – Comparison of predictions against null predictions, according to the specified score function.
- Return type
float or None
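A sketch of the null-model comparison may help. It assumes the score is defined in the usual R2 style, as one minus the ratio of the model's loss to the null model's loss. The functions below are plain-Python stand-ins written for this illustration, not the library's implementation.

```python
def mean_squared_error(y_true, y_pred):
    """Plain-Python mean squared error, standing in for score_func."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_null_model_score(y_true, y_pred, y_pred_null,
                        score_func=mean_squared_error):
    """Improvement of the model over the null model (assumed definition).

    1.0 is a perfect model, 0.0 matches the null model, and negative
    values are worse than the null model. Returns None if the null
    model has zero loss, since the ratio is then undefined.
    """
    null_score = score_func(y_true, y_pred_null)
    if null_score == 0:
        return None
    return 1.0 - score_func(y_true, y_pred) / null_score


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]   # model forecast
y_null = [2.0, 2.0, 2.0]   # null model: predict the mean
score = r2_null_model_score(y_true, y_pred, y_null)
```

The model's loss here is half that of the null model, so the score is 0.5. The undefined case is one reason the documented return type is "float or None".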
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
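The `<component>__<parameter>` convention can be illustrated with toy classes. `ToyEstimator` and `ToyPipeline` are hypothetical and only mimic the routing of a nested parameter name to the right component. They are not scikit-learn's implementation.

```python
class ToyEstimator:
    def __init__(self, coverage=None):
        self.coverage = coverage


class ToyPipeline:
    """Hypothetical container that routes nested parameter names."""

    def __init__(self, **steps):
        self.steps = steps

    def set_params(self, **params):
        for key, value in params.items():
            # Split "estimator__coverage" into ("estimator", "coverage").
            component, separator, parameter = key.partition("__")
            if not separator:
                raise ValueError(f"Expected <component>__<parameter>, got {key!r}.")
            setattr(self.steps[component], parameter, value)
        return self  # returning self allows call chaining


pipeline = ToyPipeline(estimator=ToyEstimator())
returned = pipeline.set_params(estimator__coverage=0.95)
```

This is the same naming scheme behind the `estimator__` prefixes in a hyperparameter grid: the part before the double underscore selects the pipeline step, and the rest names the parameter on it.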
-
summary
(max_colwidth=20)¶ Creates a human-readable string describing how the model works, including relevant diagnostics. These details cannot be extracted from the forecast alone. Prints model configuration. Extend this in a child class to print the trained model parameters.
Log message is printed to the cst.LOGGER_NAME logger.
-
class
greykite.sklearn.estimator.silverkite_estimator.
SilverkiteEstimator
(silverkite: greykite.algo.forecast.silverkite.forecast_silverkite.SilverkiteForecast = <greykite.algo.forecast.silverkite.forecast_silverkite.SilverkiteForecast object>, silverkite_diagnostics: greykite.algo.forecast.silverkite.silverkite_diagnostics.SilverkiteDiagnostics = <greykite.algo.forecast.silverkite.silverkite_diagnostics.SilverkiteDiagnostics object>, score_func=<function mean_squared_error>, coverage=None, null_model_params=None, origin_for_time_vars=None, extra_pred_cols=None, drop_pred_cols=None, explicit_pred_cols=None, train_test_thresh=None, training_fraction=None, fit_algorithm_dict=None, daily_event_df_dict=None, fs_components_df= name period order seas_names 0 tod 24.0 3 daily 1 tow 7.0 3 weekly 2 conti_year 1.0 5 yearly, autoreg_dict=None, past_df=None, lagged_regressor_dict=None, changepoints_dict=None, seasonality_changepoints_dict=None, changepoint_detector=None, min_admissible_value=None, max_admissible_value=None, uncertainty_dict=None, normalize_method=None, adjust_anomalous_dict=None, impute_dict=None, regression_weight_col=None, forecast_horizon=None, simulation_based=False, simulation_num=10)[source]¶ Wrapper for
forecast
.- Parameters
score_func (callable, optional, default mean_squared_error) – See BaseForecastEstimator.
coverage (float between [0.0, 1.0] or None, optional) – See BaseForecastEstimator.
null_model_params (dict or None, optional) – Dictionary with arguments to define the DummyRegressor null model, default is None. See BaseForecastEstimator.
fit_algorithm_dict (dict or None, optional) –
How to fit the model. A dictionary with the following optional keys.
"fit_algorithm": str, optional, default “linear”
The type of predictive model used in fitting. See fit_model_via_design_matrix for available options and their parameters.
"fit_algorithm_params": dict or None, optional, default None
Parameters passed to the requested fit_algorithm. If None, uses the defaults in fit_model_via_design_matrix.
uncertainty_dict (dict or str or None, optional) – How to fit the uncertainty model. See forecast. Note that this is allowed to be “auto”. If None or “auto”, it is set to a default value based on coverage before calling forecast_silverkite. See BaseForecastEstimator for details.
fs_components_df (pandas.DataFrame or None, optional) –
A dataframe with information about Fourier series generation. If provided, it must contain columns with the following names:
- “name”: name of the timeseries feature (e.g. tod, tow etc.).
- “period”: period of the Fourier series.
- “order”: order of the Fourier series.
- “seas_names”: label for the type of seasonality (e.g. daily, weekly etc.), which should be unique. validate_fs_components_df checks for this, so that component plots don’t have duplicate y-axis labels.
This differs from the expected input of forecast_silverkite, where “period”, “order” and “seas_names” are optional. This restriction is to facilitate appropriate computation of component (e.g. trend, seasonalities and holidays) effects. See Notes section in this docstring for a more detailed explanation with examples.
Other parameters are the same as in forecast.
If this Estimator is called from forecast_pipeline, train_test_thresh and training_fraction should almost always be None, because the train/test split is handled outside this Estimator.
The attributes are the same as BaseSilverkiteEstimator.
See also
None
For details on fit, predict, and component plots.
None
Functions performing the fit and predict.
-
fit
(X, y=None, time_col='ts', value_col='y', **fit_params)[source]¶ Fits the Silverkite forecast model.
- Parameters
X (pandas.DataFrame) – Input timeseries, with timestamp column, value column, and any additional regressors. The value column is the response, included in X to allow transformation by sklearn.pipeline.
y (ignored) – The original timeseries values, ignored. (The y for fitting is included in X.)
time_col (str) – Time column name in X.
value_col (str) – Value column name in X.
fit_params (dict) – Additional parameters for the null model.
-
static
validate_fs_components_df
(fs_components_df)[source]¶ Validates a Fourier series components dataframe. Called by SilverkiteEstimator to validate the input fs_components_df.
- Parameters
fs_components_df (pandas.DataFrame) –
A DataFrame with information about Fourier series generation. Must contain columns with the following names:
- “name”: name of the timeseries feature (e.g. “tod”, “tow” etc.).
- “period”: period of the Fourier series.
- “order”: order of the Fourier series.
- “seas_names”: seas_name corresponding to the name (e.g. “daily”, “weekly” etc.).
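The two documented checks, required columns and unique “seas_names”, can be sketched without pandas by treating each dataframe row as a dict. `validate_fs_rows` is a hypothetical helper, and the rows reproduce the default `fs_components_df` shown in the SilverkiteEstimator signature.

```python
REQUIRED_COLUMNS = ("name", "period", "order", "seas_names")


def validate_fs_rows(rows):
    """Checks required columns and uniqueness of "seas_names".

    Hypothetical, simplified stand-in that takes a list of row dicts
    instead of a pandas DataFrame.
    """
    for row in rows:
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"Missing columns: {missing}")
    seas_names = [row["seas_names"] for row in rows]
    if len(seas_names) != len(set(seas_names)):
        # Duplicates would produce duplicate y-axis labels in component plots.
        raise ValueError("'seas_names' values must be unique.")


default_rows = [
    {"name": "tod", "period": 24.0, "order": 3, "seas_names": "daily"},
    {"name": "tow", "period": 7.0, "order": 3, "seas_names": "weekly"},
    {"name": "conti_year", "period": 1.0, "order": 5, "seas_names": "yearly"},
]
validate_fs_rows(default_rows)  # passes silently
```

Adding a second row labeled "daily" or dropping the "order" key from any row would raise a `ValueError`.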
-
finish_fit
()¶ Makes important values of self.model_dict conveniently accessible.
To be called by subclasses at the end of their fit method. Sets {pred_cols, feature_cols, and coef_}.
-
fit_uncertainty
(df: pandas.core.frame.DataFrame, uncertainty_dict: dict, **kwargs)¶ Fits the uncertainty model with a given df and uncertainty_dict.
- Parameters
df (pandas.DataFrame) – A dataframe representing the data to fit the uncertainty model.
uncertainty_dict (dict [str, any]) –
The uncertainty model specification. It should have the following keys:
- “uncertainty_method”: a string that is in UncertaintyMethodEnum.
- “params”: a dictionary that includes any additional parameters needed by the uncertainty method.
kwargs (additional parameters to be fed into the uncertainty method) – These parameters come from the estimator attributes and are not given by the user.
- Returns
- Return type
The function sets self.uncertainty_model and does not return anything.
-
get_max_ar_order
()¶ Gets the maximum autoregression order.
- Returns
max_ar_order – The maximum autoregression order.
- Return type
int
-
get_params
(deep=True)¶ Get parameters for this estimator.
-
plot_seasonalities
(title=None)¶ Convenience function to plot the data and the seasonality components.
- Parameters
title (str, optional, default None) – Plot title.
- Returns
fig – Figure.
- Return type
-
plot_trend
(title=None)¶ Convenience function to plot the data and the trend component.
- Parameters
title (str, optional, default None) – Plot title.
- Returns
fig – Figure.
- Return type
-
plot_trend_changepoint_detection
(params=None)¶ Convenience function to plot the original trend changepoint detection results.
- Parameters
params (dict or None, default None) –
The parameters in plot. If set to None, all components will be plotted.
Note: seasonality components plotting is not supported currently. The plot parameter must be False.
- Returns
fig – Figure.
- Return type
-
property
pred_category
¶ A dictionary that stores the predictor names in each category.
This property is not initialized until used, which speeds up the fitting process. The categories include:
- “intercept”: the intercept.
- “time_features”: the predictors that include TIME_FEATURES but not SEASONALITY_REGEX.
- “event_features”: the predictors that include EVENT_PREFIX.
- “trend_features”: the predictors that include TREND_REGEX but not SEASONALITY_REGEX.
- “seasonality_features”: the predictors that include SEASONALITY_REGEX.
- “lag_features”: the predictors that include LAG_REGEX.
- “regressor_features”: external regressors and other predictors manually passed to extra_pred_cols, but not in the categories above.
- “interaction_features”: the predictors that include interaction terms, i.e., including a colon.
Note that each predictor falls into at least one category. Some “time_features” may also be “trend_features”. Predictors with an interaction are classified into all categories matched by the interaction components. Thus, “interaction_features” are already included in the other categories.
-
predict
(X, y=None)¶ Creates forecast for the dates specified in X.
- Parameters
X (pandas.DataFrame) – Input timeseries with timestamp column and any additional regressors. Timestamps are the dates for prediction. Value column, if provided in X, is ignored.
y (ignored) –
- Returns
predictions –
Forecasted values for the dates in X. Columns:
- TIME_COL: dates
- PREDICTED_COL: predictions
- PREDICTED_LOWER_COL: lower bound of predictions, optional
- PREDICTED_UPPER_COL: upper bound of predictions, optional
- [other columns], optional
PREDICTED_LOWER_COL and PREDICTED_UPPER_COL are present if self.coverage is not None.
- Return type
-
predict_uncertainty
(df: pandas.core.frame.DataFrame)¶ Computes prediction intervals for df based on the predictions and self.uncertainty_model.
- Parameters
df (pandas.DataFrame) – The dataframe to calculate prediction intervals upon. It should contain either self.value_col_ or PREDICT_COL, on which the prediction interval is based.
- Returns
result_df – The df with prediction interval columns.
- Return type
-
score
(X, y, sample_weight=None)¶ Default scorer for the estimator (used in GridSearchCV/RandomizedSearchCV if scoring=None).
Notes
If null_model_params is not None, returns R2_null_model_score of model error relative to null model, evaluated by score_func.
If null_model_params is None, returns score_func of the model itself.
By default, grid search (with no scoring parameter) optimizes improvement of score_func against the null model.
To optimize a different score function, pass scoring to GridSearchCV/RandomizedSearchCV.
- Parameters
X (pandas.DataFrame) – Input timeseries with timestamp column and any additional regressors. Value column, if provided in X, is ignored.
y (pandas.Series or numpy.array) – Actual value, used to compute error.
sample_weight (pandas.Series or numpy.array) – Ignored.
- Returns
score – Comparison of predictions against null predictions, according to the specified score function.
- Return type
float or None
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
-
summary
(max_colwidth=20)¶ Creates a human-readable string describing how the model works, including relevant diagnostics. These details cannot be extracted from the forecast alone. Prints model configuration. Extend this in a child class to print the trained model parameters.
Log message is printed to the cst.LOGGER_NAME logger.
-
class
greykite.sklearn.estimator.base_silverkite_estimator.
BaseSilverkiteEstimator
(silverkite: greykite.algo.forecast.silverkite.forecast_silverkite.SilverkiteForecast = <greykite.algo.forecast.silverkite.forecast_silverkite.SilverkiteForecast object>, silverkite_diagnostics: greykite.algo.forecast.silverkite.silverkite_diagnostics.SilverkiteDiagnostics = <greykite.algo.forecast.silverkite.silverkite_diagnostics.SilverkiteDiagnostics object>, score_func: callable = <function mean_squared_error>, coverage: float = None, null_model_params: Optional[Dict] = None, uncertainty_dict: Optional[Dict] = None)[source]¶ A base class for forecast estimators that fit using
forecast
.Notes
Allows estimators that fit using forecast to share the same functions for input data validation, fit postprocessing, predict, summary, plot_components, etc.
Subclasses should:
- Implement their own __init__ that uses a superset of the parameters here.
- Implement their own fit, with this sequence of steps:
  - calls super().fit
  - calls SilverkiteForecast.forecast or SimpleSilverkiteForecast.forecast_simple and stores the result in self.model_dict
  - calls super().finish_fit
Uses coverage to set prediction band width. Even though coverage is not needed by forecast_silverkite, it is included in every BaseForecastEstimator to be used universally for forecast evaluation.
Therefore, uncertainty_dict must be consistent with coverage if provided as a dictionary. If uncertainty_dict is None or “auto”, an appropriate default value is set, according to coverage.
- Parameters
score_func (callable, optional, default mean_squared_error) – See BaseForecastEstimator.
coverage (float in [0.0, 1.0] or None, optional) – See BaseForecastEstimator.
null_model_params (dict, optional) – Dictionary with arguments to define DummyRegressor null model, default is None. See BaseForecastEstimator.
uncertainty_dict (dict or str or None, optional) – How to fit the uncertainty model. See forecast. Note that this is allowed to be “auto”. If None or “auto”, will be set to a default value by coverage before calling forecast_silverkite
.
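The consistency requirement between coverage and uncertainty_dict can be pictured with a small sketch. It assumes a symmetric prediction band, so a coverage level maps to a lower and an upper quantile. The helper names, the "quantiles" key inside "params", and the method name string are illustrative assumptions, not taken from the library.

```python
def quantiles_from_coverage(coverage):
    """Return (lower, upper) quantiles of a symmetric band with the given coverage."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("this sketch expects coverage strictly between 0.0 and 1.0")
    tail = (1.0 - coverage) / 2.0  # probability mass left outside the band on each side
    return tail, 1.0 - tail


def is_consistent(uncertainty_dict, coverage, tol=1e-9):
    """Check that the quantiles in a (hypothetical) uncertainty_dict match coverage."""
    lower, upper = uncertainty_dict["params"]["quantiles"]
    expected_lower, expected_upper = quantiles_from_coverage(coverage)
    return abs(lower - expected_lower) < tol and abs(upper - expected_upper) < tol


example_uncertainty_dict = {
    "uncertainty_method": "simple_conditional_residuals",  # assumed method name
    "params": {"quantiles": [0.025, 0.975]},  # assumed key, for illustration
}
```

Under these assumptions, the example dictionary is consistent with coverage=0.95 and inconsistent with coverage=0.80.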
-
silverkite
¶ The silverkite algorithm instance used for forecasting
- Type
Class or a derived class of
SilverkiteForecast
-
silverkite_diagnostics
¶ The silverkite class used for plotting and generating model summary.
- Type
Class or a derived class of
SilverkiteDiagnostics
-
pred_cols
¶ Names of the features used in the model.
- Type
list [str] or None
-
feature_cols
¶ Column names of the patsy design matrix built by
design_mat_from_formula
.- Type
list [str] or None
-
df
¶ The training data used to fit the model.
- Type
pandas.DataFrame
or None
-
coef_
Estimated coefficient matrix for the model. Not available for random forest and gradient boosting methods and set to the default value None.
- Type
pandas.DataFrame
or None
-
_pred_category
¶ A dictionary with keys being the predictor category and values being the predictors belonging to the category. For details, see
pred_category
.- Type
dict or None
-
extra_pred_cols
¶ User provided extra predictor names, for details, see
SimpleSilverkiteEstimator
orSilverkiteEstimator
.- Type
list or None
-
past_df
¶ The extra past data before training data used to generate autoregression terms.
- Type
pandas.DataFrame
or None
-
forecast
¶ Output of
predict_silverkite
, set byself.predict
.- Type
pandas.DataFrame
or None
-
forecast_x_mat
¶ The design matrix of the model at the predict time.
- Type
pandas.DataFrame
or None
-
model_summary
¶ The
ModelSummary
class.- Type
class or None
See also
None
Function performing the fit and predict.
Notes
The subclasses will pass fs_components_df to forecast_silverkite. The model terms it creates internally are used to generate the component plots.
fourier_series_multi_fcn uses fs_components_df["names"] (e.g. tod, tow) to build the fourier series and to create column names.
fs_components_df["seas_names"] (e.g. daily, weekly) is appended to the column names, if provided.
plot_silverkite_components groups based on fs_components_df["seas_names"] passed to forecast_silverkite during fit. E.g. any column containing daily is added to daily seasonality effect. The reason is as follows:
1. User can provide tow and str_dow for weekly seasonality. These should be aggregated, and we can do that only based on “seas_names”.
2. yearly and quarterly seasonality both use ct1 as “names” column. Only way to distinguish those effects is via “seas_names”.
3. ct1 is also used for growth. If it is interacted with seasonality, the columns become indistinguishable without “seas_names”.
Additionally, the function sets yaxis labels based on seas_names: daily as ylabel is much more informative than tod as ylabel in component plots.
-
fit
(X, y=None, time_col='ts', value_col='y', **fit_params)[source]¶ Pre-processing before fitting
Silverkite
forecast model.- Parameters
X (pandas.DataFrame) – Input timeseries, with timestamp column, value column, and any additional regressors. The value column is the response, included in X to allow transformation by sklearn.pipeline.
y (ignored) – The original timeseries values, ignored. (The y for fitting is included in X).
time_col (str) – Time column name in X.
value_col (str) – Value column name in X.
fit_params (dict) – Additional parameters for null model.
Notes
Subclasses are expected to call this at the beginning of their fit method, before calling forecast.
-
finish_fit
()[source]¶ Makes important values of self.model_dict conveniently accessible.
To be called by subclasses at the end of their fit method. Sets {pred_cols, feature_cols, and coef_}.
-
predict
(X, y=None)[source]¶ Creates forecast for the dates specified in
X
.- Parameters
X (
pandas.DataFrame
) – Input timeseries with timestamp column and any additional regressors. Timestamps are the dates for prediction. Value column, if provided inX
, is ignored.y (ignored.) –
- Returns
predictions –
Forecasted values for the dates in X. Columns:
- TIME_COL: dates
- PREDICTED_COL: predictions
- PREDICTED_LOWER_COL: lower bound of predictions, optional
- PREDICTED_UPPER_COL: upper bound of predictions, optional
- [other columns], optional
PREDICTED_LOWER_COL and PREDICTED_UPPER_COL are present if self.coverage is not None.
- Return type
-
property
pred_category
¶ A dictionary that stores the predictor names in each category.
This property is not initialized until it is first used, which speeds up the fitting process. The categories include:
- “intercept” : the intercept.
- “time_features” : the predictors that include TIME_FEATURES but not SEASONALITY_REGEX.
- “event_features” : the predictors that include EVENT_PREFIX.
- “trend_features” : the predictors that include TREND_REGEX but not SEASONALITY_REGEX.
- “seasonality_features” : the predictors that include SEASONALITY_REGEX.
- “lag_features” : the predictors that include LAG_REGEX.
- “regressor_features” : external regressors and other predictors manually passed to extra_pred_cols, but not in the categories above.
- “interaction_features” : the predictors that include interaction terms, i.e., including a colon.
Note that each predictor falls into at least one category. Some “time_features” may also be “trend_features”. Predictors with an interaction are classified into all categories matched by the interaction components. Thus, “interaction_features” are already included in the other categories.
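The classification rules can be sketched in plain Python. This is only an illustration: the patterns below are hypothetical stand-ins for the library's TREND_REGEX, SEASONALITY_REGEX, LAG_REGEX and EVENT_PREFIX constants, the "time_features" category is left out for brevity, and the function names are invented for this example. Interaction terms are split on the colon and each component is classified separately, so an interaction lands in every category its components match.

```python
import re

# Hypothetical patterns, for illustration only (not the library's constants).
SEASONALITY_PATTERN = re.compile(r"sin\d+_|cos\d+_")
TREND_PATTERN = re.compile(r"^ct\d|changepoint")
LAG_PATTERN = re.compile(r"_lag\d+")
EVENT_PREFIX = "events_"

CATEGORY_NAMES = (
    "intercept", "event_features", "trend_features", "seasonality_features",
    "lag_features", "regressor_features", "interaction_features",
)


def categorize_component(term):
    """Return the set of categories matched by one non-interaction term."""
    matched = set()
    is_seasonality = bool(SEASONALITY_PATTERN.search(term))
    if is_seasonality:
        matched.add("seasonality_features")
    if TREND_PATTERN.search(term) and not is_seasonality:
        matched.add("trend_features")
    if term.startswith(EVENT_PREFIX):
        matched.add("event_features")
    if LAG_PATTERN.search(term):
        matched.add("lag_features")
    if not matched:
        matched.add("regressor_features")  # nothing else matched
    return matched


def categorize(pred_cols):
    """Map each category name to the predictors that belong to it."""
    categories = {name: [] for name in CATEGORY_NAMES}
    for col in pred_cols:
        if col == "Intercept":
            categories["intercept"].append(col)
            continue
        components = col.split(":")
        matched = set()
        for term in components:
            matched |= categorize_component(term)
        if len(components) > 1:
            matched.add("interaction_features")
        for name in matched:
            categories[name].append(col)
    return categories
```

With these assumed patterns, a column such as "ct1:sin1_tow_weekly" appears under the trend, seasonality and interaction categories at once.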
-
get_max_ar_order
()[source]¶ Gets the maximum autoregression order.
- Returns
max_ar_order – The maximum autoregression order.
- Return type
int
-
summary
(max_colwidth=20)[source]¶ Creates a human-readable string describing how the model works, including relevant diagnostics. These details cannot be extracted from the forecast alone. Prints the model configuration. Extend this in a child class to print the trained model parameters.
Log message is printed to the cst.LOGGER_NAME logger.
-
plot_trend
(title=None)[source]¶ Convenience function to plot the data and the trend component.
- Parameters
title (str, optional, default None) – Plot title.
- Returns
fig – Figure.
- Return type
-
plot_seasonalities
(title=None)[source]¶ Convenience function to plot the data and the seasonality components.
- Parameters
title (str, optional, default None) – Plot title.
- Returns
fig – Figure.
- Return type
-
plot_trend_changepoint_detection
(params=None)[source]¶ Convenience function to plot the original trend changepoint detection results.
- Parameters
params (dict or None, default None) –
The parameters in
plot
. If set to None, all components will be plotted.Note: seasonality components plotting is not supported currently.
plot
parameter must be False.- Returns
fig – Figure.
- Return type
-
fit_uncertainty
(df: pandas.core.frame.DataFrame, uncertainty_dict: dict, **kwargs)¶ Fits the uncertainty model with a given
df
anduncertainty_dict
.- Parameters
df (
pandas.DataFrame
) – A dataframe representing the data to fit the uncertainty model.uncertainty_dict (dict [str, any]) –
The uncertainty model specification. It should have the following keys:
- ”uncertainty_method”: a string that is in
UncertaintyMethodEnum
.
”params”: a dictionary that includes any additional parameters needed by the uncertainty method.
kwargs (additional parameters to be fed into the uncertainty method.) – These parameters are from the estimator attributes, not given by user.
- Returns
- Return type
The function sets
self.uncertainty_model
and does not return anything.
-
get_params
(deep=True)¶ Get parameters for this estimator.
-
predict_uncertainty
(df: pandas.core.frame.DataFrame)¶ Makes predictions of prediction intervals for
df
based on the predictions andself.uncertainty_model
.- Parameters
df (
pandas.DataFrame
) – The dataframe to calculate prediction intervals upon. It should have eitherself.value_col_
or PREDICT_COL which the prediction interval is based on.- Returns
result_df – The
df
with prediction interval columns.- Return type
-
score
(X, y, sample_weight=None)¶ Default scorer for the estimator (used in GridSearchCV/RandomizedSearchCV if scoring=None).
Notes
If null_model_params is not None, returns R2_null_model_score of model error relative to null model, evaluated by score_func.
If null_model_params is None, returns score_func of the model itself.
By default, grid search (with no scoring parameter) optimizes improvement of
score_func
against null model.To optimize a different score function, pass scoring to GridSearchCV/RandomizedSearchCV.
- Parameters
X (
pandas.DataFrame
) – Input timeseries with timestamp column and any additional regressors. Value column, if provided in X, is ignored.
y (
pandas.Series
or numpy.array
) – Actual value, used to compute error.
sample_weight (
pandas.Series
or numpy.array
) – Ignored.
- Returns
score – Comparison of predictions against null predictions, according to specified score function
- Return type
float or None
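To make the idea of a score relative to a null model concrete, here is a minimal sketch. It assumes the relative score has the form 1 - loss(model) / loss(null model). That form, the function names, and the choice to return None when the null model has zero loss are assumptions for illustration, not the library's confirmed implementation.

```python
def mean_squared_error(y_true, y_pred):
    """Plain-Python mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_score(y_true, y_pred, y_pred_null, loss_func=mean_squared_error):
    """Improvement of the model over a null model (assumed form: 1 - ratio of losses)."""
    null_loss = loss_func(y_true, y_pred_null)
    if null_loss == 0:
        return None  # assumption: the ratio is undefined for a perfect null model
    return 1.0 - loss_func(y_true, y_pred) / null_loss


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]          # model predictions
y_pred_null = [2.5, 2.5, 2.5, 2.5]     # null model: always predicts the mean
score = relative_score(y_true, y_pred, y_pred_null)
```

A value near 1 means the model's loss is far smaller than the null model's loss, and a value at or below 0 means the model does no better than the null model.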
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
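The <component>__<parameter> naming convention can be illustrated with a few lines of plain Python. The helper name split_param_name is invented for this sketch. It mirrors the convention by splitting at the first double underscore, so deeper nesting stays in the remainder for the component to resolve.

```python
def split_param_name(name):
    """Split "<component>__<parameter>" at the first double underscore."""
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        return None, name  # a plain parameter of the object itself
    return component, parameter


# A plain estimator parameter has no component prefix.
plain = split_param_name("coverage")
# A pipeline step parameter: step "estimator", parameter "coverage".
nested = split_param_name("estimator__coverage")
# Deeper nesting: the remainder is passed on to the "input" component.
deep = split_param_name("input__response__null__impute_algorithm")
```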
-
class
greykite.framework.templates.simple_silverkite_template_config.
SimpleSilverkiteTemplateOptions
(freq: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FREQ = <SILVERKITE_FREQ.DAILY: 'DAILY'>, seas: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_SEAS = <SILVERKITE_SEAS.LT: 'LT'>, gr: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_GR = <SILVERKITE_GR.LINEAR: 'LINEAR'>, cp: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_CP = <SILVERKITE_CP.NONE: 'NONE'>, hol: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_HOL = <SILVERKITE_HOL.NONE: 'NONE'>, feaset: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FEASET = <SILVERKITE_FEASET.OFF: 'OFF'>, algo: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_ALGO = <SILVERKITE_ALGO.LINEAR: 'LINEAR'>, ar: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_AR = <SILVERKITE_AR.OFF: 'OFF'>, dsi: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_DSI = <SILVERKITE_DSI.AUTO: 'AUTO'>, wsi: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_WSI = <SILVERKITE_WSI.AUTO: 'AUTO'>)[source]¶ Defines generic simple silverkite template options.
Attributes can be set to different values using
SILVERKITE_COMPONENT_KEYWORDS
for high level tuning.freq
represents data frequency.The other attributes stand for seasonality, growth, changepoints_dict, events, feature_sets_enabled, fit_algorithm and autoregression in
ModelComponentsParam
, which are used inSimpleSilverkiteTemplate
.-
freq
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FREQ = 'DAILY'¶ Valid values for simple silverkite template string name frequency. See
SILVERKITE_FREQ
.
-
seas
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_SEAS = 'LT'¶ Valid values for simple silverkite template string name seasonality. See
SILVERKITE_SEAS
.
-
gr
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_GR = 'LINEAR'¶ Valid values for simple silverkite template string name growth. See
SILVERKITE_GR
.
-
cp
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_CP = 'NONE'¶ Valid values for simple silverkite template string name changepoints. See
SILVERKITE_CP
.
-
hol
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_HOL = 'NONE'¶ Valid values for simple silverkite template string name holiday. See
SILVERKITE_HOL
.
-
feaset
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FEASET = 'OFF'¶ Valid values for simple silverkite template string name feature sets enabled. See
SILVERKITE_FEASET
.
-
algo
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_ALGO = 'LINEAR'¶ Valid values for simple silverkite template string name fit algorithm. See
SILVERKITE_ALGO
.
-
ar
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_AR = 'OFF'¶ Valid values for simple silverkite template string name autoregression. See
SILVERKITE_AR
.
-
dsi
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_DSI = 'AUTO'¶ Valid values for simple silverkite template string name max daily seasonality interaction order. See
SILVERKITE_DSI
.
-
wsi
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_WSI = 'AUTO'¶ Valid values for simple silverkite template string name max weekly seasonality interaction order. See
SILVERKITE_WSI
.
-
-
class
greykite.framework.templates.silverkite_template.
SilverkiteTemplate
[source]¶ A template for
SilverkiteEstimator
.Takes input data and optional configuration parameters to customize the model. Returns a set of parameters to call
forecast_pipeline
.Notes
The attributes of a
ForecastConfig
forSilverkiteEstimator
are:- computation_param: ComputationParam or None, default None
How to compute the result. See
ComputationParam
.- coverage: float or None, default None
Intended coverage of the prediction bands (0.0 to 1.0). Same as coverage in
forecast_pipeline
. You may tune how the uncertainty is computed via model_components.uncertainty[“uncertainty_dict”].- evaluation_metric_param: EvaluationMetricParam or None, default None
What metrics to evaluate. See
EvaluationMetricParam
.- evaluation_period_param: EvaluationPeriodParam or None, default None
How to split data for evaluation. See
EvaluationPeriodParam
.- forecast_horizon: int or None, default None
Number of periods to forecast into the future. Must be > 0. If None, the default is determined from the input data frequency. Same as forecast_horizon in forecast_pipeline.
- metadata_param: MetadataParam or None, default None
Information about the input data. See
MetadataParam
.- model_components_param:
ModelComponentsParam
or None, default None Parameters to tune the model. See
ModelComponentsParam
. The fields are dictionaries with the following items.See inline comments on which values accept lists for grid search.
- seasonality: dict [str, any] or None, optional
How to model the seasonality. A dictionary with keys corresponding to parameters in
forecast
.Allowed keys:
"fs_components_df"
.- growth: dict [str, any] or None, optional
How to model the growth.
Allowed keys: None. (Use
model_components.custom["extra_pred_cols"]
to specify growth terms.)- events: dict [str, any] or None, optional
How to model the holidays/events. A dictionary with keys corresponding to parameters in
forecast
.Allowed keys:
"daily_event_df_dict"
.Note
Event names derived from
daily_event_df_dict
must be specified viamodel_components.custom["extra_pred_cols"]
to be included in the model. This parameter has no effect on the model unless event names are passed toextra_pred_cols
.The function
get_event_pred_cols
can be used to extract all event names fromdaily_event_df_dict
.- changepoints: dict [str, any] or None, optional
How to model changes in trend and seasonality. A dictionary with keys corresponding to parameters in
forecast
.Allowed keys: “changepoints_dict”, “seasonality_changepoints_dict”, “changepoint_detector”.
- autoregression: dict [str, any] or None, optional
Specifies the autoregression configuration. Dictionary with the following optional key:
"autoreg_dict": dict or str or None or a list of such values for grid search
If a dict: A dictionary with arguments for build_autoreg_df. That function’s parameter value_col is inferred from the input of current function self.forecast. Other keys are:
- "lag_dict": dict or None
- "agg_lag_dict": dict or None
- "series_na_fill_func": callable
If a str: The string represents a method and a dictionary will be constructed using that str. Currently the only implemented method is “auto”, which uses __get_default_autoreg_dict to create a dictionary. See more details for the above parameters in build_autoreg_df.
- regressors: dict [str, any] or None, optional
How to model the regressors.
Allowed keys: None. (Use
model_components.custom["extra_pred_cols"]
to specify regressors.)- lagged_regressors: dict [str, dict] or None, optional
Specifies the lagged regressors configuration. Dictionary with the following optional key:
"lagged_regressor_dict"
: dict or None or a list of such values for grid searchA dictionary with arguments for
build_autoreg_df_multi
. The keys of the dictionary are the target lagged regressor column names. It can leverage the regressors included indf
. The value of each key is either a dict or str. If dict, it has the following keys:"lag_dict"
: dict or None"agg_lag_dict"
: dict or None"series_na_fill_func"
: callableIf str, it represents a method and a dictionary will be constructed using that str. Currently the only implemented method is “auto” which uses
SilverkiteForecast
’s __get_default_lagged_regressor_dict to create a dictionary for each lagged regressor. An example:lagged_regressor_dict = { "regressor1": { "lag_dict": {"orders": [1, 2, 3]}, "agg_lag_dict": { "orders_list": [[7, 7 * 2, 7 * 3]], "interval_list": [(8, 7 * 2)]}, "series_na_fill_func": lambda s: s.bfill().ffill()}, "regressor2": "auto"}
Check the docstring of
build_autoreg_df_multi
for more details for each argument.
- uncertainty: dict [str, any] or None, optional
How to model the uncertainty. A dictionary with keys corresponding to parameters in
forecast
.Allowed keys:
"uncertainty_dict"
.- custom: dict [str, any] or None, optional
Custom parameters that don’t fit the categories above. A dictionary with keys corresponding to parameters in
forecast
.- Allowed keys:
"silverkite"
,"silverkite_diagnostics"
,"origin_for_time_vars"
,"extra_pred_cols"
,"drop_pred_cols"
,"explicit_pred_cols"
,"fit_algorithm_dict"
,"min_admissible_value"
,"max_admissible_value"
.
Note
"extra_pred_cols"
should contain the desired growth terms, regressor names, and event names.fit_algorithm_dict
is a dictionary withfit_algorithm
andfit_algorithm_params
parameters toforecast
:- fit_algorithm_dictdict or None, optional
How to fit the model. A dictionary with the following optional keys.
"fit_algorithm"
str, optional, default “linear”The type of predictive model used in fitting.
See
fit_model_via_design_matrix
for available options and their parameters."fit_algorithm_params"
dict or None, optional, default NoneParameters passed to the requested fit_algorithm. If None, uses the defaults in
fit_model_via_design_matrix
.
- hyperparameter_override: dict [str, any] or None or list [dict [str, any] or None], optional
After the above model components are used to create a hyperparameter grid, the result is updated by this dictionary, to create new keys or override existing ones. Allows for complete customization of the grid search.
Keys should have format
{named_step}__{parameter_name}
for the named steps of thesklearn.pipeline.Pipeline
returned by this function. Seesklearn.pipeline.Pipeline
.For example:
hyperparameter_override={ "estimator__origin_for_time_vars": 2018.0, "input__response__null__impute_algorithm": "ts_interpolate", "input__response__null__impute_params": {"orders": [7, 14]}, "input__regressors_numeric__normalize__normalize_algorithm": "RobustScaler", }
If a list of dictionaries, grid search will be done for each dictionary in the list. Each dictionary in the list overrides the defaults. This enables grid search over specific combinations of parameters to reduce the search space.
For example, the first dictionary could define combinations of parameters for a “complex” model, and the second dictionary could define combinations of parameters for a “simple” model, to prevent mixed combinations of simple and complex.
Or the first dictionary could grid search over fit algorithm, and the second dictionary could use a single fit algorithm and grid search over seasonality.
The result is passed as the
param_distributions
parameter tosklearn.model_selection.RandomizedSearchCV
.
- model_template: str
This class only accepts “SK”.
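Returning to hyperparameter_override: the way a single dictionary or a list of dictionaries updates the grid can be sketched as follows. This is a simplified illustration, not the template's actual code. The function name apply_overrides is invented, and the grid keys are examples only.

```python
import copy


def apply_overrides(base_grid, hyperparameter_override):
    """Return the grid(s) obtained by updating base_grid with the override(s)."""
    if hyperparameter_override is None:
        return copy.deepcopy(base_grid)
    if isinstance(hyperparameter_override, dict):
        grid = copy.deepcopy(base_grid)
        grid.update(hyperparameter_override)  # create new keys or replace existing ones
        return grid
    grids = []
    for override in hyperparameter_override:  # one grid per dictionary in the list
        grid = copy.deepcopy(base_grid)
        if override is not None:
            grid.update(override)
        grids.append(grid)
    return grids


base_grid = {
    "estimator__fit_algorithm_dict": [{"fit_algorithm": "linear"}],
    "estimator__origin_for_time_vars": [None],
}
overrides = [
    # First grid: search over the fit algorithm.
    {"estimator__fit_algorithm_dict": [{"fit_algorithm": "ridge"},
                                       {"fit_algorithm": "lasso"}]},
    # Second grid: keep the default algorithm, fix the time origin.
    {"estimator__origin_for_time_vars": [2018.0]},
]
grids = apply_overrides(base_grid, overrides)
```

Each resulting grid is searched on its own, which is what keeps combinations from different dictionaries from mixing.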
-
DEFAULT_MODEL_TEMPLATE
= 'SK'¶ The default model template. See
ModelTemplateEnum
. Uses a string to avoid circular imports. Overrides the value fromForecastConfigDefaults
.
-
property
allow_model_template_list
¶ SilverkiteTemplate does not allow config.model_template to be a list.
-
property
allow_model_components_param_list
¶ SilverkiteTemplate does not allow config.model_components_param to be a list.
-
get_regressor_cols
()[source]¶ Returns regressor column names.
Implements the method in
BaseTemplate
.The intersection of
extra_pred_cols
from model components andself.df
columns, excludingtime_col
andvalue_col
.- Returns
regressor_cols – See
forecast_pipeline
.- Return type
list [str] or None
-
get_lagged_regressor_info
()[source]¶ Returns lagged regressor column names and minimal/maximal lag order. The lag order can be used to check potential imputation in the computation of lags.
Implements the method in
BaseTemplate
.- Returns
lagged_regressor_info – A dictionary that includes the lagged regressor column names and maximal/minimal lag order. The keys are:
- lagged_regressor_colslist [str] or None
See
forecast_pipeline
.
overall_min_lag_order : int or None overall_max_lag_order : int or None
For example:
self.config.model_components_param.lagged_regressors["lagged_regressor_dict"] = [ {"regressor1": { "lag_dict": {"orders": [7]}, "agg_lag_dict": { "orders_list": [[7, 7 * 2, 7 * 3]], "interval_list": [(8, 7 * 2)]}, "series_na_fill_func": lambda s: s.bfill().ffill()} }, {"regressor2": { "lag_dict": {"orders": [2]}, "agg_lag_dict": { "orders_list": [[7, 7 * 2]], "interval_list": [(8, 7 * 2)]}, "series_na_fill_func": lambda s: s.bfill().ffill()} }, {"regressor3": "auto"} ]
Then the function returns:
lagged_regressor_info = { "lagged_regressor_cols": ["regressor1", "regressor2", "regressor3"], "overall_min_lag_order": 2, "overall_max_lag_order": 21 }
Note that “regressor3” is skipped when computing the lag orders, because the “auto” option makes sure the lag order is proper.
- Return type
dict
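One plausible way to arrive at the overall minimum and maximum lag order in the example above is sketched below. It assumes the orders are collected from "lag_dict", from every list in "orders_list", and from both ends of every interval in "interval_list". That rule and the function name lag_order_range are assumptions that happen to reproduce the documented result, not the library's code.

```python
def lag_order_range(lagged_regressor_dicts):
    """Return (min, max) lag order over all explicitly configured regressors."""
    orders = []
    for lagged_regressor_dict in lagged_regressor_dicts:
        for spec in lagged_regressor_dict.values():
            if spec == "auto":
                continue  # "auto" chooses suitable lags itself, so it is skipped
            lag_dict = spec.get("lag_dict") or {}
            agg_lag_dict = spec.get("agg_lag_dict") or {}
            orders.extend(lag_dict.get("orders", []))
            for order_list in agg_lag_dict.get("orders_list", []):
                orders.extend(order_list)
            for interval in agg_lag_dict.get("interval_list", []):
                orders.extend(interval)
    if not orders:
        return None, None
    return min(orders), max(orders)


example = [
    {"regressor1": {
        "lag_dict": {"orders": [7]},
        "agg_lag_dict": {"orders_list": [[7, 7 * 2, 7 * 3]],
                         "interval_list": [(8, 7 * 2)]}}},
    {"regressor2": {
        "lag_dict": {"orders": [2]},
        "agg_lag_dict": {"orders_list": [[7, 7 * 2]],
                         "interval_list": [(8, 7 * 2)]}}},
    {"regressor3": "auto"},
]
overall_min, overall_max = lag_order_range(example)
```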
-
get_hyperparameter_grid
()[source]¶ Returns hyperparameter grid.
Implements the method in
BaseTemplate
.Uses
self.time_properties
andself.config
to generate the hyperparameter grid.Converts model components and time properties into
SilverkiteEstimator
hyperparameters.Notes
forecast_pipeline
handles the train/test splits according toEvaluationPeriodParam
, soestimator__train_test_thresh
andestimator__training_fraction
are always None.estimator__changepoint_detector
is always None, to prevent leaking future information into the past. Passchangepoints_dict
with method=”auto” for automatic detection.- Returns
hyperparameter_grid – See
forecast_pipeline
. The output dictionary values are lists, combined in grid search.- Return type
dict, list [dict] or None
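Since the grid values are lists that are combined during the search, it can help to see which candidate settings a grid describes. The sketch below enumerates every combination with itertools.product. It assumes the input is a dict or a list of dicts, and the function name expand_grid and the grid keys are illustrative. A randomized search would sample from these candidates rather than visit all of them.

```python
import itertools


def expand_grid(hyperparameter_grid):
    """List every parameter combination described by a grid or a list of grids."""
    if isinstance(hyperparameter_grid, list):
        grids = hyperparameter_grid
    else:
        grids = [hyperparameter_grid]
    combinations = []
    for grid in grids:
        keys = list(grid)
        # Cartesian product of the value lists, one value per key.
        for values in itertools.product(*(grid[key] for key in keys)):
            combinations.append(dict(zip(keys, values)))
    return combinations


grid = {
    "estimator__fit_algorithm_dict": [{"fit_algorithm": "linear"},
                                      {"fit_algorithm": "ridge"}],
    "estimator__origin_for_time_vars": [2018.0],
}
candidates = expand_grid(grid)
```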
-
static
apply_computation_defaults
(computation: Optional[greykite.framework.templates.autogen.forecast_config.ComputationParam] = None) → greykite.framework.templates.autogen.forecast_config.ComputationParam¶ Applies the default ComputationParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a ComputationParam object.
- Parameters
computation (
ComputationParam
or None) – The ComputationParam object.- Returns
computation – Valid ComputationParam object with the provided attribute values and the default attribute values if not.
- Return type
-
static
apply_evaluation_metric_defaults
(evaluation: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam] = None) → greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam¶ Applies the default EvaluationMetricParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a EvaluationMetricParam object.
- Parameters
evaluation (
EvaluationMetricParam
or None) – The EvaluationMetricParam object.- Returns
evaluation – Valid EvaluationMetricParam object with the provided attribute values and the default attribute values if not.
- Return type
-
static
apply_evaluation_period_defaults
(evaluation: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam] = None) → greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam¶ Applies the default EvaluationPeriodParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a EvaluationPeriodParam object.
- Parameters
evaluation (
EvaluationPeriodParam
or None) – The EvaluationPeriodParam object.- Returns
evaluation – Valid EvaluationPeriodParam object with the provided attribute values and the default attribute values if not.
- Return type
-
apply_forecast_config_defaults
(config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → greykite.framework.templates.autogen.forecast_config.ForecastConfig¶ Applies the default Forecast Config values to the given config. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input config is None, it creates a Forecast Config.
- Parameters
config (
ForecastConfig
or None) – Forecast configuration if available. SeeForecastConfig
.- Returns
config – A valid Forecast Config which contains the provided attribute values and the default attribute values if not.
- Return type
ForecastConfig
-
static
apply_metadata_defaults
(metadata: Optional[greykite.framework.templates.autogen.forecast_config.MetadataParam] = None) → greykite.framework.templates.autogen.forecast_config.MetadataParam¶ Applies the default MetadataParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a MetadataParam object.
- Parameters
metadata (
MetadataParam
or None) – The MetadataParam object.- Returns
metadata – Valid MetadataParam object with the provided attribute values and the default attribute values if not.
- Return type
-
static
apply_model_components_defaults
(model_components: Optional[Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[Optional[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]]] = None) → Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]¶ Applies the default ModelComponentsParam values to the given object.
Converts None to a ModelComponentsParam object. Unpacks a list of a single element to the element itself.
- Parameters
model_components (
ModelComponentsParam
or None or list of such items) – The ModelComponentsParam object.- Returns
model_components – Valid ModelComponentsParam object with the provided attribute values and the default attribute values if not.
- Return type
ModelComponentsParam
or list of such items
-
apply_model_template_defaults
(model_template: Optional[Union[str, List[Optional[str]]]] = None) → Union[str, List[str]]¶ Applies the default model template value to the given input.
Unpacks a list of a single element to the element itself. Sets default value if None.
- Parameters
model_template (str or None or list [None, str]) – The model template name. See valid names in
ModelTemplateEnum
.- Returns
model_template – The model template name, with the default value used if not provided.
- Return type
str or list [str]
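The behavior described here (use the default when None, and unpack a single-element list) can be sketched in a few lines. The function name fill_model_template is invented for this example, and treating a None entry inside a list as "use the default" is an assumption based on the accepted input type.

```python
DEFAULT_MODEL_TEMPLATE = "SK"  # the default documented for SilverkiteTemplate


def fill_model_template(model_template=None, default=DEFAULT_MODEL_TEMPLATE):
    """Replace missing names with the default and unpack single-element lists."""
    if model_template is None:
        return default
    if isinstance(model_template, list):
        filled = [default if name is None else name for name in model_template]
        # A list with one element is returned as the element itself.
        return filled[0] if len(filled) == 1 else filled
    return model_template
```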
-
apply_template_for_pipeline_params
(df: pandas.core.frame.DataFrame, config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → Dict[source]¶ Explicitly calls the method in
BaseTemplate
to make use of the decorator in this class.- Parameters
df (
pandas.DataFrame
) – The time series dataframe withtime_col
andvalue_col
and optional regressor columns.config (
ForecastConfig
.) – TheForecastConfig
class that includes model training parameters.
- Returns
pipeline_parameters – The pipeline parameters consumable by
forecast_pipeline
.- Return type
dict
-
property
estimator
¶ The estimator instance to use as the final step in the pipeline. An instance of
greykite.sklearn.estimator.base_forecast_estimator.BaseForecastEstimator
.
-
get_forecast_time_properties
()¶ Returns forecast time parameters.
Uses
self.df
,self.config
,self.regressor_cols
.Available parameters:
self.df
self.config
self.score_func
self.score_func_greater_is_better
self.regressor_cols
self.lagged_regressor_cols
self.estimator
self.pipeline
- Returns
time_properties – Time properties dictionary (likely produced by
get_forecast_time_properties
) with keys:"period"
intPeriod of each observation (i.e. minimum time between observations, in seconds).
"simple_freq"
SimpleTimeFrequencyEnumSimpleTimeFrequencyEnum
member corresponding to data frequency."num_training_points"
intNumber of observations for training.
"num_training_days"
intNumber of days for training.
"start_year"
intStart year of the training period.
"end_year"
intEnd year of the forecast period.
"origin_for_time_vars"
floatContinuous time representation of the first date in
df
.
- Return type
dict [str, any] or None, default None
-
get_pipeline
()¶ Returns pipeline.
Implementation may be overridden by subclass if a different pipeline is desired.
Uses
self.estimator
,self.score_func
,self.score_func_greater_is_better
,self.config
,self.regressor_cols
.Available parameters:
self.df
self.config
self.score_func
self.score_func_greater_is_better
self.regressor_cols
self.estimator
- Returns
pipeline – See
forecast_pipeline
.- Return type
Prophet Template¶
-
class
greykite.framework.templates.prophet_template.
ProphetTemplate
(estimator: Optional[greykite.sklearn.estimator.base_forecast_estimator.BaseForecastEstimator] = None)[source]¶ A template for
ProphetEstimator
.Takes input data and optional configuration parameters to customize the model. Returns a set of parameters to call
forecast_pipeline
.Notes
The attributes of a
ForecastConfig
forProphetEstimator
are:- computation_param: ComputationParam or None, default None
How to compute the result. See
ComputationParam
.- coverage: float or None, default None
Intended coverage of the prediction bands (0.0 to 1.0). If None, the upper/lower predictions are not returned. Same as coverage in
forecast_pipeline
- evaluation_metric_param: EvaluationMetricParam or None, default None
What metrics to evaluate. See
EvaluationMetricParam
.- evaluation_period_param: EvaluationPeriodParam or None, default None
How to split data for evaluation. See
EvaluationPeriodParam
.- forecast_horizon: int or None, default None
Number of periods to forecast into the future. Must be > 0. If None, the default is determined from the input data frequency. Same as forecast_horizon in forecast_pipeline.
- metadata_param: MetadataParam or None, default None
Information about the input data. See
MetadataParam
.- model_components_param:
ModelComponentsParam
or None, default None Parameters to tune the model. See
ModelComponentsParam
. The fields are dictionaries with the following items.- seasonality: dict [str, any] or None
Seasonality config dictionary, with the following optional keys.
"seasonality_mode"
: str or None or list of such values for grid search
Can be ‘additive’ (default) or ‘multiplicative’.
"seasonality_prior_scale"
: float or None or list of such values for grid search
Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations; smaller values dampen the seasonality. Specify for individual seasonalities using add_seasonality_dict.
"yearly_seasonality"
: str or bool or int or list of such values for grid search, default ‘auto’
Determines the yearly seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate.
"weekly_seasonality"
: str or bool or int or list of such values for grid search, default ‘auto’
Determines the weekly seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate.
"daily_seasonality"
: str or bool or int or list of such values for grid search, default ‘auto’
Determines the daily seasonality. Can be ‘auto’, True, False, or a number of Fourier terms to generate.
"add_seasonality_dict"
: dict or None or list of such values for grid search
dict of custom seasonality parameters to be added to the model, default=None. Key is the seasonality component name, e.g. ‘monthly’; parameters are specified via dict. See
prophet_estimator
for details.
- growth: dict [str, any] or None
Specifies the growth parameter configuration. Dictionary with the following optional key:
"growth_term"
: str or None or list of such values for grid search
How to model the growth. Valid options are “linear” and “logistic”. Both trend terms have their origin at the train start date.
- events: dict [str, any] or None
Holiday/events configuration dictionary with the following optional keys:
"holiday_lookup_countries"
: list [str] or “auto” or None
Which countries’ holidays to include. Must contain all the holidays you intend to model. If “auto”, uses default list of countries with large contribution to Internet traffic. If None or an empty list, no holidays are modeled.
"holidays_prior_scale"
: float or None or list of such values for grid search, default 10.0
Modulates the strength of the holiday effect.
"holiday_pre_num_days"
: list [int] or None, default 2
Model holiday effects for holiday_pre_num_days days before the holiday. Grid search is not supported. Must be a list with one element or None.
"holiday_post_num_days"
: list [int] or None, default 2
Model holiday effects for holiday_post_num_days days after the holiday. Grid search is not supported. Must be a list with one element or None.
- changepoints: dict [str, any] or None
Specifies the changepoint configuration. Dictionary with the following optional keys:
"changepoint_prior_scale"
: float or None or list of such values for grid search, default 0.05
Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints.
"changepoints"
: list [datetime.datetime] or None or list of such values for grid search, default None
List of dates at which to include potential changepoints. If not specified, potential changepoints are selected automatically.
"n_changepoints"
: int or None or list of such values for grid search, default 25
Number of potential changepoints to include. Not used if input changepoints is supplied. If changepoints is not supplied, then n_changepoints potential changepoints are selected uniformly from the first changepoint_range proportion of the history.
"changepoint_range"
: float or None or list of such values for grid search, default 0.8
Proportion of history in which trend changepoints will be estimated. Permitted values: (0,1]. Not used if input changepoints is supplied.
- regressors: dict [str, any] or None
Specifies the regressors to include in the model (e.g. macro-economic factors). Dictionary with the following optional keys:
"add_regressor_dict"
: dict or None or list of such values for grid search, default None
Dictionary of extra regressors to be modeled. See
ProphetEstimator
for details.
- uncertainty: dict [str, any] or None
Specifies the uncertainty configuration. A dictionary with the following optional keys:
"mcmc_samples"
: int or None or list of such values for grid search, default 0
If greater than 0, will do full Bayesian inference with the specified number of MCMC samples. If 0, will do MAP estimation.
"uncertainty_samples"
: int or None or list of such values for grid search, default 1000
Number of simulated draws used to estimate uncertainty intervals. Setting this value to 0 or False will disable uncertainty estimation and speed up the calculation.
- hyperparameter_override: dict [str, any] or None or list [dict [str, any] or None]
After the above model components are used to create a hyperparameter grid, the result is updated by this dictionary, to create new keys or override existing ones. Allows for complete customization of the grid search.
Keys should have format {named_step}__{parameter_name} for the named steps of the sklearn.pipeline.Pipeline returned by this function. See sklearn.pipeline.Pipeline.
For example:
hyperparameter_override={
    "estimator__yearly_seasonality": [True, False],
    "estimator__seasonality_prior_scale": [5.0, 15.0],
    "input__response__null__impute_algorithm": "ts_interpolate",
    "input__response__null__impute_params": {"orders": [7, 14]},
    "input__regressors_numeric__normalize__normalize_algorithm": "RobustScaler",
}
If a list of dictionaries, grid search will be done for each dictionary in the list. Each dictionary in the list overrides the defaults. This enables grid search over specific combinations of parameters to reduce the search space.
For example, the first dictionary could define combinations of parameters for a “complex” model, and the second dictionary could define combinations of parameters for a “simple” model, to prevent mixed combinations of simple and complex.
Or the first dictionary could grid search over fit algorithm, and the second dictionary could use a single fit algorithm and grid search over seasonality.
The result is passed as the param_distributions parameter to sklearn.model_selection.RandomizedSearchCV
.- autoregression: dict [str, any] or None
Ignored. Prophet template does not support autoregression.
- lagged_regressors: dict [str, any] or None
Ignored. Prophet template does not support lagged regressors.
- custom: dict [str, any] or None
Ignored. There are no custom options.
- model_template: str
This class only accepts “PROPHET”.
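The n_changepoints and changepoint_range entries above describe how candidate changepoints are spread over the early part of the history. The following is a minimal standard-library sketch of that selection rule, not the library's code: the function name `select_changepoint_indexes` is hypothetical, and the choices to exclude the first observation and to cap the count on short histories are assumptions made for illustration.

```python
def select_changepoint_indexes(n_obs, n_changepoints=25, changepoint_range=0.8):
    """Return evenly spaced row positions for candidate changepoints.

    Hypothetical helper: candidates come from the first
    ``changepoint_range`` proportion of the history, and position 0
    (the first observation) is never a candidate.
    """
    if not 0 < changepoint_range <= 1:
        raise ValueError("changepoint_range must be in (0, 1]")
    # Number of leading observations eligible to hold a changepoint.
    hist_size = int(n_obs * changepoint_range)
    if hist_size < 2 or n_changepoints < 1:
        return []
    # Assumption: a short history caps the number of candidates.
    n = min(n_changepoints, hist_size - 1)
    step = (hist_size - 1) / n
    return [round(i * step) for i in range(1, n + 1)]


# 50 observations, 3 candidates within the first 80% (rows 0..39).
print(select_changepoint_indexes(50, n_changepoints=3, changepoint_range=0.8))
```

With changepoint_range=1.0 the candidates would extend to the last observation, which is why smaller values are usually preferred when the most recent data should not drive trend changes.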
-
DEFAULT_MODEL_TEMPLATE
= 'PROPHET'¶ The default model template. See
ModelTemplateEnum
. Uses a string to avoid circular imports. Overrides the value from ForecastConfigDefaults
.
-
HOLIDAY_LOOKUP_COUNTRIES_AUTO
= ('UnitedStates', 'UnitedKingdom', 'India', 'France', 'China')¶ Default holiday countries to use if countries=’auto’
-
property
allow_model_template_list
¶ ProphetTemplate does not allow config.model_template to be a list.
-
property
allow_model_components_param_list
¶ ProphetTemplate does not allow config.model_components_param to be a list.
-
get_prophet_holidays
(year_list, countries='auto', lower_window=-2, upper_window=2)[source]¶ Generates holidays for Prophet model.
- Parameters
year_list (list [int]) – List of years for selecting the holidays across given countries.
countries (list [str] or “auto” or None, default “auto”) –
Countries for selecting holidays.
If “auto”, uses Top Countries for internet traffic.
If a list, a list of country names.
If None, the function returns None.
lower_window (int or None, default -2) – Negative integer. Model holiday effects for given number of days before the holiday.
upper_window (int or None, default 2) – Positive integer. Model holiday effects for given number of days after the holiday.
- Returns
holidays – holidays dataframe to pass to Prophet’s
holidays
argument.- Return type
-
get_regressor_cols
()[source]¶ Returns regressor column names.
Implements the method in
BaseTemplate
.- Returns
regressor_cols – The names of regressor columns used in any hyperparameter set requested by
model_components
. None if there are no regressors.- Return type
list [str] or None
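Because add_regressor_dict may be a single dictionary or a list of dictionaries (one per hyperparameter set), the returned column names have to cover every set. A minimal sketch of that union, assuming a hypothetical helper name `collect_regressor_cols` and first-seen ordering:

```python
def collect_regressor_cols(add_regressor_dict):
    """Union of regressor names across one or more hyperparameter sets.

    Hypothetical helper. ``add_regressor_dict`` may be None, a dict keyed
    by regressor column name, or a list of such dicts (or None entries).
    """
    if add_regressor_dict is None:
        return None
    if isinstance(add_regressor_dict, list):
        candidates = add_regressor_dict
    else:
        candidates = [add_regressor_dict]
    cols = []
    for regressors in candidates:
        for name in (regressors or {}):
            if name not in cols:  # keep first-seen order, drop repeats
                cols.append(name)
    return cols or None  # None when no regressors are requested


grid_search_sets = [
    {"reg_col1": {"prior_scale": 10}},
    {"reg_col1": {"prior_scale": 20}, "reg_col2": {"prior_scale": 20}},
    None,
]
print(collect_regressor_cols(grid_search_sets))
```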
-
apply_prophet_model_components_defaults
(model_components=None, time_properties=None)[source]¶ Sets default values for
model_components
.Called by
get_hyperparameter_grid
after time_properties is defined. Requires time_properties
as well as model_components
so we do not simply override apply_model_components_defaults
.- Parameters
model_components (
ModelComponentsParam
or None, default None) – Configuration of model growth, seasonality, events, etc. See the docstring of this class for details.
time_properties (dict [str, any] or None, default None) –
Time properties dictionary (likely produced by
get_forecast_time_properties
) with keys:
"period"
: int
Period of each observation (i.e. minimum time between observations, in seconds).
"simple_freq"
: SimpleTimeFrequencyEnum
SimpleTimeFrequencyEnum member corresponding to data frequency.
"num_training_points"
: int
Number of observations for training.
"num_training_days"
: int
Number of days for training.
"start_year"
: int
Start year of the training period.
"end_year"
: int
End year of the forecast period.
"origin_for_time_vars"
: float
Continuous time representation of the first date in df.
If None, start_year is set to 2015 and end_year to 2030.
- Returns
model_components – The provided
model_components
with default values set- Return type
-
get_hyperparameter_grid
()[source]¶ Returns hyperparameter grid.
Implements the method in
BaseTemplate
. Uses self.time_properties and self.config to generate the hyperparameter grid. Converts model components and time properties into
ProphetEstimator
hyperparameters.
- Returns
hyperparameter_grid –
ProphetEstimator
hyperparameters.See
forecast_pipeline
. The output dictionary values are lists, combined in grid search.- Return type
dict [str, list [any]] or None
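The returned grid maps step-qualified parameter names to lists of candidate values. As a rough illustration of that shape only, the sketch below wraps scalar settings in one-element lists and prefixes each name with the pipeline step. The helper name `to_hyperparameter_grid` and the default step name "estimator" are assumptions for this example, not the template's internals.

```python
from itertools import product


def to_hyperparameter_grid(params, step="estimator"):
    """Turn {name: value or list of values} into {step__name: [values]}.

    Hypothetical helper. A parameter whose single value is itself a list
    (e.g. a list of changepoint dates) would need to be wrapped by the
    caller; this sketch does not try to detect that case.
    """
    grid = {}
    for name, value in params.items():
        values = value if isinstance(value, list) else [value]
        grid[f"{step}__{name}"] = values
    return grid


grid = to_hyperparameter_grid({
    "seasonality_mode": ["additive", "multiplicative"],
    "changepoint_prior_scale": [0.05, 0.5],
    "n_changepoints": 25,
})
# Grid search evaluates the Cartesian product of the value lists.
combinations = list(product(*grid.values()))
print(len(combinations))
```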
-
static
apply_computation_defaults
(computation: Optional[greykite.framework.templates.autogen.forecast_config.ComputationParam] = None) → greykite.framework.templates.autogen.forecast_config.ComputationParam¶ Applies the default ComputationParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a ComputationParam object.
- Parameters
computation (
ComputationParam
or None) – The ComputationParam object.- Returns
computation – Valid ComputationParam object with the provided attribute values and the default attribute values if not.
- Return type
-
static
apply_evaluation_metric_defaults
(evaluation: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam] = None) → greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam¶ Applies the default EvaluationMetricParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a EvaluationMetricParam object.
- Parameters
evaluation (
EvaluationMetricParam
or None) – The EvaluationMetricParam object.- Returns
evaluation – Valid EvaluationMetricParam object with the provided attribute values and the default attribute values if not.
- Return type
-
static
apply_evaluation_period_defaults
(evaluation: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam] = None) → greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam¶ Applies the default EvaluationPeriodParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a EvaluationPeriodParam object.
- Parameters
evaluation (
EvaluationPeriodParam
or None) – The EvaluationPeriodParam object.- Returns
evaluation – Valid EvaluationPeriodParam object with the provided attribute values and the default attribute values if not.
- Return type
-
apply_forecast_config_defaults
(config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → greykite.framework.templates.autogen.forecast_config.ForecastConfig¶ Applies the default Forecast Config values to the given config. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input config is None, it creates a Forecast Config.
- Parameters
config (
ForecastConfig
or None) – Forecast configuration if available. SeeForecastConfig
.- Returns
config – A valid Forecast Config which contains the provided attribute values and the default attribute values if not.
- Return type
ForecastConfig
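The apply_*_defaults methods above share one rule: a provided value is kept, a missing value is filled from the defaults, and a None input yields a new object. A minimal sketch of that rule follows. `SketchConfig`, `SKETCH_DEFAULTS` and `apply_defaults` are hypothetical stand-ins and do not reproduce the real ForecastConfig fields or default values.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a config object; not ForecastConfig."""
    coverage: Optional[float] = None
    forecast_horizon: Optional[int] = None
    model_template: Optional[str] = None


# Assumed defaults for this sketch only.
SKETCH_DEFAULTS = {"model_template": "PROPHET"}


def apply_defaults(config=None):
    """Fill attributes that are None from SKETCH_DEFAULTS."""
    if config is None:
        config = SketchConfig()  # None input creates a new object
    updates = {
        name: default
        for name, default in SKETCH_DEFAULTS.items()
        if getattr(config, name) is None  # provided values are unchanged
    }
    return replace(config, **updates)  # other attributes are untouched


filled = apply_defaults(SketchConfig(coverage=0.9))
print(filled)
```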
-
static
apply_metadata_defaults
(metadata: Optional[greykite.framework.templates.autogen.forecast_config.MetadataParam] = None) → greykite.framework.templates.autogen.forecast_config.MetadataParam¶ Applies the default MetadataParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a MetadataParam object.
- Parameters
metadata (
MetadataParam
or None) – The MetadataParam object.- Returns
metadata – Valid MetadataParam object with the provided attribute values and the default attribute values if not.
- Return type
-
static
apply_model_components_defaults
(model_components: Optional[Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[Optional[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]]] = None) → Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]¶ Applies the default ModelComponentsParam values to the given object.
Converts None to a ModelComponentsParam object. Unpacks a list of a single element to the element itself.
- Parameters
model_components (
ModelComponentsParam
or None or list of such items) – The ModelComponentsParam object.- Returns
model_components – Valid ModelComponentsParam object with the provided attribute values and the default attribute values if not.
- Return type
ModelComponentsParam
or list of such items
-
apply_model_template_defaults
(model_template: Optional[Union[str, List[Optional[str]]]] = None) → Union[str, List[str]]¶ Applies the default model template value to the given object.
Unpacks a list of a single element to the element itself. Sets default value if None.
- Parameters
model_template (str or None or list [None, str]) – The model template name. See valid names in
ModelTemplateEnum
.- Returns
model_template – The model template name, with the default value used if not provided.
- Return type
str or list [str]
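The two behaviours described for apply_model_components_defaults and apply_model_template_defaults, unpacking a one-element list and substituting a default for None, can be sketched as below. The helper name `unpack_single` is hypothetical, and replacing None entries inside a longer list is an assumption read off the documented parameter and return types.

```python
def unpack_single(value, default):
    """Apply a default and unpack a list that holds a single element.

    Hypothetical helper illustrating the documented behaviour.
    """
    if value is None:
        return default
    if isinstance(value, list):
        # Assumption: None entries in a list also receive the default.
        filled = [default if item is None else item for item in value]
        return filled[0] if len(filled) == 1 else filled
    return value


print(unpack_single(None, "PROPHET"))
print(unpack_single(["SILVERKITE"], "PROPHET"))
print(unpack_single([None, "SILVERKITE"], "PROPHET"))
```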
-
property
estimator
¶ The estimator instance to use as the final step in the pipeline. An instance of
greykite.sklearn.estimator.base_forecast_estimator.BaseForecastEstimator
.
-
get_forecast_time_properties
()¶ Returns forecast time parameters.
Uses self.df, self.config, self.regressor_cols.
Available parameters:
self.df
self.config
self.score_func
self.score_func_greater_is_better
self.regressor_cols
self.lagged_regressor_cols
self.estimator
self.pipeline
- Returns
time_properties – Time properties dictionary (likely produced by
get_forecast_time_properties
) with keys:
"period"
: int
Period of each observation (i.e. minimum time between observations, in seconds).
"simple_freq"
: SimpleTimeFrequencyEnum
SimpleTimeFrequencyEnum member corresponding to data frequency.
"num_training_points"
: int
Number of observations for training.
"num_training_days"
: int
Number of days for training.
"start_year"
: int
Start year of the training period.
"end_year"
: int
End year of the forecast period.
"origin_for_time_vars"
: float
Continuous time representation of the first date in df.
- Return type
dict [str, any] or None, default None
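To make the keys of the time properties dictionary concrete, here is a small standard-library sketch that derives several of them from a list of timestamps. It is an illustration only: `sketch_time_properties` is a hypothetical name, "num_training_days" is assumed to count calendar days inclusively, and "simple_freq" and "origin_for_time_vars" are omitted because their exact definitions are not given here.

```python
from datetime import datetime, timedelta


def sketch_time_properties(timestamps, forecast_horizon):
    """Derive a few time properties from training timestamps.

    Hypothetical helper; needs at least two timestamps.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two timestamps")
    # Period: minimum time between consecutive observations, in seconds.
    period = min(
        (later - earlier).total_seconds()
        for earlier, later in zip(ts, ts[1:])
    )
    # The forecast period ends forecast_horizon periods after the last point.
    forecast_end = ts[-1] + timedelta(seconds=period * forecast_horizon)
    return {
        "period": int(period),
        "num_training_points": len(ts),
        # Assumption: inclusive count of calendar days spanned by training.
        "num_training_days": (ts[-1].date() - ts[0].date()).days + 1,
        "start_year": ts[0].year,
        "end_year": forecast_end.year,
    }


daily = [datetime(2020, 12, 25) + timedelta(days=i) for i in range(7)]
print(sketch_time_properties(daily, forecast_horizon=3))
```

Note that "end_year" refers to the forecast period, so it can exceed the last training year when the horizon crosses a year boundary, as in this example.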
-
get_lagged_regressor_info
()¶ Returns lagged regressor column names and minimal/maximal lag order. The lag order can be used to check potential imputation in the computation of lags.
Can be overridden by subclass.
- Returns
lagged_regressor_info – A dictionary that includes the lagged regressor column names and maximal/minimal lag order. The keys are:
- lagged_regressor_cols : list [str] or None
See
forecast_pipeline
.
- overall_min_lag_order : int or None
- overall_max_lag_order : int or None
- Return type
dict
-
get_pipeline
()¶ Returns pipeline.
Implementation may be overridden by subclass if a different pipeline is desired.
Uses self.estimator, self.score_func, self.score_func_greater_is_better, self.config, self.regressor_cols.
Available parameters:
self.df
self.config
self.score_func
self.score_func_greater_is_better
self.regressor_cols
self.estimator
- Returns
pipeline – See
forecast_pipeline
.- Return type
-
apply_template_for_pipeline_params
(df: pandas.core.frame.DataFrame, config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → Dict[source]¶ Explicitly calls the method in
BaseTemplate
to make use of the decorator in this class.- Parameters
df (
pandas.DataFrame
) – The time series dataframe withtime_col
andvalue_col
and optional regressor columns.config (
ForecastConfig
.) – TheForecastConfig
class that includes model training parameters.
- Returns
pipeline_parameters – The pipeline parameters consumable by
forecast_pipeline
.- Return type
dict
-
class
greykite.sklearn.estimator.prophet_estimator.
ProphetEstimator
(score_func=<function mean_squared_error>, coverage=0.8, null_model_params=None, growth='linear', changepoints=None, n_changepoints=25, changepoint_range=0.8, yearly_seasonality='auto', weekly_seasonality='auto', daily_seasonality='auto', holidays=None, seasonality_mode='additive', seasonality_prior_scale=10.0, holidays_prior_scale=10.0, changepoint_prior_scale=0.05, mcmc_samples=0, uncertainty_samples=1000, add_regressor_dict=None, add_seasonality_dict=None)[source]¶ Wrapper for Facebook Prophet model.
- Parameters
score_func (callable) – see BaseForecastEstimator
coverage (float between [0.0, 1.0]) – see BaseForecastEstimator
null_model_params (dict with arguments to define DummyRegressor null model, optional, default=None) – see BaseForecastEstimator
add_regressor_dict (dictionary of extra regressors to be added to the model, optional, default=None) –
These should be available for training and entire prediction interval.
Dictionary format:
add_regressor_dict={
    # we can add as many regressors as we want, in the following format
    "reg_col1": {
        "prior_scale": 10,
        "standardize": True,
        "mode": 'additive'
    },
    "reg_col2": {
        "prior_scale": 20,
        "standardize": True,
        "mode": 'multiplicative'
    }
}
add_seasonality_dict (dict of custom seasonality parameters to be added to the model, optional, default=None) –
parameter details: https://github.com/facebook/prophet/blob/master/python/fbprophet/forecaster.py - refer to add_seasonality() function. Key is the seasonality component name e.g. ‘monthly’; parameters are specified via dict.
Dictionary format:
add_seasonality_dict={
    'monthly': {
        'period': 30.5,
        'fourier_order': 5
    },
    'weekly': {
        'period': 7,
        'fourier_order': 20,
        'prior_scale': 0.6,
        'mode': 'additive',
        'condition_name': 'condition_col'  # takes a bool column in df with True/False values. This means that
        # the seasonality will only be applied to dates where the condition_name column is True.
    },
    'yearly': {
        'period': 365.25,
        'fourier_order': 10,
        'prior_scale': 0.2,
        'mode': 'additive'
    }
}
Note: If there is a conflict in built-in and custom seasonality e.g. both have “yearly”, then custom seasonality will be used and Model will throw a warning such as: “INFO:fbprophet:Found custom seasonality named “yearly”, disabling built-in yearly seasonality.”
kwargs (additional parameters) –
Other parameters are the same as Prophet model, with one exception:
interval_width
is specified bycoverage
.See source code
__init__
for the parameter names, and refer to the Prophet documentation for a description.
-
model
¶ Prophet model object
- Type
Prophet
object
-
forecast
¶ Output of predict method of
Prophet
.- Type
-
fit
(X, y=None, time_col='ts', value_col='y', **fit_params)[source]¶ Fits fbprophet model.
- Parameters
X (
pandas.DataFrame
) – Input timeseries, with timestamp column, value column, and any additional regressors. The value column is the response, included in X to allow transformation bysklearn.pipeline.Pipeline
y (ignored) – The original timeseries values, ignored. (The y for fitting is included in
X
.)time_col (str) – Time column name in
X
value_col (str) – Value column name in
X
fit_params (dict) – additional parameters for null model
- Returns
self – Fitted model is stored in
self.model
.- Return type
self
-
predict
(X, y=None)[source]¶ Creates forecast for dates specified in
X
.- Parameters
X (
pandas.DataFrame
) – Input timeseries with timestamp column and any additional regressors. Timestamps are the dates for prediction. Value column, if provided in X, is ignored.
y (ignored) –
- Returns
predictions –
Forecasted values for the dates in
X
. Columns:TIME_COL dates
PREDICTED_COL predictions
PREDICTED_LOWER_COL lower bound of predictions, optional
PREDICTED_UPPER_COL upper bound of predictions, optional
[other columns], optional
PREDICTED_LOWER_COL and PREDICTED_UPPER_COL are present iff coverage is not None
- Return type
-
summary
()[source]¶ Prints input parameters and Prophet model parameters.
- Returns
log_message – log message printed to logging.info()
- Return type
-
plot_components
(uncertainty=True, plot_cap=True, weekly_start=0, yearly_start=0, figsize=None)[source]¶ Plot the
Prophet
forecast components on the dataset passed topredict
.Will plot whichever are available of: trend, holidays, weekly seasonality, and yearly seasonality.
- Parameters
uncertainty (bool, optional, default True) – Boolean to plot uncertainty intervals.
plot_cap (bool, optional, default True) – Boolean indicating if the capacity should be shown in the figure, if available.
weekly_start (int, optional, default 0) – Specifying the start day of the weekly seasonality plot. 0 (default) starts the week on Sunday. 1 shifts by 1 day to Monday, and so on.
yearly_start (int, optional, default 0) – Specifying the start day of the yearly seasonality plot. 0 (default) starts the year on Jan 1. 1 shifts by 1 day to Jan 2, and so on.
figsize (tuple , optional, default None) – Width, height in inches.
- Returns
fig – A matplotlib figure.
- Return type
matplotlib.figure.Figure
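The weekly_start and yearly_start offsets are plain day shifts applied to the start of each seasonality plot. The sketch below shows the arithmetic with the standard library only. The function names are hypothetical, and 2017-01-01 is used as the reference date simply because it is both a Sunday and a January 1; it is an illustrative choice, not a documented constant.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
REFERENCE = date(2017, 1, 1)  # a Sunday that is also January 1


def weekly_plot_days(weekly_start=0):
    """Day names on the x-axis of a weekly plot, in plotting order."""
    days = [REFERENCE + timedelta(days=weekly_start + i) for i in range(7)]
    return [DAY_NAMES[d.weekday()] for d in days]


def yearly_plot_start(yearly_start=0):
    """First date shown on a yearly plot."""
    return REFERENCE + timedelta(days=yearly_start)


print(weekly_plot_days(0)[0], weekly_plot_days(1)[0])
print(yearly_plot_start(1))
```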
-
fit_uncertainty
(df: pandas.core.frame.DataFrame, uncertainty_dict: dict, **kwargs)¶ Fits the uncertainty model with a given
df
anduncertainty_dict
.- Parameters
df (
pandas.DataFrame
) – A dataframe representing the data to fit the uncertainty model.uncertainty_dict (dict [str, any]) –
The uncertainty model specification. It should have the following keys:
- ”uncertainty_method”: a string that is in
UncertaintyMethodEnum
.
”params”: a dictionary that includes any additional parameters needed by the uncertainty method.
kwargs (additional parameters to be fed into the uncertainty method.) – These parameters are from the estimator attributes, not given by user.
- Returns
- Return type
The function sets
self.uncertainty_model
and does not return anything.
-
get_params
(deep=True)¶ Get parameters for this estimator.
-
predict_uncertainty
(df: pandas.core.frame.DataFrame)¶ Makes predictions of prediction intervals for
df
based on the predictions andself.uncertainty_model
.- Parameters
df (
pandas.DataFrame
) – The dataframe to calculate prediction intervals upon. It should have eitherself.value_col_
or PREDICT_COL which the prediction interval is based on.- Returns
result_df – The
df
with prediction interval columns.- Return type
-
score
(X, y, sample_weight=None)¶ Default scorer for the estimator (Used in GridSearchCV/RandomizedSearchCV if scoring=None)
Notes
If null_model_params is not None, returns R2_null_model_score of model error relative to null model, evaluated by score_func.
If null_model_params is None, returns score_func of the model itself.
By default, grid search (with no scoring parameter) optimizes improvement of
score_func
against null model.To optimize a different score function, pass scoring to GridSearchCV/RandomizedSearchCV.
- Parameters
X (
pandas.DataFrame
) – Input timeseries with timestamp column and any additional regressors. Value column, if provided in X, is ignored.
y (
pandas.Series
or numpy.array
) – Actual value, used to compute error.
sample_weight (
pandas.Series
or numpy.array
) – ignored
- Returns
score – Comparison of predictions against null predictions, according to specified score function
- Return type
float or None
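The score relative to a null model can be read as "one minus the ratio of model error to null-model error". The following sketch works through that reading with plain Python lists. It is an interpretation of the notes above rather than the library's implementation, and returning None when the null error is zero is an assumption.

```python
def mean_squared_error(y_true, y_pred):
    """Plain-Python MSE used as the score function in this sketch."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def r2_null_model_score(y_true, y_pred, y_pred_null,
                        score_func=mean_squared_error):
    """Improvement of the model over a null model.

    1.0 means a perfect model, 0.0 means no better than the null model,
    and negative values mean worse than the null model.
    """
    null_error = score_func(y_true, y_pred_null)
    if null_error == 0:
        return None  # assumption: ratio undefined for a perfect null model
    return 1.0 - score_func(y_true, y_pred) / null_error


actual = [1.0, 2.0, 3.0]
model_pred = [1.0, 2.0, 4.0]   # MSE = 1/3
null_pred = [2.0, 2.0, 2.0]    # constant null model, MSE = 2/3
print(r2_null_model_score(actual, model_pred, null_pred))
```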
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
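The <component>__<parameter> convention splits a key on the first double underscore and recurses into the named component. The sketch below mimics that routing on nested dictionaries. It is a simplified stand-in and not scikit-learn's code; `set_nested_params` is a hypothetical name, and a missing component simply raises KeyError.

```python
def set_nested_params(params_tree, **params):
    """Route ``component__parameter`` keys into a nested dict.

    Hypothetical helper: nested dicts stand in for nested estimators.
    """
    for key, value in params.items():
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            # Recurse into the named component with the remaining key.
            set_nested_params(params_tree[component], **{sub_key: value})
        else:
            params_tree[key] = value
    return params_tree


pipeline_params = {
    "estimator": {"growth": "linear"},
    "input": {"response": {"null": {"impute_algorithm": None}}},
}
set_nested_params(
    pipeline_params,
    estimator__growth="logistic",
    input__response__null__impute_algorithm="ts_interpolate",
)
print(pipeline_params)
```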
ARIMA Template¶
-
class
greykite.framework.templates.auto_arima_template.
AutoArimaTemplate
(estimator: greykite.sklearn.estimator.base_forecast_estimator.BaseForecastEstimator = AutoArimaEstimator())[source]¶ A template for
AutoArimaEstimator
. Takes input data and optional configuration parameters to customize the model. Returns a set of parameters to call
forecast_pipeline
.
Notes
The attributes of a
ForecastConfig
for AutoArimaEstimator
are:
- computation_param: ComputationParam or None, default None
How to compute the result. See
ComputationParam
.- coverage: float or None, default None
Intended coverage of the prediction bands (0.0 to 1.0). If None, the upper/lower predictions are not returned. Same as coverage in
forecast_pipeline
- evaluation_metric_param: EvaluationMetricParam or None, default None
What metrics to evaluate. See
EvaluationMetricParam
.- evaluation_period_param: EvaluationPeriodParam or None, default None
How to split data for evaluation. See
EvaluationPeriodParam
.- forecast_horizon: int or None, default None
Number of periods to forecast into the future. Must be > 0. If None, the default is determined from the input data frequency. Same as forecast_horizon in forecast_pipeline.
- metadata_param: MetadataParam or None, default None
Information about the input data. See
MetadataParam
.- model_components_param:
ModelComponentsParam
or None, default None Parameters to tune the model. See
ModelComponentsParam
. The fields are dictionaries with the following items.- seasonality: dict [str, any] or None
Ignored. Pass the relevant Auto Arima arguments via custom.
- growth: dict [str, any] or None
Ignored. Pass the relevant Auto Arima arguments via custom.
- events: dict [str, any] or None
Ignored. Pass the relevant Auto Arima arguments via custom.
- changepoints: dict [str, any] or None
Ignored. Pass the relevant Auto Arima arguments via custom.
- regressors: dict [str, any] or None
Ignored. Auto Arima template currently does not support regressors.
- uncertainty: dict [str, any] or None
Ignored. Pass the relevant Auto Arima arguments via custom.
- hyperparameter_override: dict [str, any] or None or list [dict [str, any] or None]
After the above model components are used to create a hyperparameter grid, the result is updated by this dictionary, to create new keys or override existing ones. Allows for complete customization of the grid search.
Keys should have format {named_step}__{parameter_name} for the named steps of the sklearn.pipeline.Pipeline returned by this function. See sklearn.pipeline.Pipeline.
For example:
hyperparameter_override={
    "estimator__max_p": [8, 10],
    "estimator__information_criterion": ["bic"],
}
If a list of dictionaries, grid search will be done for each dictionary in the list. Each dictionary in the list overrides the defaults. This enables grid search over specific combinations of parameters to reduce the search space.
For example, the first dictionary could define combinations of parameters for a “complex” model, and the second dictionary could define combinations of parameters for a “simple” model, to prevent mixed combinations of simple and complex.
Or the first dictionary could grid search over fit algorithm, and the second dictionary could use a single fit algorithm and grid search over seasonality.
The result is passed as the param_distributions parameter to sklearn.model_selection.RandomizedSearchCV
.- autoregression: dict [str, any] or None
Ignored. Pass the relevant Auto Arima arguments via custom.
- custom: dict [str, any] or None
Any parameter in the
AutoArimaEstimator
can be passed.
- model_template: str
This class only accepts “AUTO_ARIMA”.
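The hyperparameter_override entry above says that a list of dictionaries produces one grid per dictionary, which keeps "simple" and "complex" settings from being mixed. The sketch below illustrates the effect on the number of fitted combinations. `apply_overrides`, `count_combinations` and the base grid values are hypothetical and only loosely modelled on the documented behaviour.

```python
def apply_overrides(base_grid, hyperparameter_override):
    """Return one grid per override dictionary.

    Hypothetical helper: each override updates a copy of the base grid,
    and scalar values are wrapped in one-element lists.
    """
    if isinstance(hyperparameter_override, list):
        overrides = hyperparameter_override
    else:
        overrides = [hyperparameter_override]
    grids = []
    for override in overrides:
        grid = dict(base_grid)
        grid.update(override or {})  # create new keys or override existing
        grids.append({
            key: value if isinstance(value, list) else [value]
            for key, value in grid.items()
        })
    return grids


def count_combinations(grids):
    """Total number of parameter combinations across all grids."""
    total = 0
    for grid in grids:
        size = 1
        for values in grid.values():
            size *= len(values)
        total += size
    return total


base_grid = {
    "estimator__max_p": [5],
    "estimator__information_criterion": ["aic"],
}
separate = apply_overrides(base_grid, [
    {"estimator__max_p": [8, 10], "estimator__information_criterion": ["bic"]},
    {"estimator__max_p": [2]},  # keeps the base information criterion
])
mixed = apply_overrides(base_grid, {
    "estimator__max_p": [2, 8, 10],
    "estimator__information_criterion": ["aic", "bic"],
})
print(count_combinations(separate), count_combinations(mixed))
```

Here the two-dictionary form fits three combinations, while the single merged dictionary would fit six, including pairings that were never intended.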
-
property
allow_model_template_list
¶ AutoArimaTemplate does not allow config.model_template to be a list.
-
property
allow_model_components_param_list
¶ AutoArimaTemplate does not allow config.model_components_param to be a list.
-
get_regressor_cols
()[source]¶ Returns regressor column names from the model components.
Currently does not implement regressors.
-
apply_auto_arima_model_components_defaults
(model_components=None)[source]¶ Sets default values for
model_components
.- Parameters
model_components (
ModelComponentsParam
or None, default None) – Configuration of model growth, seasonality, events, etc. See the docstring of this class for details.- Returns
model_components – The provided
model_components
with default values set- Return type
-
get_hyperparameter_grid
()[source]¶ Returns hyperparameter grid.
Implements the method in
BaseTemplate
.Uses
self.time_properties
and self.config
to generate the hyperparameter grid. Converts model components into
AutoArimaEstimator
hyperparameters. The output dictionary values are lists, combined via grid search in
forecast_pipeline
.- Parameters
model_components (
ModelComponentsParam
or None, default None) – Configuration of parameter space to search the order (p, d, q etc.) of SARIMAX model. Seeauto_arima_template
for details.coverage (float or None, default=0.95) – Intended coverage of the prediction bands (0.0 to 1.0)
- Returns
hyperparameter_grid –
AutoArimaEstimator
hyperparameters.See
forecast_pipeline
. The output dictionary values are lists, combined in grid search.- Return type
dict [str, list [any]] or None
-
apply_template_for_pipeline_params
(df: pandas.core.frame.DataFrame, config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → Dict[source]¶ Explicitly calls the method in
BaseTemplate
to make use of the decorator in this class.- Parameters
df (
pandas.DataFrame
) – The time series dataframe withtime_col
andvalue_col
and optional regressor columns.config (
ForecastConfig
.) – TheForecastConfig
class that includes model training parameters.
- Returns
pipeline_parameters – The pipeline parameters consumable by
forecast_pipeline
.- Return type
dict
-
static
apply_computation_defaults
(computation: Optional[greykite.framework.templates.autogen.forecast_config.ComputationParam] = None) → greykite.framework.templates.autogen.forecast_config.ComputationParam¶ Applies the default ComputationParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a ComputationParam object.
- Parameters
computation (
ComputationParam
or None) – The ComputationParam object.- Returns
computation – Valid ComputationParam object with the provided attribute values and the default attribute values if not.
- Return type
-
static
apply_evaluation_metric_defaults
(evaluation: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam] = None) → greykite.framework.templates.autogen.forecast_config.EvaluationMetricParam¶ Applies the default EvaluationMetricParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates an EvaluationMetricParam object.
- Parameters
evaluation (
EvaluationMetricParam
or None) – The EvaluationMetricParam object.
- Returns
evaluation – Valid EvaluationMetricParam object with the provided attribute values and the default attribute values if not.
- Return type
-
static
apply_evaluation_period_defaults
(evaluation: Optional[greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam] = None) → greykite.framework.templates.autogen.forecast_config.EvaluationPeriodParam¶ Applies the default EvaluationPeriodParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates an EvaluationPeriodParam object.
- Parameters
evaluation (
EvaluationPeriodParam
or None) – The EvaluationPeriodParam object.
- Returns
evaluation – Valid EvaluationPeriodParam object with the provided attribute values and the default attribute values if not.
- Return type
-
apply_forecast_config_defaults
(config: Optional[greykite.framework.templates.autogen.forecast_config.ForecastConfig] = None) → greykite.framework.templates.autogen.forecast_config.ForecastConfig¶ Applies the default Forecast Config values to the given config. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input config is None, it creates a Forecast Config.
- Parameters
config (
ForecastConfig
or None) – Forecast configuration if available. See ForecastConfig
.
- Returns
config – A valid Forecast Config which contains the provided attribute values and the default attribute values if not.
- Return type
ForecastConfig
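The `apply_*_defaults` methods above all follow the same rule: provided values are kept, unset values receive defaults, and a None input yields a new object. A minimal sketch of that rule follows. `ExampleParam`, `DEFAULTS`, and `apply_example_defaults` are hypothetical stand-ins written for this illustration; the attribute names and default values are assumptions, not the library's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleParam:
    """Hypothetical stand-in for a config object such as ComputationParam."""
    n_jobs: Optional[int] = None
    verbose: Optional[int] = None


# Assumed default values, for illustration only.
DEFAULTS = {"n_jobs": 1, "verbose": 1}


def apply_example_defaults(param: Optional[ExampleParam] = None) -> ExampleParam:
    """Fills unset (None) attributes with defaults; keeps provided values."""
    if param is None:
        param = ExampleParam()  # None input: create a new object
    for name, default in DEFAULTS.items():
        if getattr(param, name) is None:
            setattr(param, name, default)
    return param


print(apply_example_defaults(ExampleParam(n_jobs=4)))
```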
-
static
apply_metadata_defaults
(metadata: Optional[greykite.framework.templates.autogen.forecast_config.MetadataParam] = None) → greykite.framework.templates.autogen.forecast_config.MetadataParam¶ Applies the default MetadataParam values to the given object. If an expected attribute value is provided, the value is unchanged. Otherwise the default value for it is used. Other attributes are untouched. If the input object is None, it creates a MetadataParam object.
- Parameters
metadata (
MetadataParam
or None) – The MetadataParam object.
- Returns
metadata – Valid MetadataParam object with the provided attribute values and the default attribute values if not.
- Return type
-
static
apply_model_components_defaults
(model_components: Optional[Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[Optional[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]]] = None) → Union[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, List[greykite.framework.templates.autogen.forecast_config.ModelComponentsParam]]¶ Applies the default ModelComponentsParam values to the given object.
Converts None to a ModelComponentsParam object. Unpacks a list of a single element to the element itself.
- Parameters
model_components (
ModelComponentsParam
or None or list of such items) – The ModelComponentsParam object.
- Returns
model_components – Valid ModelComponentsParam object with the provided attribute values and the default attribute values if not.
- Return type
ModelComponentsParam
or list of such items
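The None-to-object conversion and single-element unpacking described above can be sketched as follows. `normalize_single_or_list` is a hypothetical helper, and a plain `dict` stands in for ModelComponentsParam; this shows the described behavior only, not the library's code.

```python
def normalize_single_or_list(value, make_default):
    """Converts None to a default object and unpacks one-element lists.

    ``make_default`` is a zero-argument callable that builds the default
    object (a stand-in for creating a ModelComponentsParam).
    """
    if value is None:
        return make_default()
    if isinstance(value, list):
        filled = [make_default() if item is None else item for item in value]
        # A list with a single element is unpacked to the element itself.
        return filled[0] if len(filled) == 1 else filled
    return value


print(normalize_single_or_list([None, {"seasonality": "auto"}], dict))
```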
-
apply_model_template_defaults
(model_template: Optional[Union[str, List[Optional[str]]]] = None) → Union[str, List[str]]¶ Applies the default model template name to the given value.
Unpacks a list of a single element to the element itself. Sets default value if None.
- Parameters
model_template (str or None or list [None, str]) – The model template name. See valid names in
ModelTemplateEnum
.
- Returns
model_template – The model template name, with the default value used if not provided.
- Return type
str or list [str]
-
static
apply_template_decorator
(func)[source]¶ Decorator for
apply_template_for_pipeline_params
function.
Overrides the method in
BaseTemplate
.
- Raises
ValueError if config.model_template != "AUTO_ARIMA" –
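A decorator of this kind can be sketched as below. This is an assumption-laden illustration rather than the library's implementation: `require_model_template` and `ExampleTemplate` are hypothetical names, and a missing config is treated here as a mismatch.

```python
from functools import wraps
from types import SimpleNamespace


def require_model_template(expected_name):
    """Builds a decorator that rejects configs for other model templates."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, df, config=None):
            # Assumption: a None config counts as a mismatch in this sketch.
            name = getattr(config, "model_template", None)
            if name != expected_name:
                raise ValueError(
                    f"Expected model_template {expected_name!r}, got {name!r}.")
            return func(self, df, config)
        return wrapper
    return decorator


class ExampleTemplate:
    """Hypothetical template class used only to demonstrate the decorator."""

    @require_model_template("AUTO_ARIMA")
    def apply_template_for_pipeline_params(self, df, config=None):
        return {"df": df}


ok_config = SimpleNamespace(model_template="AUTO_ARIMA")
print(ExampleTemplate().apply_template_for_pipeline_params([1, 2], ok_config))
```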
-
property
estimator
¶ The estimator instance to use as the final step in the pipeline. An instance of
greykite.sklearn.estimator.base_forecast_estimator.BaseForecastEstimator
.
-
get_forecast_time_properties
()¶ Returns forecast time parameters.
Uses
self.df
, self.config
, self.regressor_cols
.
Available parameters:
self.df
self.config
self.score_func
self.score_func_greater_is_better
self.regressor_cols
self.lagged_regressor_cols
self.estimator
self.pipeline
- Returns
time_properties – Time properties dictionary (likely produced by
get_forecast_time_properties
) with keys:
"period" : int
Period of each observation (i.e. minimum time between observations, in seconds).
"simple_freq" : SimpleTimeFrequencyEnum
SimpleTimeFrequencyEnum
member corresponding to data frequency.
"num_training_points" : int
Number of observations for training.
"num_training_days" : int
Number of days for training.
"start_year" : int
Start year of the training period.
"end_year" : int
End year of the forecast period.
"origin_for_time_vars" : float
Continuous time representation of the first date in
df
.
- Return type
dict [str, any] or None, default None
-
get_lagged_regressor_info
()¶ Returns lagged regressor column names and minimal/maximal lag order. The lag order can be used to check potential imputation in the computation of lags.
Can be overridden by subclass.
- Returns
lagged_regressor_info – A dictionary that includes the lagged regressor column names and maximal/minimal lag order. The keys are:
lagged_regressor_cols : list [str] or None
See
forecast_pipeline
.
overall_min_lag_order : int or None
overall_max_lag_order : int or None
- Return type
dict
-
get_pipeline
()¶ Returns pipeline.
Implementation may be overridden by subclass if a different pipeline is desired.
Uses
self.estimator
, self.score_func
, self.score_func_greater_is_better
, self.config
, self.regressor_cols
.
Available parameters:
self.df
self.config
self.score_func
self.score_func_greater_is_better
self.regressor_cols
self.estimator
- Returns
pipeline – See
forecast_pipeline
.- Return type
-
class
greykite.sklearn.estimator.auto_arima_estimator.
AutoArimaEstimator
(score_func: callable = <function mean_squared_error>, coverage: float = 0.9, null_model_params: Optional[Dict] = None, regressor_cols: Optional[List[str]] = None, freq: Optional[float] = None, start_p: Optional[int] = 2, d: Optional[int] = None, start_q: Optional[int] = 2, max_p: Optional[int] = 5, max_d: Optional[int] = 2, max_q: Optional[int] = 5, start_P: Optional[int] = 1, D: Optional[int] = None, start_Q: Optional[int] = 1, max_P: Optional[int] = 2, max_D: Optional[int] = 1, max_Q: Optional[int] = 2, max_order: Optional[int] = 5, m: Optional[int] = 1, seasonal: Optional[bool] = True, stationary: Optional[bool] = False, information_criterion: Optional[str] = 'aic', alpha: Optional[int] = 0.05, test: Optional[str] = 'kpss', seasonal_test: Optional[str] = 'ocsb', stepwise: Optional[bool] = True, n_jobs: Optional[int] = 1, start_params: Optional[Dict] = None, trend: Optional[str] = None, method: Optional[str] = 'lbfgs', maxiter: Optional[int] = 50, offset_test_args: Optional[Dict] = None, seasonal_test_args: Optional[Dict] = None, suppress_warnings: Optional[bool] = True, error_action: Optional[str] = 'trace', trace: Optional[Union[int, bool]] = False, random: Optional[bool] = False, random_state: Optional[Union[int, callable]] = None, n_fits: Optional[int] = 10, out_of_sample_size: Optional[int] = 0, scoring: Optional[str] = 'mse', scoring_args: Optional[Dict] = None, with_intercept: Optional[Union[bool, str]] = 'auto', return_conf_int: Optional[bool] = True, dynamic: Optional[bool] = False)[source]¶ Wrapper for
pmdarima.arima.AutoARIMA
. It currently does not handle regressors when there is a gap between the train and predict periods.
- Parameters
score_func (callable) – See
BaseForecastEstimator
.
coverage (float between [0.0, 1.0]) – See
BaseForecastEstimator
.
null_model_params (dict with arguments to define DummyRegressor null model, optional, default=None) – See
BaseForecastEstimator
.
regressor_cols (list [str], optional, default None) – A list of regressor columns used during training and prediction. If None, no regressor columns are used.
See the AutoArima documentation for the rest of the parameter descriptions.
-
model
¶ Auto arima model object
- Type
AutoArima
object
-
fit_df
¶ The training data used to fit the model.
- Type
pandas.DataFrame
or None
-
forecast
¶ Output of the predict method of
AutoArima
.- Type
-
fit
(X, y=None, time_col='ts', value_col='y', **fit_params)[source]¶ Fits
ARIMA
forecast model.
- Parameters
X (
pandas.DataFrame
) – Input timeseries, with timestamp column, value column, and any additional regressors. The value column is the response, included in X to allow transformation by sklearn.pipeline.Pipeline
.
y (ignored) – The original timeseries values, ignored. (The y for fitting is included in
X
.)
time_col (str) – Time column name in
X
.
value_col (str) – Value column name in
X
.
fit_params (dict) – Additional parameters for the null model.
- Returns
self – Fitted model is stored in
self.model
.- Return type
self
-
predict
(X, y=None)[source]¶ Creates forecast for the dates specified in
X
. Currently does not support the regressor case where there is a gap between the train and predict periods.
- Parameters
X (
pandas.DataFrame
) – Input timeseries with timestamp column and any additional regressors. Timestamps are the dates for prediction. Value column, if provided in X
, is ignored.
y (ignored) –
- Returns
predictions –
Forecasted values for the dates in
X
. Columns:
TIME_COL
: dates
PREDICTED_COL
: predictions
PREDICTED_LOWER_COL
: lower bound of predictions
PREDICTED_UPPER_COL
: upper bound of predictions
- Return type
-
summary
()[source]¶ Creates a human-readable string describing how the model works, including relevant diagnostics. These details cannot be extracted from the forecast alone. Prints the model configuration. Extend this in a child class to print the trained model parameters.
The log message is printed to the cst.LOGGER_NAME logger.
-
fit_uncertainty
(df: pandas.core.frame.DataFrame, uncertainty_dict: dict, **kwargs)¶ Fits the uncertainty model with a given
df
and uncertainty_dict
.
- Parameters
df (
pandas.DataFrame
) – A dataframe representing the data to fit the uncertainty model.
uncertainty_dict (dict [str, any]) –
The uncertainty model specification. It should have the following keys:
"uncertainty_method": a string that is in
UncertaintyMethodEnum
.
"params": a dictionary that includes any additional parameters needed by the uncertainty method.
kwargs (additional parameters to be fed into the uncertainty method.) – These parameters are from the estimator attributes, not given by user.
- Returns
- Return type
The function sets
self.uncertainty_model
and does not return anything.
-
get_params
(deep=True)¶ Get parameters for this estimator.
-
predict_uncertainty
(df: pandas.core.frame.DataFrame)¶ Makes predictions of prediction intervals for
df
based on the predictions and self.uncertainty_model
.
- Parameters
df (
pandas.DataFrame
) – The dataframe to calculate prediction intervals upon. It should have either self.value_col_
or PREDICT_COL which the prediction interval is based on.
- Returns
result_df – The
df
with prediction interval columns.- Return type
-
score
(X, y, sample_weight=None)¶ Default scorer for the estimator (Used in GridSearchCV/RandomizedSearchCV if scoring=None)
Notes
If null_model_params is not None, returns R2_null_model_score of model error relative to null model, evaluated by score_func.
If null_model_params is None, returns score_func of the model itself.
By default, grid search (with no scoring parameter) optimizes improvement of
score_func
against null model.
To optimize a different score function, pass scoring to GridSearchCV/RandomizedSearchCV.
- Parameters
X (
pandas.DataFrame
) – Input timeseries with timestamp column and any additional regressors. Value column, if provided in X, is ignored.
y (
pandas.Series
or numpy.array
) – Actual value, used to compute error.
sample_weight (
pandas.Series
or numpy.array
) – Ignored.
- Returns
score – Comparison of predictions against null predictions, according to specified score function
- Return type
float or None
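To make "model error relative to null model" concrete, here is a minimal sketch that assumes the score takes the form 1 - loss(model) / loss(null). That formula, the helper names, and the choice of mean squared error are assumptions made for this illustration; consult r2_null_model_score for the actual definition.

```python
from statistics import mean


def mean_squared_error(y_true, y_pred):
    """Plain-Python mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2_null_model_score_sketch(y_true, y_pred, y_pred_null,
                               loss_func=mean_squared_error):
    """Assumed form: 1 - loss(model) / loss(null). Higher is better."""
    model_loss = loss_func(y_true, y_pred)
    null_loss = loss_func(y_true, y_pred_null)
    if null_loss == 0:
        return None  # undefined when the null model is already perfect
    return 1.0 - model_loss / null_loss


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 3.0, 3.5]
y_null = [mean(y_true)] * len(y_true)  # constant prediction, like a "mean" strategy
print(r2_null_model_score_sketch(y_true, y_pred, y_null))
```

Under this form, a score of 0 means the model does no better than the constant baseline, and a score near 1 means almost all of the baseline's loss is removed.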
-
set_params
(**params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form <component>__<parameter>
so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
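The `<component>__<parameter>` naming can be illustrated with a small sketch that groups parameter names by their component prefix, splitting at the first double underscore. `split_nested_params` is a hypothetical helper for this example; scikit-learn performs the routing internally when `set_params` is called.

```python
def split_nested_params(params):
    """Groups "<component>__<parameter>" keys by component.

    Keys without a double underscore belong to the top-level object and are
    grouped under None.
    """
    grouped = {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[sub_key] = value
        else:
            grouped.setdefault(None, {})[key] = value
    return grouped


print(split_nested_params({"estimator__max_p": 3, "estimator__d": 1, "verbose": True}))
```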
Forecast Pipeline¶
-
greykite.framework.pipeline.pipeline.
forecast_pipeline
(df: pandas.core.frame.DataFrame, time_col='ts', value_col='y', date_format=None, tz=None, freq=None, train_end_date=None, anomaly_info=None, pipeline=None, regressor_cols=None, lagged_regressor_cols=None, estimator=SimpleSilverkiteEstimator(), hyperparameter_grid=None, hyperparameter_budget=None, n_jobs=1, verbose=1, forecast_horizon=None, coverage=0.95, test_horizon=None, periods_between_train_test=None, agg_periods=None, agg_func=None, score_func='MeanAbsolutePercentError', score_func_greater_is_better=False, cv_report_metrics='ALL', null_model_params=None, relative_error_tolerance=None, cv_horizon=None, cv_min_train_periods=None, cv_expanding_window=False, cv_use_most_recent_splits=False, cv_periods_between_splits=None, cv_periods_between_train_test=None, cv_max_splits=3)[source]¶ Computation pipeline for end-to-end forecasting.
Trains a forecast model end-to-end:
checks input data
runs cross-validation to select optimal hyperparameters e.g. best model
evaluates best model on test set
provides forecast of best model (re-trained on all data) into the future
Returns forecasts with methods to plot and see diagnostics. Also returns the fitted pipeline and CV results.
Provides a high degree of customization over training and evaluation parameters:
model
cross validation
evaluation
forecast horizon
See test cases for examples.
- Parameters
df (
pandas.DataFrame
) – Timeseries data to forecast. Contains columns [time_col, value_col], and optional regressor columns. Regressor columns should include future values for prediction.
time_col (str, default TIME_COL in constants.py) – Name of timestamp column in df.
value_col (str, default VALUE_COL in constants.py) – name of value column in df (the values to forecast)
date_format (str or None, default None) – strftime format to parse time column, e.g.
%m/%d/%Y
. Note that %f
will parse all the way up to nanoseconds. If None (recommended), inferred by pandas.to_datetime
.
tz (str or None, default None) – Passed to pandas.tz_localize to localize the timestamp.
freq (str or None, default None) – Frequency of input data. Used to generate future dates for prediction. Frequency strings can have multiples, e.g. ‘5H’. See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases for a list of frequency aliases. If None, inferred by
pandas.infer_freq
. Provide this parameter if df
has missing timepoints.
train_end_date (
datetime.datetime
, optional, default None) – Last date to use for fitting the model. Forecasts are generated after this date. If None, it is set to the last date with a non-null value in value_col
of df
.
anomaly_info (dict or list [dict] or None, default None) –
Anomaly adjustment info. Anomalies in
df
are corrected before any forecasting is done.
If None, no adjustments are made.
A dictionary containing the parameters to
adjust_anomalous_data
. See that function for details. The possible keys are:
"value_col" : str
The name of the column in
df
to adjust. You may adjust the value to forecast as well as any numeric regressors.
"anomaly_df" : pandas.DataFrame
Adjustments to correct the anomalies.
"start_date_col" : str, default START_DATE_COL
Start date column in
anomaly_df
.
"end_date_col" : str, default END_DATE_COL
End date column in
anomaly_df
.
"adjustment_delta_col" : str or None, default None
Impact column in
anomaly_df
.
"filter_by_dict" : dict or None, default None
Used to filter
anomaly_df
to the relevant anomalies for the value_col
in this dictionary. Key specifies the column name, value specifies the filter value.
"filter_by_value_col" : str or None, default None
Adds
{filter_by_value_col: value_col}
to filter_by_dict
if not None, for the value_col
in this dictionary.
"adjustment_method" : str (“add” or “subtract”), default “add”
How to make the adjustment, if
adjustment_delta_col
is provided.
Accepts a list of such dictionaries to adjust multiple columns in
df
.
pipeline (
sklearn.pipeline.Pipeline
or None, default None) – Pipeline to fit. The final named step must be called “estimator”. If None, will use the default Pipeline from get_basic_pipeline
.
regressor_cols (list [str] or None, default None) – A list of regressor columns used in the training and prediction DataFrames. It should contain only the regressors that are being used in the grid search. If None, no regressor columns are used. Regressor columns that are unavailable in
df
are dropped.
lagged_regressor_cols (list [str] or None, default None) – A list of additional columns needed for lagged regressors in the training and prediction DataFrames. This list can have overlap with
regressor_cols
. If None, no additional columns are added to the DataFrame. Lagged regressor columns that are unavailable in df
are dropped.
estimator (instance of an estimator that implements greykite.sklearn.estimator.base_forecast_estimator.BaseForecastEstimator) – Estimator to use as the final step in the pipeline. Ignored if
pipeline
is provided.
forecast_horizon (int or None, default None) – Number of periods to forecast into the future. Must be > 0. If None, default is determined from input data frequency.
coverage (float or None, default=0.95) – Intended coverage of the prediction bands (0.0 to 1.0). If None, the upper/lower predictions are not returned. Ignored if pipeline is provided. Uses coverage of the
pipeline
estimator instead.
test_horizon (int or None, default None) – Number of periods held back from end of df for test. The rest is used for cross validation. If None, default is forecast_horizon. Set to 0 to skip backtest.
periods_between_train_test (int or None, default None) – Number of periods for the gap between train and test data. If None, default is 0.
agg_periods (int or None, default None) –
Number of periods to aggregate before evaluation.
Model is fit and forecasted on the dataset’s original frequency.
Before evaluation, the actual and forecasted values are aggregated, using rolling windows of size
agg_periods
and the function agg_func
(e.g. if the dataset is hourly, use agg_periods=24, agg_func=np.sum
to evaluate performance on the daily totals).
If None, does not aggregate before evaluation.
Currently, this is only used when calculating CV metrics and the R2_null_model_score metric in backtest/forecast. No pre-aggregation is applied for the other backtest/forecast evaluation metrics.
agg_func (callable or None, default None) –
Takes an array and returns a number, e.g. np.max, np.sum.
Defines how to aggregate rolling windows of actual and predicted values before evaluation.
Ignored if
agg_periods
is None.
Currently, this is only used when calculating CV metrics and the R2_null_model_score metric in backtest/forecast. No pre-aggregation is applied for the other backtest/forecast evaluation metrics.
score_func (str or callable, default
EvaluationMetricEnum.MeanAbsolutePercentError.name
) – Score function used to select optimal model in CV. If a callable, takes arrays y_true
, y_pred
and returns a float. If a string, must be either an EvaluationMetricEnum
member name or FRACTION_OUTSIDE_TOLERANCE
.
score_func_greater_is_better (bool, default False) – True if
score_func
is a score function, meaning higher is better, and False if it is a loss function, meaning lower is better. Must be provided if score_func
is a callable (custom function). Ignored if score_func
is a string, because the direction is known.
cv_report_metrics (str, or list [str], or None, default CV_REPORT_METRICS_ALL) –
Additional metrics to compute during CV, besides the one specified by
score_func
.
If the string constant
greykite.framework.constants.CV_REPORT_METRICS_ALL
, computes all metrics in EvaluationMetricEnum
. Also computes FRACTION_OUTSIDE_TOLERANCE
if relative_error_tolerance
is not None. The results are reported by the short name (.get_metric_name()
) for EvaluationMetricEnum
members and FRACTION_OUTSIDE_TOLERANCE_NAME
for FRACTION_OUTSIDE_TOLERANCE
. These names appear in the keys of forecast_result.grid_search.cv_results_
returned by this function.
If a list of strings, each of the listed metrics is computed. Valid strings are
EvaluationMetricEnum
member names and FRACTION_OUTSIDE_TOLERANCE
.
For example:
["MeanSquaredError", "MeanAbsoluteError", "MeanAbsolutePercentError", "MedianAbsolutePercentError", "FractionOutsideTolerance2"]
If None, no additional metrics are computed.
null_model_params (dict or None, default None) –
Defines baseline model to compute
R2_null_model_score
evaluation metric.
R2_null_model_score
is the improvement in the loss function relative to a null model. It can be used to evaluate model quality with respect to a simple baseline. For details, see r2_null_model_score
.
The null model is a
DummyRegressor
, which returns constant predictions.
Valid keys are “strategy”, “constant”, “quantile”. See
DummyRegressor
. For example:
null_model_params = {"strategy": "mean"}
null_model_params = {"strategy": "median"}
null_model_params = {"strategy": "quantile", "quantile": 0.8}
null_model_params = {"strategy": "constant", "constant": 2.0}
If None,
R2_null_model_score
is not calculated.
Note: CV model selection always optimizes
score_func, not the R2_null_model_score
.
relative_error_tolerance (float or None, default None) – Threshold to compute the
Outside Tolerance
metric, defined as the fraction of forecasted values whose relative error is strictly greater than relative_error_tolerance
. For example, 0.05 allows for 5% relative error. If None, the metric is not computed.
hyperparameter_grid (dict, list [dict] or None, default None) –
Sets properties of the steps in the pipeline, and specifies combinations to search over. Should be valid input to
sklearn.model_selection.GridSearchCV
(param_grid) or sklearn.model_selection.RandomizedSearchCV
(param_distributions).
Prefix transform/estimator attributes by the name of the step in the pipeline. See details at: https://scikit-learn.org/stable/modules/compose.html#nested-parameters
If None, uses the default pipeline parameters.
hyperparameter_budget (int or None, default None) –
Max number of hyperparameter sets to try within the
hyperparameter_grid
search space.
Runs a full grid search if
hyperparameter_budget
is sufficient to exhaust the full hyperparameter_grid
, otherwise samples uniformly at random from the space.
If None, uses defaults:
full grid search if all values are constant
10 if any value is a distribution to sample from
n_jobs (int or None, default
COMPUTATION_N_JOBS
) – Number of jobs to run in parallel (the maximum number of concurrently running workers).
-1
uses all CPUs.
-2
uses all CPUs but one.
None
is treated as 1 unless in a joblib.Parallel
backend context that specifies otherwise.
verbose (int, default 1) – Verbosity level during CV. If > 0, prints number of fits. If > 1, prints fit parameters, total score + fit time. If > 2, prints train/test scores.
cv_horizon (int or None, default None) – Number of periods in each CV test set. If None, default is
forecast_horizon
. Set either cv_horizon
or cv_max_splits
to 0 to skip CV.
cv_min_train_periods (int or None, default None) – Minimum number of periods for training each CV fold. If cv_expanding_window is False, every training period is this size. If None, default is 2 *
cv_horizon
.
cv_expanding_window (bool, default False) – If True, training window for each CV split is fixed to the first available date. Otherwise, train start date is sliding, determined by
cv_min_train_periods
.
cv_use_most_recent_splits (bool, default False) – If True, splits from the end of the dataset are used. Else a sampling strategy is applied. Check
_sample_splits
for details.
cv_periods_between_splits (int or None, default None) – Number of periods to slide the test window between CV splits. If None, default is
cv_horizon
.
cv_periods_between_train_test (int or None, default None) – Number of periods for the gap between train and test in a CV split. If None, default is
periods_between_train_test
.
cv_max_splits (int or None, default 3) – Maximum number of CV splits. Given the above configuration, samples up to max_splits train/test splits, preferring splits toward the end of available data. If None, uses all splits. Set either
cv_horizon
or cv_max_splits
to 0 to skip CV.
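The Outside Tolerance metric described under relative_error_tolerance can be sketched directly from its definition: the fraction of forecasted values whose relative error is strictly greater than the tolerance. The function below is an illustrative re-implementation, not the library's own; it assumes nonzero actual values and measures relative error against the actuals.

```python
def fraction_outside_tolerance(y_true, y_pred, rtol=0.05):
    """Fraction of forecasts whose relative error strictly exceeds ``rtol``.

    Assumes every actual value is nonzero (relative error is undefined at 0).
    """
    errors = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(e > rtol for e in errors) / len(errors)


y_true = [100, 100, 100, 100]
y_pred = [104, 105, 106, 90]  # relative errors: 4%, 5%, 6%, 10%
print(fraction_outside_tolerance(y_true, y_pred, rtol=0.05))
```

Because the comparison is strict, the forecast with exactly 5% error counts as inside the tolerance.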
- Returns
forecast_result – Forecast result. See
ForecastResult
for details.
If
cv_horizon=0
, forecast_result.grid_search.best_estimator_
and forecast_result.grid_search.best_params_
attributes are defined according to the provided single set of parameters. There must be a single set of parameters to skip cross-validation.
If
test_horizon=0
, forecast_result.backtest
is None.
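The holdout controlled by test_horizon and periods_between_train_test can be pictured with a simple list split. This sketch only mirrors the parameter descriptions (periods held back from the end, with an optional gap before them); `split_train_test` is a hypothetical helper, and the pipeline's own splitting logic may differ in its details.

```python
def split_train_test(values, test_horizon, periods_between_train_test=0):
    """Holds back the last ``test_horizon`` periods, leaving an optional gap.

    Returns (train, test). With test_horizon=0 there is no backtest set.
    """
    values = list(values)
    if test_horizon == 0:
        return values, []
    test_start = len(values) - test_horizon
    train_end = test_start - periods_between_train_test
    return values[:train_end], values[test_start:]


train, test = split_train_test(range(10), test_horizon=3, periods_between_train_test=1)
print(train, test)
```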
- Return type
-
class
greykite.framework.pipeline.pipeline.
ForecastResult
(timeseries: greykite.framework.input.univariate_time_series.UnivariateTimeSeries = None, grid_search: sklearn.model_selection._search.RandomizedSearchCV = None, model: sklearn.pipeline.Pipeline = None, backtest: greykite.framework.output.univariate_forecast.UnivariateForecast = None, forecast: greykite.framework.output.univariate_forecast.UnivariateForecast = None)[source]¶ Forecast results. Contains results from cross-validation, backtest, and forecast, the trained model, and the original input data.
-
timeseries
: greykite.framework.input.univariate_time_series.UnivariateTimeSeries = None¶ Input time series in standard format with stats and convenient plot functions.
-
grid_search
: sklearn.model_selection._search.RandomizedSearchCV = None¶ Result of cross-validation grid search on training dataset. The relevant attributes are:
cv_results_
: cross-validation scores
best_estimator_
: the model used for backtesting
best_params_
: the optimal parameters used for backtesting.
Also see
summarize_grid_search_results
. We recommend using this function to extract results, rather than accessing cv_results_
directly.
-
model
: sklearn.pipeline.Pipeline = None¶ Model fitted on full dataset, using the best parameters selected via cross-validation. Has
.fit()
,.predict()
, and diagnostic functions depending on the model.
-
backtest
: greykite.framework.output.univariate_forecast.UnivariateForecast = None¶ Forecast on backtest period. Backtest period is a holdout test set to check forecast quality against the most recent actual values available. The best model from cross validation is refit on data prior to this period. The timestamps in
backtest.df
are sorted in ascending order. Has a .plot()
method and attributes to get forecast vs actuals, evaluation results.
-
forecast
: greykite.framework.output.univariate_forecast.UnivariateForecast = None¶ Forecast on future period. Future dates are after the train end date, following the holdout test set. The best model from cross validation is refit on data prior to this period. The timestamps in
forecast.df
are sorted in ascending order. Has a .plot()
method and attributes to get forecast vs actuals, evaluation results.
-
Template Output¶
-
class
greykite.framework.input.univariate_time_series.
UnivariateTimeSeries
[source]¶ Defines univariate time series input. The dataset can include regressors, but only one metric is designated as the target metric to forecast.
Loads time series into a standard format. Provides statistics, plotting functions, and ability to generate future dataframe for prediction.
-
df
¶ Data frame containing timestamp and value, with standardized column names for internal use (TIME_COL, VALUE_COL). Rows are sorted by time index, and missing gaps between dates are filled in so that dates are spaced at regular intervals. Values are adjusted for anomalies according to
anomaly_info
. The index can be timezone aware (but TIME_COL is not).
- Type
-
y
¶ Value of time series to forecast.
- Type
pandas.Series
, dtype float64
-
time_stats
¶ Summary statistics about the timestamp column.
- Type
dict
-
value_stats
¶ Summary statistics about the value column.
- Type
dict
-
original_time_col
¶ Name of time column in original input data.
- Type
str
-
original_value_col
¶ Name of value column in original input data.
- Type
str
-
regressor_cols
¶ A list of regressor columns in the training and prediction DataFrames.
- Type
list [str]
-
lagged_regressor_cols
¶ A list of additional columns needed for lagged regressors in the training and prediction DataFrames.
- Type
list [str]
-
last_date_for_val
¶ Date or timestamp corresponding to last non-null value in
df[original_value_col]
.- Type
datetime.datetime
or None, default None
-
last_date_for_reg
¶ Date or timestamp corresponding to last non-null value in
df[regressor_cols]
. If regressor_cols
is None, last_date_for_reg
is None.
- Type
datetime.datetime
or None, default None
-
last_date_for_lag_reg
¶ Date or timestamp corresponding to last non-null value in
df[lagged_regressor_cols]
. If lagged_regressor_cols
is None, last_date_for_lag_reg
is None.
- Type
datetime.datetime
or None, default None
-
train_end_date
¶ Last date or timestamp in
fit_df
. It is always less than or equal to the minimum of the non-null values of last_date_for_val
and last_date_for_reg
.
- Type
-
fit_cols
¶ A list of columns used in the training and prediction DataFrames.
- Type
list [str]
-
fit_df
¶ Data frame containing timestamp and value, with standardized column names for internal use. Will be used for fitting (train, cv, backtest).
- Type
-
fit_y
¶ Value of time series for fit_df.
- Type
pandas.Series
, dtype float64
-
freq
Timeseries frequency, DateOffset alias, e.g. {‘T’ (minute), ‘H’, ‘D’, ‘W’, ‘M’ (month end), ‘MS’ (month start), ‘Y’ (year end), ‘YS’ (year start)}. See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
- Type
str
-
anomaly_info
¶ Anomaly adjustment info. Anomalies in
df
are corrected before any forecasting is done. See self.load_data()
- Type
dict or list [dict] or None, default None
-
df_before_adjustment
¶ self.df
before adjustment by anomaly_info
. Used by self.plot()
to show the adjustment.
- Type
pandas.DataFrame
or None, default None
-
load_data
(df: pandas.core.frame.DataFrame, time_col: str = 'ts', value_col: str = 'y', freq: str = None, date_format: str = None, tz: str = None, train_end_date: datetime.datetime = None, regressor_cols: List[str] = None, lagged_regressor_cols: List[str] = None, anomaly_info: Optional[Union[Dict, List[Dict]]] = None)[source]¶ Loads data to internal representation. Parses date column, sets timezone aware index. Checks for irregularities and raises an error if input is invalid. Adjusts for anomalies according to
anomaly_info
.- Parameters
df (
pandas.DataFrame
) – Input timeseries. A data frame which includes the timestamp column as well as the value column.time_col (str) – The column name in
df
representing time for the time series data. The time column can be anything that can be parsed by pandas DatetimeIndex.value_col (str) – The column name which has the value of interest to be forecasted.
freq (str or None, default None) – Timeseries frequency, DateOffset alias. If None, automatically inferred.
date_format (str or None, default None) – strftime format to parse time column, e.g. %m/%d/%Y. Note that %f will parse all the way up to nanoseconds. If None (recommended), inferred by pandas.to_datetime.
tz (str or pytz.timezone object or None, default None) – Passed to pandas.tz_localize to localize the timestamp.
train_end_date (
datetime.datetime
or None, default None) – Last date to use for fitting the model. Forecasts are generated after this date. If None, it is set to the minimum ofself.last_date_for_val
andself.last_date_for_reg
.regressor_cols (list [str] or None, default None) – A list of regressor columns used in the training and prediction DataFrames. If None, no regressor columns are used. Regressor columns that are unavailable in
df
are dropped.lagged_regressor_cols (list [str] or None, default None) – A list of additional columns needed for lagged regressors in the training and prediction DataFrames. This list can have overlap with
regressor_cols
. If None, no additional columns are added to the DataFrame. Lagged regressor columns that are unavailable indf
are dropped.anomaly_info (dict or list [dict] or None, default None) –
Anomaly adjustment info. Anomalies in
df
are corrected before any forecasting is done.If None, no adjustments are made.
A dictionary containing the parameters to adjust_anomalous_data. See that function for details. The possible keys are:
"value_col": str
    The name of the column in df to adjust. You may adjust the value to forecast as well as any numeric regressors.
"anomaly_df": pandas.DataFrame
    Adjustments to correct the anomalies.
"start_date_col": str, default START_DATE_COL
    Start date column in anomaly_df.
"end_date_col": str, default END_DATE_COL
    End date column in anomaly_df.
"adjustment_delta_col": str or None, default None
    Impact column in anomaly_df.
"filter_by_dict": dict or None, default None
    Used to filter anomaly_df to the relevant anomalies for the value_col in this dictionary. Key specifies the column name, value specifies the filter value.
"filter_by_value_col": str or None, default None
    Adds {filter_by_value_col: value_col} to filter_by_dict if not None, for the value_col in this dictionary.
"adjustment_method": str (“add” or “subtract”), default “add”
    How to make the adjustment, if adjustment_delta_col is provided.
Accepts a list of such dictionaries to adjust multiple columns in
df
.
- Returns
self – Sets
self.df
with standard column names, value adjusted for anomalies, and time gaps filled in, sorted by time index.- Return type
Returns self.
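As an illustration of the kind of adjustment that anomaly_info describes, here is a minimal pandas sketch. The function adjust_anomalies is a hypothetical stand-in, not greykite's adjust_anomalous_data. The column names, the inclusive end date, and the choice to set values to NaN when no delta column is given are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def adjust_anomalies(df, time_col, value_col, anomaly_df,
                     start_date_col="start_date", end_date_col="end_date",
                     adjustment_delta_col=None, adjustment_method="add"):
    """Hypothetical helper: applies each anomaly window to a copy of ``df``."""
    adjusted = df.copy()
    adjusted[value_col] = adjusted[value_col].astype(float)
    for _, anomaly in anomaly_df.iterrows():
        # Assumption: both ends of the anomaly window are inclusive.
        in_window = adjusted[time_col].between(
            anomaly[start_date_col], anomaly[end_date_col])
        if adjustment_delta_col is None:
            # Assumption: with no impact column, the values are masked.
            adjusted.loc[in_window, value_col] = np.nan
        else:
            sign = 1.0 if adjustment_method == "add" else -1.0
            adjusted.loc[in_window, value_col] += sign * anomaly[adjustment_delta_col]
    return adjusted


df = pd.DataFrame({
    "ts": pd.date_range("2020-01-01", periods=5, freq="D"),
    "y": [10.0, 11.0, 50.0, 12.0, 13.0],
})
anomaly_df = pd.DataFrame({
    "start_date": [pd.Timestamp("2020-01-03")],
    "end_date": [pd.Timestamp("2020-01-03")],
    "delta": [-38.0],
})

corrected = adjust_anomalies(df, "ts", "y", anomaly_df, adjustment_delta_col="delta")
masked = adjust_anomalies(df, "ts", "y", anomaly_df)
```

The input frame is left untouched, which is the same idea as keeping df_before_adjustment for plotting.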
-
describe_time_col
()[source]¶ Basic descriptive stats on the timeseries time column.
- Returns
time_stats –
Dictionary with descriptive stats on the timeseries time column.
- data_points: int
number of time points
- mean_increment_secs: float
mean frequency
- min_timestamp: datetime64
start date
- max_timestamp: datetime64
end date
- Return type
dict
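The statistics listed above can be reproduced by hand for a small series. The following is a sketch using only the standard library; describe_time_points is a hypothetical name and this is not the library's implementation.

```python
from datetime import datetime


def describe_time_points(timestamps):
    """Hypothetical helper returning the four statistics named above."""
    ordered = sorted(timestamps)
    # Seconds between consecutive timestamps.
    increments = [(later - earlier).total_seconds()
                  for earlier, later in zip(ordered, ordered[1:])]
    mean_increment = sum(increments) / len(increments) if increments else float("nan")
    return {
        "data_points": len(ordered),
        "mean_increment_secs": mean_increment,
        "min_timestamp": ordered[0],
        "max_timestamp": ordered[-1],
    }


time_stats = describe_time_points([
    datetime(2020, 1, 1),
    datetime(2020, 1, 2),
    datetime(2020, 1, 4),
])
```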
-
describe_value_col
()[source]¶ Basic descriptive stats on the timeseries value column.
- Returns
value_stats – Dict with keys: count, mean, std, min, 25%, 50%, 75%, max
- Return type
dict [str, float]
-
make_future_dataframe
(periods: int = None, include_history=True)[source]¶ Extends the input data for prediction into the future.
Includes the historical values (VALUE_COL) so this can be fed into a Pipeline that transforms input data for fitting, and for use in evaluation.
- Parameters
- Returns
future_df – Dataframe with future timestamps for prediction. Contains columns for:
prediction dates (
TIME_COL
),values (
VALUE_COL
),optional regressors
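The idea of extending the history into the future can be sketched with plain pandas. The helper extend_into_future below is hypothetical. It assumes the column names "ts" and "y", and assumes that future values are filled with NaN, which the description above does not state.

```python
import pandas as pd


def extend_into_future(history, periods, freq, include_history=True):
    """Hypothetical helper: appends ``periods`` future timestamps to ``history``."""
    last_timestamp = history["ts"].max()
    # Generate one extra point and drop the first, which is the last observed date.
    future_ts = pd.date_range(start=last_timestamp, periods=periods + 1, freq=freq)[1:]
    future = pd.DataFrame({"ts": future_ts, "y": float("nan")})
    if include_history:
        return pd.concat([history, future], ignore_index=True)
    return future


history = pd.DataFrame({
    "ts": pd.date_range("2020-01-01", periods=3, freq="D"),
    "y": [1.0, 2.0, 3.0],
})
future_df = extend_into_future(history, periods=2, freq="D")
future_only = extend_into_future(history, periods=2, freq="D", include_history=False)
```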
- Return type
-
plot
(color='rgb(32, 149, 212)', show_anomaly_adjustment=False, **kwargs)[source]¶ Returns interactive plotly graph of the value against time.
If anomaly info is provided, there is an option to show the anomaly adjustment.
- Parameters
color (str, default “rgb(32, 149, 212)” (light blue)) – Color of the value line (after adjustment, if applicable).
show_anomaly_adjustment (bool, default False) – Whether to show the anomaly adjustment.
kwargs (additional parameters) – Additional parameters to pass to
plot_univariate
such as title and color.
- Returns
fig – Interactive plotly graph of the value against time.
See
plot_forecast_vs_actual
return value for how to plot the figure and add customization.- Return type
-
get_grouping_evaluation
(aggregation_func=<function nanmean>, aggregation_func_name='mean', groupby_time_feature=None, groupby_sliding_window_size=None, groupby_custom_column=None)[source]¶ Group-wise computation of aggregated timeseries value. Can be used to evaluate error or aggregated value by a time feature, over time, or by a user-provided column.
Exactly one of:
groupby_time_feature
,groupby_sliding_window_size
,groupby_custom_column
must be provided.- Parameters
aggregation_func (callable, optional, default
numpy.nanmean
) – Function that aggregates an array to a number. Signature (y: array) -> aggregated value: float.aggregation_func_name (str or None, optional, default “mean”) – Name of grouping function, used to report results. If None, defaults to “aggregation”.
groupby_time_feature (str or None, optional) – If provided, groups by a column generated by
build_time_features_df
. See that function for valid values.groupby_sliding_window_size (int or None, optional) – If provided, sequentially partitions data into groups of size
groupby_sliding_window_size
.groupby_custom_column (
pandas.Series
or None, optional) – If provided, groups by this column value. Should be same length as the DataFrame.
- Returns
grouped_df – pandas.DataFrame with two columns:
grouping_func_name: evaluation metric for aggregation of timeseries.
group name: group name depends on the grouping method:
groupby_time_feature for groupby_time_feature,
cst.TIME_COL for groupby_sliding_window_size,
groupby_custom_column.name for groupby_custom_column.
- Return type
pandas.DataFrame
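To make the grouping by a time feature concrete, here is a standard-library sketch that averages values by day of week while skipping NaN, in the spirit of the default numpy.nanmean aggregation. The helper names are hypothetical, and the use of ISO weekday numbering (Monday = 1) for the day-of-week feature is an assumption.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def nanmean(values):
    """Mean of the non-NaN entries, or NaN if none remain."""
    finite = [v for v in values if not math.isnan(v)]
    return sum(finite) / len(finite) if finite else float("nan")


def group_by_weekday(dates, values, aggregation_func=nanmean):
    """Hypothetical helper: aggregates ``values`` by ISO weekday of ``dates``."""
    groups = defaultdict(list)
    for day, value in zip(dates, values):
        groups[day.isoweekday()].append(value)
    return {key: aggregation_func(group) for key, group in sorted(groups.items())}


start = date(2024, 1, 1)  # a Monday
dates = [start + timedelta(days=i) for i in range(14)]
values = [float(i) for i in range(14)]
values[7] = float("nan")  # second Monday is missing

by_weekday = group_by_weekday(dates, values)
```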
-
plot_grouping_evaluation
(aggregation_func=<function nanmean>, aggregation_func_name='mean', groupby_time_feature=None, groupby_sliding_window_size=None, groupby_custom_column=None, xlabel=None, ylabel=None, title=None)[source]¶ Computes aggregated timeseries by group and plots the result. Can be used to plot aggregated timeseries by a time feature, over time, or by a user-provided column.
Exactly one of:
groupby_time_feature
,groupby_sliding_window_size
,groupby_custom_column
must be provided.- Parameters
aggregation_func (callable, optional, default
numpy.nanmean
) – Function that aggregates an array to a number. Signature (y: array) -> aggregated value: float.aggregation_func_name (str or None, optional, default “mean”) – Name of grouping function, used to report results. If None, defaults to “aggregation”.
groupby_time_feature (str or None, optional) – If provided, groups by a column generated by
build_time_features_df
. See that function for valid values.groupby_sliding_window_size (int or None, optional) – If provided, sequentially partitions data into groups of size
groupby_sliding_window_size
.groupby_custom_column (
pandas.Series
or None, optional) – If provided, groups by this column value. Should be same length as the DataFrame.xlabel (str, optional, default None) – X-axis label of the plot.
ylabel (str, optional, default None) – Y-axis label of the plot.
title (str or None, optional) – Plot title. If None, default is based on axis labels.
- Returns
fig – plotly graph object showing aggregated timeseries by group. x-axis label depends on the grouping method:
groupby_time_feature
forgroupby_time_feature
TIME_COL
forgroupby_sliding_window_size
groupby_custom_column.name
forgroupby_custom_column
.- Return type
-
get_quantiles_and_overlays
(groupby_time_feature=None, groupby_sliding_window_size=None, groupby_custom_column=None, show_mean=False, show_quantiles=False, show_overlays=False, overlay_label_time_feature=None, overlay_label_sliding_window_size=None, overlay_label_custom_column=None, center_values=False, value_col='y', mean_col_name='mean', quantile_col_prefix='Q', **overlay_pivot_table_kwargs)[source]¶ Computes mean, quantiles, and overlays by the requested grouping dimension.
Overlays are best explained in the plotting context. The grouping dimension goes on the x-axis, and one line is shown for each level of the overlay dimension. This function returns a column for each line to plot (e.g. mean, each quantile, each overlay value).
Exactly one of: groupby_time_feature, groupby_sliding_window_size, groupby_custom_column must be provided as the grouping dimension.
If show_overlays is True, exactly one of: overlay_label_time_feature, overlay_label_sliding_window_size, overlay_label_custom_column can be provided to specify the label_col (overlay dimension). Internally, the function calls pandas.DataFrame.pivot_table with index=groupby_col, columns=label_col, values=value_col to get the overlay values for plotting. You can pass additional parameters to pandas.DataFrame.pivot_table via overlay_pivot_table_kwargs, e.g. to change the aggregation method. If an explicit label is not provided, the records are labeled by their position within the group.
For example, to show yearly seasonality mean, quantiles, and overlay plots for each individual year, use:
    self.get_quantiles_and_overlays(
        groupby_time_feature="doy",         # Rows: a row for each day of year (1, 2, ..., 366)
        show_mean=True,                     # mean value on that day
        show_quantiles=[0.1, 0.9],          # quantiles of the observed distribution on that day
        show_overlays=True,                 # Include overlays defined by ``overlay_label_time_feature``
        overlay_label_time_feature="year")  # One column for each observed "year" (2016, 2017, 2018, ...)
To show weekly seasonality over time, use:
    self.get_quantiles_and_overlays(
        groupby_time_feature="dow",            # Rows: a row for each day of week (1, 2, ..., 7)
        show_mean=True,                        # mean value on that day
        show_quantiles=[0.1, 0.5, 0.9],        # quantiles of the observed distribution on that day
        show_overlays=True,                    # Include overlays defined by ``overlay_label_sliding_window_size``
        overlay_label_sliding_window_size=90,  # One column for each 90 period sliding window in the dataset,
        aggfunc="median")                      # overlay value is the median value for the dow over the period (default="mean").
It may be difficult to assess the weekly seasonality from the previous result, because overlays shift up/down over time due to trend/yearly seasonality. Use
center_values=True
to adjust each overlay so its average value is centered at 0. Mean and quantiles are shifted by a single constant to center the mean at 0, while preserving their relative values:
    self.get_quantiles_and_overlays(
        groupby_time_feature="dow",
        show_mean=True,
        show_quantiles=[0.1, 0.5, 0.9],
        show_overlays=True,
        overlay_label_sliding_window_size=90,
        aggfunc="median",
        center_values=True)  # Centers the output
Centering reduces the variability in the overlays to make it easier to isolate the effect by the groupby column. As a result, centered overlays have smaller variability than that reported by the quantiles, which operate on the original, uncentered data points. Similarly, if overlays are aggregates of individual values (i.e.
aggfunc
is needed in the call topandas.DataFrame.pivot_table
), the quantiles of overlays will be less extreme than those of the original data.To assess variability conditioned on the groupby value, check the quantiles.
To assess variability conditioned on both the groupby and overlay value, after any necessary aggregation, check the variability of the overlay values. Compute quantiles of overlays from the return value if desired.
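The pivot and centering steps described above can be sketched directly with pandas. This is an illustration on a toy frame, not the library's code. The column names "dow", "window", and "y" are made up for the example, and centering is taken to mean subtracting each overlay's own mean.

```python
import pandas as pd

# Toy data: three days of week observed twice in each of two windows.
df = pd.DataFrame({
    "dow": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    "window": ["w1"] * 6 + ["w2"] * 6,
    "y": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 11.0, 12.0, 13.0, 13.0, 14.0, 15.0],
})

# One row per groupby value, one column per overlay label, aggregated by median.
overlays = df.pivot_table(index="dow", columns="window", values="y", aggfunc="median")

# Center each overlay (column) so that its average value is 0.
centered = overlays - overlays.mean(axis=0)
```

Before centering, the two windows sit at different levels (2 to 4 versus 12 to 14). After centering, both show the same day-of-week pattern, which is the effect the groupby column is meant to isolate.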
- Parameters
groupby_time_feature (str or None, default None) – If provided, groups by a column generated by
build_time_features_df
. See that function for valid values.groupby_sliding_window_size (int or None, default None) – If provided, sequentially partitions data into groups of size
groupby_sliding_window_size
.groupby_custom_column (
pandas.Series
or None, default None) – If provided, groups by this column value. Should be same length as the DataFrame.show_mean (bool, default False) – Whether to return the mean value by the groupby column.
show_quantiles (bool or list [float] or
numpy.array
, default False) – Whether to return the quantiles of the value by the groupby column. If False, does not return quantiles. If True, returns default quantiles (0.1 and 0.9). If array-like, a list of quantiles to compute (e.g. (0.1, 0.25, 0.75, 0.9)).show_overlays (bool or int or array-like [int or str] or None, default False) –
Whether to return overlays of the value by the groupby column.
If False, no overlays are shown.
If True and
label_col
is defined, callspandas.DataFrame.pivot_table
withindex=groupby_col
,columns=label_col
,values=value_col
.label_col
is defined by one ofoverlay_label_time_feature
,overlay_label_sliding_window_size
, oroverlay_label_custom_column
. Returns one column for each value of thelabel_col
.If True and the
label_col
is not defined, returns the raw values within each group. Values across groups are put into columns by their position in the group (1st element in group, 2nd, 3rd, etc.). Positional order in a group is not guaranteed to correspond to anything meaningful, so the items within a column may not have anything in common. It is better to specify one ofoverlay_*
to explicitly define the overlay labels.If an integer, the number of overlays to randomly sample. The same as True, then randomly samples up to int columns. This is useful if there are too many values.
If a list [int], a list of column indices (int type). The same as True, then selects the specified columns by index.
If a list [str], a list of column names. Column names are matched by their string representation to the names in this list. The same as True, then selects the specified columns by name.
overlay_label_time_feature (str or None, default None) –
If
show_overlays
is True, can be used to definelabel_col
, i.e. which dimension to show separately as overlays.If provided, uses a column generated by
build_time_features_df
. See that function for valid values.overlay_label_sliding_window_size (int or None, default None) –
If
show_overlays
is True, can be used to definelabel_col
, i.e. which dimension to show separately as overlays.If provided, uses a column that sequentially partitions data into groups of size
overlay_label_sliding_window_size
.overlay_label_custom_column (
pandas.Series
or None, default None) –If
show_overlays
is True, can be used to definelabel_col
, i.e. which dimension to show separately as overlays.If provided, uses this column value. Should be same length as the DataFrame.
value_col (str, default VALUE_COL) – The column name for the value column. By default, shows the univariate time series value, but it can be any other column in
self.df
.mean_col_name (str, default “mean”) – The name to use for the mean column in the output. Applies if
show_mean=True
.quantile_col_prefix (str, default “Q”) – The prefix to use for quantile column names in the output. Columns are named with this prefix followed by the quantile, rounded to 2 decimal places.
center_values (bool, default False) –
Whether to center the return values. If True, shifts each overlay so its average value is centered at 0. Shifts mean and quantiles by a constant to center the mean at 0, while preserving their relative values.
If False, values are not centered.
overlay_pivot_table_kwargs (additional parameters) – Additional keyword parameters to pass to
pandas.DataFrame.pivot_table
, used in generating the overlays. See above description for details.
- Returns
grouped_df – Dataframe with mean, quantiles, and overlays by the grouping column. Overlays are defined by the grouping column and overlay dimension.
The column index is a MultiIndex whose first level is the “category”, a subset of [MEAN_COL_GROUP, QUANTILE_COL_GROUP, OVERLAY_COL_GROUP] depending on what is requested.
grouped_df[MEAN_COL_GROUP] = df with single column, named mean_col_name.
grouped_df[QUANTILE_COL_GROUP] = df with a column for each quantile, named f”{quantile_col_prefix}{round(q, 2)}”, where q is the quantile.
grouped_df[OVERLAY_COL_GROUP] = df with one column per overlay value, named by the overlay value.
For example, it might look like:
    category    mean    quantile          overlay
    name        mean    Q0.1    Q0.9      2007    2008    2009
    doy
    1           8.42    7.72    9.08      8.29    7.75    8.33
    2           8.82    8.20    9.56      8.43    8.80    8.53
    3           8.95    8.25    9.88      8.26    9.12    8.70
    4           9.07    8.60    9.49      8.10    9.99    8.73
    5           8.73    8.29    9.24      7.95    9.26    8.37
    ...          ...     ...     ...       ...     ...     ...
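A frame with this two-level column layout can be sliced by category. The sketch below builds a small stand-in by hand. The literal category labels "mean", "quantile", and "overlay" are taken from the example table and are only assumed to match the values of MEAN_COL_GROUP, QUANTILE_COL_GROUP, and OVERLAY_COL_GROUP.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("mean", "mean"),
     ("quantile", "Q0.1"), ("quantile", "Q0.9"),
     ("overlay", "2007"), ("overlay", "2008")],
    names=["category", "name"])

# Hand-built stand-in for the returned frame (first two rows of the table).
grouped_df = pd.DataFrame(
    [[8.42, 7.72, 9.08, 8.29, 7.75],
     [8.82, 8.20, 9.56, 8.43, 8.80]],
    index=pd.Index([1, 2], name="doy"),
    columns=columns)

# Selecting a first-level label returns the sub-frame for that category.
quantiles = grouped_df["quantile"]
band_width = quantiles["Q0.9"] - quantiles["Q0.1"]
```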
- Return type
-
plot_quantiles_and_overlays
(groupby_time_feature=None, groupby_sliding_window_size=None, groupby_custom_column=None, show_mean=False, show_quantiles=False, show_overlays=False, overlay_label_time_feature=None, overlay_label_sliding_window_size=None, overlay_label_custom_column=None, center_values=False, value_col='y', mean_col_name='mean', quantile_col_prefix='Q', mean_style=None, quantile_style=None, overlay_style=None, xlabel=None, ylabel=None, title=None, showlegend=True, **overlay_pivot_table_kwargs)[source]¶ Plots mean, quantiles, and overlays by the requested grouping dimension.
The grouping dimension goes on the x-axis, and one line is shown for the mean, each quantile, and each level of the overlay dimension, as requested. By default, shading is applied between the quantiles.
Exactly one of: groupby_time_feature, groupby_sliding_window_size, groupby_custom_column must be provided as the grouping dimension.
If show_overlays is True, exactly one of: overlay_label_time_feature, overlay_label_sliding_window_size, overlay_label_custom_column can be provided to specify the label_col (overlay dimension). Internally, the function calls pandas.DataFrame.pivot_table with index=groupby_col, columns=label_col, values=value_col to get the overlay values for plotting. You can pass additional parameters to pandas.DataFrame.pivot_table via overlay_pivot_table_kwargs, e.g. to change the aggregation method. If an explicit label is not provided, the records are labeled by their position within the group.
For example, to show yearly seasonality mean, quantiles, and overlay plots for each individual year, use:
    self.plot_quantiles_and_overlays(
        groupby_time_feature="doy",         # Rows: a row for each day of year (1, 2, ..., 366)
        show_mean=True,                     # mean value on that day
        show_quantiles=[0.1, 0.9],          # quantiles of the observed distribution on that day
        show_overlays=True,                 # Include overlays defined by ``overlay_label_time_feature``
        overlay_label_time_feature="year")  # One column for each observed "year" (2016, 2017, 2018, ...)
To show weekly seasonality over time, use:
    self.plot_quantiles_and_overlays(
        groupby_time_feature="dow",            # Rows: a row for each day of week (1, 2, ..., 7)
        show_mean=True,                        # mean value on that day
        show_quantiles=[0.1, 0.5, 0.9],        # quantiles of the observed distribution on that day
        show_overlays=True,                    # Include overlays defined by ``overlay_label_sliding_window_size``
        overlay_label_sliding_window_size=90,  # One column for each 90 period sliding window in the dataset,
        aggfunc="median")                      # overlay value is the median value for the dow over the period (default="mean").
It may be difficult to assess the weekly seasonality from the previous result, because overlays shift up/down over time due to trend/yearly seasonality. Use
center_values=True
to adjust each overlay so its average value is centered at 0. Mean and quantiles are shifted by a single constant to center the mean at 0, while preserving their relative values:
    self.plot_quantiles_and_overlays(
        groupby_time_feature="dow",
        show_mean=True,
        show_quantiles=[0.1, 0.5, 0.9],
        show_overlays=True,
        overlay_label_sliding_window_size=90,
        aggfunc="median",
        center_values=True)  # Centers the output
Centering reduces the variability in the overlays to make it easier to isolate the effect by the groupby column. As a result, centered overlays have smaller variability than that reported by the quantiles, which operate on the original, uncentered data points. Similarly, if overlays are aggregates of individual values (i.e.
aggfunc
is needed in the call topandas.DataFrame.pivot_table
), the quantiles of overlays will be less extreme than those of the original data.To assess variability conditioned on the groupby value, check the quantiles.
To assess variability conditioned on both the groupby and overlay value, after any necessary aggregation, check the variability of the overlay values. Compute quantiles of overlays from the return value if desired.
- Parameters
groupby_time_feature (str or None, default None) – If provided, groups by a column generated by
build_time_features_df
. See that function for valid values.groupby_sliding_window_size (int or None, default None) – If provided, sequentially partitions data into groups of size
groupby_sliding_window_size
.groupby_custom_column (
pandas.Series
or None, default None) – If provided, groups by this column value. Should be same length as the DataFrame.show_mean (bool, default False) – Whether to return the mean value by the groupby column.
show_quantiles (bool or list [float] or
numpy.array
, default False) – Whether to return the quantiles of the value by the groupby column. If False, does not return quantiles. If True, returns default quantiles (0.1 and 0.9). If array-like, a list of quantiles to compute (e.g. (0.1, 0.25, 0.75, 0.9)).show_overlays (bool or int or array-like [int or str], default False) –
Whether to return overlays of the value by the groupby column.
If False, no overlays are shown.
If True and
label_col
is defined, callspandas.DataFrame.pivot_table
withindex=groupby_col
,columns=label_col
,values=value_col
.label_col
is defined by one ofoverlay_label_time_feature
,overlay_label_sliding_window_size
, oroverlay_label_custom_column
. Returns one column for each value of thelabel_col
.If True and the
label_col
is not defined, returns the raw values within each group. Values across groups are put into columns by their position in the group (1st element in group, 2nd, 3rd, etc.). Positional order in a group is not guaranteed to correspond to anything meaningful, so the items within a column may not have anything in common. It is better to specify one ofoverlay_*
to explicitly define the overlay labels.If an integer, the number of overlays to randomly sample. The same as True, then randomly samples up to int columns. This is useful if there are too many values.
If a list [int], a list of column indices (int type). The same as True, then selects the specified columns by index.
If a list [str], a list of column names. Column names are matched by their string representation to the names in this list. The same as True, then selects the specified columns by name.
overlay_label_time_feature (str or None, default None) –
If
show_overlays
is True, can be used to definelabel_col
, i.e. which dimension to show separately as overlays.If provided, uses a column generated by
build_time_features_df
. See that function for valid values.overlay_label_sliding_window_size (int or None, default None) –
If
show_overlays
is True, can be used to definelabel_col
, i.e. which dimension to show separately as overlays.If provided, uses a column that sequentially partitions data into groups of size
overlay_label_sliding_window_size
.overlay_label_custom_column (
pandas.Series
or None, default None) –If
show_overlays
is True, can be used to definelabel_col
, i.e. which dimension to show separately as overlays.If provided, uses this column value. Should be same length as the DataFrame.
value_col (str, default VALUE_COL) – The column name for the value column. By default, shows the univariate time series value, but it can be any other column in
self.df
.mean_col_name (str, default “mean”) – The name to use for the mean column in the output. Applies if
show_mean=True
.quantile_col_prefix (str, default “Q”) – The prefix to use for quantile column names in the output. Columns are named with this prefix followed by the quantile, rounded to 2 decimal places.
center_values (bool, default False) –
Whether to center the return values. If True, shifts each overlay so its average value is centered at 0. Shifts mean and quantiles by a constant to center the mean at 0, while preserving their relative values.
If False, values are not centered.
mean_style (dict or None, default None) –
How to style the mean line, passed as keyword arguments to
plotly.graph_objects.Scatter
. If None, the default is:
    mean_style = {
        "line": dict(
            width=2,
            color="#595959"),  # gray
        "legendgroup": MEAN_COL_GROUP}
quantile_style (dict or None, default None) –
How to style the quantile lines, passed as keyword arguments to
plotly.graph_objects.Scatter
. If None, the default is:
    quantile_style = {
        "line": dict(
            width=2,
            color="#1F9AFF",  # blue
            dash="solid"),
        "legendgroup": QUANTILE_COL_GROUP,  # show/hide them together
        "fill": "tonexty"}
Note that the fill style is removed from the first quantile line, to fill only between items in the same category.
overlay_style (dict or None, default None) –
How to style the overlay lines, passed as keyword arguments to
plotly.graph_objects.Scatter
. If None, the default is:
    overlay_style = {
        "opacity": 0.5,  # makes it easier to see density
        "line": dict(
            width=1,
            color="#B3B3B3",  # light gray
            dash="solid"),
        "legendgroup": OVERLAY_COL_GROUP}
xlabel (str, optional, default None) – X-axis label of the plot.
ylabel (str, optional, default None) – Y-axis label of the plot. If None, uses
value_col
.title (str or None, default None) – Plot title. If None, default is based on axis labels.
showlegend (bool, default True) – Whether to show the legend.
overlay_pivot_table_kwargs (additional parameters) – Additional keyword parameters to pass to
pandas.DataFrame.pivot_table
, used in generating the overlays. Seeget_quantiles_and_overlays
description for details.
- Returns
fig – plotly graph object showing the mean, quantiles, and overlays.
- Return type
See also
get_quantiles_and_overlays
To get the mean, quantiles, and overlays as a
pandas.DataFrame
without plotting.
-
-
class
greykite.framework.output.univariate_forecast.
UnivariateForecast
(df, time_col='ts', actual_col='actual', predicted_col='forecast', predicted_lower_col='forecast_lower', predicted_upper_col='forecast_upper', null_model_predicted_col='forecast_null', ylabel='y', train_end_date=None, test_start_date=None, forecast_horizon=None, coverage=0.95, r2_loss_function=<function mean_squared_error>, estimator=None, relative_error_tolerance=None)[source]¶ Stores predicted and actual values. Provides functionality to evaluate a forecast:
plots predicted against actual with prediction bands.
evaluates model performance.
Input should be one of two kinds of forecast results:
model fit to train data, forecast on test set (actuals available).
model fit to all data, forecast on future dates (actuals not available).
The input
df
is a concatenation of fitted and forecasted values.-
df
¶ Timestamp, predicted, and actual values.
- Type
-
time_col
¶ Column in
df
with timestamp (default “ts”).- Type
str
-
actual_col
¶ Column in
df
with actual values (default “actual”).
- Type
str
-
predicted_col
¶ Column in
df
with predicted values (default “forecast”).- Type
str
-
predicted_lower_col
¶ Column in
df
with predicted lower bound (default “forecast_lower”, optional).- Type
str or None
-
predicted_upper_col
¶ Column in
df
with predicted upper bound (default “forecast_upper”, optional).- Type
str or None
-
null_model_predicted_col
¶ Column in
df
with predicted value of null model (default “forecast_null”, optional).- Type
str or None
-
ylabel
¶ Unit of measurement (default “y”)
- Type
str
-
train_end_date
¶ End date for train period. If None, assumes all data were used for training.
- Type
str or
datetime
or None, default None
-
test_start_date
¶ Start date of test period. If None, set to the
time_col
value immediately aftertrain_end_date
. This assumes that all data not used in training were used for testing.- Type
str or
datetime
or None, default None
-
forecast_horizon
¶ Number of periods forecasted into the future. Must be > 0.
- Type
int or None, default None
-
coverage
¶ Intended coverage of the prediction bands (0.0 to 1.0).
- Type
float or None
-
r2_loss_function
¶ Loss function to calculate
cst.R2_null_model_score
, with signatureloss_func(y_true, y_pred)
(default mean_squared_error)- Type
function
-
estimator
¶ The fitted estimator, the last step in the forecast pipeline.
- Type
An instance of an estimator that implements greykite.models.base_forecast_estimator.BaseForecastEstimator.
-
relative_error_tolerance
¶ Threshold to compute the
Outside Tolerance
metric, defined as the fraction of forecasted values whose relative error is strictly greater than relative_error_tolerance. For example, 0.05 allows for 5% relative error. If None, the metric is not computed.
- Type
float or None, default None
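The Outside Tolerance definition can be sketched in a few lines. The function name is hypothetical, and relative error is assumed here to be |predicted - actual| / |actual|. The library's exact formula is not given in this section.

```python
def fraction_outside_tolerance(y_true, y_pred, relative_error_tolerance):
    """Hypothetical helper: share of points whose relative error exceeds the tolerance."""
    relative_errors = [abs(predicted - actual) / abs(actual)
                       for actual, predicted in zip(y_true, y_pred)]
    # Strictly greater than: an error exactly at the tolerance is not counted.
    outside = sum(error > relative_error_tolerance for error in relative_errors)
    return outside / len(relative_errors)


y_true = [100.0, 100.0, 100.0, 100.0]
y_pred = [104.0, 105.0, 106.0, 90.0]  # relative errors: 0.04, 0.05, 0.06, 0.10
outside_fraction = fraction_outside_tolerance(y_true, y_pred, 0.05)
```

The second point has a relative error of exactly 0.05 and is not counted, so two of the four points fall outside the tolerance.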
-
df_train
¶ Subset of df where df[time_col] <= train_end_date.
- Type
-
df_test
¶ Subset of df where df[time_col] > train_end_date.
- Type
-
train_evaluation
¶ Evaluation metrics on training set.
- Type
dict [str, float]
-
test_evaluation
¶ Evaluation metrics on test set (if actual values provided after train_end_date).
- Type
dict [str, float]
-
test_na_count
¶ Count of NA values in test data.
- Type
int
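The relationship between train_end_date, df_train, df_test, test_start_date, and test_na_count can be illustrated with a standard-library sketch. Rows are plain dictionaries standing in for DataFrame rows, the helper name is hypothetical, and None is used in place of NA.

```python
from datetime import date


def split_by_train_end(rows, train_end_date, time_col="ts"):
    """Hypothetical helper: rows up to and including train_end_date go to train."""
    train = [row for row in rows if row[time_col] <= train_end_date]
    test = [row for row in rows if row[time_col] > train_end_date]
    return train, test


rows = [
    {"ts": date(2020, 1, 1), "actual": 10.0, "forecast": 9.5},
    {"ts": date(2020, 1, 2), "actual": 11.0, "forecast": 11.2},
    {"ts": date(2020, 1, 3), "actual": 12.0, "forecast": 12.1},
    {"ts": date(2020, 1, 4), "actual": None, "forecast": 12.8},
]

df_train, df_test = split_by_train_end(rows, train_end_date=date(2020, 1, 2))

# Default test start: the first timestamp after train_end_date.
test_start_date = min(row["ts"] for row in df_test)
# Count of missing actuals in the test period.
test_na_count = sum(row["actual"] is None for row in df_test)
```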
-
compute_evaluation_metrics_split
()[source]¶ Computes __evaluation_metrics for train and test set separately.
- Returns
dictionary with train and test evaluation metrics
-
plot
(**kwargs)[source]¶ Plots predicted against actual.
- Parameters
kwargs (additional parameters) – Additional parameters to pass to
plot_forecast_vs_actual
such as title, colors, and line styling.- Returns
fig – Plotly figure of forecast against actuals, with prediction intervals if available.
See
plot_forecast_vs_actual
return value for how to plot the figure and add customization.- Return type
-
get_grouping_evaluation
(score_func=<function add_finite_filter_to_scorer.<locals>.score_func_finite>, score_func_name='MAPE', which='train', groupby_time_feature=None, groupby_sliding_window_size=None, groupby_custom_column=None)[source]¶ Group-wise computation of forecasting error. Can be used to evaluate error or aggregated value by a time feature, over time, or by a user-provided column.
Exactly one of:
groupby_time_feature
,groupby_sliding_window_size
,groupby_custom_column
must be provided.- Parameters
score_func (callable, optional) – Function that maps two arrays to a number. Signature (y_true: array, y_pred: array) -> error: float
score_func_name (str or None, optional) – Name of the score function used to report results. If None, defaults to “metric”.
which (str) – “train” or “test”. Which dataset to evaluate.
groupby_time_feature (str or None, optional) – If provided, groups by a column generated by
build_time_features_df
. See that function for valid values.groupby_sliding_window_size (int or None, optional) – If provided, sequentially partitions data into groups of size
groupby_sliding_window_size
.groupby_custom_column (
pandas.Series
or None, optional) – If provided, groups by this column value. Should be same length as the DataFrame.
- Returns
grouped_df – pandas.DataFrame with two columns:
grouping_func_name: evaluation metric computing forecasting error of timeseries.
group name: group name depends on the grouping method:
groupby_time_feature for groupby_time_feature,
cst.TIME_COL for groupby_sliding_window_size,
groupby_custom_column.name for groupby_custom_column.
- Return type
pandas.DataFrame
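A rough sketch of scoring by sliding window follows, using a hand-written MAPE as the score function with the (y_true, y_pred) signature described above. The helper names are hypothetical. Windows are formed from the start of the series, which is an assumption, and the library may align its windows differently.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    ratios = [abs((actual - predicted) / actual)
              for actual, predicted in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


def score_by_sliding_window(y_true, y_pred, window_size, score_func=mape):
    """Hypothetical helper: one score per consecutive block of ``window_size`` points."""
    scores = []
    for start in range(0, len(y_true), window_size):
        stop = start + window_size
        scores.append(score_func(y_true[start:stop], y_pred[start:stop]))
    return scores


y_true = [100.0, 200.0, 100.0, 200.0]
y_pred = [110.0, 180.0, 100.0, 150.0]
scores = score_by_sliding_window(y_true, y_pred, window_size=2)
```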
-
plot_grouping_evaluation
(score_func=<function add_finite_filter_to_scorer.<locals>.score_func_finite>, score_func_name='MAPE', which='train', groupby_time_feature=None, groupby_sliding_window_size=None, groupby_custom_column=None, xlabel=None, ylabel=None, title=None)[source]¶ Computes error by group and plots the result. Can be used to plot error by a time feature, over time, or by a user-provided column.
Exactly one of:
groupby_time_feature
,groupby_sliding_window_size
,groupby_custom_column
must be provided.- Parameters
score_func (callable, optional) – Function that maps two arrays to a number. Signature (y_true: array, y_pred: array) -> error: float
score_func_name (str or None, optional) – Name of the score function used to report results. If None, defaults to “metric”.
which (str, optional, default “train”) – Which dataset to evaluate, “train” or “test”.
groupby_time_feature (str or None, optional) – If provided, groups by a column generated by
build_time_features_df
. See that function for valid values.
groupby_sliding_window_size (int or None, optional) – If provided, sequentially partitions data into groups of size
groupby_sliding_window_size
.
groupby_custom_column (
pandas.Series
or None, optional) – If provided, groups by this column value. Should be same length as the DataFrame.
xlabel (str, optional, default None) – X-axis label of the plot.
ylabel (str, optional, default None) – Y-axis label of the plot.
title (str or None, optional) – Plot title, if None this function creates a suitable title.
- Returns
fig – plotly graph object showing forecasting error by group. The x-axis label depends on the grouping method:
groupby_time_feature
for groupby_time_feature,
time_col
for groupby_sliding_window_size,
groupby_custom_column.name
for groupby_custom_column.
- Return type
-
autocomplete_map_func_dict
(map_func_dict)[source]¶ Sweeps through
map_func_dict
, converting values that areElementwiseEvaluationMetricEnum
member names to their corresponding row-wise evaluation function with appropriate column names for this UnivariateForecast instance.For example:
map_func_dict = { "squared_error": ElementwiseEvaluationMetricEnum.SquaredError.name, "coverage": ElementwiseEvaluationMetricEnum.Coverage.name, "custom_metric": custom_function } is converted to map_func_dict = { "squared_error": lambda row: ElementwiseEvaluationMetricEnum.SquaredError.get_metric_func()( row[self.actual_col], row[self.predicted_col]), "coverage": lambda row: ElementwiseEvaluationMetricEnum.Coverage.get_metric_func()( row[self.actual_col], row[self.predicted_lower_col], row[self.predicted_upper_col]), "custom_metric": custom_function }
- Parameters
map_func_dict (dict or None) – Same as
flexible_grouping_evaluation
, with one exception: values may be an ElementwiseEvaluationMetricEnum member name. These are converted to a callable for
flexible_grouping_evaluation
.
- Returns
map_func_dict – Can be passed to
flexible_grouping_evaluation
.- Return type
dict
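The name-to-callable expansion can be sketched as follows. `ToyElementwiseMetric` is a hypothetical stand-in for `ElementwiseEvaluationMetricEnum` with only two members, and `autocomplete` is an illustrative function, not the library method. The column names come from the constants listed on this page.

```python
from enum import Enum

ACTUAL_COL, PREDICTED_COL = "actual", "forecast"


class ToyElementwiseMetric(Enum):
    """Hypothetical stand-in for ElementwiseEvaluationMetricEnum."""
    Residual = "residual"
    SquaredError = "squared_error"

    def get_metric_func(self):
        if self is ToyElementwiseMetric.Residual:
            return lambda y_true, y_pred: y_true - y_pred
        return lambda y_true, y_pred: (y_true - y_pred) ** 2


def autocomplete(map_func_dict):
    """Replaces member-name strings with row-wise callables; keeps callables as-is."""
    if map_func_dict is None:
        return None
    expanded = {}
    for col, value in map_func_dict.items():
        if isinstance(value, str) and value in ToyElementwiseMetric.__members__:
            metric_func = ToyElementwiseMetric[value].get_metric_func()
            # Bind metric_func as a default argument to avoid late binding in the loop.
            expanded[col] = lambda row, f=metric_func: f(row[ACTUAL_COL], row[PREDICTED_COL])
        else:
            expanded[col] = value
    return expanded


custom_function = lambda row: abs(row[ACTUAL_COL])
expanded = autocomplete({
    "squared_error": ToyElementwiseMetric.SquaredError.name,
    "custom_metric": custom_function,
})
row = {"actual": 10.0, "forecast": 7.0}
squared_error = expanded["squared_error"](row)
```

The design point is that every value in the returned dictionary has the same row-wise signature, so downstream code does not need to distinguish shorthand names from user-supplied functions.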
-
get_flexible_grouping_evaluation
(which='train', groupby_time_feature=None, groupby_sliding_window_size=None, groupby_custom_column=None, map_func_dict=None, agg_kwargs=None, extend_col_names=False)[source]¶ Group-wise computation of evaluation metrics. Whereas
self.get_grouping_evaluation
computes one metric, this allows computation of any number of custom metrics.For example:
Mean and quantiles of squared error by group.
Mean and quantiles of residuals by group.
Mean and quantiles of actual and forecast by group.
% of actuals outside prediction intervals by group
any combination of the above metrics by the same group
First adds a groupby column by passing
groupby_
parameters to
add_groupby_column
. Then computes grouped evaluation metrics by passing
map_func_dict
,
agg_kwargs
and
extend_col_names
to
flexible_grouping_evaluation
.
Exactly one of:
groupby_time_feature
,
groupby_sliding_window_size
,
groupby_custom_column
must be provided.
- which: str
“train” or “test”. Which dataset to evaluate.
- groupby_time_featurestr or None, optional
If provided, groups by a column generated by
build_time_features_df
. See that function for valid values.- groupby_sliding_window_sizeint or None, optional
If provided, sequentially partitions data into groups of size
groupby_sliding_window_size
.- groupby_custom_column
pandas.Series
or None, optional If provided, groups by this column value. Should be same length as the DataFrame.
- map_func_dictdict [str, callable] or None, default None
Row-wise transformation functions to create new columns. If None, no new columns are added.
key: new column name
- value: row-wise function to apply to
df
to generate the column value. Signature (row:
pandas.DataFrame
) -> transformed value: float.
For example:
map_func_dict = { "residual": lambda row: row["actual"] - row["forecast"], "squared_error": lambda row: (row["actual"] - row["forecast"])**2 }
Some predefined functions are available in
ElementwiseEvaluationMetricEnum
. For example:map_func_dict = { "residual": lambda row: ElementwiseEvaluationMetricEnum.Residual.get_metric_func()( row["actual"], row["forecast"]), "squared_error": lambda row: ElementwiseEvaluationMetricEnum.SquaredError.get_metric_func()( row["actual"], row["forecast"]), "q90_loss": lambda row: ElementwiseEvaluationMetricEnum.Quantile90.get_metric_func()( row["actual"], row["forecast"]), "abs_percent_error": lambda row: ElementwiseEvaluationMetricEnum.AbsolutePercentError.get_metric_func()( row["actual"], row["forecast"]), "coverage": lambda row: ElementwiseEvaluationMetricEnum.Coverage.get_metric_func()( row["actual"], row["forecast_lower"], row["forecast_upper"]), }
As shorthand, it is sufficient to provide the enum member name. These are auto-expanded into the appropriate function. So the following is equivalent:
map_func_dict = { "residual": ElementwiseEvaluationMetricEnum.Residual.name, "squared_error": ElementwiseEvaluationMetricEnum.SquaredError.name, "q90_loss": ElementwiseEvaluationMetricEnum.Quantile90.name, "abs_percent_error": ElementwiseEvaluationMetricEnum.AbsolutePercentError.name, "coverage": ElementwiseEvaluationMetricEnum.Coverage.name, }
- agg_kwargsdict or None, default None
Passed as keyword args to
pandas.core.groupby.DataFrameGroupBy.aggregate
after creating new columns and grouping bygroupby_col
.See
pandas.core.groupby.DataFrameGroupBy.aggregate
orflexible_grouping_evaluation
for details.- extend_col_namesbool or None, default False
How to flatten index after aggregation. In some cases, the column index after aggregation is a multi-index. This parameter controls how to flatten an index with 2 levels to 1 level.
If None, the index is not flattened.
If True, column name is a composite:
{index0}_{index1}
. Use this option if index1 is not unique.
If False, column name is simply
{index1}
.
Ignored if the ColumnIndex after aggregation has only one level (e.g. if named aggregation is used in
agg_kwargs
).
- Returns
df_transformed –
df
after transformation and optional aggregation.
If
groupby_col
is None, returns
df
with additional columns as the keys in
map_func_dict
. Otherwise,
df
is grouped by
groupby_col
and this becomes the index. Columns are determined by
agg_kwargs
and
extend_col_names
.
- Return type
See also
add_groupby_column
called by this function
flexible_grouping_evaluation
called by this function
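The three stages described for this method (row-wise columns from `map_func_dict`, grouping, aggregation with optional name flattening) can be illustrated with a small dependency-free sketch. The `flexible_grouping` function and the `agg_spec` format are hypothetical simplifications. The real method delegates to pandas, so the details differ.

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"dow": 1, "actual": 10.0, "forecast": 8.0},
    {"dow": 1, "actual": 12.0, "forecast": 13.0},
    {"dow": 2, "actual": 20.0, "forecast": 20.0},
]
map_func_dict = {
    "residual": lambda row: row["actual"] - row["forecast"],
    "squared_error": lambda row: (row["actual"] - row["forecast"]) ** 2,
}
# Toy aggregation spec (an assumption): column -> list of (name, function).
agg_spec = {
    "residual": [("mean", mean), ("max", max)],
    "squared_error": [("mean", mean)],
}


def flexible_grouping(rows, groupby_col, map_func_dict, agg_spec, extend_col_names=False):
    # Step 1: add one new column per entry in map_func_dict.
    transformed = [{**row, **{k: f(row) for k, f in map_func_dict.items()}} for row in rows]
    # Step 2: group rows by the groupby column.
    groups = defaultdict(list)
    for row in transformed:
        groups[row[groupby_col]].append(row)
    # Step 3: aggregate, flattening (column, aggregation) pairs into one name.
    result = {}
    for key, members in groups.items():
        out = {}
        for col, funcs in agg_spec.items():
            values = [m[col] for m in members]
            for name, func in funcs:
                flat_name = f"{col}_{name}" if extend_col_names else name
                out[flat_name] = func(values)  # in this toy, duplicate names overwrite
        result[key] = out
    return result


extended = flexible_grouping(rows, "dow", map_func_dict, agg_spec, extend_col_names=True)
short = flexible_grouping(rows, "dow", map_func_dict, agg_spec, extend_col_names=False)
```

With `extend_col_names=False`, both "mean" aggregations map to the same name, which shows why the composite form is recommended when the second index level is not unique.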
-
plot_flexible_grouping_evaluation
(which='train', groupby_time_feature=None, groupby_sliding_window_size=None, groupby_custom_column=None, map_func_dict=None, agg_kwargs=None, extend_col_names=False, y_col_style_dict='auto-fill', default_color='rgba(0, 145, 202, 1.0)', xlabel=None, ylabel=None, title=None, showlegend=True)[source]¶ Plots group-wise evaluation metrics. Whereas
plot_grouping_evaluation
shows one metric, this can show any number of custom metrics.For example:
Mean and quantiles of squared error by group.
Mean and quantiles of residuals by group.
Mean and quantiles of actual and forecast by group.
% of actuals outside prediction intervals by group
any combination of the above metrics by the same group
See
get_flexible_grouping_evaluation
for details.- which: str
“train” or “test”. Which dataset to evaluate.
- groupby_time_featurestr or None, optional
If provided, groups by a column generated by
build_time_features_df
. See that function for valid values.- groupby_sliding_window_sizeint or None, optional
If provided, sequentially partitions data into groups of size
groupby_sliding_window_size
.- groupby_custom_column
pandas.Series
or None, optional If provided, groups by this column value. Should be same length as the DataFrame.
- map_func_dictdict [str, callable] or None, default None
Grouping evaluation metric specification, along with
agg_kwargs
. Seeget_flexible_grouping_evaluation
.- agg_kwargsdict or None, default None
Grouping evaluation metric specification, along with
map_func_dict
. Seeget_flexible_grouping_evaluation
.- extend_col_namesbool or None, default False
How to name the grouping metrics. See
get_flexible_grouping_evaluation
.- y_col_style_dict: dict [str, dict or None] or “plotly” or “auto” or “auto-fill”, default “auto-fill”
The column(s) to plot on the y-axis, and how to style them. The names should match those generated by
agg_kwargs
andextend_col_names
. The functionget_flexible_grouping_evaluation
can be used to check the column names.For convenience, start with “auto-fill” or “plotly”, then adjust styling as needed.
See
plot_multivariate
for details.- default_color: str, default “rgba(0, 145, 202, 1.0)” (blue)
Default line color when
y_col_style_dict
is one of “auto”, “auto-fill”.- xlabelstr or None, default None
x-axis label. If None, default is
x_col
.- ylabelstr or None, default None
y-axis label. If None, y-axis is not labeled.
- titlestr or None, default None
Plot title. If None and
ylabel
is provided, a default title is used.- showlegendbool, default True
Whether to show the legend.
- Returns
fig – Interactive plotly graph showing the evaluation metrics.
See
plot_forecast_vs_actual
return value for how to plot the figure and add customization.- Return type
See also
get_flexible_grouping_evaluation
called by this function
plot_multivariate
called by this function
-
make_univariate_time_series
()[source]¶ Converts prediction into a UnivariateTimeSeries. Useful to convert a forecast into the input regressor for a subsequent forecast.
- Returns
UnivariateTimeSeries
-
plot_components
(**kwargs)[source]¶ Class method to plot the components of a
UnivariateForecast
object.
Silverkite
calculates component plots based on the
fit
dataset.
Prophet
calculates component plots based on the
predict
dataset.
For estimator-specific component plots with advanced plotting options, call
self.estimator.plot_components()
.
- Returns
fig – Figure plotting components against appropriate time scale.
- Return type
plotly.graph_objects.Figure
for Silverkite,
matplotlib.figure.Figure
for Prophet
-
class
greykite.algo.common.model_summary.
ModelSummary
(x, y, pred_cols, pred_category, fit_algorithm, ml_model, max_colwidth=20)[source]¶ A class to store regression model summary statistics.
The class can be printed to get a well formatted model summary.
-
x
¶ The design matrix.
- Type
-
beta
¶ The estimated coefficients.
- Type
-
y
¶ The response.
- Type
-
pred_cols
¶ List of predictor names.
- Type
list [ str ]
-
pred_category
¶ Predictor category, returned by
create_pred_category
.- Type
dict
-
fit_algorithm
¶ The name of algorithm to fit the regression.
- Type
str
-
ml_model
¶ The trained machine learning model class.
- Type
class
-
max_colwidth
¶ The maximum length for predictors to be shown in their original name. If the maximum length of predictors exceeds this parameter, all predictor names will be suppressed and only indices are shown.
- Type
int
-
info_dict
¶ The model summary dictionary, output of
_get_summary
- Type
dict
-
_get_summary
()[source]¶ Gets the model summary from input. This function is called during initialization.
- Returns
info_dict – Includes direct and derived metrics about the trained model. For detailed keys, refer to
get_info_dict_lm
orget_info_dict_tree
.- Return type
dict
-
get_coef_summary
(is_intercept=None, is_time_feature=None, is_event=None, is_trend=None, is_seasonality=None, is_lag=None, is_regressor=None, is_interaction=None, return_df=False)[source]¶ Gets the coefficient summary filtered by conditions.
- Parameters
is_intercept (bool or None, default None) – Intercept or not.
is_time_feature (bool or None, default None) – Time features or not. Time features belong to
TIME_FEATURES
.
is_event (bool or None, default None) – Event features or not. Event features have
EVENT_PREFIX
.
is_trend (bool or None, default None) – Trend features or not. Trend features have
CHANGEPOINT_COL_PREFIX
or “cpd”.
is_seasonality (bool or None, default None) – Seasonality feature or not. Seasonality features match
SEASONALITY_REGEX
.
is_lag (bool or None, default None) – Lagged features or not. Lagged features have “lag”.
is_regressor (bool or None, default None) – Regressor features or not. Regressors are extra features provided by users. They are provided through
extra_pred_cols
in the fit function.
is_interaction (bool or None, default None) – Interaction feature or not. Interaction features have “:”.
return_df (bool, default False) –
- If True, the filtered coefficient summary df is also returned.
Otherwise, the filtered coefficient summary df is printed only.
- Returns
filtered_coef_summary – If
return_df
is set to True, returns the filtered coefficient summary df filtered by the given conditions.- Return type
pandas.DataFrame
or None
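One plausible reading of the bool-or-None flags is that True keeps only predictors in that category, False excludes them, and None applies no filter, with all conditions combined by AND. The sketch below implements that reading. The combination rule is an assumption (the library may treat some categories, such as interactions, differently), and `filter_by_flags`, the predictor names, and the category dictionary are all hypothetical.

```python
# Hypothetical predictor names and categories, for illustration only.
pred_cols = ["Intercept", "ct1", "sin1_tow_weekly", "events_Christmas Day", "y_lag1"]
pred_category = {
    "intercept": {"Intercept"},
    "trend": {"ct1"},
    "seasonality": {"sin1_tow_weekly"},
    "event": {"events_Christmas Day"},
    "lag": {"y_lag1"},
}


def filter_by_flags(pred_cols, pred_category, **flags):
    """Keeps predictors that satisfy every non-None flag (assumed AND semantics).

    Flags are named is_<category>; True requires membership in the category,
    False requires non-membership, None is ignored.
    """
    kept = []
    for col in pred_cols:
        satisfied = True
        for flag, wanted in flags.items():
            if wanted is None:
                continue
            category = flag[len("is_"):]
            if (col in pred_category[category]) != wanted:
                satisfied = False
                break
        if satisfied:
            kept.append(col)
    return kept


only_trend = filter_by_flags(pred_cols, pred_category, is_trend=True)
no_events = filter_by_flags(pred_cols, pred_category, is_event=False, is_lag=None)
```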
-
Constants¶
-
class
greykite.common.evaluation.
EvaluationMetricEnum
(value)[source]¶ Valid evaluation metrics. The values tuple is
(score_func: callable, greater_is_better: boolean, short_name: str)
.
add_finite_filter_to_scorer
is added to the metrics that are directly imported from
sklearn.metrics
(e.g.
mean_squared_error
) to ensure that the metric gets calculated even when inputs have missing values.
-
Correlation
= (<function add_finite_filter_to_scorer.<locals>.score_func_finite>, True, 'CORR')¶ Pearson correlation coefficient between forecast and actuals. Higher is better.
-
CoefficientOfDetermination
= (<function add_finite_filter_to_scorer.<locals>.score_func_finite>, True, 'R2')¶ Coefficient of determination. See
sklearn.metrics.r2_score
. Higher is better. Equals 1.0 - mean_squared_error / variance(actuals).
-
MeanSquaredError
= (<function add_finite_filter_to_scorer.<locals>.score_func_finite>, False, 'MSE')¶ Mean squared error, the average of squared differences, see
sklearn.metrics.mean_squared_error
.
-
RootMeanSquaredError
= (<function add_finite_filter_to_scorer.<locals>.score_func_finite>, False, 'RMSE')¶ Root mean squared error, the square root of
sklearn.metrics.mean_squared_error
-
MeanAbsoluteError
= (<function add_finite_filter_to_scorer.<locals>.score_func_finite>, False, 'MAE')¶ Mean absolute error, average of absolute differences, see
sklearn.metrics.mean_absolute_error
.
-
MedianAbsoluteError
= (<function add_finite_filter_to_scorer.<locals>.score_func_finite>, False, 'MedAE')¶ Median absolute error, median of absolute differences, see
sklearn.metrics.median_absolute_error
.
-
MeanAbsolutePercentError
= (<function add_finite_filter_to_scorer.<locals>.score_func_finite>, False, 'MAPE')¶ Mean absolute percent error, error relative to actuals expressed as a %, see wikipedia MAPE.
-
MedianAbsolutePercentError
= (<function add_finite_filter_to_scorer.<locals>.score_func_finite>, False, 'MedAPE')¶ Median absolute percent error, median of error relative to actuals expressed as a %, a median version of the MeanAbsolutePercentError, less affected by extreme values.
-
SymmetricMeanAbsolutePercentError
= (<function add_finite_filter_to_scorer.<locals>.score_func_finite>, False, 'sMAPE')¶ Symmetric mean absolute percent error, error relative to (actuals+forecast) expressed as a %. Note that we do not include a factor of 2 in the denominator, so the range is 0% to 100%, see wikipedia sMAPE.
-
Quantile80
= (<function quantile_loss_q.<locals>.quantile_loss_wrapper>, False, 'Q80')¶ Quantile loss with q=0.80:
np.where(y_true < y_pred, (1 - q) * (y_pred - y_true), q * (y_true - y_pred)).mean()
-
Quantile95
= (<function quantile_loss_q.<locals>.quantile_loss_wrapper>, False, 'Q95')¶ Quantile loss with q=0.95:
np.where(y_true < y_pred, (1 - q) * (y_pred - y_true), q * (y_true - y_pred)).mean()
-
Quantile99
= (<function quantile_loss_q.<locals>.quantile_loss_wrapper>, False, 'Q99')¶ Quantile loss with q=0.99:
np.where(y_true < y_pred, (1 - q) * (y_pred - y_true), q * (y_true - y_pred)).mean()
-
FractionOutsideTolerance1
= (functools.partial(<function fraction_outside_tolerance>, rtol=0.01), False, 'OutsideTolerance1p')¶ Fraction of forecasted values that deviate more than 1% from the actual
-
FractionOutsideTolerance2
= (functools.partial(<function fraction_outside_tolerance>, rtol=0.02), False, 'OutsideTolerance2p')¶ Fraction of forecasted values that deviate more than 2% from the actual
-
FractionOutsideTolerance3
= (functools.partial(<function fraction_outside_tolerance>, rtol=0.03), False, 'OutsideTolerance3p')¶ Fraction of forecasted values that deviate more than 3% from the actual
-
FractionOutsideTolerance4
= (functools.partial(<function fraction_outside_tolerance>, rtol=0.04), False, 'OutsideTolerance4p')¶ Fraction of forecasted values that deviate more than 4% from the actual
-
FractionOutsideTolerance5
= (functools.partial(<function fraction_outside_tolerance>, rtol=0.05), False, 'OutsideTolerance5p')¶ Fraction of forecasted values that deviate more than 5% from the actual
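As a worked illustration of two of the metric families above, the following dependency-free sketch mirrors the quantile loss expression shown for `Quantile80`, `Quantile95` and `Quantile99`, and gives one plausible definition of the fraction outside tolerance. The second function is an assumption: it measures deviation relative to the actual value, which matches the descriptions above but is not confirmed against the library source.

```python
def quantile_loss(y_true, y_pred, q):
    """Pure-Python version of the np.where(...).mean() expression shown above."""
    losses = [
        (1 - q) * (p - t) if t < p else q * (t - p)
        for t, p in zip(y_true, y_pred)
    ]
    return sum(losses) / len(losses)


def fraction_outside_tolerance(y_true, y_pred, rtol):
    """Fraction of forecasts deviating more than rtol from the actual.

    Assumption: deviation is relative to the actual, i.e. |pred - true| > rtol * |true|.
    """
    outside = [abs(p - t) > rtol * abs(t) for t, p in zip(y_true, y_pred)]
    return sum(outside) / len(outside)


loss_q80 = quantile_loss([10.0, 20.0], [12.0, 17.0], q=0.80)
under = quantile_loss([10.0], [8.0], q=0.95)  # forecast below actual
over = quantile_loss([10.0], [12.0], q=0.95)  # forecast above actual
frac = fraction_outside_tolerance([100.0] * 4, [100.5, 103.0, 99.0, 90.0], rtol=0.02)
```

For a high quantile such as q=0.95, under-forecasting costs far more than over-forecasting by the same amount, which is why these losses are suited to evaluating upper prediction bounds.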
-
Constants used by code in common
or in multiple places:
algo
, sklearn
,
and/or framework
.
-
greykite.common.constants.
TIME_COL
= 'ts'¶ The default name for the column with the timestamps of the time series
-
greykite.common.constants.
VALUE_COL
= 'y'¶ The default name for the column with the values of the time series
-
greykite.common.constants.
ACTUAL_COL
= 'actual'¶ The column name representing actual (observed) values
-
greykite.common.constants.
PREDICTED_COL
= 'forecast'¶ The column name representing the predicted values
-
greykite.common.constants.
PREDICTED_LOWER_COL
= 'forecast_lower'¶ The column name representing lower bounds of prediction interval
-
greykite.common.constants.
PREDICTED_UPPER_COL
= 'forecast_upper'¶ The column name representing upper bounds of prediction interval
-
greykite.common.constants.
NULL_PREDICTED_COL
= 'forecast_null'¶ The column name representing predicted values from null model
-
greykite.common.constants.
ERR_STD_COL
= 'err_std'¶ The column name representing the error standard deviation from models
-
greykite.common.constants.
R2_null_model_score
= 'R2_null_model_score'¶ Evaluation metric. Improvement in the specified loss function compared to the predictions of a null model.
-
greykite.common.constants.
FRACTION_OUTSIDE_TOLERANCE
= 'Outside Tolerance (fraction)'¶ Evaluation metric. The fraction of predictions outside the specified tolerance level
-
greykite.common.constants.
PREDICTION_BAND_WIDTH
= 'Prediction Band Width (%)'¶ Evaluation metric. Relative size of prediction bands vs actual, as a percent
-
greykite.common.constants.
PREDICTION_BAND_COVERAGE
= 'Prediction Band Coverage (fraction)'¶ Evaluation metric. Fraction of observations within the bands
-
greykite.common.constants.
LOWER_BAND_COVERAGE
= 'Coverage: Lower Band'¶ Evaluation metric. Fraction of observations within the lower band
-
greykite.common.constants.
UPPER_BAND_COVERAGE
= 'Coverage: Upper Band'¶ Evaluation metric. Fraction of observations within the upper band
-
greykite.common.constants.
COVERAGE_VS_INTENDED_DIFF
= 'Coverage Diff: Actual_Coverage - Intended_Coverage'¶ Evaluation metric. Difference between actual and intended coverage
-
greykite.common.constants.
EVENT_DF_DATE_COL
= 'date'¶ Name of date column for the DataFrames passed to silverkite custom_daily_event_df_dict
-
greykite.common.constants.
EVENT_DF_LABEL_COL
= 'event_name'¶ Name of event column for the DataFrames passed to silverkite custom_daily_event_df_dict
-
greykite.common.constants.
EVENT_PREFIX
= 'events'¶ Prefix for naming event features.
-
greykite.common.constants.
EVENT_DEFAULT
= ''¶ Label used for days without an event.
-
greykite.common.constants.
EVENT_INDICATOR
= 'event'¶ Binary indicator for an event
-
greykite.common.constants.
CHANGEPOINT_COL_PREFIX
= 'changepoint'¶ Prefix for naming changepoint features.
-
greykite.common.constants.
CHANGEPOINT_COL_PREFIX_SHORT
= 'cp'¶ Short prefix for naming changepoint features.
-
greykite.common.constants.
START_DATE_COL
= 'start_date'¶ Start timestamp column name
-
greykite.common.constants.
END_DATE_COL
= 'end_date'¶ Standard end timestamp column
-
greykite.common.constants.
ADJUSTMENT_DELTA_COL
= 'adjustment_delta'¶ Adjustment column
-
greykite.common.constants.
METRIC_COL
= 'metric'¶ Column to denote metric of interest
-
greykite.common.constants.
DIMENSION_COL
= 'dimension'¶ Dimension column
-
greykite.common.constants.
GROWTH_COL_ALIAS
= {'cuberoot': 'ct_root3', 'cubic': 'ct3', 'linear': 'ct1', 'quadratic': 'ct2', 'sqrt': 'ct_sqrt'}¶ Human-readable names for the growth columns generated by
build_time_features_df
-
greykite.common.constants.
TIME_FEATURES
= ['datetime', 'date', 'year', 'year_length', 'quarter', 'quarter_start', 'quarter_length', 'month', 'month_length', 'woy', 'doy', 'doq', 'dom', 'dow', 'str_dow', 'str_doy', 'hour', 'minute', 'second', 'year_month', 'year_woy', 'month_dom', 'year_woy_dow', 'woy_dow', 'dow_hr', 'dow_hr_min', 'tod', 'tow', 'tom', 'toq', 'toy', 'conti_year', 'is_weekend', 'dow_grouped', 'ct1', 'ct2', 'ct3', 'ct_sqrt', 'ct_root3']¶ Time features generated by
build_time_features_df
-
greykite.common.constants.
LAG_INFIX
= '_lag'¶ Infix for lagged feature names
-
greykite.common.constants.
AGG_LAG_INFIX
= 'avglag'¶ Infix for aggregated lag feature names
-
greykite.common.constants.
TREND_REGEX
= 'changepoint\\d|ct\\d|ct_|cp\\d'¶ Growth terms, including changepoints.
-
greykite.common.constants.
SEASONALITY_REGEX
= 'sin\\d|cos\\d'¶ Seasonality terms modeled by fourier series.
-
greykite.common.constants.
EVENT_REGEX
= 'events_'¶ Event terms.
-
greykite.common.constants.
LAG_REGEX
= '_lag\\d|_avglag_\\d'¶ Lag terms.
-
greykite.common.constants.
LOGGER_NAME
= 'Greykite'¶ Name used by the logger.
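The regex constants above can be used to label model terms by type. The sketch below copies the four pattern strings from this page and applies them with `re.search`. The predictor names are hypothetical, and the use of `re.search` (rather than, say, `re.match`) is an assumption about how the patterns are meant to be applied.

```python
import re

# Pattern strings as listed above.
TREND_REGEX = r"changepoint\d|ct\d|ct_|cp\d"
SEASONALITY_REGEX = r"sin\d|cos\d"
EVENT_REGEX = r"events_"
LAG_REGEX = r"_lag\d|_avglag_\d"

PATTERNS = {
    "trend": TREND_REGEX,
    "seasonality": SEASONALITY_REGEX,
    "event": EVENT_REGEX,
    "lag": LAG_REGEX,
}


def term_groups(name):
    """Returns the list of groups whose pattern occurs anywhere in the term name."""
    return [group for group, pattern in PATTERNS.items() if re.search(pattern, name)]


# Hypothetical predictor names, for illustration only.
labels = {
    name: term_groups(name)
    for name in ["ct1", "sin1_tow_weekly", "events_Christmas Day", "y_lag7", "ct1:sin1_tow_weekly"]
}
```

An interaction term such as the last one can belong to more than one group, so the function returns a list rather than a single label.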
Constants used by greykite.framework.
-
greykite.framework.constants.
EVALUATION_PERIOD_CV_MAX_SPLITS
= 3¶ Default value for EvaluationPeriodParam().cv_max_splits
-
greykite.framework.constants.
COMPUTATION_N_JOBS
= 1¶ Default value for ComputationParam.n_jobs
-
greykite.framework.constants.
COMPUTATION_VERBOSE
= 1¶ Default value for ComputationParam.verbose
-
greykite.framework.constants.
CV_REPORT_METRICS_ALL
= 'ALL'¶ Set cv_report_metrics to this value to compute all metrics during CV
-
greykite.framework.constants.
FRACTION_OUTSIDE_TOLERANCE_NAME
= 'OutsideTolerance'¶ Short name used to report the result of FRACTION_OUTSIDE_TOLERANCE in CV
-
greykite.framework.constants.
CUSTOM_SCORE_FUNC_NAME
= 'score'¶ Short name used to report the result of custom score_func in CV
-
greykite.framework.constants.
MEAN_COL_GROUP
= 'mean'¶ Columns with mean.
-
greykite.framework.constants.
QUANTILE_COL_GROUP
= 'quantile'¶ Columns with quantile.
-
greykite.framework.constants.
OVERLAY_COL_GROUP
= 'overlay'¶ Columns with overlay.
-
greykite.framework.constants.
FORECAST_STEP_COL
= 'forecast_step'¶ The column name for forecast step in benchmarking
-
class
greykite.algo.forecast.silverkite.constants.silverkite_constant.
SilverkiteConstant
[source]¶ Uses the appropriate constant mixins to provide all the constants that will be used by Silverkite.
-
get_silverkite_column
() → Type[greykite.algo.forecast.silverkite.constants.silverkite_column.SilverkiteColumn]¶ Return the SilverkiteColumn constants
-
get_silverkite_components_enum
() → Type[greykite.algo.forecast.silverkite.constants.silverkite_component.SilverkiteComponentsEnum]¶ Return the SilverkiteComponentsEnum constants
-
get_silverkite_holiday
() → Type[greykite.algo.forecast.silverkite.constants.silverkite_holiday.SilverkiteHoliday]¶ Return the SilverkiteHoliday constants
-
get_silverkite_seasonality_enum
() → Type[greykite.algo.forecast.silverkite.constants.silverkite_seasonality.SilverkiteSeasonalityEnum]¶ Return the SilverkiteSeasonalityEnum constants
-
get_silverkite_time_frequency_enum
() → Type[greykite.algo.forecast.silverkite.constants.silverkite_time_frequency.SilverkiteTimeFrequencyEnum]¶ Return the SilverkiteTimeFrequencyEnum constants
-
-
class
greykite.algo.forecast.silverkite.constants.silverkite_column.
SilverkiteColumn
[source]¶ Silverkite feature sets for sub-daily data.
-
COLS_HOUR_OF_WEEK
: str = 'hour_of_week'¶ Silverkite feature_sets_enabled key. constant hour of week effect
-
COLS_WEEKEND_SEAS
: str = 'is_weekend:daily_seas'¶ Silverkite feature_sets_enabled key. daily seasonality interaction with is_weekend
-
COLS_DAY_OF_WEEK_SEAS
: str = 'day_of_week:daily_seas'¶ Silverkite feature_sets_enabled key. daily seasonality interaction with day of week
-
COLS_TREND_DAILY_SEAS
: str = 'trend:is_weekend:daily_seas'¶ Silverkite feature_sets_enabled key. allow daily seasonality to change over time, depending on is_weekend
-
COLS_EVENT_SEAS
: str = 'event:daily_seas'¶ Silverkite feature_sets_enabled key. allow sub-daily event effects
-
COLS_EVENT_WEEKEND_SEAS
: str = 'event:is_weekend:daily_seas'¶ Silverkite feature_sets_enabled key. allow sub-daily event effect to interact with is_weekend
-
COLS_DAY_OF_WEEK
: str = 'day_of_week'¶ Silverkite feature_sets_enabled key. constant day of week effect
-
COLS_TREND_WEEKEND
: str = 'trend:is_weekend'¶ Silverkite feature_sets_enabled key. allow trend (growth, changepoints) to interact with is_weekend
-
-
class
greykite.algo.forecast.silverkite.constants.silverkite_component.
SilverkiteComponentsEnum
(value)[source]¶ Defines groupby time feature, xlabel and ylabel for Silverkite Component Plots.
-
class
greykite.algo.forecast.silverkite.constants.silverkite_holiday.
SilverkiteHoliday
[source]¶ Holiday constants to be used by Silverkite
-
HOLIDAY_LOOKUP_COUNTRIES_AUTO
= ('UnitedStates', 'UnitedKingdom', 'India', 'France', 'China')¶ Auto setting for the countries that contain the holidays to include in the model
-
HOLIDAYS_TO_MODEL_SEPARATELY_AUTO
= ("New Year's Day", 'Chinese New Year', 'Christmas Day', 'Independence Day', 'Thanksgiving', 'Labor Day', 'Good Friday', 'Easter Monday [England, Wales, Northern Ireland]', 'Memorial Day', 'Veterans Day')¶ Auto setting for the holidays to include in the model
-
ALL_HOLIDAYS_IN_COUNTRIES
= 'ALL_HOLIDAYS_IN_COUNTRIES'¶ Value for holidays_to_model_separately to request all holidays in the lookup countries
-
HOLIDAYS_TO_INTERACT
= ('Christmas Day', 'Christmas Day_minus_1', 'Christmas Day_minus_2', 'Christmas Day_plus_1', 'Christmas Day_plus_2', 'New Years Day', 'New Years Day_minus_1', 'New Years Day_minus_2', 'New Years Day_plus_1', 'New Years Day_plus_2', 'Thanksgiving', 'Thanksgiving_plus_1', 'Independence Day')¶ Significant holidays that may have a different daily seasonality pattern
-
-
class
greykite.algo.forecast.silverkite.constants.silverkite_seasonality.
SilverkiteSeasonalityEnum
(value)[source]¶ Defines default seasonalities for Silverkite estimator. Names should match those in SeasonalityEnum. The default order for various seasonalities is stored in this enum.
-
DAILY_SEASONALITY
: greykite.algo.forecast.silverkite.constants.silverkite_seasonality.SilverkiteSeasonality = SilverkiteSeasonality(name='tod', period=24.0, order=12, seas_names='daily', default_min_days=2)¶ tod
is 0-24 time of day (tod granularity based on input data, up to second level). Requires at least two full cycles to add the seasonal term (default_min_days=2
).
-
WEEKLY_SEASONALITY
: greykite.algo.forecast.silverkite.constants.silverkite_seasonality.SilverkiteSeasonality = SilverkiteSeasonality(name='tow', period=7.0, order=4, seas_names='weekly', default_min_days=14)¶ tow
is 0-7 time of week (tow granularity based on input data, up to second level).order=4
for full flexibility to model daily input.
-
MONTHLY_SEASONALITY
: greykite.algo.forecast.silverkite.constants.silverkite_seasonality.SilverkiteSeasonality = SilverkiteSeasonality(name='tom', period=1.0, order=2, seas_names='monthly', default_min_days=60)¶ tom
is 0-1 time of month (tom granularity based on input data, up to daily level).
-
QUARTERLY_SEASONALITY
: greykite.algo.forecast.silverkite.constants.silverkite_seasonality.SilverkiteSeasonality = SilverkiteSeasonality(name='toq', period=1.0, order=5, seas_names='quarterly', default_min_days=180)¶ toq
(continuous time of quarter) with natural period. Each day is mapped to a value in [0.0, 1.0) based on its position in the calendar quarter: (Jan1-Mar31, Apr1-Jun30, Jul1-Sep30, Oct1-Dec31). The start of each quarter is 0.0.
-
YEARLY_SEASONALITY
: greykite.algo.forecast.silverkite.constants.silverkite_seasonality.SilverkiteSeasonality = SilverkiteSeasonality(name='ct1', period=1.0, order=15, seas_names='yearly', default_min_days=548)¶ ct1
(continuous year) with natural period.
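To make `period` and `order` concrete, the sketch below generates Fourier series terms for a single time value. It uses `period=7.0` and `order=4` from `WEEKLY_SEASONALITY` above. The function name and the `sin{k}`/`cos{k}` naming are assumptions, chosen only so the names match `SEASONALITY_REGEX`. The library's actual column names may carry extra suffixes.

```python
import math


def fourier_terms(t, period, order):
    """Returns {name: value} with one sin and one cos term per harmonic k = 1..order."""
    terms = {}
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        terms[f"sin{k}"] = math.sin(angle)
        terms[f"cos{k}"] = math.cos(angle)
    return terms


# tow = 3.5 is the middle of the week on the 0-7 time-of-week scale.
weekly = fourier_terms(t=3.5, period=7.0, order=4)
```

An order of 4 yields 8 regressors. A higher order lets the fitted curve follow sharper within-period patterns at the cost of more coefficients, which is consistent with the larger default orders listed for daily and yearly seasonality.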
-
-
class
greykite.algo.forecast.silverkite.constants.silverkite_time_frequency.
SilverkiteTimeFrequencyEnum
(value)[source]¶ Provides properties for modeling for various time frequencies in Silverkite. The enum name is the time frequency, corresponding to the simple time frequencies in
SimpleTimeFrequencyEnum
.
Provides templates for SimpleSilverkiteEstimator that are pre-tuned to fit specific use cases.
A subset of these templates are recognized by ModelTemplateEnum.
simple_silverkite_template
also accepts any model_template
name that follows
the naming convention in this file. For details, see
the model_template
parameter in
SimpleSilverkiteTemplate
.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_FREQ
(value)[source]¶ Valid values for simple silverkite template string name frequency.
-
greykite.framework.templates.simple_silverkite_template_config.
VALID_FREQ
= ['HOURLY', 'DAILY', 'WEEKLY']¶ Valid non-default values for simple silverkite template string name frequency. These are the non-default frequencies recognized by
SimpleSilverkiteTemplateOptions
.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_SEAS
(value)[source]¶ Valid values for simple silverkite template string name seasonality.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_GR
(value)[source]¶ Valid values for simple silverkite template string name growth_term.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_CP
(value)[source]¶ Valid values for simple silverkite template string name changepoints_dict.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_HOL
(value)[source]¶ Valid values for simple silverkite template string name events.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_FEASET
(value)[source]¶ Valid values for simple silverkite template string name feature_sets_enabled.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_ALGO
(value)[source]¶ Valid values for simple silverkite template string name fit_algorithm.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_AR
(value)[source]¶ Valid values for simple silverkite template string name autoregression.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_DSI
(value)[source]¶ Valid values for simple silverkite template string name daily seasonality max interaction order.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_WSI
(value)[source]¶ Valid values for simple silverkite template string name weekly seasonality max interaction order.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_COMPONENT_KEYWORDS
(value)[source]¶ Valid values for simple silverkite template string name keywords. The names are the keywords and the values are the corresponding value enum. Can be used to create an instance of
SimpleSilverkiteTemplateOptions
.
-
class
greykite.framework.templates.simple_silverkite_template_config.
SimpleSilverkiteTemplateOptions
(freq: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FREQ = <SILVERKITE_FREQ.DAILY: 'DAILY'>, seas: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_SEAS = <SILVERKITE_SEAS.LT: 'LT'>, gr: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_GR = <SILVERKITE_GR.LINEAR: 'LINEAR'>, cp: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_CP = <SILVERKITE_CP.NONE: 'NONE'>, hol: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_HOL = <SILVERKITE_HOL.NONE: 'NONE'>, feaset: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FEASET = <SILVERKITE_FEASET.OFF: 'OFF'>, algo: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_ALGO = <SILVERKITE_ALGO.LINEAR: 'LINEAR'>, ar: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_AR = <SILVERKITE_AR.OFF: 'OFF'>, dsi: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_DSI = <SILVERKITE_DSI.AUTO: 'AUTO'>, wsi: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_WSI = <SILVERKITE_WSI.AUTO: 'AUTO'>)[source]¶ Defines generic simple silverkite template options.
Attributes can be set to different values using
SILVERKITE_COMPONENT_KEYWORDS
for high level tuning.
freq
represents data frequency. The other attributes stand for seasonality, growth, changepoints_dict, events, feature_sets_enabled, fit_algorithm and autoregression in
ModelComponentsParam
, which are used in
SimpleSilverkiteTemplate
.
-
freq
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FREQ = 'DAILY'¶ Valid values for simple silverkite template string name frequency. See
SILVERKITE_FREQ
.
-
seas
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_SEAS = 'LT'¶ Valid values for simple silverkite template string name seasonality. See
SILVERKITE_SEAS
.
-
gr
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_GR = 'LINEAR'¶ Valid values for simple silverkite template string name growth. See
SILVERKITE_GR
.
-
cp
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_CP = 'NONE'¶ Valid values for simple silverkite template string name changepoints. See
SILVERKITE_CP
.
-
hol
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_HOL = 'NONE'¶ Valid values for simple silverkite template string name holiday. See
SILVERKITE_HOL
.
-
feaset
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FEASET = 'OFF'¶ Valid values for simple silverkite template string name feature sets enabled. See
SILVERKITE_FEASET
.
-
algo
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_ALGO = 'LINEAR'¶ Valid values for simple silverkite template string name fit algorithm. See
SILVERKITE_ALGO
.
-
ar
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_AR = 'OFF'¶ Valid values for simple silverkite template string name autoregression. See
SILVERKITE_AR
.
-
dsi
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_DSI = 'AUTO'¶ Valid values for simple silverkite template string name max daily seasonality interaction order. See
SILVERKITE_DSI
.
-
wsi
: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_WSI = 'AUTO'¶ Valid values for simple silverkite template string name max weekly seasonality interaction order. See
SILVERKITE_WSI
.
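The ten fields above line up with the segments of the template name strings that appear elsewhere in this module (for example in MULTI_TEMPLATES). The sketch below illustrates that correspondence with a plain dataclass. `OptionsSketch` and `to_template_name` are hypothetical stand-ins written for this example, not part of greykite; only the field names, defaults, and string layout are taken from this page.

```python
from dataclasses import dataclass, fields


@dataclass
class OptionsSketch:
    """Hypothetical stand-in that mirrors the documented fields and defaults."""
    freq: str = "DAILY"
    seas: str = "LT"
    gr: str = "LINEAR"
    cp: str = "NONE"
    hol: str = "NONE"
    feaset: str = "OFF"
    algo: str = "LINEAR"
    ar: str = "OFF"
    dsi: str = "AUTO"
    wsi: str = "AUTO"


def to_template_name(options):
    """Joins the frequency and each KEYWORD_VALUE pair with underscores."""
    parts = [options.freq]
    for f in fields(options):
        if f.name == "freq":
            continue  # the frequency leads the name without a keyword
        parts += [f.name.upper(), getattr(options, f.name)]
    return "_".join(parts)


default_name = to_template_name(OptionsSketch())
tuned_name = to_template_name(
    OptionsSketch(seas="LTQM", cp="LT", hol="SP2", feaset="AUTO"))
```

With these inputs, `tuned_name` reproduces the first entry listed under SILVERKITE_DAILY_90 in MULTI_TEMPLATES.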
-
-
greykite.framework.templates.simple_silverkite_template_config.
COMMON_MODELCOMPONENTPARAM_PARAMETERS
= {'ALGO': {'LASSO': {'fit_algorithm': 'lasso', 'fit_algorithm_params': None}, 'LINEAR': {'fit_algorithm': 'linear', 'fit_algorithm_params': None}, 'RIDGE': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'SGD': {'fit_algorithm': 'sgd', 'fit_algorithm_params': None}}, 'AR': {'AUTO': {'autoreg_dict': 'auto', 'simulation_num': 10}, 'OFF': {'autoreg_dict': None, 'simulation_num': 10}}, 'CP': {'DAILY': {'HV': {'method': 'auto', 'no_changepoint_distance_from_end': '180D', 'potential_changepoint_distance': '15D', 'regularization_strength': 0.3, 'resample_freq': '7D', 'yearly_seasonality_change_freq': '365D', 'yearly_seasonality_order': 15}, 'LT': {'method': 'auto', 'no_changepoint_distance_from_end': '90D', 'potential_changepoint_distance': '15D', 'regularization_strength': 0.6, 'resample_freq': '7D', 'yearly_seasonality_change_freq': None, 'yearly_seasonality_order': 15}, 'NM': {'method': 'auto', 'no_changepoint_distance_from_end': '180D', 'potential_changepoint_distance': '15D', 'regularization_strength': 0.5, 'resample_freq': '7D', 'yearly_seasonality_change_freq': '365D', 'yearly_seasonality_order': 15}, 'NONE': None}, 'HOURLY': {'HV': {'method': 'auto', 'no_changepoint_distance_from_end': '30D', 'potential_changepoint_distance': '15D', 'regularization_strength': 0.3, 'resample_freq': 'D', 'yearly_seasonality_change_freq': '365D', 'yearly_seasonality_order': 15}, 'LT': {'method': 'auto', 'no_changepoint_distance_from_end': '30D', 'potential_changepoint_distance': '7D', 'regularization_strength': 0.6, 'resample_freq': 'D', 'yearly_seasonality_change_freq': None, 'yearly_seasonality_order': 15}, 'NM': {'method': 'auto', 'no_changepoint_distance_from_end': '30D', 'potential_changepoint_distance': '15D', 'regularization_strength': 0.5, 'resample_freq': 'D', 'yearly_seasonality_change_freq': '365D', 'yearly_seasonality_order': 15}, 'NONE': None}, 'WEEKLY': {'HV': {'method': 'auto', 'no_changepoint_distance_from_end': '180D', 'potential_changepoint_distance': 
'14D', 'regularization_strength': 0.3, 'resample_freq': '7D', 'yearly_seasonality_change_freq': '365D', 'yearly_seasonality_order': 15}, 'LT': {'method': 'auto', 'no_changepoint_distance_from_end': '180D', 'potential_changepoint_distance': '14D', 'regularization_strength': 0.6, 'resample_freq': '7D', 'yearly_seasonality_change_freq': None, 'yearly_seasonality_order': 15}, 'NM': {'method': 'auto', 'no_changepoint_distance_from_end': '180D', 'potential_changepoint_distance': '14D', 'regularization_strength': 0.5, 'resample_freq': '7D', 'yearly_seasonality_change_freq': '365D', 'yearly_seasonality_order': 15}, 'NONE': None}}, 'DSI': {'DAILY': {'AUTO': 0, 'OFF': 0}, 'HOURLY': {'AUTO': 5, 'OFF': 0}, 'WEEKLY': {'AUTO': 0, 'OFF': 0}}, 'FEASET': {'AUTO': 'auto', 'OFF': False, 'ON': True}, 'GR': {'LINEAR': {'growth_term': 'linear'}, 'NONE': {'growth_term': None}}, 'HOL': {'NONE': {'daily_event_df_dict': None, 'holiday_lookup_countries': [], 'holiday_post_num_days': 0, 'holiday_pre_num_days': 0, 'holiday_pre_post_num_dict': None, 'holidays_to_model_separately': []}, 'SP1': {'daily_event_df_dict': None, 'holiday_lookup_countries': 'auto', 'holiday_post_num_days': 1, 'holiday_pre_num_days': 1, 'holiday_pre_post_num_dict': None, 'holidays_to_model_separately': 'auto'}, 'SP2': {'daily_event_df_dict': None, 'holiday_lookup_countries': 'auto', 'holiday_post_num_days': 2, 'holiday_pre_num_days': 2, 'holiday_pre_post_num_dict': None, 'holidays_to_model_separately': 'auto'}, 'SP4': {'daily_event_df_dict': None, 'holiday_lookup_countries': 'auto', 'holiday_post_num_days': 4, 'holiday_pre_num_days': 4, 'holiday_pre_post_num_dict': None, 'holidays_to_model_separately': 'auto'}, 'TG': {'daily_event_df_dict': None, 'holiday_lookup_countries': 'auto', 'holiday_post_num_days': 3, 'holiday_pre_num_days': 3, 'holiday_pre_post_num_dict': None, 'holidays_to_model_separately': []}}, 'SEAS': {'DAILY': {'HV': {'daily_seasonality': 0, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 
'weekly_seasonality': 4, 'yearly_seasonality': 25}, 'HVQM': {'daily_seasonality': 0, 'monthly_seasonality': 4, 'quarterly_seasonality': 6, 'weekly_seasonality': 4, 'yearly_seasonality': 25}, 'LT': {'daily_seasonality': 0, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 3, 'yearly_seasonality': 8}, 'LTQM': {'daily_seasonality': 0, 'monthly_seasonality': 2, 'quarterly_seasonality': 3, 'weekly_seasonality': 3, 'yearly_seasonality': 8}, 'NM': {'daily_seasonality': 0, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 3, 'yearly_seasonality': 15}, 'NMQM': {'daily_seasonality': 0, 'monthly_seasonality': 4, 'quarterly_seasonality': 4, 'weekly_seasonality': 3, 'yearly_seasonality': 15}, 'NONE': {'daily_seasonality': 0, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 0, 'yearly_seasonality': 0}}, 'HOURLY': {'HV': {'daily_seasonality': 12, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 6, 'yearly_seasonality': 25}, 'HVQM': {'daily_seasonality': 12, 'monthly_seasonality': 4, 'quarterly_seasonality': 4, 'weekly_seasonality': 6, 'yearly_seasonality': 25}, 'LT': {'daily_seasonality': 5, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 3, 'yearly_seasonality': 8}, 'LTQM': {'daily_seasonality': 5, 'monthly_seasonality': 2, 'quarterly_seasonality': 2, 'weekly_seasonality': 3, 'yearly_seasonality': 8}, 'NM': {'daily_seasonality': 8, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 4, 'yearly_seasonality': 15}, 'NMQM': {'daily_seasonality': 8, 'monthly_seasonality': 3, 'quarterly_seasonality': 3, 'weekly_seasonality': 4, 'yearly_seasonality': 15}, 'NONE': {'daily_seasonality': 0, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 0, 'yearly_seasonality': 0}}, 'WEEKLY': {'HV': {'daily_seasonality': 0, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 0, 
'yearly_seasonality': 25}, 'HVQM': {'daily_seasonality': 0, 'monthly_seasonality': 4, 'quarterly_seasonality': 4, 'weekly_seasonality': 0, 'yearly_seasonality': 25}, 'LT': {'daily_seasonality': 0, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 0, 'yearly_seasonality': 8}, 'LTQM': {'daily_seasonality': 0, 'monthly_seasonality': 2, 'quarterly_seasonality': 2, 'weekly_seasonality': 0, 'yearly_seasonality': 8}, 'NM': {'daily_seasonality': 0, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 0, 'yearly_seasonality': 15}, 'NMQM': {'daily_seasonality': 0, 'monthly_seasonality': 3, 'quarterly_seasonality': 3, 'weekly_seasonality': 0, 'yearly_seasonality': 15}, 'NONE': {'daily_seasonality': 0, 'monthly_seasonality': 0, 'quarterly_seasonality': 0, 'weekly_seasonality': 0, 'yearly_seasonality': 0}}}, 'WSI': {'DAILY': {'AUTO': 2, 'OFF': 0}, 'HOURLY': {'AUTO': 2, 'OFF': 0}, 'WEEKLY': {'AUTO': 0, 'OFF': 0}}}¶ Defines the default component values for
SimpleSilverkiteTemplate
. The components include seasonality, growth, holiday, trend changepoints, feature sets, autoregression, fit algorithm, etc. These are used when config.model_template provides the
SimpleSilverkiteTemplateOptions
.
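Some keywords in this dictionary (CP, DSI, SEAS, WSI) are nested by frequency, while the others (ALGO, AR, FEASET, GR, HOL) map option values directly. A minimal sketch of a lookup that handles both shapes follows. `PARAMS_EXCERPT` copies a few entries from the value shown above; the `lookup` helper is an assumption made for illustration and is not a greykite function.

```python
# A few entries copied from COMMON_MODELCOMPONENTPARAM_PARAMETERS above.
# Most entries are omitted to keep the example short.
PARAMS_EXCERPT = {
    "ALGO": {
        "RIDGE": {"fit_algorithm": "ridge", "fit_algorithm_params": None},
    },
    "GR": {
        "LINEAR": {"growth_term": "linear"},
    },
    "DSI": {
        "DAILY": {"AUTO": 0, "OFF": 0},
        "HOURLY": {"AUTO": 5, "OFF": 0},
    },
    "SEAS": {
        "DAILY": {
            "LT": {"daily_seasonality": 0, "monthly_seasonality": 0,
                   "quarterly_seasonality": 0, "weekly_seasonality": 3,
                   "yearly_seasonality": 8},
        },
        "HOURLY": {
            "LT": {"daily_seasonality": 5, "monthly_seasonality": 0,
                   "quarterly_seasonality": 0, "weekly_seasonality": 3,
                   "yearly_seasonality": 8},
        },
    },
}

# Keywords whose values depend on the data frequency in the dictionary above.
FREQ_KEYED = {"CP", "DSI", "SEAS", "WSI"}


def lookup(keyword, value, freq):
    """Hypothetical helper: resolves one option value to its component values."""
    table = PARAMS_EXCERPT[keyword]
    if keyword in FREQ_KEYED:
        table = table[freq]
    return table[value]


daily_lt = lookup("SEAS", "LT", freq="DAILY")
hourly_lt = lookup("SEAS", "LT", freq="HOURLY")
ridge = lookup("ALGO", "RIDGE", freq="DAILY")
```

The same option value ("LT") resolves to a different daily seasonality order for daily and hourly data, which is why the frequency has to be part of the lookup.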
-
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE
= ModelComponentsParam(autoregression={'autoreg_dict': None, 'simulation_num': 10}, changepoints={'changepoints_dict': None, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': 'auto', 'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 'auto', 'quarterly_seasonality': 'auto', 'monthly_seasonality': 'auto', 'weekly_seasonality': 'auto', 'daily_seasonality': 'auto'}, uncertainty={'uncertainty_dict': None})¶ Defines the
SILVERKITE
template. Contains automatic growth, seasonality, holidays, and interactions. Does not include autoregression. Best for hourly and daily frequencies. Uses SimpleSilverkiteEstimator.
-
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_WITH_AR
= ModelComponentsParam(autoregression={'autoreg_dict': 'auto', 'simulation_num': 10}, changepoints={'changepoints_dict': None, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': 'auto', 'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 'auto', 'quarterly_seasonality': 'auto', 'monthly_seasonality': 'auto', 'weekly_seasonality': 'auto', 'daily_seasonality': 'auto'}, uncertainty={'uncertainty_dict': None})¶ Defines the
SILVERKITE_WITH_AR
template. Has the same config as
SILVERKITE
except for adding autoregression. Best for short-term daily forecasts. Uses SimpleSilverkiteEstimator.
-
greykite.framework.templates.simple_silverkite_template_config.
SILVERKITE_DAILY_1
= ['SILVERKITE_DAILY_1_CONFIG_1', 'SILVERKITE_DAILY_1_CONFIG_2', 'SILVERKITE_DAILY_1_CONFIG_3']¶ Defines the
SILVERKITE_DAILY_1
template, which contains 3 candidate configs for grid search, optimized for the seasonality and changepoint parameters. Best for 1-day forecast for daily time series. Uses SimpleSilverkiteEstimator.
-
greykite.framework.templates.simple_silverkite_template_config.
MULTI_TEMPLATES
= {'SILVERKITE_DAILY_1': ['SILVERKITE_DAILY_1_CONFIG_1', 'SILVERKITE_DAILY_1_CONFIG_2', 'SILVERKITE_DAILY_1_CONFIG_3'], 'SILVERKITE_DAILY_90': ['DAILY_SEAS_LTQM_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO', 'DAILY_SEAS_LTQM_GR_LINEAR_CP_NONE_HOL_SP2_FEASET_AUTO_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO', 'DAILY_SEAS_LTQM_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_RIDGE_AR_OFF_DSI_AUTO_WSI_AUTO', 'DAILY_SEAS_NM_GR_LINEAR_CP_LT_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_OFF_DSI_AUTO_WSI_AUTO'], 'SILVERKITE_HOURLY_1': ['HOURLY_SEAS_LT_GR_LINEAR_CP_NONE_HOL_TG_FEASET_AUTO_ALGO_LINEAR_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_LT_HOL_SP4_FEASET_AUTO_ALGO_LINEAR_AR_AUTO', 'HOURLY_SEAS_LT_GR_LINEAR_CP_NM_HOL_SP4_FEASET_OFF_ALGO_RIDGE_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NM_HOL_SP1_FEASET_AUTO_ALGO_RIDGE_AR_AUTO'], 'SILVERKITE_HOURLY_168': ['HOURLY_SEAS_LT_GR_LINEAR_CP_LT_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_OFF', 'HOURLY_SEAS_LT_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_RIDGE_AR_OFF', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NONE_HOL_SP4_FEASET_OFF_ALGO_LINEAR_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NM_HOL_SP1_FEASET_AUTO_ALGO_RIDGE_AR_OFF'], 'SILVERKITE_HOURLY_24': ['HOURLY_SEAS_LT_GR_LINEAR_CP_NM_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_AUTO', 'HOURLY_SEAS_LT_GR_LINEAR_CP_NONE_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_LT_HOL_SP1_FEASET_OFF_ALGO_LINEAR_AR_AUTO', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NM_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_AUTO'], 'SILVERKITE_HOURLY_336': ['HOURLY_SEAS_LT_GR_LINEAR_CP_LT_HOL_SP2_FEASET_AUTO_ALGO_RIDGE_AR_OFF', 'HOURLY_SEAS_LT_GR_LINEAR_CP_LT_HOL_SP4_FEASET_AUTO_ALGO_RIDGE_AR_OFF', 'HOURLY_SEAS_NM_GR_LINEAR_CP_LT_HOL_SP1_FEASET_AUTO_ALGO_LINEAR_AR_OFF', 'HOURLY_SEAS_NM_GR_LINEAR_CP_NM_HOL_SP1_FEASET_AUTO_ALGO_LINEAR_AR_AUTO'], 'SILVERKITE_WEEKLY': ['WEEKLY_SEAS_NM_GR_LINEAR_CP_NONE_HOL_NONE_FEASET_OFF_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO', 'WEEKLY_SEAS_NM_GR_LINEAR_CP_LT_HOL_NONE_FEASET_OFF_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO', 
'WEEKLY_SEAS_HV_GR_LINEAR_CP_NM_HOL_NONE_FEASET_OFF_ALGO_RIDGE_AR_OFF_DSI_AUTO_WSI_AUTO', 'WEEKLY_SEAS_HV_GR_LINEAR_CP_LT_HOL_NONE_FEASET_OFF_ALGO_RIDGE_AR_OFF_DSI_AUTO_WSI_AUTO']}¶ A dictionary of multi templates.
Keys are the available multi templates names (valid strings for config.model_template).
Values correspond to a list of
ModelComponentsParam
.
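Many of the names listed in this dictionary follow a FREQ_KEYWORD_VALUE_... layout. The sketch below splits such a name into its keyword and value pairs. `parse_template_name` is a hypothetical helper written for this example and is not greykite's own parser; how greykite fills in omitted keywords (the hourly names above carry no DSI or WSI segment) is not covered here.

```python
# Keywords that appear in the template names listed above.
KEYWORDS = ("SEAS", "GR", "CP", "HOL", "FEASET", "ALGO", "AR", "DSI", "WSI")


def parse_template_name(name):
    """Hypothetical helper: splits 'FREQ_KEY_VALUE_...' into a dict."""
    tokens = name.split("_")
    parsed = {"FREQ": tokens[0]}
    rest = tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"Expected keyword/value pairs in {name!r}")
    # Walk the remaining tokens two at a time: keyword, then value.
    for keyword, value in zip(rest[0::2], rest[1::2]):
        if keyword not in KEYWORDS:
            raise ValueError(f"Unknown keyword {keyword!r} in {name!r}")
        parsed[keyword] = value
    return parsed


hourly = parse_template_name(
    "HOURLY_SEAS_LT_GR_LINEAR_CP_NONE_HOL_TG_FEASET_AUTO_ALGO_LINEAR_AR_AUTO")
weekly = parse_template_name(
    "WEEKLY_SEAS_NM_GR_LINEAR_CP_NONE_HOL_NONE_FEASET_OFF_ALGO_LINEAR_AR_OFF_DSI_AUTO_WSI_AUTO")
```

Named configs such as SILVERKITE_DAILY_1_CONFIG_1 do not follow this layout, so this sketch rejects them with a ValueError.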
-
greykite.framework.templates.simple_silverkite_template_config.
SINGLE_MODEL_TEMPLATE_TYPE
¶ Types accepted by SimpleSilverkiteTemplate for
config.model_template
for a single template.
alias of Union[str, greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateOptions]
-
class
greykite.framework.templates.simple_silverkite_template_config.
SimpleSilverkiteTemplateConstants
(COMMON_MODELCOMPONENTPARAM_PARAMETERS: Dict = <factory>, MULTI_TEMPLATES: Dict = <factory>, SILVERKITE: Union[str, greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateOptions] = ModelComponentsParam(autoregression={'autoreg_dict': None, 'simulation_num': 10}, changepoints={'changepoints_dict': None, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': 'auto', 'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 'auto', 'quarterly_seasonality': 'auto', 'monthly_seasonality': 'auto', 'weekly_seasonality': 'auto', 'daily_seasonality': 'auto'}, uncertainty={'uncertainty_dict': None}), SILVERKITE_WITH_AR: Union[str, greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateOptions] = ModelComponentsParam(autoregression={'autoreg_dict': 'auto', 'simulation_num': 10}, changepoints={'changepoints_dict': None, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 
'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': 'auto', 'holiday_lookup_countries': 'auto', 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 'auto', 'quarterly_seasonality': 'auto', 'monthly_seasonality': 'auto', 'weekly_seasonality': 'auto', 'daily_seasonality': 'auto'}, uncertainty={'uncertainty_dict': None}), SILVERKITE_DAILY_1_CONFIG_1: Union[str, greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateOptions] = ModelComponentsParam(autoregression={'autoreg_dict': 'auto', 'simulation_num': 10}, changepoints={'changepoints_dict': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.809, 'potential_changepoint_distance': '7D', 'no_changepoint_distance_from_end': '7D', 'yearly_seasonality_order': 8, 'yearly_seasonality_change_freq': None}, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': ("New Year's Day", 'Chinese New Year', 'Christmas Day', 'Independence Day', 'Thanksgiving', 'Labor Day', 'Good Friday', 'Easter Monday [England, Wales, Northern Ireland]', 'Memorial Day', 'Veterans Day'), 'holiday_lookup_countries': 
('UnitedStates', 'UnitedKingdom', 'India', 'France', 'China'), 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 8, 'quarterly_seasonality': 0, 'monthly_seasonality': 7, 'weekly_seasonality': 1, 'daily_seasonality': 0}, uncertainty={'uncertainty_dict': None}), SILVERKITE_DAILY_1_CONFIG_2: Union[str, greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateOptions] = ModelComponentsParam(autoregression={'autoreg_dict': 'auto', 'simulation_num': 10}, changepoints={'changepoints_dict': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.624, 'potential_changepoint_distance': '7D', 'no_changepoint_distance_from_end': '17D', 'yearly_seasonality_order': 1, 'yearly_seasonality_change_freq': None}, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': ("New Year's Day", 'Chinese New Year', 'Christmas Day', 'Independence Day', 'Thanksgiving', 'Labor Day', 'Good Friday', 'Easter Monday [England, Wales, Northern Ireland]', 'Memorial Day', 'Veterans Day'), 'holiday_lookup_countries': ('UnitedStates', 'UnitedKingdom', 'India', 'France', 'China'), 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 
'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 1, 'quarterly_seasonality': 0, 'monthly_seasonality': 4, 'weekly_seasonality': 6, 'daily_seasonality': 0}, uncertainty={'uncertainty_dict': None}), SILVERKITE_DAILY_1_CONFIG_3: Union[str, greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateOptions] = ModelComponentsParam(autoregression={'autoreg_dict': 'auto', 'simulation_num': 10}, changepoints={'changepoints_dict': {'method': 'auto', 'resample_freq': '7D', 'regularization_strength': 0.59, 'potential_changepoint_distance': '7D', 'no_changepoint_distance_from_end': '8D', 'yearly_seasonality_order': 40, 'yearly_seasonality_change_freq': None}, 'seasonality_changepoints_dict': None}, custom={'fit_algorithm_dict': {'fit_algorithm': 'ridge', 'fit_algorithm_params': None}, 'feature_sets_enabled': 'auto', 'max_daily_seas_interaction_order': 5, 'max_weekly_seas_interaction_order': 2, 'extra_pred_cols': [], 'drop_pred_cols': None, 'explicit_pred_cols': None, 'min_admissible_value': None, 'max_admissible_value': None, 'regression_weight_col': None, 'normalize_method': None}, events={'holidays_to_model_separately': ("New Year's Day", 'Chinese New Year', 'Christmas Day', 'Independence Day', 'Thanksgiving', 'Labor Day', 'Good Friday', 'Easter Monday [England, Wales, Northern Ireland]', 'Memorial Day', 'Veterans Day'), 'holiday_lookup_countries': ('UnitedStates', 'UnitedKingdom', 'India', 'France', 'China'), 'holiday_pre_num_days': 2, 'holiday_post_num_days': 2, 'holiday_pre_post_num_dict': None, 'daily_event_df_dict': None}, growth={'growth_term': 'linear'}, hyperparameter_override=None, regressors={'regressor_cols': []}, lagged_regressors={'lagged_regressor_dict': None}, seasonality={'yearly_seasonality': 40, 'quarterly_seasonality': 0, 'monthly_seasonality': 
0, 'weekly_seasonality': 2, 'daily_seasonality': 0}, uncertainty={'uncertainty_dict': None}), SILVERKITE_COMPONENT_KEYWORDS: Type[enum.Enum] = <enum 'SILVERKITE_COMPONENT_KEYWORDS'>, SILVERKITE_EMPTY: Union[str, greykite.framework.templates.autogen.forecast_config.ModelComponentsParam, greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateOptions] = 'DAILY_SEAS_NONE_GR_NONE_CP_NONE_HOL_NONE_FEASET_OFF_ALGO_LINEAR_AR_OFF_DSI_OFF_WSI_OFF', VALID_FREQ: List = <factory>, SimpleSilverkiteTemplateOptions: dataclasses.dataclass = <class 'greykite.framework.templates.simple_silverkite_template_config.SimpleSilverkiteTemplateOptions'>)[source]¶ Constants used by
SimpleSilverkiteTemplate
. Includes the model templates and their default values.
mutable_field
is used when the default value is a mutable type like dict and list. Dataclass requires mutable default values to be wrapped in ‘default_factory’, so that instances of this dataclass cannot accidentally modify the default value.
mutable_field
wraps the constant accordingly.
-
COMMON_MODELCOMPONENTPARAM_PARAMETERS
¶ Defines the default component values for
SimpleSilverkiteTemplate
. The components include seasonality, growth, holiday, trend changepoints, feature sets, autoregression, fit algorithm, etc. These are used when config.model_template provides the
SimpleSilverkiteTemplateOptions
.
-
MULTI_TEMPLATES
¶ A dictionary of multi templates.
Keys are the available multi templates names (valid strings for config.model_template).
Values correspond to a list of
ModelComponentsParam
.
-
SILVERKITE
¶ Defines the
"SILVERKITE"
template. Contains automatic growth, seasonality, holidays, and interactions. Does not include autoregression. Best for hourly and daily frequencies. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_WITH_AR
¶ Defines the
SILVERKITE_WITH_AR
template. Has the same config as
SILVERKITE
except for adding autoregression. Best for short-term daily forecasts. Uses SimpleSilverkiteEstimator.
-
SILVERKITE_DAILY_1_CONFIG_1
¶ Config 1 in template
SILVERKITE_DAILY_1
. Compared to
SILVERKITE
, it adds change points and uses parameters specifically tuned for daily data and 1-day forecast.
-
SILVERKITE_DAILY_1_CONFIG_2
¶ Config 2 in template
SILVERKITE_DAILY_1
. Compared to
SILVERKITE
, it adds change points and uses parameters specifically tuned for daily data and 1-day forecast.
-
SILVERKITE_DAILY_1_CONFIG_3
¶ Config 3 in template
SILVERKITE_DAILY_1
. Compared to
SILVERKITE
, it adds change points and uses parameters specifically tuned for daily data and 1-day forecast.
-
class
SILVERKITE_COMPONENT_KEYWORDS
(value)¶ Valid values for simple silverkite template string name keywords. The names are the keywords and the values are the corresponding value enum. Can be used to create an instance of
SimpleSilverkiteTemplateOptions
.
-
SILVERKITE_EMPTY
¶ Defines the
"SILVERKITE_EMPTY"
template. Everything here is None or off.
-
VALID_FREQ
¶ Valid non-default values for simple silverkite template string name frequency. See
SimpleSilverkiteTemplateOptions
.
-
class
SimpleSilverkiteTemplateOptions
(freq: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FREQ = <SILVERKITE_FREQ.DAILY: 'DAILY'>, seas: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_SEAS = <SILVERKITE_SEAS.LT: 'LT'>, gr: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_GR = <SILVERKITE_GR.LINEAR: 'LINEAR'>, cp: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_CP = <SILVERKITE_CP.NONE: 'NONE'>, hol: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_HOL = <SILVERKITE_HOL.NONE: 'NONE'>, feaset: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_FEASET = <SILVERKITE_FEASET.OFF: 'OFF'>, algo: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_ALGO = <SILVERKITE_ALGO.LINEAR: 'LINEAR'>, ar: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_AR = <SILVERKITE_AR.OFF: 'OFF'>, dsi: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_DSI = <SILVERKITE_DSI.AUTO: 'AUTO'>, wsi: greykite.framework.templates.simple_silverkite_template_config.SILVERKITE_WSI = <SILVERKITE_WSI.AUTO: 'AUTO'>)¶ Defines generic simple silverkite template options. Attributes can be set to different values using
SILVERKITE_COMPONENT_KEYWORDS
for high level tuning.
-
Changepoint Detection¶
-
class
greykite.algo.changepoint.adalasso.changepoint_detector.
ChangepointDetector
[source]¶ A class to implement change point detection.
Currently supports long-term change point detection only. Input is a dataframe with time_col indicating the column of time info (the format should be able to be parsed by pd.to_datetime), and value_col indicating the column of observed time series values.
-
original_df
¶ The original data df, used to retrieve original observations, if aggregation is used in fitting change points.
- Type
-
time_col
¶ The column name for time column.
- Type
str
-
value_col
¶ The column name for value column.
- Type
str
-
trend_potential_changepoint_n
¶ The number of change points that are evenly distributed over the time period.
- Type
int
-
yearly_seasonality_order
¶ The yearly seasonality order used when fitting trend.
- Type
int
-
y
¶ The observations after aggregation.
- Type
-
trend_df
¶ The augmented df of the original_df, including regressors of trend change points and Fourier series for yearly seasonality.
- Type
-
trend_model
¶ The fitted trend model.
- Type
sklearn.base.RegressorMixin
-
trend_coef
¶ The estimated trend coefficients.
- Type
-
trend_intercept
¶ The estimated trend intercept.
- Type
float
-
adaptive_lasso_coef
¶ A list of length two: the first element is the estimated trend coefficients and the second element is the intercept, both estimated by adaptive lasso.
- Type
list
-
trend_changepoints
¶ The list of detected trend change points, parsable by pd.to_datetime.
- Type
list
-
trend_estimation
¶ The estimated trend with detected trend change points.
- Type
pd.Series
-
seasonality_df
¶ The augmented df of
original_df
, including regressors of seasonality change points with different Fourier series frequencies.
- Type
-
seasonality_changepoints
¶ The dictionary of detected seasonality change points for each component. Keys are component names, and values are list of change points.
- Type
dict
-
seasonality_estimation
¶ The estimated seasonality with detected seasonality change points. The series has the same length as
original_df
. Index is timestamp, and values are the estimated seasonality at each timestamp. The seasonality estimation is the estimated seasonality effect with the trend estimated by
estimate_trend_with_detected_changepoints
removed.
- Type
-
find_trend_changepoints : callable
Finds the potential trend change points for a given time series df.
-
plot : callable
Plot the results after implementing find_trend_changepoints.
-
find_trend_changepoints
(df, time_col, value_col, yearly_seasonality_order=8, yearly_seasonality_change_freq=None, resample_freq='D', trend_estimator='ridge', adaptive_lasso_initial_estimator='ridge', regularization_strength=None, actual_changepoint_min_distance='30D', potential_changepoint_distance=None, potential_changepoint_n=100, no_changepoint_distance_from_begin=None, no_changepoint_proportion_from_begin=0.0, no_changepoint_distance_from_end=None, no_changepoint_proportion_from_end=0.0)[source]¶ Finds trend change points automatically by adaptive lasso.
The algorithm aggregates the data at a user-defined frequency, which defaults to daily.
If
potential_changepoint_distance
is not given,
potential_changepoint_n
potential change points are evenly distributed over the time period, else
potential_changepoint_n
is overridden by:
total_time_length / potential_changepoint_distance
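The override rule above can be sketched with standard-library date arithmetic. `potential_changepoint_count` is a hypothetical helper written for this example; rounding the quotient down is an assumption, and greykite's exact rounding may differ.

```python
from datetime import datetime, timedelta


def potential_changepoint_count(start, end,
                                potential_changepoint_distance=None,
                                potential_changepoint_n=100):
    """Hypothetical helper: number of evenly spaced candidate change points."""
    if potential_changepoint_distance is None:
        # No distance given: use the requested count as is.
        return potential_changepoint_n
    # Distance given: it overrides the count (floor division is an assumption).
    return (end - start) // potential_changepoint_distance


start = datetime(2020, 1, 1)
end = datetime(2022, 1, 1)  # 731 days after start (2020 is a leap year)

n_default = potential_changepoint_count(start, end)
n_from_distance = potential_changepoint_count(
    start, end, potential_changepoint_distance=timedelta(days=15))
```

Over these two years of data, a 15-day spacing gives 48 candidates, fewer than the default of 100.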
Users can specify either
no_changepoint_proportion_from_end
to specify what proportion from the end of data they do not want changepoints, or
no_changepoint_distance_from_end
(overrides
no_changepoint_proportion_from_end
) to specify how long from the end they do not want change points.
Then all potential change points will be selected by adaptive lasso, with the initial estimator specified by
adaptive_lasso_initial_estimator
. If the user specifies
regularization_strength
, then the adaptive lasso will be run with a single tuning parameter calculated based on the user-provided prior, else a cross-validation will be run to automatically select the tuning parameter.
A yearly seasonality is also fitted at the same time, preventing trend from catching yearly periodical changes.
A rule-based guard function is applied at the end to ensure change points are not too close, as specified by
actual_changepoint_min_distance
.- Parameters
df (pandas.DataFrame) – The data df.
time_col (str) – Time column name in df.
value_col (str) – Value column name in df.
yearly_seasonality_order (int, default 8) – Fourier series order to capture yearly seasonality.
yearly_seasonality_change_freq (DateOffset, Timedelta or str or None, default None) –
How often to change the yearly seasonality model. Set to None to disable this feature.
This is useful if you have more than 2.5 years of data and the detected trend without this feature is inaccurate because yearly seasonality changes over the training period. Modeling yearly seasonality separately over each period can prevent trend changepoints from fitting changes in yearly seasonality. For example, if you have 2.5 years of data and yearly seasonality increases in magnitude after the first year, setting this parameter to “365D” will model each year’s yearly seasonality differently and capture both shapes. However, without this feature, both years will have the same yearly seasonality, roughly the average effect across the training set.
Note that if you use str as input, the maximal supported unit is day, i.e., you might use “200D” but not “12M” or “1Y”.
resample_freq (DateOffset, Timedelta or str, default “D”.) – The frequency to aggregate data. Coarser aggregation leads to fitting longer term trends.
trend_estimator (str in [“ridge”, “lasso” or “ols”], default “ridge”) – The estimator to estimate trend. The estimated trend is only for plotting purposes. ‘ols’ is not recommended when yearly_seasonality_order is specified other than 0, because significant over-fitting will happen. In this case, the given value is overridden by “ridge”.
adaptive_lasso_initial_estimator (str in [“ridge”, “lasso” or “ols”], default “ridge”) – The initial estimator to compute adaptive lasso weights.
regularization_strength (float in [0, 1] or None) – The regularization for change points. Greater value implies fewer change points. 0 indicates all change points, and 1 indicates no change point. If None, the tuning parameter will be selected by cross-validation. If a value is given, it will be used as the tuning parameter.
actual_changepoint_min_distance (DateOffset, Timedelta or str, default “30D”) – The minimal distance allowed between detected change points. If consecutive change points are within this minimal distance, the one with smaller absolute change coefficient will be dropped. Note: maximal unit is ‘D’, i.e., you may use units no more than ‘D’ such as ‘10D’, ‘5H’, ‘100T’, ‘200S’. The reason is that ‘W’, ‘M’ or higher has either cycles or indefinite number of days, thus is not parsable by pandas as timedelta.
potential_changepoint_distance (DateOffset, Timedelta, str or None, default None) – The distance between potential change points. If provided, will override the parameter potential_changepoint_n. Note: maximal unit is ‘D’, i.e., you may only use units no more than ‘D’ such as ‘10D’, ‘5H’, ‘100T’, ‘200S’. The reason is that ‘W’, ‘M’ or higher has either cycles or indefinite number of days, thus is not parsable by pandas as timedelta.
potential_changepoint_n (int, default 100) – Number of change points to be evenly distributed, recommended 1-2 per month, based on the training data length.
no_changepoint_distance_from_begin (DateOffset, Timedelta, str or None, default None) – The length of time from the beginning of training data, within which no change point will be placed. If provided, will override the parameter no_changepoint_proportion_from_begin. Note: maximal unit is ‘D’, i.e., you may only use units no more than ‘D’ such as ‘10D’, ‘5H’, ‘100T’, ‘200S’. The reason is that ‘W’, ‘M’ or higher has either cycles or indefinite number of days, thus is not parsable by pandas as timedelta.
no_changepoint_proportion_from_begin (float in [0, 1], default 0.0) – potential_changepoint_n change points will be placed evenly over the whole training period; however, change points that are located within the first no_changepoint_proportion_from_begin proportion of the training period will not be used for change point detection.
no_changepoint_distance_from_end (DateOffset, Timedelta, str or None, default None) – The length of time from the end of training data, within which no change point will be placed. If provided, will override the parameter no_changepoint_proportion_from_end. Note: maximal unit is ‘D’, i.e., you may only use units no more than ‘D’ such as ‘10D’, ‘5H’, ‘100T’, ‘200S’. The reason is that ‘W’, ‘M’ or higher has either cycles or indefinite number of days, thus is not parsable by pandas as timedelta.
no_changepoint_proportion_from_end (float in [0, 1], default 0.0) – potential_changepoint_n change points will be placed evenly over the whole training period; however, change points that are located within the last no_changepoint_proportion_from_end proportion of the training period will not be used for change point detection.
- Returns
result – result dictionary with keys:
"trend_feature_df"
pandas.DataFrame
The augmented df for change detection, in other words, the design matrix for the regression model. Columns:
’changepoint0’: regressor for change point 0, equals the continuous time of the observation minus the continuous time for time of origin.
…
’changepoint{potential_changepoint_n}’: regressor for change point {potential_changepoint_n}, equals the continuous time of the observation minus the continuous time of the {potential_changepoint_n}th change point.
’cos1_conti_year_yearly’: cosine yearly seasonality regressor of first order.
’sin1_conti_year_yearly’: sine yearly seasonality regressor of first order.
…
’cos{yearly_seasonality_order}_conti_year_yearly’ : cosine yearly seasonality regressor of {yearly_seasonality_order}th order.
’sin{yearly_seasonality_order}_conti_year_yearly’ : sine yearly seasonality regressor of {yearly_seasonality_order}th order.
"trend_changepoints"
list
The list of detected change points.
"changepoints_dict"
dict
The change point dictionary that is compatible as an input with forecast.
"trend_estimation"
pandas.Series
The estimated trend with detected trend change points.
- Return type
dict
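The placement of potential change points described above can be sketched in plain Python. This is only an illustration of the documented rules (even spacing, the override by potential_changepoint_distance, and the excluded proportions at the beginning and end), not greykite's implementation. The function name, its argument names, and the exact spacing and rounding are assumptions.

```python
from datetime import datetime, timedelta


def potential_changepoints(start, end, n=100, distance=None,
                           proportion_from_begin=0.0, proportion_from_end=0.0):
    """Evenly spaced candidate change points, minus the excluded head and tail.

    Hypothetical helper for illustration; not part of greykite.
    """
    total = end - start
    if distance is not None:
        # Documented override: n = total_time_length / potential_changepoint_distance.
        n = int(total / distance)
    # Spread n candidates evenly over the whole period (spacing is an assumption).
    step = total / (n + 1)
    candidates = [start + step * (i + 1) for i in range(n)]
    # Candidates inside the excluded proportions are not used for detection.
    lower = start + total * proportion_from_begin
    upper = end - total * proportion_from_end
    return [t for t in candidates if lower <= t <= upper]


start = datetime(2020, 1, 1)
end = start + timedelta(days=100)
kept = potential_changepoints(start, end, n=9, proportion_from_end=0.25)
print(len(kept))
```

The distance-based exclusions (no_changepoint_distance_from_begin and no_changepoint_distance_from_end) are omitted to keep the sketch short; per the parameter descriptions they would override the corresponding proportions.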
-
find_seasonality_changepoints
(df, time_col, value_col, seasonality_components_df= name period order seas_names 0 tod 24.0 3 daily 1 tow 7.0 3 weekly 2 conti_year 1.0 5 yearly, resample_freq='H', regularization_strength=0.6, actual_changepoint_min_distance='30D', potential_changepoint_distance=None, potential_changepoint_n=50, no_changepoint_distance_from_end=None, no_changepoint_proportion_from_end=0.0, trend_changepoints=None)[source]¶ Finds the seasonality change points (defined as the time points where seasonality magnitude changes, i.e., the time series becomes “fatter” or “thinner”.)
Subtracts the estimated trend from the original time series first, then uses regression-based regularization methods to select important seasonality change points. Regressors are built from truncated Fourier series.
If you have run find_trend_changepoints before running find_seasonality_changepoints with the same df, the estimated trend will be automatically used for removing trend in find_seasonality_changepoints. Otherwise, find_trend_changepoints will be run automatically with the same parameters as you passed to find_seasonality_changepoints. If you do not want to use the same parameters, run find_trend_changepoints with your desired parameters before calling find_seasonality_changepoints.
The algorithm aggregates the data at a user-defined frequency, hourly by default.
The regression features consist of potential_changepoint_n + 1 blocks of predictors. The first block consists of Fourier series according to seasonality_components_df, and the other blocks are copies of the first block truncated at the corresponding potential change point.
If potential_changepoint_distance is not given, potential_changepoint_n potential change points are evenly distributed over the time period. Otherwise, potential_changepoint_n is overridden by:
total_time_length / potential_changepoint_distance
Users can specify either no_changepoint_proportion_from_end, the proportion of data at the end in which they do not want change points, or no_changepoint_distance_from_end (overrides no_changepoint_proportion_from_end), the length of time at the end in which they do not want change points.
Then all potential change points will be selected by adaptive lasso, with the initial estimator specified by adaptive_lasso_initial_estimator. The regularization strength is specified by regularization_strength, which lies between 0 and 1.
A rule-based guard function is applied at the end to ensure change points are not too close, as specified by actual_changepoint_min_distance.
- Parameters
df (pandas.DataFrame) – The data df.
time_col (str) – Time column name in df.
value_col (str) – Value column name in df.
seasonality_components_df (pandas.DataFrame) – The df to generate seasonality design matrix, which is compatible with seasonality_components_df in find_seasonality_changepoints.
resample_freq (DateOffset, Timedelta or str, default “H”.) – The frequency to aggregate data. Coarser aggregation leads to fitting longer term trends.
regularization_strength (float in [0, 1] or None, default 0.6) – The regularization for change points. Greater value implies fewer change points. 0 indicates all change points, and 1 indicates no change point. If None, the tuning parameter will be selected by cross-validation. If a value is given, it will be used as the tuning parameter. Here “None” is not recommended, because seasonality change has different levels, and automatic selection by cross-validation may produce more change points than desired. Practically, 0.6 is a good choice for most cases. Tuning around 0.6 is recommended.
actual_changepoint_min_distance (DateOffset, Timedelta or str, default “30D”) – The minimal distance allowed between detected change points. If consecutive change points are within this minimal distance, the one with smaller absolute change coefficient will be dropped. Note: maximal unit is ‘D’, i.e., you may use units no more than ‘D’ such as ‘10D’, ‘5H’, ‘100T’, ‘200S’. The reason is that ‘W’, ‘M’ or higher has either cycles or indefinite number of days, thus is not parsable by pandas as timedelta.
potential_changepoint_distance (DateOffset, Timedelta, str or None, default None) – The distance between potential change points. If provided, will override the parameter potential_changepoint_n. Note: maximal unit is ‘D’, i.e., you may only use units no more than ‘D’ such as ‘10D’, ‘5H’, ‘100T’, ‘200S’. The reason is that ‘W’, ‘M’ or higher has either cycles or indefinite number of days, thus is not parsable by pandas as timedelta.
potential_changepoint_n (int, default 50) – Number of change points to be evenly distributed, recommended 1 per month, based on the training data length.
no_changepoint_distance_from_end (DateOffset, Timedelta, str or None, default None) – The length of time from the end of training data, within which no change point will be placed. If provided, will override the parameter no_changepoint_proportion_from_end. Note: maximal unit is ‘D’, i.e., you may only use units no more than ‘D’ such as ‘10D’, ‘5H’, ‘100T’, ‘200S’. The reason is that ‘W’, ‘M’ or higher has either cycles or indefinite number of days, thus is not parsable by pandas as timedelta.
no_changepoint_proportion_from_end (float in [0, 1], default 0.0) – potential_changepoint_n change points will be placed evenly over the whole training period; however, only change points that are not located within the last no_changepoint_proportion_from_end proportion of the training period will be used for change point detection.
trend_changepoints (list or None) – A list of user-specified trend change points, used to estimate the trend to be removed from the time series before detecting seasonality change points. If provided, the algorithm will not check for existing detected trend change points or run find_trend_changepoints, but will use these change points directly for trend estimation.
- Returns
result – result dictionary with keys:
"seasonality_feature_df"
pandas.DataFrame
The augmented df for seasonality changepoint detection, in other words, the design matrix for the regression model. Columns:
”cos1_tod_daily”: cosine daily seasonality regressor of first order at change point 0.
”sin1_tod_daily”: sine daily seasonality regressor of first order at change point 0.
…
”cos1_conti_year_yearly”: cosine yearly seasonality regressor of first order at change point 0.
”sin1_conti_year_yearly”: sine yearly seasonality regressor of first order at change point 0.
…
”cos{daily_seasonality_order}_tod_daily_cp{potential_changepoint_n}” : cosine daily seasonality regressor of {daily_seasonality_order}th order at change point {potential_changepoint_n}.
”sin{daily_seasonality_order}_tod_daily_cp{potential_changepoint_n}” : sine daily seasonality regressor of {daily_seasonality_order}th order at change point {potential_changepoint_n}.
…
”cos{yearly_seasonality_order}_conti_year_yearly_cp{potential_changepoint_n}” : cosine yearly seasonality regressor of {yearly_seasonality_order}th order at change point {potential_changepoint_n}.
”sin{yearly_seasonality_order}_conti_year_yearly_cp{potential_changepoint_n}” : sine yearly seasonality regressor of {yearly_seasonality_order}th order at change point {potential_changepoint_n}.
"seasonality_changepoints"
dict [list [datetime]]
The dictionary of detected seasonality change points for each component. Keys are component names, and values are lists of change points.
"seasonality_estimation"
pandas.Series
The estimated seasonality with detected seasonality change points. The series has the same length as original_df. Index is timestamp, and values are the estimated seasonality at each timestamp. The seasonality estimation is the estimate of the seasonality effect with the trend estimated by estimate_trend_with_detected_changepoints removed.
"seasonality_components_df"
pandas.DataFrame
The processed seasonality_components_df. The daily component row is removed if the inferred frequency or aggregation frequency is at least one day.
- Return type
dict
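The rule-based guard controlled by actual_changepoint_min_distance can be illustrated with a small standalone function. This is a sketch of the documented rule only (of two consecutive change points closer than the minimal distance, the one with the smaller absolute change coefficient is dropped). The function name and arguments are hypothetical, and treating "within" as strictly less than the minimal distance is an assumption.

```python
from datetime import datetime, timedelta


def enforce_min_distance(changepoints, coefficients, min_distance):
    """Drop the weaker of any two consecutive change points that are too close.

    changepoints: sorted list of datetimes.
    coefficients: change coefficients, one per change point.
    Hypothetical helper for illustration; not part of greykite.
    """
    kept = []  # (timestamp, coefficient) pairs that survive the guard
    for ts, coef in zip(changepoints, coefficients):
        if kept and ts - kept[-1][0] < min_distance:
            # Too close to the previous survivor: keep the larger absolute change.
            if abs(coef) > abs(kept[-1][1]):
                kept[-1] = (ts, coef)
        else:
            kept.append((ts, coef))
    return [ts for ts, _ in kept]


detected = [datetime(2021, 1, 1), datetime(2021, 1, 10), datetime(2021, 3, 1)]
print(enforce_min_distance(detected, [0.5, -2.0, 1.0], timedelta(days=30)))
```

In this example the first two change points are 9 days apart, so only the second one, which has the larger absolute coefficient, survives alongside the March change point.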
-
plot
(observation=True, observation_original=True, trend_estimate=True, trend_change=True, yearly_seasonality_estimate=False, adaptive_lasso_estimate=False, seasonality_change=False, seasonality_change_by_component=True, seasonality_estimate=False, plot=True)[source]¶ Makes a plot to show the observations/estimations/change points.
In this function, the component parameters specify whether each component is included in the plot. These are bool variables. For the components that are set to True, their values will be replaced by the corresponding data. The values of the other components will be set to None. Then these variables will be fed into
plot_change
- Parameters
observation (bool) – Whether to include observation
observation_original (bool) – Set True to plot original observations, and False to plot aggregated observations. No effect if observation is False.
trend_estimate (bool) – Set True to add trend estimation.
trend_change (bool) – Set True to add change points.
yearly_seasonality_estimate (bool) – Set True to add estimated yearly seasonality.
adaptive_lasso_estimate (bool) – Set True to add adaptive lasso estimated trend.
seasonality_change (bool) – Set True to add seasonality change points.
seasonality_change_by_component (bool) – If True, seasonality changes will be plotted separately for different components, else all will use the same symbol. No effect if seasonality_change is False.
seasonality_estimate (bool) – Set True to add estimated seasonality. The seasonality is plotted around the trend, so the actual seasonality shown is trend estimation + seasonality estimation.
plot (bool, default True) – Set to True to display the plot, and set to False to return the plotly figure object.
- Returns
None (if plot == True) – The function shows a plot.
fig (plotly.graph_objects.Figure) – The plot object.
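The way the bool component parameters are turned into plot inputs can be sketched as follows. This only mirrors the description above (True is replaced by the corresponding data, everything else becomes None). The helper name and the example data are hypothetical, and the real method passes the resulting values on to plot_change.

```python
def select_components(flags, data):
    """Map each component name to its data if its flag is True, else None.

    Hypothetical helper for illustration; not part of greykite.
    """
    return {name: (data.get(name) if enabled else None)
            for name, enabled in flags.items()}


flags = {"observation": True, "trend_estimate": False}
data = {"observation": [1.0, 2.0, 3.0], "trend_estimate": [1.1, 2.1, 3.1]}
print(select_components(flags, data))
```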
-
Benchmarking¶
-
class
greykite.framework.benchmark.benchmark_class.
BenchmarkForecastConfig
(df: pandas.core.frame.DataFrame, configs: Dict[str, greykite.framework.templates.autogen.forecast_config.ForecastConfig], tscv: greykite.sklearn.cross_validation.RollingTimeSeriesSplit, forecaster: greykite.framework.templates.forecaster.Forecaster = <greykite.framework.templates.forecaster.Forecaster object>)[source]¶ Class for benchmarking multiple ForecastConfig on a rolling window basis.
-
df
¶ Timeseries data to forecast. Contains columns [time_col, value_col], and optional regressor columns. Regressor columns should include future values for prediction.
- Type
-
configs
¶ Dictionary of model configurations. A model configuration is a ForecastConfig. See ForecastConfig for details on valid ForecastConfig. Validity of the configs for benchmarking is checked via the validate method.
- Type
Dict[str, ForecastConfig]
-
tscv
¶ Cross-validation object that determines the rolling window evaluation. See RollingTimeSeriesSplit for details. The forecast_horizon and periods_between_train_test parameters of configs are matched against those of tscv. A ValueError is raised if there is a mismatch.
-
forecaster
¶ Forecaster used to create the forecasts.
- Type
-
is_run
¶ Indicator of whether the run method has been executed. After executing run, this indicator is set to True. Some class methods like get_forecast require is_run to be True before they can be executed.
- Type
bool, default False
-
result
¶ Stores the benchmarking results. Has the same keys as configs.
- Type
dict
-
forecasts
¶ Merged DataFrame of forecasts, upper and lower confidence interval for all input configs. Also stores train end date and forecast step for each prediction.
- Type
pandas.DataFrame, default None
-
validate
()[source]¶ Validates the inputs to the class for the method run.
Raises a ValueError if there is a mismatch between the following parameters of configs and tscv:
forecast_horizon
periods_between_train_test
Raises a ValueError if the configs do not all have the same coverage parameter.
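The checks listed above can be sketched with simplified stand-ins. SimpleConfig and SimpleSplit below are hypothetical, flattened substitutes for ForecastConfig and RollingTimeSeriesSplit that carry only the fields being compared. The real classes are structured differently, so treat this as an outline of the logic rather than the library's code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SimpleConfig:
    """Hypothetical stand-in holding only the ForecastConfig fields checked here."""
    forecast_horizon: int
    periods_between_train_test: int = 0
    coverage: Optional[float] = None


@dataclass
class SimpleSplit:
    """Hypothetical stand-in for RollingTimeSeriesSplit."""
    forecast_horizon: int
    periods_between_train_test: int = 0


def validate(configs: Dict[str, SimpleConfig], tscv: SimpleSplit) -> None:
    """Raise ValueError on a config/tscv mismatch or on differing coverage."""
    for name, config in configs.items():
        for field in ("forecast_horizon", "periods_between_train_test"):
            if getattr(config, field) != getattr(tscv, field):
                raise ValueError(f"{field} of config '{name}' does not match tscv.")
    # All configs must share one coverage value so their intervals are comparable.
    if len({config.coverage for config in configs.values()}) > 1:
        raise ValueError("All configs must have the same coverage.")


tscv = SimpleSplit(forecast_horizon=7)
validate({"a": SimpleConfig(7, coverage=0.9), "b": SimpleConfig(7, coverage=0.9)}, tscv)
```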
-
run
()[source]¶ Runs every config and stores the output of the forecast_pipeline. This function runs only if the configs and tscv are jointly valid.
- Returns
self – Returns self. Stores pipeline output of every config in self.result.
-
extract_forecasts
()[source]¶ Extracts forecasts, upper and lower confidence interval for each individual config. This is saved as a pandas.DataFrame with the name rolling_forecast_df within the corresponding config of self.result. For example, if the config key is “silverkite”, then the forecasts are stored in self.result["silverkite"]["rolling_forecast_df"].
This method also constructs a merged DataFrame of forecasts, upper and lower confidence interval for all input configs.
-
plot_forecasts_by_step
(forecast_step: int, config_names: List = None, xlabel: str = 'ts', ylabel: str = 'y', title: str = None, showlegend: bool = True)[source]¶ Returns a forecast_step ahead rolling forecast plot. The plot consists of one line for each valid name in config_names. If available, the corresponding actual values are also plotted.
For a more customizable plot, see plot_multivariate.
- Parameters
forecast_step (int) – Which forecast step to plot. A forecast step is an integer between 1 and the forecast horizon, inclusive, indicating the number of periods from train end date to the prediction date (# steps ahead).
config_names (list [str], default None) – Which config results to plot. A list of config names. If None, uses all the available config keys.
xlabel (str or None, default TIME_COL) – x-axis label.
ylabel (str or None, default VALUE_COL) – y-axis label.
title (str or None, default None) – Plot title. If None, default is based on forecast_step.
showlegend (bool, default True) – Whether to show the legend.
- Returns
fig – Interactive plotly graph. Plots multiple column(s) in self.forecasts against TIME_COL.
See plot_forecast_vs_actual return value for how to plot the figure and add customization.
- Return type
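To make the notion of a forecast step concrete, the sketch below computes it for rolling-window predictions and selects the rows that are a given number of steps ahead. It follows the definition given for forecast_step (the number of periods from the train end date to the prediction date). The function names and the tuple layout of the rows are hypothetical, and a fixed daily frequency is assumed.

```python
from datetime import datetime, timedelta


def forecast_step(train_end, prediction_time, freq=timedelta(days=1)):
    """Number of periods from the train end date to the prediction date."""
    return round((prediction_time - train_end) / freq)


def rows_for_step(rows, step, freq=timedelta(days=1)):
    """Keep the (train_end, prediction_time, forecast) rows that are `step` ahead.

    Hypothetical helper for illustration; not part of greykite.
    """
    return [row for row in rows if forecast_step(row[0], row[1], freq) == step]


# Two rolling windows, each with a horizon of 2 days.
rows = [
    (datetime(2021, 1, 31), datetime(2021, 2, 1), 10.0),
    (datetime(2021, 1, 31), datetime(2021, 2, 2), 11.0),
    (datetime(2021, 2, 2), datetime(2021, 2, 3), 12.0),
    (datetime(2021, 2, 2), datetime(2021, 2, 4), 13.0),
]
print(rows_for_step(rows, step=2))
```

A plot for a given forecast_step draws exactly this kind of selection, one line per config, which is why each line contains at most one point per rolling window.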
-