Auto Configuration Tools

The Silverkite model has many hyperparameters to tune. Besides domain knowledge, we also have tools that can help find good choices for certain hyperparameters. In this tutorial, we will present

  • seasonality inferrer

  • holiday inferrer

  • holiday grouper

Note

If you use the model templates, you can specify the “auto” option for certain model components (growth, seasonality and holiday), and the auto configuration tools will be activated automatically. See auto seasonality, auto growth and auto holidays for the way to activate them. This doc explains how the “auto” options work behind the scenes. You can reproduce the “auto” options with the Seasonality Inferrer and Holiday Inferrer below. If you are doing a train-test split, run the inferrers on the training data only, which is closer to how the model would be used in practice.

Seasonality Inferrer

The Silverkite model uses Fourier series to model seasonalities. It’s sometimes difficult to decide what order to use for each Fourier series. Larger orders fit the curves more closely but carry a risk of overfitting. Smaller orders tend to underfit the curves and may not capture the exact seasonality patterns.

SeasonalityInferrer is a tool that can help you decide what order to use for a seasonality’s Fourier series. Note that there are many ways to decide the orders, and you don’t have to strictly stick to the results from Seasonality Inferrer.

How it works

The seasonality inferrer utilizes criteria including AIC and BIC to find the most appropriate Fourier series orders. For a specific seasonality, e.g. yearly seasonality, the steps are as follows:

  • Trend removal: seasonality inferrer provides 4 options for trend removal. They are listed in TrendAdjustMethodEnum. Specifically:

    • "seasonal_average": given an indicator of seasonal period, the method subtracts the average within each seasonal period from the original time series. For example, given the column year, the average is calculated on each different year.

    • "overall_average": subtracts the overall average from the original time series.

    • "spline_fit": fits a polynomial up to a given degree and subtracts it from the original time series.

    • "none": does not adjust the trend.

    Typically “seasonal_average” is a good choice with appropriate columns. For example, we can use year_quarter for quarterly seasonality, year_month for monthly seasonality, year_woy_iso for weekly seasonality and year_woy_dow_iso for daily seasonality.

  • Optional aggregation: sometimes we want to get rid of shorter fluctuations before fitting a longer seasonality period. We can do an optional aggregation beforehand. For example, when we model yearly seasonality, we can do a "7D" aggregation to eliminate weekly effects to make the result more stable.

  • Order selection: with a pre-specified maximum order n, we fit the de-trended (and aggregated) time series with Fourier series of orders 1 to n, and calculate the AIC/BIC for each fit. The most appropriate order is the one with the best AIC or BIC. With the tolerance parameter, the method can also accept a slightly worse criterion in exchange for a lower order, which reduces the risk of overfitting.

  • Finally, an optional offset can be applied to any inferred order to allow manual adjustments. For example, if you would like to use a lower yearly seasonality order, you may set the offset for yearly seasonality to -2, and the final order will be the inferred result minus 2. This is useful when users prefer higher or lower orders for a seasonality and want a knob on top of the inferred results.
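
The order-selection, tolerance and offset steps above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not the SeasonalityInferrer implementation: the helper names (fourier_design, bic_for_order, choose_order) are hypothetical, the BIC uses the usual Gaussian least-squares form, the tolerance is interpreted as a relative slack on the best criterion value, and the trend removal and aggregation steps are skipped because the synthetic series has no trend.

```python
import numpy as np


def fourier_design(t, period, order):
    """Intercept plus sin/cos columns for harmonics 1..order."""
    cols = [np.ones_like(t)]
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


def bic_for_order(t, y, period, order):
    """Fit by ordinary least squares and return the Gaussian BIC."""
    X = fourier_design(t, period, order)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + k * np.log(n)


def choose_order(t, y, period, max_order, tolerance=0.0, offset=0):
    """Pick the smallest order whose BIC is within `tolerance` of the best."""
    bics = {m: bic_for_order(t, y, period, m) for m in range(1, max_order + 1)}
    best = min(bics.values())
    # Assumption: tolerance is a relative slack on the best criterion value.
    threshold = best + tolerance * abs(best)
    chosen = min(m for m, value in bics.items() if value <= threshold)
    # The offset is a manual adjustment applied on top of the inferred order.
    return max(chosen + offset, 0), bics


# Synthetic weekly pattern with two harmonics, sampled hourly (t is in days).
rng = np.random.default_rng(0)
t = np.arange(0, 56, 1 / 24)
signal = 2.0 * np.sin(2 * np.pi * t / 7) + np.cos(2 * np.pi * 2 * t / 7)
y = signal + rng.normal(scale=0.3, size=t.size)

order, bics = choose_order(t, y, period=7.0, max_order=6, tolerance=0.005)
order_with_offset, _ = choose_order(t, y, period=7.0, max_order=6, offset=-1)
print(order, order_with_offset)
```

Because the synthetic signal contains exactly two harmonics, the criterion drops sharply up to order 2 and then rises slowly as the parameter penalty outweighs the small gains from fitting noise.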

Example

Now we look at an example with the Peyton-Manning Wiki page view data.

import pandas as pd
import plotly
from greykite.common.data_loader import DataLoader
from greykite.algo.common.seasonality_inferrer import SeasonalityInferConfig
from greykite.algo.common.seasonality_inferrer import SeasonalityInferrer
from greykite.algo.common.seasonality_inferrer import TrendAdjustMethodEnum
from greykite.common import constants as cst

The SeasonalityInferrer class uses SeasonalityInferConfig to specify configuration for a single seasonality component, and it takes a list of such configurations to infer multiple seasonality components together. Now we specify seasonality inferring configs for yearly to weekly seasonalities. In each of these configs, specify the parameters that are distinct for each component. If there are parameters that are the same across all configs, you can specify them in the function directly.

yearly_config = SeasonalityInferConfig(
    seas_name="yearly",                     # name for seasonality
    col_name="toy",                         # column to generate Fourier series, fixed for yearly
    period=1.0,                             # seasonal period, fixed for yearly
    max_order=30,                           # max number of orders to model
    adjust_trend_param=dict(
        trend_average_col="year"
    ),                                      # column to adjust trend for method "seasonal_average"
    aggregation_period="W",                 # aggregation period
    offset=0                                # add this to the inferred result, default 0
)
quarterly_config = SeasonalityInferConfig(
    seas_name="quarterly",                  # name for seasonality
    col_name="toq",                         # column to generate Fourier series, fixed for quarterly
    period=1.0,                             # seasonal period, fixed for quarterly
    max_order=20,                           # max number of orders to model
    adjust_trend_param=dict(
        trend_average_col="year_quarter"
    ),                                      # column to adjust trend for method "seasonal_average"
    aggregation_period="2D",                # aggregation period
)
monthly_config = SeasonalityInferConfig(
    seas_name="monthly",                    # name for seasonality
    col_name="tom",                         # column to generate Fourier series, fixed for monthly
    period=1.0,                             # seasonal period, fixed for monthly
    max_order=20,                           # max number of orders to model
    adjust_trend_param=dict(
        trend_average_col="year_month"
    ),                                      # column to adjust trend for method "seasonal_average"
    aggregation_period="D"                  # aggregation period
)
weekly_config = SeasonalityInferConfig(
    seas_name="weekly",                     # name for seasonality
    col_name="tow",                         # column to generate Fourier series, fixed for weekly
    period=7.0,                             # seasonal period, fixed for weekly
    max_order=10,                           # max number of orders to model
    adjust_trend_param=dict(
        trend_average_col="year_woy_iso"
    ),                                      # column to adjust trend for method "seasonal_average"
    aggregation_period="D",                 # aggregation period
    tolerance=0.005,                        # allows 0.5% higher criterion for lower orders
)

Next, we put everything together to infer seasonality effects.

df = DataLoader().load_peyton_manning()
df[cst.TIME_COL] = pd.to_datetime(df[cst.TIME_COL])

model = SeasonalityInferrer()
result = model.infer_fourier_series_order(
    df=df,
    time_col=cst.TIME_COL,
    value_col=cst.VALUE_COL,
    configs=[
        yearly_config,
        quarterly_config,
        monthly_config,
        weekly_config
    ],
    adjust_trend_method=TrendAdjustMethodEnum.seasonal_average.name,
    fit_algorithm="linear",
    plotting=True,
    criterion="bic",
)

The method runs quickly and we can simply extract the inferred results from the output.

result["best_orders"]

Out:

{'yearly': 6, 'quarterly': 2, 'monthly': 1, 'weekly': 2}

We can also plot the results to see how the criterion varies with the order. As in other trade-off plots, the curve first goes down and then goes up, reaching its best value at some order in the middle.

# The [0] extracts the first seasonality component from the results.
plotly.io.show(result["result"][0]["fig"])

Holiday Inferrer

The Silverkite model supports modeling holidays and their neighboring days as indicators. Significant days are modeled separately, while similar days can be grouped together as one indicator, assuming their effects are the same.

It’s sometimes difficult to decide which holidays to include, and whether to model them separately or together. HolidayInferrer is a tool that can help you decide which holidays to model and how to model them. It can also automatically generate the holiday configuration parameters. Note that there are many ways to decide the holiday configurations, and you don’t have to strictly stick to the results from Holiday Inferrer.

How it works

The holiday inferrer estimates the effects of individual holidays and their neighboring days by comparing the observations on these days with a baseline taken before or after the holiday period. Then it ranks the effects by their magnitude. Depending on some thresholds, it decides whether to model a day independently, model it together with others, or not model it at all.

In detail, the first step is to unify the data frequency. For data that is less granular than daily (e.g. weekly), the holiday effect is automatically turned off. For data that is more granular than daily (e.g. hourly), the data is aggregated into daily data, since holidays are daily events. From this point on, we work with daily data.

Given a list of countries, the tool automatically pulls candidate holidays from the database. With the pre_search_days and post_search_days parameters, those holidays’ neighboring days are included in the candidate pool as well.

For every candidate holiday or neighboring day, the baseline is the average of the observations at a configurable set of offsets. For example, for data that exhibits strong weekly seasonality, the offsets can be (-7, 7), so that the baseline is the average of the observations on the same day of week in the previous week and in the following week. If a baseline day is itself a holiday, the tool moves one more step in the same direction. For example, if the holiday is New Year on 1/1 while 12/25 (7 days earlier) is Christmas, it will look at the value on 12/18 instead of 12/25 as the baseline.

The day’s effect is the average of the signed difference between the true observation and the baseline across all occurrences in the time series. The effects are ranked from the highest to the lowest by their absolute effects.
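
The baseline and effect computation described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the HolidayInferrer code: baseline_value and holiday_effect are hypothetical helpers, the series is a made-up dictionary of daily values, and the rule of stepping further by the same offset when a baseline day is a holiday is inferred from the New Year example.

```python
from datetime import date, timedelta


def baseline_value(series, day, holidays, offsets=(-7, 7)):
    """Average the observations found at each offset from `day`.

    If a baseline day is itself a holiday, keep stepping by the same offset.
    """
    values = []
    for offset in offsets:
        candidate = day + timedelta(days=offset)
        while candidate in holidays:
            candidate += timedelta(days=offset)
        if candidate in series:
            values.append(series[candidate])
    return sum(values) / len(values) if values else None


def holiday_effect(series, occurrences, holidays, offsets=(-7, 7)):
    """Mean signed difference between observation and baseline."""
    diffs = []
    for day in occurrences:
        base = baseline_value(series, day, holidays, offsets)
        if day in series and base is not None:
            diffs.append(series[day] - base)
    return sum(diffs) / len(diffs) if diffs else None


# Made-up daily series: flat at 100, with dips on Christmas and New Year.
start = date(2015, 12, 1)
series = {start + timedelta(days=i): 100.0 for i in range(62)}
series[date(2015, 12, 25)] = 60.0
series[date(2016, 1, 1)] = 80.0
holidays = {date(2015, 12, 25), date(2016, 1, 1)}

# For New Year, 12/25 is skipped in favor of 12/18; the other side uses 1/8.
new_year_baseline = baseline_value(series, date(2016, 1, 1), holidays)
new_year_effect = holiday_effect(series, [date(2016, 1, 1)], holidays)
christmas_effect = holiday_effect(series, [date(2015, 12, 25)], holidays)
print(new_year_baseline, new_year_effect, christmas_effect)
```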

To decide how each holiday is modeled, we rely on two parameters: independent_holiday_thres and together_holiday_thres. Both are between 0 and 1. Starting from the largest effect, we calculate the cumulative sum of the effects of all candidates. The days accumulated until the cumulative effect reaches independent_holiday_thres of the total effect are modeled independently (i.e., each day has an individual coefficient). We keep accumulating effects until the sum reaches together_holiday_thres; the days in between are grouped into “positive_group” and “negative_group”, with each group modeled together.
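
As a rough sketch of the thresholding logic, the function below ranks made-up effects and assigns each day to a bucket. It is an illustration under assumptions, not the library's code: assign_holiday_groups and the "dropped" bucket are hypothetical names, and the cumulative sum is assumed to be taken over absolute effects, consistent with the ranking described above.

```python
def assign_holiday_groups(effects, independent_holiday_thres, together_holiday_thres):
    """Split days into independent, grouped and dropped by cumulative effect."""
    ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
    total = sum(abs(effect) for _, effect in ranked)
    groups = {
        "independent": [],
        "positive_group": [],
        "negative_group": [],
        "dropped": [],
    }
    cumulative = 0.0
    for name, effect in ranked:
        # Compare the sum accumulated before this day, so the day that
        # crosses a threshold still belongs to the earlier bucket.
        if cumulative < independent_holiday_thres * total:
            groups["independent"].append(name)
        elif cumulative < together_holiday_thres * total:
            key = "positive_group" if effect > 0 else "negative_group"
            groups[key].append(name)
        else:
            groups["dropped"].append(name)
        cumulative += abs(effect)
    return groups


# Made-up effects whose absolute values sum to 100.
effects = {"A": -50, "B": 35, "C": 8, "D": -4, "E": 2, "F": 1}
groups = assign_holiday_groups(
    effects,
    independent_holiday_thres=0.8,
    together_holiday_thres=0.95,
)
print(groups)
```

Here A and B carry the first 85% of the total effect and get their own coefficients, C and D fall between the two thresholds and are grouped by sign, and E and F are left out.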

Example

Now we look at an example with the Peyton-Manning Wiki page view data.

import pandas as pd
import plotly
from greykite.algo.common.holiday_inferrer import HolidayInferrer
from greykite.common.data_loader import DataLoader
from greykite.common import constants as cst

df = DataLoader().load_peyton_manning()
df[cst.TIME_COL] = pd.to_datetime(df[cst.TIME_COL])

Let’s say we want to infer the holidays in the United States, and also consider the 2 days before and after each holiday as potential candidates.

hi = HolidayInferrer()
result = hi.infer_holidays(
    df=df,
    countries=["US"],                   # Search holidays in United States
    plot=True,                          # Output a plot
    pre_search_days=2,                  # Considers 2 days before each holiday
    post_search_days=2,                 # Considers 2 days after each holiday
    independent_holiday_thres=0.9,      # The first 90% of effects are modeled separately
    together_holiday_thres=0.99,        # The 90% to 99% of effects are modeled together
    baseline_offsets=[-7, 7]            # The baseline is the average of -7/+7 observations
)

We can plot the inferred holiday results.

plotly.io.show(result["fig"])

The class also has a method that generates the holiday configuration from the inferred results, in a form that the Silverkite model can consume directly.

hi.generate_daily_event_dict()

Out:

{'US_Labor Day':          date    event_name
0  2016-09-05  US_Labor Day
1  2017-09-04  US_Labor Day
2  2007-09-03  US_Labor Day
3  2008-09-01  US_Labor Day
4  2009-09-07  US_Labor Day
5  2010-09-06  US_Labor Day
6  2011-09-05  US_Labor Day
7  2012-09-03  US_Labor Day
8  2013-09-02  US_Labor Day
9  2014-09-01  US_Labor Day
10 2015-09-07  US_Labor Day, 'US_Christmas Day':          date        event_name
0  2016-12-26  US_Christmas Day
1  2017-12-25  US_Christmas Day
2  2007-12-25  US_Christmas Day
3  2008-12-25  US_Christmas Day
4  2009-12-25  US_Christmas Day
5  2010-12-24  US_Christmas Day
6  2011-12-26  US_Christmas Day
7  2012-12-25  US_Christmas Day
8  2013-12-25  US_Christmas Day
9  2014-12-25  US_Christmas Day
10 2015-12-25  US_Christmas Day, 'US_Labor Day_minus_1':          date            event_name
0  2016-09-04  US_Labor Day_minus_1
1  2017-09-03  US_Labor Day_minus_1
2  2007-09-02  US_Labor Day_minus_1
3  2008-08-31  US_Labor Day_minus_1
4  2009-09-06  US_Labor Day_minus_1
5  2010-09-05  US_Labor Day_minus_1
6  2011-09-04  US_Labor Day_minus_1
7  2012-09-02  US_Labor Day_minus_1
8  2013-09-01  US_Labor Day_minus_1
9  2014-08-31  US_Labor Day_minus_1
10 2015-09-06  US_Labor Day_minus_1, 'US_Martin Luther King Jr. Day':          date                     event_name
0  2016-01-18  US_Martin Luther King Jr. Day
1  2017-01-16  US_Martin Luther King Jr. Day
2  2007-01-15  US_Martin Luther King Jr. Day
3  2008-01-21  US_Martin Luther King Jr. Day
4  2009-01-19  US_Martin Luther King Jr. Day
5  2010-01-18  US_Martin Luther King Jr. Day
6  2011-01-17  US_Martin Luther King Jr. Day
7  2012-01-16  US_Martin Luther King Jr. Day
8  2013-01-21  US_Martin Luther King Jr. Day
9  2014-01-20  US_Martin Luther King Jr. Day
10 2015-01-19  US_Martin Luther King Jr. Day, 'US_Washingtons Birthday_minus_1':          date                       event_name
0  2016-02-14  US_Washingtons Birthday_minus_1
1  2017-02-19  US_Washingtons Birthday_minus_1
2  2007-02-18  US_Washingtons Birthday_minus_1
3  2008-02-17  US_Washingtons Birthday_minus_1
4  2009-02-15  US_Washingtons Birthday_minus_1
5  2010-02-14  US_Washingtons Birthday_minus_1
6  2011-02-20  US_Washingtons Birthday_minus_1
7  2012-02-19  US_Washingtons Birthday_minus_1
8  2013-02-17  US_Washingtons Birthday_minus_1
9  2014-02-16  US_Washingtons Birthday_minus_1
10 2015-02-15  US_Washingtons Birthday_minus_1, 'US_Thanksgiving_minus_2':          date               event_name
0  2016-11-22  US_Thanksgiving_minus_2
1  2017-11-21  US_Thanksgiving_minus_2
2  2007-11-20  US_Thanksgiving_minus_2
3  2008-11-25  US_Thanksgiving_minus_2
4  2009-11-24  US_Thanksgiving_minus_2
5  2010-11-23  US_Thanksgiving_minus_2
6  2011-11-22  US_Thanksgiving_minus_2
7  2012-11-20  US_Thanksgiving_minus_2
8  2013-11-26  US_Thanksgiving_minus_2
9  2014-11-25  US_Thanksgiving_minus_2
10 2015-11-24  US_Thanksgiving_minus_2, 'US_Washingtons Birthday_plus_1':          date                      event_name
0  2016-02-16  US_Washingtons Birthday_plus_1
1  2017-02-21  US_Washingtons Birthday_plus_1
2  2007-02-20  US_Washingtons Birthday_plus_1
3  2008-02-19  US_Washingtons Birthday_plus_1
4  2009-02-17  US_Washingtons Birthday_plus_1
5  2010-02-16  US_Washingtons Birthday_plus_1
6  2011-02-22  US_Washingtons Birthday_plus_1
7  2012-02-21  US_Washingtons Birthday_plus_1
8  2013-02-19  US_Washingtons Birthday_plus_1
9  2014-02-18  US_Washingtons Birthday_plus_1
10 2015-02-17  US_Washingtons Birthday_plus_1, 'US_New Years Day_plus_1':          date               event_name
0  2016-01-02  US_New Years Day_plus_1
1  2017-01-03  US_New Years Day_plus_1
2  2007-01-02  US_New Years Day_plus_1
3  2008-01-02  US_New Years Day_plus_1
4  2009-01-02  US_New Years Day_plus_1
5  2010-01-02  US_New Years Day_plus_1
6  2011-01-01  US_New Years Day_plus_1
7  2012-01-03  US_New Years Day_plus_1
8  2013-01-02  US_New Years Day_plus_1
9  2014-01-02  US_New Years Day_plus_1
10 2015-01-02  US_New Years Day_plus_1, 'US_Veterans Day_minus_2':          date               event_name
0  2016-11-09  US_Veterans Day_minus_2
1  2017-11-08  US_Veterans Day_minus_2
2  2007-11-10  US_Veterans Day_minus_2
3  2008-11-09  US_Veterans Day_minus_2
4  2009-11-09  US_Veterans Day_minus_2
5  2010-11-09  US_Veterans Day_minus_2
6  2011-11-09  US_Veterans Day_minus_2
7  2012-11-10  US_Veterans Day_minus_2
8  2013-11-09  US_Veterans Day_minus_2
9  2014-11-09  US_Veterans Day_minus_2
10 2015-11-09  US_Veterans Day_minus_2, 'US_Washingtons Birthday_plus_2':          date                      event_name
0  2016-02-17  US_Washingtons Birthday_plus_2
1  2017-02-22  US_Washingtons Birthday_plus_2
2  2007-02-21  US_Washingtons Birthday_plus_2
3  2008-02-20  US_Washingtons Birthday_plus_2
4  2009-02-18  US_Washingtons Birthday_plus_2
5  2010-02-17  US_Washingtons Birthday_plus_2
6  2011-02-23  US_Washingtons Birthday_plus_2
7  2012-02-22  US_Washingtons Birthday_plus_2
8  2013-02-20  US_Washingtons Birthday_plus_2
9  2014-02-19  US_Washingtons Birthday_plus_2
10 2015-02-18  US_Washingtons Birthday_plus_2, 'US_Christmas Day_plus_1':          date               event_name
0  2016-12-27  US_Christmas Day_plus_1
1  2017-12-26  US_Christmas Day_plus_1
2  2007-12-26  US_Christmas Day_plus_1
3  2008-12-26  US_Christmas Day_plus_1
4  2009-12-26  US_Christmas Day_plus_1
5  2010-12-25  US_Christmas Day_plus_1
6  2011-12-27  US_Christmas Day_plus_1
7  2012-12-26  US_Christmas Day_plus_1
8  2013-12-26  US_Christmas Day_plus_1
9  2014-12-26  US_Christmas Day_plus_1
10 2015-12-26  US_Christmas Day_plus_1, 'US_Memorial Day':          date       event_name
0  2016-05-30  US_Memorial Day
1  2017-05-29  US_Memorial Day
2  2007-05-28  US_Memorial Day
3  2008-05-26  US_Memorial Day
4  2009-05-25  US_Memorial Day
5  2010-05-31  US_Memorial Day
6  2011-05-30  US_Memorial Day
7  2012-05-28  US_Memorial Day
8  2013-05-27  US_Memorial Day
9  2014-05-26  US_Memorial Day
10 2015-05-25  US_Memorial Day, 'US_Veterans Day':          date       event_name
0  2016-11-11  US_Veterans Day
1  2017-11-10  US_Veterans Day
2  2007-11-12  US_Veterans Day
3  2008-11-11  US_Veterans Day
4  2009-11-11  US_Veterans Day
5  2010-11-11  US_Veterans Day
6  2011-11-11  US_Veterans Day
7  2012-11-12  US_Veterans Day
8  2013-11-11  US_Veterans Day
9  2014-11-11  US_Veterans Day
10 2015-11-11  US_Veterans Day, 'US_Washingtons Birthday_minus_2':          date                       event_name
0  2016-02-13  US_Washingtons Birthday_minus_2
1  2017-02-18  US_Washingtons Birthday_minus_2
2  2007-02-17  US_Washingtons Birthday_minus_2
3  2008-02-16  US_Washingtons Birthday_minus_2
4  2009-02-14  US_Washingtons Birthday_minus_2
5  2010-02-13  US_Washingtons Birthday_minus_2
6  2011-02-19  US_Washingtons Birthday_minus_2
7  2012-02-18  US_Washingtons Birthday_minus_2
8  2013-02-16  US_Washingtons Birthday_minus_2
9  2014-02-15  US_Washingtons Birthday_minus_2
10 2015-02-14  US_Washingtons Birthday_minus_2, 'US_Thanksgiving_minus_1':          date               event_name
0  2016-11-23  US_Thanksgiving_minus_1
1  2017-11-22  US_Thanksgiving_minus_1
2  2007-11-21  US_Thanksgiving_minus_1
3  2008-11-26  US_Thanksgiving_minus_1
4  2009-11-25  US_Thanksgiving_minus_1
5  2010-11-24  US_Thanksgiving_minus_1
6  2011-11-23  US_Thanksgiving_minus_1
7  2012-11-21  US_Thanksgiving_minus_1
8  2013-11-27  US_Thanksgiving_minus_1
9  2014-11-26  US_Thanksgiving_minus_1
10 2015-11-25  US_Thanksgiving_minus_1, 'US_Labor Day_minus_2':          date            event_name
0  2016-09-03  US_Labor Day_minus_2
1  2017-09-02  US_Labor Day_minus_2
2  2007-09-01  US_Labor Day_minus_2
3  2008-08-30  US_Labor Day_minus_2
4  2009-09-05  US_Labor Day_minus_2
5  2010-09-04  US_Labor Day_minus_2
6  2011-09-03  US_Labor Day_minus_2
7  2012-09-01  US_Labor Day_minus_2
8  2013-08-31  US_Labor Day_minus_2
9  2014-08-30  US_Labor Day_minus_2
10 2015-09-05  US_Labor Day_minus_2, 'US_Columbus Day':          date       event_name
0  2016-10-10  US_Columbus Day
1  2017-10-09  US_Columbus Day
2  2007-10-08  US_Columbus Day
3  2008-10-13  US_Columbus Day
4  2009-10-12  US_Columbus Day
5  2010-10-11  US_Columbus Day
6  2011-10-10  US_Columbus Day
7  2012-10-08  US_Columbus Day
8  2013-10-14  US_Columbus Day
9  2014-10-13  US_Columbus Day
10 2015-10-12  US_Columbus Day, 'US_Memorial Day_plus_1':          date              event_name
0  2016-05-31  US_Memorial Day_plus_1
1  2017-05-30  US_Memorial Day_plus_1
2  2007-05-29  US_Memorial Day_plus_1
3  2008-05-27  US_Memorial Day_plus_1
4  2009-05-26  US_Memorial Day_plus_1
5  2010-06-01  US_Memorial Day_plus_1
6  2011-05-31  US_Memorial Day_plus_1
7  2012-05-29  US_Memorial Day_plus_1
8  2013-05-28  US_Memorial Day_plus_1
9  2014-05-27  US_Memorial Day_plus_1
10 2015-05-26  US_Memorial Day_plus_1, 'US_Halloween':          date    event_name
0  2016-10-31  US_Halloween
1  2017-10-31  US_Halloween
2  2007-10-31  US_Halloween
3  2008-10-31  US_Halloween
4  2009-10-31  US_Halloween
5  2010-10-31  US_Halloween
6  2011-10-31  US_Halloween
7  2012-10-31  US_Halloween
8  2013-10-31  US_Halloween
9  2014-10-31  US_Halloween
10 2015-10-31  US_Halloween, 'US_Labor Day_plus_1':          date           event_name
0  2016-09-06  US_Labor Day_plus_1
1  2017-09-05  US_Labor Day_plus_1
2  2007-09-04  US_Labor Day_plus_1
3  2008-09-02  US_Labor Day_plus_1
4  2009-09-08  US_Labor Day_plus_1
5  2010-09-07  US_Labor Day_plus_1
6  2011-09-06  US_Labor Day_plus_1
7  2012-09-04  US_Labor Day_plus_1
8  2013-09-03  US_Labor Day_plus_1
9  2014-09-02  US_Labor Day_plus_1
10 2015-09-08  US_Labor Day_plus_1, 'US_Martin Luther King Jr. Day_minus_1':          date                             event_name
0  2016-01-17  US_Martin Luther King Jr. Day_minus_1
1  2017-01-15  US_Martin Luther King Jr. Day_minus_1
2  2007-01-14  US_Martin Luther King Jr. Day_minus_1
3  2008-01-20  US_Martin Luther King Jr. Day_minus_1
4  2009-01-18  US_Martin Luther King Jr. Day_minus_1
5  2010-01-17  US_Martin Luther King Jr. Day_minus_1
6  2011-01-16  US_Martin Luther King Jr. Day_minus_1
7  2012-01-15  US_Martin Luther King Jr. Day_minus_1
8  2013-01-20  US_Martin Luther King Jr. Day_minus_1
9  2014-01-19  US_Martin Luther King Jr. Day_minus_1
10 2015-01-18  US_Martin Luther King Jr. Day_minus_1, 'US_Independence Day_minus_2':          date                   event_name
0  2016-07-02  US_Independence Day_minus_2
1  2017-07-02  US_Independence Day_minus_2
2  2007-07-02  US_Independence Day_minus_2
3  2008-07-02  US_Independence Day_minus_2
4  2009-07-01  US_Independence Day_minus_2
5  2010-07-03  US_Independence Day_minus_2
6  2011-07-02  US_Independence Day_minus_2
7  2012-07-02  US_Independence Day_minus_2
8  2013-07-02  US_Independence Day_minus_2
9  2014-07-02  US_Independence Day_minus_2
10 2015-07-01  US_Independence Day_minus_2, 'US_Christmas Day_minus_1':          date                event_name
0  2016-12-25  US_Christmas Day_minus_1
1  2017-12-24  US_Christmas Day_minus_1
2  2007-12-24  US_Christmas Day_minus_1
3  2008-12-24  US_Christmas Day_minus_1
4  2009-12-24  US_Christmas Day_minus_1
5  2010-12-23  US_Christmas Day_minus_1
6  2011-12-25  US_Christmas Day_minus_1
7  2012-12-24  US_Christmas Day_minus_1
8  2013-12-24  US_Christmas Day_minus_1
9  2014-12-24  US_Christmas Day_minus_1
10 2015-12-24  US_Christmas Day_minus_1, 'US_Halloween_plus_2':          date           event_name
0  2016-11-02  US_Halloween_plus_2
1  2017-11-02  US_Halloween_plus_2
2  2007-11-02  US_Halloween_plus_2
3  2008-11-02  US_Halloween_plus_2
4  2009-11-02  US_Halloween_plus_2
5  2010-11-02  US_Halloween_plus_2
6  2011-11-02  US_Halloween_plus_2
7  2012-11-02  US_Halloween_plus_2
8  2013-11-02  US_Halloween_plus_2
9  2014-11-02  US_Halloween_plus_2
10 2015-11-02  US_Halloween_plus_2, 'US_Independence Day_minus_1':          date                   event_name
0  2016-07-03  US_Independence Day_minus_1
1  2017-07-03  US_Independence Day_minus_1
2  2007-07-03  US_Independence Day_minus_1
3  2008-07-03  US_Independence Day_minus_1
4  2009-07-02  US_Independence Day_minus_1
5  2010-07-04  US_Independence Day_minus_1
6  2011-07-03  US_Independence Day_minus_1
7  2012-07-03  US_Independence Day_minus_1
8  2013-07-03  US_Independence Day_minus_1
9  2014-07-03  US_Independence Day_minus_1
10 2015-07-02  US_Independence Day_minus_1, 'US_Veterans Day_minus_1':          date               event_name
0  2016-11-10  US_Veterans Day_minus_1
1  2017-11-09  US_Veterans Day_minus_1
2  2007-11-11  US_Veterans Day_minus_1
3  2008-11-10  US_Veterans Day_minus_1
4  2009-11-10  US_Veterans Day_minus_1
5  2010-11-10  US_Veterans Day_minus_1
6  2011-11-10  US_Veterans Day_minus_1
7  2012-11-11  US_Veterans Day_minus_1
8  2013-11-10  US_Veterans Day_minus_1
9  2014-11-10  US_Veterans Day_minus_1
10 2015-11-10  US_Veterans Day_minus_1, 'US_Martin Luther King Jr. Day_plus_1':          date                            event_name
0  2016-01-19  US_Martin Luther King Jr. Day_plus_1
1  2017-01-17  US_Martin Luther King Jr. Day_plus_1
2  2007-01-16  US_Martin Luther King Jr. Day_plus_1
3  2008-01-22  US_Martin Luther King Jr. Day_plus_1
4  2009-01-20  US_Martin Luther King Jr. Day_plus_1
5  2010-01-19  US_Martin Luther King Jr. Day_plus_1
6  2011-01-18  US_Martin Luther King Jr. Day_plus_1
7  2012-01-17  US_Martin Luther King Jr. Day_plus_1
8  2013-01-22  US_Martin Luther King Jr. Day_plus_1
9  2014-01-21  US_Martin Luther King Jr. Day_plus_1
10 2015-01-20  US_Martin Luther King Jr. Day_plus_1, 'US_Halloween_minus_2':          date            event_name
0  2016-10-29  US_Halloween_minus_2
1  2017-10-29  US_Halloween_minus_2
2  2007-10-29  US_Halloween_minus_2
3  2008-10-29  US_Halloween_minus_2
4  2009-10-29  US_Halloween_minus_2
5  2010-10-29  US_Halloween_minus_2
6  2011-10-29  US_Halloween_minus_2
7  2012-10-29  US_Halloween_minus_2
8  2013-10-29  US_Halloween_minus_2
9  2014-10-29  US_Halloween_minus_2
10 2015-10-29  US_Halloween_minus_2, 'US_Independence Day_plus_1':          date                  event_name
0  2016-07-05  US_Independence Day_plus_1
1  2017-07-05  US_Independence Day_plus_1
2  2007-07-05  US_Independence Day_plus_1
3  2008-07-05  US_Independence Day_plus_1
4  2009-07-04  US_Independence Day_plus_1
5  2010-07-06  US_Independence Day_plus_1
6  2011-07-05  US_Independence Day_plus_1
7  2012-07-05  US_Independence Day_plus_1
8  2013-07-05  US_Independence Day_plus_1
9  2014-07-05  US_Independence Day_plus_1
10 2015-07-04  US_Independence Day_plus_1, 'US_Martin Luther King Jr. Day_plus_2':          date                            event_name
0  2016-01-20  US_Martin Luther King Jr. Day_plus_2
1  2017-01-18  US_Martin Luther King Jr. Day_plus_2
2  2007-01-17  US_Martin Luther King Jr. Day_plus_2
3  2008-01-23  US_Martin Luther King Jr. Day_plus_2
4  2009-01-21  US_Martin Luther King Jr. Day_plus_2
5  2010-01-20  US_Martin Luther King Jr. Day_plus_2
6  2011-01-19  US_Martin Luther King Jr. Day_plus_2
7  2012-01-18  US_Martin Luther King Jr. Day_plus_2
8  2013-01-23  US_Martin Luther King Jr. Day_plus_2
9  2014-01-22  US_Martin Luther King Jr. Day_plus_2
10 2015-01-21  US_Martin Luther King Jr. Day_plus_2, 'US_Independence Day':          date           event_name
0  2016-07-04  US_Independence Day
1  2017-07-04  US_Independence Day
2  2007-07-04  US_Independence Day
3  2008-07-04  US_Independence Day
4  2009-07-03  US_Independence Day
5  2010-07-05  US_Independence Day
6  2011-07-04  US_Independence Day
7  2012-07-04  US_Independence Day
8  2013-07-04  US_Independence Day
9  2014-07-04  US_Independence Day
10 2015-07-03  US_Independence Day, 'US_Labor Day_plus_2':          date           event_name
0  2016-09-07  US_Labor Day_plus_2
1  2017-09-06  US_Labor Day_plus_2
2  2007-09-05  US_Labor Day_plus_2
3  2008-09-03  US_Labor Day_plus_2
4  2009-09-09  US_Labor Day_plus_2
5  2010-09-08  US_Labor Day_plus_2
6  2011-09-07  US_Labor Day_plus_2
7  2012-09-05  US_Labor Day_plus_2
8  2013-09-04  US_Labor Day_plus_2
9  2014-09-03  US_Labor Day_plus_2
10 2015-09-09  US_Labor Day_plus_2, 'US_New Years Day':          date        event_name
0  2016-01-01  US_New Years Day
1  2017-01-02  US_New Years Day
2  2007-01-01  US_New Years Day
3  2008-01-01  US_New Years Day
4  2009-01-01  US_New Years Day
5  2010-01-01  US_New Years Day
6  2010-12-31  US_New Years Day
7  2012-01-02  US_New Years Day
8  2013-01-01  US_New Years Day
9  2014-01-01  US_New Years Day
10 2015-01-01  US_New Years Day, 'US_Columbus Day_plus_1':          date              event_name
0  2016-10-11  US_Columbus Day_plus_1
1  2017-10-10  US_Columbus Day_plus_1
2  2007-10-09  US_Columbus Day_plus_1
3  2008-10-14  US_Columbus Day_plus_1
4  2009-10-13  US_Columbus Day_plus_1
5  2010-10-12  US_Columbus Day_plus_1
6  2011-10-11  US_Columbus Day_plus_1
7  2012-10-09  US_Columbus Day_plus_1
8  2013-10-15  US_Columbus Day_plus_1
9  2014-10-14  US_Columbus Day_plus_1
10 2015-10-13  US_Columbus Day_plus_1, 'US_Martin Luther King Jr. Day_minus_2':          date                             event_name
0  2016-01-16  US_Martin Luther King Jr. Day_minus_2
1  2017-01-14  US_Martin Luther King Jr. Day_minus_2
2  2007-01-13  US_Martin Luther King Jr. Day_minus_2
3  2008-01-19  US_Martin Luther King Jr. Day_minus_2
4  2009-01-17  US_Martin Luther King Jr. Day_minus_2
5  2010-01-16  US_Martin Luther King Jr. Day_minus_2
6  2011-01-15  US_Martin Luther King Jr. Day_minus_2
7  2012-01-14  US_Martin Luther King Jr. Day_minus_2
8  2013-01-19  US_Martin Luther King Jr. Day_minus_2
9  2014-01-18  US_Martin Luther King Jr. Day_minus_2
10 2015-01-17  US_Martin Luther King Jr. Day_minus_2, 'Holiday_positive_group':          date event_name
0  2016-11-25      event
1  2017-11-24      event
2  2007-11-23      event
3  2008-11-28      event
4  2009-11-27      event
..        ...        ...
61 2010-12-30      event
62 2012-01-01      event
63 2012-12-31      event
64 2013-12-31      event
65 2014-12-31      event

[66 rows x 2 columns], 'Holiday_negative_group':          date event_name
0  2016-11-13      event
1  2017-11-12      event
2  2007-11-14      event
3  2008-11-13      event
4  2009-11-13      event
..        ...        ...
83 2011-12-28      event
84 2012-12-27      event
85 2013-12-27      event
86 2014-12-27      event
87 2015-12-27      event

[88 rows x 2 columns]}

Holiday Grouper

Going one step further, HolidayGrouper is a convenient tool that automatically groups similar holidays and their neighboring days together based on their estimated impact, using clustering algorithms. This helps to (1) reduce the number of parameters to be estimated and ensure that each group has sufficient data points to be estimated reliably; (2) make sure different holidays can be modeled separately to avoid confounding effects.

We also provide flexible diagnostics to help users choose the number of groups, as well as utility functions to spot check which group a holiday belongs to and which similar holidays are in the same group.

How it works

First, we supply the algorithm with a list of holidays and their dates, as well as the time series of interest. In addition, we specify a dictionary of neighboring days on which a holiday may have an effect. For example, Thanksgiving always falls on a Thursday, and we may expect a holiday effect that starts the day before and lasts until the following Monday; we can then specify "Thanksgiving": (1, 4) as an item in the dictionary. All the neighboring days specified this way are added to the events pool. Note that each neighboring day is treated as a separate event and may not end up in the same group as its original holiday date. That is, "Thanksgiving_plus_4" (Monday) may have a very different impact than "Thanksgiving" (Thursday), and they may end up in different groups.
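
To make the neighboring-day expansion concrete, here is a small standard-library sketch. It is not the HolidayGrouper code: expand_neighbors is a hypothetical helper, and the only things taken from this page are the (days before, days after) tuple and the "_minus_k" / "_plus_k" naming seen in the event names.

```python
from datetime import date, timedelta


def expand_neighbors(holiday_dates, neighbor_days):
    """Turn each holiday and its neighboring days into separate events."""
    events = {}
    for name, dates in holiday_dates.items():
        before, after = neighbor_days.get(name, (0, 0))
        for shift in range(-before, after + 1):
            if shift < 0:
                label = f"{name}_minus_{-shift}"
            elif shift > 0:
                label = f"{name}_plus_{shift}"
            else:
                label = name
            events[label] = [day + timedelta(days=shift) for day in dates]
    return events


holiday_dates = {"Thanksgiving": [date(2015, 11, 26), date(2016, 11, 24)]}
neighbor_days = {"Thanksgiving": (1, 4)}  # 1 day before, 4 days after
events = expand_neighbors(holiday_dates, neighbor_days)

for label, days in events.items():
    print(label, [day.isoformat() for day in days])
```

Each of the six resulting events is scored and clustered on its own, which is why "Thanksgiving_plus_4" can land in a different group than "Thanksgiving".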

Second, holidays that fall on weekdays may have a different impact from those that fall on weekends. For example, "Christmas Day_WE" may have a different effect from "Christmas Day_WD". There are two built-in options (“wd_we”: weekday vs. weekend; “dow_grouped”: weekday, Sat, Sun), and you can customize the grouping via the get_suffix_func parameter.
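
To make the two built-in options concrete, here is a minimal sketch of suffix functions that reproduce the "_WD", "_WE", "_Sat" and "_Sun" labels seen in this tutorial. The function names and the date-in, suffix-out signature are assumptions; check the HolidayGrouper documentation for the exact callable that get_suffix_func expects.

```python
from datetime import date


def wd_we_suffix(day):
    """Weekday vs. weekend (illustrative stand-in for the "wd_we" option)."""
    return "_WE" if day.weekday() >= 5 else "_WD"


def dow_grouped_suffix(day):
    """Weekday, Saturday, Sunday (illustrative stand-in for "dow_grouped")."""
    if day.weekday() == 5:
        return "_Sat"
    if day.weekday() == 6:
        return "_Sun"
    return "_WD"


# Christmas Day fell on a Friday in 2015 and on a Sunday in 2016.
for day in (date(2015, 12, 25), date(2016, 12, 25)):
    print(
        day.isoformat(),
        "Christmas Day" + wd_we_suffix(day),
        "Christmas Day" + dow_grouped_suffix(day),
    )
```

A custom function in the same spirit could return any label that splits a holiday into variants you expect to behave differently.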

Next, each single event gets a score: its estimated (relative) impact, computed with the same methodology as in the Holiday Inferrer (e.g. -0.1 means 10% lower than the baseline). For example, with baseline_offsets=[-7, 7] each event day is compared against the days 7 days before and 7 days after it. The score is then used by the clustering algorithm, so if an event shows up only once in the input time series, its estimated impact may not be accurate. You can set the minimal number of occurrences of an event with the parameter min_n_days (set it to 1 if you are okay with including events that appear on only a single day in the input data). You can also specify the minimal absolute average score required for an event to be kept, via min_abs_avg_score. For example, an event with an average score of -1% (across all its occurrences) may not be worth including in the model; events whose absolute average score is below min_abs_avg_score are excluded before clustering. Finally, if an event has inconsistent scores (e.g. two occurrences score -8% and +5% respectively), the effect could be noise rather than signal. These events are excluded as well. This is handled automatically, and the user does not need to worry about it.
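
The scoring and filtering logic can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: `relative_score` and `keep_event` are hypothetical helpers, and the same-sign ratio of 0.66 used as the consistency rule is an assumed value chosen for the example.

```python
from statistics import mean


def relative_score(values, idx, baseline_offsets=(-7, 7)):
    """Relative deviation of values[idx] from the mean of its baseline days."""
    baseline_values = [
        values[idx + offset]
        for offset in baseline_offsets
        if 0 <= idx + offset < len(values)  # Skips offsets outside the series.
    ]
    return values[idx] / mean(baseline_values) - 1


def keep_event(scores, min_n_days=2, min_abs_avg_score=0.03, min_same_sign_ratio=0.66):
    """Decides whether an event is reliable enough to be clustered."""
    if len(scores) < min_n_days:
        return False  # Too few occurrences.
    avg = mean(scores)
    if abs(avg) < min_abs_avg_score:
        return False  # Average effect too small.
    # Assumed consistency rule: most occurrences share the sign of the average.
    same_sign_ratio = sum(1 for s in scores if s * avg > 0) / len(scores)
    return same_sign_ratio >= min_same_sign_ratio


# A flat series of 100 with a dip to 90 on day 10: roughly 10% below baseline.
series = [100.0] * 21
series[10] = 90.0
score = relative_score(series, 10)
print(round(score, 4))

print(keep_event([0.10]))                # False: only one occurrence.
print(keep_event([-0.08, 0.05]))         # False: the average (-1.5%) is too small.
print(keep_event([0.30, -0.05, -0.05]))  # False: signs are inconsistent.
print(keep_event([-0.06, -0.05, -0.07])) # True: consistent and large enough.
```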

The last step of the grouper is to group events that have similar effects and generate daily_event_df_dict. We provide two options for clustering: Kernel Density Estimation (clustering_method="kde") and K-means (clustering_method="kmeans"). With K-means, you can set n_clusters to your desired number of groups. With KDE clustering, you can change the default bandwidth parameter to adjust the number of groups you get. Depending on the length of the time series and the number of holidays considered, we recommend 5 to 15 groups. You can check the visualization / diagnostics via result_dict["kmeans_plot"] or result_dict["kde_plot"], respectively. See group_holidays for more parameter details.
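
To show what clustering on a single score per event looks like, the sketch below runs a small one-dimensional K-means on a subset of the average scores printed in the example output of this tutorial. The function `kmeans_1d` and its evenly spaced initialization are assumptions made for illustration; the grouper's actual clustering may differ in initialization, number of clusters and results.

```python
def kmeans_1d(values, n_clusters, n_iter=100):
    """Minimal 1-D K-means (Lloyd's algorithm) with evenly spaced initial centers."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_clusters
    centers = [lo + (i + 0.5) * step for i in range(n_clusters)]
    labels = None
    for _ in range(n_iter):
        # Assigns every value to its nearest center.
        new_labels = [
            min(range(n_clusters), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        if new_labels == labels:
            break  # Assignments are stable.
        labels = new_labels
        # Moves every non-empty center to the mean of its members.
        for c in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


avg_scores = {
    "Christmas Day_WD_plus_3_Sun": -0.063512,
    "Labor Day_WD_minus_1_Sun": -0.060883,
    "New Year's Day_WD_minus_3_Sat": -0.056505,
    "Christmas Day_WD_minus_2_Sun": -0.056145,
    "Independence Day (Observed)_WD": -0.055264,
    "Christmas Day_WD": -0.054741,
    "Christmas Day_WD_plus_1_Sat": -0.052137,
    "New Year's Day_WD_plus_1_WD": 0.033386,
    "New Year's Day_WD_plus_2_WD": 0.033438,
    "New Year's Day_WD_minus_2_WD": 0.034326,
    "Martin Luther King Jr. Day_WD": 0.056942,
    "New Year's Day_WD_plus_2_Sat": 0.107047,
}

labels, centers = kmeans_1d(list(avg_scores.values()), n_clusters=3)
for (event, score), label in zip(avg_scores.items(), labels):
    print(f"cluster {label}: {event} ({score:+.3f})")
```

On this subset the three clusters separate the negative events, the moderately positive events, and the single strongly positive event, which is the kind of split the diagnostics plots help you judge.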

Example

Now we look at an example with the Peyton Manning Wikipedia page view data.

import pandas as pd
import plotly
from greykite.algo.common.holiday_grouper import HolidayGrouper
from greykite.common.data_loader import DataLoader
from greykite.common.features.timeseries_features import get_holidays
from greykite.common import constants as cst

df = DataLoader().load_peyton_manning()
df[cst.TIME_COL] = pd.to_datetime(df[cst.TIME_COL])

Let’s generate a list of holidays in the United States, and we also specify the neighboring days we want to consider in the holiday model.

year_start = df[cst.TIME_COL].dt.year.min() - 1
year_end = df[cst.TIME_COL].dt.year.max() + 1
holiday_df = get_holidays(countries=["US"], year_start=year_start, year_end=year_end)["US"]

# Defines the number of pre / post days that a holiday has impact on.
# If not specified, (0, 0) will be used.
holiday_impact_dict = {
    "Christmas Day": (4, 3),  # 12/25.
    "Independence Day": (4, 4),  # 7/4.
    "Juneteenth National Independence Day": (3, 3),  # 6/19.
    "Labor Day": (3, 1),  # Monday.
    "Martin Luther King Jr. Day": (3, 1),  # Monday.
    "Memorial Day": (3, 1),  # Monday.
    "New Year's Day": (3, 4),  # 1/1.
    "Thanksgiving": (1, 4),  # Thursday.
}

Now we run the holiday grouper with K-means clustering.

# Instantiates `HolidayGrouper`.
hg = HolidayGrouper(
    df=df,
    time_col=cst.TIME_COL,
    value_col=cst.VALUE_COL,
    holiday_df=holiday_df,
    holiday_date_col="date",
    holiday_name_col="event_name",
    holiday_impact_dict=holiday_impact_dict,
    get_suffix_func="dow_grouped"
)

# Runs holiday grouper using k-means with diagnostics.
hg.group_holidays(
    baseline_offsets=[-7, 7],
    min_n_days=2,
    min_abs_avg_score=0.03,
    clustering_method="kmeans",
    n_clusters=6,
    include_diagnostics=True
)

result_dict = hg.result_dict
daily_event_df_dict = result_dict["daily_event_df_dict"]  # Can be directly used in events.

Check the results. For example, we can check the scores and grouping of New Year’s Day when it falls on a weekday.

hg.check_scores("New Year's Day_WD")
hg.check_holiday_group("New Year's Day_WD")

Out:

New Year's Day_WD_plus_1_WD:
Dates: ['2008-01-02', '2009-01-02', '2013-01-02', '2014-01-02', '2015-01-02']
Scores: [0.03379344256327239, 0.04128768523374012, 0.03341019990536573, 0.04014992076091155, 0.018290402348765396]

New Year's Day_WD_plus_2_WD:
Dates: ['2008-01-03', '2013-01-03', '2014-01-03']
Scores: [0.022465804391486117, 0.041658084709366515, 0.036190130685515846]

New Year's Day_WD_minus_3_Sat:
Dates: ['2007-12-29', '2012-12-29']
Scores: [-0.019154513001846295, -0.09385493853788847]

New Year's Day_WD_minus_2_WD:
Dates: ['2008-12-30', '2009-12-30', '2013-12-30', '2014-12-30', '2015-12-30']
Scores: [0.022987608584582927, 0.04274070828630996, 0.09834689253916061, -0.0309835669755046, 0.03853585871691109]

New Year's Day_WD_plus_2_Sat:
Dates: ['2009-01-03', '2015-01-03']
Scores: [0.14825560720626918, 0.06583931841025273]

Average impact:
{"New Year's Day_WD_plus_1_WD": 0.033386330162411035, "New Year's Day_WD_plus_2_WD": 0.033438006595456156, "New Year's Day_WD_minus_3_Sat": -0.056504725769867384, "New Year's Day_WD_minus_2_WD": 0.034325500230291996, "New Year's Day_WD_plus_2_Sat": 0.10704746280826095}
`holiday_group_2` contains events matching the provided pattern.
This group includes 8 distinct events.

         date       event_name                    original_name  avg_score
0  2008-12-28  holiday_group_2      Christmas Day_WD_plus_3_Sun  -0.063512
1  2014-12-28  holiday_group_2      Christmas Day_WD_plus_3_Sun  -0.063512
2  2006-09-03  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
3  2007-09-02  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
4  2008-08-31  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
5  2009-09-06  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
6  2010-09-05  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
7  2011-09-04  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
8  2012-09-02  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
9  2013-09-01  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
10 2014-08-31  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
11 2015-09-06  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
12 2016-09-04  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
13 2017-09-03  holiday_group_2         Labor Day_WD_minus_1_Sun  -0.060883
14 2007-12-29  holiday_group_2    New Year's Day_WD_minus_3_Sat  -0.056505
15 2012-12-29  holiday_group_2    New Year's Day_WD_minus_3_Sat  -0.056505
16 2007-12-23  holiday_group_2     Christmas Day_WD_minus_2_Sun  -0.056145
17 2012-12-23  holiday_group_2     Christmas Day_WD_minus_2_Sun  -0.056145
18 2009-07-03  holiday_group_2   Independence Day (Observed)_WD  -0.055264
19 2009-07-03  holiday_group_2  Independence Day_Sat_minus_1_WD  -0.055264
20 2010-07-05  holiday_group_2   Independence Day (Observed)_WD  -0.055264
21 2015-07-03  holiday_group_2   Independence Day (Observed)_WD  -0.055264
22 2015-07-03  holiday_group_2  Independence Day_Sat_minus_1_WD  -0.055264
23 2006-12-25  holiday_group_2                 Christmas Day_WD  -0.054741
24 2007-12-25  holiday_group_2                 Christmas Day_WD  -0.054741
25 2008-12-25  holiday_group_2                 Christmas Day_WD  -0.054741
26 2009-12-25  holiday_group_2                 Christmas Day_WD  -0.054741
27 2012-12-25  holiday_group_2                 Christmas Day_WD  -0.054741
28 2013-12-25  holiday_group_2                 Christmas Day_WD  -0.054741
29 2014-12-25  holiday_group_2                 Christmas Day_WD  -0.054741
30 2015-12-25  holiday_group_2                 Christmas Day_WD  -0.054741
31 2017-12-25  holiday_group_2                 Christmas Day_WD  -0.054741
32 2009-12-26  holiday_group_2      Christmas Day_WD_plus_1_Sat  -0.052137
33 2015-12-26  holiday_group_2      Christmas Day_WD_plus_1_Sat  -0.052137
`holiday_group_4` contains events matching the provided pattern.
This group includes 4 distinct events.

         date       event_name                  original_name  avg_score
0  2007-01-02  holiday_group_4    New Year's Day_WD_plus_1_WD   0.033386
1  2008-01-02  holiday_group_4    New Year's Day_WD_plus_1_WD   0.033386
2  2009-01-02  holiday_group_4    New Year's Day_WD_plus_1_WD   0.033386
3  2013-01-02  holiday_group_4    New Year's Day_WD_plus_1_WD   0.033386
4  2014-01-02  holiday_group_4    New Year's Day_WD_plus_1_WD   0.033386
5  2015-01-02  holiday_group_4    New Year's Day_WD_plus_1_WD   0.033386
6  2007-01-03  holiday_group_4    New Year's Day_WD_plus_2_WD   0.033438
7  2008-01-03  holiday_group_4    New Year's Day_WD_plus_2_WD   0.033438
8  2013-01-03  holiday_group_4    New Year's Day_WD_plus_2_WD   0.033438
9  2014-01-03  holiday_group_4    New Year's Day_WD_plus_2_WD   0.033438
10 2008-12-30  holiday_group_4   New Year's Day_WD_minus_2_WD   0.034326
11 2009-12-30  holiday_group_4   New Year's Day_WD_minus_2_WD   0.034326
12 2013-12-30  holiday_group_4   New Year's Day_WD_minus_2_WD   0.034326
13 2014-12-30  holiday_group_4   New Year's Day_WD_minus_2_WD   0.034326
14 2015-12-30  holiday_group_4   New Year's Day_WD_minus_2_WD   0.034326
15 2006-01-16  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
16 2007-01-15  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
17 2008-01-21  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
18 2009-01-19  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
19 2010-01-18  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
20 2011-01-17  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
21 2012-01-16  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
22 2013-01-21  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
23 2014-01-20  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
24 2015-01-19  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
25 2016-01-18  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
26 2017-01-16  holiday_group_4  Martin Luther King Jr. Day_WD   0.056942
`holiday_group_5` contains events matching the provided pattern.
This group includes 1 distinct events.

        date       event_name                 original_name  avg_score
0 2009-01-03  holiday_group_5  New Year's Day_WD_plus_2_Sat   0.107047
1 2015-01-03  holiday_group_5  New Year's Day_WD_plus_2_Sat   0.107047

Check the diagnostics plot for K-means clustering.

plotly.io.show(result_dict["kmeans_plot"])

Now let’s try clustering using KDE and check the results.

hg.group_holidays(
    baseline_offsets=[-7, 7],
    min_n_days=1,
    min_abs_avg_score=0.03,
    bandwidth_multiplier=0.5,
    clustering_method="kde"
)
result_dict = hg.result_dict
daily_event_df_dict = result_dict["daily_event_df_dict"]

plotly.io.show(result_dict["kde_plot"])
# Checks the number of events in each group.
for event_group, event_df in daily_event_df_dict.items():
    print(f"{event_group}: contains {event_df.shape[0]} days.")

Out:

holiday_group_0: contains 33 days.
holiday_group_1: contains 43 days.
holiday_group_2: contains 66 days.
holiday_group_3: contains 1 days.
holiday_group_4: contains 34 days.
holiday_group_5: contains 4 days.
holiday_group_6: contains 5 days.
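
Since one goal of grouping is to give each group enough data points, it can be worth flagging groups that contain very few days. The sketch below does this with the group sizes printed above; the threshold `min_days_per_group` is a hypothetical value chosen for illustration, not a Greykite parameter.

```python
# Group sizes copied from the KDE output above.
group_sizes = {
    "holiday_group_0": 33,
    "holiday_group_1": 43,
    "holiday_group_2": 66,
    "holiday_group_3": 1,
    "holiday_group_4": 34,
    "holiday_group_5": 4,
    "holiday_group_6": 5,
}

min_days_per_group = 5  # Hypothetical threshold.

small_groups = [
    group for group, n_days in group_sizes.items()
    if n_days < min_days_per_group
]
print(small_groups)
```

Groups flagged this way may be candidates for a larger min_n_days or a different bandwidth, depending on how much data you consider sufficient.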

