Auto Configuration Tools
The Silverkite model has many hyperparameters to tune. Besides domain knowledge, there are tools that can help find good choices for certain hyperparameters. This tutorial presents three of them:
seasonality inferrer
holiday inferrer
holiday grouper
Note
If you use the model templates, you can specify the “auto” option for certain model components (growth, seasonality and holiday), and the auto configuration tool will be activated automatically. See auto seasonality, auto growth and auto holidays for how to activate them. This doc explains how the “auto” options work behind the scenes. You can reproduce the “auto” options with the Seasonality Inferrer and Holiday Inferrer below. If you are doing a train-test split, run the inferrers on the training data only, which is closer to how the model is used in practice.
Seasonality Inferrer
The Silverkite model uses Fourier series to model seasonalities. It’s sometimes difficult to decide what orders we should use for each Fourier series. Larger orders tend to fit more closely to the curves, while having the risk of overfitting. Small orders tend to underfit the curve and may not learn the exact seasonality patterns.
SeasonalityInferrer is a tool that can help you decide what order to use for a seasonality’s Fourier series. Note that there are many ways to decide the orders, and you don’t have to strictly stick to the results from Seasonality Inferrer.
How it works
The seasonality inferrer utilizes criteria including AIC and BIC to find the most appropriate Fourier series orders. For a specific seasonality, e.g. yearly seasonality, the steps are as follows:
Trend removal: the seasonality inferrer provides 4 options for trend removal, listed in TrendAdjustMethodEnum. Specifically:
"seasonal_average": given an indicator of seasonal period, the method subtracts the average within each seasonal period from the original time series. For example, given the column year, the average is calculated over each distinct year.
"overall_average": subtracts the overall average from the original time series.
"spline_fit": fits a polynomial up to a given degree and subtracts it from the original time series.
"none": does not adjust the trend.
Typically “seasonal_average” is a good choice with appropriate columns. For example, we can use year_quarter for quarterly seasonality, year_month for monthly seasonality, year_woy_iso for weekly seasonality and year_woy_dow_iso for daily seasonality.
Optional aggregation: sometimes we want to remove shorter fluctuations before fitting a longer seasonality period, so we can aggregate the data beforehand. For example, when modeling yearly seasonality, a "7D" aggregation eliminates weekly effects and makes the result more stable.
Order selection: with a pre-specified maximum order n, we fit the de-trended (and aggregated) time series with Fourier series of orders 1 to n and calculate the AIC/BIC of each fit. The most appropriate order is the one with the best AIC or BIC. The tolerance parameter allows a slightly worse criterion in exchange for a lower order, which reduces the risk of overfitting.
Offset: finally, an optional offset can be applied to any inferred order to allow manual adjustments. For example, to use a lower yearly seasonality order, specify an offset of -2 for yearly seasonality, and the final order will be the inferred result minus 2. This is useful when users prefer higher or lower orders than the inferred ones and want a knob on top of the inferred results.
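The steps above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the SeasonalityInferrer implementation. The helper names (remove_seasonal_average, fourier_design, bic, infer_order) are hypothetical, the BIC is the Gaussian least-squares form up to an additive constant, and the way the tolerance is applied (accepting the smallest order whose criterion is within a relative margin of the best one) is an assumption.

```python
import numpy as np


def remove_seasonal_average(y, period_labels):
    """Subtracts from each value the mean of its seasonal period (e.g. its year)."""
    adjusted = y.astype(float).copy()
    for label in np.unique(period_labels):
        mask = period_labels == label
        adjusted[mask] -= adjusted[mask].mean()
    return adjusted


def fourier_design(t, order):
    """Intercept plus sin/cos pairs of orders 1..order; t is in units of one period."""
    cols = [np.ones_like(t)]
    for k in range(1, order + 1):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    return np.column_stack(cols)


def bic(y, X):
    """Gaussian BIC of an ordinary least squares fit, up to an additive constant."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + k * np.log(n)


def infer_order(y, t, max_order, tolerance=0.0, offset=0):
    """Picks the smallest order whose BIC is within `tolerance` of the best BIC."""
    scores = {m: bic(y, fourier_design(t, m)) for m in range(1, max_order + 1)}
    best = min(scores.values())
    threshold = best + tolerance * abs(best)  # assumed tolerance rule
    chosen = min(m for m, s in scores.items() if s <= threshold)
    return max(chosen + offset, 0), scores


# Synthetic data: 5 "years" of 400 points each, a level shift per year,
# and a seasonal pattern built from exactly 3 harmonics plus noise.
rng = np.random.default_rng(0)
t = np.arange(2000) / 400.0  # time measured in years
year = np.floor(t).astype(int)
seasonal = (
    np.sin(2 * np.pi * t)
    + 0.5 * np.cos(4 * np.pi * t)
    + 0.3 * np.sin(6 * np.pi * t)
)
y = 2.0 * year + seasonal + rng.normal(0.0, 0.1, t.size)

detrended = remove_seasonal_average(y, year)
order, scores = infer_order(detrended, t, max_order=8)
order_with_offset, _ = infer_order(detrended, t, max_order=8, offset=-2)
print(order, order_with_offset)
```

The penalty term `k * np.log(n)` is what makes the criterion turn upward once extra harmonics only fit noise. The offset is applied after the selection, so it shifts the result without changing which order the criterion prefers.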
Example
Now we look at an example with the Peyton-Manning Wiki page view data.
import pandas as pd
import plotly
from greykite.common.data_loader import DataLoader
from greykite.algo.common.seasonality_inferrer import SeasonalityInferConfig
from greykite.algo.common.seasonality_inferrer import SeasonalityInferrer
from greykite.algo.common.seasonality_inferrer import TrendAdjustMethodEnum
from greykite.common import constants as cst
The SeasonalityInferrer class uses SeasonalityInferConfig to specify the configuration for a single seasonality component, and it takes a list of such configurations to infer multiple seasonality components together.
Now we specify seasonality inferring configs for yearly to weekly seasonalities. In each of these configs, specify the parameters that are distinct for each component. If there are parameters that are the same across all configs, you can specify them in the function directly.
yearly_config = SeasonalityInferConfig(
    seas_name="yearly",                # name for seasonality
    col_name="toy",                    # column to generate Fourier series, fixed for yearly
    period=1.0,                        # seasonal period, fixed for yearly
    max_order=30,                      # max number of orders to model
    adjust_trend_param=dict(
        trend_average_col="year"
    ),                                 # column to adjust trend for method "seasonal_average"
    aggregation_period="W",            # aggregation period
    offset=0                           # add this to the inferred result, default 0
)
quarterly_config = SeasonalityInferConfig(
    seas_name="quarterly",             # name for seasonality
    col_name="toq",                    # column to generate Fourier series, fixed for quarterly
    period=1.0,                        # seasonal period, fixed for quarterly
    max_order=20,                      # max number of orders to model
    adjust_trend_param=dict(
        trend_average_col="year_quarter"
    ),                                 # column to adjust trend for method "seasonal_average"
    aggregation_period="2D",           # aggregation period
)
monthly_config = SeasonalityInferConfig(
    seas_name="monthly",               # name for seasonality
    col_name="tom",                    # column to generate Fourier series, fixed for monthly
    period=1.0,                        # seasonal period, fixed for monthly
    max_order=20,                      # max number of orders to model
    adjust_trend_param=dict(
        trend_average_col="year_month"
    ),                                 # column to adjust trend for method "seasonal_average"
    aggregation_period="D"             # aggregation period
)
weekly_config = SeasonalityInferConfig(
    seas_name="weekly",                # name for seasonality
    col_name="tow",                    # column to generate Fourier series, fixed for weekly
    period=7.0,                        # seasonal period, fixed for weekly
    max_order=10,                      # max number of orders to model
    adjust_trend_param=dict(
        trend_average_col="year_woy_iso"
    ),                                 # column to adjust trend for method "seasonal_average"
    aggregation_period="D",            # aggregation period
    tolerance=0.005,                   # allows 0.5% higher criterion for lower orders
)
Next, we put everything together to infer seasonality effects.
df = DataLoader().load_peyton_manning()
df[cst.TIME_COL] = pd.to_datetime(df[cst.TIME_COL])

model = SeasonalityInferrer()
result = model.infer_fourier_series_order(
    df=df,
    time_col=cst.TIME_COL,
    value_col=cst.VALUE_COL,
    configs=[
        yearly_config,
        quarterly_config,
        monthly_config,
        weekly_config
    ],
    adjust_trend_method=TrendAdjustMethodEnum.seasonal_average.name,
    fit_algorithm="linear",
    plotting=True,
    criterion="bic",
)
The method runs quickly and we can simply extract the inferred results from the output.
result["best_orders"]
Out:
{'yearly': 6, 'quarterly': 2, 'monthly': 1, 'weekly': 2}
We can also plot the results to see how the criterion varies with the order. As with other trade-off plots, the curve first goes down and then goes up, reaching its best value at some order in between.
# The [0] extracts the first seasonality component from the results.
plotly.io.show(result["result"][0]["fig"])
Holiday Inferrer
The Silverkite model supports modeling holidays and their neighboring days as indicators. Significant days are modeled separately, while similar days can be grouped together as one indicator, assuming their effects are the same.
It’s sometimes difficult to decide which holidays to include, and whether to model them separately or together. HolidayInferrer is a tool that can help you decide which holidays to model and how to model them. It can also automatically generate the holiday configuration parameters. Note that there are many ways to decide the holiday configurations, and you don’t have to strictly stick to the results from Holiday Inferrer.
How it works
The holiday inferrer estimates the effects of individual holidays and their neighboring days by comparing the observations on these days with a baseline taken before or after the holiday period. It then ranks the effects by magnitude. Depending on some thresholds, it decides whether to model a day independently, model it together with others, or not model it at all.
In detail, the first step is to unify the data frequency. For data coarser than daily (for example, weekly), the holiday effect is automatically turned off. For data finer than daily (for example, hourly), the data is aggregated into daily data, since holidays are daily events. From this point on, we have daily data for the next step.
Given a list of countries, the tool automatically pulls candidate holidays from the database. With the pre_search_days and post_search_days parameters, those holidays’ neighboring days are included in the candidate pool as well.
For every candidate holiday or neighboring day, the baseline is the average of the observations at a set of configurable offsets. For example, for data that exhibits strong weekly seasonality, the offsets can be (-7, 7), where the baseline is the average of the observations on the previous and the next same day of the week. If a baseline day is itself a holiday, the inferrer moves one more step in the same direction. For example, if the holiday is New Year on 1/1 while 12/25 (7 days earlier) is Christmas, it will use the value on 12/18 instead of 12/25 as the baseline.
The day’s effect is the average of the signed difference between the true observation and the baseline across all occurrences in the time series. The effects are ranked from the highest to the lowest by their absolute effects.
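The baseline and effect computation can be illustrated with a small standard-library sketch. It is not the HolidayInferrer code: the function names (baseline_value, average_effect) and the toy observations are made up for illustration, and the rule of stepping further by the same offset whenever a baseline day is itself a holiday is inferred from the New Year example above.

```python
from datetime import date, timedelta


def baseline_value(observations, day, offsets, holiday_dates):
    """Averages the observations at `day + offset`, skipping baseline days that are holidays."""
    values = []
    for offset in offsets:
        if offset == 0:
            raise ValueError("offsets must be nonzero")
        candidate = day + timedelta(days=offset)
        # Assumed rule: keep stepping in the same direction past other holidays.
        while candidate in holiday_dates:
            candidate += timedelta(days=offset)
        if candidate in observations:
            values.append(observations[candidate])
    return sum(values) / len(values) if values else None


def average_effect(observations, occurrences, offsets, holiday_dates):
    """Mean signed difference between the observation and its baseline over all occurrences."""
    diffs = []
    for day in occurrences:
        baseline = baseline_value(observations, day, offsets, holiday_dates)
        if baseline is not None:
            diffs.append(observations[day] - baseline)
    return sum(diffs) / len(diffs) if diffs else None


# Toy daily series: flat at 10.0 with dips on Christmas and New Year.
start = date(2015, 12, 1)
observations = {start + timedelta(days=i): 10.0 for i in range(62)}
observations[date(2015, 12, 25)] = 4.0
observations[date(2016, 1, 1)] = 6.0
holiday_dates = {date(2015, 12, 25), date(2016, 1, 1)}

new_year_baseline = baseline_value(observations, date(2016, 1, 1), (-7, 7), holiday_dates)
new_year_effect = average_effect(observations, [date(2016, 1, 1)], (-7, 7), holiday_dates)
print(new_year_baseline, new_year_effect)
```

Because 12/25 is in the holiday set, the -7 offset lands on 12/18 instead, so the Christmas dip does not contaminate the New Year baseline.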
To decide how each holiday is modeled, we rely on two parameters: independent_holiday_thres and together_holiday_thres. These parameters are between 0 and 1. Starting from the largest effect, we calculate the cumulative sum of the effects of all candidates. Once the cumulative effect reaches independent_holiday_thres of the total effect, the days accumulated so far are modeled independently (i.e., each day has an individual coefficient). We keep accumulating effects until the sum reaches together_holiday_thres; the days in between are grouped into “positive_group” and “negative_group”, with each group modeled together. The remaining days are not modeled.
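The thresholding logic can be sketched as follows. This is an illustrative reading of the description above rather than the library's code: the function name partition_by_thresholds and the effect values are hypothetical, and the treatment of the day on which the cumulative sum crosses a threshold (it is kept on the earlier side) is an assumption.

```python
def partition_by_thresholds(effects, independent_thres, together_thres):
    """Splits events into independent, grouped and dropped by cumulative absolute effect."""
    ranked = sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)
    total = sum(abs(effect) for _, effect in ranked)
    result = {"independent": [], "positive_group": [], "negative_group": [], "dropped": []}
    cumulative = 0.0
    for name, effect in ranked:
        if cumulative < independent_thres * total:
            # Largest effects: one coefficient per day.
            result["independent"].append(name)
        elif cumulative < together_thres * total:
            # Medium effects: pooled by sign.
            key = "positive_group" if effect > 0 else "negative_group"
            result[key].append(name)
        else:
            # Smallest effects: not modeled.
            result["dropped"].append(name)
        cumulative += abs(effect)
    return result


# Made-up effects for six candidate days.
effects = {"A": -50, "B": 30, "C": 10, "D": -6, "E": 3, "F": 1}
groups = partition_by_thresholds(effects, independent_thres=0.75, together_thres=0.95)
print(groups)
```

Raising independent_thres gives more days their own coefficient, while lowering together_thres drops more of the small effects from the model.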
Example
Now we look at an example with the Peyton-Manning Wiki page view data.
import pandas as pd
import plotly
from greykite.algo.common.holiday_inferrer import HolidayInferrer
from greykite.common.data_loader import DataLoader
from greykite.common import constants as cst

df = DataLoader().load_peyton_manning()
df[cst.TIME_COL] = pd.to_datetime(df[cst.TIME_COL])
Let’s say we want to infer the holidays in the United States, also considering the 2 days before and after each holiday as potential candidates.
hi = HolidayInferrer()
result = hi.infer_holidays(
    df=df,
    countries=["US"],                   # Search holidays in United States
    plot=True,                          # Output a plot
    pre_search_days=2,                  # Considers 2 days before each holiday
    post_search_days=2,                 # Considers 2 days after each holiday
    independent_holiday_thres=0.9,      # The first 90% of effects are modeled separately
    together_holiday_thres=0.99,        # The 90% to 99% of effects are modeled together
    baseline_offsets=[-7, 7]            # The baseline is the average of -7/+7 observations
)
We can plot the inferred holiday results.
plotly.io.show(result["fig"])
The class also has a method that generates the holiday configuration from the inferred results, which can be consumed directly by the Silverkite model.
hi.generate_daily_event_dict()
Out:
{'US_Labor Day': date event_name
0 2016-09-05 US_Labor Day
1 2017-09-04 US_Labor Day
2 2007-09-03 US_Labor Day
3 2008-09-01 US_Labor Day
4 2009-09-07 US_Labor Day
5 2010-09-06 US_Labor Day
6 2011-09-05 US_Labor Day
7 2012-09-03 US_Labor Day
8 2013-09-02 US_Labor Day
9 2014-09-01 US_Labor Day
10 2015-09-07 US_Labor Day, 'US_Christmas Day': date event_name
0 2016-12-26 US_Christmas Day
1 2017-12-25 US_Christmas Day
2 2007-12-25 US_Christmas Day
3 2008-12-25 US_Christmas Day
4 2009-12-25 US_Christmas Day
5 2010-12-24 US_Christmas Day
6 2011-12-26 US_Christmas Day
7 2012-12-25 US_Christmas Day
8 2013-12-25 US_Christmas Day
9 2014-12-25 US_Christmas Day
10 2015-12-25 US_Christmas Day, 'US_Labor Day_minus_1': date event_name
0 2016-09-04 US_Labor Day_minus_1
1 2017-09-03 US_Labor Day_minus_1
2 2007-09-02 US_Labor Day_minus_1
3 2008-08-31 US_Labor Day_minus_1
4 2009-09-06 US_Labor Day_minus_1
5 2010-09-05 US_Labor Day_minus_1
6 2011-09-04 US_Labor Day_minus_1
7 2012-09-02 US_Labor Day_minus_1
8 2013-09-01 US_Labor Day_minus_1
9 2014-08-31 US_Labor Day_minus_1
10 2015-09-06 US_Labor Day_minus_1, 'US_Martin Luther King Jr. Day': date event_name
0 2016-01-18 US_Martin Luther King Jr. Day
1 2017-01-16 US_Martin Luther King Jr. Day
2 2007-01-15 US_Martin Luther King Jr. Day
3 2008-01-21 US_Martin Luther King Jr. Day
4 2009-01-19 US_Martin Luther King Jr. Day
5 2010-01-18 US_Martin Luther King Jr. Day
6 2011-01-17 US_Martin Luther King Jr. Day
7 2012-01-16 US_Martin Luther King Jr. Day
8 2013-01-21 US_Martin Luther King Jr. Day
9 2014-01-20 US_Martin Luther King Jr. Day
10 2015-01-19 US_Martin Luther King Jr. Day, 'US_Washingtons Birthday_minus_1': date event_name
0 2016-02-14 US_Washingtons Birthday_minus_1
1 2017-02-19 US_Washingtons Birthday_minus_1
2 2007-02-18 US_Washingtons Birthday_minus_1
3 2008-02-17 US_Washingtons Birthday_minus_1
4 2009-02-15 US_Washingtons Birthday_minus_1
5 2010-02-14 US_Washingtons Birthday_minus_1
6 2011-02-20 US_Washingtons Birthday_minus_1
7 2012-02-19 US_Washingtons Birthday_minus_1
8 2013-02-17 US_Washingtons Birthday_minus_1
9 2014-02-16 US_Washingtons Birthday_minus_1
10 2015-02-15 US_Washingtons Birthday_minus_1, 'US_Thanksgiving_minus_2': date event_name
0 2016-11-22 US_Thanksgiving_minus_2
1 2017-11-21 US_Thanksgiving_minus_2
2 2007-11-20 US_Thanksgiving_minus_2
3 2008-11-25 US_Thanksgiving_minus_2
4 2009-11-24 US_Thanksgiving_minus_2
5 2010-11-23 US_Thanksgiving_minus_2
6 2011-11-22 US_Thanksgiving_minus_2
7 2012-11-20 US_Thanksgiving_minus_2
8 2013-11-26 US_Thanksgiving_minus_2
9 2014-11-25 US_Thanksgiving_minus_2
10 2015-11-24 US_Thanksgiving_minus_2, 'US_Washingtons Birthday_plus_1': date event_name
0 2016-02-16 US_Washingtons Birthday_plus_1
1 2017-02-21 US_Washingtons Birthday_plus_1
2 2007-02-20 US_Washingtons Birthday_plus_1
3 2008-02-19 US_Washingtons Birthday_plus_1
4 2009-02-17 US_Washingtons Birthday_plus_1
5 2010-02-16 US_Washingtons Birthday_plus_1
6 2011-02-22 US_Washingtons Birthday_plus_1
7 2012-02-21 US_Washingtons Birthday_plus_1
8 2013-02-19 US_Washingtons Birthday_plus_1
9 2014-02-18 US_Washingtons Birthday_plus_1
10 2015-02-17 US_Washingtons Birthday_plus_1, 'US_New Years Day_plus_1': date event_name
0 2016-01-02 US_New Years Day_plus_1
1 2017-01-03 US_New Years Day_plus_1
2 2007-01-02 US_New Years Day_plus_1
3 2008-01-02 US_New Years Day_plus_1
4 2009-01-02 US_New Years Day_plus_1
5 2010-01-02 US_New Years Day_plus_1
6 2011-01-01 US_New Years Day_plus_1
7 2012-01-03 US_New Years Day_plus_1
8 2013-01-02 US_New Years Day_plus_1
9 2014-01-02 US_New Years Day_plus_1
10 2015-01-02 US_New Years Day_plus_1, 'US_Veterans Day_minus_2': date event_name
0 2016-11-09 US_Veterans Day_minus_2
1 2017-11-08 US_Veterans Day_minus_2
2 2007-11-10 US_Veterans Day_minus_2
3 2008-11-09 US_Veterans Day_minus_2
4 2009-11-09 US_Veterans Day_minus_2
5 2010-11-09 US_Veterans Day_minus_2
6 2011-11-09 US_Veterans Day_minus_2
7 2012-11-10 US_Veterans Day_minus_2
8 2013-11-09 US_Veterans Day_minus_2
9 2014-11-09 US_Veterans Day_minus_2
10 2015-11-09 US_Veterans Day_minus_2, 'US_Washingtons Birthday_plus_2': date event_name
0 2016-02-17 US_Washingtons Birthday_plus_2
1 2017-02-22 US_Washingtons Birthday_plus_2
2 2007-02-21 US_Washingtons Birthday_plus_2
3 2008-02-20 US_Washingtons Birthday_plus_2
4 2009-02-18 US_Washingtons Birthday_plus_2
5 2010-02-17 US_Washingtons Birthday_plus_2
6 2011-02-23 US_Washingtons Birthday_plus_2
7 2012-02-22 US_Washingtons Birthday_plus_2
8 2013-02-20 US_Washingtons Birthday_plus_2
9 2014-02-19 US_Washingtons Birthday_plus_2
10 2015-02-18 US_Washingtons Birthday_plus_2, 'US_Christmas Day_plus_1': date event_name
0 2016-12-27 US_Christmas Day_plus_1
1 2017-12-26 US_Christmas Day_plus_1
2 2007-12-26 US_Christmas Day_plus_1
3 2008-12-26 US_Christmas Day_plus_1
4 2009-12-26 US_Christmas Day_plus_1
5 2010-12-25 US_Christmas Day_plus_1
6 2011-12-27 US_Christmas Day_plus_1
7 2012-12-26 US_Christmas Day_plus_1
8 2013-12-26 US_Christmas Day_plus_1
9 2014-12-26 US_Christmas Day_plus_1
10 2015-12-26 US_Christmas Day_plus_1, 'US_Memorial Day': date event_name
0 2016-05-30 US_Memorial Day
1 2017-05-29 US_Memorial Day
2 2007-05-28 US_Memorial Day
3 2008-05-26 US_Memorial Day
4 2009-05-25 US_Memorial Day
5 2010-05-31 US_Memorial Day
6 2011-05-30 US_Memorial Day
7 2012-05-28 US_Memorial Day
8 2013-05-27 US_Memorial Day
9 2014-05-26 US_Memorial Day
10 2015-05-25 US_Memorial Day, 'US_Veterans Day': date event_name
0 2016-11-11 US_Veterans Day
1 2017-11-10 US_Veterans Day
2 2007-11-12 US_Veterans Day
3 2008-11-11 US_Veterans Day
4 2009-11-11 US_Veterans Day
5 2010-11-11 US_Veterans Day
6 2011-11-11 US_Veterans Day
7 2012-11-12 US_Veterans Day
8 2013-11-11 US_Veterans Day
9 2014-11-11 US_Veterans Day
10 2015-11-11 US_Veterans Day, 'US_Washingtons Birthday_minus_2': date event_name
0 2016-02-13 US_Washingtons Birthday_minus_2
1 2017-02-18 US_Washingtons Birthday_minus_2
2 2007-02-17 US_Washingtons Birthday_minus_2
3 2008-02-16 US_Washingtons Birthday_minus_2
4 2009-02-14 US_Washingtons Birthday_minus_2
5 2010-02-13 US_Washingtons Birthday_minus_2
6 2011-02-19 US_Washingtons Birthday_minus_2
7 2012-02-18 US_Washingtons Birthday_minus_2
8 2013-02-16 US_Washingtons Birthday_minus_2
9 2014-02-15 US_Washingtons Birthday_minus_2
10 2015-02-14 US_Washingtons Birthday_minus_2, 'US_Thanksgiving_minus_1': date event_name
0 2016-11-23 US_Thanksgiving_minus_1
1 2017-11-22 US_Thanksgiving_minus_1
2 2007-11-21 US_Thanksgiving_minus_1
3 2008-11-26 US_Thanksgiving_minus_1
4 2009-11-25 US_Thanksgiving_minus_1
5 2010-11-24 US_Thanksgiving_minus_1
6 2011-11-23 US_Thanksgiving_minus_1
7 2012-11-21 US_Thanksgiving_minus_1
8 2013-11-27 US_Thanksgiving_minus_1
9 2014-11-26 US_Thanksgiving_minus_1
10 2015-11-25 US_Thanksgiving_minus_1, 'US_Labor Day_minus_2': date event_name
0 2016-09-03 US_Labor Day_minus_2
1 2017-09-02 US_Labor Day_minus_2
2 2007-09-01 US_Labor Day_minus_2
3 2008-08-30 US_Labor Day_minus_2
4 2009-09-05 US_Labor Day_minus_2
5 2010-09-04 US_Labor Day_minus_2
6 2011-09-03 US_Labor Day_minus_2
7 2012-09-01 US_Labor Day_minus_2
8 2013-08-31 US_Labor Day_minus_2
9 2014-08-30 US_Labor Day_minus_2
10 2015-09-05 US_Labor Day_minus_2, 'US_Columbus Day': date event_name
0 2016-10-10 US_Columbus Day
1 2017-10-09 US_Columbus Day
2 2007-10-08 US_Columbus Day
3 2008-10-13 US_Columbus Day
4 2009-10-12 US_Columbus Day
5 2010-10-11 US_Columbus Day
6 2011-10-10 US_Columbus Day
7 2012-10-08 US_Columbus Day
8 2013-10-14 US_Columbus Day
9 2014-10-13 US_Columbus Day
10 2015-10-12 US_Columbus Day, 'US_Memorial Day_plus_1': date event_name
0 2016-05-31 US_Memorial Day_plus_1
1 2017-05-30 US_Memorial Day_plus_1
2 2007-05-29 US_Memorial Day_plus_1
3 2008-05-27 US_Memorial Day_plus_1
4 2009-05-26 US_Memorial Day_plus_1
5 2010-06-01 US_Memorial Day_plus_1
6 2011-05-31 US_Memorial Day_plus_1
7 2012-05-29 US_Memorial Day_plus_1
8 2013-05-28 US_Memorial Day_plus_1
9 2014-05-27 US_Memorial Day_plus_1
10 2015-05-26 US_Memorial Day_plus_1, 'US_Halloween': date event_name
0 2016-10-31 US_Halloween
1 2017-10-31 US_Halloween
2 2007-10-31 US_Halloween
3 2008-10-31 US_Halloween
4 2009-10-31 US_Halloween
5 2010-10-31 US_Halloween
6 2011-10-31 US_Halloween
7 2012-10-31 US_Halloween
8 2013-10-31 US_Halloween
9 2014-10-31 US_Halloween
10 2015-10-31 US_Halloween, 'US_Labor Day_plus_1': date event_name
0 2016-09-06 US_Labor Day_plus_1
1 2017-09-05 US_Labor Day_plus_1
2 2007-09-04 US_Labor Day_plus_1
3 2008-09-02 US_Labor Day_plus_1
4 2009-09-08 US_Labor Day_plus_1
5 2010-09-07 US_Labor Day_plus_1
6 2011-09-06 US_Labor Day_plus_1
7 2012-09-04 US_Labor Day_plus_1
8 2013-09-03 US_Labor Day_plus_1
9 2014-09-02 US_Labor Day_plus_1
10 2015-09-08 US_Labor Day_plus_1, 'US_Martin Luther King Jr. Day_minus_1': date event_name
0 2016-01-17 US_Martin Luther King Jr. Day_minus_1
1 2017-01-15 US_Martin Luther King Jr. Day_minus_1
2 2007-01-14 US_Martin Luther King Jr. Day_minus_1
3 2008-01-20 US_Martin Luther King Jr. Day_minus_1
4 2009-01-18 US_Martin Luther King Jr. Day_minus_1
5 2010-01-17 US_Martin Luther King Jr. Day_minus_1
6 2011-01-16 US_Martin Luther King Jr. Day_minus_1
7 2012-01-15 US_Martin Luther King Jr. Day_minus_1
8 2013-01-20 US_Martin Luther King Jr. Day_minus_1
9 2014-01-19 US_Martin Luther King Jr. Day_minus_1
10 2015-01-18 US_Martin Luther King Jr. Day_minus_1, 'US_Independence Day_minus_2': date event_name
0 2016-07-02 US_Independence Day_minus_2
1 2017-07-02 US_Independence Day_minus_2
2 2007-07-02 US_Independence Day_minus_2
3 2008-07-02 US_Independence Day_minus_2
4 2009-07-01 US_Independence Day_minus_2
5 2010-07-03 US_Independence Day_minus_2
6 2011-07-02 US_Independence Day_minus_2
7 2012-07-02 US_Independence Day_minus_2
8 2013-07-02 US_Independence Day_minus_2
9 2014-07-02 US_Independence Day_minus_2
10 2015-07-01 US_Independence Day_minus_2, 'US_Christmas Day_minus_1': date event_name
0 2016-12-25 US_Christmas Day_minus_1
1 2017-12-24 US_Christmas Day_minus_1
2 2007-12-24 US_Christmas Day_minus_1
3 2008-12-24 US_Christmas Day_minus_1
4 2009-12-24 US_Christmas Day_minus_1
5 2010-12-23 US_Christmas Day_minus_1
6 2011-12-25 US_Christmas Day_minus_1
7 2012-12-24 US_Christmas Day_minus_1
8 2013-12-24 US_Christmas Day_minus_1
9 2014-12-24 US_Christmas Day_minus_1
10 2015-12-24 US_Christmas Day_minus_1, 'US_Halloween_plus_2': date event_name
0 2016-11-02 US_Halloween_plus_2
1 2017-11-02 US_Halloween_plus_2
2 2007-11-02 US_Halloween_plus_2
3 2008-11-02 US_Halloween_plus_2
4 2009-11-02 US_Halloween_plus_2
5 2010-11-02 US_Halloween_plus_2
6 2011-11-02 US_Halloween_plus_2
7 2012-11-02 US_Halloween_plus_2
8 2013-11-02 US_Halloween_plus_2
9 2014-11-02 US_Halloween_plus_2
10 2015-11-02 US_Halloween_plus_2, 'US_Independence Day_minus_1': date event_name
0 2016-07-03 US_Independence Day_minus_1
1 2017-07-03 US_Independence Day_minus_1
2 2007-07-03 US_Independence Day_minus_1
3 2008-07-03 US_Independence Day_minus_1
4 2009-07-02 US_Independence Day_minus_1
5 2010-07-04 US_Independence Day_minus_1
6 2011-07-03 US_Independence Day_minus_1
7 2012-07-03 US_Independence Day_minus_1
8 2013-07-03 US_Independence Day_minus_1
9 2014-07-03 US_Independence Day_minus_1
10 2015-07-02 US_Independence Day_minus_1, 'US_Veterans Day_minus_1': date event_name
0 2016-11-10 US_Veterans Day_minus_1
1 2017-11-09 US_Veterans Day_minus_1
2 2007-11-11 US_Veterans Day_minus_1
3 2008-11-10 US_Veterans Day_minus_1
4 2009-11-10 US_Veterans Day_minus_1
5 2010-11-10 US_Veterans Day_minus_1
6 2011-11-10 US_Veterans Day_minus_1
7 2012-11-11 US_Veterans Day_minus_1
8 2013-11-10 US_Veterans Day_minus_1
9 2014-11-10 US_Veterans Day_minus_1
10 2015-11-10 US_Veterans Day_minus_1, 'US_Martin Luther King Jr. Day_plus_1': date event_name
0 2016-01-19 US_Martin Luther King Jr. Day_plus_1
1 2017-01-17 US_Martin Luther King Jr. Day_plus_1
2 2007-01-16 US_Martin Luther King Jr. Day_plus_1
3 2008-01-22 US_Martin Luther King Jr. Day_plus_1
4 2009-01-20 US_Martin Luther King Jr. Day_plus_1
5 2010-01-19 US_Martin Luther King Jr. Day_plus_1
6 2011-01-18 US_Martin Luther King Jr. Day_plus_1
7 2012-01-17 US_Martin Luther King Jr. Day_plus_1
8 2013-01-22 US_Martin Luther King Jr. Day_plus_1
9 2014-01-21 US_Martin Luther King Jr. Day_plus_1
10 2015-01-20 US_Martin Luther King Jr. Day_plus_1, 'US_Halloween_minus_2': date event_name
0 2016-10-29 US_Halloween_minus_2
1 2017-10-29 US_Halloween_minus_2
2 2007-10-29 US_Halloween_minus_2
3 2008-10-29 US_Halloween_minus_2
4 2009-10-29 US_Halloween_minus_2
5 2010-10-29 US_Halloween_minus_2
6 2011-10-29 US_Halloween_minus_2
7 2012-10-29 US_Halloween_minus_2
8 2013-10-29 US_Halloween_minus_2
9 2014-10-29 US_Halloween_minus_2
10 2015-10-29 US_Halloween_minus_2, 'US_Independence Day_plus_1': date event_name
0 2016-07-05 US_Independence Day_plus_1
1 2017-07-05 US_Independence Day_plus_1
2 2007-07-05 US_Independence Day_plus_1
3 2008-07-05 US_Independence Day_plus_1
4 2009-07-04 US_Independence Day_plus_1
5 2010-07-06 US_Independence Day_plus_1
6 2011-07-05 US_Independence Day_plus_1
7 2012-07-05 US_Independence Day_plus_1
8 2013-07-05 US_Independence Day_plus_1
9 2014-07-05 US_Independence Day_plus_1
10 2015-07-04 US_Independence Day_plus_1, 'US_Martin Luther King Jr. Day_plus_2': date event_name
0 2016-01-20 US_Martin Luther King Jr. Day_plus_2
1 2017-01-18 US_Martin Luther King Jr. Day_plus_2
2 2007-01-17 US_Martin Luther King Jr. Day_plus_2
3 2008-01-23 US_Martin Luther King Jr. Day_plus_2
4 2009-01-21 US_Martin Luther King Jr. Day_plus_2
5 2010-01-20 US_Martin Luther King Jr. Day_plus_2
6 2011-01-19 US_Martin Luther King Jr. Day_plus_2
7 2012-01-18 US_Martin Luther King Jr. Day_plus_2
8 2013-01-23 US_Martin Luther King Jr. Day_plus_2
9 2014-01-22 US_Martin Luther King Jr. Day_plus_2
10 2015-01-21 US_Martin Luther King Jr. Day_plus_2, 'US_Independence Day': date event_name
0 2016-07-04 US_Independence Day
1 2017-07-04 US_Independence Day
2 2007-07-04 US_Independence Day
3 2008-07-04 US_Independence Day
4 2009-07-03 US_Independence Day
5 2010-07-05 US_Independence Day
6 2011-07-04 US_Independence Day
7 2012-07-04 US_Independence Day
8 2013-07-04 US_Independence Day
9 2014-07-04 US_Independence Day
10 2015-07-03 US_Independence Day, 'US_Labor Day_plus_2': date event_name
0 2016-09-07 US_Labor Day_plus_2
1 2017-09-06 US_Labor Day_plus_2
2 2007-09-05 US_Labor Day_plus_2
3 2008-09-03 US_Labor Day_plus_2
4 2009-09-09 US_Labor Day_plus_2
5 2010-09-08 US_Labor Day_plus_2
6 2011-09-07 US_Labor Day_plus_2
7 2012-09-05 US_Labor Day_plus_2
8 2013-09-04 US_Labor Day_plus_2
9 2014-09-03 US_Labor Day_plus_2
10 2015-09-09 US_Labor Day_plus_2, 'US_New Years Day': date event_name
0 2016-01-01 US_New Years Day
1 2017-01-02 US_New Years Day
2 2007-01-01 US_New Years Day
3 2008-01-01 US_New Years Day
4 2009-01-01 US_New Years Day
5 2010-01-01 US_New Years Day
6 2010-12-31 US_New Years Day
7 2012-01-02 US_New Years Day
8 2013-01-01 US_New Years Day
9 2014-01-01 US_New Years Day
10 2015-01-01 US_New Years Day, 'US_Columbus Day_plus_1': date event_name
0 2016-10-11 US_Columbus Day_plus_1
1 2017-10-10 US_Columbus Day_plus_1
2 2007-10-09 US_Columbus Day_plus_1
3 2008-10-14 US_Columbus Day_plus_1
4 2009-10-13 US_Columbus Day_plus_1
5 2010-10-12 US_Columbus Day_plus_1
6 2011-10-11 US_Columbus Day_plus_1
7 2012-10-09 US_Columbus Day_plus_1
8 2013-10-15 US_Columbus Day_plus_1
9 2014-10-14 US_Columbus Day_plus_1
10 2015-10-13 US_Columbus Day_plus_1, 'US_Martin Luther King Jr. Day_minus_2': date event_name
0 2016-01-16 US_Martin Luther King Jr. Day_minus_2
1 2017-01-14 US_Martin Luther King Jr. Day_minus_2
2 2007-01-13 US_Martin Luther King Jr. Day_minus_2
3 2008-01-19 US_Martin Luther King Jr. Day_minus_2
4 2009-01-17 US_Martin Luther King Jr. Day_minus_2
5 2010-01-16 US_Martin Luther King Jr. Day_minus_2
6 2011-01-15 US_Martin Luther King Jr. Day_minus_2
7 2012-01-14 US_Martin Luther King Jr. Day_minus_2
8 2013-01-19 US_Martin Luther King Jr. Day_minus_2
9 2014-01-18 US_Martin Luther King Jr. Day_minus_2
10 2015-01-17 US_Martin Luther King Jr. Day_minus_2, 'Holiday_positive_group': date event_name
0 2016-11-25 event
1 2017-11-24 event
2 2007-11-23 event
3 2008-11-28 event
4 2009-11-27 event
.. ... ...
61 2010-12-30 event
62 2012-01-01 event
63 2012-12-31 event
64 2013-12-31 event
65 2014-12-31 event
[66 rows x 2 columns], 'Holiday_negative_group': date event_name
0 2016-11-13 event
1 2017-11-12 event
2 2007-11-14 event
3 2008-11-13 event
4 2009-11-13 event
.. ... ...
83 2011-12-28 event
84 2012-12-27 event
85 2013-12-27 event
86 2014-12-27 event
87 2015-12-27 event
[88 rows x 2 columns]}
Holiday Grouper
One step further, HolidayGrouper is a convenient tool that automatically groups similar holidays and their neighboring days together based on their estimated impact and clustering algorithms. This helps to (1) reduce the number of parameters to be estimated and ensure each group has enough data points to be estimated reliably; (2) make sure different holidays can be modeled separately to avoid confounding effects.
Also, we provide flexible diagnostics to help users choose the number of groups, as well as utility functions to check which group a holiday belongs to and which similar holidays are in the same group.
How it works
First, we need to supply the algorithm with a list of holidays and dates, as well as a time series of interest. In addition, we specify a dictionary of neighboring days that a holiday may affect. For example, for Thanksgiving, which always falls on a Thursday, we may expect a holiday effect that starts the day before and lasts until the following Monday, so we can specify "Thanksgiving": (1, 4) as an item in the dictionary. All the neighboring days specified this way are added to the events pool. Note that each neighboring day is also treated as a single event, and may not end up in the same group as its original holiday date. That is, "Thanksgiving_plus_4" (Monday) may have a very different impact than "Thanksgiving" (Thursday), and they may not end up in the same group.
Second, holidays falling on weekdays may have a different impact than those falling on weekends. For example, "Christmas Day_WE" may have a different effect than "Christmas Day_WD". There are two built-in options (“wd_we”: weekday vs weekend, “dow_grouped”: weekday, Sat, Sun), but you can customize the grouping via the get_suffix_func parameter.
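To make the event pool concrete, here is a minimal standard-library sketch of the two ideas above: expanding each holiday into its neighboring days and appending a weekday/weekend suffix. The helper names (expand_neighbors, wd_we_suffix) are hypothetical and the holidays are plain (date, name) tuples rather than a DataFrame. The "_minus_k" / "_plus_k" and "_WD" / "_WE" naming follows the examples in this document, but applying the suffix to neighboring days as well is an assumption.

```python
from datetime import date, timedelta


def expand_neighbors(holidays, impact_dict, default_impact=(0, 0)):
    """Returns (date, event_name) pairs for each holiday and its neighboring days."""
    events = []
    for day, name in holidays:
        pre_days, post_days = impact_dict.get(name, default_impact)
        for shift in range(-pre_days, post_days + 1):
            if shift < 0:
                label = f"{name}_minus_{-shift}"
            elif shift > 0:
                label = f"{name}_plus_{shift}"
            else:
                label = name
            events.append((day + timedelta(days=shift), label))
    return events


def wd_we_suffix(day):
    """Weekday vs weekend suffix; `date.weekday()` is 0 for Monday and 6 for Sunday."""
    return "_WE" if day.weekday() >= 5 else "_WD"


holidays = [(date(2016, 11, 24), "Thanksgiving"), (date(2016, 12, 25), "Christmas Day")]
events = expand_neighbors(holidays, {"Thanksgiving": (1, 4)})
labeled_events = [(day, name + wd_we_suffix(day)) for day, name in events]
for day, name in labeled_events:
    print(day, name)
```

Each resulting name is a separate event, which is why a neighboring day such as "Thanksgiving_plus_4" can later land in a different group than "Thanksgiving" itself.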
Next, each single event gets a score, the estimated (relative) impact computed with the same methodology as in the Holiday Inferrer (e.g. -0.1 means 10% lower than the baseline). For example, you can use baseline_offsets=[-7, 7]. The score is then used by the clustering algorithm. Therefore, if an event shows up only once in the input time series, its estimated impact may not be accurate. You can set the minimum number of occurrences of an event with the parameter min_n_days (set it to 1 if you are okay with including events that appear on only a single day in the input data).
Also, you can specify the minimum average score for an event to be kept in consideration with min_abs_avg_score. If an event has an average score of -1% (across all its occurrences), it may not be worth including in the model. Events whose absolute average effect is lower than min_abs_avg_score are excluded before clustering.
Also, if an event has inconsistent scores (e.g. two occurrences have -8% and +5% respectively), this could be noise rather than signal. These events are excluded as well. This is handled automatically, and the user does not need to worry about it.
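The three filters described above can be sketched as a single pass over per-event scores. This is only an illustration with made-up numbers. The function name filter_events is hypothetical, and the consistency rule used here (dropping an event whose occurrences have mixed signs) is an assumption, since the text does not spell out the exact check the grouper applies.

```python
def filter_events(scores, min_n_days=2, min_abs_avg_score=0.03):
    """Keeps events with enough occurrences, a large enough average score and consistent signs."""
    kept = {}
    for name, values in scores.items():
        if len(values) < min_n_days:
            continue  # too few occurrences to estimate reliably
        average = sum(values) / len(values)
        if abs(average) < min_abs_avg_score:
            continue  # effect too small to be worth modeling
        if min(values) < 0 < max(values):
            continue  # assumed consistency rule: mixed signs are treated as noise
        kept[name] = average
    return kept


# Made-up relative scores per occurrence (-0.10 means 10% below the baseline).
scores = {
    "A": [-0.10, -0.12],  # kept
    "B": [-0.08, 0.20],   # mixed signs
    "C": [-0.01, -0.01],  # average effect below the minimum
    "D": [0.20],          # appears only once
}
kept = filter_events(scores, min_n_days=2, min_abs_avg_score=0.03)
print(kept)
```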
The last step of the grouper is to group events that have similar effects and generate daily_event_df_dict. We provide two options for clustering: Kernel Density Estimation (clustering_method="kde") and K-means (clustering_method="kmeans"). In K-means, you can set n_clusters to your desired number of groups. In KDE clustering, you can change the default bandwidth parameter to adjust the number of groups you get. Depending on the length of the time series and the number of holidays considered, we recommend a range from 5 to 15 groups. You can check the visualization / diagnostics via the attribute self.result_dict["kmean_plot"] or self.result_dict["kde_plot"], respectively. See group_holidays for more parameter details.
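As a rough picture of what clustering one-dimensional scores means, the sketch below runs a plain Lloyd-style k-means on a handful of made-up event scores. It is written from scratch with the standard library and is not how HolidayGrouper clusters internally. The function name kmeans_1d, the deterministic initialization and the scores are all assumptions for illustration.

```python
def kmeans_1d(scores, n_clusters, n_iter=100):
    """Clusters scalar scores with Lloyd's algorithm; returns labels and cluster centers."""
    values = sorted(scores.values())
    # Deterministic initialization: evenly spaced picks from the sorted scores.
    step = (len(values) - 1) / max(n_clusters - 1, 1)
    centers = [values[round(i * step)] for i in range(n_clusters)]
    labels = {}
    for _ in range(n_iter):
        # Assignment step: each event goes to its nearest center.
        labels = {
            name: min(range(n_clusters), key=lambda c: abs(score - centers[c]))
            for name, score in scores.items()
        }
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(n_clusters):
            members = [scores[name] for name, label in labels.items() if label == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Made-up average relative scores per event.
event_scores = {
    "Christmas Day_WD": -0.30,
    "Thanksgiving": -0.28,
    "Labor Day": -0.10,
    "Memorial Day": -0.12,
    "New Year's Day_plus_1": 0.15,
    "Halloween": 0.13,
}
labels, centers = kmeans_1d(event_scores, n_clusters=3)
print(labels)
```

Events that share a label would share one coefficient in the model, which is how grouping reduces the number of parameters to estimate.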
Example
Now we look at an example with the Peyton-Manning Wiki page view data.
import pandas as pd
import plotly
from greykite.algo.common.holiday_grouper import HolidayGrouper
from greykite.common.data_loader import DataLoader
from greykite.common.features.timeseries_features import get_holidays
from greykite.common import constants as cst

df = DataLoader().load_peyton_manning()
df[cst.TIME_COL] = pd.to_datetime(df[cst.TIME_COL])
Let’s generate a list of holidays in the United States, and we also specify the neighboring days we want to consider in the holiday model.
year_start = df[cst.TIME_COL].dt.year.min() - 1
year_end = df[cst.TIME_COL].dt.year.max() + 1
holiday_df = get_holidays(countries=["US"], year_start=year_start, year_end=year_end)["US"]

# Defines the number of pre / post days that a holiday has impact on.
# If not specified, numbers specified by ``holiday_impact_pre_num_days`` and
# ``holiday_impact_post_num_days`` will be used.
holiday_impact_dict = {
    "Christmas Day": (4, 3),  # 12/25.
    "Independence Day": (4, 4),  # 7/4.
    "Juneteenth National Independence Day": (3, 3),  # 6/19.
    "Labor Day": (3, 1),  # Monday.
    "Martin Luther King Jr. Day": (3, 1),  # Monday.
    "Memorial Day": (3, 1),  # Monday.
    "New Year's Day": (3, 4),  # 1/1.
    "Thanksgiving": (1, 4),  # Thursday.
}
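The (pre, post) tuples determine which neighboring days become separate events. The sketch below expands one holiday into its neighbors and names them following the pattern visible in the printed results further down (for example "New Year's Day_WD_minus_3_WE"). The naming rule is inferred from that output, and the helpers weekday_weekend_suffix and expand_holiday are hypothetical, not functions of the library.

```python
import datetime


def weekday_weekend_suffix(date):
    """Mimics a weekday vs. weekend grouping: "_WD" or "_WE"."""
    return "_WE" if date.weekday() >= 5 else "_WD"


def expand_holiday(name, date, pre_num_days, post_num_days):
    """Returns {event_name: date} for a holiday and its neighboring days."""
    base_name = f"{name}{weekday_weekend_suffix(date)}"
    events = {base_name: date}
    for offset in range(-pre_num_days, post_num_days + 1):
        if offset == 0:
            continue  # The holiday itself is already included.
        neighbor = date + datetime.timedelta(days=offset)
        direction = "minus" if offset < 0 else "plus"
        suffix = weekday_weekend_suffix(neighbor)
        events[f"{base_name}_{direction}_{abs(offset)}{suffix}"] = neighbor
    return events


# New Year's Day 2008 (a Tuesday) with 3 days before and 4 days after.
events = expand_holiday("New Year's Day", datetime.date(2008, 1, 1), 3, 4)
for event_name, event_date in events.items():
    print(event_name, event_date)
```

For 2008 this yields, among others, "New Year's Day_WD_minus_3_WE" on 2007-12-29 and "New Year's Day_WD_plus_4_WE" on 2008-01-05, which matches the dates listed under those event names in the output of check_scores.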
Now we run the holiday grouper with K-means clustering.
# Instantiates `HolidayGrouper`.
hg = HolidayGrouper(
    df=df,
    time_col=cst.TIME_COL,
    value_col=cst.VALUE_COL,
    holiday_df=holiday_df,
    holiday_date_col="date",
    holiday_name_col="event_name",
    holiday_impact_pre_num_days=0,
    holiday_impact_post_num_days=0,
    holiday_impact_dict=holiday_impact_dict,
    get_suffix_func="wd_we"
)

# Runs holiday grouper using k-means with diagnostics.
hg.group_holidays(
    baseline_offsets=[-7, 7],
    min_n_days=2,
    min_abs_avg_score=0.03,
    clustering_method="kmeans",
    n_clusters=6,
    include_diagnostics=True
)

result_dict = hg.result_dict
daily_event_df_dict = result_dict["daily_event_df_dict"]  # Can be directly used in events.
Check results. For example, we can check the score and grouping of New Year’s Day that falls on weekdays.
hg.check_scores("New Year's Day_WD")
hg.check_holiday_group("New Year's Day_WD")
Out:
New Year's Day_WD_minus_3_WD:
Dates: ['2008-12-29', '2009-12-29', '2014-12-29', '2015-12-29']
Scores: [-0.0009702090442557211, 0.024248937252649764, -0.10479055326606326, 0.12974254007288447]
New Year's Day_WD_minus_2_WE:
Dates: ['2007-12-30', '2012-12-30']
Scores: [0.08274015952385388, -0.023946171584842288]
New Year's Day_WD_minus_1_WE:
Dates: []
Scores: []
New Year's Day_WD:
Dates: ['2008-01-01', '2009-01-01', '2010-01-01', '2013-01-01', '2014-01-01', '2015-01-01', '2016-01-01']
Scores: [-0.013410075609930556, -0.018898993743965527, -0.09384016225769304, 0.01784407954073544, 0.0126921774250659, -0.05008069433847777, -0.04048825138050534]
New Year's Day_WD_plus_1_WD:
Dates: ['2008-01-02', '2009-01-02', '2013-01-02', '2014-01-02', '2015-01-02']
Scores: [0.03379344256327239, 0.04128768523374012, 0.03341019990536573, 0.04014992076091155, 0.018290402348765396]
New Year's Day_WD_plus_2_WD:
Dates: ['2008-01-03', '2013-01-03', '2014-01-03']
Scores: [0.022465804391486117, 0.041658084709366515, 0.036190130685515846]
New Year's Day_WD_plus_3_WD:
Dates: ['2008-01-04', '2010-01-04', '2013-01-04', '2016-01-04']
Scores: [0.0016842037232343964, -0.07969767642916929, 0.06184876691831063, 0.10721000988887806]
New Year's Day_WD_plus_4_WD:
Dates: ['2009-01-05', '2010-01-05', '2015-01-05', '2016-01-05']
Scores: [0.08224217380856882, -0.04863030725885085, -0.09041580119636108, 0.04671426053581243]
New Year's Day_WD_minus_3_WE:
Dates: ['2007-12-29', '2012-12-29', '2013-12-29']
Scores: [-0.019154513001846295, -0.09385493853788847, -0.05677864059046168]
New Year's Day_WD_minus_1_WD:
Dates: ['2007-12-31', '2008-12-31', '2009-12-31', '2012-12-31', '2013-12-31', '2014-12-31', '2015-12-31']
Scores: [0.06200125459401423, 0.024373149357152256, -0.016318770618867922, 0.07678515136997574, 0.026146406722707457, -0.04962700628925161, 0.007662500135949873]
New Year's Day_WD_plus_4_WE:
Dates: ['2008-01-05', '2013-01-05', '2014-01-05']
Scores: [-0.005163677535586579, -0.01878048365306861, -0.12751479281551448]
New Year's Day_WD_minus_2_WD:
Dates: ['2008-12-30', '2009-12-30', '2013-12-30', '2014-12-30', '2015-12-30']
Scores: [0.022987608584582927, 0.04274070828630996, 0.09834689253916061, -0.0309835669755046, 0.03853585871691109]
New Year's Day_WD_plus_2_WE:
Dates: ['2009-01-03', '2010-01-03', '2015-01-03', '2016-01-03']
Scores: [0.14825560720626918, -0.06679836521274966, 0.06583931841025273, 0.020130565902325002]
New Year's Day_WD_plus_3_WE:
Dates: ['2009-01-04', '2014-01-04', '2015-01-04']
Scores: [0.177458256093718, 0.0016879988312680538, -0.024176589724767016]
New Year's Day_WD_plus_1_WE:
Dates: ['2010-01-02', '2016-01-02']
Scores: [-0.029311229351951452, 0.046339196386403915]
Average impact:
{"New Year's Day_WD_minus_3_WD": 0.012057678753803813, "New Year's Day_WD_minus_2_WE": 0.0293969939695058, "New Year's Day_WD_minus_1_WE": nan, "New Year's Day_WD": -0.02659741719496727, "New Year's Day_WD_plus_1_WD": 0.033386330162411035, "New Year's Day_WD_plus_2_WD": 0.033438006595456156, "New Year's Day_WD_plus_3_WD": 0.02276132602531345, "New Year's Day_WD_plus_4_WD": -0.0025224185277076695, "New Year's Day_WD_minus_3_WE": -0.05659603071006548, "New Year's Day_WD_minus_1_WD": 0.01871752646738286, "New Year's Day_WD_plus_4_WE": -0.05048631800138989, "New Year's Day_WD_minus_2_WD": 0.034325500230291996, "New Year's Day_WD_plus_2_WE": 0.04185678157652432, "New Year's Day_WD_plus_3_WE": 0.05165655506673967, "New Year's Day_WD_plus_1_WE": 0.008513983517226232}
`holiday_group_2` contains events matching the provided pattern.
This group includes 9 distinct events.
date event_name original_name avg_score
0 2006-09-03 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
1 2007-09-02 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
2 2008-08-31 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
3 2009-09-06 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
4 2010-09-05 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
5 2011-09-04 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
6 2012-09-02 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
7 2013-09-01 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
8 2014-08-31 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
9 2015-09-06 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
10 2016-09-04 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
11 2017-09-03 holiday_group_2 Labor Day_WD_minus_1_WE -0.060883
12 2007-12-29 holiday_group_2 New Year's Day_WD_minus_3_WE -0.056596
13 2012-12-29 holiday_group_2 New Year's Day_WD_minus_3_WE -0.056596
14 2013-12-29 holiday_group_2 New Year's Day_WD_minus_3_WE -0.056596
15 2008-12-28 holiday_group_2 Christmas Day_WD_plus_3_WE -0.056330
16 2013-12-28 holiday_group_2 Christmas Day_WD_plus_3_WE -0.056330
17 2014-12-28 holiday_group_2 Christmas Day_WD_plus_3_WE -0.056330
18 2006-12-23 holiday_group_2 Christmas Day_WD_minus_2_WE -0.056145
19 2007-12-23 holiday_group_2 Christmas Day_WD_minus_2_WE -0.056145
20 2012-12-23 holiday_group_2 Christmas Day_WD_minus_2_WE -0.056145
21 2017-12-23 holiday_group_2 Christmas Day_WD_minus_2_WE -0.056145
22 2009-07-03 holiday_group_2 Independence Day (Observed)_WD -0.055264
23 2009-07-03 holiday_group_2 Independence Day_WE_minus_1_WD -0.055264
24 2010-07-05 holiday_group_2 Independence Day (Observed)_WD -0.055264
25 2015-07-03 holiday_group_2 Independence Day (Observed)_WD -0.055264
26 2015-07-03 holiday_group_2 Independence Day_WE_minus_1_WD -0.055264
27 2006-12-25 holiday_group_2 Christmas Day_WD -0.054741
28 2007-12-25 holiday_group_2 Christmas Day_WD -0.054741
29 2008-12-25 holiday_group_2 Christmas Day_WD -0.054741
30 2009-12-25 holiday_group_2 Christmas Day_WD -0.054741
31 2012-12-25 holiday_group_2 Christmas Day_WD -0.054741
32 2013-12-25 holiday_group_2 Christmas Day_WD -0.054741
33 2014-12-25 holiday_group_2 Christmas Day_WD -0.054741
34 2015-12-25 holiday_group_2 Christmas Day_WD -0.054741
35 2017-12-25 holiday_group_2 Christmas Day_WD -0.054741
36 2009-12-26 holiday_group_2 Christmas Day_WD_plus_1_WE -0.052137
37 2015-12-26 holiday_group_2 Christmas Day_WD_plus_1_WE -0.052137
38 2008-01-05 holiday_group_2 New Year's Day_WD_plus_4_WE -0.050486
39 2013-01-05 holiday_group_2 New Year's Day_WD_plus_4_WE -0.050486
40 2014-01-05 holiday_group_2 New Year's Day_WD_plus_4_WE -0.050486
`holiday_group_4` contains events matching the provided pattern.
This group includes 6 distinct events.
date event_name original_name avg_score
0 2007-01-02 holiday_group_4 New Year's Day_WD_plus_1_WD 0.033386
1 2008-01-02 holiday_group_4 New Year's Day_WD_plus_1_WD 0.033386
2 2009-01-02 holiday_group_4 New Year's Day_WD_plus_1_WD 0.033386
3 2013-01-02 holiday_group_4 New Year's Day_WD_plus_1_WD 0.033386
4 2014-01-02 holiday_group_4 New Year's Day_WD_plus_1_WD 0.033386
5 2015-01-02 holiday_group_4 New Year's Day_WD_plus_1_WD 0.033386
6 2007-01-03 holiday_group_4 New Year's Day_WD_plus_2_WD 0.033438
7 2008-01-03 holiday_group_4 New Year's Day_WD_plus_2_WD 0.033438
8 2013-01-03 holiday_group_4 New Year's Day_WD_plus_2_WD 0.033438
9 2014-01-03 holiday_group_4 New Year's Day_WD_plus_2_WD 0.033438
10 2008-12-30 holiday_group_4 New Year's Day_WD_minus_2_WD 0.034326
11 2009-12-30 holiday_group_4 New Year's Day_WD_minus_2_WD 0.034326
12 2013-12-30 holiday_group_4 New Year's Day_WD_minus_2_WD 0.034326
13 2014-12-30 holiday_group_4 New Year's Day_WD_minus_2_WD 0.034326
14 2015-12-30 holiday_group_4 New Year's Day_WD_minus_2_WD 0.034326
15 2009-01-03 holiday_group_4 New Year's Day_WD_plus_2_WE 0.041857
16 2010-01-03 holiday_group_4 New Year's Day_WD_plus_2_WE 0.041857
17 2015-01-03 holiday_group_4 New Year's Day_WD_plus_2_WE 0.041857
18 2016-01-03 holiday_group_4 New Year's Day_WD_plus_2_WE 0.041857
19 2009-01-04 holiday_group_4 New Year's Day_WD_plus_3_WE 0.051657
20 2014-01-04 holiday_group_4 New Year's Day_WD_plus_3_WE 0.051657
21 2015-01-04 holiday_group_4 New Year's Day_WD_plus_3_WE 0.051657
22 2006-01-16 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
23 2007-01-15 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
24 2008-01-21 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
25 2009-01-19 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
26 2010-01-18 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
27 2011-01-17 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
28 2012-01-16 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
29 2013-01-21 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
30 2014-01-20 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
31 2015-01-19 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
32 2016-01-18 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
33 2017-01-16 holiday_group_4 Martin Luther King Jr. Day_WD 0.056942
Check the diagnostics plot for K-means clustering.
plotly.io.show(result_dict["kmeans_plot"])
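To make the role of n_clusters more concrete, here is a minimal sketch of K-means on scalar scores using plain Lloyd iterations. It is not the library's implementation: the helper kmeans_1d and its evenly spaced initial centers are assumptions for illustration. The input values are the avg_score values printed above for holiday_group_2 and holiday_group_4, and n_clusters=2 is chosen only because this subset has two obvious groups.

```python
from statistics import mean


def kmeans_1d(scores, n_clusters, n_iter=100):
    """Runs Lloyd's algorithm on scalar scores; returns (labels, centers)."""
    low, high = min(scores), max(scores)
    # Deterministic start: centers evenly spaced over the score range.
    centers = [low + (high - low) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each score goes to its nearest center.
        labels = [
            min(range(n_clusters), key=lambda k: abs(score - centers[k]))
            for score in scores
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(n_clusters):
            members = [s for s, label in zip(scores, labels) if label == k]
            new_centers.append(mean(members) if members else centers[k])
        if new_centers == centers:
            break  # Converged.
        centers = new_centers
    return labels, centers


avg_scores = [
    -0.060883, -0.056596, -0.056330, -0.056145,
    -0.055264, -0.054741, -0.052137, -0.050486,
    0.033386, 0.033438, 0.034326, 0.041857, 0.051657, 0.056942,
]
labels, centers = kmeans_1d(avg_scores, n_clusters=2)
print(labels)
print([round(center, 4) for center in centers])
```

The negative and positive average scores end up in separate clusters. A larger n_clusters would split each side further, which is the knob the grouper exposes.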
Now let’s try clustering using KDE and check the results.
hg.group_holidays(
    baseline_offsets=[-7, 7],
    min_n_days=1,
    min_abs_avg_score=0.03,
    bandwidth_multiplier=0.5,
    clustering_method="kde"
)
result_dict = hg.result_dict
daily_event_df_dict = result_dict["daily_event_df_dict"]

plotly.io.show(result_dict["kde_plot"])
# Checks the number of events in each group.
for event_group, event_df in daily_event_df_dict.items():
    print(f"{event_group}: contains {event_df.shape[0]} days.")
Out:
holiday_group_0: contains 7 days.
holiday_group_1: contains 5 days.
holiday_group_2: contains 19 days.
holiday_group_3: contains 48 days.
holiday_group_4: contains 64 days.
holiday_group_5: contains 6 days.
holiday_group_6: contains 35 days.
Total running time of the script: ( 0 minutes 11.330 seconds)