Changepoint Detection
You can detect trend and seasonality changepoints with just a few lines of code.
Provide your timeseries as a pandas dataframe with timestamp and value.
For example, to work with daily sessions data, your dataframe could look like this:
import pandas as pd
df = pd.DataFrame({
"datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
"macrosessions": [10231.0, 12309.0, 12104.0]
})
The time column can be any format recognized by pd.to_datetime.
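The sample datepartition values above carry an extra trailing field. If you prefer not to rely on automatic parsing for such strings, you can convert the column yourself before passing the dataframe on. The sketch below assumes the trailing "-00" is the hour of day, which this page does not state.

```python
import pandas as pd

df = pd.DataFrame({
    "datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "macrosessions": [10231.0, 12309.0, 12104.0],
})

# Assumption: the last field of each string is the hour of day.
parsed = pd.to_datetime(df["datepartition"], format="%Y-%m-%d-%H")
df = df.assign(datepartition=parsed)
print(df.dtypes)
```

An explicit format also fails loudly on malformed rows instead of guessing.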
In this example, we’ll load a dataset representing log(daily page views)
on the Wikipedia page for Peyton Manning.
It contains values from 2007-12-10 to 2016-01-20. More dataset info
here.
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import plotly
from greykite.algo.changepoint.adalasso.changepoint_detector import ChangepointDetector
from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum
# Loads dataset into UnivariateTimeSeries
dl = DataLoaderTS()
ts = dl.load_peyton_manning_ts()
df = ts.df  # cleaned pandas.DataFrame
Detect trend change points
Let’s plot the original timeseries.
There are actually trend changes within this data set.
The UnivariateTimeSeries class stores a timeseries and provides basic description and plotting functions.
The load_peyton_manning_ts function returns a UnivariateTimeSeries instance directly;
for any other df, you can initialize a UnivariateTimeSeries instance yourself and
explore it further.
(The interactive plot is generated by plotly: click to zoom!)
fig = ts.plot()
plotly.io.show(fig)
ChangepointDetector uses pre-filters, regularized regression models, and
post-filters to find the time points where the trend changes.
To create a simple trend changepoint detection model, we first initialize the
ChangepointDetector class, then call its find_trend_changepoints method.
model = ChangepointDetector()
res = model.find_trend_changepoints(
    df=df,  # data df
    time_col="ts",  # time column name
    value_col="y")  # value column name
pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})  # displays the result as a dataframe
 | trend_changepoints |
---|---|
0 | 2008-02-06 |
1 | 2008-07-06 |
2 | 2008-09-20 |
3 | 2008-12-18 |
4 | 2009-02-13 |
5 | 2009-06-08 |
6 | 2009-09-03 |
7 | 2009-12-07 |
8 | 2010-02-04 |
9 | 2010-07-02 |
10 | 2010-10-30 |
11 | 2011-01-24 |
12 | 2011-04-21 |
13 | 2011-07-16 |
14 | 2011-10-11 |
15 | 2011-12-09 |
16 | 2012-02-06 |
17 | 2013-02-15 |
18 | 2013-08-08 |
19 | 2014-01-28 |
20 | 2014-03-27 |
21 | 2014-12-12 |
22 | 2015-06-03 |
The code above runs trend changepoint detection with the default parameters.
We can visualize the detection results with the plot method.
fig = model.plot(plot=False)  # plot=False returns a plotly figure object.
plotly.io.show(fig)
There might be too many changepoints with the default parameters. We could customize the parameters to meet individual requirements.
To understand the parameters, here is a little background.
The algorithm first applies a mean aggregation to eliminate small
fluctuations and short seasonality effects (resample_freq). This prevents the trend
from picking up those effects.
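To see why aggregation helps, here is a small sketch on synthetic data (not the Peyton Manning series). It uses plain pandas resampling as a stand-in for what resample_freq does; the exact aggregation inside Greykite is not shown on this page, so treat the equivalence as an assumption.

```python
import numpy as np
import pandas as pd

# Four weeks of synthetic daily data: a weekly wiggle plus a level shift after two weeks.
dates = pd.date_range("2020-01-01", periods=28, freq="D")
weekly_wiggle = np.tile([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0], 4)
level = np.where(np.arange(28) < 14, 100.0, 110.0)
daily = pd.DataFrame({"ts": dates, "y": level + weekly_wiggle})

# Mean aggregation into 7-day bins, analogous in spirit to resample_freq="7D".
aggregated = daily.set_index("ts")["y"].resample("7D").mean()
print(aggregated)
```

The weekly pattern averages out to a constant within each bin, while the level shift between the second and third week survives.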
Then a large number of potential changepoints are placed uniformly over the
whole time span, specified either by the time between changepoints (potential_changepoint_distance)
or by the number of potential changepoints (potential_changepoint_n);
the former overrides the latter.
The adaptive lasso (more info at adalasso)
is used to shrink the coefficients of insignificant changepoints to zero.
The initial estimator for the adaptive lasso can be one of “ols”, “ridge”
and “lasso” (adaptive_lasso_initial_estimator). The regularization
strength of the adaptive lasso is also user-controllable
(regularization_strength): it lies between 0.0 and 1.0, and greater values imply
fewer changepoints. None triggers cross-validation to select the best
tuning parameter based on prediction performance.
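The idea of "shrinking a changepoint's coefficient to zero" may be easier to picture with a toy design matrix. The sketch below uses the textbook encoding (one hinge feature per candidate changepoint) and the textbook adaptive lasso weighting (penalty inversely proportional to the initial coefficient). Both are assumptions for illustration and are not copied from Greykite's implementation; the lasso fit itself is left out.

```python
import numpy as np

# Time rescaled to [0, 1] and three candidate changepoints (hypothetical values).
t = np.linspace(0.0, 1.0, 101)
candidates = np.array([0.25, 0.5, 0.75])

# Design matrix: intercept, base slope, and one hinge feature max(t - c, 0) per candidate.
hinges = np.maximum(t[:, None] - candidates[None, :], 0.0)
X = np.column_stack([np.ones_like(t), t, hinges])

# A noiseless trend whose slope changes only at t = 0.5.
y = 1.0 + 2.0 * t + 3.0 * np.maximum(t - 0.5, 0.0)

# Ordinary least squares plays the role of the initial estimator ("ols").
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_changes = coef[2:]

# Adaptive lasso style weights: the smaller the initial coefficient, the heavier the penalty.
penalty_weights = 1.0 / (np.abs(slope_changes) + 1e-8)
print(np.round(slope_changes, 6), penalty_weights)
```

Candidates whose initial slope change is near zero receive a very large penalty, so a subsequent lasso step would drop them first; the candidate at 0.5 is penalized least.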
The yearly seasonality effect is too long to be eliminated by aggregation, so
fitting it together with the trend is recommended (yearly_seasonality_order).
This allows the model to distinguish trend changes from yearly seasonality.
Placing changepoints too close to the end of the data is not recommended,
because there may not be enough data to fit the final trend,
especially in forecasting tasks. Therefore, one can specify a window at the
end of the data in which changepoints are not allowed, either as a time span
(no_changepoint_distance_from_end) or as a proportion of the data
(no_changepoint_proportion_from_end); the former overrides the latter.
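The override rule can be written down in a few lines. The helper below is hypothetical (changepoint_cutoff is not a Greykite function), and it assumes the proportion is measured in calendar time rather than in number of observations, which this page does not specify.

```python
from datetime import date, timedelta


def changepoint_cutoff(start, end, proportion_from_end=None, distance_from_end=None):
    """Return the last date on which a changepoint would still be allowed."""
    if distance_from_end is not None:
        # An explicit distance takes precedence over the proportion.
        return end - distance_from_end
    if proportion_from_end is None:
        return end
    span_days = (end - start).days
    return end - timedelta(days=span_days * proportion_from_end)


start, end = date(2020, 1, 1), date(2020, 1, 11)
print(changepoint_cutoff(start, end, proportion_from_end=0.2))
print(changepoint_cutoff(start, end, proportion_from_end=0.2,
                         distance_from_end=timedelta(days=3)))
```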
Finally, a post-filter is applied to eliminate changepoints that are too close
to each other (actual_changepoint_min_distance).
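A minimum-distance post-filter could look like the sketch below. The earliest-first greedy rule and the function name enforce_min_distance are assumptions; Greykite may decide differently which of two nearby changepoints to keep.

```python
from datetime import date, timedelta


def enforce_min_distance(changepoints, min_distance):
    """Greedily keep changepoints that are at least min_distance after the last kept one."""
    kept = []
    for cp in sorted(changepoints):
        if not kept or cp - kept[-1] >= min_distance:
            kept.append(cp)
    return kept


detected = [date(2020, 1, 1), date(2020, 1, 10), date(2020, 2, 15), date(2020, 2, 20)]
print(enforce_min_distance(detected, timedelta(days=30)))
```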
The following parameter combination uses longer aggregation, fewer potential changepoints, and a higher yearly seasonality order. Changepoints are not allowed in the last 20% of the data.
model = ChangepointDetector()  # it's also okay to omit this and re-use the old instance
res = model.find_trend_changepoints(
    df=df,  # data df
    time_col="ts",  # time column name
    value_col="y",  # value column name
    yearly_seasonality_order=15,  # yearly seasonality order, fit along with trend
    regularization_strength=0.5,  # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    resample_freq="7D",  # data aggregation frequency, eliminate small fluctuation/seasonality
    potential_changepoint_n=25,  # the number of potential changepoints
    no_changepoint_proportion_from_end=0.2)  # the proportion of data from end where changepoints are not allowed
pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})
 | trend_changepoints |
---|---|
0 | 2008-03-31 |
1 | 2008-08-04 |
2 | 2008-11-24 |
3 | 2009-03-16 |
4 | 2009-07-13 |
5 | 2009-11-02 |
6 | 2010-02-22 |
7 | 2010-06-14 |
8 | 2010-10-11 |
9 | 2011-01-31 |
10 | 2011-09-12 |
11 | 2012-01-09 |
12 | 2012-04-30 |
13 | 2013-04-01 |
14 | 2013-11-18 |
We may also plot the detection result.
fig = model.plot(plot=False)
plotly.io.show(fig)
Now the detected trend changepoints look better: 15 instead of 23. Similarly, we could
specify potential_changepoint_distance and no_changepoint_distance_from_end
instead of potential_changepoint_n and no_changepoint_proportion_from_end,
for example potential_changepoint_distance="60D" and
no_changepoint_distance_from_end="730D". Remember that these override
potential_changepoint_n and no_changepoint_proportion_from_end.
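To get a feel for what "60D" and "730D" mean on this dataset, the sketch below lays out a 60-day grid over the date range stated earlier and drops everything in the final 730 days. It uses only pandas. Starting the grid at the first date and measuring the window back from the last date are assumptions about the placement rule.

```python
import pandas as pd

# Date range of the Peyton Manning dataset, as stated earlier in this example.
start, end = pd.Timestamp("2007-12-10"), pd.Timestamp("2016-01-20")

# Assumption: candidates start at the first date and repeat every 60 days.
candidates = pd.date_range(start=start, end=end, freq="60D")

# Assumption: the exclusion window is measured back from the last date.
cutoff = end - pd.Timedelta("730D")
allowed = candidates[candidates <= cutoff]

print(len(candidates), len(allowed), cutoff.date())
```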
Moreover, one can control which components are plotted. For example:
fig = model.plot(
    observation=True,  # whether to plot the observations
    observation_original=True,  # whether to plot the unaggregated values
    trend_estimate=True,  # whether to plot the trend estimation
    trend_change=True,  # whether to plot detected trend changepoints
    yearly_seasonality_estimate=True,  # whether to plot estimated yearly seasonality
    adaptive_lasso_estimate=True,  # whether to plot the adaptive lasso estimated trend
    seasonality_change=False,  # detected seasonality change points, discussed in next section
    seasonality_change_by_component=True,  # plot seasonality by component (daily, weekly, etc.), discussed in next section
    seasonality_estimate=False,  # plot estimated trend+seasonality, discussed in next section
    plot=False)  # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)
Detect seasonality change points
By seasonality change points, we mean the time points where the shape of seasonality effects change, i.e., the seasonal shape may become “fatter” or “thinner”. Similar to trend changepoint detection, we also have pre-filtering, regularization with regression based model and post-filtering in seasonality change point detection.
To create a simple seasonality changepoint detection model, we could either use
the previous ChangepointDetector object, which already has the trend changepoint
information, or initialize a new ChangepointDetector object. Then one could run
the find_seasonality_changepoints function.
Note that because we first remove the trend effect from the timeseries before detecting
seasonality changepoints, using the old ChangepointDetector object with trend changepoint
detection results on the same df will pass the existing trend information and save time.
If a new object is initialized and find_seasonality_changepoints is run directly,
the model first runs find_trend_changepoints with the default trend changepoint
detection parameters to obtain trend changepoint information.
However, we recommend running find_trend_changepoints and checking the result
before running find_seasonality_changepoints.
Here we use the old object which already contains trend changepoint information.
res = model.find_seasonality_changepoints(
    df=df,  # data df
    time_col="ts",  # time column name
    value_col="y")  # value column name
pd.DataFrame({k: pd.Series(v) for k, v in res["seasonality_changepoints"].items()})  # view result
# one could also print res["seasonality_changepoints"] directly to view the result
 | weekly | yearly |
---|---|---|
0 | NaN | 2008-02-06 |
1 | NaN | 2013-05-08 |
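The pd.Series wrapper in the display code matters because each seasonality component can have a different number of changepoints. The sketch below rebuilds the table above from a hand-written dictionary; the dictionary-of-lists shape is inferred from how the result is used in this example, and the explicit dtype is our addition so that empty components show up as missing values.

```python
import pandas as pd

# Values copied from the table above; "weekly" has no detected changepoints.
seasonality_changepoints = {
    "weekly": [],
    "yearly": [pd.Timestamp("2008-02-06"), pd.Timestamp("2013-05-08")],
}

# Wrapping each list in a Series lets pandas align columns of unequal length by index.
table = pd.DataFrame({
    name: pd.Series(dates, dtype="datetime64[ns]")
    for name, dates in seasonality_changepoints.items()
})
print(table)
```

Passing the raw lists straight to pd.DataFrame would raise an error, because the columns have different lengths.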
We can also plot the detection results by setting seasonality_change
and seasonality_estimate to True.
fig = model.plot(
    seasonality_change=True,  # whether to plot detected seasonality change points
    seasonality_change_by_component=True,  # plot seasonality by component (daily, weekly, etc.)
    seasonality_estimate=True,  # plot estimated trend+seasonality
    plot=False)  # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)
In this example there is not much seasonality change: the table above lists only two yearly seasonality changepoints and no weekly ones. We could customize the parameters to increase the sensitivity of seasonality changepoint detection.
The only parameter that differs from trend changepoint detection is seasonality_components_df,
which configures the seasonality components. Supplying daily, weekly and yearly seasonality
works well for most cases. Users can also include monthly and quarterly seasonality.
The full df is:
seasonality_components_df = pd.DataFrame({
    "name": ["tod", "tow", "conti_year"],  # component value column name used to create seasonality component
    "period": [24.0, 7.0, 1.0],  # period for seasonality component
    "order": [3, 3, 5],  # Fourier series order
    "seas_names": ["daily", "weekly", "yearly"]})  # seasonality component name
However, if the inferred data frequency is at least one day, the daily component will be removed.
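To make the period and order columns concrete, here is a generic Fourier-feature sketch in NumPy. It shows the standard construction (order sine terms and order cosine terms over one period); the helper name fourier_features and the column layout are ours and are not taken from Greykite.

```python
import numpy as np


def fourier_features(t, period, order):
    """Return an (n, 2 * order) matrix of sine and cosine terms for one seasonal component."""
    t = np.asarray(t, dtype=float)
    k = np.arange(1, order + 1)
    angles = 2.0 * np.pi * np.outer(t, k) / period
    return np.hstack([np.sin(angles), np.cos(angles)])


# Two weeks of a day-of-week feature with period 7.0 and order 3, as in the "weekly" row above.
day_of_week = np.arange(14) % 7
weekly = fourier_features(day_of_week, period=7.0, order=3)
print(weekly.shape)
```

A higher order adds higher-frequency terms, which lets the fitted seasonal shape become more detailed at the cost of more coefficients.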
Another optional parameter is trend_changepoints, which lets users provide
a list of trend changepoints and skip the call to find_trend_changepoints.
Now we run find_seasonality_changepoints with a smaller regularization_strength,
and restrict changepoints to the first 80% of the data. As recommended, we reuse the previously
detected trend changepoints (by using the same object after running find_trend_changepoints).
res = model.find_seasonality_changepoints(
    df=df,  # data df
    time_col="ts",  # time column name
    value_col="y",  # value column name
    seasonality_components_df=pd.DataFrame({  # seasonality config df
        "name": ["tow", "conti_year"],  # component value column name used to create seasonality component
        "period": [7.0, 1.0],  # period for seasonality component
        "order": [3, 5],  # Fourier series order
        "seas_names": ["weekly", "yearly"]}),  # seasonality component name
    regularization_strength=0.4,  # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    no_changepoint_proportion_from_end=0.2,  # no changepoint in the last 20% data
    trend_changepoints=None)  # optionally specify trend changepoints to avoid calling find_trend_changepoints
pd.DataFrame({k: pd.Series(v) for k, v in res["seasonality_changepoints"].items()})  # view result
# one could also print res["seasonality_changepoints"] directly to view the result
 | weekly | yearly |
---|---|---|
0 | 2008-02-06 | 2008-02-06 |
1 | NaT | 2011-04-13 |
2 | NaT | 2012-03-27 |
3 | NaT | 2013-05-08 |
We can also plot the detection results.
fig = model.plot(
    seasonality_change=True,  # whether to plot detected seasonality change points
    seasonality_change_by_component=True,  # plot seasonality by component (daily, weekly, etc.)
    seasonality_estimate=True,  # plot estimated trend+seasonality
    plot=False)  # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)
Create a forecast with changepoints
Both the trend changepoint detection and the seasonality changepoint detection algorithms
have been integrated with SILVERKITE, so one can invoke them by
passing the corresponding parameters.
Silverkite first detects changepoints with the given parameters,
then feeds the detected changepoints to the forecasting model.
# specify dataset information
metadata = dict(
time_col="ts", # name of the time column ("datepartition" in example above)
value_col="y", # name of the value column ("macrosessions" in example above)
freq="D" # "H" for hourly, "D" for daily, "W" for weekly, etc.
# Any format accepted by ``pd.date_range``
)
# specify changepoint parameters in model_components
model_components = dict(
changepoints={
# it's ok to provide one of ``changepoints_dict`` or ``seasonality_changepoints_dict`` by itself
"changepoints_dict": {
"method": "auto",
"yearly_seasonality_order": 15,
"regularization_strength": 0.5,
"resample_freq": "7D",
"potential_changepoint_n": 25,
"no_changepoint_proportion_from_end": 0.2
},
"seasonality_changepoints_dict": {
"potential_changepoint_distance": "60D",
"regularization_strength": 0.5,
"no_changepoint_proportion_from_end": 0.2
}
},
custom={
"fit_algorithm_dict": {
"fit_algorithm": "ridge"}}) # use ridge to prevent overfitting when there are many changepoints
# Generates model config
config = ForecastConfig.from_dict(
dict(
model_template=ModelTemplateEnum.SILVERKITE.name,
forecast_horizon=365, # forecast 1 year
coverage=0.95, # 95% prediction intervals
metadata_param=metadata,
model_components_param=model_components))
# Then run with changepoint parameters
forecaster = Forecaster()
result = forecaster.run_forecast_config(
df=df,
config=config)
Out:
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Note
The automatic trend changepoint detection algorithm also supports adding custom trend
changepoints to forecasts. In the changepoints_dict parameter above, you may add the following
parameters to include additional trend changepoints besides the detected ones:
dates: a list of custom trend changepoint dates, parsable by pandas.to_datetime. For example, ["2020-01-01", "2020-02-15"].
combine_changepoint_min_distance: the minimum distance allowed between a detected changepoint and a custom changepoint, default is None. For example, "5D". If violated, one of them will be dropped according to the next parameter, keep_detected.
keep_detected: True or False, default False. Decides whether to keep the detected changepoint or the custom changepoint when they are too close. If set to True, keeps the detected changepoint, otherwise keeps the custom changepoint.
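The interaction between combine_changepoint_min_distance and keep_detected can be sketched in plain Python. The function below is a hypothetical illustration of the rule as described in this note, not Greykite's implementation; in particular, treating a gap exactly equal to the minimum distance as acceptable is an assumption.

```python
from datetime import date, timedelta


def combine_changepoints(detected, custom, min_distance, keep_detected=False):
    """Merge detected and custom changepoints, dropping one side of any pair that is too close."""
    def too_close(point, others):
        return any(abs(point - other) < min_distance for other in others)

    if keep_detected:
        # Detected changepoints win: drop custom dates that sit too close to them.
        kept_custom = [c for c in custom if not too_close(c, detected)]
        return sorted(detected + kept_custom)
    # Custom changepoints win (the default): drop detected dates that sit too close.
    kept_detected = [d for d in detected if not too_close(d, custom)]
    return sorted(kept_detected + custom)


detected = [date(2020, 1, 3), date(2020, 3, 1)]
custom = [date(2020, 1, 1), date(2020, 2, 15)]
print(combine_changepoints(detected, custom, timedelta(days=5)))
print(combine_changepoints(detected, custom, timedelta(days=5), keep_detected=True))
```

With a "5D" minimum distance, the pair 2020-01-01 and 2020-01-03 conflicts, so exactly one of the two survives depending on keep_detected.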
Check results
Details of the results are given in the Simple forecast example. We just show a few specific results here.
The original trend changepoint detection plot is accessible.
One can pass the same parameters, in a dictionary, as for
the plot function in ChangepointDetector.
fig = result.model[-1].plot_trend_changepoint_detection(dict(plot=False))  # -1 gets the estimator from the pipeline
plotly.io.show(fig)
Let’s plot the historical forecast on the holdout test set.
backtest = result.backtest
fig = backtest.plot()
plotly.io.show(fig)
Let’s plot the forecast (trained on all data):
forecast = result.forecast
fig = forecast.plot()
plotly.io.show(fig)
Check out the component plot; trend changepoints are marked in the trend component plot.
fig = backtest.plot_components()
plotly.io.show(fig)  # fig.show() if you are using "PROPHET" template