Changepoint Detection

You can detect trend and seasonality changepoints with just a few lines of code.

Provide your timeseries as a pandas dataframe with timestamp and value.

For example, to work with daily sessions data, your dataframe could look like this:

import pandas as pd
df = pd.DataFrame({
    "datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "macrosessions": [10231.0, 12309.0, 12104.0]
})

The time column can be any format recognized by pd.to_datetime.
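
If automatic parsing is ambiguous for your timestamps, one option is to convert the column yourself before passing the dataframe on. The following is a minimal sketch, not part of the original example; the format string is an assumption that the trailing "-00" in "2020-01-08-00" is the hour of day.

```python
import pandas as pd

df = pd.DataFrame({
    "datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "macrosessions": [10231.0, 12309.0, 12104.0],
})

# Assumption: the strings encode year-month-day-hour.
parsed = pd.to_datetime(df["datepartition"], format="%Y-%m-%d-%H")

# Store the parsed timestamps back so downstream code sees datetimes.
df["datepartition"] = parsed
```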

In this example, we’ll load a dataset representing log(daily page views) on the Wikipedia page for Peyton Manning. It contains values from 2007-12-10 to 2016-01-20. More dataset info here.

 import warnings

 warnings.filterwarnings("ignore")

 import pandas as pd
 import plotly

 from greykite.algo.changepoint.adalasso.changepoint_detector import ChangepointDetector
 from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
 from greykite.framework.templates.autogen.forecast_config import ForecastConfig
 from greykite.framework.templates.forecaster import Forecaster
 from greykite.framework.templates.model_templates import ModelTemplateEnum

 # Loads dataset into UnivariateTimeSeries
 dl = DataLoaderTS()
 ts = dl.load_peyton_manning_ts()
 df = ts.df  # cleaned pandas.DataFrame

Detect trend change points

Let’s plot the original timeseries. There are actually trend changes within this data set. The UnivariateTimeSeries class stores a timeseries and provides basic description and plotting functions. The load_peyton_manning_ts function returns a UnivariateTimeSeries instance; for any other df, you can initialize a UnivariateTimeSeries instance yourself and explore further. (The interactive plot is generated by plotly: click to zoom!)

 fig = ts.plot()
 plotly.io.show(fig)

ChangepointDetector uses pre-filters, regularization with regression-based models, and post-filters to find the time points where the trend changes.

To create a simple trend changepoint detection model, we first initialize the ChangepointDetector class, then call its find_trend_changepoints method.

 model = ChangepointDetector()
 res = model.find_trend_changepoints(
     df=df,            # data df
     time_col="ts",    # time column name
     value_col="y")    # value column name
 pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})  # prints a dataframe showing the result
trend_changepoints
0 2008-02-06
1 2008-07-06
2 2008-09-20
3 2008-12-18
4 2009-02-13
5 2009-06-08
6 2009-09-03
7 2009-12-07
8 2010-02-04
9 2010-07-02
10 2010-10-30
11 2011-01-24
12 2011-04-21
13 2011-07-16
14 2011-10-11
15 2011-12-09
16 2012-02-06
17 2013-02-15
18 2013-08-08
19 2014-01-28
20 2014-03-27
21 2014-12-12
22 2015-06-03


The code above runs trend changepoint detection with the default parameters. We can visualize the detection results with the plot method.

 fig = model.plot(plot=False)  # plot = False returns a plotly figure object.
 plotly.io.show(fig)

There might be too many changepoints with the default parameters. We could customize the parameters to meet individual requirements.

To understand the parameters, here is some background. The algorithm first applies mean aggregation (resample_freq) to eliminate small fluctuations and short seasonality effects, so that the trend does not pick them up.
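
The effect of mean aggregation can be illustrated with plain pandas. This is a sketch on made-up toy data, not the library's internal code: a daily series with a weekend bump on top of a weekly step-up is averaged over 7-day bins, which removes the bump and leaves only the trend.

```python
import pandas as pd

# Toy daily series (hypothetical data): the level rises by 1 each week,
# and the last two days of every week carry an extra bump of 1.0.
dates = pd.date_range("2020-01-01", periods=28, freq="D")
values = [day // 7 + (1.0 if day % 7 >= 5 else 0.0) for day in range(28)]
daily = pd.DataFrame({"ts": dates, "y": values})

# Mean aggregation over 7-day bins, analogous to resample_freq="7D".
weekly = daily.resample("7D", on="ts")["y"].mean()

# Consecutive weekly means differ by exactly the trend step;
# the within-week bump no longer shows up as a fluctuation.
steps = weekly.diff().dropna()
```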

Then a large number of potential changepoints are placed uniformly over the whole time span, specified either by the time between changepoints (potential_changepoint_distance) or by the number of potential changepoints (potential_changepoint_n); the former overrides the latter.
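
As a rough illustration of uniform placement, the sketch below builds a candidate grid either from a count or from a fixed distance. Both helper functions are hypothetical and use only the standard library; the exact grid the library builds (for example, how it treats the endpoints) may differ.

```python
from datetime import date, timedelta

def uniform_candidates(start, end, n):
    """Place n candidate dates evenly, strictly between start and end."""
    span_days = (end - start).days
    step = span_days / (n + 1)
    return [start + timedelta(days=round(step * i)) for i in range(1, n + 1)]

def candidates_by_distance(start, end, distance):
    """Place a candidate every `distance` after start, stopping before end."""
    out = []
    current = start + distance
    while current < end:
        out.append(current)
        current += distance
    return out

# Date range of the Peyton Manning dataset described above.
start, end = date(2007, 12, 10), date(2016, 1, 20)
candidates = uniform_candidates(start, end, n=25)
by_distance = candidates_by_distance(start, end, timedelta(days=60))
```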

The adaptive lasso (more info at adalasso) is used to shrink the coefficients of insignificant changepoints to zero. The initial estimator for the adaptive lasso can be one of "ols", "ridge" and "lasso" (adaptive_lasso_initial_estimator). The regularization strength is also user-controllable (regularization_strength): it lies between 0.0 and 1.0, and greater values imply fewer changepoints. Setting it to None triggers cross-validation to select the best tuning parameter based on prediction performance.
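
For reference, the textbook form of the adaptive lasso objective is sketched below. This is the general formulation, not taken from this page: here X would hold the trend, changepoint, and yearly seasonality regressors, the initial estimate corresponds to the estimator chosen by adaptive_lasso_initial_estimator, and the symbols λ and γ are generic. How regularization_strength maps to λ inside the library is not described here.

```latex
\hat{\beta}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
    + \lambda \sum_{j} w_j \lvert \beta_j \rvert,
\qquad
w_j = \frac{1}{\lvert \hat{\beta}^{\text{init}}_j \rvert^{\gamma}},
\quad \gamma > 0
```

Coefficients that are small in the initial fit receive large weights and are driven to exactly zero, which is how insignificant changepoints are removed.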

The yearly seasonality effect is too long to be eliminated by aggregation, so fitting it together with the trend is recommended (yearly_seasonality_order). This allows the detector to distinguish trend changes from yearly seasonality.

Placing changepoints too close to the end of the data is not recommended, because there may not be enough data to fit the final trend, especially in forecasting tasks. Therefore, one can specify a region at the end of the data where changepoints are not allowed, either as a length of time (no_changepoint_distance_from_end) or as a proportion of the data (no_changepoint_proportion_from_end); the former overrides the latter.
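
The proportion-based restriction can be pictured as a cutoff date. The helper below is a hypothetical sketch that measures the proportion in calendar time; the library may instead count observations, so treat the exact cutoff as an assumption.

```python
from datetime import date, timedelta

def last_allowed_date(start, end, proportion_from_end):
    """Latest date at which a changepoint is still allowed."""
    span_days = (end - start).days
    return end - timedelta(days=round(span_days * proportion_from_end))

# Date range of the Peyton Manning dataset described above.
start, end = date(2007, 12, 10), date(2016, 1, 20)
cutoff = last_allowed_date(start, end, proportion_from_end=0.2)

# Illustrative candidate dates; only those on or before the cutoff survive.
candidates = [date(2012, 4, 30), date(2013, 11, 18), date(2015, 6, 3)]
allowed = [d for d in candidates if d <= cutoff]
```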

Finally, a post-filter is applied to eliminate changepoints that are too close to each other (actual_changepoint_min_distance).
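
One simple way to enforce a minimum distance is a greedy pass over the sorted dates, sketched below with a hypothetical helper. This version always keeps the earlier of two close changepoints; the library's post-filter may choose differently (for instance, by the size of the estimated change), so this only illustrates the idea.

```python
from datetime import date, timedelta

def enforce_min_distance(changepoints, min_distance):
    """Keep a date only if it is at least min_distance after the last kept one."""
    kept = []
    for cp in sorted(changepoints):
        if not kept or cp - kept[-1] >= min_distance:
            kept.append(cp)
    return kept

# Illustrative dates: two pairs that are less than 30 days apart.
detected = [date(2008, 2, 6), date(2008, 2, 20), date(2008, 7, 6), date(2008, 7, 30)]
filtered = enforce_min_distance(detected, timedelta(days=30))
```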

The following parameter combination uses longer aggregation, fewer potential changepoints, and a higher yearly seasonality order. Changepoints are not allowed in the last 20% of the data.

 model = ChangepointDetector()  # it's also okay to omit this and re-use the old instance
 res = model.find_trend_changepoints(
     df=df,                                      # data df
     time_col="ts",                              # time column name
     value_col="y",                              # value column name
     yearly_seasonality_order=15,                # yearly seasonality order, fit along with trend
     regularization_strength=0.5,                # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
     resample_freq="7D",                         # data aggregation frequency, eliminate small fluctuation/seasonality
     potential_changepoint_n=25,                 # the number of potential changepoints
     no_changepoint_proportion_from_end=0.2)     # the proportion of data from end where changepoints are not allowed
 pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})
trend_changepoints
0 2008-03-31
1 2008-08-04
2 2008-11-24
3 2009-03-16
4 2009-07-13
5 2009-11-02
6 2010-02-22
7 2010-06-14
8 2010-10-11
9 2011-01-31
10 2011-09-12
11 2012-01-09
12 2012-04-30
13 2013-04-01
14 2013-11-18


We may also plot the detection result.

 fig = model.plot(plot=False)
 plotly.io.show(fig)

Now the detected trend changepoints look better! Similarly, we could specify potential_changepoint_distance and no_changepoint_distance_from_end instead of potential_changepoint_n and no_changepoint_proportion_from_end, for example potential_changepoint_distance="60D" and no_changepoint_distance_from_end="730D". Remember that these override potential_changepoint_n and no_changepoint_proportion_from_end.
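
As a sketch, the distance-based variant can be written as a set of keyword arguments. The parameter names and values are the ones mentioned in this section; the call itself is left commented out because it needs the model and df objects from the earlier examples, and its output is not shown here.

```python
# Keyword arguments for find_trend_changepoints, using the distance-based
# parameters in place of potential_changepoint_n and
# no_changepoint_proportion_from_end.
trend_kwargs = dict(
    time_col="ts",
    value_col="y",
    yearly_seasonality_order=15,
    regularization_strength=0.5,
    resample_freq="7D",
    potential_changepoint_distance="60D",      # one candidate every 60 days
    no_changepoint_distance_from_end="730D",   # none in the last 730 days
)

# With `model` and `df` from the examples above:
# res = model.find_trend_changepoints(df=df, **trend_kwargs)
```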

Moreover, one can control which components are plotted. For example:

 fig = model.plot(
     observation=True,                       # whether to plot the observations
     observation_original=True,              # whether to plot the unaggregated values
     trend_estimate=True,                    # whether to plot the trend estimation
     trend_change=True,                      # whether to plot detected trend changepoints
     yearly_seasonality_estimate=True,       # whether to plot estimated yearly seasonality
     adaptive_lasso_estimate=True,           # whether to plot the adaptive lasso estimated trend
     seasonality_change=False,               # detected seasonality change points, discussed in next section
     seasonality_change_by_component=True,   # plot seasonality by component (daily, weekly, etc.), discussed in next section
     seasonality_estimate=False,             # plot estimated trend+seasonality, discussed in next section
     plot=False)                             # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
 plotly.io.show(fig)

Detect seasonality change points

By seasonality change points, we mean the time points where the shape of the seasonality effect changes, i.e., the seasonal shape may become “fatter” or “thinner”. As with trend changepoint detection, seasonality change point detection uses pre-filtering, regularization with a regression-based model, and post-filtering.

To create a simple seasonality changepoint detection model, we could either use the previous ChangepointDetector object which already has the trend changepoint information, or initialize a new ChangepointDetector object. Then one could run the find_seasonality_changepoints function.

Note that the trend effect is removed from the timeseries before seasonality changepoints are detected. Re-using the old ChangepointDetector object, which holds trend changepoint detection results on the same df, passes the existing trend information along and saves time. If a new object is initialized and find_seasonality_changepoints is run directly, the model first runs find_trend_changepoints with the default trend changepoint detection parameters. It is recommended, however, to run find_trend_changepoints and check the result before running find_seasonality_changepoints.

Here we use the old object which already contains trend changepoint information.

 res = model.find_seasonality_changepoints(
     df=df,            # data df
     time_col="ts",    # time column name
     value_col="y")    # value column name
 pd.DataFrame(dict([(k, pd.Series(v)) for k, v in res["seasonality_changepoints"].items()]))  # view result
 # one could also print res["seasonality_changepoints"] directly to view the result
weekly yearly
0 NaN 2008-02-06
1 NaN 2013-05-08


We can also plot the detection results by setting seasonality_change and seasonality_estimate to True.

 fig = model.plot(
     seasonality_change=True,                # detected seasonality change points, discussed in next section
     seasonality_change_by_component=True,   # plot seasonality by component (daily, weekly, etc.), discussed in next section
     seasonality_estimate=True,              # plot estimated trend+seasonality, discussed in next section
     plot=False)                             # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
 plotly.io.show(fig)

In this example there is not much seasonality change: the output above lists only two yearly seasonality change points and no weekly ones. We could, however, customize the parameters to increase the sensitivity of seasonality changepoint detection.

The only parameter that differs from trend changepoint detection is seasonality_components_df, which configures the seasonality components. Supplying daily, weekly and yearly seasonality works well for most cases. Users can also include monthly and quarterly seasonality. The full df is:

 seasonality_components_df = pd.DataFrame({
     "name": ["tod", "tow", "conti_year"],           # component value column name used to create seasonality component
     "period": [24.0, 7.0, 1.0],                     # period for seasonality component
     "order": [3, 3, 5],                             # Fourier series order
     "seas_names": ["daily", "weekly", "yearly"]})   # seasonality component name

However, if the inferred data frequency is at least one day, the daily component will be removed.

Another optional parameter is trend_changepoints that allows users to provide a list of trend changepoints to skip calling find_trend_changepoints.

Now we run find_seasonality_changepoints with a smaller regularization_strength and restrict changepoints to the first 80% of the data. As recommended, we use the previously detected trend change points (by re-using the same object after running find_trend_changepoints).

 res = model.find_seasonality_changepoints(
     df=df,                                          # data df
     time_col="ts",                                  # time column name
     value_col="y",                                  # value column name
     seasonality_components_df=pd.DataFrame({        # seasonality config df
         "name": ["tow", "conti_year"],              # component value column name used to create seasonality component
         "period": [7.0, 1.0],                       # period for seasonality component
         "order": [3, 5],                            # Fourier series order
         "seas_names": ["weekly", "yearly"]}),       # seasonality component name
     regularization_strength=0.4,                    # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
     no_changepoint_proportion_from_end=0.2,         # no changepoints in the last 20% of the data
     trend_changepoints=None)                        # optionally specify trend changepoints to avoid calling find_trend_changepoints
 pd.DataFrame(dict([(k, pd.Series(v)) for k, v in res["seasonality_changepoints"].items()]))  # view result
 # one could also print res["seasonality_changepoints"] directly to view the result
weekly yearly
0 2008-02-06 2008-02-06
1 NaT 2011-04-13
2 NaT 2012-03-27
3 NaT 2013-05-08


We can also plot the detection results.

 fig = model.plot(
     seasonality_change=True,                # detected seasonality change points, discussed in next section
     seasonality_change_by_component=True,   # plot seasonality by component (daily, weekly, etc.), discussed in next section
     seasonality_estimate=True,              # plot estimated trend+seasonality, discussed in next section
     plot=False)                             # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
 plotly.io.show(fig)

Create a forecast with changepoints

Both the trend and the seasonality changepoint detection algorithms have been integrated with SILVERKITE, so they can be invoked by passing the corresponding parameters. The model first detects changepoints with the given parameters, then feeds the detected changepoints to the forecasting model.

 # specify dataset information
 metadata = dict(
     time_col="ts",  # name of the time column ("datepartition" in example above)
     value_col="y",  # name of the value column ("macrosessions" in example above)
     freq="D"        # "H" for hourly, "D" for daily, "W" for weekly, etc.
     # Any format accepted by ``pd.date_range``
 )
 # specify changepoint parameters in model_components
 model_components = dict(
     changepoints={
         # it's ok to provide one of ``changepoints_dict`` or ``seasonality_changepoints_dict`` by itself
         "changepoints_dict": {
             "method": "auto",
             "yearly_seasonality_order": 15,
             "regularization_strength": 0.5,
             "resample_freq": "7D",
             "potential_changepoint_n": 25,
             "no_changepoint_proportion_from_end": 0.2
         },
         "seasonality_changepoints_dict": {
             "potential_changepoint_distance": "60D",
             "regularization_strength": 0.5,
             "no_changepoint_proportion_from_end": 0.2
         }
     },
     custom={
         "fit_algorithm_dict": {
             "fit_algorithm": "ridge"}})  # use ridge to prevent overfitting when there are many changepoints

 # Generates model config
 config = ForecastConfig.from_dict(
     dict(
         model_template=ModelTemplateEnum.SILVERKITE.name,
         forecast_horizon=365,  # forecast 1 year
         coverage=0.95,  # 95% prediction intervals
         metadata_param=metadata,
         model_components_param=model_components))

 # Then run with changepoint parameters
 forecaster = Forecaster()
 result = forecaster.run_forecast_config(
     df=df,
     config=config)

Out:

Fitting 3 folds for each of 1 candidates, totalling 3 fits

Note

The automatic trend changepoint detection algorithm also supports adding additional custom trend changepoints in forecasts. In the changepoints_dict parameter above, you may add the following parameters to include additional trend changepoints besides the detected ones:

  • dates: a list of custom trend changepoint dates, parsable by pandas.to_datetime. For example, ["2020-01-01", "2020-02-15"].

  • combine_changepoint_min_distance: the minimum distance allowed between a detected changepoint and a custom changepoint, default is None. For example, "5D". If violated, one of them will be dropped according to the next parameter keep_detected.

  • keep_detected: True or False, default False. Decides whether to keep the detected changepoint or the custom changepoint when they are too close. If set to True, keeps the detected changepoint, otherwise keeps the custom changepoint.
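
The three parameters above can be sketched as follows. The dictionary uses the keys and example values from this note. The combine_changepoints helper is hypothetical and only illustrates the rule as described here (drop one side of any detected/custom pair that is too close); it is not the library's implementation, and the detected dates are made up.

```python
from datetime import date, timedelta

# Extra keys added to changepoints_dict, with the example values from the note.
changepoints_dict = {
    "method": "auto",
    "dates": ["2020-01-01", "2020-02-15"],
    "combine_changepoint_min_distance": "5D",
    "keep_detected": False,
}

def combine_changepoints(detected, custom, min_distance, keep_detected=False):
    """Merge detected and custom dates, resolving pairs closer than min_distance."""
    if keep_detected:
        # Drop custom dates that sit too close to any detected date.
        custom = [c for c in custom
                  if all(abs(c - d) >= min_distance for d in detected)]
    else:
        # Drop detected dates that sit too close to any custom date.
        detected = [d for d in detected
                    if all(abs(d - c) >= min_distance for c in custom)]
    return sorted(set(detected) | set(custom))

detected = [date(2020, 1, 3), date(2020, 3, 1)]   # hypothetical detections
custom = [date(2020, 1, 1), date(2020, 2, 15)]
keep_custom = combine_changepoints(detected, custom, timedelta(days=5))
keep_found = combine_changepoints(detected, custom, timedelta(days=5), keep_detected=True)
```

In this toy case the detected date 2020-01-03 is two days from the custom date 2020-01-01, so exactly one of the two survives, depending on keep_detected.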

Check results

Details of the results are given in the Simple forecast example. We just show a few specific results here.

The original trend changepoint detection plot is accessible. One can pass a dictionary with the same parameters accepted by the plot function in ChangepointDetector.

 fig = result.model[-1].plot_trend_changepoint_detection(dict(plot=False))  # -1 gets the estimator from the pipeline
 plotly.io.show(fig)

Let’s plot the historical forecast on the holdout test set.

 backtest = result.backtest
 fig = backtest.plot()
 plotly.io.show(fig)

Let’s plot the forecast (trained on all data):

 forecast = result.forecast
 fig = forecast.plot()
 plotly.io.show(fig)

Check out the component plot; trend changepoints are marked in the trend component plot.

 fig = backtest.plot_components()
 plotly.io.show(fig)  # fig.show() if you are using "PROPHET" template

Total running time of the script: ( 3 minutes 4.540 seconds)
