Changepoint Detection

You can detect trend and seasonality changepoints with just a few lines of code.

Provide your timeseries as a pandas dataframe with timestamp and value.

For example, to work with daily sessions data, your dataframe could look like this:

import pandas as pd
df = pd.DataFrame({
    "datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "macrosessions": [10231.0, 12309.0, 12104.0]
})

The time column can be any format recognized by pd.to_datetime.
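
If your timestamps use an unusual layout, such as the datepartition strings above, one option is to parse them yourself before passing the dataframe on. The following is a minimal sketch that assumes the trailing "-00" is an hour field; the source does not state this, so adjust the format string to your own data.

```python
import pandas as pd

df = pd.DataFrame({
    "datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "macrosessions": [10231.0, 12309.0, 12104.0],
})

# Assumption: the last field is the hour of day, so the layout is YYYY-MM-DD-HH.
df["datepartition"] = pd.to_datetime(df["datepartition"], format="%Y-%m-%d-%H")

print(df.dtypes)
```

An explicit format avoids any ambiguity in how the strings are interpreted.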

In this example, we’ll load a dataset representing log(daily page views) on the Wikipedia page for Peyton Manning. It contains values from 2007-12-10 to 2016-01-20. More dataset info here.

import warnings

warnings.filterwarnings("ignore")

import pandas as pd
import plotly

from greykite.algo.changepoint.adalasso.changepoint_detector import ChangepointDetector
from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

# Loads dataset into UnivariateTimeSeries
dl = DataLoaderTS()
ts = dl.load_peyton_manning_ts()
df = ts.df  # cleaned pandas.DataFrame

Detect trend change points

Let’s plot the original timeseries. There are trend changes within this dataset. The UnivariateTimeSeries class stores a timeseries and provides basic description and plotting functions. The load_peyton_manning_ts function returns a UnivariateTimeSeries instance; for any other df, you can initialize a UnivariateTimeSeries instance yourself and explore further. (The interactive plot is generated by plotly: click to zoom!)

fig = ts.plot()
plotly.io.show(fig)

ChangepointDetector uses pre-filters, regularization with regression-based models, and post-filters to find the time points where the trend changes.

To create a simple trend changepoint detection model, we first initialize the ChangepointDetector class, then call its find_trend_changepoints method.

model = ChangepointDetector()
res = model.find_trend_changepoints(
    df=df,            # data df
    time_col="ts",    # time column name
    value_col="y")    # value column name
pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})  # prints a dataframe showing the result
trend_changepoints
0 2008-02-06
1 2008-07-06
2 2008-09-20
3 2008-12-18
4 2009-02-13
5 2009-06-08
6 2009-09-03
7 2009-12-07
8 2010-02-04
9 2010-07-02
10 2010-10-30
11 2011-01-24
12 2011-04-21
13 2011-07-16
14 2011-10-11
15 2011-12-09
16 2012-02-06
17 2013-02-15
18 2013-08-08
19 2014-01-28
20 2014-03-27
21 2014-12-12
22 2015-06-03


The code above runs trend changepoint detection with the default parameters. We can visualize the detection results with the plot method.

fig = model.plot(plot=False)  # plot = False returns a plotly figure object.
plotly.io.show(fig)

There might be too many changepoints with the default parameters. We could customize the parameters to meet individual requirements.

To understand the parameters, here is some background. The algorithm first aggregates the data by taking means over windows (resample_freq) to eliminate small fluctuations and short seasonality effects. This keeps the trend from picking up those effects.
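
To illustrate why mean aggregation helps, here is a small sketch on synthetic data using plain pandas. It is not Greykite's internal code; the data and the weekly pattern are made up for the example.

```python
import numpy as np
import pandas as pd

# Four weeks of synthetic daily data: a linear trend plus a weekly pattern.
days = pd.date_range("2020-01-01", periods=28, freq="D")
t = np.arange(28)
weekly_pattern = np.array([3.0, 1.0, 0.0, 0.0, 1.0, 2.0, 4.0])
daily = pd.Series(0.1 * t + weekly_pattern[t % 7], index=days)

# Mean over non-overlapping 7-day windows: the weekly pattern averages out
# to a constant, so only the trend remains visible.
aggregated = daily.resample("7D").mean()

print(aggregated.diff().dropna())  # constant step between windows
```

After aggregation, consecutive windows differ only by the trend increment, which is the signal the changepoint model should see.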

Then a large number of potential changepoints are placed uniformly over the whole time span (specified by either the time between changepoints, potential_changepoint_distance, or the number of potential changepoints, potential_changepoint_n; the former overrides the latter).
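
As a rough picture of uniform placement, the sketch below spreads a fixed number of candidates evenly between the first and last dates of the dataset. It only illustrates the idea; Greykite's exact placement (for example on the aggregated data) may differ.

```python
import pandas as pd

start, end = pd.Timestamp("2007-12-10"), pd.Timestamp("2016-01-20")
n_candidates = 25  # plays the role of potential_changepoint_n

# n + 2 evenly spaced timestamps, then drop the two endpoints so that every
# candidate lies strictly inside the observed time span.
candidates = pd.date_range(start=start, end=end, periods=n_candidates + 2)[1:-1]

print(candidates[:3])
```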

The adaptive lasso (more info at adalasso) is used to shrink the coefficients of insignificant changepoints to zero. The initial estimator for the adaptive lasso can be one of “ols”, “ridge” and “lasso” (adaptive_lasso_initial_estimator). The regularization strength of the adaptive lasso is also user-controllable (regularization_strength, between 0.0 and 1.0; greater values imply fewer changepoints, and None triggers cross-validation to select the best tuning parameter based on prediction performance).
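
The following toy sketch shows the general adaptive lasso idea on a noise-free series with a single slope change. It is written from scratch with numpy and is not Greykite's implementation: the hinge features, the penalty value alpha, the unpenalized base slope and the coordinate descent solver are all assumptions chosen for illustration, and how regularization_strength maps to such a penalty is not covered here.

```python
import numpy as np

# Noise-free toy series: slope +1 up to t = 0.5, slope -1 afterwards.
t = np.linspace(0.0, 1.0, 200)
y = t - 2.0 * np.maximum(0.0, t - 0.5)

# Design matrix: base slope plus one hinge column max(0, t - c) per candidate.
candidates = np.linspace(0.1, 0.9, 9)
X = np.column_stack([t] + [np.maximum(0.0, t - c) for c in candidates])
Xc, yc = X - X.mean(axis=0), y - y.mean()  # centering absorbs the intercept

# Step 1: initial estimate (OLS here; ridge or lasso are alternatives).
beta_init, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# Step 2: adaptive weights. A small initial coefficient gets a large penalty.
alpha = 1e-3
penalty = alpha / (np.abs(beta_init) + 1e-8)
penalty[0] = 0.0  # assumption: leave the base slope unpenalized

# Step 3: weighted lasso solved by plain coordinate descent.
n = len(yc)
beta = np.zeros(X.shape[1])
col_scale = (Xc ** 2).sum(axis=0) / n
for _ in range(500):
    for j in range(len(beta)):
        partial_residual = yc - Xc @ beta + Xc[:, j] * beta[j]
        rho = Xc[:, j] @ partial_residual / n
        beta[j] = np.sign(rho) * max(abs(rho) - penalty[j], 0.0) / col_scale[j]

# Candidates whose coefficient survived the shrinkage.
selected = candidates[beta[1:] != 0.0]
print(selected)
```

Candidates whose initial coefficient is close to zero receive a very large penalty and are removed, while the candidate at the true slope change keeps a nonzero coefficient.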

The yearly seasonality effect has too long a period to be eliminated by aggregation, so fitting it together with the trend is recommended (yearly_seasonality_order). This allows the model to distinguish trend changes from yearly seasonality.

Putting changepoints too close to the end of the data is not recommended, because we may not have enough data to fit the final trend, especially in forecasting tasks. Therefore, one can specify a region at the end of the data where changepoints are not allowed (either as a time span, no_changepoint_distance_from_end, or as a proportion of the data, no_changepoint_proportion_from_end; the former overrides the latter).
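
The two ways of describing the excluded region at the end can be pictured as two ways of computing a cutoff date. This is a sketch under the assumption of a complete daily index; it is not how Greykite computes the cutoff internally, and the candidate dates are made up.

```python
import pandas as pd

timestamps = pd.date_range("2007-12-10", "2016-01-20", freq="D")

# Option 1: a proportion of the observations counted from the end.
proportion_from_end = 0.2
cutoff_by_proportion = timestamps[int(len(timestamps) * (1 - proportion_from_end))]

# Option 2: a fixed time span counted from the end.
cutoff_by_distance = timestamps[-1] - pd.Timedelta("730D")

# Candidates on or after the cutoff would be discarded.
candidates = pd.date_range("2008-01-01", "2016-01-01", freq="365D")
allowed = candidates[candidates < cutoff_by_distance]
print(cutoff_by_proportion, cutoff_by_distance, len(allowed))
```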

Finally, a post-filter is applied to eliminate changepoints that are too close to each other (actual_changepoint_min_distance).
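
One simple way to picture such a post-filter is a greedy pass that keeps a changepoint only if it is far enough from the previously kept one. The helper drop_close_changepoints below is hypothetical, and the library may decide which of two close changepoints to keep by a different rule.

```python
import pandas as pd

def drop_close_changepoints(changepoints, min_distance):
    """Keeps a changepoint only if it is at least min_distance after the last kept one."""
    kept = []
    for cp in sorted(pd.to_datetime(changepoints)):
        if not kept or cp - kept[-1] >= pd.Timedelta(min_distance):
            kept.append(cp)
    return kept

detected = ["2008-02-06", "2008-02-20", "2008-07-06", "2008-07-30", "2008-09-20"]
print(drop_close_changepoints(detected, "30D"))
```

With a 30-day minimum distance, the second and fourth dates are dropped because each follows a kept changepoint by less than 30 days.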

The following parameter combination uses longer aggregation, fewer potential changepoints, and a higher yearly seasonality order. Changepoints are not allowed in the last 20% of the data.

model = ChangepointDetector()  # it's also okay to omit this and re-use the old instance
res = model.find_trend_changepoints(
    df=df,                                      # data df
    time_col="ts",                              # time column name
    value_col="y",                              # value column name
    yearly_seasonality_order=15,                # yearly seasonality order, fit along with trend
    regularization_strength=0.5,                # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    resample_freq="7D",                         # data aggregation frequency, eliminate small fluctuation/seasonality
    potential_changepoint_n=25,                 # the number of potential changepoints
    no_changepoint_proportion_from_end=0.2)     # the proportion of data from end where changepoints are not allowed
pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})
trend_changepoints
0 2008-03-31
1 2008-08-04
2 2008-11-24
3 2009-03-16
4 2009-07-13
5 2009-11-02
6 2010-02-22
7 2010-06-14
8 2010-10-11
9 2011-01-31
10 2011-09-12
11 2012-01-09
12 2012-04-30
13 2013-04-01
14 2013-11-18


We may also plot the detection result.

fig = model.plot(plot=False)
plotly.io.show(fig)

Now the detected trend changepoints look better! Similarly, we could specify potential_changepoint_distance and no_changepoint_distance_from_end instead of potential_changepoint_n and no_changepoint_proportion_from_end, for example potential_changepoint_distance="60D" and no_changepoint_distance_from_end="730D". Remember that these override potential_changepoint_n and no_changepoint_proportion_from_end.

Moreover, one can control which components are plotted. For example:
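
The distance-based variant could be written as the keyword arguments below. Only parameter names and values that appear in this tutorial are used; the call itself is commented out because it needs greykite and the df loaded earlier.

```python
# Keyword arguments for find_trend_changepoints, using time-based settings.
trend_params = dict(
    yearly_seasonality_order=15,
    regularization_strength=0.5,
    resample_freq="7D",
    potential_changepoint_distance="60D",     # used instead of potential_changepoint_n
    no_changepoint_distance_from_end="730D",  # used instead of no_changepoint_proportion_from_end
)

# res = model.find_trend_changepoints(df=df, time_col="ts", value_col="y", **trend_params)
print(sorted(trend_params))
```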

fig = model.plot(
    observation=True,                       # whether to plot the observations
    observation_original=True,              # whether to plot the unaggregated values
    trend_estimate=True,                    # whether to plot the trend estimation
    trend_change=True,                      # whether to plot detected trend changepoints
    yearly_seasonality_estimate=True,       # whether to plot estimated yearly seasonality
    adaptive_lasso_estimate=True,           # whether to plot the adaptive lasso estimated trend
    seasonality_change=False,               # detected seasonality change points, discussed in next section
    seasonality_change_by_component=True,   # plot seasonality by component (daily, weekly, etc.), discussed in next section
    seasonality_estimate=False,             # plot estimated trend+seasonality, discussed in next section
    plot=False)                             # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)

Detect seasonality change points

By seasonality change points, we mean the time points where the shape of the seasonality effect changes, i.e., the seasonal shape may become “fatter” or “thinner”. As with trend changepoint detection, seasonality change point detection uses pre-filtering, regularization with a regression-based model, and post-filtering.

To create a simple seasonality changepoint detection model, we could either use the previous ChangepointDetector object which already has the trend changepoint information, or initialize a new ChangepointDetector object. Then one could run the find_seasonality_changepoints function.

Note that the trend effect is removed from the timeseries before seasonality changepoints are detected. Reusing the old ChangepointDetector object, which holds the trend changepoint detection results for the same df, passes the existing trend information along and saves time. If a new object is initialized and find_seasonality_changepoints is run directly, the model first runs find_trend_changepoints with the default trend changepoint detection parameters. It is recommended, however, that users run find_trend_changepoints and check the result before running find_seasonality_changepoints.

Here we use the old object which already contains trend changepoint information.

res = model.find_seasonality_changepoints(
    df=df,            # data df
    time_col="ts",    # time column name
    value_col="y")    # value column name
pd.DataFrame(dict([(k, pd.Series(v)) for k, v in res["seasonality_changepoints"].items()]))  # view result
# one could also print res["seasonality_changepoints"] directly to view the result
weekly yearly
0 NaN 2008-02-06
1 NaN 2008-04-04
2 NaN 2008-06-01
3 NaN 2009-03-18
4 NaN 2015-06-02


We can also plot the detection results by setting seasonality_change and seasonality_estimate to True.

fig = model.plot(
    seasonality_change=True,                # detected seasonality change points
    seasonality_change_by_component=True,   # plot seasonality by component (daily, weekly, etc.)
    seasonality_estimate=True,              # plot estimated trend+seasonality
    plot=False)                             # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)

In this example, there is not much seasonality change, so we only see one yearly seasonality change point. However, we could customize the parameters to increase the sensitivity of seasonality changepoint detection.

The only parameter that differs from trend changepoint detection is seasonality_components_df, which configures the seasonality components. Supplying daily, weekly and yearly seasonality works well for most cases. Users can also include monthly and quarterly seasonality. The full df is:

seasonality_components_df = pd.DataFrame({
    "name": ["tod", "tow", "conti_year"],           # component value column name used to create seasonality component
    "period": [24.0, 7.0, 1.0],                     # period for seasonality component
    "order": [3, 3, 5],                             # Fourier series order
    "seas_names": ["daily", "weekly", "yearly"]})   # seasonality component name

However, if the inferred data frequency is at least one day, the daily component will be removed.
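
To make the period and order columns concrete, the sketch below builds Fourier terms for a weekly component of order 3. The helper fourier_terms is hypothetical and assumes the component value (such as tow) is a numeric position within the period; the column layout Greykite uses internally may differ.

```python
import numpy as np

def fourier_terms(x, period, order):
    """Returns sin/cos columns for harmonics 1..order of the given period."""
    columns = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * x / period
        columns.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(columns)

# Day-of-week position (0 <= x < 7) over two weeks, weekly component of order 3.
tow = np.arange(14) % 7
features = fourier_terms(tow, period=7.0, order=3)
print(features.shape)  # (14, 6)
```

A higher order adds more sin/cos pairs, which lets the component follow a more detailed seasonal shape at the cost of more coefficients.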

Another optional parameter is trend_changepoints, which lets users provide a list of trend changepoints and skip the call to find_trend_changepoints.

Now we run find_seasonality_changepoints with a smaller regularization_strength and restrict changepoints to the first 80% of the data. As recommended, we use the previously detected trend change points (by using the same object after running find_trend_changepoints).

res = model.find_seasonality_changepoints(
    df=df,                                          # data df
    time_col="ts",                                  # time column name
    value_col="y",                                  # value column name
    seasonality_components_df=pd.DataFrame({        # seasonality config df
        "name": ["tow", "conti_year"],              # component value column name used to create seasonality component
        "period": [7.0, 1.0],                       # period for seasonality component
        "order": [3, 5],                            # Fourier series order
        "seas_names": ["weekly", "yearly"]}),       # seasonality component name
    regularization_strength=0.4,                    # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    no_changepoint_proportion_from_end=0.2,         # no changepoint in the last 20% data
    trend_changepoints=None)                        # optionally specify trend changepoints to avoid calling find_trend_changepoints
pd.DataFrame(dict([(k, pd.Series(v)) for k, v in res["seasonality_changepoints"].items()]))  # view result
# one could also print res["seasonality_changepoints"] directly to view the result
weekly yearly
0 2008-02-06 2008-02-06
1 NaT 2008-04-04
2 NaT 2008-06-01
3 NaT 2009-03-18
4 NaT 2010-01-03
5 NaT 2013-01-11


We can also plot the detection results.

fig = model.plot(
    seasonality_change=True,                # detected seasonality change points
    seasonality_change_by_component=True,   # plot seasonality by component (daily, weekly, etc.)
    seasonality_estimate=True,              # plot estimated trend+seasonality
    plot=False)                             # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)

Create a forecast with changepoints

Both trend changepoint detection and seasonality changepoint detection algorithms have been integrated with Silverkite, so one is able to invoke the algorithm by passing corresponding parameters. It will first detect changepoints with the given parameters, then feed the detected changepoints to the forecasting model.

# specify dataset information
metadata = dict(
    time_col="ts",  # name of the time column ("datepartition" in example above)
    value_col="y",  # name of the value column ("macrosessions" in example above)
    freq="D"        # "H" for hourly, "D" for daily, "W" for weekly, etc.
    # Any format accepted by ``pd.date_range``
)
# specify changepoint parameters in model_components
model_components = dict(
    changepoints={
        # it's ok to provide one of ``changepoints_dict`` or ``seasonality_changepoints_dict`` by itself
        "changepoints_dict": {
            "method": "auto",
            "yearly_seasonality_order": 15,
            "regularization_strength": 0.5,
            "resample_freq": "7D",
            "potential_changepoint_n": 25,
            "no_changepoint_proportion_from_end": 0.2
        },
        "seasonality_changepoints_dict": {
            "potential_changepoint_distance": "60D",
            "regularization_strength": 0.5,
            "no_changepoint_proportion_from_end": 0.2
        }
    },
    custom={
        "fit_algorithm_dict": {
            "fit_algorithm": "ridge"}})  # use ridge to prevent overfitting when there are many changepoints

# Generates model config
config = ForecastConfig.from_dict(
    dict(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecast 1 year
        coverage=0.95,  # 95% prediction intervals
        metadata_param=metadata,
        model_components_param=model_components))

# Then run with changepoint parameters
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config)

Out:

Fitting 3 folds for each of 1 candidates, totalling 3 fits

Note

The automatic trend changepoint detection algorithm also supports adding additional custom trend changepoints in forecasts. In the changepoints_dict parameter above, you may add the following parameters to include additional trend changepoints besides the detected ones:

  • dates: a list of custom trend changepoint dates, parsable by pandas.to_datetime. For example, ["2020-01-01", "2020-02-15"].

  • combine_changepoint_min_distance: the minimum distance allowed between a detected changepoint and a custom changepoint, default is None. For example, "5D". If violated, one of them will be dropped according to the next parameter keep_detected.

  • keep_detected: True or False, default False. Decides whether to keep the detected changepoint or the custom changepoint when they are too close. If set to True, keeps the detected changepoint, otherwise keeps the custom changepoint.
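
Put together, a changepoints_dict with custom changepoints might look like the fragment below. It reuses the detection settings from the forecast example and the example values from this note; the dates are placeholders and should fall inside your own data range.

```python
changepoints_dict = {
    "method": "auto",
    "yearly_seasonality_order": 15,
    "regularization_strength": 0.5,
    "resample_freq": "7D",
    "potential_changepoint_n": 25,
    "no_changepoint_proportion_from_end": 0.2,
    # Custom changepoints added on top of the detected ones (placeholder dates).
    "dates": ["2020-01-01", "2020-02-15"],
    "combine_changepoint_min_distance": "5D",
    "keep_detected": False,  # on conflict, keep the custom changepoint
}

model_components = dict(changepoints={"changepoints_dict": changepoints_dict})
print(model_components)
```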

Check results

Details of the results are given in the Simple Forecast example. We just show a few specific results here.

The original trend changepoint detection plot is also accessible. Pass the same parameters, in a dictionary, that you would pass to the plot function of ChangepointDetector.

fig = result.model[-1].plot_trend_changepoint_detection(dict(plot=False))  # -1 gets the estimator from the pipeline
plotly.io.show(fig)

Let’s plot the historical forecast on the holdout test set.

backtest = result.backtest
fig = backtest.plot()
plotly.io.show(fig)

Let’s plot the forecast (trained on all data):

forecast = result.forecast
fig = forecast.plot()
plotly.io.show(fig)

Check out the component plot; trend changepoints are marked in the trend component plot.

fig = backtest.plot_components()
plotly.io.show(fig)  # fig.show() if you are using "PROPHET" template
