Changepoint Detection
You can detect trend and seasonality changepoints with just a few lines of code.
Provide your timeseries as a pandas dataframe with timestamp and value.
For example, to work with daily sessions data, your dataframe could look like this:
import pandas as pd
df = pd.DataFrame({
    "datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "macrosessions": [10231.0, 12309.0, 12104.0]
})
The time column can be in any format recognized by pd.to_datetime.
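As a small sketch of the point above, the time column of the example dataframe can be converted to timestamps explicitly. The format string is an assumption about how the datepartition strings are laid out (year, month, day, hour); passing it explicitly avoids relying on automatic format inference.

```python
import pandas as pd

df = pd.DataFrame({
    "datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "macrosessions": [10231.0, 12309.0, 12104.0],
})

# Assumed layout of the strings: year-month-day-hour.
parsed = pd.to_datetime(df["datepartition"], format="%Y-%m-%d-%H")
print(parsed)
```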
In this example, we’ll load a dataset representing log(daily page views)
on the Wikipedia page for Peyton Manning.
It contains values from 2007-12-10 to 2016-01-20.
import warnings

warnings.filterwarnings("ignore")

import pandas as pd
import plotly

from greykite.algo.changepoint.adalasso.changepoint_detector import ChangepointDetector
from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

# Loads dataset into UnivariateTimeSeries
dl = DataLoaderTS()
ts = dl.load_peyton_manning_ts()
df = ts.df  # cleaned pandas.DataFrame
Detect trend change points
Let’s plot the original timeseries. This dataset does contain trend changes.
The UnivariateTimeSeries class stores a timeseries and provides basic description and plotting functions.
The load_peyton_manning_ts function returns a UnivariateTimeSeries instance directly. For any other df, you can initialize a UnivariateTimeSeries instance yourself and explore it further.
(The interactive plot is generated by plotly: click to zoom!)
fig = ts.plot()
plotly.io.show(fig)
ChangepointDetector uses pre-filters, regularized regression models, and post-filters to find the time points where the trend changes.
To create a simple trend changepoint detection model, we first initialize the ChangepointDetector class, then call its find_trend_changepoints method.
model = ChangepointDetector()
res = model.find_trend_changepoints(
    df=df,            # data df
    time_col="ts",    # time column name
    value_col="y")    # value column name
pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})  # prints a dataframe showing the result
The code above runs trend changepoint detection with the default parameters.
We can visualize the detection results with the plot method.
fig = model.plot(plot=False)  # plot = False returns a plotly figure object.
plotly.io.show(fig)
The default parameters may detect too many changepoints. We can customize the parameters to fit our needs.
To understand the parameters, we first give some background. The algorithm starts with a mean aggregation (resample_freq) to eliminate small fluctuations and short-term seasonality effects. This prevents the trend from picking up those effects.
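The effect of the aggregation step can be illustrated with plain pandas. This is only a sketch on synthetic data, not Greykite's internal code: the series below (a slow linear trend plus a made-up weekly pattern that sums to zero) is an assumption chosen so that a 7-day mean removes the weekly pattern exactly.

```python
import numpy as np
import pandas as pd

# Synthetic daily data: slow upward trend plus a weekly pattern (illustrative only).
dates = pd.date_range("2020-01-01", periods=70, freq="D")
trend = np.linspace(10.0, 17.0, num=70)
weekly_pattern = np.tile([0.0, 1.0, 2.0, 1.0, 0.0, -2.0, -2.0], 10)  # sums to zero per week
df_daily = pd.DataFrame({"ts": dates, "y": trend + weekly_pattern})

# Mean aggregation over 7-day windows, analogous to resample_freq="7D".
df_weekly = (
    df_daily.set_index("ts")["y"]
    .resample("7D")
    .mean()
    .reset_index()
)

print(len(df_daily), "daily rows ->", len(df_weekly), "aggregated rows")
```

The daily series zigzags because of the weekly pattern, while the aggregated series only reflects the underlying trend, which is what the changepoint model should see.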
Then a large number of potential changepoints are placed uniformly over the whole time span, specified either by the time between changepoints (potential_changepoint_distance) or by the number of potential changepoints (potential_changepoint_n). If both are given, the former overrides the latter.
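The two ways of placing candidates can be sketched with pandas as follows. This is an illustration under assumptions, not Greykite's exact placement rule: the variable names are hypothetical, and whether the library includes or excludes the endpoints of the span is not stated in the text above.

```python
import pandas as pd

# Span of the Peyton Manning dataset, as stated earlier in this example.
start, end = pd.Timestamp("2007-12-10"), pd.Timestamp("2016-01-20")

# Option 1: a fixed number of candidates, evenly spaced strictly inside the span.
n_candidates = 25
step = (end - start) / (n_candidates + 1)
by_number = [start + step * i for i in range(1, n_candidates + 1)]

# Option 2: a fixed distance between candidates (the span start itself is dropped).
by_distance = pd.date_range(start=start, end=end, freq="60D")[1:]

print(len(by_number), "candidates by number")
print(len(by_distance), "candidates by distance")
```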
The adaptive lasso is then used to shrink the coefficients of insignificant changepoints to zero. The initial estimator for the adaptive lasso can be one of “ols”, “ridge” and “lasso” (adaptive_lasso_initial_estimator). Users can also control the regularization strength of the adaptive lasso (regularization_strength): it takes values between 0.0 and 1.0, and greater values imply fewer changepoints. Setting it to None triggers cross-validation to select the best tuning parameter based on prediction performance.
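To make the adaptive lasso step more concrete, here is a minimal NumPy sketch. It is not Greykite's implementation. It assumes a synthetic series that is flat and then starts rising at t = 0.5, nine evenly spaced candidate changepoints, a ridge initial estimator, and a plain coordinate-descent lasso solver. The function names (soft_threshold, lasso_cd, adaptive_lasso) are hypothetical, and alpha is a raw lasso penalty, not the 0.0 to 1.0 scale of regularization_strength.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 if it is within the threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_cd(X, y, alpha, n_iter=500):
    """Minimize (1/2n)||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    coef = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()  # equals y - X @ coef while coef is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            residual += X[:, j] * coef[j]  # remove column j's current contribution
            rho = X[:, j] @ residual / n
            coef[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * coef[j]  # add the updated contribution back
    return coef

def adaptive_lasso(X, y, alpha, ridge_penalty=1e-3):
    """Adaptive lasso: lasso with per-coefficient penalties 1 / |initial estimate|."""
    Xc = X - X.mean(axis=0)  # centering stands in for an intercept
    yc = y - y.mean()
    p = Xc.shape[1]
    initial = np.linalg.solve(Xc.T @ Xc + ridge_penalty * np.eye(p), Xc.T @ yc)
    weights = np.abs(initial)
    # Rescaling each column by |initial| turns the weighted penalty into a plain lasso.
    scaled_coef = lasso_cd(Xc * weights, yc, alpha)
    return scaled_coef * weights

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
candidates = np.linspace(0.1, 0.9, 9)  # candidate changepoint locations
X = np.maximum(0.0, t[:, None] - candidates[None, :])  # one slope-change feature per candidate
y = 3.0 * np.maximum(0.0, t - 0.5) + rng.normal(scale=0.05, size=t.size)

strong_penalty = adaptive_lasso(X, y, alpha=1e6)
weak_penalty = adaptive_lasso(X, y, alpha=1e-8)
print("nonzero with strong penalty:", np.count_nonzero(strong_penalty))
print("nonzero with weak penalty:", np.count_nonzero(weak_penalty))
```

With a very large penalty every candidate coefficient is shrunk to zero, while a tiny penalty lets candidates survive. This mirrors the statement that greater regularization implies fewer changepoints.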
The yearly seasonality effect is too long to be eliminated by aggregation, so fitting it together with the trend is recommended (yearly_seasonality_order). This lets the model distinguish trend changes from yearly seasonality.
Placing changepoints too close to the end of the data is not recommended, because there may not be enough data to fit the final trend, especially in forecasting tasks. Therefore, one can specify a window at the end of the data in which changepoints are not allowed, either as a length of time (no_changepoint_distance_from_end) or as a proportion of the data (no_changepoint_proportion_from_end). If both are given, the former overrides the latter.
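The end-of-data restriction can be sketched as a simple cutoff on the candidate dates. The helper drop_near_end below is hypothetical and only mimics the described precedence (distance overrides proportion). It also assumes the proportion is measured in elapsed time, whereas the library may measure it differently, for example in number of observations.

```python
import pandas as pd

start, end = pd.Timestamp("2007-12-10"), pd.Timestamp("2016-01-20")
candidates = pd.date_range(start=start, end=end, freq="60D")

def drop_near_end(candidates, start, end, distance_from_end=None, proportion_from_end=None):
    """Keep only candidates at or before the cutoff; distance overrides proportion."""
    if distance_from_end is not None:
        cutoff = end - pd.Timedelta(distance_from_end)
    elif proportion_from_end is not None:
        cutoff = end - (end - start) * proportion_from_end
    else:
        return candidates
    return candidates[candidates <= cutoff]

kept_by_proportion = drop_near_end(candidates, start, end, proportion_from_end=0.2)
kept_by_distance = drop_near_end(
    candidates, start, end, distance_from_end="730D", proportion_from_end=0.2)

print(len(candidates), len(kept_by_proportion), len(kept_by_distance))
```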
Finally, a post-filter eliminates changepoints that are too close to each other (actual_changepoint_min_distance).
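One simple way to picture such a post-filter is a greedy pass over the sorted changepoints that keeps a changepoint only if it is far enough from the last one kept. This is a sketch under that assumption; the rule Greykite actually uses to decide which of two close changepoints survives may differ, and enforce_min_distance is a hypothetical helper.

```python
import pandas as pd

def enforce_min_distance(changepoints, min_distance):
    """Greedily keep changepoints that are at least min_distance after the last kept one."""
    gap = pd.Timedelta(min_distance)
    kept = []
    for changepoint in sorted(pd.Timestamp(cp) for cp in changepoints):
        if not kept or changepoint - kept[-1] >= gap:
            kept.append(changepoint)
    return kept

# Made-up detected changepoints for illustration.
detected = ["2010-01-01", "2010-01-10", "2010-03-01", "2010-03-20", "2010-06-01"]
filtered = enforce_min_distance(detected, min_distance="30D")
print(filtered)
```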
The following parameter combination uses a longer aggregation window, fewer potential changepoints, and a higher yearly seasonality order. Changepoints are not allowed in the last 20% of the data.
model = ChangepointDetector()  # it's also okay to omit this and re-use the old instance
res = model.find_trend_changepoints(
    df=df,                                   # data df
    time_col="ts",                           # time column name
    value_col="y",                           # value column name
    yearly_seasonality_order=15,             # yearly seasonality order, fit along with trend
    regularization_strength=0.5,             # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    resample_freq="7D",                      # data aggregation frequency, eliminate small fluctuation/seasonality
    potential_changepoint_n=25,              # the number of potential changepoints
    no_changepoint_proportion_from_end=0.2)  # the proportion of data from end where changepoints are not allowed
pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})
We may also plot the detection result.
fig = model.plot(plot=False)
plotly.io.show(fig)
Now the detected trend changepoints look better! Similarly, we could specify potential_changepoint_distance and no_changepoint_distance_from_end instead of potential_changepoint_n and no_changepoint_proportion_from_end, for example potential_changepoint_distance="60D" and no_changepoint_distance_from_end="730D". Remember that these override potential_changepoint_n and no_changepoint_proportion_from_end.
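As a sketch, the distance-based settings from the example values above could be collected in a dictionary and unpacked into the call. The dictionary name distance_based_params is hypothetical. The call itself is left commented out because it needs greykite plus the df and model objects from the earlier steps.

```python
# Distance-based alternatives to potential_changepoint_n and
# no_changepoint_proportion_from_end, using the example values from the text.
distance_based_params = dict(
    potential_changepoint_distance="60D",
    no_changepoint_distance_from_end="730D",
)

# Intended use (requires greykite, and df and model from the earlier steps):
# res = model.find_trend_changepoints(
#     df=df, time_col="ts", value_col="y", **distance_based_params)
print(distance_based_params)
```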
Moreover, one can control which components are plotted. For example:
fig = model.plot(
    observation=True,                      # whether to plot the observations
    observation_original=True,             # whether to plot the unaggregated values
    trend_estimate=True,                   # whether to plot the trend estimation
    trend_change=True,                     # whether to plot detected trend changepoints
    yearly_seasonality_estimate=True,      # whether to plot estimated yearly seasonality
    adaptive_lasso_estimate=True,          # whether to plot the adaptive lasso estimated trend
    seasonality_change=False,              # detected seasonality change points, discussed in next section
    seasonality_change_by_component=True,  # plot seasonality by component (daily, weekly, etc.), discussed in next section
    seasonality_estimate=False,            # plot estimated trend+seasonality, discussed in next section
    plot=False)                            # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)
Detect seasonality change points
By seasonality change points, we mean the time points where the shape of the seasonality effect changes, i.e., the seasonal shape may become “fatter” or “thinner”. As with trend changepoint detection, seasonality changepoint detection uses pre-filtering, regularized regression models, and post-filtering.
To create a simple seasonality changepoint detection model, we can either reuse the previous ChangepointDetector object, which already has the trend changepoint information, or initialize a new ChangepointDetector object. Then we run the find_seasonality_changepoints function.
Note that the trend effect is removed from the timeseries before seasonality changepoints are detected. Reusing the old ChangepointDetector object with trend changepoint detection results on the same df therefore passes the existing trend information along and saves time.
If a new object is initialized and find_seasonality_changepoints is run directly, the model first runs find_trend_changepoints with the default trend changepoint detection parameters to get the trend changepoint information.
However, it is recommended to run find_trend_changepoints and check the result before running find_seasonality_changepoints.
Here we use the old object which already contains trend changepoint information.
res = model.find_seasonality_changepoints(
    df=df,            # data df
    time_col="ts",    # time column name
    value_col="y")    # value column name
pd.DataFrame(dict([(k, pd.Series(v)) for k, v in res["seasonality_changepoints"].items()]))  # view result
# one could also print res["seasonality_changepoints"] directly to view the result
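The pd.Series wrapping in the last statement matters because each seasonality component can have a different number of changepoints, and building a DataFrame directly from lists of unequal length raises an error. The sketch below uses a made-up dictionary with the same shape as res["seasonality_changepoints"] (component name mapped to a list of timestamps); the dates are invented for illustration.

```python
import pandas as pd

# Hypothetical result: component name -> list of detected changepoint dates.
seasonality_changepoints = {
    "weekly": [],
    "yearly": [pd.Timestamp("2012-01-01")],
    "quarterly": [pd.Timestamp("2010-04-01"), pd.Timestamp("2013-07-01")],
}

# Wrapping each list in a Series lets pandas align on the index
# and pad the shorter columns with missing values (NaT).
table = pd.DataFrame({
    name: pd.Series(dates, dtype="datetime64[ns]")
    for name, dates in seasonality_changepoints.items()
})
print(table)
```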
We can also plot the detection results by setting seasonality_change and seasonality_estimate to True.
fig = model.plot(
    seasonality_change=True,               # detected seasonality change points, discussed in next section
    seasonality_change_by_component=True,  # plot seasonality by component (daily, weekly, etc.), discussed in next section
    seasonality_estimate=True,             # plot estimated trend+seasonality, discussed in next section
    plot=False)                            # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)
In this example, there is not much seasonality change, so we only see one yearly seasonality change point. However, we can customize the parameters to increase the sensitivity of seasonality changepoint detection.
The only parameter that differs from trend changepoint detection is seasonality_components_df, which configures the seasonality components. Supplying daily, weekly and yearly seasonality works well for most cases. Users can also include monthly and quarterly seasonality.
The full df with daily, weekly and yearly components is:
seasonality_components_df = pd.DataFrame({
    "name": ["tod", "tow", "conti_year"],          # component value column name used to create seasonality component
    "period": [24.0, 7.0, 1.0],                    # period for seasonality component
    "order": [3, 3, 5],                            # Fourier series order
    "seas_names": ["daily", "weekly", "yearly"]})  # seasonality component name
However, if the inferred data frequency is at least one day, the daily component will be removed.
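To illustrate what "period" and "order" mean in this table, the sketch below builds Fourier series terms for a weekly component with period 7.0 and order 3. It is a generic construction, not Greykite's feature-building code; the function fourier_terms and the column ordering (sine then cosine for each harmonic) are assumptions.

```python
import numpy as np

def fourier_terms(x, period, order):
    """Return sin/cos columns for harmonics 1..order of the given period."""
    columns = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * x / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Time of week in days for two consecutive weeks of daily data.
time_of_week = (np.arange(14) % 7).astype(float)
weekly_features = fourier_terms(time_of_week, period=7.0, order=3)
print(weekly_features.shape)
```

An order of 3 yields six columns (a sine and a cosine per harmonic), and the features repeat exactly from one week to the next. A higher order allows a more flexible seasonal shape at the cost of more coefficients.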
Another optional parameter is trend_changepoints, which lets users provide a list of trend changepoints and skip the call to find_trend_changepoints.
Now we run find_seasonality_changepoints with a smaller regularization_strength and restrict changepoints to the first 80% of the data. As recommended, we use the previously detected trend change points (by using the same object after running find_trend_changepoints).
res = model.find_seasonality_changepoints(
    df=df,                                      # data df
    time_col="ts",                              # time column name
    value_col="y",                              # value column name
    seasonality_components_df=pd.DataFrame({    # seasonality config df
        "name": ["tow", "conti_year"],          # component value column name used to create seasonality component
        "period": [7.0, 1.0],                   # period for seasonality component
        "order": [3, 5],                        # Fourier series order
        "seas_names": ["weekly", "yearly"]}),   # seasonality component name
    regularization_strength=0.4,                # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    no_changepoint_proportion_from_end=0.2,     # no changepoint in the last 20% data
    trend_changepoints=None)                    # optionally specify trend changepoints to avoid calling find_trend_changepoints
pd.DataFrame(dict([(k, pd.Series(v)) for k, v in res["seasonality_changepoints"].items()]))  # view result
# one could also print res["seasonality_changepoints"] directly to view the result
We can also plot the detection results.
fig = model.plot(
    seasonality_change=True,               # detected seasonality change points, discussed in next section
    seasonality_change_by_component=True,  # plot seasonality by component (daily, weekly, etc.), discussed in next section
    seasonality_estimate=True,             # plot estimated trend+seasonality, discussed in next section
    plot=False)                            # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)
Create a forecast with changepoints
Both trend changepoint detection and seasonality changepoint detection algorithms have been integrated with Silverkite, so one is able to invoke the algorithm by passing corresponding parameters. It will first detect changepoints with the given parameters, then feed the detected changepoints to the forecasting model.
# specify dataset information
metadata = dict(
    time_col="ts",  # name of the time column ("datepartition" in example above)
    value_col="y",  # name of the value column ("macrosessions" in example above)
    freq="D"        # "H" for hourly, "D" for daily, "W" for weekly, etc.
                    # Any format accepted by ``pd.date_range``
)
# specify changepoint parameters in model_components
model_components = dict(
    changepoints={
        # it's ok to provide one of ``changepoints_dict`` or ``seasonality_changepoints_dict`` by itself
        "changepoints_dict": {
            "method": "auto",
            "yearly_seasonality_order": 15,
            "regularization_strength": 0.5,
            "resample_freq": "7D",
            "potential_changepoint_n": 25,
            "no_changepoint_proportion_from_end": 0.2
        },
        "seasonality_changepoints_dict": {
            "potential_changepoint_distance": "60D",
            "regularization_strength": 0.5,
            "no_changepoint_proportion_from_end": 0.2
        }
    },
    custom={
        "fit_algorithm_dict": {
            "fit_algorithm": "ridge"}})  # use ridge to prevent overfitting when there are many changepoints

# Generates model config
config = ForecastConfig.from_dict(
    dict(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecast 1 year
        coverage=0.95,         # 95% prediction intervals
        metadata_param=metadata,
        model_components_param=model_components))

# Then run with changepoint parameters
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config)
Out:
Fitting 3 folds for each of 1 candidates, totalling 3 fits
Note
The automatic trend changepoint detection algorithm also supports adding custom trend changepoints in forecasts. In the changepoints_dict parameter above, you may add the following parameters to include additional trend changepoints besides the detected ones:
dates: a list of custom trend changepoint dates, parsable by pandas.to_datetime. For example, ["2020-01-01", "2020-02-15"].
combine_changepoint_min_distance: the minimum distance allowed between a detected changepoint and a custom changepoint, default is None. For example, "5D". If violated, one of them will be dropped according to the next parameter, keep_detected.
keep_detected: True or False, default False. Decides whether to keep the detected changepoint or the custom changepoint when they are too close. If set to True, keeps the detected changepoint; otherwise keeps the custom changepoint.
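The following sketch shows how these three parameters might sit in changepoints_dict, together with a small function that mimics the described combining rule. The function combine_changepoints and the detected dates are hypothetical and only illustrate the behavior described in the note; they are not the library's implementation.

```python
import pandas as pd

# The three extra keys from the note, added to a changepoints_dict (example values).
changepoints_dict = {
    "method": "auto",
    "dates": ["2020-01-01", "2020-02-15"],
    "combine_changepoint_min_distance": "5D",
    "keep_detected": False,
}

def combine_changepoints(detected, custom, min_distance=None, keep_detected=False):
    """Merge detected and custom changepoints, dropping one side of any pair that is too close."""
    detected = [pd.Timestamp(d) for d in detected]
    custom = [pd.Timestamp(c) for c in custom]
    if min_distance is not None:
        gap = pd.Timedelta(min_distance)
        if keep_detected:
            # Drop custom changepoints that fall too close to a detected one.
            custom = [c for c in custom if all(abs(c - d) >= gap for d in detected)]
        else:
            # Drop detected changepoints that fall too close to a custom one.
            detected = [d for d in detected if all(abs(d - c) >= gap for c in custom)]
    return sorted(set(detected + custom))

detected_dates = ["2020-01-03", "2020-03-01"]  # made-up detection result
keep_custom = combine_changepoints(
    detected_dates,
    changepoints_dict["dates"],
    changepoints_dict["combine_changepoint_min_distance"],
    changepoints_dict["keep_detected"])
keep_found = combine_changepoints(
    detected_dates, changepoints_dict["dates"], "5D", keep_detected=True)
print(keep_custom)
print(keep_found)
```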
Check results
Details of the results are given in the Simple Forecast example. We only show a few changepoint-specific results here.
The original trend changepoint detection plot is accessible from the fitted model. One can pass, in a dictionary, the same parameters that the plot function in ChangepointDetector accepts.
fig = result.model[-1].plot_trend_changepoint_detection(dict(plot=False))  # -1 gets the estimator from the pipeline
plotly.io.show(fig)
Let’s plot the historical forecast on the holdout test set.
backtest = result.backtest
fig = backtest.plot()
plotly.io.show(fig)
Let’s plot the forecast (trained on all data):
forecast = result.forecast
fig = forecast.plot()
plotly.io.show(fig)
Check out the component plot. Trend changepoints are marked in the trend component plot.
fig = backtest.plot_components()
plotly.io.show(fig)  # fig.show() if you are using "PROPHET" template
Total running time of the script: ( 3 minutes 19.925 seconds)