The Greykite Anomaly Detection model

Authors: Reza Hosseini, Sayan Patra

Greykite AD (Anomaly Detection) is an extension of the Greykite Forecasting library. It provides an interpretable, fast, robust, and easy-to-use interface for users to monitor their metrics with minimal effort.

Greykite AD improves upon the out-of-the-box confidence intervals generated by Silverkite by automatically tuning the confidence intervals and other filters (e.g. filters based on Absolute Percentage Error (APE)) using the expected alert rate and/or anomaly labels, if available. It allows users to define robust objective functions, constraints, and parameter spaces to optimize the confidence intervals. For example, a user can target a minimal recall level of 80% while maximizing precision. Additionally, users can specify a minimum error level to filter out anomalies that are not business relevant. The motivation for including criteria other than statistical significance is to bake material/business impact into the detection.

The parameters of the model can be configured via a config file, which makes the model easy to serve in production environments. This approach has proved effective in real-world use cases at scale.
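
As a sketch of what such a serialized configuration might look like, the dictionary below pairs a forecasting spec with a detection spec. The field names are illustrative assumptions (only ADConfig.ape_grid is mentioned elsewhere in this document); the actual schema may differ:

```python
# Illustrative sketch of a serialized AD config; field names other than
# "ape_grid" are assumptions and may not match the actual ADConfig schema.
ad_config = {
    "forecast_config": {           # passed to the forecast model fit
        "model_template": "SILVERKITE",
        "forecast_horizon": 1,
        "coverage": 0.95,          # initial confidence interval coverage
    },
    "ad_params": {                 # passed to the volatility/tuning step
        "target_anomaly_percent": 1.0,       # expected alert rate (%)
        "coverage_grid": [0.90, 0.95, 0.99], # candidate coverages to search
        "ape_grid": [0.0, 0.05, 0.10],       # candidate APE filters to search
    },
}
```

Because the config is plain data, it can be stored, versioned, and loaded by a serving system without shipping code changes.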

How does the algorithm work?

The algorithm is based on the following steps:

  1. Fit a forecast model using the given ForecastConfig.

  2. Fit a volatility model on the forecast errors using the given ADConfig. This step uses the expected alert rate and/or anomaly labels to optimize the confidence bands and other filters (e.g. filters based on APE). The optimization can reflect complex user preferences, e.g. targeting a minimum recall level while maximizing precision.
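
The two steps above can be sketched in a library-free way. The toy code below is not Greykite's actual implementation; it only illustrates the second step in the unsupervised setting: given point forecasts, a symmetric band half-width is tuned on the residuals so that the fraction of flagged points does not exceed the target alert rate.

```python
import math

def tune_band(actuals, forecasts, target_alert_rate):
    """Pick the smallest residual quantile (band half-width) that keeps the
    flagged fraction at or below the target alert rate. Toy sketch only."""
    residuals = sorted(abs(a - f) for a, f in zip(actuals, forecasts))
    n = len(residuals)
    # Keep at least a (1 - target_alert_rate) fraction of points inside the band.
    k = max(0, math.ceil((1 - target_alert_rate) * n) - 1)
    return residuals[k]

def detect(actuals, forecasts, half_width):
    """Flag points whose residual falls outside the tuned band."""
    return [abs(a - f) > half_width for a, f in zip(actuals, forecasts)]

# A flat series with one spike; target a 10% alert rate.
actuals   = [10, 11, 9, 10, 30, 10, 11, 9, 10, 10]
forecasts = [10] * 10
hw = tune_band(actuals, forecasts, target_alert_rate=0.1)
flags = detect(actuals, forecasts, hw)  # only the spike at index 4 is flagged
```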

Advantages of using the Greykite AD

  1. It works on any data frequency (e.g. daily, hourly, 15-minute, 5-minute, 1-minute), even with small amounts of data.

  2. It can use user feedback (labels provided by the user) to adjust itself over time, but it also works without anomaly labels.

  3. It takes into account seasonality, holidays, growth and other complex patterns when issuing alerts.

  4. It provides great flexibility in the optimization metrics, e.g. optimize recall subject to precision being at least 80 percent.

One of the primary advantages of the Greykite AD is that it works for both supervised and unsupervised problems. In the supervised case, the user provides the labels for the anomalies. These labels are used to train, evaluate, and update the model. The reward function is chosen to utilize these labels, e.g. precision, recall, F1 score, or a combination of these. In the unsupervised case, when no labels are provided, the algorithm uses the expected percent of time points that are anomalies (the expected alert rate) to optimize the model.
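
For the supervised case, the label-based rewards mentioned above are the standard classification metrics. As a minimal sketch (not Greykite's implementation), they can be computed from 0/1 anomaly labels and model flags as follows:

```python
def precision_recall_f1(labels, preds):
    """Compute precision, recall, and F1 from 0/1 anomaly labels (ground
    truth) and 0/1 predictions (model flags). Toy sketch only."""
    tp = sum(bool(l) and bool(p) for l, p in zip(labels, preds))
    fp = sum((not l) and bool(p) for l, p in zip(labels, preds))
    fn = sum(bool(l) and (not p) for l, p in zip(labels, preds))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Two true anomalies; the model catches one and raises one false alarm.
labels = [0, 0, 1, 1, 0]
preds  = [0, 1, 1, 0, 0]
p, r, f1 = precision_recall_f1(labels, preds)  # 0.5, 0.5, 0.5
```

A combined reward (e.g. F1, or recall subject to a precision constraint) is then just a function of these quantities.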

The library is designed such that the users can provide flexible objective functions and constraints that suit their use case. It allows for the following:

  1. Combining objective functions and constraints.

    For example, the user can specify to optimize F1 score such that the anomaly percent is less than 5% and the recall is at least 80%.

  2. Filtering anomalies based on business requirements.

    It is possible that a statistically significant anomaly is not business relevant. For example, a statistically significant spike in the number of users on a website may not have enough business impact to warrant an alert. The user can filter out such anomalies by specifying a minimum error level required to trigger an alert. The library provides Absolute Percentage Error (APE) and Symmetric Absolute Percentage Error (SAPE) as two options for the error level. The user can also pass a grid of values (e.g. ADConfig.ape_grid), which Greykite can use in the optimization. In the supervised case, Greykite uses the labels to find the optimal combination of confidence interval coverage and APE filter. In the unsupervised case, it makes sense to optimize over only one of coverage or a filter, because the only input is the expected alert rate.

  3. Soft versions of metrics to bridge the gap between business requirements and statistical requirements.

    We developed soft versions of well-known metrics that allow for a margin of error in the detection process, i.e. soft precision, soft recall, and soft F1 score. For example, if the user is satisfied when an anomaly is captured within 5 hours of its occurrence, then the user can specify a window of 5 hours in soft recall. The library will then consider an anomaly a true positive if it is detected within 5 hours of its occurrence. This often bridges the gap between business requirements and statistical requirements.
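
The APE-based filter in item 2 can be sketched as a second condition on top of the confidence band: a point raises an alert only if it falls outside the band and its APE clears a minimum floor. This is a toy illustration, not the library's code:

```python
def ape(actual, forecast):
    """Absolute Percentage Error of a forecast; infinite when actual is 0."""
    return abs(actual - forecast) / abs(actual) if actual else float("inf")

def business_relevant_flags(actuals, forecasts, half_width, min_ape):
    """Alert only when a point is both outside the confidence band
    and above the minimum APE floor. Toy sketch only."""
    return [abs(a - f) > half_width and ape(a, f) >= min_ape
            for a, f in zip(actuals, forecasts)]

actuals   = [100, 103, 150, 101]
forecasts = [100, 100, 100, 100]
# 103 is outside a tight band (half_width=2) but its APE (~3%) is below the
# 10% floor, so it is filtered out; only 150 (APE ~33%) raises an alert.
flags = business_relevant_flags(actuals, forecasts, half_width=2, min_ape=0.10)
```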
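
The soft recall of item 3 can be illustrated with a small sketch (again, not the library's implementation): an anomaly counts as a true positive if any alert fires within the user-specified window of it.

```python
def soft_recall(anomaly_times, alert_times, window):
    """Fraction of anomalies with at least one alert within `window` time
    units of the anomaly. With window=0 this reduces to plain recall.
    Toy sketch only; times are numeric (e.g. hours)."""
    if not anomaly_times:
        return 1.0  # nothing to catch
    caught = sum(any(abs(t - a) <= window for a in alert_times)
                 for t in anomaly_times)
    return caught / len(anomaly_times)

# Anomalies at hours 10 and 50; alerts fired at hours 13 and 100.
anomalies = [10, 50]
alerts = [13, 100]
exact = soft_recall(anomalies, alerts, window=0)  # 0.0: no exact-time match
soft = soft_recall(anomalies, alerts, window=5)   # 0.5: hour-13 alert covers hour-10 anomaly
```

With a 5-hour window, the late-but-useful alert at hour 13 is credited for the anomaly at hour 10, which is exactly the kind of business tolerance the soft metrics encode.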

See Simple Anomaly Detection to get started. A more detailed tutorial for the anomaly detection process is at Tune your first anomaly detection model. You can follow that guide for advanced configuration.