Simple Anomaly Detection

You can create and evaluate an anomaly detection model with just a few lines of code.

Provide your time series as a pandas dataframe with a timestamp column and a value column. Optionally, you can also provide the anomaly labels as a column in the dataframe.

For example, to detect anomalies in daily sessions data, your dataframe could look like this:

import pandas as pd
df = pd.DataFrame({
    "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "sessions": [10231.0, 12309.0, 12104.0],
    "is_anomaly": [False, True, False]
})

The time column can be any format recognized by pandas.to_datetime.
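A quick way to confirm that your time column parses is to convert it yourself. For the hour-stamped strings in the example above, passing an explicit format avoids relying on pandas' format inference. The format string "%Y-%m-%d-%H" is an assumption based on the sample values (the trailing "-00" read as the hour).

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "sessions": [10231.0, 12309.0, 12104.0],
    "is_anomaly": [False, True, False],
})

# Assumed format: "YYYY-MM-DD-HH", where the last field is the hour.
parsed = pd.to_datetime(df["date"], format="%Y-%m-%d-%H")
print(parsed.dt.date.tolist())

# The boolean label column can be used directly as a row filter.
print(df.loc[df["is_anomaly"], "date"].tolist())  # ['2020-01-09-00']
```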

In this example, we’ll load a dataset representing log(daily page views) on the Wikipedia page for Peyton Manning. It contains values from 2007-12-10 to 2016-01-20. More dataset info here.

import warnings

import plotly
from greykite.common.data_loader import DataLoader
from greykite.detection.detector.config import ADConfig
from greykite.detection.detector.data import DetectorData
from greykite.detection.detector.greykite import GreykiteDetector
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.model_templates import ModelTemplateEnum

warnings.filterwarnings("ignore")

# Loads dataset into pandas DataFrame
dl = DataLoader()
df = dl.load_peyton_manning()

# Specifies dataset information
metadata = MetadataParam(
    time_col="ts",  # name of the time column ("date" in example above)
    value_col="y",  # name of the value column ("sessions" in example above)
    freq="D"  # "H" for hourly, "D" for daily, "W" for weekly, etc.
    # Any format accepted by `pandas.date_range`
)
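The freq parameter should describe the spacing of the time column. One way to check it before building the config is pandas.infer_freq, which returns None when the timestamps are irregular, for example when some days are missing. The short series below is made up for illustration, and filling gaps with asfreq is one option shown here, not necessarily what Greykite does internally.

```python
import pandas as pd

# Hypothetical daily series with one missing day (2020-01-03).
ts = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05"])
series = pd.Series([1.0, 2.0, 4.0, 5.0], index=ts)

print(pd.infer_freq(series.index))  # None: the spacing is irregular

# Reindex onto a regular daily grid; the missing day becomes NaN.
regular = series.asfreq("D")
print(pd.infer_freq(regular.index))  # D
print(int(regular.isna().sum()))  # 1
```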

Create an Anomaly Detection Model

Similar to forecasting, you need to provide a forecast config and an anomaly detection config. You can choose any of the available forecast model templates (see Choose a Model).

# In this example, we choose the "AUTO" model template for the forecast config,
# and the default anomaly detection config.
# The Silverkite "AUTO" model template chooses the parameter configuration
# given the input data frequency, forecast horizon and evaluation configs.

forecast_config = ForecastConfig(
    model_template=ModelTemplateEnum.AUTO.name,
    forecast_horizon=7,  # forecasts 7 steps ahead
    coverage=None,       # interval coverage is tuned by the anomaly detection model
    metadata_param=metadata)

ad_config = ADConfig()  # Default anomaly detection config

# Creates an instance of the Greykite anomaly detector
detector = GreykiteDetector(
    forecast_config=forecast_config,
    ad_config=ad_config,
    reward=None)
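Setting coverage=None leaves the width of the forecast interval to the anomaly detection step. The sketch below is a simplified illustration of why that parameter matters, not Greykite's implementation: it assumes normally distributed residuals with a known standard deviation (0.45 here, an arbitrary value), turns a coverage level into a symmetric interval around the forecast, and flags actual values that fall outside it. The function name flag_anomalies and the input values are hypothetical.

```python
from statistics import NormalDist


def flag_anomalies(actual, forecast, coverage, residual_sd):
    """Flag points outside a symmetric normal interval around the forecast.

    Simplifying assumption: residuals are N(0, residual_sd**2).
    """
    # Central interval holding `coverage` of the normal distribution's mass.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    flags = []
    for a, f in zip(actual, forecast):
        lower, upper = f - z * residual_sd, f + z * residual_sd
        flags.append(not (lower <= a <= upper))
    return flags


actual = [9.59, 8.52, 8.18, 6.69]
forecast = [9.19, 9.02, 8.95, 6.71]
# A narrow interval (low coverage) flags more points than a wide one.
print(flag_anomalies(actual, forecast, coverage=0.05, residual_sd=0.45))  # [True, True, True, False]
print(flag_anomalies(actual, forecast, coverage=0.99, residual_sd=0.45))  # [False, False, False, False]
```

The trade-off is the usual one: lowering coverage raises sensitivity at the cost of more alerts, which is why the detector tunes it rather than fixing it up front.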

Train the Anomaly Detection Model

You can train the anomaly detection model by calling the fit method, which takes a DetectorData object as input. The DetectorData object holds the time series as a pandas dataframe. Optionally, you can also provide the anomaly labels, either as a column in the dataframe or as a list of boolean values. The labels are used to evaluate model performance.

# Uses the first 2700 rows for training
train_size = 2700
df_train = df[:train_size].reset_index(drop=True)
train_data = DetectorData(df=df_train)
detector.fit(data=train_data)

Out:

Fitting 3 folds for each of 1 candidates, totalling 3 fits
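When labels are available, evaluating a detector comes down to comparing predicted flags with true flags. The helper below is a hypothetical plain-Python sketch of precision, recall and F1 on two boolean lists; it is not a Greykite API, and the label lists are made up.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for boolean anomaly labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))        # correctly flagged
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))  # false alarms
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))  # missed anomalies
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [False, True, False, False, True]
y_pred = [False, True, True, False, False]
print(precision_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
```

This example passes no labels to DetectorData, so the is_anomaly column of the outputs is None.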

Predict with the Anomaly Detection Model

You can predict anomalies by calling the predict method.

test_data = DetectorData(df=df)
test_data = detector.predict(test_data)

Evaluate the Anomaly Detection Model

The outputs of the anomaly detection model are stored as attributes of the GreykiteDetector object. (The plots are interactive plotly figures: click to zoom.)

Training

The fitted_df attribute contains the result on the training data. You can plot the result by calling the plot method with phase="train".

print(detector.fitted_df)

fig = detector.plot(
    phase="train",
    title="Greykite Detector Peyton Manning - fit phase")
plotly.io.show(fig)

Out:

             ts    actual  forecast  forecast_lower  forecast_upper  is_anomaly_predicted   z_score is_anomaly
0    2007-12-10  9.590761  9.194596        9.166271        9.222921                  True  0.877043       None
1    2007-12-11  8.519590  9.022015        8.993690        9.050340                  True -1.112284       None
2    2007-12-12  8.183677  8.945369        8.917044        8.973694                  True -1.686259       None
3    2007-12-13  8.072467  8.895533        8.867208        8.923858                  True -1.822130       None
4    2007-12-14  7.893572  8.836495        8.808170        8.864820                  True -2.087475       None
...         ...       ...       ...             ...             ...                   ...       ...        ...
2753 2015-06-24  7.344073  6.915129        6.886804        6.943454                  True  0.949609       None
2754 2015-06-25  7.291656  6.937793        6.909468        6.966118                  True  0.783395       None
2755 2015-06-26  7.271704  6.872038        6.843713        6.900363                  True  0.884793       None
2756 2015-06-27  7.454720  6.485441        6.457116        6.513766                  True  2.145821       None
2757 2015-06-28  6.692084  6.709412        6.681087        6.737737                 False -0.038362       None

[2758 rows x 8 columns]
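Since fitted_df is a regular pandas DataFrame, the usual filtering applies. So that the sketch runs on its own, it rebuilds four rows of the printed output by hand (a subset of columns) instead of calling the detector, then selects the flagged rows and computes the share of flagged points.

```python
import pandas as pd

# Four rows copied from the printed fitted_df above (subset of columns).
fitted = pd.DataFrame({
    "ts": pd.to_datetime(["2007-12-10", "2007-12-11", "2015-06-27", "2015-06-28"]),
    "actual": [9.590761, 8.519590, 7.454720, 6.692084],
    "forecast_lower": [9.166271, 8.993690, 6.457116, 6.681087],
    "forecast_upper": [9.222921, 9.050340, 6.513766, 6.737737],
    "is_anomaly_predicted": [True, True, True, False],
})

flagged = fitted[fitted["is_anomaly_predicted"]]
print(flagged["ts"].dt.strftime("%Y-%m-%d").tolist())
print(fitted["is_anomaly_predicted"].mean())  # 0.75

# In these rows, each flag coincides with the actual value leaving the interval.
outside = (fitted["actual"] < fitted["forecast_lower"]) | (fitted["actual"] > fitted["forecast_upper"])
print((outside == fitted["is_anomaly_predicted"]).all())  # True
```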

Prediction

The pred_df attribute contains the predicted result. You can plot the result by calling the plot method with phase="predict".

print(detector.pred_df)

fig = detector.plot(
    phase="predict",
    title="Greykite Detector Peyton Manning - predict phase")
plotly.io.show(fig)

Out:

             ts     actual  forecast  forecast_lower  forecast_upper  is_anomaly_predicted   z_score is_anomaly
0    2007-12-10   9.590761  9.194596        9.166271        9.222921                  True  0.877043       None
1    2007-12-11   8.519590  9.022015        8.993690        9.050340                  True -1.112284       None
2    2007-12-12   8.183677  8.945369        8.917044        8.973694                  True -1.686259       None
3    2007-12-13   8.072467  8.895533        8.867208        8.923858                  True -1.822130       None
4    2007-12-14   7.893572  8.836495        8.808170        8.864820                  True -2.087475       None
...         ...        ...       ...             ...             ...                   ...       ...        ...
2900 2016-01-16   7.817223  8.268451        8.240126        8.296776                  True -0.998943       None
2901 2016-01-17   9.273878  8.670900        8.642575        8.699225                  True  1.334893       None
2902 2016-01-18  10.333775  9.143704        9.115379        9.172029                  True  2.634619       None
2903 2016-01-19   9.125871  8.861068        8.832743        8.889393                  True  0.586231       None
2904 2016-01-20   8.891374  8.860561        8.832236        8.888886                  True  0.068214       None

[2905 rows x 8 columns]
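Because test_data was built from the full df, pred_df also covers the training period (it starts on 2007-12-10). To look only at dates the detector did not see during training, you can filter on the last training timestamp, 2015-06-28 in the fitted_df output. The rows below are copied by hand from the printed pred_df (a subset of columns) so the sketch runs on its own. Ranking by absolute z_score assumes a larger magnitude means a larger standardized deviation from the forecast.

```python
import pandas as pd

# Rows copied from the printed pred_df above (subset of columns).
pred = pd.DataFrame({
    "ts": pd.to_datetime(["2007-12-10", "2007-12-11", "2016-01-16", "2016-01-17", "2016-01-18"]),
    "actual": [9.590761, 8.519590, 7.817223, 9.273878, 10.333775],
    "z_score": [0.877043, -1.112284, -0.998943, 1.334893, 2.634619],
    "is_anomaly_predicted": [True, True, True, True, True],
})

last_train_ts = pd.Timestamp("2015-06-28")  # last row of fitted_df
test_only = pred[pred["ts"] > last_train_ts]
print(len(test_only))  # 3

# Out-of-sample point with the largest absolute z_score.
most_extreme = test_only.loc[test_only["z_score"].abs().idxmax()]
print(most_extreme["ts"].date())  # 2016-01-18
```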

Model Summary

The model summary allows inspection of individual model terms. Check the parameter estimates and their significance for insights into how the model works and what can be improved. Call the summary method to see the model summary.

summary = detector.summary()
print(summary)

Out:

======================= Anomaly Detection Model Summary ========================

Number of observations: 2758
Model: GreykiteDetector
Number of detected anomalies: 2513

Average Anomaly Duration: 13 days 07:06:40
Minimum Anomaly Duration: 1 days 00:00:00
Maximum Anomaly Duration: 99 days 00:00:00

Alert Rate(%): 91.11675126903553
Optimal Objective Value: 0
Optimal Parameters: {'coverage': 0.05, 'volatility_features': []}

============================ Forecast Model Summary ============================

Number of observations: 2758,   Number of features: 130
Method: Ridge regression
Number of nonzero features: 130
Regularization parameter: 1.487

Residuals:
         Min           1Q       Median           3Q          Max
      -2.291      -0.2455     -0.05206       0.1748        3.176

            Pred_col   Estimate Std. Err Pr(>)_boot sig. code                95%CI
           Intercept      6.472  0.08988     <2e-16       ***       (6.301, 6.657)
 events_C...New Year    0.04443   0.2609      0.858              (-0.5006, 0.5232)
 events_C...w Year-1    -0.1577   0.2515      0.506              (-0.6969, 0.3487)
 events_C...w Year-2     0.1359   0.2205      0.564              (-0.2588, 0.6172)
 events_C...w Year+1    0.09435     0.25      0.676              (-0.3786, 0.6645)
 events_C...w Year+2     0.1168   0.1701      0.480              (-0.1956, 0.4884)
events_Christmas Day    -0.3431   0.1718      0.032         *  (-0.6954, -0.05099)
 events_C...as Day-1    -0.1046   0.1641      0.498              (-0.4351, 0.1903)
 events_C...as Day-2   0.009805   0.2703      0.970              (-0.5998, 0.4076)
 events_C...as Day+1    -0.1952   0.1791      0.246             (-0.5804, 0.09806)
 events_C...as Day+2     0.1757  0.08532      0.034         *   (0.006934, 0.3328)
 events_E...Ireland]    -0.1355  0.09247      0.148             (-0.3038, 0.05459)
 events_E...eland]-1    -0.1038  0.06502      0.104             (-0.2091, 0.04433)
 events_E...eland]-2   -0.05626  0.04159      0.186             (-0.1407, 0.01898)
 events_E...eland]+1   -0.07939   0.0794      0.328             (-0.2219, 0.07749)
 events_E...eland]+2    -0.0502  0.06782      0.434             (-0.1953, 0.06742)
  events_Good Friday     -0.158  0.05532      0.010         *  (-0.2754, -0.06255)
events_Good Friday-1    -0.1514  0.05425      0.008        **  (-0.2514, -0.03783)
events_Good Friday-2    -0.1018  0.06388      0.088         .   (-0.2282, 0.03023)
events_Good Friday+1   -0.05626  0.04159      0.186             (-0.1407, 0.01898)
events_Good Friday+2    -0.1038  0.06502      0.104             (-0.2091, 0.04433)
 events_I...ence Day     0.0465  0.06725      0.508             (-0.09443, 0.1766)
 events_I...ce Day-1    0.02868   0.0727      0.686              (-0.1186, 0.1622)
 events_I...ce Day-2   -0.02212  0.05494      0.716              (-0.127, 0.07541)
 events_I...ce Day+1    0.01047  0.06082      0.880              (-0.1066, 0.1298)
 events_I...ce Day+2    0.02178  0.07228      0.798              (-0.1204, 0.1528)
    events_Labor Day    -0.2545  0.08887      0.008        **  (-0.4173, -0.05204)
  events_Labor Day-1    -0.1403  0.09722      0.158               (-0.331, 0.0409)
  events_Labor Day-2   -0.08559  0.08947      0.312               (-0.2561, 0.106)
  events_Labor Day+1     -0.212  0.06771     <2e-16       ***   (-0.345, -0.07935)
  events_Labor Day+2    -0.2735  0.06152     <2e-16       ***   (-0.3838, -0.1469)
 events_Memorial Day    -0.2118  0.05515     <2e-16       ***   (-0.3067, -0.0977)
 events_M...al Day-1   -0.06989  0.07632      0.336              (-0.194, 0.09979)
 events_M...al Day-2     0.1114   0.0914      0.212             (-0.04693, 0.3107)
 events_M...al Day+1    0.03951  0.05688      0.480             (-0.06897, 0.1476)
 events_M...al Day+2      0.228   0.1018      0.016         *    (0.03284, 0.4161)
events_New Years Day     0.1774   0.1135      0.128             (-0.05062, 0.3927)
 events_N...rs Day-1     0.2514   0.1263      0.048         *  (-0.002944, 0.4813)
 events_N...rs Day-2     0.3549    0.151      0.018         *      (0.06871, 0.65)
 events_N...rs Day+1     0.4707    0.139      0.002        **      (0.206, 0.7525)
 events_N...rs Day+2      0.416   0.1537      0.004        **     (0.08613, 0.698)
        events_Other    -0.0301  0.03251      0.338             (-0.0914, 0.03633)
      events_Other-1   -0.01524  0.03319      0.656             (-0.0764, 0.04995)
      events_Other-2   0.002043  0.02983      0.950            (-0.05273, 0.06514)
      events_Other+1   -0.01687  0.03355      0.598              (-0.077, 0.05344)
      events_Other+2   -0.01129  0.02988      0.710            (-0.06976, 0.04601)
 events_Thanksgiving   -0.03204  0.07584      0.646              (-0.1836, 0.1049)
 events_T...giving-1    -0.2414  0.05192     <2e-16       ***   (-0.3421, -0.1425)
 events_T...giving-2     -0.167  0.06166      0.012         *  (-0.2772, -0.03954)
 events_T...giving+1    0.01656  0.09726      0.858              (-0.1635, 0.2103)
 events_T...giving+2   -0.04235  0.06179      0.478             (-0.1612, 0.08538)
 events_Veterans Day   -0.06497   0.1086      0.544               (-0.2692, 0.152)
 events_V...ns Day-1    -0.1159  0.08538      0.162             (-0.2827, 0.05792)
 events_V...ns Day-2    -0.1861  0.09677      0.058         .    (-0.382, 0.01128)
 events_V...ns Day+1   -0.07166  0.09513      0.470              (-0.2398, 0.1216)
 events_V...ns Day+2    -0.0256  0.07567      0.740              (-0.1735, 0.1243)
       str_dow_2-Tue   -0.09913  0.05629      0.078         .   (-0.2165, 0.01572)
       str_dow_3-Wed   -0.07437  0.03571      0.032         * (-0.1442, -0.005199)
       str_dow_4-Thu   -0.01387  0.02869      0.632            (-0.06638, 0.04708)
       str_dow_5-Fri  0.0006297  0.02892      0.982            (-0.05414, 0.05497)
       str_dow_6-Sat     -0.181  0.03487     <2e-16       ***   (-0.2405, -0.1074)
       str_dow_7-Sun   -0.06788  0.06096      0.274             (-0.1909, 0.05126)
                 ct1     0.1797   0.1369      0.192              (-0.0829, 0.4668)
      is_weekend:ct1    -0.0109   0.0893      0.910               (-0.167, 0.1796)
   str_dow_2-Tue:ct1 -0.0004736   0.1028      0.998               (-0.202, 0.1908)
   str_dow_3-Wed:ct1   -0.02066  0.07259      0.760              (-0.1574, 0.1308)
   str_dow_4-Thu:ct1    0.02278  0.07299      0.758              (-0.1343, 0.1509)
   str_dow_5-Fri:ct1    0.06624   0.0694      0.314             (-0.06949, 0.2109)
   str_dow_6-Sat:ct1    0.01419  0.06192      0.838              (-0.1138, 0.1228)
   str_dow_7-Sun:ct1   -0.02508  0.09093      0.756              (-0.2045, 0.1594)
   cp0_2009_01_26_00     0.5946   0.1322     <2e-16       ***     (0.3175, 0.8421)
 is_weeke...01_26_00     0.1062  0.09194      0.260             (-0.08049, 0.2722)
 str_dow_...01_26_00     0.0287  0.09603      0.796              (-0.1606, 0.2369)
 str_dow_...01_26_00    0.04327   0.0779      0.578             (-0.09932, 0.2012)
 str_dow_...01_26_00      0.088  0.07208      0.214             (-0.05984, 0.2158)
 str_dow_...01_26_00    0.04049  0.07495      0.584              (-0.0956, 0.1924)
 str_dow_...01_26_00    0.02755  0.06685      0.654              (-0.1011, 0.1579)
 str_dow_...01_26_00    0.07864  0.09267      0.386              (-0.1083, 0.2646)
   cp1_2012_03_12_00    -0.6865  0.09241     <2e-16       ***   (-0.8741, -0.4989)
 is_weeke...03_12_00    -0.1621  0.08535      0.050         .  (-0.3197, 0.007013)
 str_dow_...03_12_00   -0.09134   0.1423      0.498              (-0.3901, 0.1756)
 str_dow_...03_12_00   -0.04552    0.105      0.682              (-0.2692, 0.1419)
 str_dow_...03_12_00   -0.07273  0.09571      0.428              (-0.2571, 0.1256)
 str_dow_...03_12_00   -0.08202  0.09443      0.368               (-0.269, 0.0992)
 str_dow_...03_12_00   -0.06132  0.08937      0.486              (-0.2236, 0.1113)
 str_dow_...03_12_00    -0.1008   0.1092      0.368              (-0.3097, 0.1149)
 ct1:sin1_tow_weekly   -0.01722  0.06094      0.726              (-0.1371, 0.1128)
 ct1:cos1_tow_weekly    0.01484   0.1099      0.888              (-0.2185, 0.2241)
 cp0_2009...w_weekly  -0.001664  0.06511      0.978              (-0.1184, 0.1241)
 cp0_2009...w_weekly     0.1179  0.09465      0.208             (-0.07559, 0.2906)
 cp1_2012...w_weekly    0.01401  0.06814      0.834               (-0.134, 0.1285)
 cp1_2012...w_weekly    -0.1005   0.1396      0.454               (-0.3541, 0.187)
     sin1_tow_weekly    0.03757  0.02218      0.080         . (-0.008053, 0.07838)
     cos1_tow_weekly     0.2106  0.05319     <2e-16       ***     (0.1132, 0.3198)
    sin1_tom_monthly    0.09547  0.02967      0.002        **    (0.04164, 0.1588)
    cos1_tom_monthly     0.0593   0.0249      0.008        **    (0.01234, 0.1054)
    sin2_tom_monthly    0.07182  0.02579      0.006        **    (0.01721, 0.1176)
    cos2_tom_monthly   -0.08665  0.02536     <2e-16       ***  (-0.1345, -0.03671)
    sin3_tom_monthly   -0.02128  0.02863      0.446            (-0.07114, 0.04116)
    cos3_tom_monthly   -0.01719  0.02485      0.512            (-0.06576, 0.02989)
    sin4_tom_monthly  -0.006038  0.02445      0.806            (-0.04932, 0.04072)
    cos4_tom_monthly    -0.0194  0.02445      0.428            (-0.06729, 0.02718)
    sin5_tom_monthly    0.01227  0.02557      0.628            (-0.03554, 0.06254)
    cos5_tom_monthly   0.009444  0.02618      0.704            (-0.04067, 0.06013)
    sin6_tom_monthly   0.009462  0.02563      0.722            (-0.03975, 0.05969)
    cos6_tom_monthly   -0.00576   0.0251      0.832            (-0.05486, 0.04342)
    sin7_tom_monthly   -0.02706  0.02518      0.296            (-0.07774, 0.02171)
    cos7_tom_monthly   -0.02494  0.02478      0.330            (-0.07246, 0.02663)
     sin1_ct1_yearly    -0.2708  0.02886     <2e-16       ***    (-0.3276, -0.217)
     cos1_ct1_yearly     0.6189  0.05056     <2e-16       ***     (0.5198, 0.7175)
     sin2_ct1_yearly     0.0898  0.02705      0.002        **    (0.04139, 0.1396)
     cos2_ct1_yearly    -0.1118   0.0251     <2e-16       ***    (-0.16, -0.06402)
     sin3_ct1_yearly     0.3182  0.03163     <2e-16       ***     (0.2588, 0.3786)
     cos3_ct1_yearly    0.08626  0.02635     <2e-16       ***    (0.03744, 0.1403)
     sin4_ct1_yearly    0.09706  0.02422     <2e-16       ***    (0.05219, 0.1447)
     cos4_ct1_yearly     -0.156    0.029     <2e-16       ***  (-0.2133, -0.09715)
     sin5_ct1_yearly    -0.1293  0.02995     <2e-16       ***  (-0.1886, -0.07665)
     cos5_ct1_yearly    -0.1002  0.02404     <2e-16       ***  (-0.1457, -0.04934)
     sin6_ct1_yearly     -0.166  0.02498     <2e-16       ***    (-0.2129, -0.114)
     cos6_ct1_yearly    -0.1487  0.02601     <2e-16       ***  (-0.2006, -0.09779)
     sin7_ct1_yearly     -0.128  0.02578     <2e-16       ***  (-0.1772, -0.07743)
     cos7_ct1_yearly    0.04329  0.02491      0.082         . (-0.008112, 0.08897)
     sin8_ct1_yearly  -0.003006  0.02549      0.900            (-0.05187, 0.04744)
     cos8_ct1_yearly     0.2561  0.02897     <2e-16       ***     (0.2002, 0.3072)
              y_lag7     0.9348   0.2124     <2e-16       ***      (0.5128, 1.309)
              y_lag8     0.5674   0.1868     <2e-16       ***     (0.1889, 0.9225)
              y_lag9   -0.09031   0.1699      0.602              (-0.4167, 0.2293)
    y_avglag_7_14_21      1.781   0.2337     <2e-16       ***       (1.352, 2.236)
    y_avglag_7_to_13     0.3023   0.2023      0.144             (-0.06962, 0.6984)
   y_avglag_14_to_20    -0.1425   0.1301      0.286              (-0.4134, 0.1057)
Signif. Code: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.7181,   Adjusted R-squared: 0.7075
F-statistic: 66.925 on 99 and 2657 DF,   p-value: 1.110e-16
Model AIC: 17685.0,   model BIC: 18280.0

WARNING: the condition number is large, 2.28e+04. This might indicate that there are strong multicollinearity or other numerical problems.
WARNING: the F-ratio and its p-value on regularized methods might be misleading, they are provided only for reference purposes.
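Two of the anomaly detection statistics can be reproduced by hand. The alert rate is the number of detected anomalies divided by the number of observations, 2513 / 2758 × 100 ≈ 91.117%. The duration statistics appear to summarize runs of consecutive flagged points; the sketch below assumes that definition (it is not taken from Greykite's source) and applies it to a made-up daily flag sequence via the hypothetical helper anomaly_durations.

```python
from datetime import timedelta
from itertools import groupby

# Alert rate from the summary above: detected anomalies / observations.
alert_rate = 2513 / 2758 * 100
print(round(alert_rate, 4))  # 91.1168


def anomaly_durations(flags, step=timedelta(days=1)):
    """Duration of each run of consecutive True flags (assumed definition)."""
    return [sum(1 for _ in run) * step for is_anom, run in groupby(flags) if is_anom]


flags = [True, True, False, True, False, False, True, True, True]
durations = anomaly_durations(flags)
print([d.days for d in durations])  # [2, 1, 3]
print(min(durations), max(durations), sum(durations, timedelta()) / len(durations))
```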

What’s next?

If you’re satisfied with the anomaly detection performance, you’re done!

For a complete example of how to tune this forecast, see Tune your first anomaly detection model.

Total running time of the script: ( 7 minutes 16.539 seconds)
