Anomaly detection baselines

Question

How do robust covariance detectors compare with popular anomaly-detection baselines on a tabular heavy-tail task?

This benchmark is meant for orientation rather than final publication. It translates robust covariance output into the language many ML users expect: precision, recall, F1, ROC AUC, detection count, and runtime.

Benchmark design

The data contain heavy-tailed inliers and a small fraction of mixed outliers. Outliers include both mean-shifted points and leverage-like observations. The benchmark compares:

  • robustcov FastMCD robust-distance detector;

  • robustcov auto robust anomaly detector;

  • sklearn IsolationForest;

  • sklearn LocalOutlierFactor;

  • sklearn OneClassSVM;

  • sklearn EllipticEnvelope.

Results

Anomaly baseline F1 comparison
Anomaly baseline comparison

method

seconds

precision

recall

f1

roc_auc

detected

sklearn EllipticEnvelope

0.4706082669999887

0.8055555555555556

0.8055555555555556

0.8055555555555556

0.9877214170692431

72

robustcov FastMCD detector

0.18479795199982618

0.75

0.75

0.75

0.9827227589908749

72

sklearn IsolationForest

0.13584519300002285

0.5694444444444444

0.5694444444444444

0.5694444444444444

0.9688338701019861

72

robustcov Auto detector

0.03845888500018191

0.5555555555555556

0.5555555555555556

0.5555555555555556

0.9685654857756307

72

sklearn OneClassSVM

0.009737448999658227

0.45714285714285713

0.4444444444444444

0.4507042253521127

0.9214472624798711

70

sklearn LocalOutlierFactor

0.021117043000231206

0.4444444444444444

0.4444444444444444

0.4444444444444444

0.553408480944713

72

Interpretation

The FastMCD detector is the most direct robustcov baseline for separable tabular anomalies. It converts robust Mahalanobis distances into anomaly labels using an empirical threshold. This is the easiest method to explain to users: fit a robust center and covariance, compute robust distances, and flag the largest distances.

AutoRobustAnomalyDetector is intended as a diagnostic ensemble rather than a universal replacement for dedicated anomaly detectors. It is useful when the user wants a robust covariance view of the data and wants to compare several scatter estimators quickly.

The sklearn methods are included because users will naturally compare against them. The result to emphasize is not that robust covariance always wins, but that it gives interpretable distances, covariance diagnostics, and a clear geometric explanation of flagged points.

Command

python benchmarks/anomaly_detection_baselines.py \
  --n 900 \
  --p 12 \
  --contamination 0.08 \
  --csv results/anomaly_baselines.csv

python examples/plot_anomaly_baselines.py \
  --input results/anomaly_baselines.csv \
  --output results/anomaly_baselines.png

What to look at first

  • Use F1 when contamination is known and you care about balanced precision/recall.

  • Use ROC AUC when you care more about ranking anomalies than choosing a threshold.

  • Inspect robust distance profiles after the benchmark; they explain why points are flagged.