robustcov 0.0.1 documentation

External and Kaggle gallery

This page is the single entry point for optional Kaggle and external-data examples. These examples are not part of the test suite because they require manual downloads, dataset-specific licenses, or larger local files.

The goal is not to claim that robustcov wins everywhere. The goal is to show where robust covariance gives a strong advantage, where it is competitive, where it is mainly diagnostic, and where another method is better.

How to read the cards

Result labels

| Label | Meaning |
|---|---|
| Strong win | robustcov clearly improves the most relevant metric against common unsupervised baselines. |
| Competitive | robustcov is close to the best method, or wins one metric but loses another. |
| Competitive, slow | robustcov improves quality, but runtime is currently a weakness. |
| Not best | Another baseline performs better; the robustcov result is still reported for transparency. |
| Diagnostic | There are no ground-truth labels, but robust distances provide interpretable stress/anomaly rankings. |
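The Diagnostic label depends on ranking observations by a robust distance. The sketch below illustrates the idea with a deliberately simplified stand-in: it centers each column on its median and scales by the MAD, which ignores correlations between columns. It is not how FastMCD, RegularizedCauchy, or any other robustcov estimator works, and the function name `robust_distances` and the toy returns are hypothetical.

```python
from statistics import median

def robust_distances(rows):
    """Distance of each row from the column medians, scaled by the column MADs.

    Simplified stand-in: a diagonal scatter only, so correlations are ignored.
    """
    columns = list(zip(*rows))
    centers = [median(col) for col in columns]
    # 1.4826 makes the MAD consistent with the standard deviation under normality.
    scales = [
        1.4826 * median(abs(v - c) for v in col)
        for col, c in zip(columns, centers)
    ]
    if any(s == 0 for s in scales):
        raise ValueError("a column has zero MAD; this sketch cannot scale it")
    return [
        sum(((v - c) / s) ** 2 for v, c, s in zip(row, centers, scales)) ** 0.5
        for row in rows
    ]

# Hypothetical two-asset daily returns; the last row is an injected stress day.
returns = [
    (0.001, 0.002), (-0.002, 0.001), (0.003, -0.001),
    (0.000, 0.000), (-0.001, -0.002), (0.002, 0.003),
    (-0.080, -0.095),
]
distances = robust_distances(returns)

# Rank rows from most to least unusual; no ground-truth labels are needed.
ranking = sorted(range(len(returns)), key=lambda i: distances[i], reverse=True)
print(ranking[0])
```

Because the median and MAD are barely moved by the one extreme row, that row stands out clearly in the ranking. A classical mean and standard deviation would be pulled toward it and shrink its apparent distance.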

Recommended result pages

Credit-card fraud
[Figure: Credit-card fraud PR-AUC comparison]
Strong win. FastMCD PR-AUC 0.712 and F1 0.801 on a classic imbalanced fraud dataset.

Predictive maintenance
[Figure: Predictive maintenance F1 comparison]
Competitive. robustcov gives the best F1, while IsolationForest has stronger PR-AUC and speed.

Finance market stress
[Figure: Top finance stress days]
Diagnostic. RegularizedCauchy ranks unusual cross-asset return days.

Rolling market regimes
[Figure: Top anomalous rolling finance windows]
Diagnostic. Window-level features identify abnormal volatility/correlation regimes.

Honest secondary results

IEEE-CIS fraud
[Figure: IEEE-CIS runtime comparison]
Competitive, slow. Best tested unsupervised quality, but runtime is a major weakness.

Medical screening
[Figure: Medical screening F1 comparison]
Not best. Useful diagnostic result; EllipticEnvelope wins this benchmark.

Current documented external results

External result registry

| Dataset / example | Status | Main method | Headline result | Notes |
|---|---|---|---|---|
| Credit-card fraud | Strong win | FastMCD | PR-AUC 0.712, F1 0.801 | Large metric gap vs common sklearn anomaly baselines. |
| Predictive maintenance | Competitive | Auto(StudentTScatter) | F1 0.947 vs IsolationForest 0.944 | IsolationForest is faster and has better PR-AUC. |
| IEEE-CIS fraud | Competitive, slow | RegularizedCauchy | PR-AUC 0.093 vs IsolationForest 0.084 | Best tested unsupervised quality, but much slower. |
| Medical screening | Not best | Auto(StudentTScatter) | PR-AUC 0.567 vs EllipticEnvelope 0.629 | Honest negative/diagnostic result. |
| Finance market stress | Diagnostic | RegularizedCauchy | 23 / 899 days detected | Top days cluster around stress-like periods. |
| Rolling-window finance | Diagnostic | RegularizedCauchy | 5 / 176 windows detected | Top windows cluster around September stress regimes. |
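The headline results above are reported as PR-AUC and F1. This page does not say how the external scripts compute either metric, so the following is only a minimal sketch under two assumptions: PR-AUC is taken as average precision (the step-wise area under the precision-recall curve), and F1 is evaluated at a single score threshold. The function names, labels, and scores are hypothetical toy values, not results from any dataset listed here.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each true anomaly."""
    # Sort indices by descending anomaly score; ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos

def f1_at_threshold(labels, scores, threshold):
    """F1 when every score >= threshold is flagged as an anomaly."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data: 1 marks a true anomaly; higher score means "more anomalous".
labels = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.10, 0.40, 0.90, 0.20, 0.35, 0.30, 0.05, 0.15]

ap = average_precision(labels, scores)
f1 = f1_at_threshold(labels, scores, threshold=0.35)
print(round(ap, 3), round(f1, 3))
```

The two metrics answer different questions: average precision scores the whole ranking, while F1 depends on the chosen threshold. That is one way a method can lead on F1 and trail on PR-AUC, as in the predictive-maintenance row.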

Why UNSW-NB15 is not highlighted

The commonly used UNSW-NB15 training split can contain a very high attack fraction. That makes it less like rare-anomaly detection and more like unsupervised or semi-supervised classification. robustcov may still be useful there as a risk-ranking diagnostic, but it is not a clean headline anomaly benchmark for this package. We therefore do not highlight it in the external gallery.
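One way to apply this reasoning to a new dataset is to check the labeled anomaly fraction before treating it as a rare-anomaly benchmark. The sketch below is illustrative only: `max_fraction=0.05` is a hypothetical cutoff chosen for the example, not a robustcov setting, and the label lists are synthetic rather than taken from UNSW-NB15 or any other dataset.

```python
def anomaly_fraction(labels):
    """Share of rows labeled as anomalies (1 = anomaly, 0 = normal)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)

def looks_like_rare_anomaly_task(labels, max_fraction=0.05):
    """True when anomalies are rare enough for a rare-anomaly benchmark.

    max_fraction is a hypothetical cutoff used only for this illustration.
    """
    return anomaly_fraction(labels) <= max_fraction

# Synthetic label vectors for illustration.
rare_split = [1] * 2 + [0] * 998         # 0.2% anomalies
attack_heavy_split = [1] * 600 + [0] * 400  # 60% "anomalies"

print(anomaly_fraction(rare_split), looks_like_rare_anomaly_task(rare_split))
print(anomaly_fraction(attack_heavy_split),
      looks_like_rare_anomaly_task(attack_heavy_split))
```

The fraction also sets the floor for PR-AUC: a random ranking is expected to score close to the positive fraction, so PR-AUC values from an attack-heavy split are not comparable with those from a rare-anomaly dataset.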

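Run external examples

External examples are optional and dataset-dependent. The recommended path is to run a dataset script on a locally downloaded file, then run the collection script over the results root: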

python examples_external/<script>.py --data path/to/data.csv --outdir results/external/<name>
python examples_external/collect_external_results.py \
  --root results/external \
  --outdir results/external_registry

The scripts, dataset notes, and notebook templates live under examples_external/. They are intentionally outside the core test suite because Kaggle datasets have separate licenses, download steps, and file sizes.
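When several datasets are run in sequence, the two commands above can be assembled programmatically. The following sketch only builds the argument lists and uses just the flags shown above (`--data`, `--outdir`, `--root`). The helper names, the script name `credit_card_fraud.py`, and the data path are hypothetical placeholders; check `examples_external/` for the actual script names.

```python
def build_example_command(script, data, name, root="results/external"):
    """Argument list for one external example run."""
    return [
        "python", f"examples_external/{script}",
        "--data", str(data),
        "--outdir", f"{root}/{name}",
    ]

def build_collect_command(root="results/external",
                          outdir="results/external_registry"):
    """Argument list for the result-collection step."""
    return [
        "python", "examples_external/collect_external_results.py",
        "--root", root,
        "--outdir", outdir,
    ]

# Hypothetical script name and data path, used only to show the shape.
run_cmd = build_example_command(
    script="credit_card_fraud.py",
    data="data/creditcard.csv",
    name="credit_card_fraud",
)
collect_cmd = build_collect_command()

print(" ".join(run_cmd))
print(" ".join(collect_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the dataset has been downloaded. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.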
