External and Kaggle gallery
===========================

This page is the single entry point for optional Kaggle and external-data examples. These examples are **not part of tests** because they require manual downloads, dataset-specific licenses, or larger local files.

The goal is not to claim that ``robustcov`` wins everywhere. The goal is to show where robust covariance gives a strong advantage, where it is competitive, where it is mainly diagnostic, and where another method is better.

How to read the cards
---------------------

.. list-table:: Result labels
   :header-rows: 1

   * - Label
     - Meaning
   * - Strong win
     - robustcov clearly improves the most relevant metric against common unsupervised baselines.
   * - Competitive
     - robustcov is close to the best method, or wins one metric but loses another.
   * - Competitive, slow
     - robustcov improves quality but runtime is currently a weakness.
   * - Not best
     - another baseline performs better; the robustcov result is still reported for transparency.
   * - Diagnostic
     - there are no ground-truth labels, but robust distances provide interpretable stress/anomaly rankings.

Recommended result pages
------------------------

Credit-card fraud
   Strong win. FastMCD reaches PR-AUC 0.712 and F1 0.801 on a classic imbalanced fraud dataset.

Predictive maintenance
   Competitive. ``robustcov`` gives the best F1, while IsolationForest has stronger PR-AUC and speed.

Finance market stress
   Diagnostic. RegularizedCauchy ranks unusual cross-asset return days.

Rolling market regimes
   Diagnostic. Window-level features identify abnormal volatility/correlation regimes.
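
The diagnostic cards rank observations by robust distance rather than scoring them against labels. The following is a minimal sketch of that idea, not the ``robustcov`` API: it uses scikit-learn's ``MinCovDet`` as a stand-in estimator, and the data, the injected "stress days", and the helper name ``rank_by_robust_distance`` are all hypothetical.

.. code-block:: python

   import numpy as np
   from sklearn.covariance import MinCovDet


   def rank_by_robust_distance(X, top_k=5, random_state=0):
       """Return the indices of the top_k rows with the largest robust
       squared Mahalanobis distance, plus all squared distances."""
       # Stand-in estimator: scikit-learn's MCD, not a robustcov class.
       mcd = MinCovDet(random_state=random_state).fit(X)
       d2 = mcd.mahalanobis(X)  # squared distances to the robust location
       order = np.argsort(d2)[::-1]  # largest distance first
       return order[:top_k], d2


   # Synthetic "daily returns" for 4 assets; purely illustrative data.
   rng = np.random.default_rng(0)
   returns = rng.normal(scale=0.01, size=(500, 4))

   # Inject three days with a large joint move across all assets.
   stress_days = [50, 200, 350]
   returns[stress_days] += 0.08

   top, d2 = rank_by_robust_distance(returns, top_k=3)
   print("most unusual days:", sorted(top.tolist()))

Because no labels are involved, the output is a ranking to inspect, which is why such results are labelled Diagnostic rather than scored as wins or losses.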

Honest secondary results
------------------------

IEEE-CIS fraud
   Competitive, slow. Best tested unsupervised quality, but runtime is a major weakness.

Medical screening
   Not best. Useful diagnostic result; EllipticEnvelope wins this benchmark.
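
The cards compare methods on PR-AUC and F1. As a rough illustration of what those two numbers measure, here is a minimal pure-Python sketch. It assumes PR-AUC is estimated as average precision and that F1 is taken at a fixed score threshold; the gallery's own evaluation code may differ. The labels, scores, and function names are hypothetical.

.. code-block:: python

   def average_precision(labels, scores):
       """Step-wise area under the precision-recall curve.

       Assumes higher score = more anomalous and no tied scores.
       """
       n_pos = sum(labels)
       if n_pos == 0:
           raise ValueError("need at least one positive label")
       ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
       hits = 0
       total = 0.0
       for rank, (_, label) in enumerate(ranked, start=1):
           if label:
               hits += 1
               total += hits / rank  # precision at this recall step
       return total / n_pos


   def f1_at_threshold(labels, scores, threshold):
       """F1 when every score >= threshold is flagged as an anomaly."""
       pairs = list(zip(labels, scores))
       tp = sum(1 for y, s in pairs if y and s >= threshold)
       fp = sum(1 for y, s in pairs if not y and s >= threshold)
       fn = sum(1 for y, s in pairs if y and s < threshold)
       if tp == 0:
           return 0.0
       precision = tp / (tp + fp)
       recall = tp / (tp + fn)
       return 2 * precision * recall / (precision + recall)


   # Toy data: 1 marks a true anomaly; scores are illustrative only.
   labels = [0, 0, 1, 0, 1, 0, 0, 1]
   scores = [0.1, 0.4, 0.9, 0.2, 0.8, 0.7, 0.3, 0.5]

   ap = average_precision(labels, scores)
   f1 = f1_at_threshold(labels, scores, threshold=0.6)
   print(f"average precision = {ap:.3f}, F1 at 0.6 = {f1:.3f}")

Average precision depends only on the ranking, while F1 also depends on where the threshold is placed. That is one way a method can win on F1 and lose on PR-AUC, as in the predictive maintenance card.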

Current documented external results
-----------------------------------

.. list-table:: External result registry
   :header-rows: 1

   * - Dataset / example
     - Status
     - Main method
     - Headline result
     - Notes
   * - Credit-card fraud
     - Strong win
     - FastMCD
     - PR-AUC 0.712, F1 0.801
     - Large metric gap vs common sklearn anomaly baselines.
   * - Predictive maintenance
     - Competitive
     - Auto(StudentTScatter)
     - F1 0.947 vs IsolationForest 0.944
     - IsolationForest is faster and has better PR-AUC.
   * - IEEE-CIS fraud
     - Competitive, slow
     - RegularizedCauchy
     - PR-AUC 0.093 vs IsolationForest 0.084
     - Best tested unsupervised quality, but much slower.
   * - Medical screening
     - Not best
     - Auto(StudentTScatter)
     - PR-AUC 0.567 vs EllipticEnvelope 0.629
     - Honest negative/diagnostic result.
   * - Finance market stress
     - Diagnostic
     - RegularizedCauchy
     - 23 / 899 days detected
     - Top days cluster around stress-like periods.
   * - Rolling-window finance
     - Diagnostic
     - RegularizedCauchy
     - 5 / 176 windows detected
     - Top windows cluster around September stress regimes.

Why UNSW-NB15 is not highlighted
--------------------------------

The commonly used UNSW-NB15 training split can contain a very high attack fraction. That makes it less like rare-anomaly detection and more like unsupervised or semi-supervised classification. ``robustcov`` may still be useful there as a risk-ranking diagnostic, but it is not a clean headline anomaly benchmark for this package. We therefore do not highlight it in the external gallery.

Run external examples
---------------------

External examples are optional and dataset-dependent. The recommended path is:

.. code-block:: bash

   python examples_external/