External and Kaggle gallery¶

This page is the single entry point for optional Kaggle and external-data examples. These examples are not part of tests because they require manual downloads, dataset-specific licenses, or larger local files.

The goal is not to claim that robustcov wins everywhere. The goal is to show where robust covariance gives a strong advantage, where it is competitive, where it is mainly diagnostic, and where another method is better.

How to read the cards¶

Result labels¶
Label	Meaning
Strong win	robustcov clearly improves the most relevant metric against common unsupervised baselines.
Competitive	robustcov is close to the best method, or wins one metric but loses another.
Competitive, slow	robustcov improves quality but runtime is currently a weakness.
Not best	another baseline performs better; the robustcov result is still reported for transparency.
Diagnostic	there are no ground-truth labels, but robust distances provide interpretable stress/anomaly rankings.

Honest secondary results¶

IEEE-CIS fraud

Competitive, slow. Best tested unsupervised quality, but runtime is a major weakness.

Medical screening

Not best. Useful diagnostic result; EllipticEnvelope wins this benchmark.

Current documented external results¶

External result registry¶
Dataset / example	Status	Main method	Headline result	Notes
Credit-card fraud	Strong win	FastMCD	PR-AUC 0.712, F1 0.801	Large metric gap vs common sklearn anomaly baselines.
Predictive maintenance	Competitive	Auto(StudentTScatter)	F1 0.947 vs IsolationForest 0.944	IsolationForest is faster and has better PR-AUC.
IEEE-CIS fraud	Competitive, slow	RegularizedCauchy	PR-AUC 0.093 vs IsolationForest 0.084	Best tested unsupervised quality, but much slower.
Medical screening	Not best	Auto(StudentTScatter)	PR-AUC 0.567 vs EllipticEnvelope 0.629	Honest negative/diagnostic result.
Finance market stress	Diagnostic	RegularizedCauchy	23 / 899 days detected	Top days cluster around stress-like periods.
Rolling-window finance	Diagnostic	RegularizedCauchy	5 / 176 windows detected	Top windows cluster around September stress regimes.

Why UNSW-NB15 is not highlighted¶

The commonly used UNSW-NB15 training split can contain a very high attack fraction. That makes it less like rare-anomaly detection and more like unsupervised or semi-supervised classification. robustcov may still be useful there as a risk-ranking diagnostic, but it is not a clean headline anomaly benchmark for this package. We therefore do not highlight it in the external gallery.

Run external examples¶

External examples are optional and dataset-dependent. The recommended path is:

python examples_external/<script>.py --data path/to/data.csv --outdir results/external/<name>
python examples_external/collect_external_results.py \
  --root results/external \
  --outdir results/external_registry

The scripts, dataset notes, and notebook templates live under examples_external/. They are intentionally outside the core test suite because Kaggle datasets have separate licenses, download steps, and file sizes.

External and Kaggle gallery¶

How to read the cards¶

Recommended result pages¶

Credit-card fraud

Predictive maintenance

Finance market stress

Rolling market regimes