IEEE-CIS fraud

Status

Best quality among tested unsupervised baselines, but slow

RegularizedCauchy achieved the best F1, ROC-AUC, and PR-AUC among the tested unsupervised baselines, but it took about 1367 s against about 1.2 s for IsolationForest, roughly 1,180 times slower. This should be reported as a quality/interpretability result, not as a speed win.

Why this matters

IEEE-CIS is a large heterogeneous tabular fraud dataset. It contains mixed numeric/categorical behavior, missingness, and fraud signals that are often better handled by supervised gradient boosting. This makes it a good stress case for honest reporting: robust covariance can help, but it is not a magic solution for all tabular fraud problems.

Result summary

IEEE-CIS fraud external benchmark

Method                        F1      PR-AUC  ROC-AUC  Seconds
----------------------------  ------  ------  -------  ---------
robustcov RegularizedCauchy   0.1550  0.0931  0.7641   1367.0149
sklearn IsolationForest       0.1390  0.0838  0.7387      1.1571
sklearn EllipticEnvelope      0.0914  0.0753  0.7578   3045.0699
sklearn LocalOutlierFactor    0.0633  0.0452  0.6539     27.7558

Figure: IEEE-CIS PR-AUC comparison. RegularizedCauchy gives the best quality among these unsupervised baselines, but the margin over IsolationForest is modest (0.0931 vs. 0.0838).

Figure: IEEE-CIS F1 comparison at the same detection budget (each method flags 2561 rows).

Figure: IEEE-CIS runtime comparison on a log scale. The large runtime gap is the main reason this result is classified as competitive/slow rather than a strong win.

Output from the run

IEEE-CIS fraud benchmark
method,seconds,precision,recall,f1,roc_auc,pr_auc,detected
robustcov RegularizedCauchy,1367.0149,0.1550,0.1550,0.1550,0.7641,0.0931,2561
sklearn IsolationForest,1.1571,0.1390,0.1390,0.1390,0.7387,0.0838,2561
sklearn EllipticEnvelope,3045.0699,0.0914,0.0914,0.0914,0.7578,0.0753,2561
sklearn LocalOutlierFactor,27.7558,0.0633,0.0633,0.0633,0.6539,0.0452,2561
saved outputs to results/external/ieee_cis_fraud
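
The identical precision, recall, and F1 values in each row of the output are consistent with a fixed detection budget equal to the number of labeled fraud rows: when the number of flagged rows matches the number of positives, the three metrics coincide. A minimal Python sketch of budget-based scoring follows. It uses toy data rather than IEEE-CIS, and the helper name metrics_at_budget is hypothetical, not part of the benchmark code.

```python
def metrics_at_budget(scores, labels, budget):
    """Flag the `budget` highest-scoring rows and compare them with 0/1 labels.

    Hypothetical helper for illustration; not taken from the benchmark code.
    """
    # Rank row indices from most to least anomalous.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = order[:budget]

    true_positives = sum(labels[i] for i in flagged)
    positives = sum(labels)

    precision = true_positives / budget
    recall = true_positives / positives
    if true_positives == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (illustrative only): 3 fraud rows among 10.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.6, 0.4, 0.5, 0.05]
labels = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]

# Budget equal to the number of positives: precision and recall share
# the same numerator and the same denominator, so F1 matches them too.
precision, recall, f1 = metrics_at_budget(scores, labels, budget=sum(labels))

# A larger budget breaks the equality: precision drops while recall cannot fall.
precision_5, recall_5, f1_5 = metrics_at_budget(scores, labels, budget=5)
```

Because every method is held to the same budget, differences in F1 reflect ranking quality only, not a per-method threshold choice.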

Interpretation

This benchmark is useful but should be framed carefully. RegularizedCauchy improves unsupervised quality metrics, but the dataset is large and heterogeneous and the runtime is not yet competitive with IsolationForest. In practice, this robust anomaly score is most useful as an additional feature for a larger fraud pipeline, or as an interpretable unsupervised diagnostic.

Engineering follow-up

The next improvement for large Kaggle-style tabular data is a sampled-fit/full-score mode, for example fitting the robust scatter on 50k representative rows and then scoring all rows. The expected effect is to preserve much of the robust-distance signal while cutting fit time, since only the scoring pass touches every row.
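
A minimal Python sketch of the sampled-fit/full-score idea follows. The assumptions are deliberate: scikit-learn's MinCovDet stands in for the robust scatter estimator, uniform random sampling stands in for "representative rows", the function name sampled_fit_full_score is hypothetical, and the data are synthetic.

```python
import numpy as np
from sklearn.covariance import MinCovDet


def sampled_fit_full_score(X, sample_size=50_000, seed=0):
    """Fit a robust scatter on a random subsample, then score every row.

    Hypothetical helper; MinCovDet is a stand-in estimator and uniform
    sampling is an assumption about what "representative" means.
    """
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    n_fit = min(sample_size, n_rows)

    # Draw the fitting subsample without replacement.
    fit_idx = rng.choice(n_rows, size=n_fit, replace=False)
    estimator = MinCovDet(random_state=seed).fit(X[fit_idx])

    # Squared Mahalanobis distance of all rows under the sampled fit.
    return estimator.mahalanobis(X)


# Synthetic check: 2000 inliers plus 20 far-away rows at the end.
rng = np.random.default_rng(42)
X_inliers = rng.normal(size=(2000, 5))
X_outliers = rng.normal(loc=6.0, size=(20, 5))
X = np.vstack([X_inliers, X_outliers])

# Fit on 500 sampled rows, score all 2020.
scores = sampled_fit_full_score(X, sample_size=500)
```

The expensive robust fit then depends on the sample size, while the full-data pass is a single distance computation per row. A stratified or time-aware sample may be a better choice than uniform sampling for fraud data.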