Medical screening

Status

Competitive, not best

On this medical tabular screening dataset, robustcov ranks third of the four tested methods on F1, PR-AUC, and ROC-AUC: it trails EllipticEnvelope and IsolationForest and beats LocalOutlierFactor. EllipticEnvelope gives the strongest F1 (0.5996) and PR-AUC (0.6285) in this run. We keep the result because it is a useful negative example: robust covariance is not expected to win every tabular anomaly benchmark.

Why this matters

Medical screening tables often contain mixed risk factors, nonlinear effects, and population-level confounders. Robust covariance can still provide a useful, interpretable anomaly score, but it is not always the best standalone detector for such data.
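To make "robust covariance anomaly score" concrete, here is a minimal sketch of the general idea: fit a robust center and scatter, then score each row by its squared Mahalanobis distance. This page does not show robustcov's own API, so the sketch uses scikit-learn's `MinCovDet` as a stand-in rather than robustcov's StudentTScatter estimator. The data is synthetic and purely illustrative; it is not the benchmark dataset.

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not the benchmark dataset):
# 500 inliers from a correlated Gaussian plus 25 shifted outliers.
inliers = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
outliers = rng.normal(loc=[4.0, -4.0], scale=0.5, size=(25, 2))
X = np.vstack([inliers, outliers])

# Fit a robust location and scatter estimate, then score each row by its
# squared Mahalanobis distance from the robust center.
mcd = MinCovDet(random_state=0).fit(X)
scores = mcd.mahalanobis(X)  # larger means more anomalous

# Flag the 25 highest-scoring rows.
flagged = np.argsort(scores)[-25:]
```

The score is interpretable in the sense that it measures how far a row sits from the bulk of the data, in units set by the estimated scatter. It can only separate cases whose risk shows up as that kind of distance.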

Result summary

Medical screening external benchmark

| Method | F1 | PR-AUC | ROC-AUC | Seconds |
| --- | --- | --- | --- | --- |
| sklearn EllipticEnvelope | 0.5996 | 0.6285 | 0.6356 | 1.5929 |
| sklearn IsolationForest | 0.5811 | 0.5941 | 0.6078 | 0.9813 |
| robustcov Auto(StudentTScatter) | 0.5712 | 0.5674 | 0.5863 | 2.5164 |
| sklearn LocalOutlierFactor | 0.5365 | 0.5391 | 0.5487 | 10.5699 |

Medical screening PR-AUC comparison

PR-AUC comparison. robustcov is competitive but trails EllipticEnvelope and IsolationForest on this dataset.

Medical screening F1 comparison

F1 comparison at a fixed detection budget: every method flags the same number of rows (34979 in this run).
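In the run output, each method reports identical precision, recall, and F1. That is consistent with a detection budget equal to the number of labeled positives, although this page does not state how the budget was chosen, so treat that as an assumption. The sketch below uses a hypothetical helper, `metrics_at_budget`, and toy data to show why the three metrics coincide in that case.

```python
def metrics_at_budget(y_true, scores, budget):
    """Flag the `budget` highest-scoring rows and return (precision, recall, F1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:budget])
    true_pos = sum(1 for i in flagged if y_true[i] == 1)
    positives = sum(y_true)
    precision = true_pos / budget
    recall = true_pos / positives
    f1 = 0.0 if true_pos == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels and scores (illustrative only): 3 positives among 8 rows.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.3, 0.4, 0.05]

# Budget = number of positives, so precision and recall share a denominator.
precision, recall, f1 = metrics_at_budget(y_true, scores, budget=sum(y_true))
```

With the budget equal to the number of positives, precision and recall are the same fraction, and F1, their harmonic mean, equals both. Under that setup the F1 column is a measure of ranking quality at one cutoff rather than of a tuned threshold.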

Medical screening runtime comparison

Runtime comparison on a log scale.

Output from the run

medical screening benchmark
method,seconds,precision,recall,f1,roc_auc,pr_auc,detected
sklearn EllipticEnvelope,1.5929,0.5996,0.5996,0.5996,0.6356,0.6285,34979
sklearn IsolationForest,0.9813,0.5811,0.5811,0.5811,0.6078,0.5941,34979
robustcov Auto(StudentTScatter),2.5164,0.5712,0.5712,0.5712,0.5863,0.5674,34979
sklearn LocalOutlierFactor,10.5699,0.5365,0.5365,0.5365,0.5487,0.5391,34979
saved outputs to results/external/medical_screening

Interpretation

This is a useful trust-building result: robustcov is not promoted as a universal winner. All four detectors score a ROC-AUC between 0.55 and 0.64, so none of the unsupervised scores tested here separates the labels strongly. The dataset likely contains risk structure that is not purely covariance-shaped; tree-based or supervised models may be more appropriate in a production medical screening setting.

Recommendation

Use robustcov here as a diagnostic score or preprocessing feature rather than the sole production detector. If labels are available, evaluate the robust score alongside supervised models.
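One way to evaluate the robust score alongside a supervised model is to append it as an extra feature and compare held-out PR-AUC with and without it. The sketch below shows the wiring only. It uses scikit-learn's `MinCovDet` as a stand-in for robustcov, a logistic regression as an arbitrary supervised model, and synthetic labels that are deliberately constructed to depend on distance from the bulk. It is not evidence about the medical screening dataset.

```python
import numpy as np
from sklearn.covariance import MinCovDet
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption): the label depends on how far a row sits
# from the center, which a linear model on raw features cannot capture.
X = rng.normal(size=(2000, 4))
radius = (X ** 2).sum(axis=1)
y = (radius + rng.normal(scale=1.0, size=2000) > 6.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# Fit the robust estimator on training rows only to avoid leaking test data.
mcd = MinCovDet(random_state=0).fit(X_train)


def with_score(features):
    """Append the squared robust Mahalanobis distance as one extra column."""
    return np.column_stack([features, mcd.mahalanobis(features)])


base = LogisticRegression(max_iter=1000).fit(X_train, y_train)
augmented = LogisticRegression(max_iter=1000).fit(with_score(X_train), y_train)

pr_auc_base = average_precision_score(y_test, base.predict_proba(X_test)[:, 1])
pr_auc_aug = average_precision_score(
    y_test, augmented.predict_proba(with_score(X_test))[:, 1]
)
```

On real screening data the gain from the extra column may be small or absent, which is itself useful to know before putting the score into a production pipeline.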