Medical screening¶
Status¶
Competitive, not best
On this medical tabular screening dataset, robustcov is close to the
tested baselines, but it is not the best method. EllipticEnvelope gives
the strongest F1 and PR-AUC in this run. We keep the result because it is a
useful negative/competitive example: robust covariance is not expected to win
every tabular anomaly benchmark.
Why this matters¶
Medical screening tables often contain mixed risk factors, nonlinear effects, and population-level confounders. Robust covariance can still provide a useful interpretable anomaly score, but this is not always the best standalone detector for such data.
Result summary¶
Method |
F1 |
PR-AUC |
ROC-AUC |
Seconds |
|---|---|---|---|---|
sklearn EllipticEnvelope |
0.5996 |
0.6285 |
0.6356 |
1.5929 |
sklearn IsolationForest |
0.5811 |
0.5941 |
0.6078 |
0.9813 |
robustcov Auto(StudentTScatter) |
0.5712 |
0.5674 |
0.5863 |
2.5164 |
sklearn LocalOutlierFactor |
0.5365 |
0.5391 |
0.5487 |
10.5699 |
PR-AUC comparison. robustcov is competitive but trails
EllipticEnvelope and IsolationForest on this dataset.¶
F1 comparison at a fixed detection budget.¶
Runtime comparison on a log scale.¶
Output from the run¶
medical screening benchmark
method,seconds,precision,recall,f1,roc_auc,pr_auc,detected
sklearn EllipticEnvelope,1.5929,0.5996,0.5996,0.5996,0.6356,0.6285,34979
sklearn IsolationForest,0.9813,0.5811,0.5811,0.5811,0.6078,0.5941,34979
robustcov Auto(StudentTScatter),2.5164,0.5712,0.5712,0.5712,0.5863,0.5674,34979
sklearn LocalOutlierFactor,10.5699,0.5365,0.5365,0.5365,0.5487,0.5391,34979
saved outputs to results/external/medical_screening
Interpretation¶
This is a useful trust-building result. It shows that robustcov is not
promoted as a universal winner. The dataset likely contains risk structure that
is not purely covariance-shaped; tree-based or supervised models may be more
appropriate in a production medical screening setting.
Recommendation¶
Use robustcov here as a diagnostic score or preprocessing feature rather
than the sole production detector. If labels are available, evaluate the robust
score alongside supervised models.