Fraud-style tabular anomaly screening

This example is the small, readable version of the credit-card fraud story. The data mimic ordinary transactions with a small suspicious tail, and robust distances are used as a ranking signal rather than as a black-box classifier.

Result at a glance

FastMCD recovers almost all injected suspicious rows in this synthetic tabular setting: precision and recall are both about 0.986 with 70 flagged rows. The useful point is not just the score; the distance profile gives an interpretable audit trail for why those rows were flagged.
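
As a quick sanity check, both metrics can be recomputed from a boolean flag vector and the injected labels. The sketch below is illustrative only: the array sizes and the helper name `precision_recall` are assumptions, not taken from the example script. It shows one combination (69 of 70 suspicious rows caught, plus one clean row flagged by mistake) that rounds to the reported figures.

```python
import numpy as np


def precision_recall(flagged, is_suspicious):
    """Return (precision, recall) for boolean flag and label arrays."""
    flagged = np.asarray(flagged, dtype=bool)
    is_suspicious = np.asarray(is_suspicious, dtype=bool)
    true_positives = np.sum(flagged & is_suspicious)
    precision = true_positives / max(flagged.sum(), 1)
    recall = true_positives / max(is_suspicious.sum(), 1)
    return float(precision), float(recall)


# Assumed layout: 1000 clean rows followed by 70 suspicious rows.
is_suspicious = np.r_[np.zeros(1000, dtype=bool), np.ones(70, dtype=bool)]

flagged = is_suspicious.copy()
flagged[-1] = False  # one suspicious row is missed
flagged[0] = True    # one clean row is flagged by mistake

precision, recall = precision_recall(flagged, is_suspicious)
print(f"precision={precision:.3f}, recall={recall:.3f}, detected={int(flagged.sum())}")
# prints: precision=0.986, recall=0.986, detected=70
```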

What the data represent

The generator creates transaction-like numerical features with a dominant clean population and a small group of shifted suspicious observations. This matches the regime where global robust covariance is usually appropriate: one main cloud plus separated anomalies.
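
A generator of this kind might look like the following minimal sketch. The function name `make_transactions`, the sizes, the shift, and the covariance construction are all assumptions chosen for illustration; the actual generator in the example script may differ.

```python
import numpy as np


def make_transactions(n_clean=1000, n_suspicious=70, n_features=5,
                      shift=6.0, seed=0):
    """Hypothetical generator: one correlated Gaussian bulk plus a shifted group."""
    rng = np.random.default_rng(seed)

    # Random positive-definite covariance so the bulk is elliptical, not spherical.
    mixing = rng.normal(size=(n_features, n_features))
    cov = mixing @ mixing.T / n_features + np.eye(n_features)

    clean = rng.multivariate_normal(np.zeros(n_features), cov, size=n_clean)
    suspicious = rng.multivariate_normal(
        np.full(n_features, shift), cov, size=n_suspicious
    )

    X = np.vstack([clean, suspicious])
    y = np.r_[np.zeros(n_clean, dtype=int), np.ones(n_suspicious, dtype=int)]

    # Shuffle so suspicious rows are not grouped at the end of the table.
    order = rng.permutation(len(X))
    return X[order], y[order]


X, y = make_transactions()
print(X.shape, int(y.sum()))
```

The labels `y` are used only for scoring afterwards; the estimator itself never sees them.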

Why this estimator

The example uses FastMCD with a robust-distance threshold. FastMCD is a good first choice when anomalies are expected to sit outside a mostly elliptical bulk of ordinary observations.
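
The idea can be sketched without robustcov itself. The block below uses scikit-learn's `MinCovDet` (which implements FastMCD) as a stand-in, and a chi-square quantile as the cutoff. The 0.975 quantile, the data sizes, and the shift are assumptions, so the counts will not match the reported run.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
n_clean, n_suspicious, n_features = 1000, 70, 5  # assumed sizes

# Stand-in data: a standard normal bulk plus a group shifted by 6 in every feature.
clean = rng.normal(size=(n_clean, n_features))
suspicious = rng.normal(loc=6.0, size=(n_suspicious, n_features))
X = np.vstack([clean, suspicious])
is_suspicious = np.r_[np.zeros(n_clean, dtype=bool), np.ones(n_suspicious, dtype=bool)]

# Robust location and covariance, fitted on the most concentrated subset of rows.
mcd = MinCovDet(random_state=0).fit(X)

# MinCovDet.mahalanobis returns squared robust distances.
d2 = mcd.mahalanobis(X)

# Under a Gaussian bulk, squared distances are roughly chi-square with p degrees of freedom.
threshold = chi2.ppf(0.975, df=n_features)
flagged = d2 > threshold

recall = (flagged & is_suspicious).sum() / is_suspicious.sum()
print(f"threshold={threshold:.2f}, flagged={int(flagged.sum())}, recall={recall:.3f}")
```

A stricter quantile flags fewer clean rows at the cost of a shorter review queue; which trade-off is right depends on review capacity.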

Reproduce the result

python examples/use_case_fraud_screening.py

Output from the run

fraud-style tabular anomaly screening
precision=0.986, recall=0.986, detected=70
radial_kurtosis=5.578, support=1098
saved diagnostics to results/use_cases/fraud

Figures and diagnostics

Figure: distance profile (fraud-style tabular anomaly screening)
Figure: distance panel (fraud-style tabular anomaly screening)

How to read the result

Read the profile from left to right: the flat central region is the normal population and the rising tail is the suspicious queue. A sharp tail is a good sign for review workflows because it means the highest-ranked transactions are meaningfully different from the bulk.
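
The same reading can be done numerically. The sketch below assumes the profile is the robust distances sorted in ascending order, and it uses simulated stand-in distances rather than output from the example script; the tail size of 70 and the offset of 150 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in squared distances: a chi-square bulk and a clearly separated tail.
bulk = rng.chisquare(df=5, size=1000)
tail = 150.0 + rng.chisquare(df=5, size=70)
d2 = np.concatenate([bulk, tail])

# The profile is the distances in ascending order (left to right on the plot).
order = np.argsort(d2)
profile = d2[order]

# The review queue is the right-hand end of the profile, largest distance first.
queue = order[::-1][:70]

# A crude sharpness measure: smallest tail distance relative to the bulk median.
gap = profile[-70:].min() / np.median(profile)
print(f"first in queue: row {queue[0]}, tail/bulk ratio: {gap:.1f}")
```

A large ratio corresponds to the sharp tail described above; a ratio near 1 would mean the top of the queue is barely distinguishable from the bulk.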

What this does not prove

In real fraud systems, labels, transaction history, and categorical features matter. Treat robustcov scores as a high-signal unsupervised feature or triage layer, not a complete fraud model.