Fraud-style tabular anomaly screening
=====================================

This example is the small, readable version of the credit-card fraud story.
The data mimic ordinary transactions with a small suspicious tail, and robust
distances are used as a ranking signal rather than as a black-box classifier.

Result at a glance
------------------

FastMCD recovers almost all injected suspicious rows in this synthetic tabular
setting: precision and recall are both about 0.986 with 70 flagged rows. The
useful point is not just the score; the distance profile gives an interpretable
audit trail for why those rows were flagged.

What the data represent
-----------------------

The generator creates transaction-like numerical features with a dominant clean
population and a small group of shifted suspicious observations. This matches
the regime where global robust covariance is usually appropriate: one main
cloud plus separated anomalies.

Why this estimator
------------------

``FastMCD`` with a robust-distance threshold. FastMCD is a good first choice
when anomalies are expected to sit outside a mostly elliptical normal bulk.

Reproduce the result
--------------------

.. code-block:: bash

   python examples/use_case_fraud_screening.py

Output from the run
-------------------

.. literalinclude:: ../_static/gallery/fraud_screening/output.txt
   :language: text

Figures and diagnostics
-----------------------

.. image:: ../_static/gallery/fraud_screening/distance_profile.png
   :alt: Fraud-style tabular anomaly screening: distance profile
   :width: 760px

.. image:: ../_static/gallery/fraud_screening/distance_panel.png
   :alt: Fraud-style tabular anomaly screening: distance panel
   :width: 760px

How to read the result
----------------------

Read the profile from left to right: the flat central region is the normal
population and the rising tail is the suspicious queue. A sharp tail is a good
sign for review workflows because it means the highest-ranked transactions are
meaningfully different from the bulk.

What this does not prove
------------------------

In real fraud systems, labels, transaction history, and categorical features
matter. Treat robustcov scores as a high-signal unsupervised feature or triage
layer, not a complete fraud model.
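
To make the triage-layer idea concrete, the following is a minimal NumPy-only
sketch of ranking rows by a robust distance and sending the top of the ranking
to review. It is not the robustcov API and not the script behind the numbers
above: ``robust_distances`` is a hypothetical helper that runs plain
concentration steps (the core idea behind FastMCD) without the consistency
correction or reweighting a full implementation would apply, so its output is
suitable for ranking only, not for a chi-square cutoff. The data, the shift,
the sample sizes, and the queue length of 50 are all invented for illustration.

.. code-block:: python

   import numpy as np


   def _fit(X, subset):
       """Mean/covariance of the subset, plus squared distances of all rows."""
       mean = X[subset].mean(axis=0)
       # A tiny ridge keeps the covariance invertible for very small subsets.
       cov = np.cov(X[subset], rowvar=False) + 1e-9 * np.eye(X.shape[1])
       diff = X - mean
       d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
       return cov, d2


   def robust_distances(X, support_fraction=0.75, n_starts=20, n_steps=10, seed=0):
       """Squared robust distances from a simplified MCD-style fit (sketch)."""
       rng = np.random.default_rng(seed)
       n, p = X.shape
       h = int(support_fraction * n)
       best_det, best_d2 = np.inf, None
       for _ in range(n_starts):
           # Start from a small random subset, then repeatedly refit on the
           # h rows closest to the current location/scatter estimate.
           subset = rng.choice(n, size=p + 1, replace=False)
           for _ in range(n_steps):
               _, d2 = _fit(X, subset)
               subset = np.argsort(d2)[:h]
           cov, d2 = _fit(X, subset)
           det = np.linalg.det(cov)
           # Keep the start whose h-subset has the smallest covariance determinant.
           if det < best_det:
               best_det, best_d2 = det, d2
       return best_d2


   # Invented data: one correlated clean cloud plus a small shifted group.
   rng = np.random.default_rng(42)
   cov_clean = 0.5 * np.eye(4) + 0.5
   clean = rng.multivariate_normal(np.zeros(4), cov_clean, size=950)
   suspicious = rng.normal(loc=[5.0, -5.0, 5.0, -5.0], scale=1.0, size=(50, 4))
   X = np.vstack([clean, suspicious])
   is_suspicious = np.r_[np.zeros(950, dtype=bool), np.ones(50, dtype=bool)]

   # Rank every row by robust distance; the top of the ranking is the review queue.
   d2 = robust_distances(X)
   review_queue = np.argsort(d2)[::-1][:50]
   hit_rate = is_suspicious[review_queue].mean()
   print(f"share of injected rows in the top-50 queue: {hit_rate:.3f}")

In a fuller pipeline the same ``d2`` column could be passed on as one feature
among many, alongside labels, transaction history, and categorical features,
rather than being used as the final decision.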