Credit-card fraud result ======================== Why this result matters ----------------------- Credit-card fraud is a popular imbalanced anomaly-detection benchmark. It is a useful public example because users already understand the task, and because PR AUC and F1 are more informative than accuracy on rare fraud events. Observed result --------------- A local external run reported the following table. .. list-table:: Credit-card fraud external result :header-rows: 1 * - Method - Seconds - Precision - Recall - F1 - ROC AUC - PR AUC * - robustcov FastMCD - 57.202 - 0.801 - 0.801 - 0.801 - 0.957 - 0.712 * - sklearn IsolationForest - 3.392 - 0.262 - 0.262 - 0.262 - 0.948 - 0.143 * - sklearn EllipticEnvelope - 12.518 - 0.213 - 0.213 - 0.213 - 0.920 - 0.125 * - sklearn LocalOutlierFactor - 35.981 - 0.000 - 0.000 - 0.000 - 0.513 - 0.002 Plots ----- .. figure:: ../_static/external_results/credit_card_fraud/pr_auc.png :alt: Credit-card fraud PR-AUC comparison :width: 95% PR-AUC comparison. This metric is important for rare fraud because it focuses on precision/recall behavior under severe class imbalance. .. figure:: ../_static/external_results/credit_card_fraud/f1.png :alt: Credit-card fraud F1 comparison :width: 95% Thresholded F1 comparison at the same detected-count level. Output from the run ------------------- .. literalinclude:: ../_static/external_results/credit_card_fraud/output.txt :language: text Interpretation -------------- ``robustcov FastMCD`` was slower than ``IsolationForest`` in this run, but it produced a much stronger thresholded fraud-screening result and a much higher PR AUC. This is a good Kaggle/notebook story because the robust-distance score is interpretable and the metric gap is large. How to reproduce ---------------- Download the credit-card fraud CSV manually, then run: .. code-block:: bash python examples_external/kaggle_credit_card_fraud.py \ --data /path/to/creditcard.csv \ --outdir results/external/credit_card_fraud Outputs ------- The script writes: * ``metrics.csv``; * ``pr_auc.png``; * ``f1.png``; * ``robust_score_profile.png``; * ``summary.md``. Production note --------------- This should be presented as an unsupervised screening score, not as a full competition-winning fraud pipeline. In supervised Kaggle settings, the robust score can also be used as a feature for gradient boosting or other classifiers.