Kaggle and external examples
============================

Kaggle notebooks are a practical way to make robust covariance useful to a
wider ML audience. Users usually arrive with a problem such as fraud detection,
predictive maintenance, finance stress detection, or medical screening; they do
not usually start by searching for Tyler or Cauchy scatter estimators.

The examples in ``examples_external/`` are therefore designed as optional,
publishable templates. They are **not part of tests** and are not required for
the core docs build because they need external downloads and dataset-specific
licenses.

How to use these examples
-------------------------

1. Download the dataset manually from Kaggle or the dataset provider.
2. Run the matching script with ``--data /path/to/file.csv``.
3. Inspect the generated ``metrics.csv``, plots, and ``summary.md``.
4. Turn the result into a Kaggle notebook if it is competitive or interpretable.

.. code-block:: bash

   python examples_external/kaggle_credit_card_fraud.py \
       --data /path/to/creditcard.csv \
       --outdir results/external/credit_card_fraud

Each script writes:

* ``metrics.csv`` with precision, recall, F1, ROC AUC, PR AUC, and runtime;
* one or more metric plots;
* a robust-score profile plot;
* ``summary.md`` for easy notebook/report copying.

Recommended external targets
----------------------------

Credit-card fraud detection
---------------------------

**Script:** ``examples_external/kaggle_credit_card_fraud.py``

**Expected data:** a CSV such as Kaggle's credit-card fraud dataset with a
binary ``Class`` column.

**Why it fits robustcov:** fraud is rare, the feature distribution is
non-Gaussian, and robust distances provide an interpretable score for ranking
suspicious transactions.

**Recommended metric:** PR AUC first, then F1 at the chosen contamination
threshold. ROC AUC can look deceptively high on very imbalanced fraud data.

.. code-block:: bash

   python examples_external/kaggle_credit_card_fraud.py \
       --data /path/to/creditcard.csv

IEEE-CIS transaction fraud
--------------------------

**Script:** ``examples_external/kaggle_ieee_cis_fraud.py``

**Expected data:** ``train_transaction.csv`` with ``isFraud`` and
``TransactionID`` columns.

**Why it fits robustcov:** robustcov is unlikely to replace a full supervised
gradient-boosting competition pipeline, but it can provide fast unsupervised
transaction scores, preprocessing filters, and interpretable anomaly profiles.

.. code-block:: bash

   python examples_external/kaggle_ieee_cis_fraud.py \
       --data /path/to/train_transaction.csv \
       --max-rows 100000

Predictive maintenance
----------------------

**Script:** ``examples_external/kaggle_predictive_maintenance.py``

**Expected data:** a sensor or equipment table with a failure indicator such as
``Machine failure``, ``failure``, or ``target``.

**Why it fits robustcov:** faults often manifest as combinations of unusual
sensor readings rather than one extreme variable. Robust covariance gives an
interpretable multivariate score.

.. code-block:: bash

   python examples_external/kaggle_predictive_maintenance.py \
       --data /path/to/predictive_maintenance.csv

Medical tabular screening
-------------------------

**Script:** ``examples_external/kaggle_medical_screening.py``

**Expected data:** a diagnostic feature table with a label such as
``diagnosis``, ``target``, ``Class``, or ``outcome``.

**Why it fits robustcov:** robust scores can support exploratory screening and
data-quality analysis. They should not be interpreted as clinical decisions.

.. code-block:: bash

   python examples_external/kaggle_medical_screening.py \
       --data /path/to/medical.csv \
       --label-column diagnosis \
       --positive-label malignant

Finance market stress and rolling regimes
-----------------------------------------

**Scripts:** ``examples_external/finance_market_stress.py`` and
``examples_external/finance_rolling_window_anomaly.py``

**Expected data:** price or return CSV with a date column and one column per
asset. This can come from Kaggle, Yahoo Finance exports, Bloomberg, Quandl, or
another provider.

.. code-block:: bash

   python examples_external/finance_market_stress.py \
       --prices /path/to/prices.csv \
       --outdir results/external/finance_market_stress

   python examples_external/finance_rolling_window_anomaly.py \
       --prices /path/to/prices.csv \
       --window 20 \
       --step 5 \
       --outdir results/external/finance_rolling_window

Collecting external results
---------------------------

After running several external examples, collect their ``metrics.csv`` files
into a compact registry:

.. code-block:: bash

   python examples_external/collect_external_results.py \
       --root results/external \
       --outdir results/external_registry

This writes ``external_results.csv``, ``external_results.md``, and
``external_results.html``.

Notebook template
-----------------

A copyable Kaggle notebook template is included at:

.. code-block:: text

   examples_external/notebooks/robustcov_kaggle_template.ipynb

Use the scripts when you want reproducible local runs. Use the notebook
template when publishing a short Kaggle walkthrough.

What makes a good Kaggle notebook?
----------------------------------

A useful robustcov notebook should be short and evidence-driven:

* load data and define the target;
* compare robustcov with IsolationForest, LOF, and other familiar baselines;
* report PR AUC, F1, ROC AUC, and runtime;
* show score distributions or robust-distance profiles;
* explain where robustcov is competitive and where supervised models are still
  better.
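
The comparison and reporting steps in the checklist above can be sketched
without any dependencies. The block below is a minimal illustration, not the
code used by the ``examples_external/`` scripts and not part of the robustcov
API: ``average_precision``, ``roc_auc``, ``f1_at_contamination``, and
``compare`` are hypothetical helper names, each detector is assumed to be a
callable that returns one anomaly score per row (higher means more anomalous),
and the toy scores are made up. In a real notebook the scikit-learn functions
``average_precision_score`` and ``roc_auc_score`` would normally replace the
hand-written metrics.

.. code-block:: python

   import time
   from typing import Callable, Dict, List, Sequence


   def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
       """PR AUC as average precision; ties are broken by input order."""
       order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
       hits, total = 0, 0.0
       for rank, i in enumerate(order, start=1):
           if labels[i] == 1:
               hits += 1
               total += hits / rank  # precision at this positive's rank
       return total / hits if hits else 0.0


   def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
       """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

       Quadratic in the number of rows, so only suitable for small samples.
       """
       pos = [s for y, s in zip(labels, scores) if y == 1]
       neg = [s for y, s in zip(labels, scores) if y == 0]
       if not pos or not neg:
           raise ValueError("need at least one positive and one negative label")
       wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
       return wins / (len(pos) * len(neg))


   def f1_at_contamination(
       labels: Sequence[int], scores: Sequence[float], contamination: float
   ) -> float:
       """F1 when the top ``contamination`` fraction of scores is flagged."""
       k = max(1, round(contamination * len(scores)))
       order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
       tp = sum(1 for i in order[:k] if labels[i] == 1)
       fp = k - tp
       fn = sum(labels) - tp
       return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


   def compare(
       labels: Sequence[int],
       detectors: Dict[str, Callable[[], Sequence[float]]],
       contamination: float,
   ) -> List[dict]:
       """Time each detector and collect one metrics row per detector."""
       rows = []
       for name, score_fn in detectors.items():
           start = time.perf_counter()
           scores = score_fn()
           runtime = time.perf_counter() - start
           rows.append({
               "detector": name,
               "pr_auc": average_precision(labels, scores),
               "f1": f1_at_contamination(labels, scores, contamination),
               "roc_auc": roc_auc(labels, scores),
               "runtime_s": runtime,
           })
       return rows


   # Toy data: 10 rows, 3 anomalies. The score lists are invented for illustration.
   labels = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
   detectors = {
       "robust_distance": lambda: [0.1, 0.2, 0.9, 0.3, 0.8, 0.15, 0.05, 0.4, 0.25, 0.7],
       "baseline": lambda: [0.9, 0.2, 0.8, 0.3, 0.1, 0.15, 0.05, 0.4, 0.25, 0.7],
   }
   rows = compare(labels, detectors, contamination=0.3)
   for row in rows:
       print(row)

Listing PR AUC first in each row follows the recommendation in the credit-card
section: on heavily imbalanced data it is usually the more informative column,
while ROC AUC can stay high even when few flagged rows are true positives.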