Kaggle and external examples
Kaggle notebooks are a practical way to make robust covariance useful to a wider ML audience. Users usually arrive with a problem such as fraud detection, predictive maintenance, finance stress detection, or medical screening; they do not usually start by searching for Tyler or Cauchy scatter estimators.
The examples in examples_external/ are therefore designed as optional, publishable templates. They are not part of the test suite and are not required for the core docs build, because they need external downloads and are subject to dataset-specific licenses.
How to use these examples
Download the dataset manually from Kaggle or the dataset provider.
Run the matching script with --data /path/to/file.csv.
Inspect the generated metrics.csv, plots, and summary.md.
Turn the result into a Kaggle notebook if it is competitive or interpretable.
python examples_external/kaggle_credit_card_fraud.py \
--data /path/to/creditcard.csv \
--outdir results/external/credit_card_fraud
Each script writes:
metrics.csv with precision, recall, F1, ROC AUC, PR AUC, and runtime;
one or more metric plots;
a robust-score profile plot;
summary.md for easy notebook/report copying.
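As a rough illustration of how such a metrics table could be assembled, here is a minimal sketch on synthetic data. It is not the code used by the example scripts: scikit-learn's MinCovDet stands in for the robustcov estimator, the column names and the temporary output directory are assumptions, and average precision is used as the PR AUC estimate.

```python
import tempfile
import time
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.covariance import MinCovDet
from sklearn.metrics import (
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Synthetic stand-in data: 2,000 inliers plus 40 shifted anomalies (label 1).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(2000, 5)),
    rng.normal(4.0, 1.0, size=(40, 5)),
])
y = np.r_[np.zeros(2000, dtype=int), np.ones(40, dtype=int)]

# MinCovDet is a scikit-learn stand-in, NOT the robustcov estimator.
start = time.perf_counter()
scores = MinCovDet(random_state=0).fit(X).mahalanobis(X)  # squared distances
runtime_s = time.perf_counter() - start

# Flag the top `contamination` fraction of scores as anomalies.
contamination = y.mean()
threshold = np.quantile(scores, 1.0 - contamination)
y_pred = (scores > threshold).astype(int)

# Column names are assumptions; the real metrics.csv layout may differ.
metrics = pd.DataFrame([{
    "method": "MinCovDet (stand-in)",
    "precision": precision_score(y, y_pred, zero_division=0),
    "recall": recall_score(y, y_pred, zero_division=0),
    "f1": f1_score(y, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y, scores),
    "pr_auc": average_precision_score(y, scores),
    "runtime_s": runtime_s,
}])

# Written to a temporary directory here; the scripts use --outdir instead.
outdir = Path(tempfile.mkdtemp())
metrics.to_csv(outdir / "metrics.csv", index=False)
```

Threshold-based metrics (precision, recall, F1) depend on the chosen contamination level, while ROC AUC and PR AUC are computed from the raw scores.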
Recommended external targets
Credit-card fraud (PR-AUC): classic imbalanced anomaly detection; a good first Kaggle notebook target.
IEEE-CIS fraud (tabular): use robustcov as a transaction screening score and as a feature for supervised models.
Predictive maintenance (faults): equipment faults often appear as multivariate deviations from normal operation.
Medical tabular screening (screening): robust scores are interpretable patient-level screening features, not clinical decisions.
Credit-card fraud detection
Script: examples_external/kaggle_credit_card_fraud.py
Expected data: a CSV such as Kaggle’s credit-card fraud dataset with a binary Class column.
Why it fits robustcov: fraud is rare, the feature distribution is non-Gaussian, and robust distances provide an interpretable score for ranking suspicious transactions.
Recommended metric: PR AUC first, then F1 at the chosen contamination threshold. ROC AUC can look deceptively high on very imbalanced fraud data.
python examples_external/kaggle_credit_card_fraud.py \
--data /path/to/creditcard.csv
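The metric recommendation above can be illustrated with a small synthetic experiment. This is a sketch under stated assumptions, not output from the fraud script: the scores are simulated Gaussian draws, the prevalence of about 0.2% is chosen for illustration, and average precision is used as the PR AUC estimate.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

rng = np.random.default_rng(0)

# Simulated anomaly scores: 50,000 negatives and 100 positives (~0.2%).
n_neg, n_pos = 50_000, 100
y = np.r_[np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)]
scores = np.r_[rng.normal(0.0, 1.0, n_neg), rng.normal(2.0, 1.0, n_pos)]

# ROC AUC only measures ranking quality between the two classes.
roc_auc = roc_auc_score(y, scores)

# Average precision (a PR AUC estimate) also reflects how many negatives
# crowd the top of the ranking, which matters when positives are rare.
pr_auc = average_precision_score(y, scores)

# F1 at a contamination threshold: flag the top 0.2% of scores.
contamination = 0.002
threshold = np.quantile(scores, 1.0 - contamination)
y_pred = (scores > threshold).astype(int)
f1 = f1_score(y, y_pred, zero_division=0)

print(f"ROC AUC={roc_auc:.3f}  PR AUC={pr_auc:.3f}  F1={f1:.3f}")
```

With the same score separation, the ROC AUC stays high while the PR AUC is much lower, because even a small false-positive rate produces many more false alarms than true positives at this prevalence.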
IEEE-CIS transaction fraud
Script: examples_external/kaggle_ieee_cis_fraud.py
Expected data: train_transaction.csv with isFraud and TransactionID columns.
Why it fits robustcov: robustcov is unlikely to replace a full supervised gradient-boosting competition pipeline, but it can provide fast unsupervised transaction scores, preprocessing filters, and interpretable anomaly profiles.
python examples_external/kaggle_ieee_cis_fraud.py \
--data /path/to/train_transaction.csv \
--max-rows 100000
Predictive maintenance
Script: examples_external/kaggle_predictive_maintenance.py
Expected data: a sensor or equipment table with a failure indicator such as Machine failure, failure, or target.
Why it fits robustcov: faults often manifest as combinations of unusual sensor readings rather than one extreme variable. Robust covariance gives an interpretable multivariate score.
python examples_external/kaggle_predictive_maintenance.py \
--data /path/to/predictive_maintenance.csv
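The point about combinations of readings can be made concrete with two correlated sensors. The sketch below is an assumption-laden illustration on synthetic data, with scikit-learn's MinCovDet standing in for a robustcov estimator: two candidate readings have the same per-sensor magnitude, but only one of them contradicts the usual correlation.

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)

# Synthetic "normal operation": two sensors with correlation 0.95.
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
normal_ops = rng.multivariate_normal([0.0, 0.0], cov, size=1000)

# MinCovDet is a stand-in robust estimator, not the robustcov API.
mcd = MinCovDet(random_state=0).fit(normal_ops)

candidates = np.array([
    [1.5, 1.5],   # both sensors high together, consistent with the correlation
    [1.5, -1.5],  # same per-sensor magnitude, but the sensors disagree
])

# Per-sensor z-scores: neither reading looks extreme on its own.
marginal_z = np.abs(candidates - mcd.location_) / np.sqrt(np.diag(mcd.covariance_))

# Squared robust Mahalanobis distances: the second reading stands out.
robust_d2 = mcd.mahalanobis(candidates)

print("per-sensor |z|:\n", marginal_z.round(2))
print("squared robust distance:", robust_d2.round(1))
```

A univariate threshold on either sensor would pass both readings, while the multivariate distance separates them clearly.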
Medical tabular screening
Script: examples_external/kaggle_medical_screening.py
Expected data: a diagnostic feature table with a label such as diagnosis, target, Class, or outcome.
Why it fits robustcov: robust scores can support exploratory screening and data-quality analysis. They should not be interpreted as clinical decisions.
python examples_external/kaggle_medical_screening.py \
--data /path/to/medical.csv \
--label-column diagnosis \
--positive-label malignant
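The --label-column and --positive-label options suggest that a text label is mapped to a binary target. A minimal sketch of such a mapping follows; resolve_binary_label is a hypothetical helper written for this page, not a function from the scripts, and the candidate list simply reuses the column names mentioned in these sections.

```python
import pandas as pd

# Candidate names taken from the label examples mentioned on this page.
DEFAULT_LABEL_CANDIDATES = (
    "Class", "isFraud", "Machine failure", "failure",
    "target", "diagnosis", "outcome",
)


def resolve_binary_label(df, label_column=None, positive_label=None):
    """Hypothetical helper: return a 0/1 Series for the anomaly label."""
    if label_column is None:
        # Fall back to the first known candidate present in the table.
        matches = [c for c in DEFAULT_LABEL_CANDIDATES if c in df.columns]
        if not matches:
            raise ValueError("no known label column found; pass label_column")
        label_column = matches[0]
    column = df[label_column]
    if positive_label is None:
        # Assume the column is already coded as 0/1.
        y = column.astype(int)
    else:
        # Compare as strings so "1" and 1 are treated the same way.
        y = (column.astype(str) == str(positive_label)).astype(int)
    return y.rename("label")


demo = pd.DataFrame({
    "radius": [11.2, 20.1, 12.5],
    "diagnosis": ["benign", "malignant", "benign"],
})
y = resolve_binary_label(demo, label_column="diagnosis", positive_label="malignant")
```

Keeping the positive label explicit avoids silently treating the wrong class as the anomaly when a dataset uses text labels.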
Finance market stress and rolling regimes
Scripts: examples_external/finance_market_stress.py and
examples_external/finance_rolling_window_anomaly.py
Expected data: a price or return CSV with a date column and one column per asset. This can come from Kaggle, Yahoo Finance exports, Bloomberg, Quandl, or another provider.
python examples_external/finance_market_stress.py \
--prices /path/to/prices.csv \
--outdir results/external/finance_market_stress
python examples_external/finance_rolling_window_anomaly.py \
--prices /path/to/prices.csv \
--window 20 \
--step 5 \
--outdir results/external/finance_rolling_window
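The --window and --step options point to a rolling scheme. The sketch below shows one plausible reading of it, and it is only an assumption about what the script does: each window of returns is used to fit a robust estimator, and the observation right after the window is scored against it. MinCovDet from scikit-learn stands in for the robustcov estimator, and the prices are simulated.

```python
import numpy as np
import pandas as pd
from sklearn.covariance import MinCovDet


def rolling_robust_scores(returns, window=20, step=5):
    """Score the row after each window against that window's robust fit."""
    values = returns.to_numpy()
    rows = []
    for start in range(0, len(values) - window, step):
        history = values[start:start + window]
        next_row = values[start + window].reshape(1, -1)
        # Stand-in robust estimator fitted on the window only.
        mcd = MinCovDet(random_state=0).fit(history)
        rows.append({
            "date": returns.index[start + window],
            "score": mcd.mahalanobis(next_row)[0],  # squared distance
        })
    return pd.DataFrame(rows).set_index("date")


# Simulated prices for three hypothetical assets on business days.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=101)
log_steps = rng.normal(0.0, 0.01, size=(101, 3))
prices = pd.DataFrame(
    100.0 * np.exp(np.cumsum(log_steps, axis=0)),
    index=dates,
    columns=["A", "B", "C"],
)

# Log returns drop the first row, leaving 100 observations.
returns = np.log(prices).diff().dropna()
scores = rolling_robust_scores(returns, window=20, step=5)
```

Scoring only out-of-window observations keeps a stressed day from influencing the estimate it is compared against.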
Collecting external results
After running several external examples, collect their metrics.csv files into
a compact registry:
python examples_external/collect_external_results.py \
--root results/external \
--outdir results/external_registry
This writes external_results.csv, external_results.md, and
external_results.html.
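For readers who want to see what such a collection step might look like, here is a minimal sketch. It is not the code of collect_external_results.py: collect_metrics is a hypothetical function, the "example" column is an assumption, and the metric values in the demo are placeholders rather than real results. The Markdown output is left out because DataFrame.to_markdown needs the optional tabulate package.

```python
import tempfile
from pathlib import Path

import pandas as pd


def collect_metrics(root, outdir):
    """Hypothetical collector: merge every metrics.csv found under root."""
    root, outdir = Path(root), Path(outdir)
    frames = []
    for path in sorted(root.rglob("metrics.csv")):
        frame = pd.read_csv(path)
        # Record which example directory each row came from.
        frame.insert(0, "example", path.parent.name)
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no metrics.csv found under {root}")
    registry = pd.concat(frames, ignore_index=True)
    outdir.mkdir(parents=True, exist_ok=True)
    registry.to_csv(outdir / "external_results.csv", index=False)
    registry.to_html(outdir / "external_results.html", index=False)
    return registry


# Demo with placeholder values (not real results) in a temporary tree.
tmp = Path(tempfile.mkdtemp())
for name, placeholder in [("credit_card_fraud", 0.0), ("finance_market_stress", 0.0)]:
    run_dir = tmp / "external" / name
    run_dir.mkdir(parents=True)
    pd.DataFrame([{"method": "demo", "pr_auc": placeholder}]).to_csv(
        run_dir / "metrics.csv", index=False
    )

registry = collect_metrics(tmp / "external", tmp / "external_registry")
```

Sorting the discovered paths keeps the registry order stable between runs.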
Notebook template
A copyable Kaggle notebook template is included at:
examples_external/notebooks/robustcov_kaggle_template.ipynb
Use the scripts when you want reproducible local runs. Use the notebook template when publishing a short Kaggle walkthrough.
What makes a good Kaggle notebook?
A useful robustcov notebook should be short and evidence-driven:
load data and define the target;
compare robustcov with IsolationForest, LOF, and other familiar baselines;
report PR AUC, F1, ROC AUC, and runtime;
show score distributions or robust-distance profiles;
explain where robustcov is competitive and where supervised models are still better.
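The checklist above can be prototyped as a small benchmark loop. The sketch below is illustrative only: the data is synthetic with scattered outliers, MinCovDet from scikit-learn is a stand-in for the robustcov scorer (which a real notebook would plug in instead), and average precision is used as the PR AUC estimate.

```python
import time

import numpy as np
import pandas as pd
from sklearn.covariance import MinCovDet
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: 2,000 Gaussian inliers and 40 scattered outliers (label 1).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(2000, 5)),
    rng.uniform(-6.0, 6.0, size=(40, 5)),
])
y = np.r_[np.zeros(2000, dtype=int), np.ones(40, dtype=int)]


def score_robust_distance(X):
    # Stand-in for a robustcov score: squared robust Mahalanobis distance.
    return MinCovDet(random_state=0).fit(X).mahalanobis(X)


def score_isolation_forest(X):
    # score_samples is lower for anomalies, so negate it.
    return -IsolationForest(random_state=0).fit(X).score_samples(X)


def score_lof(X):
    # negative_outlier_factor_ is lower for anomalies, so negate it.
    return -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_


scorers = {
    "robust distance (MinCovDet stand-in)": score_robust_distance,
    "IsolationForest": score_isolation_forest,
    "LOF": score_lof,
}

rows = []
for name, scorer in scorers.items():
    start = time.perf_counter()
    scores = scorer(X)  # higher score means more anomalous
    rows.append({
        "method": name,
        "pr_auc": average_precision_score(y, scores),
        "roc_auc": roc_auc_score(y, scores),
        "runtime_s": time.perf_counter() - start,
    })

results = pd.DataFrame(rows).sort_values("pr_auc", ascending=False)
print(results.to_string(index=False))
```

Orienting every score so that higher means more anomalous lets the same metric code be reused across methods.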