Small-sample heavy-tail benchmark

Question

What should a user do when the sample size is small, the dimension is not tiny, and the data are heavy-tailed? This is the regime where empirical covariance, Ledoit-Wolf, OAS, and classical MCD can become unstable or misleading.

Design

The benchmark simulates elliptical Student-t data over a grid of sample sizes, feature dimensions, and degrees of freedom. Smaller degrees of freedom mean heavier tails. For each setting, each estimator is compared to the known population scatter using relative Frobenius error.
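
The design above can be sketched in a few lines of NumPy. This is a minimal illustration and not the benchmark script itself: the function names `sample_student_t` and `relative_frobenius_error`, the identity scatter matrix, and the values of `n`, `p`, and `df` are all assumptions chosen for the example. Whether the real benchmark rescales estimates before comparing them to the population scatter is not stated here, so the sketch compares them directly.

```python
import numpy as np


def sample_student_t(n, scatter, df, rng):
    """Draw n zero-mean elliptical Student-t rows with the given scatter matrix."""
    p = scatter.shape[0]
    z = rng.multivariate_normal(np.zeros(p), scatter, size=n)  # Gaussian part
    g = rng.chisquare(df, size=n) / df  # chi-square mixing variable, one per row
    return z / np.sqrt(g)[:, None]  # small g inflates the row: heavy tails


def relative_frobenius_error(estimate, truth):
    """Frobenius norm of the difference, relative to the norm of the truth."""
    return np.linalg.norm(estimate - truth, "fro") / np.linalg.norm(truth, "fro")


rng = np.random.default_rng(0)
n, p, df = 30, 10, 3  # hypothetical grid point: small n, moderate p, heavy tails
scatter = np.eye(p)  # known population scatter (assumed identity here)

X = sample_student_t(n, scatter, df, rng)
empirical = np.cov(X, rowvar=False)  # baseline estimator for illustration
err = relative_frobenius_error(empirical, scatter)
print(f"relative Frobenius error of the empirical covariance: {err:.3f}")
```

Repeating this over every combination of sample size, dimension, and degrees of freedom, and for every estimator, gives one error per method per setting.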

The main output is not a single timing number. It is the ranking across the whole grid: win rate, mean rank, median error, and median runtime.
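
As a rough sketch of how such a grid-level summary can be computed, the example below ranks methods within each setting and then aggregates. The method names and error values are toy numbers invented for illustration, and the tie-breaking rule (column order) is an assumption. The summary script may treat ties differently.

```python
import numpy as np

# errors[i, j] = error of method j on grid setting i (toy values, not benchmark output)
methods = ["A", "B", "C"]
errors = np.array([
    [0.5, 0.7, 2.0],
    [0.6, 0.4, 3.0],
    [0.3, 0.9, 1.5],
    [0.8, 0.9, 40.0],  # method C blows up on one setting
])

# Rank 1 = smallest error within a setting; argsort of argsort yields ordinal ranks.
order = errors.argsort(axis=1, kind="stable")
ranks = order.argsort(axis=1, kind="stable") + 1

win_rate = (ranks == 1).mean(axis=0)  # share of settings where the method is best
mean_rank = ranks.mean(axis=0)
median_error = np.median(errors, axis=0)
mean_error = errors.mean(axis=0)

for j, name in enumerate(methods):
    print(name, win_rate[j], mean_rank[j], median_error[j], mean_error[j])
```

In the toy data a single bad setting pulls the mean error of method C far above its median error while leaving its rank unchanged, which is one reason to read median error and mean rank together with mean error.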

Summary table

Small-sample heavy-tail summary

| method | appearances | failures | win_rate | mean_rank | median_error | mean_error | median_seconds |
|---|---|---|---|---|---|---|---|
| robustcov Cauchy | 27 | 0 | 0.7407 | 1.4074 | 0.5994 | 0.7625 | 0.013307 |
| robustcov StudentT(df=3) | 27 | 0 | 0.0000 | 3.1852 | 0.6675 | 0.8793 | 0.015101 |
| robustcov HellTyler(exp) | 27 | 0 | 0.1111 | 3.2963 | 0.8503 | 0.9889 | 0.042900 |
| robustcov RegTyler | 27 | 0 | 0.0370 | 3.5185 | 0.8021 | 12.0413 | 0.005107 |
| robustcov KLTyler | 27 | 0 | 0.0370 | 4.5185 | 0.8021 | 12.0413 | 0.005073 |
| sklearn MinCovDet | 27 | 0 | 0.1111 | 6.2963 | 2.1739 | 15.6922 | 0.024392 |
| sklearn LedoitWolf | 27 | 0 | 0.0000 | 6.5556 | 2.5696 | 262.0354 | 0.000420 |
| sklearn OAS | 27 | 0 | 0.0000 | 7.5926 | 5.3643 | 1629.2743 | 0.000325 |
| sklearn Empirical | 27 | 0 | 0.0000 | 8.6296 | 6.9285 | 1681.1440 | 0.000449 |

Ranking plot

[Figure: Small-sample heavy-tail mean-rank plot]

Interpretation

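The main result is that RegularizedCauchy (listed as robustcov Cauchy in the table) is the strongest default in this grid. It has the highest win rate (0.7407), the lowest mean rank (1.4074), and the lowest median error (0.5994). StudentTScatter (robustcov StudentT(df=3)) never wins a setting outright, but it has the second-best mean rank (3.1852) and median error (0.6675), so it is a smoother alternative when the user wants less aggressive radial downweighting than the Cauchy-style weights apply.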
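
To make "less aggressive radial downweighting" concrete, the sketch below uses the textbook Student-t maximum-likelihood weight, w(d²) = (df + p) / (df + d²), where d² is the squared Mahalanobis distance and p is the dimension. The Cauchy case is df = 1. Treating this as the weight used inside the package's estimators is an assumption: the regularized estimators may add shrinkage or use a different normalization. The function name `t_weight` and the chosen distances are illustrative only.

```python
import numpy as np


def t_weight(d2, df, p):
    """Textbook Student-t M-estimator weight for squared Mahalanobis distance d2."""
    return (df + p) / (df + d2)


p = 10
d2 = np.array([1.0, 10.0, 100.0])  # near, typical (d2 = p), and far points

w_cauchy = t_weight(d2, df=1, p=p)  # Cauchy-style weights (df = 1)
w_t3 = t_weight(d2, df=3, p=p)  # Student-t weights with df = 3

# Weight of the far point relative to the near point under each scheme.
rel_cauchy = w_cauchy[-1] / w_cauchy[0]
rel_t3 = w_t3[-1] / w_t3[0]
print(f"far/near weight ratio: Cauchy {rel_cauchy:.4f}, t(df=3) {rel_t3:.4f}")
```

Under this weight function, a point at d² = p gets weight 1 for any df, and the Cauchy weights fall off faster for distant points than the df = 3 weights do.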

The benchmark also shows why the package should not be positioned as a generic collection of older robust estimators. MVE is historically important, but the strongest evidence here is for regularized heavy-tail scatter in small-sample settings: every robustcov estimator in the table has a better mean rank than MinCovDet, Ledoit-Wolf, OAS, and the empirical covariance.

Run it yourself

python benchmarks/small_sample_heavy_tail.py --csv results/small_sample.csv
python benchmarks/benchmark_summary.py \
  --input results/small_sample.csv \
  --csv results/small_sample_summary.csv \
  --html results/small_sample_summary.html \
  --markdown results/small_sample_summary.md
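
Once the summary CSV exists, it can be inspected with the standard library alone. The sketch below assumes the file has a header row whose column names match the summary table (`method`, `win_rate`, `mean_rank`, and so on). That layout is an assumption, and `order_by_mean_rank` is a hypothetical helper, not part of the package. To keep the example self-contained it reads two rows copied from the table above from an in-memory string.

```python
import csv
import io


def order_by_mean_rank(csv_file):
    """Return the summary rows sorted from best (lowest) to worst mean rank."""
    rows = list(csv.DictReader(csv_file))
    return sorted(rows, key=lambda row: float(row["mean_rank"]))


# Two rows copied from the summary table; the column layout is assumed.
sample = io.StringIO(
    "method,win_rate,mean_rank,median_error\n"
    "sklearn MinCovDet,0.1111,6.2963,2.1739\n"
    "robustcov Cauchy,0.7407,1.4074,0.5994\n"
)

ordered = order_by_mean_rank(sample)
for row in ordered:
    print(row["method"], row["mean_rank"], row["median_error"])
```

For the real output, replace the in-memory sample with `open("results/small_sample_summary.csv", newline="")`.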