Small-sample heavy-tail benchmark
=================================

Question
--------

What should a user do when the sample size is small, the dimension is not
tiny, and the data are heavy-tailed? This is the regime where empirical
covariance, Ledoit-Wolf, OAS, and classical MCD can become unstable or
misleading.

Design
------

The benchmark simulates elliptical Student-t data over a grid of sample sizes,
feature dimensions, and degrees of freedom. Smaller degrees of freedom mean
heavier tails. For each setting, each estimator is compared to the known
population scatter using relative Frobenius error.

The main output is not a single timing number. It is the ranking across the
whole grid: win rate, mean rank, median error, and median runtime.

Summary table
-------------

.. csv-table:: Small-sample heavy-tail summary
   :file: ../_static/benchmarks/small_sample_summary.csv
   :header-rows: 1

Ranking plot
------------

.. image:: ../_static/benchmarks/small_sample_rank.png
   :alt: Small-sample heavy-tail mean-rank plot
   :width: 760px

Interpretation
--------------

The important result is that ``RegularizedCauchy`` is the strongest default in
this grid. It has a high win rate, a low mean rank, and a low median error.
``StudentTScatter`` is often close and is a smoother alternative when the user
wants less aggressive Cauchy-style radial downweighting.

The benchmark also explains why the package should not be positioned as a
generic collection of older robust estimators. MVE is historically important,
but the strongest evidence here is for regularized heavy-tail scatter in
small-sample settings.

Run it yourself
---------------

.. code-block:: bash

   python benchmarks/small_sample_heavy_tail.py --csv results/small_sample.csv
   python benchmarks/benchmark_summary.py \
     --input results/small_sample.csv \
     --csv results/small_sample_summary.csv \
     --html results/small_sample_summary.html \
     --markdown results/small_sample_summary.md
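
For a quick feel for a single cell of the grid without running the full
scripts, the recipe from the Design section can be sketched in plain NumPy.
This is only an illustration under stated assumptions, not the benchmark's
actual code: the helper names (``sample_student_t``,
``relative_frobenius_error``, ``trace_normalize``), the AR(1)-style scatter
matrix, the grid values, and the trace normalization used to remove the
scale difference between scatter and covariance are all hypothetical choices.
Only the empirical covariance is scored here, as a baseline.

.. code-block:: python

   import numpy as np


   def sample_student_t(n, p, df, scatter, rng):
       """Draw n zero-mean elliptical Student-t rows with the given scatter."""
       chol = np.linalg.cholesky(scatter)
       gauss = rng.standard_normal((n, p)) @ chol.T
       # One chi-square mixing variable per row produces the heavy radial tails.
       mix = rng.chisquare(df, size=n) / df
       return gauss / np.sqrt(mix)[:, None]


   def relative_frobenius_error(estimate, target):
       """Return ||estimate - target||_F / ||target||_F."""
       diff = np.linalg.norm(estimate - target, "fro")
       return diff / np.linalg.norm(target, "fro")


   def trace_normalize(matrix):
       """Rescale so the trace equals the dimension (assumed convention)."""
       return matrix * (matrix.shape[0] / np.trace(matrix))


   rng = np.random.default_rng(0)
   n, p, df = 40, 10, 3.0  # one illustrative grid cell

   # Assumed population scatter: AR(1)-style correlation with rho = 0.5.
   idx = np.arange(p)
   scatter = 0.5 ** np.abs(idx[:, None] - idx[None, :])

   x = sample_student_t(n, p, df, scatter, rng)
   emp = np.cov(x, rowvar=False)

   err = relative_frobenius_error(trace_normalize(emp), trace_normalize(scatter))
   print(f"n={n} p={p} df={df}: relative Frobenius error = {err:.3f}")

Repeating this over many seeds and grid cells, and scoring several estimators
side by side, is what produces the win rate and mean rank summarized above.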