OpenMP scaling benchmark

Question

Does optional OpenMP parallelism improve speed on larger workloads?

Design

The benchmark runs the same estimator with different thread counts. BLAS thread counts should be set to one so OpenMP and BLAS do not oversubscribe the CPU.

OMP_NUM_THREADS=4 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 \
python benchmarks/openmp_scaling.py --n 8000 --p 20 --threads 1 2 4
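The internals of benchmarks/openmp_scaling.py are not shown here, so the following is only a minimal sketch of how such a timing loop could look. The helper names `time_repeats` and `scaling_rows` and the `repeats` parameter are hypothetical. The output keys mirror the columns of the scaling table.

```python
import statistics
import time


def time_repeats(fn, repeats=5):
    """Call fn() `repeats` times and summarize the wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_seconds": statistics.median(samples),
        "min_seconds": min(samples),
        "max_seconds": max(samples),
    }


def scaling_rows(make_task, thread_counts, repeats=5):
    """Time make_task(n_threads)() per thread count; 1 must be in thread_counts."""
    rows = []
    for n_threads in thread_counts:
        task = make_task(n_threads)  # zero-argument callable to be timed
        rows.append({"threads": n_threads, **time_repeats(task, repeats)})
    # Speedup is measured against the single-thread median.
    baseline = next(r["median_seconds"] for r in rows if r["threads"] == 1)
    for row in rows:
        row["speedup_vs_1"] = baseline / row["median_seconds"]
    return rows


if __name__ == "__main__":
    # Placeholder workload that ignores the thread count. With the package,
    # the task could instead be something like:
    #   lambda t: (lambda: rc.FastMCD(n_jobs=t, random_state=0).fit(X))
    for row in scaling_rows(lambda t: (lambda: sum(range(50_000))), [1, 2, 4]):
        print(row)
```

Reporting the median next to the minimum and maximum makes a single slow repeat visible without letting it distort the speedup.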

Scaling table

OpenMP scaling

method            threads  median_seconds  min_seconds  max_seconds  speedup_vs_1
FastMCD           1        0.679777        0.679550     0.683289     1.000
FastMCD           2        0.381042        0.377283     0.381148     1.784
FastMCD           4        0.234113        0.232033     0.240740     2.904
RegularizedTyler  1        0.027655        0.026847     0.031376     1.000
RegularizedTyler  2        0.016086        0.015778     0.017262     1.719
RegularizedTyler  4        0.009717        0.009653     0.015875     2.846

Plot

[Figure: OpenMP scaling plot]

Interpretation

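In the scaling table above, four threads give a speedup of about 2.9x for FastMCD and about 2.8x for RegularizedTyler, so OpenMP does pay off at this workload size, although scaling is sublinear. OpenMP helps most when the workload has enough rows, enough features, or enough random starts to pay for threading overhead. Small examples may not speed up because thread startup and scheduling costs dominate. In larger benchmark settings, robust distance evaluation, covariance accumulation, Tyler updates, and FastMCD candidate evaluation can all benefit.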
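To make the overhead visible, the speedup can be divided by the thread count to get a parallel efficiency, where 1.0 would mean perfect scaling. The sketch below recomputes both quantities from the median timings reported in the scaling table. The function name `speedup_and_efficiency` is illustrative and not part of the package.

```python
# Median seconds copied from the scaling table.
medians = {
    "FastMCD": {1: 0.679777, 2: 0.381042, 4: 0.234113},
    "RegularizedTyler": {1: 0.027655, 2: 0.016086, 4: 0.009717},
}


def speedup_and_efficiency(by_threads):
    """Map thread count -> (speedup vs 1 thread, speedup / threads)."""
    baseline = by_threads[1]
    result = {}
    for threads, seconds in sorted(by_threads.items()):
        speedup = baseline / seconds
        result[threads] = (speedup, speedup / threads)
    return result


for method, by_threads in medians.items():
    for threads, (speedup, eff) in speedup_and_efficiency(by_threads).items():
        print(f"{method:18s} threads={threads} "
              f"speedup={speedup:.3f} efficiency={eff:.3f}")
```

At four threads both estimators come out at an efficiency of roughly 0.7, which is consistent with threading overhead consuming part of the gain.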

Practical advice

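Set the thread-count environment variables explicitly so timings are reproducible. Prefix them to the benchmark command, as in the Design section, or export them in the shell before starting Python:

export OMP_NUM_THREADS=4 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1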

Inside Python, users can also control the package thread count:

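import numpy as np
import robustcov as rc

# Example data with the same shape as the benchmark run (n=8000, p=20).
rng = np.random.default_rng(0)
X = rng.standard_normal((8000, 20))

print(rc.has_openmp())  # check whether OpenMP support is available
rc.set_num_threads(4)   # package-level thread count
est = rc.FastMCD(n_jobs=4, random_state=0).fit(X)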
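
When the shell environment cannot be controlled, for example in a notebook, one possible workaround is to set the BLAS variables from Python at the very top of the script. This is a sketch under an assumption: OpenBLAS and MKL typically read these variables when they are first loaded, so the call has to come before importing numpy or any other BLAS-backed library. The helper name `pin_blas_threads` is hypothetical and not part of robustcov.

```python
import os

# Variable names taken from the benchmark command line.
BLAS_THREAD_VARS = ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def pin_blas_threads(n_threads=1):
    """Set the BLAS thread-count variables for this process and return them."""
    for name in BLAS_THREAD_VARS:
        os.environ[name] = str(n_threads)
    return {name: os.environ[name] for name in BLAS_THREAD_VARS}


# Call this before importing numpy or any other BLAS-backed library.
print(pin_blas_threads(1))
```

Setting the variables in the shell remains the more reliable option, because it does not depend on import order.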