Optional OpenMP acceleration

robustcov can use OpenMP in its C++ extension when the compiler and build environment provide it. OpenMP is optional: builds without OpenMP still work and fall back to the serial kernels.

What is parallelized?

The current OpenMP path targets the loops that are shared by several estimators:

  • robust / Mahalanobis distance evaluation over observations;

  • column means and covariance accumulation over selected subsets;

  • Tyler and regularized Tyler scatter accumulation;

  • FastMCD random-start evaluation and final candidate polishing.

These are useful first targets because they are repeated many times inside FastMCD C-steps, reweighting, Tyler iterations, diagnostics, and benchmark runs.

Thread-control API

Use the process-wide helpers when you want global control:

import robustcov as rc

print(rc.has_openmp())
print(rc.get_num_threads())
rc.set_num_threads(4)

Estimators that call the C++ backend also accept n_jobs:

est = rc.FastMCD(n_init=500, n_jobs=4, random_state=0).fit(X)
ty = rc.RegularizedTyler(alpha=0.1, n_jobs=4).fit(X)

For temporary changes, use thread_limit:

with rc.thread_limit(2):
    est = rc.FastMCD(n_init=1000, random_state=0).fit(X)

Benchmarking scaling

Run:

python benchmarks/openmp_scaling.py --n 8000 --p 20 --threads 1 2 4 --csv results/openmp_scaling.csv

Interpret scaling carefully. Small data can be slower with multiple threads because thread launch and reduction overhead dominate. The gains should be most visible for larger n, larger p, and many FastMCD starts.

BLAS and OpenMP interaction

Some NumPy/SciPy builds also use threaded BLAS. If BLAS and OpenMP both use many threads, oversubscription can hurt performance. For clean OpenMP benchmarks, set BLAS thread counts explicitly, for example:

OMP_NUM_THREADS=4 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 python benchmarks/openmp_scaling.py

Determinism

Random starts are generated serially so random_state remains reproducible. Parallel reductions may still cause tiny floating-point differences because summation order changes. These should be numerically negligible.

Benchmark integration

The benchmark report includes OpenMP scaling by default:

OMP_NUM_THREADS=4 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 \
python benchmarks/make_report.py --outdir results/report

The generated report includes openmp_scaling.csv and openmp_scaling.png. For larger n, higher p, and more FastMCD starts, the parallel benefit should become more visible.

A common shell mistake is to paste Python snippets directly into bash. Use python - <<'PY' or an interactive Python session when checking OpenMP helpers:

python - <<'PY'
import robustcov as rc
print(rc.has_openmp())
print(rc.get_num_threads())
PY