Optional OpenMP acceleration¶
robustcov can use OpenMP in its C++ extension when the compiler and build
environment provide it. OpenMP is optional: builds without OpenMP still work and
fall back to the serial kernels.
What is parallelized?¶
The current OpenMP path targets the loops that are shared by several estimators:
robust / Mahalanobis distance evaluation over observations;
column means and covariance accumulation over selected subsets;
Tyler and regularized Tyler scatter accumulation;
FastMCD random-start evaluation and final candidate polishing.
These are useful first targets because they are repeated many times inside FastMCD C-steps, reweighting, Tyler iterations, diagnostics, and benchmark runs.
Thread-control API¶
Use the process-wide helpers when you want global control:
import robustcov as rc
print(rc.has_openmp())
print(rc.get_num_threads())
rc.set_num_threads(4)
Estimators that call the C++ backend also accept n_jobs:
est = rc.FastMCD(n_init=500, n_jobs=4, random_state=0).fit(X)
ty = rc.RegularizedTyler(alpha=0.1, n_jobs=4).fit(X)
For temporary changes, use thread_limit:
with rc.thread_limit(2):
est = rc.FastMCD(n_init=1000, random_state=0).fit(X)
Benchmarking scaling¶
Run:
python benchmarks/openmp_scaling.py --n 8000 --p 20 --threads 1 2 4 --csv results/openmp_scaling.csv
Interpret scaling carefully. Small data can be slower with multiple threads
because thread launch and reduction overhead dominate. The gains should be most
visible for larger n, larger p, and many FastMCD starts.
BLAS and OpenMP interaction¶
Some NumPy/SciPy builds also use threaded BLAS. If BLAS and OpenMP both use many threads, oversubscription can hurt performance. For clean OpenMP benchmarks, set BLAS thread counts explicitly, for example:
OMP_NUM_THREADS=4 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 python benchmarks/openmp_scaling.py
Determinism¶
Random starts are generated serially so random_state remains reproducible.
Parallel reductions may still cause tiny floating-point differences because
summation order changes. These should be numerically negligible.
Benchmark integration¶
The benchmark report includes OpenMP scaling by default:
OMP_NUM_THREADS=4 OPENBLAS_NUM_THREADS=1 MKL_NUM_THREADS=1 \
python benchmarks/make_report.py --outdir results/report
The generated report includes openmp_scaling.csv and openmp_scaling.png. For larger
n, higher p, and more FastMCD starts, the parallel benefit should become more visible.
A common shell mistake is to paste Python snippets directly into bash. Use python - <<'PY'
or an interactive Python session when checking OpenMP helpers:
python - <<'PY'
import robustcov as rc
print(rc.has_openmp())
print(rc.get_num_threads())
PY