FAQ¶
Why not prioritize MVE?¶
Minimum-volume ellipsoid estimators are historically important, but they are not the current differentiator for this package. The benchmark evidence is stronger for efficient FastMCD and regularized heavy-tail scatter estimators such as RegularizedCauchy and StudentTScatter.
When should I use FastMCD?¶
Use FastMCD when the data are mostly clean, the outliers are separable, and n is comfortably larger than p. It is the most interpretable choice for classical robust distances and support diagnostics.
When should I use RegularizedCauchy?¶
Use RegularizedCauchy for small-sample, high-dimensional, heavy-tailed covariance problems. It is currently the strongest method in the package’s small-sample heavy-tail benchmark gallery.
When should I use StudentTScatter?¶
Use StudentTScatter when you want a smooth heavy-tail covariance-like estimator with a fixed degrees-of-freedom parameter. It is often competitive with Cauchy and can be easier to tune conceptually.
When should I use AutoRobustScatter?¶
Use AutoRobustScatter for exploratory workflows when you want the package to compare several robust scatter candidates. It is a diagnostic selector, not an oracle. For production, inspect the chosen estimator and the robust-distance diagnostics.
Are OpenMP results deterministic?¶
For fixed random seeds, estimator choices are intended to be reproducible. Parallel floating-point reductions can still introduce tiny numerical differences because summation order changes. This is normal for parallel numerical code.
Why are there both a Benchmark Gallery and a Use-case Gallery?¶
The benchmark gallery answers evidence questions: speed, ranking, scaling, and failure modes. The use-case gallery answers application questions: what to do for finance, fraud, sensors, embeddings, and ML preprocessing.