Algorithms¶
This page gives the mathematical and practical description of the estimators used by
robustcov. The package focuses on robust covariance/scatter estimation and robust-distance
diagnostics, not on fitting a full probability model with density, sampler, AIC, or BIC.
Notation¶
Let \(X = \{x_i\}_{i=1}^n\), with \(x_i \in \mathbb{R}^p\). A location estimate is \(\hat\mu\), a covariance or scatter estimate is \(\hat\Sigma\), and robust squared Mahalanobis distances are
For shape-only estimators such as Tyler’s estimator, the scale of \(\hat\Sigma\) is not
identified by the estimating equation. robustcov normalizes shape matrices and optionally
applies a radial scale correction for diagnostics.
FastMCD / MinCovDet¶
FastMCD is the package’s estimator for the classical contamination model: most observations
come from a compact elliptical bulk and a minority are outliers. It approximates the minimum
covariance determinant problem: find a subset \(H\) of size \(h\) whose empirical covariance
has small determinant.
The raw subset location and covariance are
The FastMCD idea is the C-step. Starting from a candidate subset, compute Mahalanobis distances with the current subset covariance and keep the \(h\) observations with smallest distances. The C-step has the monotonicity property that it does not increase the determinant under regularity conditions.
random elemental starts
↓
short C-steps
↓
retain best determinant candidates
↓
full C-step polishing
↓
raw robust location/covariance
↓
reweighting by robust distances
↓
final location/covariance and support diagnostics
In robustcov, the final covariance is computed on the selected/reweighted support. This is
important under contamination: rescaling the final covariance using all observations can reintroduce
outlier inflation. FastMCD is best when \(n \gg p\) and outliers are separable. It is not the
right tool for \(p > n\) covariance recovery or diffuse heavy tails.
Tyler shape estimator¶
Tyler’s estimator is a distribution-free shape estimator for elliptical data. It estimates the shape matrix up to scale by solving the fixed-point equation
with a normalization such as
The radial weight is
This makes Tyler’s estimator highly robust to radial outliers because observations with large robust distances receive small weights. Since the estimator is shape-only, it is often paired with a separate scale correction or used primarily for robust distances and shape diagnostics.
Regularized Tyler / KL Tyler / Wiesel Tyler¶
When \(p\) is close to \(n\) or \(p > n\), unregularized scatter estimates can become
singular or unstable. RegularizedTyler shrinks the Tyler update toward a target matrix
\(T\), typically the identity or a diagonal target:
The result is normalized after each update. Shrinkage improves conditioning and makes the estimator
usable in high-dimensional small-sample regimes. In the current MVP, KLRegularizedTyler and
WieselTyler are documented aliases around this regularized Tyler prototype. They keep the API
space open for a future exact objective-specific implementation.
Geometry note. Regularized Tyler and Wiesel-style estimators are often understood through the geometry of the symmetric positive-definite cone. Their objectives can be geodesically convex under appropriate formulations, even when they are not ordinary Euclidean-convex functions of the matrix entries. This is why the package documentation separates the fixed-point update used in the MVP from stronger mathematical claims about an exact KL/Wiesel objective. The current implementation is pragmatic; future versions may expose objective-level solvers once the exact formulation is stabilized.
Student-t scatter¶
StudentTScatter is an iteratively reweighted covariance estimator motivated by the multivariate
Student-t model with fixed degrees of freedom \(\nu\). Given squared robust distances
\(d_i^2\), it uses the radial weight
The weighted update is
followed by optional shrinkage
Smaller \(\nu\) means heavier tails and more aggressive downweighting. Unlike MCD, Student-t scatter does not try to identify a hard subset. It is therefore useful when the whole data set is heavy-tailed rather than clean data plus a clearly separated outlier cloud.
Regularized Cauchy¶
RegularizedCauchy is the very-heavy-tail member of the same M-estimator family. It corresponds
to a Cauchy-like radial downweighting rule and shrinkage toward a stable target. In practice this is
the current flagship estimator for small-sample heavy-tail covariance recovery.
A simplified view is
The benchmark gallery shows that this combination of aggressive radial downweighting and shrinkage can strongly outperform empirical covariance, Ledoit-Wolf, OAS, and MCD when the data are very heavy-tailed and \(p\) is close to or larger than \(n\).
HellingerRegularizedTyler, experimental¶
HellingerRegularizedTyler is intentionally marked experimental. It applies Tyler-like radial
weights with square-root-space shrinkage. It is useful for exploratory comparisons, but it should
not yet be cited as the exact optimizer of a specific Hellinger objective. The API label is
experimental until the objective and fixed-point update are finalized.
AutoRobustScatter¶
AutoRobustScatter is a practical selector. It fits a small candidate set and chooses an
estimator using a diagnostic or stability score.
candidate estimators
↓
fit each candidate
↓
compute convergence, condition, tail, and distance diagnostics
↓
optionally compute split-sample stability
↓
choose the lowest score
The diagnostic score combines convergence, finite covariance checks, condition-number penalties, and tail diagnostics. The stability score adds split-sample scatter stability. This is not an oracle: it is a pragmatic default for users who do not yet know whether Cauchy, Student-t, or Tyler is the best fit.
Multimodal robust diagnostics¶
A single robust covariance estimator is designed for a setting that is approximately one central elliptical cloud plus contamination. In a genuinely multimodal distribution there may be several valid clouds:
where each \(F_k\) is a legitimate local population and \(G\) is contamination. If a single global covariance is fitted to this mixture, smaller valid modes may be assigned very large robust distances and incorrectly flagged as outliers.
ClusterRobustOutlierDetector is a pragmatic diagnostic for this case. It is not a full robust
mixture model. It uses a two-stage procedure:
cluster observations into K modes
↓
fit a robust scatter estimator inside each cluster
↓
score each point by distance to its assigned local cluster
↓
flag points with large local robust distances
For an observation assigned to cluster \(c(i)\), the local score is
This is useful when multiple clusters are valid but each cluster is locally elliptical. It should not be sold as a replacement for robust mixture modeling: there is no likelihood, no EM algorithm, no automatic number-of-components selection, and no claim that the clustering step is itself robust. Its purpose is to prevent a global robust covariance model from treating legitimate modes as outliers.
A future experimental layer could add trimmed Gaussian mixtures or robust Student-t mixtures, but that would move the package toward robust clustering. The current feature stays within the package scope: robust scatter plus interpretable diagnostics.
Robust-distance diagnostics¶
All estimators can be inspected through robust distances. robustcov reports radial kurtosis,
QQ-tail deviation, condition number, detected fraction, and distance-profile plots.
A useful normalized radial kurtosis diagnostic is
which is close to one for an ideal Gaussian elliptical model and larger for heavy tails or outlier-contaminated data. In practice, radial kurtosis should be interpreted together with QQ plots and the distance profile: high radial kurtosis can be a valid property of heavy-tailed data, not necessarily estimator failure.
Estimator selection summary¶
Situation |
Recommended estimator |
Reason |
|---|---|---|
Separable outliers, \(n \gg p\) |
|
robust subset/support estimation and classical outlier diagnostics |
Small sample, heavy tails, \(p \approx n\) or \(p > n\) |
|
aggressive radial downweighting plus shrinkage |
Smooth heavy-tailed covariance-like estimate |
|
softer radial weights than Cauchy |
Shape estimation under elliptical heavy tails |
|
scale-free robust shape with shrinkage |
Unsure which heavy-tail estimator to use |
|
diagnostic or stability-based selection |
References¶
See References for the full bibliography. Key background includes Rousseeuw and Van Driessen for FastMCD, Tyler for shape estimation, Wiesel for regularized robust covariance, and standard Student-t/Cauchy M-estimation literature.