Algorithms¶

This page gives the mathematical and practical description of the estimators used by robustcov. The package focuses on robust covariance/scatter estimation and robust-distance diagnostics, not on fitting a full probability model with density, sampler, AIC, or BIC.

Notation¶

Let \(X = \{x_i\}_{i=1}^n\), with \(x_i \in \mathbb{R}^p\). A location estimate is \(\hat\mu\), a covariance or scatter estimate is \(\hat\Sigma\), and robust squared Mahalanobis distances are

\[d_i^2 = (x_i - \hat\mu)^T \hat\Sigma^{-1} (x_i - \hat\mu).\]
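The distance formula above can be sketched in a few lines of NumPy. This is an illustrative helper, not robustcov's API: the function name `squared_mahalanobis` is hypothetical, and the sample mean and sample covariance stand in for \(\hat\mu\) and \(\hat\Sigma\).

```python
import numpy as np

def squared_mahalanobis(X, location, scatter):
    """Return d_i^2 = (x_i - mu)^T Sigma^{-1} (x_i - mu) for each row of X."""
    Z = X - location
    # Solve Sigma @ Y = Z^T rather than forming the explicit inverse.
    Y = np.linalg.solve(scatter, Z.T)
    # Row-wise quadratic form: d2[i] = sum_j Z[i, j] * Y[j, i].
    return np.einsum("ij,ji->i", Z, Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
mu_hat = X.mean(axis=0)                # placeholder location estimate
sigma_hat = np.cov(X, rowvar=False)    # placeholder scatter estimate
d2 = squared_mahalanobis(X, mu_hat, sigma_hat)
```

When the sample mean and the \(1/(n-1)\) sample covariance are plugged in, the distances sum to \((n-1)p\), which makes a convenient sanity check for any implementation.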

For shape-only estimators such as Tyler’s estimator, the scale of \(\hat\Sigma\) is not identified by the estimating equation. robustcov normalizes shape matrices and optionally applies a radial scale correction for diagnostics.

FastMCD / MinCovDet¶

FastMCD is the package’s estimator for the classical contamination model, in which most observations come from a compact elliptical bulk and a minority are outliers. It approximately solves the minimum covariance determinant problem: find the subset \(H\) of size \(h\) whose empirical covariance has the smallest determinant.

\[H^* \approx \arg\min_{|H|=h} \det\left( \frac{1}{h-1} \sum_{i \in H} (x_i-\bar x_H)(x_i-\bar x_H)^T \right).\]

The raw subset location and covariance are

\[\hat\mu_H = \frac{1}{h}\sum_{i\in H} x_i, \qquad \hat\Sigma_H = \frac{1}{h-1}\sum_{i\in H}(x_i-\hat\mu_H)(x_i-\hat\mu_H)^T.\]

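The core of FastMCD is the concentration step (C-step). Starting from a candidate subset, compute Mahalanobis distances for all \(n\) observations using the current subset's location and covariance, then keep the \(h\) observations with the smallest distances as the new subset. As long as the current subset covariance is nonsingular, a C-step never increases the determinant, so repeated C-steps reach a fixed subset after finitely many iterations.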

random elemental starts
    ↓
short C-steps
    ↓
retain best determinant candidates
    ↓
full C-step polishing
    ↓
raw robust location/covariance
    ↓
reweighting by robust distances
    ↓
final location/covariance and support diagnostics
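The C-step at the heart of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not robustcov's implementation: the helper names `c_step` and `subset_logdet` are hypothetical, a single random start replaces the elemental starts, and the reweighting stage is omitted.

```python
import numpy as np

def c_step(X, subset, h):
    """One concentration step: refit on `subset`, keep the h closest points."""
    mu = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)   # 1/(h-1) normalization
    Z = X - mu
    d2 = np.einsum("ij,ji->i", Z, np.linalg.solve(cov, Z.T))
    return np.sort(np.argsort(d2)[:h])

def subset_logdet(X, subset):
    """Log-determinant of the subset covariance (the MCD objective)."""
    _, logdet = np.linalg.slogdet(np.cov(X[subset], rowvar=False))
    return logdet

rng = np.random.default_rng(0)
clean = rng.normal(size=(90, 2))
outliers = rng.normal(loc=8.0, size=(10, 2))
X = np.vstack([clean, outliers])
h = 75

# Single random start; FastMCD would try many starts and keep the best ones.
subset = np.sort(rng.choice(len(X), size=h, replace=False))
logdets = [subset_logdet(X, subset)]
for _ in range(20):
    new_subset = c_step(X, subset, h)
    if np.array_equal(new_subset, subset):
        break                               # fixed point reached
    subset = new_subset
    logdets.append(subset_logdet(X, subset))
```

Tracking `logdets` makes the monotonicity property visible: each entry is no larger than the one before it.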

In robustcov, the final covariance is computed on the selected/reweighted support. This is important under contamination: rescaling the final covariance using all observations can reintroduce outlier inflation. FastMCD is best when \(n \gg p\) and outliers are separable. It is not the right tool for \(p > n\) covariance recovery or diffuse heavy tails.

Tyler shape estimator¶

Tyler’s estimator is a distribution-free shape estimator for elliptical data. It estimates the shape matrix up to scale by solving the fixed-point equation

\[\hat S = \frac{p}{n}\sum_{i=1}^n \frac{z_i z_i^T}{z_i^T \hat S^{-1} z_i}, \qquad z_i = x_i - \hat\mu,\]

with a normalization such as

\[\operatorname{tr}(\hat S) = p.\]

With \(d_i^2 = z_i^T \hat S^{-1} z_i\), the implied radial weight is

\[w_i(d_i^2) = \frac{p}{d_i^2}.\]

This makes Tyler’s estimator highly robust to radial outliers because observations with large robust distances receive small weights. Since the estimator is shape-only, it is often paired with a separate scale correction or used primarily for robust distances and shape diagnostics.
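The fixed-point equation and trace normalization above can be sketched as a plain iteration. This is an illustrative version rather than the package's code: the function name `tyler_shape`, the stopping rule, and the assumption of a known location (zero here) are choices made for the example, and it assumes no observation coincides with the location, since that would give a zero distance.

```python
import numpy as np

def tyler_shape(X, location, max_iter=500, tol=1e-9):
    """Fixed-point iteration for Tyler's shape matrix, normalized to trace p."""
    Z = X - location
    n, p = Z.shape
    S = np.eye(p)
    for _ in range(max_iter):
        d2 = np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))
        weights = p / d2                       # radial weight w_i = p / d_i^2
        S_new = (Z * weights[:, None]).T @ Z / n
        S_new *= p / np.trace(S_new)           # fix the scale: tr(S) = p
        converged = np.linalg.norm(S_new - S) < tol
        S = S_new
        if converged:
            break
    return S

# Heavy-tailed elliptical toy data: multivariate t with 2 degrees of freedom.
rng = np.random.default_rng(1)
n, p, nu = 2000, 3, 2.0
true_shape = np.diag([4.0, 1.0, 1.0])
# Elementwise sqrt is a valid matrix square root only because true_shape is diagonal.
gauss = rng.normal(size=(n, p)) @ np.sqrt(true_shape)
X = gauss / np.sqrt(rng.chisquare(nu, size=n) / nu)[:, None]

S_hat = tyler_shape(X, location=np.zeros(p))
```

Because only the shape is identified, `S_hat` should be compared with `true_shape` after rescaling both to the same trace.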

Regularized Tyler / KL Tyler / Wiesel Tyler¶

When \(p\) is close to \(n\) or \(p > n\), unregularized scatter estimates can become singular or unstable. RegularizedTyler shrinks the Tyler update toward a target matrix \(T\), typically the identity or a diagonal target:

\[S_{\text{Tyler}} = \frac{p}{n}\sum_{i=1}^n \frac{z_i z_i^T}{z_i^T S^{-1} z_i},\]

\[S_{\text{new}} = (1-\alpha) S_{\text{Tyler}} + \alpha T, \qquad 0 \leq \alpha \leq 1.\]

The result is normalized after each update. Shrinkage improves conditioning and makes the estimator usable in high-dimensional, small-sample regimes. In the current MVP, KLRegularizedTyler and WieselTyler are documented aliases for this regularized Tyler prototype. They reserve the API names for future implementations that solve the exact objective of each method.
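A minimal sketch of the shrinkage update, assuming an identity target and a trace-\(p\) normalization after each step. The function name `regularized_tyler`, its parameters, and the stopping rule are hypothetical and chosen for illustration; they are not taken from the package.

```python
import numpy as np

def regularized_tyler(X, location, alpha=0.3, target=None, max_iter=500, tol=1e-9):
    """Tyler update shrunk toward a target matrix, renormalized to trace p."""
    Z = X - location
    n, p = Z.shape
    T = np.eye(p) if target is None else np.asarray(target, dtype=float)
    S = np.eye(p)
    for _ in range(max_iter):
        d2 = np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))
        S_tyler = (Z * (p / d2)[:, None]).T @ Z / n
        S_new = (1.0 - alpha) * S_tyler + alpha * T    # shrink toward T
        S_new *= p / np.trace(S_new)                   # normalize after each update
        converged = np.linalg.norm(S_new - S) < tol
        S = S_new
        if converged:
            break
    return S

# Heavy-tailed toy data with more dimensions than observations (p > n).
rng = np.random.default_rng(2)
n, p = 20, 40
X = rng.standard_t(df=2.0, size=(n, p))

S_reg = regularized_tyler(X, location=np.zeros(p), alpha=0.3)
eigenvalues = np.linalg.eigvalsh(S_reg)
```

With \(\alpha > 0\) and a positive-definite target, every iterate stays positive definite even though the unregularized Tyler update has rank at most \(n < p\) here.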

Geometry note. Regularized Tyler and Wiesel-style estimators are often understood through the geometry of the symmetric positive-definite cone. Their objectives can be geodesically convex under appropriate formulations, even when they are not ordinary Euclidean-convex functions of the matrix entries. This is why the package documentation separates the fixed-point update used in the MVP from stronger mathematical claims about an exact KL/Wiesel objective. The current implementation is pragmatic; future versions may expose objective-level solvers once the exact formulation is stabilized.

Student-t scatter¶

StudentTScatter is an iteratively reweighted covariance estimator motivated by the multivariate Student-t model with fixed degrees of freedom \(\nu\). Given squared robust distances \(d_i^2\), it uses the radial weight

\[w_i(d_i^2) = \frac{\nu + p}{\nu + d_i^2}.\]

The weighted update is

\[S_{\text{M}} = \frac{1}{\sum_i w_i}\sum_{i=1}^n w_i z_i z_i^T,\]

followed by optional shrinkage

\[S_{\text{new}} = (1-\alpha)S_{\text{M}} + \alpha T.\]

Smaller \(\nu\) means heavier tails and more aggressive downweighting. Unlike MCD, Student-t scatter does not try to identify a hard subset. It is therefore useful when the whole data set is heavy-tailed rather than clean data plus a clearly separated outlier cloud.
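The reweighting loop can be sketched as below, using the weight and update formulas from this section. The function name `student_t_scatter`, the fixed known location, and the identity default target are assumptions for the example rather than the package's API.

```python
import numpy as np

def student_t_scatter(X, location, nu=3.0, alpha=0.0, target=None,
                      max_iter=500, tol=1e-9):
    """Iteratively reweighted scatter with Student-t radial weights."""
    Z = X - location
    n, p = Z.shape
    T = np.eye(p) if target is None else np.asarray(target, dtype=float)
    S = np.eye(p)
    for _ in range(max_iter):
        d2 = np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))
        w = (nu + p) / (nu + d2)                    # radial weights
        S_m = (Z * w[:, None]).T @ Z / w.sum()      # weighted update
        S_new = (1.0 - alpha) * S_m + alpha * T     # optional shrinkage
        converged = np.linalg.norm(S_new - S) < tol
        S = S_new
        if converged:
            break
    return S

# Toy data: a unit-variance bulk plus a few widely scattered points.
rng = np.random.default_rng(3)
bulk = rng.normal(size=(190, 2))
wide = rng.normal(scale=10.0, size=(10, 2))
X = np.vstack([bulk, wide])

S_t = student_t_scatter(X, location=np.zeros(2), nu=3.0)
S_emp = X.T @ X / len(X)    # unweighted scatter about the same location
```

Comparing `np.trace(S_t)` with `np.trace(S_emp)` shows how much of the empirical scatter is driven by the widely scattered points.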

Regularized Cauchy¶

RegularizedCauchy is the very-heavy-tail member of the same M-estimator family. It corresponds to a Cauchy-like radial downweighting rule and shrinkage toward a stable target. In practice this is the current flagship estimator for small-sample heavy-tail covariance recovery.

A simplified view is

\[w_i(d_i^2) \propto \frac{1 + p}{1 + d_i^2}, \qquad S_{\text{new}} = (1-\alpha)S_{\text{Cauchy}} + \alpha T.\]

The benchmark gallery shows that this combination of aggressive radial downweighting and shrinkage can strongly outperform empirical covariance, Ledoit-Wolf, OAS, and MCD when the data are very heavy-tailed and \(p\) is close to or larger than \(n\).
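To make the relation to Student-t scatter concrete, the short sketch below compares the two weight functions. It assumes the proportionality constant in the Cauchy-like weight is one, in which case the weight coincides with the Student-t weight at \(\nu = 1\); the function names are hypothetical.

```python
import numpy as np

def student_t_weight(d2, p, nu):
    """Student-t radial weight (nu + p) / (nu + d^2)."""
    return (nu + p) / (nu + d2)

def cauchy_weight(d2, p):
    """Cauchy-like radial weight, proportionality constant assumed to be 1."""
    return (1.0 + p) / (1.0 + d2)

p = 5
d2 = np.array([1.0, 5.0, 25.0, 100.0, 1000.0])

w_cauchy = cauchy_weight(d2, p)
w_t1 = student_t_weight(d2, p, nu=1.0)    # same values as w_cauchy
w_t5 = student_t_weight(d2, p, nu=5.0)    # softer downweighting
```

Both weights equal one at \(d^2 = p\). For \(d^2 > p\) the Cauchy-like weight is smaller than the \(\nu = 5\) weight, which is the sense in which its downweighting is more aggressive.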

HellingerRegularizedTyler, experimental¶

HellingerRegularizedTyler is intentionally marked experimental. It applies Tyler-like radial weights with square-root-space shrinkage. It is useful for exploratory comparisons, but it should not yet be cited as the exact optimizer of a specific Hellinger objective. The API label is experimental until the objective and fixed-point update are finalized.

AutoRobustScatter¶

AutoRobustScatter is a practical selector. It fits a small candidate set and chooses an estimator using a diagnostic or stability score.

candidate estimators
    ↓
fit each candidate
    ↓
compute convergence, condition, tail, and distance diagnostics
    ↓
optionally compute split-sample stability
    ↓
choose the lowest score

The diagnostic score combines convergence, finite covariance checks, condition-number penalties, and tail diagnostics. The stability score adds split-sample scatter stability. This is not an oracle: it is a pragmatic default for users who do not yet know whether Cauchy, Student-t, or Tyler is the best fit.
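The selection loop can be illustrated with a split-sample stability score. Everything in this sketch is hypothetical: the candidate set, the helper names, and the score itself (the mean Frobenius gap between trace-normalized scatter fits on two random halves) are stand-ins chosen for the example, not the scoring rule used by AutoRobustScatter.

```python
import numpy as np

def sq_dist(Z, S):
    return np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))

def empirical_scatter(Z):
    return Z.T @ Z / len(Z)

def t_weighted_scatter(Z, nu, n_iter=100):
    """Student-t weighted scatter about a known zero location."""
    p = Z.shape[1]
    S = np.eye(p)
    for _ in range(n_iter):
        w = (nu + p) / (nu + sq_dist(Z, S))
        S = (Z * w[:, None]).T @ Z / w.sum()
    return S

def stability_score(fit, Z, rng, n_splits=5):
    """Hypothetical score: mean shape disagreement between two random halves."""
    n, p = Z.shape
    gaps = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        halves = [fit(Z[perm[: n // 2]]), fit(Z[perm[n // 2:]])]
        if not all(np.all(np.isfinite(S)) for S in halves):
            return np.inf                   # non-finite fits are disqualified
        shapes = [S * p / np.trace(S) for S in halves]
        gaps.append(np.linalg.norm(shapes[0] - shapes[1]))
    return float(np.mean(gaps))

candidates = {
    "empirical": empirical_scatter,
    "student_t_nu3": lambda Z: t_weighted_scatter(Z, nu=3.0),
    "cauchy_like_nu1": lambda Z: t_weighted_scatter(Z, nu=1.0),
}

# Very heavy-tailed toy data (multivariate Cauchy), centered at zero.
rng = np.random.default_rng(4)
n, p = 400, 3
Z = rng.normal(size=(n, p)) / np.sqrt(rng.chisquare(1.0, size=n))[:, None]

# Each candidate is scored on its own random splits; lower is better.
scores = {name: stability_score(fit, Z, rng) for name, fit in candidates.items()}
selected = min(scores, key=scores.get)
```

A score of this kind rewards estimators whose shape estimate does not depend much on which half of the data they see, which is one plausible reading of split-sample stability.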

Multimodal robust diagnostics¶

A single robust covariance estimator is designed for a setting that is approximately one central elliptical cloud plus contamination. In a genuinely multimodal distribution there may be several valid clouds:

\[X \sim \sum_{k=1}^K \pi_k F_k + \epsilon G,\]

where each \(F_k\) is a legitimate local population and \(G\) is contamination. If a single global covariance is fitted to this mixture, smaller valid modes may be assigned very large robust distances and incorrectly flagged as outliers.

ClusterRobustOutlierDetector is a pragmatic diagnostic for this case. It is not a full robust mixture model. It uses a two-stage procedure:

cluster observations into K modes
    ↓
fit a robust scatter estimator inside each cluster
    ↓
score each point by distance to its assigned local cluster
    ↓
flag points with large local robust distances

For an observation assigned to cluster \(c(i)\), the local score is

\[d_i^2 = (x_i - \hat\mu_{c(i)})^T \hat\Sigma_{c(i)}^{-1} (x_i - \hat\mu_{c(i)}).\]

This is useful when multiple clusters are valid and each cluster is locally elliptical. It should not be presented as a replacement for robust mixture modeling: there is no likelihood, no EM algorithm, no automatic selection of the number of components, and no claim that the clustering step is itself robust. Its purpose is to prevent a global robust covariance model from treating legitimate modes as outliers.
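The two-stage procedure can be sketched as follows, assuming cluster labels are already available from some clustering step. The helper names are hypothetical, and the local estimator (coordinate-wise median plus Student-t weighted scatter) is a stand-in for whichever robust scatter estimator is used inside each cluster; this is not the ClusterRobustOutlierDetector implementation.

```python
import numpy as np

def robust_fit(Xc, nu=3.0, n_iter=100):
    """Stand-in local estimator: coordinate-wise median + Student-t weighted scatter."""
    mu = np.median(Xc, axis=0)
    Z = Xc - mu
    p = Z.shape[1]
    S = np.eye(p)
    for _ in range(n_iter):
        d2 = np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))
        w = (nu + p) / (nu + d2)
        S = (Z * w[:, None]).T @ Z / w.sum()
    return mu, S

def local_robust_distances(X, labels):
    """Squared robust distance of each point to its own cluster's fit."""
    d2 = np.empty(len(X))
    for c in np.unique(labels):
        mask = labels == c
        mu, S = robust_fit(X[mask])
        Z = X[mask] - mu
        d2[mask] = np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))
    return d2

rng = np.random.default_rng(5)
big = rng.normal(size=(150, 2))                  # large valid mode
small = rng.normal(loc=10.0, size=(30, 2))       # small valid mode
planted = np.array([[5.0, -5.0]])                # one planted outlier
X = np.vstack([big, small, planted])
# Labels are assumed to come from a clustering step; here they are known.
labels = np.array([0] * 150 + [1] * 30 + [0])

local_d2 = local_robust_distances(X, labels)
# A single global fit corresponds to putting every point in one cluster.
global_d2 = local_robust_distances(X, np.zeros(len(X), dtype=int))

threshold = -2.0 * np.log(0.025)    # 97.5% chi-square quantile, valid for p = 2 only
local_flags = local_d2 > threshold
```

Comparing `global_d2` and `local_d2` on the rows of the small mode shows the effect described above: distances to a single global fit are much larger than distances to the mode's own local fit.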

A future experimental layer could add trimmed Gaussian mixtures or robust Student-t mixtures, but that would move the package toward robust clustering. The current feature stays within the package scope: robust scatter plus interpretable diagnostics.

Robust-distance diagnostics¶

All estimators can be inspected through robust distances. robustcov reports radial kurtosis, QQ-tail deviation, condition number, detected fraction, and distance-profile plots.

A useful normalized radial kurtosis diagnostic is

\[\kappa_r = \frac{\mathbb{E}[d^4]}{p(p+2)},\]

which is close to one for an ideal Gaussian elliptical model, because \(d^2\) then approximately follows a \(\chi^2_p\) distribution with second moment \(\mathbb{E}[d^4] = p(p+2)\), and larger for heavy tails or outlier-contaminated data. In practice, radial kurtosis should be interpreted together with QQ plots and the distance profile: high radial kurtosis can be a genuine property of heavy-tailed data and does not necessarily indicate estimator failure.
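A small sketch of the diagnostic, with the expectation replaced by a sample mean. The function name `radial_kurtosis` is hypothetical, and the example uses the true location (zero) and scatter (identity) so that the squared distance is simply the squared norm.

```python
import numpy as np

def radial_kurtosis(d2, p):
    """Sample version of kappa_r = E[d^4] / (p (p + 2)), with d^4 = (d^2)^2."""
    d2 = np.asarray(d2, dtype=float)
    return float(np.mean(d2 ** 2) / (p * (p + 2)))

rng = np.random.default_rng(6)
n, p = 5000, 4
gauss = rng.normal(size=(n, p))
# Multivariate t with 5 degrees of freedom, built from the same Gaussian draws.
heavy = gauss / np.sqrt(rng.chisquare(5.0, size=n) / 5.0)[:, None]

# Zero location and identity scatter: d^2 is the squared Euclidean norm.
kappa_gauss = radial_kurtosis((gauss ** 2).sum(axis=1), p)
kappa_heavy = radial_kurtosis((heavy ** 2).sum(axis=1), p)
```

On the Gaussian sample the value is near one, while the heavy-tailed sample gives a larger value even though no estimator has failed.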

Estimator selection summary¶

Practical estimator guidance¶
Situation	Recommended estimator	Reason
Separable outliers, \(n \gg p\)	`FastMCD`	robust subset/support estimation and classical outlier diagnostics
Small sample, heavy tails, \(p \approx n\) or \(p > n\)	`RegularizedCauchy`	aggressive radial downweighting plus shrinkage
Smooth heavy-tailed covariance-like estimate	`StudentTScatter`	softer radial weights than Cauchy
Shape estimation under elliptical heavy tails	`RegularizedTyler`	scale-free robust shape with shrinkage
Unsure which heavy-tail estimator to use	`AutoRobustScatter`	diagnostic or stability-based selection

References¶

See References for the full bibliography. Key background includes Rousseeuw and Van Driessen for FastMCD, Tyler for shape estimation, Wiesel for regularized robust covariance, and standard Student-t/Cauchy M-estimation literature.