Algorithms
==========

This page gives the mathematical and practical description of the estimators used by ``robustcov``. The package focuses on robust covariance/scatter estimation and robust-distance diagnostics, not on fitting a full probability model with a density, sampler, AIC, or BIC.

Notation
--------

Let :math:`X = \{x_i\}_{i=1}^n`, with :math:`x_i \in \mathbb{R}^p`. A location estimate is denoted :math:`\hat\mu`, a covariance or scatter estimate :math:`\hat\Sigma`, and the robust squared Mahalanobis distances are

.. math::

   d_i^2 = (x_i - \hat\mu)^T \hat\Sigma^{-1} (x_i - \hat\mu).

For shape-only estimators such as Tyler's estimator, the scale of :math:`\hat\Sigma` is not identified by the estimating equation. ``robustcov`` normalizes shape matrices and optionally applies a radial scale correction for diagnostics.

FastMCD / MinCovDet
-------------------

``FastMCD`` is the package's estimator for the classical contamination model: most observations come from a compact elliptical bulk and a minority are outliers. It approximates the minimum covariance determinant (MCD) problem: find the subset :math:`H` of size :math:`h` whose empirical covariance has the smallest determinant,

.. math::

   H^* \approx \arg\min_{|H|=h} \det\left( \frac{1}{h-1} \sum_{i \in H} (x_i-\bar x_H)(x_i-\bar x_H)^T \right).

The raw subset location (the subset mean :math:`\bar x_H`) and covariance are

.. math::

   \hat\mu_H = \frac{1}{h}\sum_{i\in H} x_i,
   \qquad
   \hat\Sigma_H = \frac{1}{h-1}\sum_{i\in H}(x_i-\hat\mu_H)(x_i-\hat\mu_H)^T.

The core of FastMCD is the **C-step** (concentration step). Starting from a candidate subset, compute Mahalanobis distances with the current subset location and covariance, then keep the :math:`h` observations with the smallest distances. The C-step is monotone: as long as the current subset covariance is nonsingular, the covariance determinant of the new subset is no larger than that of the old one, so repeated C-steps reach a fixed point after finitely many iterations.

.. code-block:: text

   random elemental starts
     ↓
   short C-steps
     ↓
   retain best determinant candidates
     ↓
   full C-step polishing
     ↓
   raw robust location/covariance
     ↓
   reweighting by robust distances
     ↓
   final location/covariance and support diagnostics

In ``robustcov``, the final covariance is computed on the selected/reweighted support. This matters under contamination: rescaling the final covariance using all observations can reintroduce the inflation caused by outliers.

``FastMCD`` works best when :math:`n \gg p` and the outliers are well separated from the bulk. It is not the right tool for covariance recovery when :math:`p > n` (a subset covariance built from :math:`h \le p` points is singular) or for diffuse heavy tails.

Tyler shape estimator
---------------------

Tyler's estimator is a distribution-free shape estimator for elliptical data. It estimates the shape matrix up to scale by solving the fixed-point equation

.. math::

   \hat S = \frac{p}{n}\sum_{i=1}^n \frac{z_i z_i^T}{z_i^T \hat S^{-1} z_i},
   \qquad
   z_i = x_i - \hat\mu,

together with a normalization such as

.. math::

   \operatorname{tr}(\hat S) = p.

The radial weight is

.. math::

   w_i(d_i^2) = \frac{p}{d_i^2}.

This makes Tyler's estimator highly robust to radial outliers: observations with large robust distances receive small weights, and each term :math:`z_i z_i^T / (z_i^T \hat S^{-1} z_i)` is unchanged when :math:`z_i` is rescaled, so no single observation can dominate the update through its magnitude alone. Because the estimator is shape-only, it is usually paired with a separate scale correction or used mainly for robust distances and shape diagnostics.

Regularized Tyler / KL Tyler / Wiesel Tyler
-------------------------------------------

When :math:`p` is close to :math:`n` or :math:`p > n`, unregularized scatter estimates can become singular or unstable. ``RegularizedTyler`` shrinks the Tyler update toward a target matrix :math:`T`, typically the identity or a diagonal target:

.. math::

   S_{\text{Tyler}} = \frac{p}{n}\sum_{i=1}^n \frac{z_i z_i^T}{z_i^T S^{-1} z_i},

.. math::

   S_{\text{new}} = (1-\alpha) S_{\text{Tyler}} + \alpha T,
   \qquad
   0 \leq \alpha \leq 1.

The result is normalized after each update.
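
To make the fixed point concrete, here is a minimal NumPy sketch of the shrinkage iteration, assuming a fixed location (the coordinatewise median unless one is supplied), an identity target by default, and the trace normalization :math:`\operatorname{tr}(S) = p` after every update. The helper ``regularized_tyler`` and its parameters are hypothetical and are not the ``robustcov`` API; with ``alpha=0`` it reduces to the plain Tyler iteration.

.. code-block:: python

   import numpy as np


   def regularized_tyler(X, alpha=0.1, target=None, mu=None, max_iter=200, tol=1e-8):
       """Shrinkage Tyler fixed-point iteration (hypothetical helper, not robustcov API).

       alpha=0 gives the plain Tyler iteration; the target defaults to the identity.
       """
       X = np.asarray(X, dtype=float)
       n, p = X.shape
       # Location is held fixed; the coordinatewise median default is an assumption.
       mu = np.median(X, axis=0) if mu is None else np.asarray(mu, dtype=float)
       Z = X - mu
       T = np.eye(p) if target is None else np.asarray(target, dtype=float)
       S = np.eye(p)
       for _ in range(max_iter):
           # d_i^2 = z_i^T S^{-1} z_i, via a linear solve instead of an explicit inverse
           d2 = np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))
           d2 = np.maximum(d2, 1e-12)  # guard against points sitting exactly at mu
           # S_Tyler = (p / n) * sum_i z_i z_i^T / d_i^2
           S_tyler = (p / n) * (Z.T / d2) @ Z
           S_new = (1.0 - alpha) * S_tyler + alpha * T
           S_new *= p / np.trace(S_new)  # normalize so that tr(S) = p
           if np.linalg.norm(S_new - S) < tol * np.linalg.norm(S):
               return S_new
           S = S_new
       return S

With ``alpha > 0`` and a positive-definite target, every update contains the full-rank term :math:`\alpha T`, so the iterate stays invertible even when :math:`p > n`; that is the conditioning benefit of shrinkage in its simplest form.
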

Shrinkage improves conditioning and makes the estimator usable in high-dimensional, small-sample regimes. In the current MVP, ``KLRegularizedTyler`` and ``WieselTyler`` are documented aliases for this regularized Tyler prototype. They reserve the API names for a future implementation of the exact objective-specific estimators.

Geometry note. Regularized Tyler and Wiesel-style estimators are often analyzed through the geometry of the cone of symmetric positive-definite matrices. Their objectives can be geodesically convex under appropriate formulations, even when they are not convex in the ordinary Euclidean sense as functions of the matrix entries. This is why the package documentation separates the fixed-point update used in the MVP from stronger mathematical claims about an exact KL/Wiesel objective. The current implementation is pragmatic; future versions may expose objective-level solvers once the exact formulation is stabilized.

Student-t scatter
-----------------

``StudentTScatter`` is an iteratively reweighted covariance estimator motivated by the multivariate Student-t model with fixed degrees of freedom :math:`\nu`. Given squared robust distances :math:`d_i^2`, it uses the radial weight

.. math::

   w_i(d_i^2) = \frac{\nu + p}{\nu + d_i^2}.

The weighted update is

.. math::

   S_{\text{M}} = \frac{1}{\sum_i w_i}\sum_{i=1}^n w_i z_i z_i^T,

followed by optional shrinkage

.. math::

   S_{\text{new}} = (1-\alpha)S_{\text{M}} + \alpha T.

Smaller :math:`\nu` means heavier tails and more aggressive downweighting. Unlike MCD, Student-t scatter does not try to identify a hard subset. It is therefore useful when the whole data set is heavy-tailed, rather than consisting of clean data plus a clearly separated outlier cloud.

Regularized Cauchy
------------------

``RegularizedCauchy`` is the very-heavy-tail member of the same M-estimator family. It combines a Cauchy-like radial downweighting rule with shrinkage toward a stable target.
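
The Student-t weight above and the Cauchy-like rule of this section belong to one family: setting :math:`\nu = 1` in :math:`(\nu + p)/(\nu + d_i^2)` gives :math:`(1 + p)/(1 + d_i^2)`. The hypothetical ``t_scatter`` helper below sketches that reweighting loop with optional shrinkage in NumPy. It is not the ``robustcov`` implementation; re-estimating the location as a weighted mean, starting from the median and the identity, and the scaled-identity default target are all assumptions made for the sketch.

.. code-block:: python

   import numpy as np


   def t_scatter(X, nu=1.0, alpha=0.0, target=None, max_iter=500, tol=1e-9):
       """Reweighted Student-t scatter with optional shrinkage (hypothetical helper).

       nu=1 gives the Cauchy-like weights (1 + p) / (1 + d^2).
       Returns (location, scatter).
       """
       X = np.asarray(X, dtype=float)
       n, p = X.shape
       mu = np.median(X, axis=0)  # robust starting location (assumption)
       S = np.eye(p)              # starting scatter
       for _ in range(max_iter):
           Z = X - mu
           d2 = np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))
           w = (nu + p) / (nu + d2)       # radial weights w_i(d_i^2)
           mu = w @ X / w.sum()           # weighted location update (assumption)
           Z = X - mu
           S_m = (Z.T * w) @ Z / w.sum()  # S_M = sum_i w_i z_i z_i^T / sum_i w_i
           if target is None:
               # Identity scaled to the mean eigenvalue of S_M (assumption), so that
               # shrinkage does not pull the overall scale toward 1.
               T = np.trace(S_m) / p * np.eye(p)
           else:
               T = np.asarray(target, dtype=float)
           S_new = (1.0 - alpha) * S_m + alpha * T
           if np.linalg.norm(S_new - S) < tol * np.linalg.norm(S):
               return mu, S_new
           S = S_new
       return mu, S

The weights interpolate between familiar limits: as :math:`\nu \to \infty` every :math:`w_i \to 1` and :math:`S_{\text{M}}` becomes the sample covariance, while as :math:`\nu \to 0` the weights approach Tyler's :math:`p/d_i^2`.
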

In practice, ``RegularizedCauchy`` is currently the flagship estimator for small-sample heavy-tail covariance recovery. A simplified view is

.. math::

   w_i(d_i^2) \propto \frac{1 + p}{1 + d_i^2},
   \qquad
   S_{\text{new}} = (1-\alpha)S_{\text{Cauchy}} + \alpha T,

where the weight is the Student-t weight with :math:`\nu = 1` (the multivariate Cauchy case) and :math:`S_{\text{Cauchy}}` denotes the corresponding weighted update, formed like :math:`S_{\text{M}}` with these weights.

The benchmark gallery shows that this combination of aggressive radial downweighting and shrinkage can strongly outperform the empirical covariance, Ledoit-Wolf, OAS, and MCD when the data are very heavy-tailed and :math:`p` is close to or larger than :math:`n`.

HellingerRegularizedTyler, experimental
---------------------------------------

``HellingerRegularizedTyler`` is intentionally marked experimental. It applies Tyler-like radial weights with shrinkage performed in square-root space. It is useful for exploratory comparisons, but it should not yet be cited as the exact optimizer of a specific Hellinger objective. The API remains labeled experimental until the objective and the fixed-point update are finalized.

AutoRobustScatter
-----------------

``AutoRobustScatter`` is a practical selector. It fits a small set of candidate estimators and chooses one using a diagnostic or stability score.

.. code-block:: text

   candidate estimators
     ↓
   fit each candidate
     ↓
   compute convergence, condition, tail, and distance diagnostics
     ↓
   optionally compute split-sample stability
     ↓
   choose the lowest score

The diagnostic score combines convergence, finite-covariance checks, condition-number penalties, and tail diagnostics. The stability score additionally accounts for split-sample scatter stability. This is not an oracle: it is a pragmatic default for users who do not yet know whether Cauchy, Student-t, or Tyler is the best fit.

Multimodal robust diagnostics
-----------------------------

A single robust covariance estimator is designed for data that are approximately one central elliptical cloud plus contamination. A genuinely multimodal distribution may contain several valid clouds:

.. math::

   X \sim (1-\epsilon)\sum_{k=1}^K \pi_k F_k + \epsilon G,
   \qquad
   \sum_{k=1}^K \pi_k = 1,

where each :math:`F_k` is a legitimate local population, :math:`G` is contamination, and :math:`\epsilon` is the contamination fraction. If a single global covariance is fitted to this mixture, smaller valid modes may be assigned very large robust distances and incorrectly flagged as outliers.

``ClusterRobustOutlierDetector`` is a pragmatic diagnostic for this case. It is not a full robust mixture model. It uses a two-stage procedure:

.. code-block:: text

   cluster observations into K modes
     ↓
   fit a robust scatter estimator inside each cluster
     ↓
   score each point by its distance to its assigned local cluster
     ↓
   flag points with large local robust distances

For an observation assigned to cluster :math:`c(i)`, the local score is

.. math::

   d_i^2 = (x_i - \hat\mu_{c(i)})^T \hat\Sigma_{c(i)}^{-1} (x_i - \hat\mu_{c(i)}).

This is useful when multiple clusters are valid and each cluster is locally elliptical. It should not be presented as a replacement for robust mixture modeling: there is no likelihood, no EM algorithm, no automatic selection of the number of components, and no claim that the clustering step is itself robust. Its purpose is to prevent a global robust covariance model from treating legitimate modes as outliers.

A future experimental layer could add trimmed Gaussian mixtures or robust Student-t mixtures, but that would move the package toward robust clustering. The current feature stays within the package scope: robust scatter plus interpretable diagnostics.

Robust-distance diagnostics
---------------------------

All estimators can be inspected through robust distances. ``robustcov`` reports radial kurtosis, QQ-tail deviation, condition number, and detected fraction, and provides distance-profile plots. A useful normalized radial kurtosis diagnostic is

.. math::

   \kappa_r = \frac{\mathbb{E}[d^4]}{p(p+2)},

which is close to one for an ideal Gaussian model and larger for heavy tails or outlier-contaminated data. The normalization comes from the Gaussian case: with the true location and covariance, :math:`d^2 \sim \chi^2_p`, whose second moment is :math:`2p + p^2 = p(p+2)`.
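
A minimal NumPy sketch of the radial kurtosis diagnostic, assuming a location and scatter have already been estimated. The helper name ``radial_kurtosis`` is hypothetical; the demo plugs in the classical mean and covariance only to keep the example self-contained, whereas in practice the inputs would be a robust location/scatter pair.

.. code-block:: python

   import numpy as np


   def radial_kurtosis(X, mu, S):
       """kappa_r = mean(d_i^4) / (p (p + 2)) for a given location and scatter."""
       X = np.asarray(X, dtype=float)
       p = X.shape[1]
       Z = X - mu
       d2 = np.einsum("ij,ji->i", Z, np.linalg.solve(S, Z.T))  # squared distances d_i^2
       return np.mean(d2 ** 2) / (p * (p + 2))


   rng = np.random.default_rng(1)
   G = rng.standard_normal((5000, 3))
   k_gauss = radial_kurtosis(G, G.mean(axis=0), np.cov(G, rowvar=False))  # close to 1

   # Append 100 identical far-away points to 2000 Gaussian ones.
   X = np.vstack([G[:2000], np.full((100, 3), 8.0)])
   k_cont = radial_kurtosis(X, X.mean(axis=0), np.cov(X, rowvar=False))  # clearly above 1

Even though the classical covariance is itself inflated by the planted points, :math:`\kappa_r` still rises well above one for the contaminated sample.
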

In practice, radial kurtosis should be interpreted together with QQ plots and the distance profile: high radial kurtosis can be a valid property of heavy-tailed data, not necessarily estimator failure.

Estimator selection summary
---------------------------

.. list-table:: Practical estimator guidance
   :header-rows: 1

   * - Situation
     - Recommended estimator
     - Reason
   * - Separable outliers, :math:`n \gg p`
     - ``FastMCD``
     - robust subset/support estimation and classical outlier diagnostics
   * - Small sample, heavy tails, :math:`p \approx n` or :math:`p > n`
     - ``RegularizedCauchy``
     - aggressive radial downweighting plus shrinkage
   * - Smooth heavy-tailed covariance-like estimate
     - ``StudentTScatter``
     - softer radial weights than Cauchy
   * - Shape estimation under elliptical heavy tails
     - ``RegularizedTyler``
     - scale-free robust shape with shrinkage
   * - Unsure which heavy-tail estimator to use
     - ``AutoRobustScatter``
     - diagnostic or stability-based selection

References
----------

See :doc:`references` for the full bibliography. Key background includes Rousseeuw and Van Driessen for FastMCD, Tyler for shape estimation, Wiesel for regularized robust covariance, and standard Student-t/Cauchy M-estimation literature.