Multimodal anomaly detection¶
A single robust covariance model assumes one main cloud. Many real datasets have several legitimate modes: customer segments, operating regimes, embedding clusters, or image-feature groups. In that setting, a global center can make valid small modes look suspicious.
Result at a glance¶
The cluster-aware detector improves F1 from 0.486 to 0.800 at the same detection budget. That gap is the whole point of the example: the problem is not only robustness to outliers, but respecting multiple valid centers.
What the data represent¶
The example creates three valid two-dimensional modes and a small set of anomalies placed between and around those modes.
Why this estimator¶
ClusterRobustOutlierDetector clusters first, then fits local robust scatter models. It is a pragmatic diagnostic, not a full robust mixture model.
Reproduce the result¶
python examples/use_case_multimodal_anomaly.py
Output from the run¶
multimodal anomaly detection demo
method,precision,recall,f1,detected
global RegularizedCauchy,0.486,0.486,0.486,35
cluster robust detector,0.800,0.800,0.800,35
ClusterRobustOutlierDetector(n_clusters=3); cluster_sizes=[194, 188, 193]; valid_clusters=[True, True, True]; threshold=9.043; detected=35
saved metrics to results/use_cases/multimodal_anomaly/metrics.csv
saved figure: results/use_cases/multimodal_anomaly/cluster_distance_panel.png
saved figure: results/use_cases/multimodal_anomaly/global_distance_profile.png
saved diagnostics to results/use_cases/multimodal_anomaly
Figures and diagnostics¶
How to read the result¶
The global profile answers “far from one global center?”; the cluster panel answers “far from the assigned local mode?” For multimodal data, the second question is usually the useful one.
What this does not prove¶
Use this when clusters are meaningful. If clustering is unstable or arbitrary, compare several n_clusters values and inspect cluster stability before trusting the outlier list.