Exact Solutions to the Sparse Clustering Problem
Conference: Regional Statistics Conference 2026
Format: IPS Abstract - Malta 2026
Keywords: clustering, sparsity
Session: IPS 1230 - Advances in Statistical Learning
Wednesday 3 June 2:30 p.m. - 4:10 p.m. (Europe/Malta)
Abstract
In this research, we consider the Sparse Clustering Problem, a variant of the Minimum Sum-of-Squares Clustering (MSSC) problem. MSSC assumes Euclidean distances and cannot identify which features to ignore, so it performs poorly when the data contain noisy or redundant dimensions. To address the latter limitation, recent work introduced feature weights constrained by Lasso- and Ridge-inspired penalties to induce sparsity. Because of the problem's complexity, only heuristic methods are currently available to solve it. We develop a branch-and-cut framework that supports general dissimilarity measures and finds provably optimal solutions to the Sparse Clustering Problem for instances with up to 100 observations within seconds.
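The abstract does not state the formulation, so the sketch below assumes the sparse k-means objective of Witten and Tibshirani (2010): maximise sum_j w_j * a_j, where a_j is the between-cluster sum of squares of feature j, subject to ||w||_2 <= 1, ||w||_1 <= s and w >= 0. It is not the authors' branch-and-cut method. It simply enumerates every partition, which is only feasible for a handful of observations, but it illustrates why a provably optimal solution is well defined: for a fixed partition the weight subproblem has a closed-form soft-thresholding solution, so taking the best partition gives the global optimum. It uses squared Euclidean distances rather than the general dissimilarities the abstract mentions, and the function names are hypothetical.

```python
import itertools

import numpy as np


def per_feature_bcss(X, labels):
    """Between-cluster sum of squares per feature: a_j = TSS_j - WCSS_j."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return total - within


def optimal_weights(a, s, iters=200):
    """Maximise w @ a s.t. ||w||_2 <= 1, ||w||_1 <= s, w >= 0.

    Uses the soft-thresholding solution w = S(a_+, delta) / ||S(a_+, delta)||_2,
    with delta found by bisection. Ties among the largest a_j are not handled.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    a_plus = np.maximum(a, 0.0)

    def normalised(delta):
        v = np.maximum(a_plus - delta, 0.0)
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v

    w = normalised(0.0)
    if w.sum() <= s:
        return w  # the L1 constraint is inactive
    lo, hi = 0.0, a_plus.max()  # L1 norm too large at lo, small enough at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if normalised(mid).sum() > s:
            lo = mid
        else:
            hi = mid
    return normalised(hi)


def exact_sparse_kmeans(X, k, s):
    """Brute-force global optimum over all k-partitions (tiny n only)."""
    n = X.shape[0]
    best_value, best_labels, best_w = -np.inf, None, None
    # Fixing observation 0 in cluster 0 removes some label-permutation symmetry.
    for rest in itertools.product(range(k), repeat=n - 1):
        labels = np.array((0,) + rest)
        if len(np.unique(labels)) != k:
            continue  # skip assignments that leave a cluster empty
        a = per_feature_bcss(X, labels)
        w = optimal_weights(a, s)
        value = float(w @ a)
        if value > best_value:
            best_value, best_labels, best_w = value, labels, w
    return best_value, best_labels, best_w


# Toy data: two informative features and one noise feature (column 2).
X = np.array([
    [0.0, 0.0, 0.3],
    [0.1, 0.2, -0.4],
    [0.2, 0.1, 0.5],
    [5.0, 3.1, -0.2],
    [5.1, 2.9, 0.4],
    [4.9, 3.0, -0.1],
])
value, labels, w = exact_sparse_kmeans(X, k=2, s=1.2)
print(labels)  # expected: [0 0 0 1 1 1]
print(w)       # the noise feature receives weight 0
```

Enumeration grows as k^(n-1), which is why exact methods for realistic instance sizes need bounding and cutting rather than brute force.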