A comparison between initialization strategies for the infinite hidden Markov model
Conference
Regional Statistics Conference 2026
Format: CPS Abstract - Malta 2026
Keywords: clustering, Markov process, nonparametric Bayesian methods, time-series data
Session: CPS 02 Time Series
Wednesday 3 June, 10–11 a.m. (Europe/Malta)
Abstract
Infinite hidden Markov models (iHMMs) provide a flexible Bayesian nonparametric framework for modelling multivariate time series characterized by structural changes, nonlinear dynamics, and regime shifts. They assume the existence of a latent process that evolves according to a first-order Markov chain and governs state-dependent emission distributions. A key advantage of this framework is that the number of latent states, K, does not need to be specified a priori: model complexity is inferred directly from the data through a countably infinite state space, of which only a finite subset is effectively used. This contrasts with finite hidden Markov models, where K must be selected in advance, typically by fitting multiple models over a grid of candidate values and choosing among them using information criteria or cross-validation. The flexibility of the iHMM is achieved by placing a hierarchical Dirichlet process (HDP) prior on the transition probabilities, while posterior inference is typically carried out via the beam sampler, which combines dynamic programming with slice sampling to adaptively truncate the infinite state space.
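As a point of reference, the HDP prior underlying the iHMM is commonly written in stick-breaking form; the following schematic uses standard notation (GEM weights $\beta$, concentration parameters $\gamma$ and $\alpha$, base measure $H$), which is a conventional assumption rather than taken from this abstract:

\[
\beta \sim \mathrm{GEM}(\gamma), \qquad
\pi_k \mid \beta \sim \mathrm{DP}(\alpha, \beta), \qquad
\theta_k \sim H, \qquad k = 1, 2, \dots
\]
\[
z_t \mid z_{t-1} \sim \pi_{z_{t-1}}, \qquad
y_t \mid z_t \sim \mathcal{N}(\mu_{z_t}, \Sigma_{z_t}).
\]

In the beam sampler, an auxiliary slice variable $u_t \sim \mathrm{Uniform}(0, \pi_{z_{t-1} z_t})$ is drawn at each time point, so that only the finitely many transitions with probability exceeding $u_t$ enter the forward-backward recursion.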
Despite the widespread use of iHMMs across disciplines such as econometrics, signal processing, and the environmental sciences, the role of initialization in posterior inference has received little systematic attention. The existing literature commonly adopts a uniform initialization of latent states, as originally suggested for the beam sampler, implicitly assuming robustness to initial conditions. However, evidence from finite HMMs and mixture models indicates that initialization can critically affect convergence speed, stability, and the quality of the inferred latent structure.
This study addresses this gap by conducting the first comprehensive investigation of initialization strategies for iHMMs with Gaussian emissions. We evaluate several approaches commonly employed in finite HMMs, including uniform initialization, distance-based clustering methods (k-means and partitioning around medoids), and Gaussian mixture initializations. Their performance is assessed through extensive simulation studies under both correctly specified (Gaussian) and misspecified (heavy-tailed) scenarios, as well as through applications to two real-world datasets.
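To make the compared strategies concrete, a minimal sketch of how an initial state sequence might be produced is given below, assuming scikit-learn (and, for partitioning around medoids, the scikit-learn-extra package); the function name initial_states and the use of a fixed K for the initial assignment are illustrative assumptions, not the authors' implementation. For an iHMM, K is only the number of initially occupied states; the beam sampler may later create or retire states.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def initial_states(y, K, strategy="kmeans", seed=0):
    """Return an initial latent state sequence z_1..z_T for observations y (a T x d array)."""
    rng = np.random.default_rng(seed)
    if strategy == "uniform":
        # Uniform initialization: each time point gets a state drawn at random.
        return rng.integers(0, K, size=len(y))
    if strategy == "kmeans":
        # Distance-based initialization via k-means cluster labels.
        return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(y)
    if strategy == "pam":
        # Partitioning around medoids; KMedoids lives in scikit-learn-extra.
        from sklearn_extra.cluster import KMedoids
        return KMedoids(n_clusters=K, method="pam", random_state=seed).fit_predict(y)
    if strategy == "gmm":
        # Model-based initialization via Gaussian mixture labels.
        return GaussianMixture(n_components=K, random_state=seed).fit_predict(y)
    raise ValueError(f"unknown strategy: {strategy}")

In practice one would also initialize the emission parameters from the per-cluster sample moments; that step is omitted here for brevity.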
Our results show that distance-based clustering initializations consistently outperform the alternatives in terms of latent state recovery, convergence behaviour, and robustness to distributional misspecification. Uniform initialization performs poorly across nearly all scenarios, even when regimes are well separated. Gaussian mixture initializations remain competitive in low-dimensional Gaussian settings but deteriorate as dimensionality increases or Gaussianity assumptions are violated. Empirical applications to European industrial production indices and the Old Faithful geyser dataset confirm these findings, highlighting the substantial impact of initialization on model interpretability and computational efficiency.