
Data-Driven Identification of Heterogeneous Measurement Uncertainty

Author

Conference

2026 IAOS Conference

Format: CPS Abstract - IAOS 2026

Keywords: clustering, complex observational data, data-driven methods, heteroscedasticity, measurement uncertainty, mixed models, unsupervised learning

Session: Complex analysis & indicators in official statistics (2)

Wednesday 13 May 2:30 p.m. - 4 p.m. (Europe/Vilnius)

Abstract

Reliable quantification of measurement uncertainty is a central requirement for statistical inference and evidence-based decision-making from observational data. In many applied settings, observation errors are heterogeneous and driven by complex, partially unobserved conditions, whereas standard statistical practice often relies on homogeneous variance assumptions or coarse categorical groupings. Such simplifications can lead to misspecified stochastic models and overoptimistic assessments of precision.

This contribution presents a data-driven framework for identifying and modelling heterogeneous measurement uncertainty by integrating unsupervised learning into mixed-effects modelling. Auxiliary predictors describing individual observations are first examined through exploratory data analysis and unsupervised clustering to reveal latent measurement conditions that conventional metadata do not record explicitly. The cluster memberships then enter a linear mixed model as random-effect design matrices, so that variance component estimation by restricted maximum likelihood (REML) quantifies how much each observational regime contributes to measurement variability. The uncertainty of the estimated variance components is assessed through likelihood-based inference using the associated Fisher information.
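A minimal Python sketch of the clustering step follows. The abstract does not name a clustering algorithm, so plain k-means with k-means++ seeding on z-scored predictors is assumed here; the two auxiliary predictors (sight length and temperature gradient) and their simulated values are hypothetical. The helper `selector_matrices` shows one possible, assumed construction of the random-effect design matrices: each observation in cluster k receives its own random effect, so Z_k Z_k^T is the diagonal indicator of that cluster and the covariance V = sum_k sigma_k^2 Z_k Z_k^T carries one variance component per regime.

```python
import math
import random


def standardize(rows):
    """Z-score each column so predictors measured in different units get equal weight."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns one cluster label per point."""
    rng = random.Random(seed)

    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: draw the next centre with probability proportional to squared distance
        weights = [min(d2(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights)[0])
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: d2(p, centers[j])) for p in points]
        if new == labels:  # assignments no longer change: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def selector_matrices(labels, k):
    """Z_k (n x n_k) gives each observation of cluster k its own random effect,
    so Z_k Z_k^T = diag(1[label == k]) and V = sum_k sigma_k^2 Z_k Z_k^T."""
    n = len(labels)
    Z = []
    for j in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        Z.append([[1.0 if i == m else 0.0 for m in idx] for i in range(n)])
    return Z


# Hypothetical auxiliary predictors per observation: (sight length in m, temperature gradient in K/m).
rng = random.Random(42)
regime_a = [[rng.gauss(15, 3), rng.gauss(0.2, 0.1)] for _ in range(40)]
regime_b = [[rng.gauss(45, 5), rng.gauss(1.0, 0.2)] for _ in range(40)]
features = regime_a + regime_b

labels = kmeans(standardize(features), k=2)
Z = selector_matrices(labels, k=2)
print("cluster sizes:", [labels.count(j) for j in range(2)])
```

Here k = 2 only matches the two simulated regimes; in practice the number of clusters is a modelling choice informed by the exploratory analysis.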

We test this framework on a high-precision geodetic levelling network. These data pose a significant modelling challenge because the observations are not independent: they are linked through a complex network topology defined by the functional measurement model, and they are influenced by a wide range of geometric and environmental covariates. Model comparisons show that including cluster-derived stochastic components substantially improves model adequacy relative to homogeneous variance formulations. The resulting heteroscedastic model produces better-behaved standardized residuals and reveals that classical least-squares adjustment tends to underestimate variability. The proposed framework, by contrast, provides statistically rigorous and more realistic uncertainty bounds, avoiding the overoptimistic precision often induced by misspecified homogeneous models.
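The variance-component step can be illustrated on a toy levelling network with simulated data. This is a sketch of the general technique, not the authors' implementation: the heights, standard deviations and group labels are hypothetical (the labels stand in for cluster memberships), and the helpers `projector`, `score_terms` and `reml_vce` are illustrative names. Observations are height differences H_j - H_i with point 0 as the datum, and the stochastic model is assumed to be V = sum_k theta_k D_k, where D_k is the diagonal indicator of regime k. REML estimates are obtained by Fisher scoring; because P V P = P, each step reduces to solving F theta = q with q_k = 1/2 y'P D_k P y and F_kl = 1/2 tr(P D_k P D_l). Standard errors come from the inverse Fisher information, and the restricted log-likelihoods of a homogeneous (one-component) and a cluster-based (two-component) model are compared through a likelihood-ratio statistic.

```python
import math
import random


def inverse_and_logdet(m):
    """Gauss-Jordan inverse of a small symmetric positive-definite matrix, plus log|m|."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    logdet = 0.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[piv] = a[piv], a[c]
        pv = a[c][c]
        logdet += math.log(abs(pv))  # product of pivots = |det|, and det > 0 for SPD input
        a[c] = [v / pv for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a], logdet


def projector(X, groups, theta):
    """P = W - W X (X'W X)^-1 X'W with W = V^-1 = diag(1 / theta[group_i])."""
    n, p = len(X), len(X[0])
    w = [1.0 / theta[g] for g in groups]
    N = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Ninv, logdetN = inverse_and_logdet(N)
    WXN = [[w[i] * sum(X[i][c] * Ninv[c][b] for c in range(p)) for b in range(p)] for i in range(n)]
    P = [[(w[i] if i == j else 0.0) - w[j] * sum(WXN[i][b] * X[j][b] for b in range(p))
          for j in range(n)] for i in range(n)]
    return P, logdetN


def score_terms(y, P, groups, k):
    """q_k = 1/2 y'P D_k P y and Fisher information F_kl = 1/2 tr(P D_k P D_l); also y'Py."""
    n = len(y)
    Py = [sum(P[i][j] * y[j] for j in range(n)) for i in range(n)]
    q = [0.0] * k
    F = [[0.0] * k for _ in range(k)]
    for i in range(n):
        q[groups[i]] += 0.5 * Py[i] ** 2
        for j in range(n):
            F[groups[i]][groups[j]] += 0.5 * P[i][j] ** 2
    return q, F, sum(a * b for a, b in zip(y, Py))


def reml_vce(y, X, groups, k, iters=200, tol=1e-8):
    """REML variance components for V = sum_k theta_k D_k by Fisher scoring.
    Because P V P = P, each scoring step reduces to solving F theta = q."""
    theta = [1.0] * k
    for _ in range(iters):
        P, _ = projector(X, groups, theta)
        q, F, _ = score_terms(y, P, groups, k)
        Finv, _ = inverse_and_logdet(F)
        new = [sum(Finv[a][b] * q[b] for b in range(k)) for a in range(k)]
        step = max(abs(u - v) for u, v in zip(new, theta))
        theta = new
        if step < tol * max(theta):
            break
    P, logdetN = projector(X, groups, theta)
    _, F, yPy = score_terms(y, P, groups, k)
    Finv, _ = inverse_and_logdet(F)
    se = [math.sqrt(Finv[a][a]) for a in range(k)]  # from the inverse Fisher information
    n, p = len(y), len(X[0])
    logdetV = sum(math.log(theta[g]) for g in groups)
    loglik = -0.5 * (logdetV + logdetN + yPy + (n - p) * math.log(2 * math.pi))
    return theta, se, loglik


# Toy levelling network (hypothetical): point 0 is the datum (H = 0), points 1-4 are unknown.
true_h = [0.0, 1200.0, -500.0, 2300.0, 800.0]  # mm
sigma = [1.0, 3.0]  # mm, true standard deviation of each regime
rng = random.Random(1)
X, y, groups = [], [], []
for i in range(5):
    for j in range(i + 1, 5):
        for rep in range(12):
            g = rep % 2  # stand-in for a cluster label
            row = [0.0] * 4
            row[j - 1] = 1.0  # observed height difference H_j - H_i
            if i > 0:
                row[i - 1] = -1.0
            X.append(row)
            y.append(true_h[j] - true_h[i] + rng.gauss(0.0, sigma[g]))
            groups.append(g)

theta_hom, se_hom, ll_hom = reml_vce(y, X, [0] * len(y), k=1)
theta_het, se_het, ll_het = reml_vce(y, X, groups, k=2)
lr = 2.0 * (ll_het - ll_hom)  # restricted likelihood-ratio statistic
print("homogeneous  :", theta_hom, se_hom)
print("cluster-based:", theta_het, se_het, "LR =", round(lr, 1))
```

Both models share the same fixed-effects design X, which is what makes their restricted likelihoods comparable; under equal regime variances the statistic would be approximately chi-square with one degree of freedom. The dense pure-Python matrices keep the sketch dependency-free but limit it to small illustrative networks.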

Beyond improved model adequacy, the framework yields an interpretable decomposition of measurement uncertainty into data-driven components associated with distinct observational conditions. By combining unsupervised learning with classical likelihood-based inference, the proposed approach offers a practical and transferable strategy for improving uncertainty quantification in complex observational data, with relevance well beyond the specific application considered here.