Regional Statistics Conference 2026

MLUtils: an Official Statistics-oriented common interface for Machine Learning

Format: CPS Abstract - Malta 2026

Keywords: machine-learning, official-statistics, software

Session: CPS 04 Dissemination Communication Software

Wednesday 3 June 10 a.m. - 11 a.m. (Europe/Malta)

Abstract

The use of machine learning techniques is becoming increasingly common in Official Statistics, and these techniques are expected to soon become an important part of production pipelines. While many R and Python libraries provide a wide variety of machine learning techniques and implementations, they have some drawbacks when used in official statistics production. First, every library has its own syntax, so replacing one method with another may require rewriting the script. Second, these libraries rarely offer straightforward ways to address problems that arise when machine learning is applied to official statistics, such as model-assisted estimation of aggregates under complex survey designs, the need to enforce restrictions between variables, or the use of models for semicontinuous variables.
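
As an illustration of the first problem in that list, the sketch below computes the model-assisted difference estimator of a population total, t_diff = sum over k in U of ŷ_k + sum over k in s of (y_k - ŷ_k) / π_k, where ŷ_k are predictions from any fitted model, U is the population, s is the sample and π_k are first-order inclusion probabilities. It is a generic textbook formulation that assumes predictions are available for every population unit; the function name and its arguments are hypothetical and do not describe the MLUtils API.

```python
from typing import Sequence


def difference_estimator(
    y_pred_population: Sequence[float],
    y_sample: Sequence[float],
    y_pred_sample: Sequence[float],
    pi_sample: Sequence[float],
) -> float:
    """Model-assisted (difference) estimator of a population total.

    Hypothetical helper, not part of MLUtils.
    y_pred_population: model predictions for every unit in the population U.
    y_sample, y_pred_sample: observed values and predictions for the sampled units s.
    pi_sample: first-order inclusion probabilities of the sampled units.
    """
    if not (len(y_sample) == len(y_pred_sample) == len(pi_sample)):
        raise ValueError("sample vectors must have equal length")
    if any(p <= 0 or p > 1 for p in pi_sample):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Synthetic part: sum of the model predictions over the whole population.
    synthetic = sum(y_pred_population)
    # Horvitz-Thompson estimate of the total residual, which corrects model bias.
    correction = sum(
        (y - y_hat) / p for y, y_hat, p in zip(y_sample, y_pred_sample, pi_sample)
    )
    return synthetic + correction


# Example: six population units, of which the 1st and 3rd are sampled.
estimate = difference_estimator(
    y_pred_population=[10, 12, 9, 11, 10, 8],
    y_sample=[12, 10],
    y_pred_sample=[10, 9],
    pi_sample=[0.5, 0.25],
)
print(estimate)  # 68.0
```

Here the predictions sum to 60 and the weighted residuals add (12 - 10) / 0.5 + (10 - 9) / 0.25 = 8, giving 68.0. The survey design enters only through the correction term, so the prediction model can be swapped without touching that part of the computation.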

MLUtils is a library, available in R and Python versions, that provides a common interface for the machine learning tasks that may be part of the standard processes of a statistical office. Its purpose is to standardize the use of machine learning across official statistical processes, while offering functionalities that are often essential in this domain but uncommon in general-purpose machine learning applications. The library follows an object-oriented design and is highly modular, which makes it straightforward to extend with more sophisticated techniques as needs evolve.
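
The idea of a common, object-oriented interface can be sketched with an abstract base class: every method exposes the same fit/predict methods, so a production step written against the interface does not change when one method is replaced with another. The names Regressor, MeanRegressor, LinearRegressor and impute are hypothetical and only illustrate the design principle; they are not the MLUtils API, whose actual classes may differ.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Regressor(ABC):
    """Hypothetical common interface: every method exposes fit/predict."""

    @abstractmethod
    def fit(self, x: Sequence[float], y: Sequence[float]) -> "Regressor": ...

    @abstractmethod
    def predict(self, x: Sequence[float]) -> List[float]: ...


class MeanRegressor(Regressor):
    """Baseline that predicts the sample mean for every unit."""

    def fit(self, x: Sequence[float], y: Sequence[float]) -> "MeanRegressor":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x: Sequence[float]) -> List[float]:
        return [self.mean_ for _ in x]


class LinearRegressor(Regressor):
    """Ordinary least squares with one covariate and an intercept."""

    def fit(self, x: Sequence[float], y: Sequence[float]) -> "LinearRegressor":
        n = len(x)
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0:
            raise ValueError("covariate is constant; slope is undefined")
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, x: Sequence[float]) -> List[float]:
        return [self.intercept_ + self.slope_ * xi for xi in x]


def impute(model: Regressor, x_obs, y_obs, x_missing) -> List[float]:
    """Production step that accepts any Regressor: changing the method
    means changing the object passed in, not the script."""
    return model.fit(x_obs, y_obs).predict(x_missing)


x_obs, y_obs, x_missing = [1, 2, 3], [2, 4, 6], [4]
print(impute(MeanRegressor(), x_obs, y_obs, x_missing))    # [4.0]
print(impute(LinearRegressor(), x_obs, y_obs, x_missing))  # [8.0]
```

Swapping the baseline for the linear model changes only the object passed to impute; in the example the imputed value for x = 4 moves from 4.0 to 8.0 while the calling code stays the same. New methods are added by writing another subclass, which is the kind of modularity the paragraph above describes.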

The improvement in the quality of statistical production achieved by using MLUtils is twofold. On the one hand, by standardizing the use of machine learning in production workflows, it facilitates reproducibility, reduces implementation effort, and promotes harmonization of methodologies across teams and departments. On the other hand, the techniques implemented in the package make it easy to implement new uncertainty quantification measures and quality indicators tailored specifically to official statistics, such as design-based predictive inference.