Regional Statistics Conference 2026

R Package VIM - The new Function vimpute and its Impact on Imputation Quality

Format: CPS Abstract - Malta 2026

Keywords: imputation, missing data, sequential

Session: CPS 08 Quality

Thursday 4 June 11 a.m. - noon (Europe/Malta)

Abstract

The vimpute() function extends the R package VIM with a unified and robust framework for imputing missing values in complex datasets. Implemented as a single, user-friendly function, vimpute() imputes multiple variables in one call while allowing variable-specific imputation models that adapt to the statistical properties of each variable.
The method follows a sequential imputation strategy and is fully integrated with the mlr3 ecosystem, supporting modern machine learning algorithms such as Random Forests and XGBoost. For numerical variables, predictive mean matching (PMM) is applied to preserve the original data distribution, while categorical variables are imputed stochastically based on predicted class probabilities. A notable innovation is optional automatic hyperparameter tuning, which searches for well-performing model settings for each variable and improves predictive performance without manual intervention.
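The two imputation steps described above can be illustrated with a minimal sketch. It is written in plain Python for illustration only and is not the VIM implementation: the function names pmm_impute and draw_category, the donor count k, and the input layout are all hypothetical, and the predictions are assumed to come from some already fitted model.

```python
import random


def pmm_impute(predicted_missing, predicted_observed, observed_values, k=5, rng=None):
    """Predictive mean matching (hypothetical helper, not the VIM API).

    For each missing case, find the k observed cases whose model predictions
    are closest to the prediction for the missing case, pick one of them at
    random, and use that donor's actually observed value.
    """
    if rng is None:
        rng = random.Random()
    imputed = []
    for pred in predicted_missing:
        # Rank observed cases by distance between their prediction and ours.
        order = sorted(
            range(len(predicted_observed)),
            key=lambda i: abs(predicted_observed[i] - pred),
        )
        donor = rng.choice(order[:k])
        # The imputed value is a real observed value, never the raw prediction.
        imputed.append(observed_values[donor])
    return imputed


def draw_category(class_probs, rng=None):
    """Draw one category at random according to predicted class probabilities."""
    if rng is None:
        rng = random.Random()
    labels = list(class_probs)
    weights = [class_probs[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up predictions and observed incomes, purely for demonstration.
    print(pmm_impute([2.1, 0.4], [1.0, 2.0, 3.0], [1500, 2100, 3200], k=2, rng=rng))
    print(draw_category({"employed": 0.7, "unemployed": 0.2, "inactive": 0.1}, rng=rng))
```

Because PMM only ever returns values that were actually observed, imputed values stay within the observed range; drawing categories from the predicted probabilities, rather than always taking the most likely class, avoids inflating the share of the dominant category.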
The imputation process iterates automatically and includes a convergence check so that iteration stops once the imputed values have stabilised. In simulation studies and real-world applications, vimpute() performs well in terms of statistical quality and practical usability, making it particularly well-suited for official statistics and empirical research where transparency, flexibility, and high data quality are essential.
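The idea of iterating until the imputations stabilise can be sketched as follows. This is a deliberately simplified Python illustration under stated assumptions, not the procedure used by vimpute(): it handles exactly two numeric columns, a simple least-squares line stands in for the machine learning models, and the names fit_line and sequential_impute as well as the max_iter and tol parameters are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def sequential_impute(data, max_iter=50, tol=1e-6):
    """Sequentially impute two numeric columns (None marks a missing value).

    Returns the completed columns and the number of iterations used.
    """
    cols = list(data)
    assert len(cols) == 2, "this sketch handles exactly two columns"
    missing = {c: [i for i, v in enumerate(data[c]) if v is None] for c in cols}

    # Start from a simple initial guess: the observed mean of each column.
    filled = {}
    for c in cols:
        observed = [v for v in data[c] if v is not None]
        mean = sum(observed) / len(observed)
        filled[c] = [mean if v is None else v for v in data[c]]

    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        # Visit each variable in turn, using the other one as predictor.
        for target, predictor in (cols, cols[::-1]):
            rows = [i for i in range(len(filled[target])) if i not in missing[target]]
            slope, intercept = fit_line(
                [filled[predictor][i] for i in rows],
                [filled[target][i] for i in rows],
            )
            for i in missing[target]:
                new_value = intercept + slope * filled[predictor][i]
                max_change = max(max_change, abs(new_value - filled[target][i]))
                filled[target][i] = new_value
        # Convergence check: stop once no imputed value moved by more than tol.
        if max_change < tol:
            return filled, iteration
    return filled, max_iter


if __name__ == "__main__":
    toy = {"x": [1.0, 2.0, 3.0, 4.0, None], "y": [2.0, 4.0, 6.0, None, 10.0]}
    completed, n_iter = sequential_impute(toy)
    print(completed, n_iter)
```

The convergence criterion used here, the largest absolute change in any imputed value between two passes, is only one possible choice; the sketch omits the stochastic steps (PMM and probability-based draws), which would make successive iterations fluctuate rather than settle exactly.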