Regional Statistics Conference 2026

Relative Importance Analysis and Its Application to Variable Selection

Format: CPS Poster - Malta 2026

Keywords: multivariate ordinal regression model, variable selection

Session: CPS Poster Session 02

Thursday 4 June 11 a.m. - noon (Europe/Malta)

Abstract

Variable selection is a critical technique in data science, applied in constructing predictive models for both statistical and machine learning purposes. Effective variable selection not only enhances the efficiency of statistical inference or machine learning but also improves the interpretability of the results. This study explores the use of relative importance for variable selection. Although relative importance has advantageous properties, evaluating importance and selecting the optimal variables serve different purposes: the former compares variables by their relative contribution to explaining the dependent variable, while the latter aims to construct the most efficient predictive model using the fewest variables. Therefore, before applying relative importance to variable selection, it is essential to understand its theory and properties thoroughly and then extend it into a practical selection tool.
This study first establishes a general framework that decomposes relative importance (RI) analysis into two functional steps: orthogonalization and weight reallocation. We investigate the theoretical foundation and properties of each step. We then implement and evaluate RI-based variable ranking and selection. We demonstrate through simulations and real datasets that predictive models built on these rankings are highly competitive, often outperforming established regularization techniques such as the lasso and the relaxed lasso. While lasso-based techniques have dominated recent literature, our findings reveal that RI-based methods serve as a powerful, yet underutilized, alternative. We argue that these tools merit broader adoption and closer integration into the standard toolkits of the statistics and machine learning communities.
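
To make the two steps concrete, the following is a minimal sketch of one well-known RI measure that fits the "orthogonalization, then weight reallocation" description: Johnson's relative weights for a linear model. The abstract does not state which RI measures the study covers, so this choice, the function name `relative_weights`, and the ranking step are illustrative assumptions rather than the authors' method.

```python
import numpy as np


def relative_weights(X, y):
    """Johnson-style relative weights for a linear model (illustrative sketch).

    Assumes X is an (n, p) array with linearly independent columns and y is
    a length-n response. Returns raw weights (which sum to R^2) and the
    weights rescaled to sum to 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Correlations among predictors (Rxx) and between predictors and y (rxy).
    C = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    Rxx, rxy = C[:-1, :-1], C[:-1, -1]

    # Step 1, orthogonalization: the symmetric square root of Rxx holds the
    # correlations between the predictors and their closest orthogonal
    # counterparts.
    evals, evecs = np.linalg.eigh(Rxx)
    lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

    # Standardized coefficients of y regressed on the orthogonal variables.
    beta = np.linalg.solve(lam, rxy)

    # Step 2, weight reallocation: map each orthogonal variable's share of
    # R^2 back to the original predictors via the squared correlations.
    raw = (lam ** 2) @ (beta ** 2)
    return raw, raw / raw.sum()


def rank_by_importance(X, y):
    """Predictor indices ordered from most to least important (hypothetical helper)."""
    raw, _ = relative_weights(X, y)
    return np.argsort(-raw)
```

A ranking produced this way could then feed a selection rule, for example fitting nested models along the ranking and choosing the size by cross-validation; that rule is likewise an assumption here and not a description of the procedure evaluated in the study.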