Regional Statistics Conference 2026

Analysis of the Household Budget Survey: A Distributional Data Approach

Format: IPS Abstract - Malta 2026

Session: IPS 1287 - Recent Developments in Symbolic and Distributional Data Analysis

Thursday 4 June 8:30 a.m. - 10:10 a.m. (Europe/Malta)

Abstract

The analysis of official statistics is often based on aggregated data. This happens when interest lies in regional, sociological, or otherwise defined groups as a whole, rather than in individual observations. Data aggregation, however, raises the issue of information loss. To avoid excessive information loss when individual observations are aggregated, the variability across records should be retained. Symbolic Data Analysis provides a framework for the representation and analysis of such complex data, which comprise inherent variability. To this end, new variable types have been introduced whose realizations are not single real values or categories, but sets, intervals, or distributions over a given domain. In this work we focus on the Portuguese Household Budget Survey. Microdata on individual households are aggregated into groups based on geographical location and income. In the resulting symbolic data, units are described by the empirical distributions of numerical attributes. We assume parametric models for numerical distributional variables, based on the representation of each distribution by central statistics and inter-quantile ranges for a chosen set of quantiles. Multivariate Normal distributions are assumed for the whole set of indicators, considering alternative sparse structures of the variance-covariance matrix. Given that the variables have restricted domains, a transformation to the real line is applied. This model then allows for model-based clustering of the defined groups, identifying sociological clusters. The identified structure is connected to both geographical location and income level.
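As an illustration only, the aggregation and representation steps described in the abstract might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy records, the choice of quartiles as the quantile set, the median as the central statistic, and the logarithm as the transformation to the real line are all assumptions made for the example.

```python
import math
import statistics
from collections import defaultdict


def quantile_representation(values, n_quantiles=4):
    """Summarise one empirical distribution by a central statistic
    and the log-widths of its inter-quantile ranges.

    Assumptions (not from the abstract): quartiles as the quantile set,
    the median as the central statistic, and log as the transformation.
    """
    cuts = statistics.quantiles(values, n=n_quantiles, method="inclusive")
    points = [min(values)] + cuts + [max(values)]
    # Widths between consecutive quantile points; these are positive
    # (assuming no ties at the cut points), so log maps them to the real line.
    ranges = [hi - lo for lo, hi in zip(points, points[1:])]
    return [statistics.median(values)] + [math.log(r) for r in ranges]


def aggregate(records):
    """Group household records by (region, income class) and describe
    each group by the representation of its expenditure distribution."""
    groups = defaultdict(list)
    for region, income_class, expenditure in records:
        groups[(region, income_class)].append(expenditure)
    return {key: quantile_representation(vals) for key, vals in groups.items()}


# Invented toy microdata: (region, income class, monthly expenditure).
toy_records = [
    ("North", "low", 400.0), ("North", "low", 450.0), ("North", "low", 500.0),
    ("North", "low", 560.0), ("North", "low", 640.0),
    ("South", "high", 900.0), ("South", "high", 1000.0), ("South", "high", 1150.0),
    ("South", "high", 1300.0), ("South", "high", 1600.0),
]

symbolic_units = aggregate(toy_records)
```

Each group is then a real-valued vector (one central value plus four log-ranges here), which is the kind of input a multivariate Normal mixture model could be fitted to for model-based clustering.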