Regional Statistics Conference 2026

Small area estimation of categorical indicators using finite mixtures of multinomial logistic regression models

Format: IPS Abstract - Malta 2026

Keywords: latent variable models, multinomial, random-effects, semi-parametric estimation

Abstract

Surveys are frequently designed to collect categorical indicators, and small area estimation (SAE) methods based on generalized linear mixed models are widely used for prediction. Jiang (2003) introduced an Empirical Best Prediction (EBP) approach for general responses in the exponential family, which also covers categorical variables. Despite its appeal, this framework has two important limitations. First, it relies on strong and typically untestable parametric assumptions about the distribution of the random effects (i.e., the mixing distribution), which is conventionally assumed to be Gaussian. Second, it entails a substantial computational burden, which affects not only parameter estimation but also the computation of the EBP and, in particular, the derivation of accurate measures of uncertainty.
To overcome these issues, we propose a semi-parametric EBP for categorical outcomes by extending the proposal of Marino et al. (2019), originally developed for univariate responses in the exponential family. In our approach, the mixing distribution is left unspecified and is estimated directly from the data via non-parametric maximum likelihood. The resulting estimate is known to be a discrete distribution with finite support, which naturally induces a finite mixture representation.
In addition, we derive a second-order approximation to the mean squared error of the proposed semi-parametric EBP for categorical outcomes, along the lines of Marino et al. (2019). The finite-sample properties of the proposed methodology, covering both prediction and uncertainty quantification, are assessed through an extensive simulation study, which highlights its advantages in terms of robustness and computational feasibility.
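
To illustrate the finite mixture representation described above, the following is a minimal Python sketch of how a semi-parametric predictor of category probabilities might be computed once the model has been fitted. It is not the authors' implementation. It assumes a baseline-category multinomial logit with area-specific random intercepts, one per non-reference category, whose estimated mixing distribution places masses `pi` on `K` support points `xi`. All function and variable names (`category_probs`, `posterior_weights`, `ebp_probs`, `beta`, `xi`, `pi`) are hypothetical, and the parameter estimates are taken as given.

```python
import numpy as np


def category_probs(X, beta, xi_k):
    """Multinomial-logit probabilities for one mixture component.

    X: (n, p) covariates; beta: (p, C-1) fixed effects;
    xi_k: (C-1,) support point of the discrete mixing distribution.
    Column 0 of the result is the reference category (linear predictor 0).
    """
    eta = X @ beta + xi_k                            # (n, C-1)
    eta = np.column_stack([np.zeros(len(X)), eta])   # (n, C)
    eta -= eta.max(axis=1, keepdims=True)            # numerical stability
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)


def posterior_weights(X_s, y_s, beta, xi, pi):
    """Posterior probability that an area belongs to each component,
    given its sampled units (X_s, y_s); y_s holds integer codes 0..C-1."""
    log_w = np.log(pi).astype(float)
    rows = np.arange(len(y_s))
    for k in range(len(pi)):
        p = category_probs(X_s, beta, xi[k])
        log_w[k] += np.log(p[rows, y_s]).sum()       # area log-likelihood
    log_w -= log_w.max()                             # avoid underflow
    w = np.exp(log_w)
    return w / w.sum()


def ebp_probs(X_r, X_s, y_s, beta, xi, pi):
    """Predicted category probabilities for non-sampled units X_r:
    component-specific probabilities averaged with posterior weights."""
    w = posterior_weights(X_s, y_s, beta, xi, pi)
    comp = np.stack([category_probs(X_r, beta, xi_k) for xi_k in xi])
    return np.tensordot(w, comp, axes=1)             # (n_r, C)
```

Because the mixing distribution is discrete, the integral over the random effects that a Gaussian assumption would require reduces here to a finite sum over the `K` support points, which is where the computational saving in this sketch comes from.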