Regional Statistics Conference 2026

Similarity-Based Augmentation Heuristic (SIMBAH): A Synthetic Data Approach to Consumer Survey Methodological Evaluation

Format: CPS Abstract - Malta 2026

Session: CPS 26 Synthetic Data

Wednesday 3 June 4:30 p.m. - 5:30 p.m. (Europe/Malta)

Abstract

Authors: Anggraini Widjanarti, Muhammad Azkaenza, Mohammad Khoyrul Hidayat, Larasati Ratusinkaya, She Asa Handarzeni, Farhan Hafizh, Okiriza Wibisono

Methodological evaluation in official statistics often faces practical constraints that limit the ability to test improvements immediately within ongoing statistical production. Adjustments that are conceptually desirable from a methodological perspective may be difficult to implement in real time due to operational rules, risks to time-series continuity, and institutional obligations to preserve the stability of key indicators.
At Bank Indonesia, adjustments to the sample composition of the Consumer Survey are constrained by a maximum 5% monthly deviation limit and the need to maintain the historical continuity of the Consumer Confidence Index (IKK). These constraints complicate efforts to move toward the ideal expenditure class structure without introducing structural breaks in a core indicator of household consumption derived from survey data processing. Synthetic data augmentation offers a simulation-based approach to address this challenge, enabling ex-ante evaluation of respondent composition adjustments without interfering with routine survey data collection and processing.
This study applies SMOTE and Gaussian Copula methods, complemented by a novel Similarity-Based Augmentation Heuristic (SIMBAH), to simulate adjustment scenarios, including expenditure class reclassification, particularly for cities with sparse or empty classes. SIMBAH addresses structural data scarcity by leveraging correlation-based sample borrowing across cities with similar household consumption dynamics and geographic proximity, extending beyond conventional augmentation methods that are constrained by insufficient local observations. Evaluation combines (i) fidelity metrics capturing distributional similarity and correlation preservation, and (ii) utility metrics assessing time-series stability through IKK correlation and RMSE at both city and national levels.
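As a rough illustration of two ideas named above, the sketch below ranks candidate donor cities by the correlation of their index series with a target city, and computes the correlation and RMSE between a baseline and a simulated series. It is not the authors' SIMBAH procedure, which the abstract does not specify. The function names, the city labels, the toy values, and the use of Pearson correlation on index series alone (with geographic proximity left out) are all assumptions made for illustration.

```python
import numpy as np


def rank_donor_cities(series_by_city, target, max_donors=2):
    """Rank other cities by Pearson correlation with the target city's series.

    Hypothetical helper: the real heuristic also considers geographic
    proximity, which is omitted here.
    """
    target_series = np.asarray(series_by_city[target], dtype=float)
    scores = {}
    for city, series in series_by_city.items():
        if city == target:
            continue
        other = np.asarray(series, dtype=float)
        scores[city] = float(np.corrcoef(target_series, other)[0, 1])
    # Highest correlation first; keep at most max_donors cities.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max_donors], scores


def utility_metrics(baseline, simulated):
    """Return (Pearson correlation, RMSE) between two index series."""
    baseline = np.asarray(baseline, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    corr = float(np.corrcoef(baseline, simulated)[0, 1])
    rmse = float(np.sqrt(np.mean((baseline - simulated) ** 2)))
    return corr, rmse


# Toy monthly index values, invented purely for illustration.
ikk = {
    "city_a": [100.0, 102.0, 101.0, 105.0, 107.0],
    "city_b": [98.0, 100.5, 99.0, 103.5, 105.0],
    "city_c": [110.0, 108.0, 109.0, 104.0, 103.0],
}

donors, scores = rank_donor_cities(ikk, "city_a", max_donors=1)

# A simulated series that is the baseline shifted up by one index point.
simulated_a = [101.0, 103.0, 102.0, 106.0, 108.0]
corr, rmse = utility_metrics(ikk["city_a"], simulated_a)
```

In this toy setup the shifted series keeps a perfect correlation with the baseline while the RMSE picks up the level difference, which is one reason the two metrics are reported together.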
Findings indicate that synthetic data augmentation can effectively support methodological evaluations in official statistics, allowing evidence-based survey design improvements while preserving the integrity and continuity of statistical production. Importantly, SIMBAH improves the feasibility of evaluating reclassification scenarios in structurally sparse settings, providing a practical pathway for statistical institutions to pursue methodological advancement without compromising production stability for high-salience economic indicators.

Keywords: Synthetic Data, Data Augmentation, Consumer Survey, Methodological Evaluation, Official Statistics, Consumer Confidence Index