Regional Statistics Conference 2026

Finding redundancies between data collections: an AI-based approach

Format: IPS Abstract - Malta 2026

Session: IPS 1177 - Using AI to strengthen data governance

Wednesday 3 June 2:30 p.m. - 4:10 p.m. (Europe/Malta)

Abstract

Enhancing the consistency and efficiency of data-collection frameworks requires effective mechanisms to detect and evaluate overlaps across reporting requirements. This paper introduces an experimental AI-based framework developed at Banco de Portugal to identify redundancies between data collections. Building on a corpus of 80 regulatory and methodological documents, the approach integrates complementary Natural Language Processing techniques, including keyword-based similarity measures (TF-IDF and YAKE) and Large Language Model (LLM)-driven extraction and semantic comparison of reporting datapoints. The methodology reliably identifies known cases of redundancy, both partial and substantial, across documents that differ in structure, granularity, and technical terminology. The results provide evidence that AI-driven tools can play a decisive role in supporting systematic redundancy detection, thereby contributing to reporting simplification and reducing operational complexity within central banking environments.
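To make the keyword-based part of the abstract concrete, the following is a minimal sketch of TF-IDF cosine similarity between the texts of reporting requirements, using only the Python standard library. It is not the framework developed at Banco de Portugal. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `redundancy_candidates`), the 0.5 threshold, the smoothed IDF formula, and the three toy requirement texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    tokenized = [tokenize(t) for t in texts]
    n_docs = len(tokenized)

    # Document frequency: in how many texts does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0
           for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def redundancy_candidates(docs, threshold=0.5):
    """Return (name_1, name_2, score) for pairs scoring at or above threshold.

    `docs` maps a hypothetical collection name to its requirement text.
    The threshold is an illustrative value, not one taken from the paper.
    """
    names = list(docs)
    vectors = tfidf_vectors([docs[name] for name in names])
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = cosine(vectors[i], vectors[j])
            if score >= threshold:
                pairs.append((names[i], names[j], score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy requirement texts, invented for this example.
docs = {
    "collection_a": "Outstanding amount of loans granted to non-financial "
                    "corporations by maturity",
    "collection_b": "Outstanding amount of loans to non-financial corporations "
                    "broken down by original maturity",
    "collection_c": "Number of payment card transactions by terminal type",
}

for first, second, score in redundancy_candidates(docs):
    print(f"{first} ~ {second}: {score:.2f}")
```

On these toy texts only the two loan-related requirements exceed the threshold. A purely lexical score of this kind misses overlaps expressed in different terminology, which is the gap the LLM-driven semantic comparison described in the abstract is meant to cover.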