The Application of Large Language Models in Producing Official Statistics: Integration with the Generic Statistical Business Process Model
Conference
Format: CPS Abstract - IAOS 2026
Session: Large Language Models & Machine Learning in official statistics
Tuesday 12 May 4:30 p.m. - 6 p.m. (Europe/Vilnius)
Abstract
Large Language Models (LLMs) mark a major advance in artificial intelligence, with strong capabilities in natural language understanding and generation. In official statistics, where accuracy, timeliness, and transparency are paramount, LLMs have significant potential to enhance several stages of statistical production. This paper examines the integration of LLMs with the Generic Statistical Business Process Model (GSBPM), a standardized framework for statistical processes adopted by national and international statistical offices. Drawing on recent case studies and guidelines, we explore applications across the GSBPM phases, including text mining, data integration, script mining, and voice mining, and we describe how machine learning (ML) and deep learning (DL) have improved natural language processing (NLP) and LLMs. A detailed review of LLM architecture, natural language modeling steps, and processing pipelines establishes their relevance to statistical production. We identify benefits such as efficiency gains and improved data quality, and we address challenges including bias and hallucination. Our recommendations for responsible implementation emphasize human oversight and ethical governance. By adopting LLMs thoughtfully, statistical organizations can modernize their operations while upholding public trust.