Use of Artificial Intelligence to Extract Unstructured Data from Financial Statements
Format: CPS Abstract - IAOS 2026
Keywords: official statistics, administrative data, artificial intelligence, innovative data collection, machine learning
Session: Data sources for AI
Tuesday 12 May 11 a.m. - 12:30 p.m. (Europe/Vilnius)
Abstract
The Singapore Department of Statistics (DOS) has developed the Integrated Business Survey System (IBSS)-Pro AI to address the challenge of extracting data from unstructured financial statements. Since 2014, DOS has used financial statements filed in XBRL format, which provides machine-readable structured data, to compile business statistics. However, XBRL filings do not contain all the information relevant to these statistics. Financial statements in PDF or RTF format contain valuable additional information, but case officers must extract it manually, which can take up to one hour per financial account.
Commissioned in November 2024, IBSS-Pro AI combines Named Entity Recognition (NER), rule-based logic, and Natural Language Processing (NLP) to automate the extraction of unstructured information from firms' financial statements. Its workflow covers document ingestion, pre-processing, extraction algorithms, and structured data output, with human-in-the-loop feedback mechanisms.
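The four-stage workflow can be illustrated with a minimal Python sketch. It is not DOS's implementation: the keyword dictionary `KEYWORDS`, the functions `ingest`, `preprocess`, `extract`, `to_records` and `run_pipeline`, and the field names are all hypothetical, and simple keyword-and-regex rules stand in for the NER and NLP components, whose details the abstract does not give. The sketch also assumes the PDF or RTF file has already been converted to plain text and that the first amount on a matching line is the current-year figure.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword dictionary: target field -> phrases that may label it.
KEYWORDS: dict[str, list[str]] = {
    "revenue": ["revenue", "turnover"],
    "employee_benefits": ["employee benefits expense", "staff costs"],
}

# An amount such as "1,200,000" or "(350,000)"; parentheses denote a negative value.
AMOUNT = re.compile(r"\(?-?\d[\d,]*(?:\.\d+)?\)?")

@dataclass
class Extraction:
    target_field: str
    value: float
    source_line: str
    needs_review: bool = True  # every value is proposed to a case officer, not final

def ingest(raw_text: str) -> list[str]:
    """Stage 1: ingestion. Assumes the PDF/RTF file is already converted to text."""
    return raw_text.splitlines()

def preprocess(lines: list[str]) -> list[str]:
    """Stage 2: collapse whitespace, lower-case, and drop blank lines."""
    return [" ".join(line.split()).lower() for line in lines if line.strip()]

def parse_amount(token: str) -> float:
    """Convert "1,200,000" or "(350,000)" to a float; parentheses mean negative."""
    negative = token.startswith("(") and token.endswith(")")
    value = float(token.strip("()").replace(",", ""))
    return -value if negative else value

def extract(lines: list[str]) -> list[Extraction]:
    """Stage 3: rule-based extraction. For each field, take the first amount on the
    first line containing one of its keywords (assumed to be the current year)."""
    results = []
    for target, phrases in KEYWORDS.items():
        for line in lines:
            if any(p in line for p in phrases):
                match = AMOUNT.search(line)
                if match:
                    results.append(Extraction(target, parse_amount(match.group()), line))
                    break
    return results

def to_records(extractions: list[Extraction]) -> list[dict]:
    """Stage 4: structured output, ready for review and downstream compilation."""
    return [
        {"field": e.target_field, "value": e.value,
         "source_line": e.source_line, "needs_review": e.needs_review}
        for e in extractions
    ]

def run_pipeline(raw_text: str) -> list[dict]:
    """Chain the four stages: ingest -> preprocess -> extract -> structured output."""
    return to_records(extract(preprocess(ingest(raw_text))))

if __name__ == "__main__":
    statement = """
    STATEMENT OF PROFIT OR LOSS
    Revenue          1,200,000    1,100,000
    Staff costs       (350,000)    (340,000)
    """
    for record in run_pipeline(statement):
        print(record)
```

Marking every extracted value as `needs_review` reflects the human-in-the-loop step: the output is a proposal for a case officer to confirm, not a final figure. A keyword rule this naive would also match lines such as "deferred revenue", which is one reason officer feedback on keywords matters.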
IBSS-Pro AI lets case officers validate extracted data by comparing it with other available data sources, and edit values easily when corrections are needed. Case officers can also flag new keywords for the system to evaluate, and this continuous feedback improves the system's accuracy over time. The system aims to improve operational efficiency and support the compilation of more timely and comprehensive business statistics, reducing the manual burden on survey operations whilst maintaining data quality and integrity.
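The validation and keyword feedback loop can be sketched in the same way. This is an illustration under stated assumptions, not the IBSS-Pro AI design: `validate`, `apply_correction`, `evaluate_keyword`, the 5% relative tolerance, and the precision and hit-count thresholds are hypothetical choices. Extracted values are compared with a reference source (for example, a figure the same firm reported elsewhere), discrepancies and values with no reference are flagged for a case officer, and a keyword proposed by an officer is accepted only if it matches enough previously validated lines, mostly for the intended field.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Check:
    field: str
    extracted: float
    reference: Optional[float]
    flagged: bool  # True means a case officer should look at this value

def validate(extracted: dict[str, float], reference: dict[str, float],
             rel_tol: float = 0.05) -> list[Check]:
    """Compare extracted values with another data source (hypothetical 5% tolerance)."""
    checks = []
    for name, value in extracted.items():
        ref = reference.get(name)
        if ref is None:
            flagged = True  # nothing to compare against, so leave it to the officer
        else:
            # Relative difference; max() guards against division by zero.
            flagged = abs(value - ref) / max(abs(ref), 1.0) > rel_tol
        checks.append(Check(name, value, ref, flagged))
    return checks

def apply_correction(extracted: dict[str, float], name: str,
                     value: float) -> dict[str, float]:
    """Record an officer's edit without mutating the original extraction."""
    corrected = dict(extracted)
    corrected[name] = value
    return corrected

def evaluate_keyword(phrase: str, target_field: str,
                     validated_lines: list[tuple[str, str]],
                     min_hits: int = 3, min_precision: float = 0.9) -> bool:
    """Accept an officer-proposed keyword only if, on lines whose field an officer
    has already confirmed, it matches often enough and mostly the intended field."""
    phrase = phrase.lower()
    hits = [f for line, f in validated_lines if phrase in line.lower()]
    if len(hits) < min_hits:
        return False
    precision = sum(f == target_field for f in hits) / len(hits)
    return precision >= min_precision
```

Keeping corrections in a separate dictionary preserves the original extraction, so the difference between what the system proposed and what the officer confirmed remains available as feedback for improving the rules.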