Using AI for Automatic Data Extraction and Classification in Official Statistics – Austria’s Use-Cases
Conference
Format: CPS Abstract - IAOS 2026
Keywords: information-extraction
Session: AI & ML in official statistics (3)
Thursday 14 May 9 a.m. - 10:30 a.m. (Europe/Vilnius)
Abstract
Several data-collection processes at Statistics Austria still rely on the manual extraction and classification of data. To improve the quality of these processes and reduce manual workload, we implemented AI-based solutions for four use cases over the last year, and three further use cases are currently under development. This talk focuses on how we use AI to classify clothing items needed for the consumer price index from image and text inputs by fine-tuning an image model and a text model; to correct the code classifications of business balance sheets with a zero-shot approach; and to automatically extract values from annual business reports with pre-trained models in order to calculate various key figures. In addition, we are developing several Optical Character Recognition (OCR) models to extract information from non-machine-readable files such as housing invoices and death certificates.
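The abstract does not say how the zero-shot correction of balance-sheet codes works. The sketch below shows only the general idea, under the assumption that each classification code has a short text description. A line item is compared with every code description, the most similar code is suggested, and the assigned code is flagged when it disagrees with a sufficiently confident suggestion. No labelled training examples are involved. The code list, descriptions, threshold `min_score`, and function names are all hypothetical, and the word-overlap cosine similarity is a toy stand-in for the pre-trained model a real system would use.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical code list (code -> short label description); illustrative only.
CODE_DESCRIPTIONS = {
    "A": "cash and bank balances",
    "B": "trade receivables from customers",
    "C": "inventories raw materials and finished goods",
}


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a toy stand-in for a pre-trained text embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_code(item_text: str) -> tuple[str, float]:
    """Zero-shot step: return the code whose description is most similar to the item."""
    vec = bag_of_words(item_text)
    scores = {code: cosine(vec, bag_of_words(desc)) for code, desc in CODE_DESCRIPTIONS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def check_assignment(item_text: str, assigned_code: str, min_score: float = 0.2) -> str | None:
    """Return a suggested replacement code if the assigned one looks wrong, else None."""
    best, score = suggest_code(item_text)
    if best != assigned_code and score >= min_score:
        return best
    return None


if __name__ == "__main__":
    print(check_assignment("receivables from customers", "A"))  # suggests "B"
    print(check_assignment("cash at bank", "A"))  # None: assignment looks consistent
```

In this sketch, the threshold keeps low-similarity matches from overriding an existing code. Flagged items would then go to a human reviewer rather than being changed automatically.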