A Modern Statistical Business Register for Lithuania: Quality, Governance and the Use of New Data and Methods
Conference
Format: CPS Abstract - IAOS 2026
Keywords: administrative data integration, automatic coding, data governance, LLM, methods, modernization, NACE, register, web scraping
Session: New developments in register data & coding
Tuesday 12 May 2:30 p.m. - 4 p.m. (Europe/Vilnius)
Abstract
Statistical Business Registers (SBRs) are the backbone of official economic statistics, making their quality, adaptability, and governance crucial for the statistical system. In Lithuania, the modernization of the SBR has been closely linked to broader institutional and technological reforms, with Statistics Lithuania evolving into the State Data Agency to take a central role in administrative data governance and integration. At the same time, administrative and statistical data sources are being consolidated in the State Data Governance Information System, implemented on the Palantir Foundry platform.
This transition required the SBR to be redesigned and rebuilt in a new technological environment, moving away from legacy Oracle-based solutions towards a modular, data-centric architecture. Consequently, both technical solutions and methodological foundations were systematically re-evaluated, in line with the principles of the Statistical Business Register Maturity Model, ensuring methodological consistency, automation, and alignment with European and international standards.
The modernization of the SBR was driven by the need to strengthen methodological consistency, data governance, and quality assurance. Particular attention was paid to register variables, derived characteristics, and process logic, with the aim of automating as much as possible without departing from statistical standards or emerging data integration requirements. The new platform supports tighter integration of data flows, better traceability of changes, and systematic quality checks, giving the SBR a stronger foundation as a core statistical tool.
Rather than a single, tightly coupled system, the SBR is now a flexible, modular environment enabling continuous integration of administrative sources and reusable processing components. More frequent and automated data exchanges, transparent data flows, and embedded validation rules reduce legacy-related constraints and create conditions for innovation, responsiveness, and more efficient maintenance.
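To make the idea of embedded validation rules concrete, the following is a minimal Python sketch, not the production logic of the Lithuanian SBR. The record fields (unit_id, nace_code, employees, status), the three rules, and the function names are all hypothetical and chosen only for illustration. It shows rules held as reusable, named components that can be applied to every incoming record.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Unit:
    """Hypothetical register record; field names are illustrative only."""
    unit_id: str
    nace_code: Optional[str]
    employees: Optional[int]
    status: str  # e.g. "active" or "closed"


@dataclass(frozen=True)
class Rule:
    """A named check; `check` returns True when the record passes."""
    name: str
    check: Callable[[Unit], bool]


# Assumed example rules, not taken from the source.
RULES = [
    Rule("nace_present", lambda u: bool(u.nace_code)),
    Rule("employees_non_negative",
         lambda u: u.employees is None or u.employees >= 0),
    Rule("status_known", lambda u: u.status in {"active", "closed"}),
]


def validate(unit: Unit, rules=RULES) -> list:
    """Return the names of all rules the record fails (empty list = clean)."""
    return [r.name for r in rules if not r.check(unit)]


def run_checks(units) -> dict:
    """Map each failing unit_id to the list of rule names it failed."""
    report = {}
    for u in units:
        failed = validate(u)
        if failed:
            report[u.unit_id] = failed
    return report
```

Keeping each rule as a small named object means the same checks can be reused across data flows, and a failure report can name exactly which rule a record violated, which supports traceability.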
Within this modernized environment, the update of the economic activity classification to NACE Rev. 2.1 was implemented as a large-scale, dedicated recoding project. Accurate recoding relied on numerous administrative and alternative data sources, including web-based information and business descriptions, supported by automated text extraction and Large Language Models. These methods complemented rule-based and probabilistic approaches within a hybrid framework combining automation and expert review. Beyond the classification update, the project expanded the methodological toolkit and evidence base for ongoing quality assessment of principal economic activity in the SBR.
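One way to picture a hybrid framework of this kind is as a routing cascade: deterministic rules first, then a scored model proposal, with low-confidence cases sent to expert review. The Python sketch below is an illustration under stated assumptions, not the method used in the project. The keyword table, the placeholder codes ("X1", "X2", which are not real NACE codes), the 0.8 threshold, and the `model` callable standing in for a probabilistic classifier or an LLM are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Suggestion:
    code: Optional[str]  # proposed activity code (placeholder values here)
    confidence: float    # 0.0 to 1.0
    source: str          # "rule", "model", or "expert_review"


# Hypothetical keyword rules with placeholder codes, NOT real NACE codes.
KEYWORD_RULES = {"bakery": "X1", "taxi": "X2"}


def rule_based(description: str) -> Optional[Suggestion]:
    """Return a rule-based suggestion if any keyword occurs in the text."""
    text = description.lower()
    for keyword, code in KEYWORD_RULES.items():
        if keyword in text:
            return Suggestion(code, 1.0, "rule")
    return None


def assign_code(
    description: str,
    model: Callable[[str], Tuple[str, float]],
    threshold: float = 0.8,  # assumed cut-off; would need tuning in practice
) -> Suggestion:
    """Route a description through rules, then the model, then review."""
    hit = rule_based(description)
    if hit is not None:
        return hit
    # `model` is a stand-in for a classifier or LLM returning (code, score).
    code, confidence = model(description)
    if confidence >= threshold:
        return Suggestion(code, confidence, "model")
    # Low confidence: keep the proposal but flag it for a human expert.
    return Suggestion(code, confidence, "expert_review")
```

Recording the `source` of every assigned code keeps the automated and expert-reviewed decisions distinguishable, which is the kind of evidence that later quality assessment of principal economic activity could draw on.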
This paper presents the Lithuanian experience of modernizing the SBR and discusses how new data sources, combined with methodological and technical solutions, support automation and quality assurance. It shows how flexible, hybrid approaches make it possible to use innovative methods and diverse data sources while preserving data coherence, reliability, and statistical quality.