2026 IAOS Conference

The use of public data sources in official statistics: an application with the EuroGroups Register

Format: CPS Abstract - IAOS 2026

Keywords: globalisation, newdatasources, statistical registers

Session: New sources: citizen & public data

Wednesday 13 May 2:30 p.m. - 4 p.m. (Europe/Vilnius)

Abstract

The advent of emerging technologies, such as AI and machine learning, and the availability of large amounts of public data are transforming how statistical business registers are updated and maintained, making these processes more efficient. These innovations are driving advances in the use of publicly available data, replacing survey-based methods, and enhancing data integration and quality, leading to more complete and timely statistical outputs.

The EuroGroups Register (EGR) is the European statistical business register on Multinational Enterprise (MNE) groups of the European Statistical System, managed by Eurostat in collaboration with the National Statistical Institutes of EU and EFTA countries.

The coverage and quality of the EGR have improved significantly in recent years, and the NSIs regularly use it as a source of MNE populations in various statistical production processes. EGR data are of high interest to different statistical domains and to policymakers for their own work. Increasing demand and additional needs call for even more complete coverage, which requires exploring techniques beyond the regular collection of data from Member States. The EGR already integrates data from three open sources: (i) Companies House, from the British government, to cover the gaps left in the EGR after Brexit; (ii) EDGAR, from the US Securities and Exchange Commission; and (iii) Wikipedia, as web-scraped data, to cover consolidated group data.

The current work builds on the lessons learned from the already integrated data sources and from the initial process used, with the aim of systematically expanding the number of integrated sources and improving the existing processes:

• Improve the initially established process for systematically selecting new data sources, evaluating them and integrating them into the EGR.

• Processing information disseminated as images and/or PDFs has historically been challenging. With newly available tools, collecting and processing this information is now feasible.

• More information is becoming available in the country-by-country reports mandated by Directive 2013/34/EU, with Implementing Regulation 2024/2952 adopted at the end of 2024.

• Reporting frameworks that make data machine-readable (e.g., XBRL) are increasingly available.