64th ISI World Statistics Congress

The Human Migration Database

Author

Maciej Danko

Conference

64th ISI World Statistics Congress

Format: CPS Abstract

Keywords: migration

Abstract

Migration is a major component of population change at the global level, with broad societal implications. Unfortunately, reliable, detailed, and internationally comparable migration statistics are not available for most countries, including the majority of European countries. In developed countries, these data are usually collected by national statistical institutes (NSIs) and/or other governmental agencies. Nevertheless, their ability to track migration flows (especially emigration) is limited. The alternative sources used to estimate migration stocks and flows are highly heterogeneous; their use in assessing migration patterns may therefore be hampered by inconsistent definitions and by problems with quality and availability.

The Human Migration Database (HMigD) is designed to provide high-quality, detailed data on international migration in developed countries and to fill the existing data gap. The main goal is to provide internationally comparable data on bilateral migration flows, focusing on the European Union countries. Although the HMigD is synthetic, i.e. it relies on the advanced modeling techniques described below, it follows four guiding principles that characterize the Open Data concept: comparability, flexibility, accessibility, and reproducibility.

One key aspect of migration data that migration models must take into account is their quality. In general, quality depends on the ability of governmental agencies to trace migration flows, i.e. on the legal incentives for registering a migration event and on the methodology used to measure migration. For this reason, migration estimates produced by national statistical institutes (NSIs) and other sources (e.g. the Labour Force Survey, LFS) are not directly comparable. The major migration data quality problems can be classified into four groups: (1) accuracy issues, related to random rather than systematic errors made in the data collection process; (2) undercounting, reflecting a systematic bias in migration estimates; (3) coverage issues, related to the exclusion of certain population segments; and (4) inconsistencies in the definition of an international migrant, due to deviations of national migration criteria (minimum duration of stay) from international (UN/Eurostat) standards.

In the first stage of the project, we systematically evaluate and classify data quality problems, an important task for creating a reliable evidence base for the later stages. Ignoring potential systematic errors or misinterpreting problematic data can lead to misleading conclusions and estimates. The quality of migration data is assessed using available metadata, expert opinion, and data-driven methods.

In the second stage of the project, we extend previous work on estimating international migration by developing a hierarchical Bayesian model that integrates and harmonizes various migration data sources (e.g. administrative data and Labour Force Survey data), taking into account differences in data quality and in the definitions used, as well as socioeconomic and demographic information.
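
As a rough illustration of what harmonizing sources of differing quality can mean, the sketch below pools several reports of one bilateral flow. It is not the HMigD model: it replaces the hierarchical Bayesian model with a much simpler precision-weighted average on the log scale (the posterior mean under a flat prior and known error variances). The function name `harmonize_flow`, the undercount factors, the error sizes, and the counts are all hypothetical and chosen only for illustration.

```python
import math


def harmonize_flow(reports):
    """Pool several reports of one bilateral flow into a single estimate.

    Each report is a tuple (count, undercount_factor, log_sd):
      count             published number of migrants (hypothetical here)
      undercount_factor assumed share of true migrants the source captures,
                        in (0, 1]; models systematic undercounting
      log_sd            assumed standard deviation of the random error on
                        the log scale; models accuracy
    Returns (estimate, log_sd_of_estimate).
    """
    weighted_sum = 0.0
    total_precision = 0.0
    for count, undercount_factor, log_sd in reports:
        # Undo the assumed systematic undercount, then move to the log scale.
        corrected_log = math.log(count / undercount_factor)
        # More accurate sources (smaller log_sd) receive more weight.
        precision = 1.0 / log_sd ** 2
        weighted_sum += precision * corrected_log
        total_precision += precision
    mean_log = weighted_sum / total_precision
    sd_log = math.sqrt(1.0 / total_precision)
    return math.exp(mean_log), sd_log


# Hypothetical example: the sending country registers 800 emigrants but is
# assumed to capture only 80% of them; the receiving country registers 1000
# immigrants with assumed full capture but a noisier measurement.
estimate, sd_log = harmonize_flow([(800, 0.8, 0.1), (1000, 1.0, 0.2)])
print(round(estimate), round(sd_log, 3))
```

In this toy setting both reports imply about 1000 migrants once the assumed undercount is removed, and the pooled uncertainty is smaller than that of either source alone. A hierarchical model would additionally estimate the undercount and accuracy parameters from the data and share information across countries, rather than taking them as given.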

In the last stage, we perform exhaustive quality checks of the results, including plausibility checks. The validated model estimates are used to produce the final output, which is accompanied by a Shiny application that gives access to the results under various alternative modeling assumptions.

Although we currently focus only on the EU countries, the database will include other developed countries in the future.