Modernising probabilistic linking at the Australian Bureau of Statistics using Splink and its potential to improve multisource statistics production
Conference
65th ISI World Statistics Congress
Format: IPS Abstract - WSC 2025
Keywords: fellegi-sunter, multi-source, probabilistic linkage, record linkage, splink
Abstract
This paper describes the ABS's experience with Splink and its potential to improve linkage quality and enhance data integration capability. Splink and its underlying methodology, the Fellegi-Sunter model, give us the opportunity to address data quality challenges across the entire Australian government. They enhance our ability to measure linkage uncertainty, obtain accurate measures of linkage quality, and account for linkage error in analysis projects. One example is adjusting regression models fitted to linked data so that the estimates of the parameters and their variances are unbiased.
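As a rough illustration of the Fellegi-Sunter model mentioned above, the sketch below combines per-field agreement evidence into a match probability for one record pair. It is a minimal, self-contained example rather than Splink's implementation or the ABS configuration: the function name, the binary agree/disagree comparison, and any m and u values passed to it are assumptions made for illustration.

```python
import math


def match_probability(prior, m_probs, u_probs, agreements):
    """Fellegi-Sunter style match probability for a single record pair.

    prior      -- assumed probability that a randomly chosen pair is a match
    m_probs    -- P(field agrees | pair is a match), one value per field
    u_probs    -- P(field agrees | pair is not a match), one value per field
    agreements -- True/False per field: did the two records agree?
    """
    # Start from the prior odds, expressed as a log2 match weight.
    weight = math.log2(prior / (1 - prior))
    for m, u, agrees in zip(m_probs, u_probs, agreements):
        if agrees:
            # Agreement adds log2(m / u), positive when m > u.
            weight += math.log2(m / u)
        else:
            # Disagreement adds log2((1 - m) / (1 - u)), negative when m > u.
            weight += math.log2((1 - m) / (1 - u))
    odds = 2 ** weight
    return odds / (1 + odds)
```

For instance, with a prior of 0.5 and a single field where m = 0.9 and u = 0.1, agreement gives a probability of about 0.9 and disagreement about 0.1. In practice the comparisons usually have more than two levels and the parameters are estimated from the data.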
Initial ABS results show that Splink produces faster linkages whose outputs are consistent with those of our current deterministic approach. Splink also produces pairwise match probabilities for all linked records, from which several micro-quality metrics can be derived that are not available with deterministic approaches. Using Splink will help the ABS meet the growing demand for linked data products across federal and state jurisdictions.
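To make concrete how pairwise match probabilities can feed quality metrics, the sketch below derives two simple summaries from a list of probabilities: the expected number of false links among accepted pairs and the corresponding expected precision. These metrics, the function name, and the 0.9 threshold are illustrative assumptions, not the metrics used by the ABS, and they are only meaningful if the probabilities are reasonably well calibrated.

```python
def expected_link_quality(match_probabilities, threshold=0.9):
    """Summarise accepted links using their match probabilities.

    A pair is accepted as a link when its probability meets the threshold.
    Each accepted pair contributes (1 - p) expected false links.
    """
    accepted = [p for p in match_probabilities if p >= threshold]
    if not accepted:
        return {"n_links": 0, "expected_false_links": 0.0, "expected_precision": None}
    expected_false = sum(1 - p for p in accepted)
    return {
        "n_links": len(accepted),
        "expected_false_links": expected_false,
        "expected_precision": 1 - expected_false / len(accepted),
    }
```

A deterministic linkage yields only a link or no-link decision per pair, so per-link summaries of this kind cannot be computed from its output alone.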
We will explore the potential for Splink to be used to build the ABS Person Linkage Spine. The Spine is a key ABS asset that is central to our data linking methods. Instead of linking datasets one-to-one for individual projects, we can link all datasets to the Spine once and then combine datasets via the Spine as needed for multiple projects. Splink has allowed us to reduce the number of steps involved in producing the Spine, improving efficiency and simplifying a complex process.
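The idea of linking each dataset to the Spine once and then combining datasets through it can be sketched as a join on a shared Spine identifier. The example below is a simplified illustration only: the function name, the dictionary representation of links, and the identifiers in the usage note are hypothetical and do not reflect the actual structure of the ABS Person Linkage Spine.

```python
def combine_via_spine(links_a, links_b):
    """Pair records from two datasets that link to the same Spine entry.

    links_a, links_b -- dicts mapping a dataset's record id to a Spine id,
                        each produced by linking that dataset to the Spine once.
    Returns a list of (record_id_a, record_id_b) pairs.
    """
    # Index dataset B by Spine id so each lookup is a dictionary access.
    spine_to_b = {}
    for record_b, spine_id in links_b.items():
        spine_to_b.setdefault(spine_id, []).append(record_b)

    pairs = []
    for record_a, spine_id in links_a.items():
        for record_b in spine_to_b.get(spine_id, []):
            pairs.append((record_a, record_b))
    return pairs
```

For example, if dataset A has records a1 and a2 linked to Spine entries s1 and s2, and dataset B has b1 and b3 linked to s1 and s3, only the pair (a1, b1) is returned. No direct A-to-B linkage is run, which is what lets the same Spine links be reused across projects.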