65th ISI World Statistics Congress

LLM-Assisted Record Linkage: Framework for Record Linkage in Official Statistics

Conference: 65th ISI World Statistics Congress

Format: SIPS Abstract - WSC 2025

Keywords: data_linkage, artificial intelligence, llm

Session: SIPS 1164 - IAOS Young Statisticians Prize 2023, 2024, 2025

Monday 6 October 9:20 a.m. - 10:30 a.m. (Europe/Amsterdam)

Abstract

National statistical offices (NSOs) increasingly rely on record linkage to combine census data, administrative sources, and survey responses. However, conventional string-similarity methods often struggle with free-text fields. To address this, we systematically benchmark modern open-source large language models (LLMs) against classic string-based comparators for record linkage. Building on these findings, we introduce a hybrid approach that retains well-established probabilistic frameworks but adds an LLM-based classifier for ambiguous record pairs. A Bayesian update combines the LLM’s output with the prior probability, with the aim of reducing the burden of manual clerical review. Our experiments show that selectively deploying open-source LLMs on the most uncertain pairs can significantly reduce manual effort. Because NSOs must ensure transparency, explainability, and adherence to official statistical standards, we evaluate the potential of LLMs for record linkage with these requirements in mind. We address critical considerations, ranging from data privacy and hardware requirements to human-in-the-loop review and calibration, and present a framework for integrating LLMs into existing record linkage workflows.
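The abstract does not spell out the form of the Bayesian update, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that the probabilistic linkage step yields a prior match probability for each pair, that the LLM returns a binary match/non-match verdict, and that the LLM's sensitivity and specificity have been estimated on labelled pairs. The names `bayesian_update`, `resolve_pair`, `ask_llm`, and the band limits `lower` and `upper` are hypothetical.

```python
from typing import Callable, Tuple


def bayesian_update(prior: float, llm_says_match: bool,
                    sensitivity: float, specificity: float) -> float:
    """Combine a prior match probability with a binary LLM verdict.

    Assumption (not from the abstract): the LLM is treated as a diagnostic
    test with known sensitivity P(says match | true match) and
    specificity P(says non-match | true non-match).
    """
    for name, value in (("prior", prior), ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")

    # Likelihood ratio of the observed verdict under match vs. non-match.
    if llm_says_match:
        likelihood_ratio = sensitivity / (1.0 - specificity)
    else:
        likelihood_ratio = (1.0 - sensitivity) / specificity

    # Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def resolve_pair(prior: float, ask_llm: Callable[[], bool],
                 lower: float = 0.2, upper: float = 0.8,
                 sensitivity: float = 0.9,
                 specificity: float = 0.9) -> Tuple[float, bool]:
    """Query the LLM only for pairs inside the uncertain band [lower, upper].

    Returns the (possibly updated) match probability and a flag saying
    whether the LLM was consulted. `ask_llm` is a hypothetical stand-in
    for a call to a locally hosted open-source model.
    """
    if prior < lower or prior > upper:
        return prior, False  # confident pair: keep the probabilistic result
    posterior = bayesian_update(prior, ask_llm(), sensitivity, specificity)
    return posterior, True
```

With these assumed values, a pair with prior 0.5 and a "match" verdict from an LLM with sensitivity 0.9 and specificity 0.8 moves to a posterior of about 0.82, while a pair with prior 0.95 is left untouched and never sent to the model. Pairs whose posterior remains inside the uncertain band would still go to clerical review.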