65th ISI World Statistics Congress

Classifying respondent comments from the 2021 Canadian Census of Population using machine learning methods

Conference

65th ISI World Statistics Congress

Format: SIPS Abstract - WSC 2025

Keywords: deep learning, machine learning, text-classification

Session: SIPS 1164 - IAOS Young Statisticians Prize 2023, 2024, 2025

Monday 6 October 9:20 a.m. - 10:30 a.m. (Europe/Amsterdam)

Abstract

To improve the analysis of respondent comments from the Canadian Census of Population, data scientists at Statistics Canada compared and evaluated traditional machine learning, deep learning and transformer-based techniques. XLM-R (Cross-lingual Language Model-Robustly Optimized Bidirectional Encoder Representations from Transformers), a cross-lingual language model, fine-tuned on census respondent comments, yielded the best result, an overall F1 score of 89.91%, despite language and class imbalances. Following the evaluation, the fine-tuned model was successfully implemented to objectively categorize comments from the 2021 Census of Population with high accuracy. As a result, feedback from respondents was directed to the appropriate subject matter analysts for post-collection analysis.
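The abstract reports an overall F1 score under class imbalance but does not say how per-class scores were aggregated. As an illustration only, the sketch below computes per-class F1 and two common aggregates (macro and support-weighted) in plain Python. The class labels and the toy predictions are hypothetical and are not taken from the census data or from the authors' evaluation.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true examples per class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * n for c, n in support.items()) / len(y_true)


# Hypothetical, imbalanced toy data: 8 "content" comments, 2 "technical" ones.
y_true = ["content"] * 8 + ["technical"] * 2
y_pred = ["content"] * 8 + ["content", "technical"]

print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
print(weighted_f1(y_true, y_pred))
```

With imbalanced classes the two aggregates can differ noticeably: the macro average is pulled down by a weak minority class, while the weighted average mostly reflects the majority class.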