Are statisticians in official statistics becoming data scientists?
Conference
Format: CPS Abstract - IAOS 2026
Keywords: data science, skills, statistical_graphs
Session: Official statistics skills & data ethics
Thursday 14 May 9 a.m. - 10:30 a.m. (Europe/Vilnius)
Abstract
This paper investigates the evolving role of statisticians in official statistics in the context of the increasing availability of massive data. It addresses three key questions: to what extent National Statistical Institutes (NSIs) identify modern data science skills as crucial, how the need for data scientists has evolved, and which skills are essential for NSI workers. Through a case study of the French NSI (INSEE), the analysis of 7,400 job descriptions from 2014 to 2023 reveals a gradual increase in the demand for data science skills, particularly in statistical studies. However, these skills remain underrepresented in the statistical production line. The study highlights the importance of interdisciplinary and transferable skills and the role of academic training in preparing statisticians for evolving demands.
Methodologically, we build our analysis on a corpus of job descriptions standardized across years, with structured information on job titles, duties, and detailed lists of required knowledge and know-how derived from INSEE’s official occupational taxonomy. A central methodological contribution of the study is the construction of a co-occurrence network of skills. For each pair of skills, we count joint appearances across job descriptions, producing a weighted, undirected adjacency matrix from which a dense skills network is derived. We apply a spinglass community detection algorithm, selected for its robustness in networks of this size and density, to partition the network into six coherent skill communities. The methodology also includes a targeted keyword analysis of data-science-related terms (e.g., “machine learning,” “massive data,” “AI”) across job descriptions to quantify temporal trends and to isolate a subsample of “data science” positions. The over- or under-representation of skills within this subsample is then assessed by comparing their frequencies in the subsample with their global frequencies.
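As an illustration only, the counting steps described above can be sketched in a few lines of Python. This is a minimal sketch on invented toy data, not the study's actual pipeline: the job records, the field names `skills` and `text`, the helper function names, and the use of a simple frequency ratio as the over- or under-representation measure are all assumptions. The community detection step is omitted here; in practice the pair counts would be passed as edge weights to a graph library that offers a spinglass implementation.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each job description has a skill set and free text.
jobs = [
    {"skills": {"R", "survey sampling", "machine learning"},
     "text": "Develop machine learning models on massive data."},
    {"skills": {"R", "survey sampling"},
     "text": "Maintain the household survey production chain."},
    {"skills": {"R", "machine learning", "data visualisation"},
     "text": "Explore AI methods for text classification."},
    {"skills": {"survey sampling", "data visualisation"},
     "text": "Produce statistical charts for publications."},
]

# Keywords taken from the examples in the abstract; word boundaries keep
# "AI" from matching inside words such as "Maintain".
KEYWORDS = re.compile(r"\b(machine learning|massive data|ai)\b", re.IGNORECASE)


def cooccurrence(jobs):
    """Count, for each unordered pair of skills, the jobs listing both.

    The result is the upper triangle of a weighted, undirected
    adjacency matrix, keyed by alphabetically sorted skill pairs.
    """
    counts = Counter()
    for job in jobs:
        for pair in combinations(sorted(job["skills"]), 2):
            counts[pair] += 1
    return counts


def data_science_subsample(jobs):
    """Keep the jobs whose text mentions at least one keyword."""
    return [job for job in jobs if KEYWORDS.search(job["text"])]


def representation_ratios(jobs, subsample):
    """Share of each skill in the subsample divided by its global share.

    Values above 1 suggest over-representation, below 1 under-representation.
    """
    if not subsample:
        return {}
    overall = Counter(s for job in jobs for s in job["skills"])
    sub = Counter(s for job in subsample for s in job["skills"])
    return {
        skill: (sub[skill] / len(subsample)) / (overall[skill] / len(jobs))
        for skill in overall
    }


weights = cooccurrence(jobs)
subsample = data_science_subsample(jobs)
ratios = representation_ratios(jobs, subsample)
```

On this toy corpus, "machine learning" appears in every keyword-matched job but in only half of all jobs, so its ratio is 2, whereas "survey sampling" falls below 1. A ratio is only one possible measure; the abstract does not specify which comparison statistic is used.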