From online job advertisements to labour market statistics: an NLP and LLM-based approach
Conference
Format: CPS Poster - IAOS 2026
Keywords: job vacancies, labour market, large language models, official statistics
Session: Poster Session
Tuesday 12 May 12:30 p.m. - 2:30 p.m. (Europe/Vilnius)
Abstract
Job vacancy portals provide timely information on labour demand, which has encouraged the Central Statistical Bureau of Latvia to explore how natural language processing (NLP) methods – vectorization, retrieval-augmented generation (RAG), and large language models (LLMs) – can be used to convert online job advertisements into data that are analytically useful for producing official statistics. Achieving this requires examining how NLP methods and LLM-based text understanding and reasoning can be combined with established statistical practices to extract, interpret, and harmonise available information.
As online job advertisements are highly heterogeneous, unstructured, and not designed for statistical use, their integration into official statistics remains methodologically challenging. Predefined rules and keyword-based methods alone are often insufficient to handle this variability.
To handle variation in job descriptions, multilingual content, and differing levels of detail, the developed approach relies on the contextual reasoning capabilities of LLMs, with particular attention to linking vacancy information with existing statistical registers and international classifications of occupations and economic activities. At the same time, the approach is designed to satisfy the consistency and quality control requirements of official statistics.
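As a rough illustration of the vectorization and retrieval step that could precede LLM-based classification, the sketch below ranks occupation codes by cosine similarity between an advertisement and short occupation descriptions. It is a minimal sketch under stated assumptions, not the method used in the work described here: the four ISCO-08 unit groups are a tiny subset, their keyword descriptions are invented for the example, the bag-of-words vectors stand in for whatever embedding model is actually used, and the names `top_candidates`, `needs_review`, and the `min_score` threshold are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative subset of ISCO-08 unit groups. The keyword descriptions
# are invented for this sketch and are not official classification text.
OCCUPATIONS = {
    "2512": "software developer programmer application code development",
    "5223": "shop sales assistant retail customer store",
    "2221": "nursing professional nurse patient care hospital",
    "8332": "heavy truck lorry driver transport freight",
}


def vectorize(text):
    """Turn text into a bag-of-words count vector (English-only toy tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_candidates(ad_text, k=2):
    """Return the k best-matching (code, score) pairs for an advertisement."""
    ad_vec = vectorize(ad_text)
    scored = [(code, cosine(ad_vec, vectorize(desc)))
              for code, desc in OCCUPATIONS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def needs_review(candidates, min_score=0.2):
    """Hypothetical quality-control rule: flag weak matches for manual checking."""
    return not candidates or candidates[0][1] < min_score


if __name__ == "__main__":
    ad = "We are hiring a nurse for patient care in our hospital"
    candidates = top_candidates(ad)
    print(candidates, "review needed:", needs_review(candidates))
```

In a retrieval-augmented setup, the shortlisted codes and their descriptions would typically be passed to an LLM together with the advertisement text, so that the model chooses among a small set of valid classification entries instead of generating a code freely. The review flag shows one simple way a consistency check could be attached to that step.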
Although the resulting data provide more timely and granular insights into labour demand than traditional sources, the use of NLP and LLM-based reasoning in statistical production raises methodological questions that require further investigation.