Regional Statistics Conference 2026

Sector Enrichment of Corporate Payment Transactions Data Using Large Language Models

Conference

Regional Statistics Conference 2026

Format: CPS Abstract - Malta 2026

Keywords: artificial intelligence, economic-sectors, large language models, payment-system

Session: CPS 18 Large Language Models Applications

Friday 5 June 11 a.m. - noon (Europe/Malta)

Abstract

Sectoral assessment is essential for monitoring economic activity. One increasingly valuable source for such assessment is payment system transaction data, which provide a high-frequency signal by capturing real economic interactions in near real time and can be aggregated to track sectoral dynamics faster than many traditional indicators. However, sector-level analytics depend on reliably mapping transacting corporations to economic sectors.
A main constraint, particularly in Indonesia, is that payment infrastructures are designed for speed. To enable rapid processing, transaction messages typically contain limited descriptive attributes, and sector information is often missing or incomplete. This limitation is especially pronounced for non-listed corporations, for which standardized sector labels are less consistently available.
To address this gap, we propose a sector-enrichment method for corporate entities transacting through the Indonesian payment system. The method combines sector proxies from external reference sources, such as bank reports and publicly available data, with a Large Language Model (LLM) component that infers sectors when labels are missing or ambiguous. The LLM relies on operationally available signals (e.g., corporate names, listed/non-listed status, and other lightweight attributes) that do not slow transaction processing. It produces sector predictions aligned with a predefined taxonomy, namely the sectoral classification published by Indonesia’s statistical bureau.
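The two-stage logic described above (reference label first, LLM inference as a fallback) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sector list, the entity fields, the prompt wording, and the `stub_llm` stand-in are all hypothetical, and a real deployment would call an actual model and use the full official taxonomy.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical label subset for illustration only; the actual taxonomy is the
# classification published by Indonesia's statistical bureau.
TAXONOMY = ("Agriculture", "Manufacturing", "Construction", "Financial Services")


def build_prompt(name: str, listed: bool) -> str:
    """Build a classification prompt from lightweight entity attributes."""
    status = "listed" if listed else "non-listed"
    options = ", ".join(TAXONOMY)
    return (
        f"Company name: {name}\n"
        f"Status: {status}\n"
        f"Answer with exactly one sector from: {options}"
    )


def enrich_sector(
    entity: Dict[str, object],
    reference_labels: Dict[str, str],
    llm: Callable[[str], str],
) -> Tuple[Optional[str], str]:
    """Return (sector, source), where source records how the label was obtained."""
    # Stage 1: use a label from external reference sources when it is valid.
    label = reference_labels.get(str(entity["id"]))
    if label in TAXONOMY:
        return label, "reference"
    # Stage 2: fall back to LLM inference for missing or unusable labels.
    answer = llm(build_prompt(str(entity["name"]), bool(entity["listed"]))).strip()
    # Accept only answers that belong to the predefined taxonomy.
    if answer in TAXONOMY:
        return answer, "llm"
    return None, "unresolved"


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): a keyword rule for the demo."""
    return "Construction" if "Konstruksi" in prompt else "Unknown"


# Made-up entities and reference labels, for demonstration only.
reference_labels = {"C001": "Manufacturing"}
entities = [
    {"id": "C001", "name": "PT Contoh Tekstil", "listed": True},
    {"id": "C002", "name": "PT Contoh Konstruksi", "listed": False},
    {"id": "C003", "name": "PT Contoh Abadi", "listed": False},
]
results = [enrich_sector(e, reference_labels, stub_llm) for e in entities]
```

Recording the source alongside each label keeps reference-derived and LLM-inferred sectors distinguishable in downstream analysis, and rejecting out-of-taxonomy answers leaves an entity unresolved rather than assigning it a sector the model invented.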
We evaluate the approach using held-out labeled entities to assess predictive performance and robustness across sector classes and entity types. Results show that LLM-based inference achieves strong predictive quality and potentially expands sectoral coverage beyond what can be obtained from bank sources alone. The resulting sector-enriched transaction dataset supports sector-level analysis of corporate payment flows and enables the construction of high-frequency proxies for monitoring corporate activity and sectoral economic growth in Indonesia.
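One way to break held-out performance down by sector class and by entity type is sketched below. The abstract does not state which metrics were used, so the choice of per-group accuracy is an assumption, as are the record fields; the records are toy values invented for the example and are not results from the study.

```python
from collections import defaultdict
from typing import Dict, List


def accuracy_by_group(records: List[Dict[str, str]], key: str) -> Dict[str, float]:
    """Share of correct predictions within each value of `key`."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for r in records:
        group = r[key]
        totals[group] += 1
        hits[group] += int(r["predicted"] == r["true"])
    return {group: hits[group] / totals[group] for group in totals}


# Toy held-out records (made up for illustration, not study data).
held_out = [
    {"true": "Manufacturing", "predicted": "Manufacturing", "entity_type": "listed"},
    {"true": "Manufacturing", "predicted": "Construction", "entity_type": "non-listed"},
    {"true": "Construction", "predicted": "Construction", "entity_type": "non-listed"},
    {"true": "Agriculture", "predicted": "Agriculture", "entity_type": "non-listed"},
]

# Grouping by the true label gives per-sector recall.
by_sector = accuracy_by_group(held_out, "true")
# Grouping by entity type shows whether non-listed entities are harder to classify.
by_type = accuracy_by_group(held_out, "entity_type")
```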