Regional Statistics Conference 2026

torchTextClassifiers: a unified framework for text classification with PyTorch, from an MLOps perspective

Conference: Regional Statistics Conference 2026

Format: IPS Abstract - Malta 2026

Keywords: automatic coding, deep learning, MLOps, text classification

Abstract

In a modern MLOps-driven ecosystem, model design serves as a foundational building block. Data scientists require the flexibility to iterate across diverse deep learning architectures, yet the transition from experimentation to production demands high standardization. To be effective, model training must be streamlined, portable across environments, and rigorously versioned to ensure reproducibility and a clear source of truth.

While PyTorch is the industry standard for custom neural network design, its flexibility can lead to fragmented workflows. To bridge this gap, Insee introduces torchTextClassifiers: a unified framework engineered to streamline the development and deployment of text classification models.

By abstracting the core components of text classification, the framework enables users to build state-of-the-art models with a high degree of customization, supporting architectures from fastText to BERT. It is compatible with modern ecosystems such as Hugging Face and adds specialized features, including explainability, custom tokenizer training, and hierarchical classification handling. The framework remains lightweight, staying close to raw PyTorch and PyTorch Lightning without introducing unnecessary dependencies. It specifically targets organizations that prefer training production-oriented custom models over deploying large, rigid pre-trained architectures.
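To illustrate the kind of model the framework standardizes, the sketch below builds a fastText-style classifier in plain PyTorch: an embedding bag averaged over token n-grams followed by a linear layer. This is a minimal, hypothetical example of the underlying architecture, not the torchTextClassifiers API; all class and parameter names here are illustrative.

```python
import torch
from torch import nn


class FastTextStyleClassifier(nn.Module):
    """Bag-of-tokens classifier in the spirit of fastText (illustrative sketch)."""

    def __init__(self, vocab_size: int, embed_dim: int, num_classes: int):
        super().__init__()
        # EmbeddingBag averages the embeddings of each document's tokens
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        # token_ids: flattened token indices for a batch of documents
        # offsets: starting index of each document within token_ids
        return self.fc(self.embedding(token_ids, offsets))


model = FastTextStyleClassifier(vocab_size=10_000, embed_dim=64, num_classes=5)
tokens = torch.tensor([1, 2, 3, 4, 5])  # two documents, flattened together
offsets = torch.tensor([0, 3])          # document boundaries: [1,2,3] and [4,5]
logits = model(tokens, offsets)         # shape: (2 documents, 5 classes)
```

A module like this can be wrapped in a PyTorch Lightning `LightningModule` for streamlined, portable training, which is the layer the framework keeps close to.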

Built with the unique needs of National Statistical Institutes (NSIs) in mind, torchTextClassifiers extends standard PyTorch capabilities with specialized features for automatic coding use cases. As a modular Python package, it integrates seamlessly into MLOps pipelines. By leveraging PyPI as a central distribution point, it ensures that model designs are standardized, portable, and ready for the journey from development to production-scale inference.