
Central limit theorem for a large multimodal model

Author

Andrej Srakar

Conference

Regional Statistics Conference 2026

Format: CPS Abstract - Malta 2026

Keywords: central limit theorem, large language models, spherical

Session: CPS 15 Mathematical Statistics

Friday 5 June 11 a.m. - noon (Europe/Malta)

Abstract

Since the pioneering works of the 1980s by Breuer, Dobrushin, Major, Rosenblatt, Taqqu and others, central and noncentral limit theorems have been continually refined, extended and applied to a growing number of diverse situations. In recent years, fourth moment theorem CLTs, quantitative CLTs, Breuer-Major and Dobrushin-Major CLTs, de Jong CLTs, functional CLTs and others have been developed. In this contribution we extend our recent work on central limit theorems for large language models with and without long memory (Srakar, 2025, presented at the ISI World Congress 2025 in The Hague) to large multimodal models. In mathematical terms, large multimodal models are characterized by a diversity of inputs consisting of text and images, which we represent as two different types of tokens. Our model builds on the mathematical model of Transformers by Geshkovski et al. (2024a; 2024b), who model LLMs as interacting particle systems on the (n-1)-dimensional sphere, and who prove and observe clustering behaviour of this system. Our central limit theorem studies the tokens of the large multimodal model as spherical random fields, taking into account the two-part interaction between the token types. This brings the analysis close to the mathematics of Coulomb gases. The derived scaling limits, together with upper and lower bounds on the behaviour of the dynamical system under study, depend on the number of layers and on the time perspective, and they reflect the clustering behaviour observed by Geshkovski et al. We also study the results when additional feedforward layers are included, using a Lie-Trotter splitting scheme and the usual two-part SPDE framework for studying CLTs for neural networks. We present applications to datasets from finance and medical imaging. In conclusion, we discuss implications for statistical estimation and inference in a natural language processing context.
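To make the particle-system picture concrete, the following is a minimal numerical sketch of self-attention dynamics on the unit sphere in the spirit of Geshkovski et al., with query, key and value matrices taken to be the identity. It is not the model analysed in the abstract. The two token types are represented here by a hypothetical 2x2 matrix `beta` of type-dependent inverse temperatures, and the explicit Euler step with renormalisation, the dimension `d = 3`, the token counts, and the helper names `simulate` and `mean_cosine` are all assumptions made purely for illustration.

```python
import numpy as np


def project_tangent(x, v):
    """Project each row of v onto the tangent space of the unit sphere at the matching row of x."""
    return v - np.sum(x * v, axis=1, keepdims=True) * x


def simulate(x0, types, beta, dt=0.05, steps=400):
    """Explicit Euler scheme for attention-like dynamics on the sphere.

    x0    : (n, d) initial token positions (normalised internally)
    types : (n,) integer array, 0 = text token, 1 = image token (illustrative labels)
    beta  : (2, 2) hypothetical type-dependent inverse temperatures
    """
    x = x0 / np.linalg.norm(x0, axis=1, keepdims=True)
    # Pairwise inverse temperature b[i, j] = beta[type_i, type_j].
    b = beta[types[:, None], types[None, :]]
    for _ in range(steps):
        gram = x @ x.T                        # inner products <x_i, x_j>
        w = np.exp(b * gram)                  # unnormalised attention weights
        w /= w.sum(axis=1, keepdims=True)     # row-normalise (softmax over j)
        drift = project_tangent(x, w @ x)     # attention output projected to the tangent space
        x = x + dt * drift                    # Euler step
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # retract back to the sphere
    return x


def mean_cosine(x):
    """Average inner product over distinct pairs of unit vectors; 1 means a single cluster."""
    n = x.shape[0]
    gram = x @ x.T
    return (gram.sum() - np.trace(gram)) / (n * (n - 1))


rng = np.random.default_rng(0)
n_text, n_image, d = 12, 8, 3
x0 = rng.standard_normal((n_text + n_image, d))
x0 /= np.linalg.norm(x0, axis=1, keepdims=True)
types = np.array([0] * n_text + [1] * n_image)
beta = np.array([[1.0, 0.5],
                 [0.5, 1.0]])                 # assumed: weaker cross-type interaction

x_final = simulate(x0, types, beta)
print("mean cosine at start:", mean_cosine(x0))
print("mean cosine at end:  ", mean_cosine(x_final))
```

Comparing the two printed values gives a rough diagnostic of clustering: a mean cosine close to 1 at the end would indicate that the tokens have collapsed toward a single cluster. How the cross-type entries of `beta` affect this is exactly the kind of question the two-part interaction raises, and the sketch makes no claim about the answer.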