Statistical Data Integration and Prediction in Climate Science

Author

Sebastian Mutz

Conference

Regional Statistics Conference 2026

Format: IPS Abstract - Malta 2026

Keywords: climate, data-driven methods, statistics

Session: IPS 1233 - Methods and Applications of Statistical Data Fusion and Integration

Friday 5 June 2 p.m. - 3:40 p.m. (Europe/Malta)

Abstract

Statistical and data fusion approaches are well established in meteorology and climatology. They are applied to dataset merging, interpolation, probabilistic inference, pattern detection in high-dimensional data, and data-driven prediction. I provide application examples for the last two:
1. The study of the evolution of the Earth’s surface is often limited by insufficient knowledge of spatiotemporal changes in climate and its erosion potential. A combination of cluster and discriminant analysis was applied to high-dimensional simulated climate datasets to quantify differences in regional expressions of two climate states with respect to erosion-relevant atmospheric variables. The generated overviews contextualise field data and support the design of field campaigns and the formulation of new hypotheses.
2. Empirical–statistical downscaling (ESD) is a computationally efficient framework for translating large-scale climate and weather information into local atmospheric states through a set of transfer functions. These are classically determined by multiple regression (ordinary least squares or regularised variants) with bootstrapping and cross-validation. ESD is commonly used to refine weather forecasts or to downscale predictions from coarse climate models in order to improve climate change impact assessments and policy decisions. By varying the predictor and predictand datasets, ESD can be used for climate model bias correction, predicting local atmospheric states, examining dependencies in the climate system, and simulating the response of climate-impacted systems such as glaciers, ecosystems, or agricultural yields. Two variations of this approach were applied for a) downscaling global climate predictions for Southwest Germany and b) predicting glacier mass balance changes in the Andes. For the former, the experiments were repeated with nonlinear methods such as random forests and multilayer perceptrons. Classic statistical methods are computationally more economical and tend to outperform the nonlinear methods, especially in data-scarce settings and for predictions of values outside the observed range.
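
The regression-based transfer functions described in the second example can be illustrated with a minimal Python sketch. It is not the implementation used in the studies above. It assumes NumPy, synthetic data standing in for large-scale predictors and a local predictand, and hypothetical helper names (`fit_ridge`, `cross_validated_rmse`, `bootstrap_coefficients`). Setting `alpha=0` gives ordinary least squares; `alpha > 0` gives a ridge-regularised variant.

```python
import numpy as np


def fit_ridge(X, y, alpha=0.0):
    """Fit y ~ intercept + X @ coef; alpha > 0 adds an L2 penalty on the slopes."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean  # centring keeps the intercept unpenalised
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - X_mean @ coef
    return intercept, coef


def predict(model, X):
    intercept, coef = model
    return intercept + X @ coef


def cross_validated_rmse(X, y, alpha=0.0, k=5, seed=0):
    """Mean out-of-sample RMSE over k randomly assigned folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = fit_ridge(X[train], y[train], alpha)
        residuals = y[test_idx] - predict(model, X[test_idx])
        errors.append(np.sqrt(np.mean(residuals ** 2)))
    return float(np.mean(errors))


def bootstrap_coefficients(X, y, alpha=0.0, n_boot=500, seed=0):
    """95% percentile intervals for the slopes from resampling cases with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    coefs = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        coefs[b] = fit_ridge(X[idx], y[idx], alpha)[1]
    return np.percentile(coefs, [2.5, 97.5], axis=0)


# Synthetic stand-in data (an assumption): three large-scale predictors
# and one local predictand with a known linear relationship plus noise.
rng = np.random.default_rng(42)
n = 300
X = rng.normal(size=(n, 3))
true_coef = np.array([1.5, -0.8, 0.0])
y = 10.0 + X @ true_coef + rng.normal(scale=0.5, size=n)

model = fit_ridge(X, y)                         # transfer function (OLS)
rmse = cross_validated_rmse(X, y)               # out-of-sample skill
ci_low, ci_high = bootstrap_coefficients(X, y)  # coefficient uncertainty
```

In a real application, `X` would hold predictors derived from reanalysis or climate model output and `y` a station record or a glacier mass balance series. Time series are usually autocorrelated, so blocked folds and block bootstrapping may be more appropriate than the random resampling used here.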