Model-agnostic interpretability of deep learning models for car emissions assessment
Conference
Regional Statistics Conference 2026
Format: IPS Abstract - Malta 2026
Keywords: interpretability, Shapley values
Wednesday 3 June, 11:20 a.m. – 1:00 p.m. (Europe/Malta)
Abstract
Telematics data offer a valuable opportunity to study the environmental impact of driving behaviour at scale, but deriving actionable insights from such data remains challenging. Reliable emission estimation typically requires complex simulation-based or machine learning models, yet these models often function as black boxes, limiting their usefulness for behavioural interpretation and targeted intervention. This work proposes an interpretable framework for analysing vehicle CO₂ emissions from large-scale telematics data by integrating emission reconstruction, supervised surrogate modelling, model-agnostic explanation techniques, and behavioural profiling.
Microscopic emissions are first reconstructed from raw telematics observations using NeuralMOVES, a deep learning surrogate of the U.S. EPA MOVES simulator. This allows scalable estimation of observation-level CO₂ emission rates while retaining sensitivity to driving dynamics, vehicle characteristics, and contextual conditions. These microscopic estimates are then aggregated to the customer level and used to define a supervised learning problem in which driver-level emissions are explained through structural and behavioural predictors derived from trip histories.
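As a rough sketch of this reconstruction-and-aggregation step, assuming synthetic stand-in data and a toy neuralmoves_co2_rate function in place of the trained NeuralMOVES surrogate (the real model's interface and the telematics schema may differ):

    import numpy as np
    import pandas as pd

    def neuralmoves_co2_rate(speed_kmh, accel_ms2):
        # Toy stand-in for the trained NeuralMOVES surrogate, which maps
        # per-second driving dynamics, vehicle characteristics, and context
        # to a CO2 emission rate in g/s.
        return 1.5 + 0.02 * speed_kmh + 0.8 * accel_ms2.clip(lower=0)

    # Synthetic telematics observations: one row per second within trips.
    rng = np.random.default_rng(0)
    n = 10_000
    obs = pd.DataFrame({
        "customer_id": rng.integers(0, 100, n),
        "speed_kmh": rng.uniform(0, 120, n),
        "accel_ms2": rng.normal(0.0, 1.0, n),
        "dist_km": rng.uniform(0.0, 0.04, n),
    })
    obs["co2_g_s"] = neuralmoves_co2_rate(obs["speed_kmh"], obs["accel_ms2"])

    # Aggregate microscopic estimates to the customer level; the
    # distance-normalised total defines the supervised learning target.
    customer = obs.groupby("customer_id").agg(
        total_co2_g=("co2_g_s", "sum"),   # at 1 Hz, g/s values sum to grams
        total_km=("dist_km", "sum"),
    )
    customer["co2_g_per_km"] = customer["total_co2_g"] / customer["total_km"]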
To model this relationship, several predictive methods are compared, including Elastic Net, Random Forest, and XGBoost.
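A minimal sketch of this comparison, assuming scikit-learn and xgboost; the synthetic X and y stand in for the customer-level predictors and the reconstructed emissions target, and the hyperparameter values are illustrative:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import ElasticNet
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBRegressor

    # Synthetic stand-ins; in the pipeline above, X holds customer-level
    # structural and behavioural predictors and y the CO2 per km target.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(500, 6)),
                     columns=[f"feat_{i}" for i in range(6)])
    y = X["feat_0"] * 2 + rng.normal(size=500)

    models = {
        "elastic_net": make_pipeline(StandardScaler(),
                                     ElasticNet(alpha=0.1, l1_ratio=0.5)),
        "random_forest": RandomForestRegressor(n_estimators=300, random_state=0),
        "xgboost": XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4),
    }
    for name, est in models.items():
        rmse = -cross_val_score(est, X, y, cv=5,
                                scoring="neg_root_mean_squared_error")
        print(f"{name}: RMSE {rmse.mean():.2f} +/- {rmse.std():.2f}")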
To make the fitted model interpretable, the analysis combines local and global explanation tools. SHapley Additive exPlanations (SHAP) are used to decompose individual predictions into feature-level contributions, revealing how behavioural and structural variables drive emissions for each customer. Accumulated Local Effects (ALE) plots are then employed to characterise the global effect of key predictors while mitigating distortions due to correlated features. Together, these tools provide complementary views of the mechanisms underlying predicted emissions.
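Both explanation layers can be sketched as follows, continuing from the model comparison above and assuming the shap and alibi libraries (feature indices are illustrative):

    import shap
    from alibi.explainers import ALE, plot_ale

    model = models["xgboost"].fit(X, y)   # fitted model from the sketch above

    # Local view: SHAP decomposes each customer's prediction into additive
    # feature contributions; TreeExplainer suits tree-based models.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)     # (n_customers, n_features)
    shap.summary_plot(shap_values, X)

    # Global view: ALE profiles each predictor's marginal effect while
    # remaining robust to correlated features.
    ale = ALE(model.predict, feature_names=list(X.columns))
    exp = ale.explain(X.to_numpy())
    plot_ale(exp, features=[0, 1])             # first two predictors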
A further challenge is that large-scale SHAP analysis produces a high-dimensional set of explanation vectors that is difficult to summarise. To address this, Archetypal Analysis is applied to the matrix of SHAP values in order to identify extreme and interpretable behavioural emission profiles. Rather than grouping drivers by similarity in raw covariates, the proposed approach clusters them in the space of model explanations, yielding archetypes that reflect distinct emission mechanisms.
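A sketch of this profiling step, assuming the third-party archetypes package (its exact API may differ) and the SHAP matrix computed above:

    from archetypes import AA   # assumption: third-party `archetypes` package

    # Archetypal Analysis in explanation space: each customer's SHAP
    # vector is approximated as a convex combination of extreme profiles.
    aa = AA(n_archetypes=4)
    weights = aa.fit_transform(shap_values)   # (n_customers, 4) convex weights
    profiles = aa.archetypes_                 # (4, n_features) SHAP-space archetypes

    # Assign each customer to the archetype dominating their explanation.
    dominant = weights.argmax(axis=1)

Because the decomposition happens in SHAP space rather than covariate space, two customers with similar raw covariates but different predicted emission mechanisms can be assigned to different archetypes.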
The resulting framework moves from black-box prediction to interpretable behavioural segmentation. It provides a principled way to identify recurring emission profiles, distinguish behavioural from structural drivers of emissions, and support scalable eco-driving recommendations or insurance-oriented interventions. More broadly, the paper shows how explanation-based profiling can transform complex machine learning outputs into operationally meaningful insights for sustainable mobility.