Regional Statistics Conference 2026

A Lightweight and Interpretable Machine Learning Framework as a Robust Alternative to Deep Learning in Medical Image Classification

Format: IPS Abstract - Malta 2026

Keywords: computerized_clinical_decision, deep learning, image analysis, machine learning

Abstract

Many real-world classification tasks are fundamentally challenged by high class cardinality and severely long-tailed distributions, in which minority classes are often the most critical yet the least represented. Although deep learning architectures are frequently deployed for such tasks, their black-box nature, reliance on very large parameter counts, and massive computational footprint often obscure the underlying statistical relationships and limit deployment in resource-constrained environments.

This study introduces a scalable, computationally efficient, and statistically grounded machine learning framework designed to handle extreme class imbalance without high-capacity deep architectures or the biases introduced by external pre-training.

We propose a novel Frequency-Ordered Sequential One-versus-Rest (FO-OvR) classification strategy. This approach exploits the inherent monotonic class-prevalence structure of a dataset to dynamically prune the search space and mitigate imbalance ratios during iterative training. To validate the flexibility and robustness of this strategy, we used a large-scale dataset of more than 50,000 dermoscopic images retrieved from the ISIC archive. Within this framework, we benchmarked a diverse suite of statistical estimators: k-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Logistic Regression, and Quadratic Discriminant Analysis. The framework was applied to a complex dermatological diagnostic task involving 11 distinct categories with a maximum imbalance ratio of 1:103, using low-dimensional handcrafted features.
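The abstract does not spell out the FO-OvR procedure step by step, so the following is only a minimal sketch of one plausible reading. It assumes that classes are ordered from most to least frequent, that each stage trains a binary "current class versus rest" classifier on the samples still remaining, and that the samples of the class just handled are then removed, so later stages see a smaller and less imbalanced training set. The function names (`fit_fo_ovr`, `predict_fo_ovr`, `knn_predict_binary`), the pure-Python k-NN base estimator, and the toy two-dimensional data are illustrative assumptions, not the authors' implementation or the ISIC features.

```python
from collections import Counter
import math


def knn_predict_binary(train_X, train_y, x, k=3):
    """Binary k-NN vote: True if a strict majority of the k nearest points are positive."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = sum(1 for _, is_positive in nearest if is_positive)
    return votes * 2 > len(nearest)


def fit_fo_ovr(X, y):
    """Build one-versus-rest stages in descending class frequency (assumed ordering)."""
    order = [cls for cls, _ in Counter(y).most_common()]
    remaining = list(zip(X, y))
    stages = []
    for cls in order[:-1]:  # the rarest class needs no stage: it is the fallback
        stage_X = [x for x, _ in remaining]
        stage_y = [label == cls for _, label in remaining]
        stages.append((cls, stage_X, stage_y))
        # Prune: samples of the class just handled are not seen by later stages.
        remaining = [(x, label) for x, label in remaining if label != cls]
    return stages, order[-1]


def predict_fo_ovr(model, x, k=3):
    """Walk the cascade; the first stage that votes 'positive' decides the label."""
    stages, fallback = model
    for cls, stage_X, stage_y in stages:
        if knn_predict_binary(stage_X, stage_y, x, k):
            return cls
    return fallback


# Toy, well-separated data with class counts 6 / 3 / 2 (purely illustrative).
X = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0, 0.5),
     (10, 10), (11, 10), (10, 11),
     (20, 0), (21, 1)]
y = ["A"] * 6 + ["B"] * 3 + ["C"] * 2

model = fit_fo_ovr(X, y)
stage_sizes = [len(stage_X) for _, stage_X, _ in model[0]]
predictions = [predict_fo_ovr(model, q) for q in [(0.4, 0.4), (10.5, 10.5), (20.5, 0.5)]]
print(stage_sizes, predictions)
```

In this sketch the first stage trains on all 11 toy samples and the second on only the 5 that remain after class A is removed, which is the sense in which the search space shrinks and the imbalance faced by rarer classes is reduced. Any of the estimators named in the abstract could take the place of the k-NN vote under the same cascade structure.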

The FO-OvR framework was effective at capturing minority-class patterns, maintaining F1 scores above 90% even for the rarest entities. Benchmarked against foundational CNNs such as AlexNet and VGG16 trained from scratch, the proposed approach reduced training time by a factor of more than 1000 (33 seconds versus 10 hours) and enabled real-time inference on standard CPU hardware. Crucially, feature importance analysis confirmed that the model decisions were driven by semantically meaningful attributes, providing a level of transparency that deep learning models typically lack.

This work demonstrates that, for complex, imbalanced domains, statistically grounded Green AI systems provide a robust and sustainable alternative to black-box architectures. The proposed FO-OvR strategy, which is compatible with a range of machine learning estimators, offers a generalizable roadmap for efficient and interpretable classification in high-stakes fields where data are skewed and computational resources are limited.