A Lightweight and Interpretable Machine Learning Framework as a Robust Alternative to Deep Learning in Medical Image Classification
Conference: Regional Statistics Conference 2026
Format: IPS Abstract - Malta 2026
Keywords: computerized clinical decision, deep learning, image analysis, machine learning
Thursday 4 June 11:30 a.m. - 1:10 p.m. (Europe/Malta)
Abstract
Many real-world classification tasks are fundamentally challenged by high label cardinality and severely long-tailed class distributions, in which minority classes are often the most critical yet the least represented. Although deep learning architectures are frequently deployed for such tasks, their black-box nature, heavy overparameterization, and large computational footprint often obscure the underlying statistical relationships and limit deployment in resource-constrained environments.
This study introduces a scalable, computationally efficient, and statistically grounded machine learning framework designed to handle extreme class imbalance without requiring high-capacity deep architectures or inheriting biases from external pre-training.
We propose a novel Frequency-Ordered Sequential One-versus-Rest (FO-OvR) classification strategy. This approach exploits the monotonic class-prevalence structure inherent in a dataset to dynamically prune the search space and reduce imbalance ratios during iterative training. To validate the flexibility and robustness of this strategy, we used a large-scale dataset of more than 50,000 dermoscopic images retrieved from the ISIC archive. Within this framework, we benchmarked a diverse suite of statistical estimators, namely k-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Logistic Regression, and Quadratic Discriminant Analysis. The framework was applied to a complex dermatological diagnostic task involving 11 distinct categories with a maximum imbalance ratio of 1:103, using low-dimensional handcrafted features.
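The abstract does not specify the exact ordering direction, stopping rule, or decision threshold of FO-OvR, so the following is only a minimal sketch under stated assumptions: classes are peeled off from most to least frequent, each binary stage is trained only on the classes not yet removed, and at inference the first stage that fires determines the label, with the rarest class as the fallback. `FrequencyOrderedOvR`, `ToyCentroidClassifier`, and `imbalance_ratio` are hypothetical names introduced for illustration. `ToyCentroidClassifier` is a toy stand-in for the estimators listed above; any object exposing scikit-learn-style `fit(X, y)` and `predict(X)` methods could be passed instead.

```python
from collections import Counter


def imbalance_ratio(y):
    """Majority-to-minority count ratio; 103.0 would correspond to the reported 1:103."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


class ToyCentroidClassifier:
    """Hypothetical stand-in binary estimator: label 1 vs 0 by the nearer class centroid.

    Any object with scikit-learn-style fit(X, y) / predict(X) could replace it.
    """

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [min((0, 1), key=lambda c: sq_dist(x, self.centroids[c])) for x in X]


class FrequencyOrderedOvR:
    """Sketch of a frequency-ordered sequential one-vs-rest cascade (assumed design)."""

    def __init__(self, make_estimator):
        self.make_estimator = make_estimator  # factory returning a fresh binary estimator

    def fit(self, X, y):
        # Assumption: classes are peeled off from most to least frequent.
        self.order = [cls for cls, _ in Counter(y).most_common()]
        self.stages = []
        X_rem, y_rem = list(X), list(y)
        for cls in self.order[:-1]:
            # Binary task: current class (1) vs. every class not yet removed (0).
            target = [1 if t == cls else 0 for t in y_rem]
            self.stages.append((cls, self.make_estimator().fit(X_rem, target)))
            # Prune the current class so later stages train on a smaller pool.
            keep = [i for i, t in enumerate(y_rem) if t != cls]
            X_rem = [X_rem[i] for i in keep]
            y_rem = [y_rem[i] for i in keep]
        return self

    def predict(self, X):
        preds = []
        for x in X:
            label = self.order[-1]  # fallback when no stage fires: the rarest class
            for cls, est in self.stages:
                if est.predict([x])[0] == 1:  # first stage to claim the sample wins
                    label = cls
                    break
            preds.append(label)
        return preds


# Toy usage: 3 classes with counts 4, 3 and 1.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.0],
     [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
     [10.0, 0.0]]
y = ["a"] * 4 + ["b"] * 3 + ["c"]
model = FrequencyOrderedOvR(ToyCentroidClassifier).fit(X, y)
print(imbalance_ratio(y))  # 4.0
print(model.predict([[0.05, 0.1], [5.0, 5.1], [9.8, 0.2]]))  # ['a', 'b', 'c']
```

Because each stage is trained only on the classes that remain, the rarer classes are never scored against the bulk of the majority samples, which is one plausible reading of how the strategy prunes the search space; the authors' actual ordering, thresholds, and tie-breaking may differ.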
The FO-OvR framework demonstrated superior efficacy in capturing minority-class patterns, maintaining F1 scores above 90% even for the rarest classes. Compared with foundational CNNs such as AlexNet and VGG16 trained from scratch, the proposed approach reduced training time by a factor of more than 1,000 (33 seconds versus 10 hours) and enabled real-time inference on standard CPU hardware. Crucially, feature importance analysis confirmed that the model's decisions were driven by semantically meaningful attributes, providing a level of transparency that deep learning models typically lack.
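As a quick check of the figures above, assuming the F1 score is computed per class $c$ as the harmonic mean of that class's precision $P_c$ and recall $R_c$, and taking 10 hours as exactly 36,000 seconds, the training-time ratio works out as follows:

```latex
F_1^{(c)} = \frac{2\,P_c\,R_c}{P_c + R_c},
\qquad
\frac{10 \times 3600\ \text{s}}{33\ \text{s}} = \frac{36\,000}{33} \approx 1091 > 1000
```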
This work demonstrates that, for complex, imbalanced domains, statistically grounded Green AI systems provide a robust and sustainable alternative to black-box architectures. The proposed FO-OvR strategy, compatible with a range of machine learning estimators, offers a general roadmap for efficient and interpretable classification in high-stakes fields where data are skewed and computational resources are limited.