Algorithmic Bias as Distinct from Mere Data Bias
Conference: Regional Statistics Conference 2026
Format: CPS Abstract - Malta 2026
Keywords: data ethics, machine learning, simulation
Session: CPS 10 Computation Simulation
Thursday 4 June 11 a.m. - noon (Europe/Malta)
Abstract
Recent examples of performance degradation in algorithms such as the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) tool have inspired the use of the term algorithmic bias to describe performance that varies drastically across demographic subgroups. However, many have objected to this nomenclature, arguing that what is termed algorithmic bias is no more than data bias: an artifact of flawed training data. Against this objection, we present two examples of performance degradation under ideal data conditions. Specifically, we demonstrate that the Support Vector Machine (SVM) and K-Nearest-Neighbors (KNN) classifiers can be made to overfit their training data even when it is class-balanced and identically distributed to the testing data. Moreover, the degradation in performance is markedly worse for one class label than the other, suggesting that similar failures on real data with demographic features could produce what is commonly called algorithmic bias. We also perform a simulation study to demonstrate that the presented examples are not isolated instances in which algorithmic bias, as distinct from data bias, can occur. We conclude by showing that misidentifying algorithmic bias as mere data bias may lead practitioners to overlook available remedies. We provide practical suggestions for mitigating algorithmic bias, including the use of fairness-aware classifiers, and reiterate the need to properly train and tune classification algorithms according to established, but too easily overlooked, standards of practice.
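The kind of demonstration the abstract describes can be sketched in a few lines. The following is a hypothetical illustration, not the authors' actual experiment: a 1-NN classifier (the extreme case of KNN) memorizes a class-balanced training set drawn from the same distribution as the test set, yet its test accuracy degrades, and the per-class error rates need not be equal. The data-generating scheme and all parameter choices here are assumptions for illustration only.

```python
# Hypothetical sketch (not the authors' experiment): KNN with k=1 overfits
# even when training and test data are class-balanced and identically
# distributed, and its test errors need not fall evenly on both classes.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def sample(n_per_class):
    # Two overlapping Gaussian classes; the same scheme generates both the
    # training and the test set, so the sets are identically distributed.
    # Class 1 is more diffuse, so it overlaps class 0's region (assumed
    # design choice to make the class-wise errors unequal).
    x0 = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=1.0, scale=2.0, size=(n_per_class, 2))
    X = np.vstack([x0, x1])
    y = np.repeat([0, 1], n_per_class)
    return X, y

X_train, y_train = sample(200)   # class-balanced training set
X_test, y_test = sample(2000)    # identically distributed test set

# k=1 memorizes the training data: perfect training accuracy,
# markedly lower accuracy on the (identically distributed) test set.
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

# Per-class test accuracy: the degradation need not be symmetric.
pred = clf.predict(X_test)
acc0 = (pred[y_test == 0] == 0).mean()
acc1 = (pred[y_test == 1] == 1).mean()
print(f"train={train_acc:.2f} test={test_acc:.2f} "
      f"class0={acc0:.2f} class1={acc1:.2f}")
```

In real applications the two classes could correspond to demographic subgroups, in which case this class-asymmetric overfitting would present as what is commonly called algorithmic bias, despite the ideal data conditions.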