Algorithmic Bias as Distinct from Mere Data Bias
Conference: Regional Statistics Conference 2026
Format: CPS Abstract - Malta 2026
Keywords: data ethics, machine learning, simulation
Session: CPS 10 Computation Simulation
Thursday 4 June 11 a.m. - noon (Europe/Malta)
Abstract
Recent examples of performance degradation in algorithms such as the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) tool have inspired the use of the term algorithmic bias to describe performance that varies drastically across demographic subgroups. However, many have objected to this nomenclature, arguing that what is termed algorithmic bias is no more than data bias: an artifact of flawed training data. Against this objection, we present two examples of performance degradation under ideal data conditions. Specifically, we demonstrate that the Support Vector Machine (SVM) and K-Nearest-Neighbors (KNN) classifiers can be made to overfit their training data even when it is class-balanced and identically distributed to the testing data. Moreover, the degradation in performance is markedly worse for one class label than the other, suggesting that similar failures on real data with demographic features could produce what is commonly called algorithmic bias. We also perform a simulation study to demonstrate that the presented examples are not isolated instances in which algorithmic bias, as distinct from data bias, can occur. We conclude by showing that misidentifying algorithmic bias as mere data bias may lead practitioners to overlook available remedies. We provide practical suggestions for mitigating algorithmic bias, including the use of fairness-aware classifiers, and reiterate the need to properly train and tune classification algorithms according to established, but too easily overlooked, standards of practice.
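The kind of demonstration the abstract describes can be sketched in a few lines. The following is a hypothetical illustration, not the authors' actual experiment: a 1-NN classifier (the extreme case of KNN) memorizes a class-balanced training set drawn from the same distribution as the test set, yet its test accuracy degrades, and the per-class error rates need not be equal. The data-generating scheme and all parameter choices here are assumptions for illustration only.

```python
# Hypothetical sketch (not the authors' experiment): KNN with k=1 overfits
# even when training and test data are class-balanced and identically
# distributed, and its test errors need not fall evenly on both classes.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def sample(n_per_class):
    # Two overlapping Gaussian classes; the same scheme generates both the
    # training and the test set, so the sets are identically distributed.
    # Class 1 is more diffuse, so it overlaps class 0's region (assumed
    # design choice to make the class-wise errors unequal).
    x0 = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 2))
    x1 = rng.normal(loc=1.0, scale=2.0, size=(n_per_class, 2))
    X = np.vstack([x0, x1])
    y = np.repeat([0, 1], n_per_class)
    return X, y

X_train, y_train = sample(200)   # class-balanced training set
X_test, y_test = sample(2000)    # identically distributed test set

# k=1 memorizes the training data: perfect training accuracy,
# markedly lower accuracy on the (identically distributed) test set.
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

# Per-class test accuracy: the degradation need not be symmetric.
pred = clf.predict(X_test)
acc0 = (pred[y_test == 0] == 0).mean()
acc1 = (pred[y_test == 1] == 1).mean()
print(f"train={train_acc:.2f} test={test_acc:.2f} "
      f"class0={acc0:.2f} class1={acc1:.2f}")
```

In real applications the two classes could correspond to demographic subgroups, in which case this class-asymmetric overfitting would present as what is commonly called algorithmic bias, despite the ideal data conditions.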