Cross-Validation with Antithetic Gaussian Randomization
Conference
65th ISI World Statistics Congress
Format: SIPS Abstract - WSC 2025
Keywords: cross-validation, randomization
Abstract
In this talk, I will introduce a new cross-validation method based on an equicorrelated Gaussian randomization scheme. The method is well suited to problems where sample splitting is infeasible, such as when the observations are not independent and identically distributed. Our method constructs train-test data pairs using externally generated Gaussian randomization variables. The key innovation in our proposal is a carefully designed correlation structure among the randomization variables, which we refer to as antithetic Gaussian randomization. We show that this correlation is crucial for keeping the variance of our cross-validated estimator bounded while allowing the bias to vanish with just a few train-test repetitions. This desirable bias-variance property extends to a wide range of loss functions, including those commonly used for fitting generalized linear models.
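The abstract does not spell out the construction, so the following is only a minimal sketch under stated assumptions. It assumes that "antithetic" means the K randomization vectors are equicorrelated with the most negative admissible pairwise correlation, -1/(K-1), so that they sum to zero, and that each train-test pair is formed as y + sqrt(alpha) * omega and y - omega / sqrt(alpha). The function names, the parameter alpha, and the noise scale sigma are hypothetical and are not taken from the abstract.

```python
import numpy as np


def antithetic_gaussian_noise(n, K, sigma=1.0, seed=None):
    """Draw K noise vectors in R^n (assumed construction).

    Each row is marginally N(0, sigma^2 I); any two rows have
    correlation -1/(K-1), so the K rows sum to zero.
    """
    if K < 2:
        raise ValueError("K must be at least 2")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((K, n))
    # Centering across the K draws induces pairwise covariance -1/K
    # and variance (K-1)/K; rescaling restores unit variance.
    centered = z - z.mean(axis=0, keepdims=True)
    return sigma * np.sqrt(K / (K - 1)) * centered


def train_test_pairs(y, K, alpha, sigma=1.0, seed=None):
    """Build K (train, test) pairs from one response vector y.

    Assumption: train = y + sqrt(alpha) * omega and
    test = y - omega / sqrt(alpha). If y is Gaussian with
    covariance sigma^2 I, the two parts are uncorrelated.
    """
    y = np.asarray(y, dtype=float)
    omega = antithetic_gaussian_noise(y.shape[0], K, sigma, seed)
    return [
        (y + np.sqrt(alpha) * w, y - w / np.sqrt(alpha))
        for w in omega
    ]
```

Because the noise vectors sum to zero in this sketch, averaging the K training (or test) responses recovers y exactly. This cancellation is one plausible reading of how the negative correlation could keep the variance bounded when alpha is taken small to reduce bias.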