Ethel Newbold Award Lecture
Category: Bernoulli Society for Mathematical Statistics and Probability (BS)
Abstract
We study the advantages of accelerated gradient methods, specifically
variants of the Frank-Wolfe method and of projected gradient descent,
for privacy and heavy-tailed robustness. Our approaches are
as follows: For the Frank-Wolfe method, our technique is based on a
tailored learning rate and a uniform lower bound on the gradient
l_2-norm over the constraint set. For accelerating projected gradient
descent, we use the popular variant based on Nesterov's momentum, and
we optimize our objective over R^p. These accelerations reduce the
iteration complexity, which translates into stronger statistical
guarantees, for instance for empirical and population risk
minimization. Our
analysis covers three settings: non-random data, random model-free
data, and parametric models (linear regression and generalized linear
models). Methodologically, we address both privacy and robustness
through noisy gradients. We ensure differential privacy via the
Gaussian mechanism and advanced composition, and we achieve
heavy-tailed robustness using a geometric median-of-means estimator,
which also sharpens the dependence on the dimension of the covariates.
Finally, we compare our rates to existing bounds and identify
scenarios where our methods attain optimal convergence.
This is joint work with Laurentiu Marchis (Cambridge).
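
As a purely illustrative companion to the abstract (not the algorithms
analysed in the talk), the Python sketch below runs Frank-Wolfe over an
l_1-ball with Gaussian-perturbed gradients. The l_1-ball is chosen only
because its linear minimization oracle is a single signed vertex, the
classical 2/(t+2) step size stands in for the tailored learning rate
mentioned above, and the radius, noise scale, and toy objective are
made-up choices.

    import numpy as np

    def frank_wolfe_l1(grad_fn, radius, dim, n_iter=200, sigma=0.1, seed=0):
        # Frank-Wolfe over the l_1-ball of the given radius, with Gaussian-
        # perturbed gradients as a stand-in for the Gaussian mechanism.
        # The 2 / (t + 2) schedule is the classical step size, not the
        # tailored learning rate from the talk.
        rng = np.random.default_rng(seed)
        x = np.zeros(dim)
        for t in range(n_iter):
            g = grad_fn(x) + sigma * rng.standard_normal(dim)
            # Linear minimization oracle for the l_1-ball: a signed vertex.
            i = int(np.argmax(np.abs(g)))
            s = np.zeros(dim)
            s[i] = -radius * np.sign(g[i])
            x = x + (2.0 / (t + 2)) * (s - x)
        return x

    # Toy usage: objective 0.5 * ||x - b||^2 with the minimizer outside the ball.
    b = np.array([3.0, -1.0, 0.0, 0.5])
    print(frank_wolfe_l1(lambda x: x - b, radius=2.0, dim=4))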
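
A second sketch, again illustrative only, shows the noisy-gradient
viewpoint for gradient descent over R^p with Nesterov's momentum:
per-block mean gradients are aggregated by their geometric median
(computed with Weiszfeld's algorithm) for heavy-tailed robustness, and
the aggregate is clipped and perturbed with Gaussian noise in the
spirit of the Gaussian mechanism. The block count, clipping threshold,
noise scale, step size, and least-squares example are placeholders; a
faithful implementation would calibrate the noise to the oracle's
sensitivity and the overall privacy budget via advanced composition.

    import numpy as np

    def geometric_median(points, n_iter=100, tol=1e-8):
        # Geometric median of the rows of `points` via Weiszfeld's algorithm.
        y = points.mean(axis=0)
        for _ in range(n_iter):
            dists = np.maximum(np.linalg.norm(points - y, axis=1), 1e-12)
            w = 1.0 / dists
            y_new = (w[:, None] * points).sum(axis=0) / w.sum()
            if np.linalg.norm(y_new - y) < tol:
                return y_new
            y = y_new
        return y

    def noisy_gradient(theta, X, y, n_blocks, clip, sigma, rng):
        # Robustness: geometric median of per-block mean gradients.
        # Privacy: clip the aggregate and add Gaussian noise; `sigma` is a
        # placeholder, not a calibrated privacy parameter.
        idx = rng.permutation(len(y))
        block_means = []
        for block in np.array_split(idx, n_blocks):
            residual = X[block] @ theta - y[block]
            block_means.append(X[block].T @ residual / len(block))
        g = geometric_median(np.stack(block_means))
        g = g * min(1.0, clip / max(np.linalg.norm(g), 1e-12))
        return g + sigma * clip * rng.standard_normal(theta.shape)

    def accelerated_noisy_gd(X, y, n_iter=200, step=0.1, n_blocks=10,
                             clip=5.0, sigma=0.05, seed=0):
        # Nesterov-accelerated gradient descent over R^p with noisy gradients.
        rng = np.random.default_rng(seed)
        theta = theta_prev = np.zeros(X.shape[1])
        for t in range(1, n_iter + 1):
            beta = (t - 1) / (t + 2)  # standard momentum schedule
            lookahead = theta + beta * (theta - theta_prev)
            g = noisy_gradient(lookahead, X, y, n_blocks, clip, sigma, rng)
            theta_prev, theta = theta, lookahead - step * g
        return theta

    # Toy usage: linear regression with heavy-tailed (Student-t) noise.
    rng = np.random.default_rng(1)
    theta_star = np.array([1.0, -2.0, 0.5])
    X = rng.standard_normal((2000, 3))
    y = X @ theta_star + rng.standard_t(df=2.5, size=2000)
    print(accelerated_noisy_gd(X, y))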