Algorithms Seminar

The Secret Lives of Optimizers: How Optimization Algorithms Implicitly Shape the Loss Landscape

October 19, -


Deep learning relies on the ability of first-order optimization algorithms, such as stochastic gradient descent (SGD) and Adam, to navigate high-dimensional non-convex loss landscapes and find minimizers that generalize to data outside the training set. However, this process remains poorly understood. Common heuristics in the deep learning community, like the notion that SGD generalizes better than Adam despite training more slowly, or that decreasing the batch size improves generalization, hint at complex dynamics not fully captured by traditional optimization theory. To bridge this gap, researchers have observed that, in addition to minimizing the training loss, optimizers like SGD play an active role in shaping the geometry of the loss landscape. This phenomenon, known as implicit regularization, suggests that optimizers have innate preferences for certain types of minimizers of the training loss. I will present two recent results on the implicit regularization effect of SGD, highlighting its strong preference for "flatter" minimizers, which tend to generalize better.
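The notion of a "flat" versus "sharp" minimizer in the abstract can be made concrete with curvature: a flat minimum has small second derivatives of the loss, a sharp one has large second derivatives. Below is a minimal illustrative sketch (not from the talk) using a hypothetical one-dimensional loss with two global minima of different curvature; the `loss` function and the finite-difference `sharpness` estimate are both assumptions for illustration only.

```python
# Hypothetical toy loss with two global minima of equal value (zero)
# but very different curvature: a "sharp" minimum at x = -1 and a
# "flat" minimum at x = +1.
def loss(x):
    return min(50.0 * (x + 1.0) ** 2, 0.5 * (x - 1.0) ** 2)

def sharpness(x, eps=1e-3):
    # Central finite-difference estimate of the second derivative;
    # larger values indicate a sharper (more curved) minimum.
    return (loss(x + eps) - 2.0 * loss(x) + loss(x - eps)) / eps ** 2

print(sharpness(-1.0))  # sharp minimum: curvature ~ 100
print(sharpness(1.0))   # flat minimum:  curvature ~ 1
```

Both minima have identical training loss, so a theory that looks only at the loss value cannot distinguish them; implicit regularization results of the kind described above explain why SGD tends to end up at the flatter one.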


Alex Damian is a Ph.D. student in the Program in Applied and Computational Mathematics at Princeton University. He is currently working on understanding implicit regularization and feature learning in deep learning, and his research interests lie broadly in optimization and machine learning.


LSRC D344 or join virtually via Zoom


Yiheng Shen