Truth or backpropaganda? An empirical investigation of deep learning theory

Micah Goldblum, Jonas Geiping, Avi Schwarzschild, Michael Moeller, Tom Goldstein

Keywords: batch normalization, deep learning theory, generalization, loss landscape, neural tangent kernel, robustness

Wed Session 4 (17:00-19:00 GMT)
Wed Session 5 (20:00-22:00 GMT)
Wednesday: Theory

Abstract: We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not conform to wide-network theories, such as the neural tangent kernel, and that the interaction between skip connections and batch normalization plays a role in this deviation; (4) find that rank does not correlate with generalization or robustness in a practical setting.
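As a concrete illustration of the kind of probe behind point (3), the sketch below computes one entry of the empirical neural tangent kernel of a finite-width network: the inner product of parameter gradients at two inputs. NTK theory predicts that this kernel stays nearly fixed during training for sufficiently wide networks, so comparing it before and after training is one way to test whether a given architecture behaves as the theory assumes. The toy MLP and helper names here are illustrative assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical small network; the paper's experiments use ResNets and other
# architectures -- this stand-in only shows the mechanics of the computation.
model = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1))


def param_gradient(model, x):
    """Flattened gradient of the scalar network output w.r.t. all parameters."""
    model.zero_grad()
    out = model(x).squeeze()
    out.backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()])


def empirical_ntk_entry(model, x1, x2):
    """Empirical (finite-width) NTK entry: <grad_theta f(x1), grad_theta f(x2)>."""
    g1 = param_gradient(model, x1)
    g2 = param_gradient(model, x2)
    return torch.dot(g1, g2).item()


x1, x2 = torch.randn(1, 10), torch.randn(1, 10)
print(empirical_ntk_entry(model, x1, x2))
```

Tracking how much such kernel entries drift over the course of training (relative to their initial magnitude) gives a simple empirical measure of how far a network departs from the infinite-width NTK regime.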
