Truth or backpropaganda? An empirical investigation of deep learning theory

Micah Goldblum; Jonas Geiping; Avi Schwarzschild; Michael Moeller; Tom Goldstein

Keywords: batch normalization, deep learning theory, generalization, loss landscape, neural tangent kernel, robustness

Wed Session 4 (17:00-19:00 GMT)

Wed Session 5 (20:00-22:00 GMT)

Wednesday: Theory

Abstract: We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not conform to wide-network theories, such as the neural tangent kernel, and that the interaction between skip connections and batch normalization plays a role; (4) find that rank does not correlate with generalization or robustness in a practical setting.

Similar Papers

Four Things Everyone Should Know to Improve Batch Normalization

Cecilia Summers, Michael J. Dinneen

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks

Yu Bai, Jason D. Lee

Understanding Generalization in Recurrent Neural Networks

Zhuozhuo Tu, Fengxiang He, Dacheng Tao

Piecewise linear activations substantially shape the loss surfaces of neural networks

Fengxiang He, Bohan Wang, Dacheng Tao