The asymptotic spectrum of the Hessian of DNN throughout training

Arthur Jacot, Franck Gabriel, Clément Hongler

Keywords: deep learning theory, gradient descent, loss surface, neural tangent kernel

Tues Session 3 (12:00-14:00 GMT)
Tues Session 5 (20:00-22:00 GMT)

Abstract: The dynamics of DNNs during gradient descent are described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK gives precise insight into the Hessian of the cost of DNNs: we obtain a full characterization of the asymptotics of the spectrum of the Hessian, both at initialization and during training.
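
For context, here is a minimal sketch of the two objects involved, under a standard supervised setup with network outputs $f_\theta$ and cost $C$ (this is standard background with the usual definitions; the paper's precise setup may differ). The NTK at parameters $\theta$ is

$$\Theta_\theta(x, x') = \sum_{p=1}^{P} \partial_{\theta_p} f_\theta(x)\, \partial_{\theta_p} f_\theta(x'),$$

and the Hessian of the cost splits into a Gauss-Newton term and a residual term:

$$\nabla_\theta^2 \big(C \circ f_\theta\big) = J_\theta^\top \big(\nabla_f^2 C\big)\, J_\theta + \sum_i \partial_{f_i} C \,\nabla_\theta^2 f_\theta(x_i),$$

where $J_\theta$ is the Jacobian of the outputs with respect to the parameters. For the squared-error cost, $\nabla_f^2 C$ is the identity and the nonzero eigenvalues of the Gauss-Newton term coincide with those of the NTK Gram matrix on the training data, which is how the NTK gives a handle on the Hessian spectrum.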

Similar Papers

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
Jinlong Liu, Yunzhi Bai, Guoqing Jiang, Ting Chen, Huayan Wang