The asymptotic spectrum of the Hessian of DNN throughout training

Arthur Jacot, Franck Gabriel, Clément Hongler

Keywords: deep learning theory, gradient descent, loss surface, neural tangent kernel

Tues Session 3 (12:00-14:00 GMT)
Tues Session 5 (20:00-22:00 GMT)

Abstract: The dynamics of DNNs during gradient descent are described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK gives precise insight into the Hessian of the cost of DNNs: we obtain a full characterization of the asymptotics of the spectrum of the Hessian, both at initialization and during training.
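
For context, here is a minimal sketch of the two objects involved, under a standard supervised setup with network outputs $f_\theta$ and cost $C$ (this is standard background with the usual definitions; the paper's precise setup may differ). The NTK at parameters $\theta$ is

$$\Theta_\theta(x, x') = \sum_{p=1}^{P} \partial_{\theta_p} f_\theta(x)\, \partial_{\theta_p} f_\theta(x'),$$

and the Hessian of the cost splits into a Gauss-Newton term and a residual term:

$$\nabla_\theta^2 \big(C \circ f_\theta\big) = J_\theta^\top \big(\nabla_f^2 C\big)\, J_\theta + \sum_i \partial_{f_i} C \,\nabla_\theta^2 f_\theta(x_i),$$

where $J_\theta$ is the Jacobian of the outputs with respect to the parameters. For the squared-error cost, $\nabla_f^2 C$ is the identity and the nonzero eigenvalues of the Gauss-Newton term coincide with those of the NTK Gram matrix on the training data, which is how the NTK gives a handle on the Hessian spectrum.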

Similar Papers

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks
Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

Understanding Why Neural Networks Generalize Well Through GSNR of Parameters
Jinlong Liu, Yunzhi Bai, Guoqing Jiang, Ting Chen, Huayan Wang