Transferring Optimality Across Data Distributions via Homotopy Methods

Matilde Gargiani, Andrea Zanelli, Quoc Tran Dinh, Moritz Diehl, Frank Hutter

Keywords: fine tuning, gradient descent, optimization, regression, transfer learning

Mon Session 3 (12:00-14:00 GMT)
Mon Session 4 (17:00-19:00 GMT)

Abstract: Homotopy methods, also known as continuation methods, are a powerful mathematical tool for efficiently solving various problems in numerical analysis, including complex non-convex optimization problems where little or no prior knowledge regarding the localization of the solutions is available. In this work, we propose a novel homotopy-based numerical method that can be used to transfer knowledge regarding the localization of an optimum across different task distributions in deep learning applications. We validate the proposed methodology with empirical evaluations in regression and classification scenarios, where it achieves superior numerical performance on popular deep learning benchmarks such as FashionMNIST and CIFAR-10, and we draw connections with the widely used fine-tuning heuristic. In addition, we give further insight into the properties of a general homotopy method when used in combination with Stochastic Gradient Descent by conducting a local theoretical analysis in a simplified setting.
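The general idea behind a homotopy method in this setting, gradually deforming an already-solved source problem into the target problem while tracking the minimizer with SGD, can be illustrated with a minimal sketch. The convex-combination homotopy below, the synthetic regression tasks, the interpolation schedule for lam, and all step sizes are illustrative assumptions and not the exact construction used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic linear-regression "tasks" (data distributions).
# The source task is assumed already solved; the target task is new.
X_src = rng.normal(size=(200, 5))
w_src = rng.normal(size=5)
y_src = X_src @ w_src + 0.1 * rng.normal(size=200)
X_tgt = rng.normal(size=(200, 5))
w_tgt = w_src + 0.5 * rng.normal(size=5)
y_tgt = X_tgt @ w_tgt + 0.1 * rng.normal(size=200)

def grad(theta, X, y, idx):
    """Stochastic gradient of the mean-squared error on a mini-batch."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ theta - yb) / len(idx)

# Start from the (approximate) optimum of the source task.
theta = np.linalg.lstsq(X_src, y_src, rcond=None)[0]

# Homotopy/continuation loop: the interpolation parameter lam moves the
# problem from the source distribution (lam = 0) to the target (lam = 1),
# and a few SGD steps at each stage track the moving minimizer.
lr, batch, sgd_steps = 0.05, 32, 50
for lam in np.linspace(0.0, 1.0, num=11):
    for _ in range(sgd_steps):
        idx = rng.choice(len(y_src), size=batch, replace=False)
        g = (1.0 - lam) * grad(theta, X_src, y_src, idx) \
            + lam * grad(theta, X_tgt, y_tgt, idx)
        theta -= lr * g
    tgt_loss = np.mean((X_tgt @ theta - y_tgt) ** 2)
    print(f"lam={lam:.1f}  target MSE={tgt_loss:.4f}")
```

Taking lam directly to 1 in a single step recovers plain fine-tuning on the target task from the source optimum, which is the connection to the fine-tuning heuristic mentioned in the abstract.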
