Once for All: Train One Network and Specialize it for Efficient Deployment

Han Cai; Chuang Gan; Tianzhe Wang; Zhekai Zhang; Song Han

Keywords: automl, imagenet, neural architecture search

Wed Session 4 (17:00-19:00 GMT)

Wed Session 5 (20:00-22:00 GMT)

Abstract: We address the challenging problem of efficient deep learning model deployment across many devices, where the goal is to design neural network architectures that fit diverse hardware platform constraints, from the cloud to the edge. Most traditional approaches either design a specialized neural network manually or find one with neural architecture search (NAS), then train it from scratch for each case, which is computationally expensive and does not scale. Our key idea is to decouple model training from architecture search to reduce this cost. To this end, we propose to train a once-for-all (OFA) network that supports diverse architectural settings (depth, width, kernel size, and resolution). Given a deployment scenario, we can then quickly obtain a specialized sub-network by selecting from the OFA network without additional training. To prevent interference between the many sub-networks during training, we also propose a novel progressive shrinking algorithm, which can train a surprisingly large number of sub-networks ($> 10^{19}$) simultaneously. Extensive experiments on various hardware platforms (CPU, GPU, mCPU, mGPU, FPGA accelerator) show that OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3) while reducing GPU hours and $\mathrm{CO}_2$ emissions by orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting ($<$600M FLOPs). Code and pre-trained models are released at https://github.com/mit-han-lab/once-for-all.
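To give a feel for how a single network can contain more than $10^{19}$ sub-networks, the sketch below counts and samples configurations in a hypothetical search space. The abstract does not state the concrete choices, so the values here are assumptions for illustration only: 5 units, each with 2 to 4 layers, and each layer picking a kernel size from {3, 5, 7} and a width expansion ratio from {3, 4, 6}. Input resolution is left out of the count. The names `count_subnetworks` and `sample_subnetwork` are illustrative and are not part of the released code.

```python
import random

# Hypothetical search space. These values are assumptions for illustration;
# the abstract only names the dimensions (depth, width, kernel size, resolution).
DEPTHS = (2, 3, 4)         # layers per unit
KERNEL_SIZES = (3, 5, 7)   # per-layer kernel size choices
WIDTH_RATIOS = (3, 4, 6)   # per-layer width expansion ratio choices
NUM_UNITS = 5              # number of units in the network


def count_subnetworks(num_units=NUM_UNITS):
    """Count distinct sub-network configurations in the assumed space."""
    per_layer = len(KERNEL_SIZES) * len(WIDTH_RATIOS)
    # A unit with depth d has per_layer ** d layer configurations.
    per_unit = sum(per_layer ** d for d in DEPTHS)
    return per_unit ** num_units


def sample_subnetwork(rng):
    """Draw one configuration: a list of units, each a list of
    (kernel_size, width_ratio) pairs, one pair per layer."""
    config = []
    for _ in range(NUM_UNITS):
        depth = rng.choice(DEPTHS)
        layers = [(rng.choice(KERNEL_SIZES), rng.choice(WIDTH_RATIOS))
                  for _ in range(depth)]
        config.append(layers)
    return config


if __name__ == "__main__":
    print(f"sub-networks in the assumed space: {count_subnetworks():.2e}")
    print(sample_subnetwork(random.Random(0)))
```

Under these assumptions the count exceeds $10^{19}$, which is consistent with the figure quoted in the abstract. Selecting a specialized sub-network for a device would then amount to searching over such configurations for one that meets the device's latency or FLOPs budget; that search is not shown here.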
