On the Convergence of FedAvg on Non-IID Data

Xiang Li; Kaixuan Huang; Wenhao Yang; Shusen Wang; Zhihua Zhang

On the Convergence of FedAvg on Non-IID Data

Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang

Keywords: federated learning, gradient descent, learning rate, optimization

Abstract Paper Code Reviews Chat

Mon Session 1 (05:00-07:00 GMT) [Live QA] [Cal]

Mon Session 2 (08:00-10:00 GMT) [Live QA] [Cal]

Monday: Optimisation Theory

Abstract: Federated learning enables a large amount of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for \texttt{FedAvg} on non-iid data: the learning rate $\eta$ must decay, even if full-gradient is used; otherwise, the solution will be $\Omega (\eta)$ away from the optimal.

On the Convergence of FedAvg on Non-IID Data

Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang

Similar Papers

Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang,

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

Yuanhao Wang, Kefan Dong, Xiaoyu Chen, Liwei Wang,

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu, Felicia Gao, Quanquan Gu,

Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

Ziwei Ji, Matus Telgarsky,