Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing

Jinyuan Jia; Xiaoyu Cao; Binghui Wang; Neil Zhenqiang Gong

Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing

Jinyuan Jia, Xiaoyu Cao, Binghui Wang, Neil Zhenqiang Gong

Keywords: adversarial, imagenet, perturbation, randomized smoothing, robustness

Abstract Paper Code Reviews Chat

Tues Session 4 (17:00-19:00 GMT) [Live QA] [Cal]

Tues Session 5 (20:00-22:00 GMT) [Live QA] [Cal]

Abstract: It is well-known that classifiers are vulnerable to adversarial perturbations. To defend against adversarial perturbations, various certified robustness results have been derived. However, existing certified robustnesses are limited to top-1 predictions. In many real-world applications, top-$k$ predictions are more relevant. In this work, we aim to derive certified robustness for top-$k$ predictions. In particular, our certified robustness is based on randomized smoothing, which turns any classifier to a new classifier via adding noise to an input example. We adopt randomized smoothing because it is scalable to large-scale neural networks and applicable to any classifier. We derive a tight robustness in $\ell_2$ norm for top-$k$ predictions when using randomized smoothing with Gaussian noise. We find that generalizing the certified robustness from top-1 to top-$k$ predictions faces significant technical challenges. We also empirically evaluate our method on CIFAR10 and ImageNet. For example, our method can obtain an ImageNet classifier with a certified top-5 accuracy of 62.8\% when the $\ell_2$-norms of the adversarial perturbations are less than 0.5 (=127/255). Our code is publicly available at: \url{https://github.com/jjy1994/Certify_Topk}.

Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing

Jinyuan Jia, Xiaoyu Cao, Binghui Wang, Neil Zhenqiang Gong

Similar Papers

Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$

Francesco Croce, Matthias Hein,

MMA Training: Direct Input Space Margin Maximization through Adversarial Training

Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, Ruitong Huang,

BREAKING CERTIFIED DEFENSES: SEMANTIC ADVERSARIAL EXAMPLES WITH SPOOFED ROBUSTNESS CERTIFICATES

Amin Ghiasi, Ali Shafahi, Tom Goldstein,

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data

Yan Li, Ethan X.Fang, Huan Xu, Tuo Zhao,