Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control

Tsui-Wei Weng; Krishnamurthy (Dj) Dvijotham; Jonathan Uesato; Kai Xiao; Sven Gowal; Robert Stanforth; Pushmeet Kohli

Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control

Tsui-Wei Weng, Krishnamurthy (Dj) Dvijotham, Jonathan Uesato, Kai Xiao, Sven Gowal, Robert Stanforth, Pushmeet Kohli

Keywords: adversarial, adversarial attacks, continuous control, perturbation, reinforcement learning, robustness

Abstract Paper Reviews Chat

Mon Session 3 (12:00-14:00 GMT) [Live QA] [Cal]

Mon Session 4 (17:00-19:00 GMT) [Live QA] [Cal]

Abstract: Deep reinforcement learning has achieved great success in many previously difficult reinforcement learning tasks, yet recent studies show that deep RL agents are also unavoidably susceptible to adversarial perturbations, similar to deep neural networks in classification tasks. Prior works mostly focus on model-free adversarial attacks and agents with discrete actions. In this work, we study the problem of continuous control agents in deep RL with adversarial attacks and propose the first two-step algorithm based on learned model dynamics. Extensive experiments on various MuJoCo domains (Cartpole, Fish, Walker, Humanoid) demonstrate that our proposed framework is much more effective and efficient than model-free based attacks baselines in degrading agent performance as well as driving agents to unsafe states.

Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control

Tsui-Wei Weng, Krishnamurthy (Dj) Dvijotham, Jonathan Uesato, Kai Xiao, Sven Gowal, Robert Stanforth, Pushmeet Kohli

Similar Papers

Defending Against Physically Realizable Attacks on Image Classification

Tong Wu, Liang Tong, Yevgeniy Vorobeychik,

Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks

Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, John E. Hopcroft,

Adversarial Policies: Attacking Deep Reinforcement Learning

Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell,

Unrestricted Adversarial Examples via Semantic Manipulation

Anand Bhattad, Min Jin Chong, Kaizhao Liang, Bo Li, D. A. Forsyth,