Risk Averse Robust Adversarial Reinforcement Learning

Xinlei Pan, Daniel Seita, Yang Gao and John Canny

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2019-164
December 1, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-164.pdf

Deep reinforcement learning has recently made significant progress in solving computer games and robotic control tasks. A known problem, though, is that policies overfit to the training environment and may not avoid rare, catastrophic events such as automotive accidents. A classical technique for improving the robustness of reinforcement learning algorithms is to train on a set of randomized environments, but this approach only guards against common situations. Recently, robust adversarial reinforcement learning (RARL) was developed, which allows efficient applications of random and systematic perturbations by a trained adversary. A limitation of RARL is that only the expected control objective is optimized; there is no explicit modeling or optimization of risk. Thus the agents do not consider the probability of catastrophic events (i.e., those inducing abnormally large negative reward), except through their effect on the expected objective. In this paper we introduce risk-averse robust adversarial reinforcement learning (RARARL), using a risk-averse protagonist and a risk-seeking adversary. We test our approach on a self-driving vehicle controller. We use an ensemble of policy networks to model risk as the variance of value functions. We show through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary.
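The abstract's core idea, modeling risk as the variance of value estimates across an ensemble, with a risk-averse protagonist and a risk-seeking adversary, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names and the risk weight `lam` are assumptions, and real value estimates would come from trained neural networks rather than hand-written lists.

```python
from statistics import mean, pvariance

def risk_adjusted_values(ensemble_q, lam=1.0):
    """Risk-averse scoring: penalize ensemble disagreement.

    ensemble_q: list of per-member value lists, shape (members, actions).
    Returns one risk-adjusted score per action.
    """
    n_actions = len(ensemble_q[0])
    adjusted = []
    for a in range(n_actions):
        estimates = [member[a] for member in ensemble_q]
        # Protagonist subtracts the variance: uncertain actions look worse.
        adjusted.append(mean(estimates) - lam * pvariance(estimates))
    return adjusted

def risk_seeking_values(ensemble_q, lam=1.0):
    """Risk-seeking scoring for the adversary: reward disagreement."""
    n_actions = len(ensemble_q[0])
    return [mean([m[a] for m in ensemble_q])
            + lam * pvariance([m[a] for m in ensemble_q])
            for a in range(n_actions)]

# Three ensemble members, two actions. Action 1 has the higher mean value
# (3.0 vs 1.0) but also higher variance, so the risk-averse score shrinks
# its advantage while the risk-seeking score inflates it.
q = [[1.0, 2.0], [1.0, 4.0], [1.0, 3.0]]
print(risk_adjusted_values(q))   # action 1 penalized by its variance
print(risk_seeking_values(q))    # action 1 rewarded for its variance
```

The opposite signs on the variance term capture the paper's asymmetry: the protagonist prefers actions the ensemble agrees on, while the adversary steers toward states where value estimates disagree, which is where catastrophic outcomes are most likely to hide.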

Advisor: John F. Canny


BibTeX citation:

@mastersthesis{Pan:EECS-2019-164,
    Author = {Pan, Xinlei and Seita, Daniel and Gao, Yang and Canny, John},
    Title = {Risk Averse Robust Adversarial Reinforcement Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2019},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-164.html},
    Number = {UCB/EECS-2019-164},
    Abstract = {Deep reinforcement learning has recently made significant progress in solving computer games and robotic control tasks. A known problem, though, is that policies overfit to the training environment and may not avoid rare, catastrophic events such as automotive accidents. A classical technique for improving the robustness of reinforcement learning algorithms is to train on a set of randomized environments, but this approach only guards against common situations. Recently, robust adversarial reinforcement learning (RARL) was developed, which allows efficient applications of random and systematic perturbations by a trained adversary. A limitation of RARL is that only the expected control objective is optimized; there is no explicit modeling or optimization of risk. Thus the agents do not consider the probability of catastrophic events (i.e., those inducing abnormally large negative reward), except through their effect on the expected objective. In this paper we introduce risk-averse robust adversarial reinforcement learning (RARARL), using a risk-averse protagonist and a risk-seeking adversary. We test our approach on a self-driving vehicle controller. We use an ensemble of policy networks to model risk as the variance of value functions. We show through experiments that a risk-averse agent is better equipped to handle a risk-seeking adversary, and experiences substantially fewer crashes compared to agents trained without an adversary.}
}

EndNote citation:

%0 Thesis
%A Pan, Xinlei
%A Seita, Daniel
%A Gao, Yang
%A Canny, John
%T Risk Averse Robust Adversarial Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2019
%8 December 1
%@ UCB/EECS-2019-164
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-164.html
%F Pan:EECS-2019-164