Empirical Evaluation of Adversarial Surprise
Samyak Parajuli
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2021-203
August 17, 2021
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-203.pdf
In this report, we describe experiments supporting a new unsupervised reinforcement learning method, Adversarial Surprise, in which two policies with opposite objectives take turns controlling a single agent. The Explore policy maximizes entropy, putting the agent into surprising or unfamiliar situations. The Control policy then takes over and seeks to recover from those situations by minimizing entropy. Through multi-agent competition, this adversarial game between the two policies allows the agent both to find increasingly surprising parts of the environment and to learn to gain mastery over them. We show empirically that our method leads to more effective exploration of stochastic, partially observed environments, performs meaningful control to minimize surprise in these environments, and allows complex skills to emerge within them. We show that Adversarial Surprise outperforms existing intrinsic motivation methods based on active inference (SMiRL), novelty seeking (Random Network Distillation, RND), and multi-agent unsupervised RL (Asymmetric Self-Play, ASP) in MiniGrid, Atari, and VizDoom environments.
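As a rough illustration of the alternating game described above, the sketch below pits an Explore phase (rewarded for the surprise of incoming observations) against a Control phase (rewarded for removing that surprise) within a single episode. This is a minimal Python sketch under assumed details, not the report's implementation: the diagonal-Gaussian observation model, the toy RandomWalkEnv, the stand-in policies, and the phase lengths are all illustrative choices.

import numpy as np


class ObservationModel:
    """Running diagonal-Gaussian fit to observations; surprise = negative log-likelihood."""

    def __init__(self, obs_dim):
        self.mean = np.zeros(obs_dim)
        self.var = np.ones(obs_dim)   # unit-variance prior keeps the log-likelihood defined
        self.count = 1

    def surprise(self, obs):
        # Negative log-likelihood of obs under the current Gaussian estimate.
        return 0.5 * float(np.sum(np.log(2.0 * np.pi * self.var)
                                  + (obs - self.mean) ** 2 / self.var))

    def update(self, obs):
        # Welford-style running update of the mean and (population) variance.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count


class RandomWalkEnv:
    """Toy stand-in environment: actions nudge a noisy observation vector."""

    def __init__(self, obs_dim=4):
        self.obs_dim = obs_dim
        self.state = np.zeros(obs_dim)

    def reset(self):
        self.state = np.zeros(self.obs_dim)
        return self.state.copy()

    def step(self, action):
        self.state = self.state + action + 0.1 * np.random.randn(self.obs_dim)
        return self.state.copy()


def run_episode(env, explore_policy, control_policy, explore_steps=50, control_steps=50):
    """One adversarial episode: Explore acts first, then Control takes over."""
    model = ObservationModel(env.obs_dim)
    obs = env.reset()
    explore_return = 0.0
    control_return = 0.0

    for _ in range(explore_steps):        # Explore phase: reward is +surprise
        obs = env.step(explore_policy(obs))
        explore_return += model.surprise(obs)
        model.update(obs)

    for _ in range(control_steps):        # Control phase: reward is -surprise
        obs = env.step(control_policy(obs))
        control_return -= model.surprise(obs)
        model.update(obs)

    # In training, each policy would be updated (e.g., by a policy-gradient
    # method) on its own return, making the game adversarial over episodes.
    return explore_return, control_return


if __name__ == "__main__":
    env = RandomWalkEnv()
    noisy_explorer = lambda obs: np.random.randn(env.obs_dim)  # pushes toward unfamiliar states
    damping_controller = lambda obs: -0.5 * obs                # pulls the state back toward familiar territory
    print(run_episode(env, noisy_explorer, damping_controller))

Because Explore is paid for the surprise it creates and Control is penalized for the surprise that remains, improving either policy raises the bar for the other, which is the competitive dynamic the abstract describes.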
Advisor: Alexandre Bayen
BibTeX citation:
@mastersthesis{Parajuli:EECS-2021-203,
    Author = {Parajuli, Samyak},
    Title = {Empirical Evaluation of Adversarial Surprise},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {Aug},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-203.html},
    Number = {UCB/EECS-2021-203},
    Abstract = {In this report, we describe experiments supporting a new unsupervised reinforcement learning method, Adversarial Surprise, which has two policies with opposite objectives take turns controlling a single agent. The Explore policy maximizes entropy, putting the agent into surprising or unfamiliar situations. Then, the Control policy takes over and seeks to recover from those situations by minimizing entropy. Through multi-agent competition, this adversarial game between the two policies allows for the agent to both find increasingly surprising parts of the environment as well as learn to gain mastery over them. We show empirically that our method leads to more effective exploration of stochastic, partially-observed environments, is able to perform meaningful control to minimize surprise in these environments, and allows for the emergence of complex skills within these environments. We show that Adversarial Surprise is able to outperform existing intrinsic motivation methods based on active inference (SMiRL), novelty-seeking (Random Network Distillation (RND)), and multi-agent unsupervised RL (Asymmetric Self-Play (ASP)) in MiniGrid, Atari and VizDoom environments.},
}
EndNote citation:
%0 Thesis
%A Parajuli, Samyak
%T Empirical Evaluation of Adversarial Surprise
%I EECS Department, University of California, Berkeley
%D 2021
%8 August 17
%@ UCB/EECS-2021-203
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-203.html
%F Parajuli:EECS-2021-203