Empirical Evaluation of Adversarial Surprise
Samyak Parajuli
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2021-203
August 17, 2021
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-203.pdf
In this report, we describe experiments supporting a new unsupervised reinforcement learning method, Adversarial Surprise, in which two policies with opposite objectives take turns controlling a single agent. The Explore policy maximizes entropy, putting the agent into surprising or unfamiliar situations. The Control policy then takes over and seeks to recover from those situations by minimizing entropy. Through multi-agent competition, this adversarial game between the two policies allows the agent both to find increasingly surprising parts of the environment and to learn to gain mastery over them. We show empirically that our method leads to more effective exploration of stochastic, partially observed environments, performs meaningful control to minimize surprise in these environments, and allows complex skills to emerge within them. We show that Adversarial Surprise outperforms existing intrinsic motivation methods based on active inference (SMiRL), novelty seeking (Random Network Distillation, RND), and multi-agent unsupervised RL (Asymmetric Self-Play, ASP) in MiniGrid, Atari, and VizDoom environments.
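As a rough illustration of the alternating game described above, the sketch below pits an Explore phase (rewarded for the surprise of incoming observations) against a Control phase (rewarded for removing that surprise) within a single episode. This is a minimal Python sketch under assumed details, not the report's implementation: the diagonal-Gaussian observation model, the toy RandomWalkEnv, the stand-in policies, and the phase lengths are all illustrative choices.

import numpy as np


class ObservationModel:
    """Running diagonal-Gaussian fit to observations; surprise = negative log-likelihood."""

    def __init__(self, obs_dim):
        self.mean = np.zeros(obs_dim)
        self.var = np.ones(obs_dim)   # unit-variance prior keeps the log-likelihood defined
        self.count = 1

    def surprise(self, obs):
        # Negative log-likelihood of obs under the current Gaussian estimate.
        return 0.5 * float(np.sum(np.log(2.0 * np.pi * self.var)
                                  + (obs - self.mean) ** 2 / self.var))

    def update(self, obs):
        # Welford-style running update of the mean and (population) variance.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count


class RandomWalkEnv:
    """Toy stand-in environment: actions nudge a noisy observation vector."""

    def __init__(self, obs_dim=4):
        self.obs_dim = obs_dim
        self.state = np.zeros(obs_dim)

    def reset(self):
        self.state = np.zeros(self.obs_dim)
        return self.state.copy()

    def step(self, action):
        self.state = self.state + action + 0.1 * np.random.randn(self.obs_dim)
        return self.state.copy()


def run_episode(env, explore_policy, control_policy, explore_steps=50, control_steps=50):
    """One adversarial episode: Explore acts first, then Control takes over."""
    model = ObservationModel(env.obs_dim)
    obs = env.reset()
    explore_return = 0.0
    control_return = 0.0

    for _ in range(explore_steps):        # Explore phase: reward is +surprise
        obs = env.step(explore_policy(obs))
        explore_return += model.surprise(obs)
        model.update(obs)

    for _ in range(control_steps):        # Control phase: reward is -surprise
        obs = env.step(control_policy(obs))
        control_return -= model.surprise(obs)
        model.update(obs)

    # In training, each policy would be updated (e.g., by a policy-gradient
    # method) on its own return, making the game adversarial over episodes.
    return explore_return, control_return


if __name__ == "__main__":
    env = RandomWalkEnv()
    noisy_explorer = lambda obs: np.random.randn(env.obs_dim)  # pushes toward unfamiliar states
    damping_controller = lambda obs: -0.5 * obs                # pulls the state back toward familiar territory
    print(run_episode(env, noisy_explorer, damping_controller))

Because Explore is paid for the surprise it creates and Control is penalized for the surprise that remains, improving either policy raises the bar for the other, which is the competitive dynamic the abstract describes.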
Advisor: Alexandre Bayen
BibTeX citation:
@mastersthesis{Parajuli:EECS-2021-203,
    Author = {Parajuli, Samyak},
    Title = {Empirical Evaluation of Adversarial Surprise},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {Aug},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-203.html},
    Number = {UCB/EECS-2021-203},
    Abstract = {In this report, we describe experiments supporting a new unsupervised reinforcement learning method, Adversarial Surprise, which has two policies with opposite objectives take turns controlling a single agent. The Explore policy maximizes entropy, putting the agent into surprising or unfamiliar situations. Then, the Control policy takes over and seeks to recover from those situations by minimizing entropy. Through multi-agent competition, this adversarial game between the two policies allows for the agent to both find increasingly surprising parts of the environment as well as learn to gain mastery over them. We show empirically that our method leads to more effective exploration of stochastic, partially-observed environments, is able to perform meaningful control to minimize surprise in these environments, and allows for the emergence of complex skills within these environments. We show that Adversarial Surprise is able to outperform existing intrinsic motivation methods based on active inference (SMiRL), novelty-seeking (Random Network Distillation (RND)), and multi-agent unsupervised RL (Asymmetric Self-Play (ASP)) in MiniGrid, Atari and VizDoom environments.},
}
EndNote citation:
%0 Thesis
%A Parajuli, Samyak
%T Empirical Evaluation of Adversarial Surprise
%I EECS Department, University of California, Berkeley
%D 2021
%8 August 17
%@ UCB/EECS-2021-203
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-203.html
%F Parajuli:EECS-2021-203