Theory and Application of Bonus-based Exploration in Reinforcement Learning

Bryan Chen

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-57

May 12, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-57.pdf

In this work, we strive to narrow the gap between theory and practice in bonus-based exploration, providing some new connections between the UCB algorithm and Random Network Distillation as well as some observations about pre- and post-projection reward bonuses. We propose an algorithm that reduces to UCB in the linear case and empirically evaluate the algorithm in challenging exploration environments. In the Randomised Chain and Maze environments, our algorithm consistently outperforms Random Network Distillation in reaching unseen states during training.

Advisor: Jiantao Jiao

BibTeX citation:

@mastersthesis{Chen:EECS-2021-57,
    Author= {Chen, Bryan},
    Title= {Theory and Application of Bonus-based Exploration in Reinforcement Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-57.html},
    Number= {UCB/EECS-2021-57},
    Abstract= {In this work, we strive to narrow the gap between theory and practice in bonus-based exploration, providing some new connections between the UCB algorithm and Random Network Distillation as well as some observations about pre- and post-projection reward bonuses. We propose an algorithm that reduces to UCB in the linear case and empirically evaluate the algorithm in challenging exploration environments. In the Randomised Chain and Maze environments, our algorithm consistently outperforms Random Network Distillation in reaching unseen states during training.},
}

EndNote citation:

%0 Thesis
%A Chen, Bryan
%T Theory and Application of Bonus-based Exploration in Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 12
%@ UCB/EECS-2021-57
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-57.html
%F Chen:EECS-2021-57