Theory and Application of Bonus-based Exploration in Reinforcement Learning
Bryan Chen
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2021-57
May 12, 2021
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-57.pdf
In this work, we strive to narrow the gap between theory and practice in bonus-based exploration, providing some new connections between the UCB algorithm and Random Network Distillation as well as some observations about pre- and post-projection reward bonuses. We propose an algorithm that reduces to UCB in the linear case and empirically evaluate the algorithm in challenging exploration environments. In the Randomised Chain and Maze environments, our algorithm consistently outperforms Random Network Distillation in reaching unseen states during training.
Advisor: Jiantao Jiao
BibTeX citation:
@mastersthesis{Chen:EECS-2021-57,
    Author = {Chen, Bryan},
    Title = {Theory and Application of Bonus-based Exploration in Reinforcement Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-57.html},
    Number = {UCB/EECS-2021-57},
    Abstract = {In this work, we strive to narrow the gap between theory and practice in bonus-based exploration, providing some new connections between the UCB algorithm and Random Network Distillation as well as some observations about pre- and post-projection reward bonuses. We propose an algorithm that reduces to UCB in the linear case and empirically evaluate the algorithm in challenging exploration environments. In the Randomised Chain and Maze environments, our algorithm consistently outperforms Random Network Distillation in reaching unseen states during training.}
}
EndNote citation:
%0 Thesis
%A Chen, Bryan
%T Theory and Application of Bonus-based Exploration in Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 12
%@ UCB/EECS-2021-57
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-57.html
%F Chen:EECS-2021-57