Mutual Information for Exploration in Reinforcement Learning
Abhinav Gopal
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2022-54
May 10, 2022
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-54.pdf
Reinforcement Learning (RL) is a rapidly growing area of interest in the Artificial Intelligence community, with tremendous applications. As a result, there is a need to improve efficiency and exploration in RL algorithms to promote quicker and improved learning. We introduce MIRL: Mutual Information for Beneficial Exploration in RL, which considers the use of the mutual information between an action and the expected "future" from a given state as an additional reward to improve exploration. Using MIRL, agents learn to exploit "decision states" that lead to highly specialized futures.
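As a rough illustration of the idea in the abstract, the sketch below computes the mutual information between the action and the next state at a single state and adds it to the environment reward as a bonus. It makes several assumptions that do not come from the report: the "future" is taken to be the one-step next state, the transition probabilities are known and tabular, and the names `action_future_mutual_information`, `shaped_reward`, and the weight `beta` are hypothetical. The thesis may define and estimate these quantities differently.

```python
import math


def action_future_mutual_information(policy, transitions):
    """Return I(A; S' | s) in nats for one fixed state s.

    policy:      list of pi(a | s), one probability per action.
    transitions: transitions[a][j] = P(s'_j | s, a), assumed known (tabular).
    """
    n_next = len(transitions[0])
    # Marginal over next states: p(s' | s) = sum_a pi(a | s) * P(s' | s, a)
    marginal = [
        sum(p_a * row[j] for p_a, row in zip(policy, transitions))
        for j in range(n_next)
    ]
    mi = 0.0
    for p_a, row in zip(policy, transitions):
        for j, p_next in enumerate(row):
            # Zero-probability terms contribute nothing to the sum.
            if p_a > 0.0 and p_next > 0.0:
                mi += p_a * p_next * math.log(p_next / marginal[j])
    return mi


def shaped_reward(env_reward, mi_bonus, beta=0.1):
    """Environment reward plus a weighted bonus (beta is a hypothetical weight)."""
    return env_reward + beta * mi_bonus


uniform = [0.5, 0.5]
# Assumed "decision state": each action leads to a different next state.
decision_state = [[1.0, 0.0], [0.0, 1.0]]
# Assumed "corridor" state: the action has no effect on the next state.
corridor_state = [[0.5, 0.5], [0.5, 0.5]]

mi_decision = action_future_mutual_information(uniform, decision_state)
mi_corridor = action_future_mutual_information(uniform, corridor_state)
```

Under these assumptions the bonus is largest where the choice of action determines the next state (here `mi_decision` equals ln 2) and vanishes where the action makes no difference (`mi_corridor` is 0), which matches the intuition of a "decision state" described in the abstract.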
Advisor: John F. Canny
BibTeX citation:
@mastersthesis{Gopal:EECS-2022-54,
    Author = {Gopal, Abhinav},
    Title = {Mutual Information for Exploration in Reinforcement Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-54.html},
    Number = {UCB/EECS-2022-54},
    Abstract = {Reinforcement Learning (RL) is a rapidly growing area of interest in the Artificial Intelligence community, with tremendous applications. As a result, there is a need to improve efficiency and exploration in RL algorithms to promote quicker and improved learning. We introduce MIRL: Mutual Information for Beneficial Exploration in RL, which considers the use of the mutual information between an action and the expected "future" from a given state as an additional reward to improve exploration. Using MIRL, agents learn to exploit "decision states" that lead to highly specialized futures.},
}
EndNote citation:
%0 Thesis
%A Gopal, Abhinav
%T Mutual Information for Exploration in Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 10
%@ UCB/EECS-2022-54
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-54.html
%F Gopal:EECS-2022-54