Abhinav Gopal

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-54

May 10, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-54.pdf

Reinforcement Learning (RL) is a rapidly growing area of interest in the Artificial Intelligence community, with a wide range of applications. As a result, there is a need to improve the efficiency and exploration of RL algorithms so that agents learn faster and more effectively. We introduce MIRL: Mutual Information for Beneficial Exploration in RL, which considers using the mutual information between an action and the expected "future" from a given state as an additional reward to improve exploration. Using MIRL, agents learn to exploit "decision states" that lead to highly specialized futures.
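
As a rough illustration of the idea in the abstract, the Python sketch below computes the mutual information between an action and a discrete "future" at a single state, then adds it to the environment reward as a bonus. It is not taken from the report. The function names `action_future_mi` and `shaped_reward`, the weight `beta`, and the choice to model the "future" as a small discrete distribution per action are all assumptions made for illustration; the report's actual definition of the future and its estimator may differ.

```python
import math
from typing import Sequence


def action_future_mi(policy: Sequence[float],
                     futures: Sequence[Sequence[float]]) -> float:
    """Mutual information I(A; F | s), in nats, at one state s.

    policy[a]     : probability of taking action a in s (assumed given)
    futures[a][f] : probability of future f after taking a in s
                    (hypothetical discrete stand-in for the "future")
    """
    n_futures = len(futures[0])
    # Marginal over futures: p(f | s) = sum_a pi(a | s) * p(f | s, a)
    marginal = [
        sum(policy[a] * futures[a][f] for a in range(len(policy)))
        for f in range(n_futures)
    ]
    mi = 0.0
    for a, p_a in enumerate(policy):
        for f, p_f_given_a in enumerate(futures[a]):
            # Terms with zero probability contribute nothing to the sum.
            if p_a > 0.0 and p_f_given_a > 0.0:
                mi += p_a * p_f_given_a * math.log(p_f_given_a / marginal[f])
    return mi


def shaped_reward(extrinsic: float, mi: float, beta: float = 0.1) -> float:
    """Environment reward plus a weighted mutual-information bonus.

    beta is an assumed trade-off weight, not a value from the report.
    """
    return extrinsic + beta * mi


# A "decision state": each action leads to a different future.
decision = action_future_mi([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
# A state where the action does not affect the future.
neutral = action_future_mi([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])

print(round(decision, 4))  # 0.6931 (log 2 nats)
print(neutral)             # 0.0
```

In this toy setting the bonus is largest where the choice of action fully determines the future and vanishes where the action makes no difference, which matches the intuition behind "decision states" described above.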

Advisor: John F. Canny


BibTeX citation:

@mastersthesis{Gopal:EECS-2022-54,
    Author= {Gopal, Abhinav},
    Title= {Mutual Information for Exploration in Reinforcement Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-54.html},
    Number= {UCB/EECS-2022-54},
    Abstract= {Reinforcement Learning (RL) is a rapidly growing area of interest in the Artificial Intelligence community, with tremendous applications. As a result, there is a need to improve efficiency and exploration in RL algorithms to promote quicker and improved learning. We introduce MIRL: Mutual Information for Beneficial Exploration in RL, which considers the use of the mutual information between an action and the expected "future" from a given state as an additional reward to improve exploration. Using MIRL, agents learn to exploit "decision states" that lead to highly specialized futures.},
}

EndNote citation:

%0 Thesis
%A Gopal, Abhinav
%T Mutual Information for Exploration in Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 10
%@ UCB/EECS-2022-54
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-54.html
%F Gopal:EECS-2022-54