Goal-Directed Exploration and Skill Reuse
Vitchyr Pong
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2021-176
August 10, 2021
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-176.pdf
Reinforcement learning is a powerful paradigm for training agents to acquire complex behaviors, but it assumes that an external reward is provided by the environment. In practice, this task supervision is often hand-crafted by a user, a process that is time-consuming to repeat for every possible task and that makes manual engineering a primary bottleneck for behavior acquisition. This thesis describes how agents can acquire and reuse goal-directed behaviors in a completely self-supervised manner. It discusses challenges that arise when scaling up these methods to complex environments: How can an agent set goals for itself when it does not even know the set of possible states to explore? How does an agent autonomously reward itself for reaching a goal? How can an agent reuse this goal-directed behavior to decompose a new task into easier goal-reaching tasks? This thesis presents methods that I have developed to address these problems and shares results that apply the methods to image-based robot environments.
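The abstract's second question, how an agent rewards itself for reaching a goal, can be illustrated with a minimal sketch. The function names, the sparse distance-based reward, and the hindsight relabeling scheme below are illustrative assumptions only, not the thesis's actual method, which operates on learned latent representations of images rather than raw state vectors:

```python
import numpy as np

def goal_reward(state, goal, threshold=0.5):
    """Self-supervised reward: 0 if the state is within `threshold` of the
    goal, -1 otherwise. No external reward signal is required."""
    dist = np.linalg.norm(np.asarray(state) - np.asarray(goal))
    return 0.0 if dist <= threshold else -1.0

def relabel_with_hindsight(trajectory):
    """Hindsight relabeling (illustrative): treat the final achieved state
    as the goal, so even a 'failed' trajectory yields goal-reaching
    supervision for training a goal-conditioned policy."""
    goal = trajectory[-1]
    return [(state, goal, goal_reward(state, goal)) for state in trajectory]

# Example: a trajectory that drifts away from the origin still produces
# useful (state, goal, reward) tuples for its final state as the goal.
trajectory = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([2.0, 0.0])]
relabeled = relabel_with_hindsight(trajectory)
```

With this scheme the agent supervises itself: rewards come from its own notion of goal attainment rather than from a hand-crafted environment reward.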
Advisor: Sergey Levine
BibTeX citation:
@phdthesis{Pong:EECS-2021-176,
    Author = {Pong, Vitchyr},
    Title = {Goal-Directed Exploration and Skill Reuse},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {Aug},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-176.html},
    Number = {UCB/EECS-2021-176}
}
EndNote citation:
%0 Thesis
%A Pong, Vitchyr
%T Goal-Directed Exploration and Skill Reuse
%I EECS Department, University of California, Berkeley
%D 2021
%8 August 10
%@ UCB/EECS-2021-176
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-176.html
%F Pong:EECS-2021-176