Goal-Directed Exploration and Skill Reuse

Vitchyr Pong

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2021-176
August 10, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-176.pdf

Reinforcement learning is a powerful paradigm for training agents to acquire complex behaviors, but it assumes that an external reward is provided by the environment. In practice, this task supervision is often hand-crafted by a user, a process that is time-consuming to repeat for every possible task and that makes manual engineering a primary bottleneck for behavior acquisition. This thesis describes how agents can acquire and reuse goal-directed behaviors in a completely self-supervised manner. It discusses challenges that arise when scaling up these methods to complex environments: How can an agent set goals for itself when it does not even know the set of possible states to explore? How does an agent autonomously reward itself for reaching a goal? How can an agent reuse this goal-directed behavior to decompose a new task into easier goal-reaching tasks? This thesis presents methods that I have developed to address these problems and shares results that apply the methods to image-based robotic environments.
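The self-rewarding idea raised in the abstract, an agent rewarding itself for reaching its own goals, is commonly formalized in goal-conditioned RL as a reward computed from the agent's own state, often combined with hindsight relabeling of achieved outcomes. The sketch below illustrates that general recipe only; the function names, the sparse reward, and the distance threshold are illustrative assumptions, not the specific methods developed in the thesis:

```python
import numpy as np

def goal_reward(state, goal, threshold=0.05):
    """Self-supervised reward: success (0) if the state is within a
    small distance of the self-set goal, failure (-1) otherwise.
    No external task reward is consulted."""
    return 0.0 if np.linalg.norm(state - goal) <= threshold else -1.0

def relabel_with_achieved_goal(transition):
    """Hindsight relabeling: treat the state the agent actually reached
    as if it had been the commanded goal, so every trajectory yields a
    successful goal-reaching training example."""
    state, action, next_state, _original_goal = transition
    new_goal = next_state  # pretend this outcome was the intent
    reward = goal_reward(next_state, new_goal)
    return (state, action, next_state, new_goal, reward)

# A transition whose original goal was missed...
s, a, s2 = np.zeros(2), 0, np.array([0.3, 0.4])
# ...becomes a success once relabeled with the achieved state as the goal.
_, _, _, g, r = relabel_with_achieved_goal((s, a, s2, np.array([1.0, 1.0])))
```

Under this formulation, any exploration data can supervise goal-reaching without hand-crafted rewards, which is the self-supervision the abstract refers to.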

Advisor: Sergey Levine


BibTeX citation:

@phdthesis{Pong:EECS-2021-176,
    Author = {Pong, Vitchyr},
    Title = {Goal-Directed Exploration and Skill Reuse},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-176.html},
    Number = {UCB/EECS-2021-176},
    Abstract = {Reinforcement learning is a powerful paradigm for training agents to acquire complex behaviors, but it assumes that an external reward is provided by the environment. In practice, this task supervision is often hand-crafted by a user, a process that is time-consuming to repeat for every possible task and that makes manual engineering a primary bottleneck for behavior acquisition. This thesis describes how agents can acquire and reuse goal-directed behaviors in a completely self-supervised manner. It discusses challenges that arise when scaling up these methods to complex environments: How can an agent set goals for itself when it does not even know the set of possible states to explore? How does an agent autonomously reward itself for reaching a goal? How can an agent reuse this goal-directed behavior to decompose a new task into easier goal-reaching tasks? This thesis presents methods that I have developed to address these problems and shares results that apply the methods to image-based robotic environments.}
}

EndNote citation:

%0 Thesis
%A Pong, Vitchyr
%T Goal-Directed Exploration and Skill Reuse
%I EECS Department, University of California, Berkeley
%D 2021
%8 August 10
%@ UCB/EECS-2021-176
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-176.html
%F Pong:EECS-2021-176