Goal-Directed Exploration and Skill Reuse
Vitchyr Pong
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2021-176
August 10, 2021
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-176.pdf
Reinforcement learning is a powerful paradigm for training agents to acquire complex behaviors, but it assumes that an external reward is provided by the environment. In practice, this task supervision is often hand-crafted by a user, a process that is time-consuming to repeat for every possible task and that makes manual engineering a primary bottleneck for behavior acquisition. This thesis describes how agents can acquire and reuse goal-directed behaviors in a completely self-supervised manner. It discusses challenges that arise when scaling up these methods to complex environments: How can an agent set goals for itself when it does not even know the set of possible states to explore? How does an agent autonomously reward itself for reaching a goal? How can an agent reuse this goal-directed behavior to decompose a new task into easier goal-reaching tasks? This thesis presents methods that I have developed to address these problems and shares results that apply the methods to image-based robot environments.
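The abstract's second question, how an agent rewards itself for reaching a goal, can be illustrated with a minimal sketch. The function names, the sparse distance-based reward, and the hindsight relabeling scheme below are illustrative assumptions only, not the thesis's actual method, which operates on learned latent representations of images rather than raw state vectors:

```python
import numpy as np

def goal_reward(state, goal, threshold=0.5):
    """Self-supervised reward: 0 if the state is within `threshold` of the
    goal, -1 otherwise. No external reward signal is required."""
    dist = np.linalg.norm(np.asarray(state) - np.asarray(goal))
    return 0.0 if dist <= threshold else -1.0

def relabel_with_hindsight(trajectory):
    """Hindsight relabeling (illustrative): treat the final achieved state
    as the goal, so even a 'failed' trajectory yields goal-reaching
    supervision for training a goal-conditioned policy."""
    goal = trajectory[-1]
    return [(state, goal, goal_reward(state, goal)) for state in trajectory]

# Example: a trajectory that drifts away from the origin still produces
# useful (state, goal, reward) tuples for its final state as the goal.
trajectory = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([2.0, 0.0])]
relabeled = relabel_with_hindsight(trajectory)
```

With this scheme the agent supervises itself: rewards come from its own notion of goal attainment rather than from a hand-crafted environment reward.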
Advisor: Sergey Levine
BibTeX citation:
@phdthesis{Pong:EECS-2021-176,
    Author = {Pong, Vitchyr},
    Title = {Goal-Directed Exploration and Skill Reuse},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {Aug},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-176.html},
    Number = {UCB/EECS-2021-176}
}
EndNote citation:
%0 Thesis
%A Pong, Vitchyr
%T Goal-Directed Exploration and Skill Reuse
%I EECS Department, University of California, Berkeley
%D 2021
%8 August 10
%@ UCB/EECS-2021-176
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-176.html
%F Pong:EECS-2021-176