Alexander Li

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-110

May 29, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-110.pdf

Machine learning has made great strides on a variety of tasks, including image classification, natural language understanding, robotic control, and game-playing. However, much less progress has been made on efficient multi-task learning: generalist algorithms that can leverage experience on previous tasks to quickly excel at new ones. Such algorithms are necessary in order to efficiently perform new tasks when data, compute, time, or energy is limited. In this thesis, we develop two novel algorithms for multi-task reinforcement learning.

First, we examine the potential for improving cross-task generalization in hierarchical reinforcement learning. We derive a new hierarchical policy gradient with an unbiased latent-dependent baseline, and we introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method for efficiently training all levels of the hierarchy jointly. This allows us to discover robust and transferable skills, and to quickly learn a new task by fine-tuning skills learned in similar environments.
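As a point of reference, the following is a minimal sketch, in our own illustrative notation rather than the thesis's, of the standard argument for why a baseline that depends on both the state s_t and the currently active latent skill z_t leaves the low-level policy-gradient term unbiased. Because the baseline does not depend on the action a_t,

    \mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t, z_t)}\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t, z_t)\, b(s_t, z_t) \right]
      = b(s_t, z_t)\, \nabla_\theta \int \pi_\theta(a \mid s_t, z_t)\, da
      = b(s_t, z_t)\, \nabla_\theta 1
      = 0,

so subtracting b(s_t, z_t) from the return reduces variance without changing the expected gradient.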

Second, we introduce Generalized Hindsight, which is based on the insight that unsuccessful attempts to solve one task are often a rich source of information for other tasks. Generalized Hindsight is an approximate inverse reinforcement learning technique that matches generated behaviors with the tasks they are best suited for before they are passed to an off-policy RL optimizer. Generalized Hindsight is substantially more sample-efficient than standard relabeling techniques, as we demonstrate empirically on a suite of multi-task navigation and manipulation tasks.
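To illustrate the relabeling step, here is a minimal, hypothetical Python sketch of hindsight relabeling in this spirit: a trajectory collected while attempting one task is re-assigned to the candidate task under which it earns the highest return, and only then handed to an off-policy learner. The function names, signatures, and data layout below are our own assumptions, not the thesis's implementation.

    def trajectory_return(trajectory, task, reward_fn):
        """Total reward the trajectory would have earned under `task`.

        trajectory : list of (state, action, next_state) transitions
        task       : task parameters (e.g. a goal vector or reward weights)
        reward_fn  : reward_fn(state, action, next_state, task) -> float
        """
        return sum(reward_fn(s, a, s2, task) for (s, a, s2) in trajectory)

    def relabel_trajectory(trajectory, candidate_tasks, reward_fn):
        """Pick the candidate task that the trajectory is best suited for,
        i.e. the one under which it achieves the highest return."""
        return max(candidate_tasks,
                   key=lambda task: trajectory_return(trajectory, task, reward_fn))

    # Usage sketch: relabel each rollout before storing it for off-policy training.
    # for trajectory, original_task in rollouts:
    #     new_task = relabel_trajectory(trajectory, sampled_tasks, reward_fn)
    #     replay_buffer.add(trajectory, task=new_task)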

Advisor: Pieter Abbeel


BibTeX citation:

@mastersthesis{Li:EECS-2020-110,
    Author= {Li, Alexander},
    Editor= {Abbeel, Pieter},
    Title= {Algorithms for Multi-task Reinforcement Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2020},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-110.html},
    Number= {UCB/EECS-2020-110},
    Abstract= {Machine learning has made great strides on a variety of tasks, including image classification, natural language understanding, robotic control, and game-playing. However, much less progress has been made on efficient multi-task learning: generalist algorithms that can leverage experience on previous tasks to quickly excel at new ones. Such algorithms are necessary in order to efficiently perform new tasks when data, compute, time, or energy is limited. In this thesis, we develop two novel algorithms for multi-task reinforcement learning.

First, we examine the potential for improving cross-task generalization in hierarchical reinforcement learning. We derive a new hierarchical policy gradient with an unbiased latent-dependent baseline, and we introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method for efficiently training all levels of the hierarchy jointly. This allows us to discover robust and transferable skills, and to quickly learn a new task by fine-tuning skills learned in similar environments.

Second, we introduce Generalized Hindsight, which is based on the insight that unsuccessful attempts to solve one task are often a rich source of information for other tasks. Generalized Hindsight is an approximate inverse reinforcement learning technique that matches generated behaviors with the tasks they are best suited for before they are passed to an off-policy RL optimizer. Generalized Hindsight is substantially more sample-efficient than standard relabeling techniques, as we demonstrate empirically on a suite of multi-task navigation and manipulation tasks.},
}

EndNote citation:

%0 Thesis
%A Li, Alexander 
%E Abbeel, Pieter 
%T Algorithms for Multi-task Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 May 29
%@ UCB/EECS-2020-110
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-110.html
%F Li:EECS-2020-110