Contrastive Learning for Context-Based Off-Policy Actor-Critic Reinforcement Learning

Bernie Wang

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2020-219
December 18, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-219.pdf

Conventional reinforcement learning focuses on learning an optimal policy for a single task. Learning a large repertoire of tasks under this paradigm is inefficient in terms of the number of environment interactions required. Meta-reinforcement learning aims to solve this problem by learning the underlying structure shared within a family of tasks. Recent work reformulates meta-reinforcement learning as contextual reinforcement learning, in which the contextual latent space is learned end-to-end. The representation of context is important and determines a policy's ability to generalize. End-to-end reinforcement learning often struggles to generalize to unseen tasks because of the difficulty of jointly optimizing the policy and learning the representation. We introduce CoCOA: contrastive learning for context-based off-policy actor-critic, which builds a contrastive learning framework on top of existing off-policy meta-RL. We evaluate CoCOA on a variety of continuous control and robotic manipulation tasks and show that adding a contrastive auxiliary task improves the policy returns and sample efficiency of end-to-end reinforcement learning.
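To illustrate what a contrastive auxiliary task over task context might look like, the sketch below pairs a context encoder with an InfoNCE-style loss: two batches of transitions drawn from the same task form a positive pair, and embeddings of other tasks in the minibatch act as negatives. This is a minimal, hypothetical sketch, not the report's implementation; the encoder architecture, batch construction, and hyperparameters are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEncoder(nn.Module):
    """Encodes a batch of transitions from one task into a single context vector."""
    def __init__(self, transition_dim, latent_dim=5, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, transitions):
        # transitions: (num_tasks, num_transitions, transition_dim)
        per_transition = self.net(transitions)        # (T, N, latent_dim)
        return per_transition.mean(dim=1)             # aggregate over transitions -> (T, latent_dim)

def contrastive_loss(z_anchor, z_positive, temperature=0.1):
    """InfoNCE loss: each task's anchor embedding should be most similar to its
    own positive embedding and dissimilar to embeddings of the other tasks."""
    z_anchor = F.normalize(z_anchor, dim=-1)
    z_positive = F.normalize(z_positive, dim=-1)
    logits = z_anchor @ z_positive.t() / temperature  # (T, T) similarity matrix
    labels = torch.arange(z_anchor.size(0))           # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

# Example usage with random data standing in for replay-buffer transitions.
encoder = ContextEncoder(transition_dim=20)
batch_a = torch.randn(8, 64, 20)  # 8 tasks, 64 transitions each (view 1)
batch_b = torch.randn(8, 64, 20)  # independent transitions from the same 8 tasks (view 2)
aux_loss = contrastive_loss(encoder(batch_a), encoder(batch_b))
# During training, aux_loss would be added to the off-policy actor-critic objectives
# so the context encoder receives a learning signal beyond the RL loss alone.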

Advisor: Kurt Keutzer


BibTeX citation:

@mastersthesis{Wang:EECS-2020-219,
    Author = {Wang, Bernie},
    Title = {Contrastive Learning for Context-Based Off-Policy Actor-Critic Reinforcement Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-219.html},
    Number = {UCB/EECS-2020-219},
    Abstract = {Conventional reinforcement learning focuses on learning an optimal policy for a single task. Learning a large repertoire of tasks under this paradigm is inefficient in terms of the number of environment interactions required. Meta-reinforcement learning aims to solve this problem by learning the underlying structure shared within a family of tasks. Recent work reformulates meta-reinforcement learning as contextual reinforcement learning, in which the contextual latent space is learned end-to-end. The representation of context is important and determines a policy's ability to generalize. End-to-end reinforcement learning often struggles to generalize to unseen tasks because of the difficulty of jointly optimizing the policy and learning the representation. We introduce CoCOA: contrastive learning for context-based off-policy actor-critic, which builds a contrastive learning framework on top of existing off-policy meta-RL. We evaluate CoCOA on a variety of continuous control and robotic manipulation tasks and show that adding a contrastive auxiliary task improves the policy returns and sample efficiency of end-to-end reinforcement learning.}
}

EndNote citation:

%0 Thesis
%A Wang, Bernie
%T Contrastive Learning for Context-Based Off-Policy Actor-Critic Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 18
%@ UCB/EECS-2020-219
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-219.html
%F Wang:EECS-2020-219