Contrastive Learning for Context-Based Off-Policy Actor-Critic Reinforcement Learning
Bernie Wang
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-219
December 18, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-219.pdf
Conventional reinforcement learning focuses on learning an optimal policy for a single task. Learning a large repertoire of tasks with this paradigm is inefficient in terms of the number of environment interactions. Meta-reinforcement learning aims to solve this problem by learning the underlying structure shared within a family of tasks. Recent work reformulates meta-reinforcement learning as contextual reinforcement learning in which the contextual latent space is learned end-to-end. The representation of context is important and determines a policy's ability to generalize. End-to-end reinforcement learning often struggles to generalize to unseen tasks because policy optimization and representation learning must be solved jointly. We introduce CoCOA: contrastive learning for context-based off-policy actor-critic, which builds a contrastive learning framework on top of existing off-policy meta-RL methods. We evaluate CoCOA on a variety of continuous control and robotic manipulation tasks and show that adding a contrastive auxiliary task improves policy returns and sample efficiency over end-to-end reinforcement learning.
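To make the idea of a contrastive auxiliary task for context representation concrete, below is a minimal sketch (not the report's code) of an InfoNCE-style loss over context embeddings. It assumes a context encoder maps a batch of transitions from a task to a latent vector; two disjoint batches from the same task form a positive pair, and other tasks in the meta-batch serve as negatives. All names and the temperature value are illustrative assumptions.

```python
# Hypothetical sketch of a contrastive auxiliary loss for context embeddings,
# in the spirit of adding contrastive learning on top of an off-policy
# context-based actor-critic. Not the report's implementation.
import torch
import torch.nn.functional as F

def contrastive_context_loss(anchor_z, positive_z, temperature=0.1):
    """anchor_z, positive_z: [num_tasks, latent_dim] context embeddings,
    each produced from a different transition batch of the same task.
    Each anchor's positive is the embedding of the matching task;
    embeddings of the other tasks act as negatives."""
    anchor_z = F.normalize(anchor_z, dim=-1)
    positive_z = F.normalize(positive_z, dim=-1)
    # Cosine-similarity logits between every anchor and every candidate.
    logits = anchor_z @ positive_z.t() / temperature  # [T, T]
    # The matching task index is the "class" each anchor should predict.
    labels = torch.arange(anchor_z.size(0), device=anchor_z.device)
    return F.cross_entropy(logits, labels)

# Example: 8 training tasks, 16-dimensional context latent.
if __name__ == "__main__":
    z_a = torch.randn(8, 16)  # embeddings from one batch per task
    z_b = torch.randn(8, 16)  # embeddings from a second batch per task
    print(contrastive_context_loss(z_a, z_b).item())
```

In practice such a loss would be added to the actor-critic objective as an auxiliary term so that the context encoder receives gradients from both the RL loss and the contrastive loss.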
Advisor: Kurt Keutzer
BibTeX citation:
@mastersthesis{Wang:EECS-2020-219,
    Author = {Wang, Bernie},
    Title = {Contrastive Learning for Context-Based Off-Policy Actor-Critic Reinforcement Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-219.html},
    Number = {UCB/EECS-2020-219},
    Abstract = {Conventional reinforcement learning focuses on learning an optimal policy for a single task. Learning a large repertoire of tasks with this paradigm is inefficient in terms of the number of environment interactions. Meta-reinforcement learning aims to solve this problem by learning the underlying structure shared within a family of tasks. Recent work reformulates meta-reinforcement learning as contextual reinforcement learning in which the contextual latent space is learned end-to-end. The representation of context is important and determines a policy's ability to generalize. End-to-end reinforcement learning often struggles to generalize to unseen tasks because policy optimization and representation learning must be solved jointly. We introduce CoCOA: contrastive learning for context-based off-policy actor-critic, which builds a contrastive learning framework on top of existing off-policy meta-RL methods. We evaluate CoCOA on a variety of continuous control and robotic manipulation tasks and show that adding a contrastive auxiliary task improves policy returns and sample efficiency over end-to-end reinforcement learning.}
}
EndNote citation:
%0 Thesis
%A Wang, Bernie
%T Contrastive Learning for Context-Based Off-Policy Actor-Critic Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 18
%@ UCB/EECS-2020-219
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-219.html
%F Wang:EECS-2020-219