Contrastive Learning for Context-Based Off-Policy Actor-Critic Reinforcement Learning
Bernie Wang
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-219
December 18, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-219.pdf
Conventional reinforcement learning focuses on learning an optimal policy for a single task. Learning a large repertoire of tasks with this paradigm is inefficient in terms of the number of environment interactions. Meta-reinforcement learning aims to solve this problem by learning the underlying structure shared within a family of tasks. Recent work reformulates meta-reinforcement learning as contextual reinforcement learning in which the contextual latent space is learned end-to-end. The representation of context is important and determines a policy's ability to generalize. End-to-end reinforcement learning often struggles to generalize to unseen tasks because policy optimization and representation learning must be solved jointly. We introduce CoCOA: contrastive learning for context-based off-policy actor-critic, which builds a contrastive learning framework on top of existing off-policy meta-RL methods. We evaluate CoCOA on a variety of continuous control and robotic manipulation tasks and show that adding a contrastive auxiliary task improves policy returns and sample efficiency over end-to-end reinforcement learning.
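To make the idea of a contrastive auxiliary task for context representation concrete, below is a minimal sketch (not the report's code) of an InfoNCE-style loss over context embeddings. It assumes a context encoder maps a batch of transitions from a task to a latent vector; two disjoint batches from the same task form a positive pair, and other tasks in the meta-batch serve as negatives. All names and the temperature value are illustrative assumptions.

```python
# Hypothetical sketch of a contrastive auxiliary loss for context embeddings,
# in the spirit of adding contrastive learning on top of an off-policy
# context-based actor-critic. Not the report's implementation.
import torch
import torch.nn.functional as F

def contrastive_context_loss(anchor_z, positive_z, temperature=0.1):
    """anchor_z, positive_z: [num_tasks, latent_dim] context embeddings,
    each produced from a different transition batch of the same task.
    Each anchor's positive is the embedding of the matching task;
    embeddings of the other tasks act as negatives."""
    anchor_z = F.normalize(anchor_z, dim=-1)
    positive_z = F.normalize(positive_z, dim=-1)
    # Cosine-similarity logits between every anchor and every candidate.
    logits = anchor_z @ positive_z.t() / temperature  # [T, T]
    # The matching task index is the "class" each anchor should predict.
    labels = torch.arange(anchor_z.size(0), device=anchor_z.device)
    return F.cross_entropy(logits, labels)

# Example: 8 training tasks, 16-dimensional context latent.
if __name__ == "__main__":
    z_a = torch.randn(8, 16)  # embeddings from one batch per task
    z_b = torch.randn(8, 16)  # embeddings from a second batch per task
    print(contrastive_context_loss(z_a, z_b).item())
```

In practice such a loss would be added to the actor-critic objective as an auxiliary term so that the context encoder receives gradients from both the RL loss and the contrastive loss.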
Advisor: Kurt Keutzer
BibTeX citation:
@mastersthesis{Wang:EECS-2020-219,
    Author = {Wang, Bernie},
    Title = {Contrastive Learning for Context-Based Off-Policy Actor-Critic Reinforcement Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-219.html},
    Number = {UCB/EECS-2020-219},
    Abstract = {Conventional reinforcement learning focuses on learning an optimal policy for a single task. Learning a large repertoire of tasks with this paradigm is inefficient in terms of the number of environment interactions. Meta-reinforcement learning aims to solve this problem by learning the underlying structure shared within a family of tasks. Recent work reformulates meta-reinforcement learning as contextual reinforcement learning in which the contextual latent space is learned end-to-end. The representation of context is important and determines a policy's ability to generalize. End-to-end reinforcement learning often struggles to generalize to unseen tasks because policy optimization and representation learning must be solved jointly. We introduce CoCOA: contrastive learning for context-based off-policy actor-critic, which builds a contrastive learning framework on top of existing off-policy meta-RL methods. We evaluate CoCOA on a variety of continuous control and robotic manipulation tasks and show that adding a contrastive auxiliary task improves policy returns and sample efficiency over end-to-end reinforcement learning.}
}
EndNote citation:
%0 Thesis
%A Wang, Bernie
%T Contrastive Learning for Context-Based Off-Policy Actor-Critic Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 18
%@ UCB/EECS-2020-219
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-219.html
%F Wang:EECS-2020-219