Overcoming Model-Bias in Reinforcement Learning

Ignasi Clavera Gilaberte

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2020-234
December 19, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-234.pdf

Autonomous skill acquisition has the potential to dramatically expand the tasks robots can perform in settings ranging from manufacturing to household robotics. Reinforcement learning offers a general framework that enables skill acquisition solely from environment interaction with little human supervision. As a result, reinforcement learning presents itself as a scalable approach for the widespread adoption of robotic agents. While reinforcement learning has achieved tremendous success, it has been limited to simulated domains, such as video games, computer graphics, and board games. Its most promising methods typically require large amounts of interaction with the environment to learn optimal policies. In real robotic systems, significant interaction can cause wear and tear, create unsafe scenarios during the learning process, or become prohibitively time-consuming for potential applications.

One promising avenue for minimizing the interaction between the agent and the environment is the family of methods under the umbrella of model-based reinforcement learning. Model-based methods are characterized by learning a predictive model of the environment, which is then used for learning a policy or for planning. By exploiting the structure of the reinforcement learning problem and making better use of the collected data, model-based methods can achieve better sample complexity. Prior to this work, model-based methods were limited to simple environments and tended to achieve lower performance than model-free methods. Here, we illustrate the model-bias problem: the set of difficulties that prevent typical model-based methods from achieving optimal policies; and we propose solutions that tackle model-bias. The proposed methods achieve the same asymptotic performance as model-free methods while being two orders of magnitude more sample efficient. We unify these methods into an asynchronous model-based framework that allows fast and efficient learning.
We successfully learn manipulation policies, such as block stacking and shape matching, on a real PR2 robot within 10 minutes of wall-clock time. Finally, we take a further step towards real-world robotics and propose a method that can efficiently adapt to changes in the environment. We showcase it on a real six-legged robot navigating different terrains, such as grass and rock.
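The model-based loop described above — collect interaction data, learn a predictive model of the environment, then plan or learn a policy against that model — can be sketched in a few lines. The toy 1-D point-mass environment, the linear model form, and the random-shooting planner below are illustrative assumptions for exposition, not the specific algorithms developed in this thesis:

```python
import random

# Hypothetical 1-D environment with unknown-to-the-agent dynamics
# s' = s + 0.1 * a, action a in [-1, 1]. Goal: drive s from 0.0 to 1.0.
def step(s, a):
    return s + 0.1 * a

random.seed(0)

# 1) Collect interaction data with a random policy.
data, s = [], 0.0
for _ in range(200):
    a = random.uniform(-1.0, 1.0)
    s_next = step(s, a)
    data.append((s, a, s_next))
    s = s_next

# 2) Fit a predictive model. We assume the parametric form s' = s + k * a
#    and estimate k by least squares on the observed transitions.
num = sum(a * (sn - si) for si, a, sn in data)
den = sum(a * a for si, a, sn in data)
k = num / den  # recovers the true coefficient 0.1 from noiseless data

def model(s, a):
    return s + k * a

# 3) Plan with the learned model (random-shooting MPC): sample candidate
#    action sequences, roll them out in the model, keep the first action
#    of the lowest-cost sequence, then replan at every real step.
def plan(s0, goal, horizon=5, n_candidates=100):
    best_a, best_cost = 0.0, float("inf")
    for _ in range(n_candidates):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        s = s0
        for a in seq:
            s = model(s, a)
        cost = (s - goal) ** 2
        if cost < best_cost:
            best_cost, best_a = cost, seq[0]
    return best_a

# Execute the planner in the real environment.
s, goal = 0.0, 1.0
for _ in range(50):
    s = step(s, plan(s, goal))
```

Note that the learning itself happens entirely inside the model after a modest amount of real interaction; this is the source of the sample-efficiency gains, and also of model-bias when the learned model is inaccurate.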

Advisor: Pieter Abbeel


BibTeX citation:

@phdthesis{Clavera Gilaberte:EECS-2020-234,
    Author = {Clavera Gilaberte, Ignasi},
    Title = {Overcoming Model-Bias in Reinforcement Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-234.html},
    Number = {UCB/EECS-2020-234},
    Abstract = {Autonomous skill acquisition has the potential to dramatically expand the tasks robots
can perform in settings ranging from manufacturing to household robotics. Reinforcement
learning offers a general framework that enables skill acquisition solely from environment
interaction with little human supervision. As a result, reinforcement learning presents itself
as a scalable approach for the widespread adoption of robotic agents. While reinforcement
learning has achieved tremendous success, it has been limited to simulated domains, such as
video games, computer graphics, and board games. Its most promising methods typically
require large amounts of interaction with the environment to learn optimal policies. In real
robotic systems, significant interaction can cause wear and tear, create unsafe scenarios
during the learning process, or become prohibitively time-consuming for potential
applications.
One promising avenue for minimizing the interaction between the agent and the environment
is the family of methods under the umbrella of model-based reinforcement learning. Model-based
methods are characterized by learning a predictive model of the environment, which is then used
for learning a policy or for planning. By exploiting the structure of the reinforcement learning
problem and making better use of the collected data, model-based methods can achieve
better sample complexity. Prior to this work, model-based methods were limited to
simple environments and tended to achieve lower performance than model-free methods.
Here, we illustrate the model-bias problem: the set of difficulties that prevent typical
model-based methods from achieving optimal policies; and we propose solutions that tackle model-bias.
The proposed methods achieve the same asymptotic performance as model-free
methods while being two orders of magnitude more sample efficient. We unify these
methods into an asynchronous model-based framework that allows fast and efficient learning.
We successfully learn manipulation policies, such as block stacking and shape matching,
on a real PR2 robot within 10 minutes of wall-clock time. Finally, we take a further step
towards real-world robotics and propose a method that can efficiently adapt to changes in
the environment. We showcase it on a real six-legged robot navigating different terrains,
such as grass and rock.}
}

EndNote citation:

%0 Thesis
%A Clavera Gilaberte, Ignasi
%T Overcoming Model-Bias in Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 19
%@ UCB/EECS-2020-234
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-234.html
%F Clavera Gilaberte:EECS-2020-234