Overcoming Model-Bias in Reinforcement Learning
Ignasi Clavera Gilaberte
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-234
December 19, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-234.pdf
Autonomous skill acquisition has the potential to dramatically expand the tasks robots can perform in settings ranging from manufacturing to household robotics. Reinforcement learning offers a general framework that enables skill acquisition solely from environment interaction with little human supervision. As a result, reinforcement learning presents itself as a scalable approach for widespread adoption of robotic agents. While reinforcement learning has achieved tremendous success, it has been limited to simulated domains, such as video games, computer graphics, and board games. Its most promising methods typically require large amounts of interaction with the environment to learn optimal policies. In real robotic systems, significant interaction can cause wear and tear, create unsafe scenarios during the learning process, or become prohibitively time consuming for potential applications. One promising avenue for minimizing the interaction between the agent and the environment is the set of methods under the umbrella of model-based reinforcement learning. Model-based methods are characterized by learning a predictive model of the environment that is used for learning a policy or for planning. By exploiting the structure of the reinforcement learning problem and making better use of the collected data, model-based methods can achieve better sample complexity. Prior to this work, model-based methods were limited to simple environments and tended to achieve lower performance than model-free methods. Here, we illustrate the model-bias problem: the set of difficulties that prevent typical model-based methods from achieving optimal policies; and we propose solutions that tackle model-bias. The proposed methods achieve the same asymptotic performance as model-free methods while being two orders of magnitude more sample efficient. We unify these methods into an asynchronous model-based framework that allows fast and efficient learning. We successfully learn manipulation policies, such as block stacking and shape matching, on a real PR2 robot within 10 minutes of wall-clock time. Finally, we take a further step towards real-world robotics and propose a method that can efficiently adapt to changes in the environment. We showcase it on a real six-legged robot navigating on different terrains, such as grass and rock.
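To make the model-based loop described above concrete, here is a minimal sketch of the collect-data, fit-model, plan cycle. It is not taken from the thesis: the toy point-mass environment, the linear least-squares dynamics model, and the random-shooting planner are illustrative assumptions standing in for the learned predictive models and policy optimization the abstract refers to.

```python
import numpy as np

# Toy 1D point-mass environment: state = [position, velocity], action = force.
# Purely illustrative; the thesis works with real robots and simulated locomotion tasks.
def step(state, action, dt=0.05):
    pos, vel = state
    vel = vel + dt * action
    pos = pos + dt * vel
    return np.array([pos, vel])

def reward(state):
    return -abs(state[0] - 1.0)          # drive the position toward 1.0

def fit_linear_model(states, actions, next_states):
    """Least-squares fit of next_state ~ A.T @ [state, action], a simple
    stand-in for the learned predictive models discussed in the abstract."""
    X = np.hstack([states, actions])
    A, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return A

def plan(model, state, horizon=10, candidates=128):
    """Random-shooting planning: roll out candidate action sequences with the
    learned model and return the first action of the best sequence."""
    seqs = np.random.uniform(-1.0, 1.0, size=(candidates, horizon))
    best_action, best_return = 0.0, -np.inf
    for seq in seqs:
        s, total = state.copy(), 0.0
        for a in seq:
            s = model.T @ np.append(s, a)   # model prediction, not the real env
            total += reward(s)
        if total > best_return:
            best_return, best_action = total, seq[0]
    return best_action

# Model-based RL loop: collect transitions -> fit model -> act by planning.
states, actions, next_states = [], [], []
state = np.zeros(2)
for t in range(200):
    if len(states) < 20:
        action = np.random.uniform(-1.0, 1.0)   # initial random exploration
    else:
        model = fit_linear_model(np.array(states),
                                 np.array(actions),
                                 np.array(next_states))
        action = plan(model, state)
    nxt = step(state, action)
    states.append(state); actions.append([action]); next_states.append(nxt)
    state = nxt
print("final state:", state)
```

In this sketch every real-environment transition is reused to refit the model, which is the source of the sample-efficiency gains the abstract claims; model-bias enters when the planner exploits the model's prediction errors, which is the failure mode the thesis addresses.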
Advisor: Pieter Abbeel
BibTeX citation:
@phdthesis{Clavera Gilaberte:EECS-2020-234,
    Author = {Clavera Gilaberte, Ignasi},
    Title = {Overcoming Model-Bias in Reinforcement Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-234.html},
    Number = {UCB/EECS-2020-234},
    Abstract = {Autonomous skill acquisition has the potential to dramatically expand the tasks robots can perform in settings ranging from manufacturing to household robotics. Reinforcement learning offers a general framework that enables skill acquisition solely from environment interaction with little human supervision. As a result, reinforcement learning presents itself as a scalable approach for widespread adoption of robotic agents. While reinforcement learning has achieved tremendous success, it has been limited to simulated domains, such as video games, computer graphics, and board games. Its most promising methods typically require large amounts of interaction with the environment to learn optimal policies. In real robotic systems, significant interaction can cause wear and tear, create unsafe scenarios during the learning process, or become prohibitively time consuming for potential applications. One promising avenue for minimizing the interaction between the agent and the environment is the set of methods under the umbrella of model-based reinforcement learning. Model-based methods are characterized by learning a predictive model of the environment that is used for learning a policy or for planning. By exploiting the structure of the reinforcement learning problem and making better use of the collected data, model-based methods can achieve better sample complexity. Prior to this work, model-based methods were limited to simple environments and tended to achieve lower performance than model-free methods. Here, we illustrate the model-bias problem: the set of difficulties that prevent typical model-based methods from achieving optimal policies; and we propose solutions that tackle model-bias. The proposed methods achieve the same asymptotic performance as model-free methods while being two orders of magnitude more sample efficient. We unify these methods into an asynchronous model-based framework that allows fast and efficient learning. We successfully learn manipulation policies, such as block stacking and shape matching, on a real PR2 robot within 10 minutes of wall-clock time. Finally, we take a further step towards real-world robotics and propose a method that can efficiently adapt to changes in the environment. We showcase it on a real six-legged robot navigating on different terrains, such as grass and rock.},
}
EndNote citation:
%0 Thesis %A Clavera Gilaberte, Ignasi %T Overcoming Model-Bias in Reinforcement Learning %I EECS Department, University of California, Berkeley %D 2020 %8 December 19 %@ UCB/EECS-2020-234 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-234.html %F Clavera Gilaberte:EECS-2020-234