Learning Generalized Reactive Policies using Deep Neural Networks

Edward Groshev

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2017-152
August 27, 2017

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-152.pdf

We consider the problem of learning for planning, where knowledge acquired while planning is reused to plan faster in new problem instances. For robotic tasks, among others, plan execution can be captured as a sequence of visual images. For such domains, we propose to use deep neural networks in learning for planning, based on learning a reactive policy that imitates execution traces produced by a planner. We investigate architectural properties of deep networks that are suitable for learning long-horizon planning behavior, and explore how to learn, in addition to the policy, a heuristic function that can be used with classical planners or search algorithms such as A*. Our results on the challenging Sokoban domain show that, with a suitable network design, complex decision-making policies and powerful heuristic functions can be learned through imitation.
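The abstract mentions plugging a learned heuristic into a search algorithm such as A*. The sketch below illustrates that interface only: a standard A* loop that treats the heuristic as an interchangeable callable, so a trained network's cost-to-go estimate could be swapped in where a hand-coded estimate normally goes. All names here (`a_star`, `grid_neighbors`, `manhattan`) are hypothetical, and Manhattan distance on a toy grid stands in for the report's learned heuristic; this is not the thesis's actual implementation.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Generic A* search.

    `heuristic(state)` estimates cost-to-go; it can be a hand-coded
    function or, as in the report's setting, a learned network's output.
    Returns a list of states from start to goal, or None if unreachable.
    """
    # Frontier entries: (f = g + h, g, state, path so far).
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost-to-reach per state
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for nxt, step_cost in neighbors(state):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + heuristic(nxt), g2, nxt, path + [nxt]))
    return None

def grid_neighbors(state):
    """4-connected moves on a toy 4x4 grid, unit step cost."""
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < 4 and 0 <= ny < 4:
            yield (nx, ny), 1

def manhattan(state, goal=(3, 3)):
    """Stand-in for a learned cost-to-go estimate."""
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

path = a_star((0, 0), (3, 3), grid_neighbors, manhattan)
```

Because the heuristic is just a callable, replacing `manhattan` with a neural network's forward pass changes nothing in the search loop itself, which is the property the report exploits.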

Advisor: Pieter Abbeel


BibTeX citation:

@mastersthesis{Groshev:EECS-2017-152,
    Author = {Groshev, Edward},
    Title = {Learning Generalized Reactive Policies using Deep Neural Networks},
    School = {EECS Department, University of California, Berkeley},
    Year = {2017},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-152.html},
    Number = {UCB/EECS-2017-152},
    Abstract = {We consider the problem of learning for planning, where knowledge acquired while planning is reused to plan faster in new problem instances. For robotic tasks, among others, plan execution can be captured as a sequence of visual images. For such domains, we propose to use deep neural networks in learning for planning, based on learning a reactive policy that imitates execution traces produced by a planner. We investigate architectural properties of deep networks that are suitable for learning long-horizon planning behavior, and explore how to learn, in addition to the policy, a heuristic function that can be used with classical planners or search algorithms such as A*. Our results on the challenging Sokoban domain show that, with a suitable network design, complex decision making policies and powerful heuristic functions can be learned through imitation.}
}

EndNote citation:

%0 Thesis
%A Groshev, Edward
%T Learning Generalized Reactive Policies using Deep Neural Networks
%I EECS Department, University of California, Berkeley
%D 2017
%8 August 27
%@ UCB/EECS-2017-152
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-152.html
%F Groshev:EECS-2017-152