Michael McDonald

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-96

May 14, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-96.pdf

The capabilities of both imitation and reinforcement learning for robotics have burgeoned with the advent of deep learning, but these methods still struggle to extend to tasks with long time horizons. Hierarchical policy learning and goal-conditioning in policies have offered great promise in overcoming this limitation, but still cannot match the horizons or reliability of classical planners. Task and motion planning remains the gold standard for high-precision, multi-step tasks but suffers from computational burden and difficulties in planning directly from sensor data, limitations that neural networks do not have. In this work, we propose an asynchronous training method to integrate imitation learning into task and motion planning. Our method trains goal-conditioned hierarchical policies to emulate the planner, and in turn uses those policies to accelerate the planner and the generation of training data. In robotic manipulation tasks, the partially trained policies achieve a 2x reduction in the combined time for motion plan refinement and simulated execution. For 7-DOF robotic pick-place tasks, our method produces end-to-end policies capable of placing four objects with an 86% success rate. For 2D navigational pick-place tasks with high-dimensional goals, our method can place five objects with an 88% success rate when working from state observations, or an 83% success rate for three objects when using camera images.
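The following is a minimal sketch of the kind of asynchronous loop the abstract describes: planner workers produce demonstrations, warm-started by snapshots of the partially trained policy, while a learner thread consumes them for imitation updates. All names and structures here are illustrative assumptions, not the author's actual implementation, and the planner and policy are stand-ins.

import queue
import random
import threading
import time

demo_buffer = queue.Queue()          # demonstrations flow planner -> learner
policy_lock = threading.Lock()
policy = {"version": 0}              # stand-in for goal-conditioned policy weights


def plan_with_policy_warm_start(goal, policy_snapshot):
    """Stand-in for a TAMP call: the policy proposes a trajectory and the
    planner refines it, which is faster than planning from scratch."""
    time.sleep(0.01)                 # placeholder for motion-plan refinement
    return {"goal": goal,
            "trajectory": [random.random() for _ in range(5)],
            "policy_version": policy_snapshot["version"]}


def planner_worker(n_episodes):
    # Each worker repeatedly samples a goal, plans with the latest policy
    # snapshot, and pushes the resulting demonstration to the shared buffer.
    for _ in range(n_episodes):
        goal = random.random()
        with policy_lock:
            snapshot = dict(policy)
        demo_buffer.put(plan_with_policy_warm_start(goal, snapshot))


def learner(n_updates):
    # The learner blocks on planner data and performs imitation updates;
    # a real implementation would fit (observation, goal) -> action pairs here.
    for _ in range(n_updates):
        demo = demo_buffer.get()
        with policy_lock:
            policy["version"] += 1


workers = [threading.Thread(target=planner_worker, args=(20,)) for _ in range(4)]
trainer = threading.Thread(target=learner, args=(80,))
for t in workers + [trainer]:
    t.start()
for t in workers + [trainer]:
    t.join()
print("final policy version:", policy["version"])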

Advisor: Anca Dragan


BibTeX citation:

@mastersthesis{McDonald:EECS-2021-96,
    Author= {McDonald, Michael},
    Editor= {Hadfield-Menell, Dylan},
    Title= {Accelerate Then Imitate: Learning from Task and Motion Planning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-96.html},
    Number= {UCB/EECS-2021-96},
    Abstract= {The capabilities of both imitation and reinforcement learning for robotics have burgeoned with the advent of deep learning, but these methods still struggle to extend to tasks with long time horizons. Hierarchical policy learning and goal-conditioning in policies have offered great promise in overcoming this limitation, but still cannot match the horizons or reliability of classical planners. Task and motion planning remains the gold standard for high-precision, multi-step tasks but suffers from computational burden and difficulties in planning directly from sensor data, limitations that neural networks do not have. In this work, we propose an asynchronous training method to integrate imitation learning into task and motion planning. Our method trains goal-conditioned hierarchical policies to emulate the planner, and in turn uses those policies to accelerate the planner and the generation of training data. In robotic manipulation tasks, the partially trained policies achieve a 2x reduction in the combined time for motion plan refinement and simulated execution. For 7-DOF robotic pick-place tasks, our method produces end-to-end policies capable of placing four objects with an 86% success rate. For 2D navigational pick-place tasks with high-dimensional goals, our method can place five objects with an 88% success rate when working from state observations, or an 83% success rate for three objects when using camera images.},
}

EndNote citation:

%0 Thesis
%A McDonald, Michael 
%E Hadfield-Menell, Dylan 
%T Accelerate Then Imitate: Learning from Task and Motion Planning
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 14
%@ UCB/EECS-2021-96
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-96.html
%F McDonald:EECS-2021-96