Curriculum Distillation to Teach Playing Atari

Chen Tang and John F. Canny

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2018-161
December 1, 2018

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-161.pdf

We propose a framework of curriculum distillation in the setting of deep reinforcement learning. By selecting samples from its own training history, a machine teacher sends those samples to a learner to improve the learner's progress. In this paper, we investigate how to select these samples to maximize the learner's progress. One key idea is to apply the Zone of Proximal Development principle, guiding the learner with samples slightly in advance of its current performance level. Another is to use the samples on which the teacher itself made the biggest progress in its parameter space. To foster robust teaching and learning, we adapt this framework to distill a curriculum from multiple teachers. We test the framework on several Atari games and show that the selected samples are both interpretable to humans and help machine learners converge faster during training.
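The second selection criterion above, picking samples where the teacher made the biggest progress in its parameter space, can be sketched as a simple ranking over the teacher's training history. The following is a minimal illustrative sketch, not the report's implementation: all names and the toy data are hypothetical, and "progress" is represented here by a precomputed parameter-update magnitude per sample.

```python
# Hypothetical sketch of the teacher-progress selection criterion: rank
# samples from the teacher's training history by how much the teacher's
# parameters moved when training on them, and send the top-k to the learner.
def select_curriculum(history, k):
    """history: list of (sample_id, param_delta_norm) pairs recorded while
    the teacher trained; returns the k sample ids with the largest updates."""
    ranked = sorted(history, key=lambda h: h[1], reverse=True)
    return [sample_id for sample_id, _ in ranked[:k]]

# Toy history: sample 2 produced the largest parameter update, then sample 0.
history = [(0, 0.8), (1, 0.1), (2, 1.5), (3, 0.4)]
print(select_curriculum(history, 2))  # -> [2, 0]
```

In practice the update magnitude could be the norm of the gradient step the teacher took on that sample; this sketch assumes those norms have already been logged.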

Advisor: John F. Canny


BibTeX citation:

@mastersthesis{Tang:EECS-2018-161,
    Author = {Tang, Chen and Canny, John F.},
    Title = {Curriculum Distillation to Teach Playing Atari},
    School = {EECS Department, University of California, Berkeley},
    Year = {2018},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-161.html},
    Number = {UCB/EECS-2018-161},
    Abstract = {We propose a framework of curriculum distillation in the setting of deep reinforcement learning. By selecting samples from its own training history, a machine teacher sends those samples to a learner to improve the learner's progress. In this paper, we investigate how to select these samples to maximize the learner's progress. One key idea is to apply the Zone of Proximal Development principle, guiding the learner with samples slightly in advance of its current performance level. Another is to use the samples on which the teacher itself made the biggest progress in its parameter space. To foster robust teaching and learning, we adapt this framework to distill a curriculum from multiple teachers. We test the framework on several Atari games and show that the selected samples are both interpretable to humans and help machine learners converge faster during training.}
}

EndNote citation:

%0 Thesis
%A Tang, Chen
%A Canny, John F.
%T Curriculum Distillation to Teach Playing Atari
%I EECS Department, University of California, Berkeley
%D 2018
%8 December 1
%@ UCB/EECS-2018-161
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-161.html
%F Tang:EECS-2018-161