Perry Dong

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-22

April 26, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-22.pdf

Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online and collect correction data that addresses the distributional shift afflicting naïve behavioral cloning, can enjoy good performance both in theory and in practice without requiring manually specified reward functions or other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning. Our proposed method uses reinforcement learning with the user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning must be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world vision-based robotic manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Additional ablations also show empirically that the performance of our method depends on the choice of intervention model and the suboptimality of the expert.
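To make the core idea in the abstract concrete, the sketch below shows one plausible data-collection loop in which the expert's decision to intervene is itself used as a (negative) reward for an off-policy learner, rather than as a target action to imitate. This is only an illustrative reading of the abstract, not the report's actual algorithm: the Agent and Expert interfaces, the Gymnasium-style environment API, the -1 intervention reward, and the per-step update schedule are all assumptions.

from typing import Protocol

import numpy as np


class Agent(Protocol):
    """Assumed off-policy RL agent interface (e.g., a SAC-style learner)."""
    def act(self, obs: np.ndarray) -> np.ndarray: ...
    def store(self, obs, action, reward, next_obs, done) -> None: ...
    def update(self) -> None: ...


class Expert(Protocol):
    """Assumed (possibly suboptimal) expert that may intervene online."""
    def wants_to_intervene(self, obs: np.ndarray, action: np.ndarray) -> bool: ...
    def act(self, obs: np.ndarray) -> np.ndarray: ...


def collect_episode(env, agent: Agent, expert: Expert) -> None:
    """One episode of intervention-based data collection.

    Instead of imitating the expert's corrective actions (as DAgger-style
    methods do), the occurrence of an intervention is treated as a negative
    reward, so the policy learns to avoid states that trigger interventions;
    no manually specified task reward is required.
    """
    obs, _ = env.reset()
    done = False
    while not done:
        action = agent.act(obs)

        # The expert monitors the rollout and may take over.
        intervened = expert.wants_to_intervene(obs, action)
        executed_action = expert.act(obs) if intervened else action

        next_obs, _, terminated, truncated, _ = env.step(executed_action)
        done = terminated or truncated

        # Intervention signal as reward: 0 normally, -1 when the expert
        # felt compelled to step in. The environment's own reward is ignored.
        reward = -1.0 if intervened else 0.0

        # Both policy and expert transitions go into the off-policy buffer.
        agent.store(obs, executed_action, reward, next_obs, done)
        agent.update()

        obs = next_obs

Because the learner only needs to know when an intervention happened, not why, this kind of loop can tolerate an expert whose corrective actions are themselves imperfect, which is the relaxation the abstract emphasizes.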

Advisor: Yi Ma


BibTeX citation:

@mastersthesis{Dong:EECS-2024-22,
    Author= {Dong, Perry},
    Title= {Interactive Imitation Learning as Reinforcement Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2024},
    Month= {Apr},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-22.html},
    Number= {UCB/EECS-2024-22},
    Abstract= {Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online and collect correction data that addresses the distributional shift afflicting naïve behavioral cloning, can enjoy good performance both in theory and in practice without requiring manually specified reward functions or other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning. Our proposed method uses reinforcement learning with the user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning must be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world vision-based robotic manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Additional ablations also show empirically that the performance of our method depends on the choice of intervention model and the suboptimality of the expert.},
}

EndNote citation:

%0 Thesis
%A Dong, Perry 
%T Interactive Imitation Learning as Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2024
%8 April 26
%@ UCB/EECS-2024-22
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-22.html
%F Dong:EECS-2024-22