Zaynah Javed

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-48

May 10, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-48.pdf

Learning from demonstrations circumvents the difficult and error-prone task of manually specifying a reward function, but it raises issues of its own. When demonstration data is scarce, ambiguous, or imperfect, many different reward functions can explain the data equally well, leaving uncertainty over the true reward function that should be learned. Most policy optimization approaches handle this uncertainty by optimizing for expected performance, but many applications demand risk-averse behavior. We derive a novel policy-gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective balancing expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm that is robust to a distribution of reward hypotheses and scales to continuous MDPs.

A second issue that can arise with demonstrations is sim2real transfer: demonstrations and training may take place in simulation, while the robot operates in the real world. Sim2real transfer has emerged as a successful method for training robotic control policies on a wide variety of tasks; however, it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies trained on very little simulation data can produce unreliable behavior on real-world hardware, while excessive training in simulation can cause policies to overfit to the dynamics and visual appearance of the simulator. We study strategies to automatically determine when imitation learning policies trained in simulation can be reliably transferred to a physical robot. We study these ideas in the context of a robotic fabric manipulation task, in which successful sim2real transfer is challenging because fabric is difficult to model precisely.
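To make the soft-robust idea concrete, the sketch below shows one common way such an objective can be formed: a convex combination of the expected return and the conditional value at risk (CVaR) over a posterior distribution of reward hypotheses. This is an illustrative reconstruction, not PG-BROIL's actual implementation; the function name and the lam/alpha parameterization are assumptions made here for exposition.

```python
import numpy as np

def soft_robust_objective(returns, posterior_probs, lam=0.5, alpha=0.95):
    """Blend expected return with CVaR over a set of reward hypotheses.

    returns: policy return evaluated under each candidate reward function.
    posterior_probs: posterior probability of each reward hypothesis.
    lam: trade-off weight (1.0 = risk-neutral, 0.0 = fully risk-averse).
    alpha: CVaR level; CVaR averages over the worst (1 - alpha) tail.
    """
    returns = np.asarray(returns, dtype=float)
    p = np.asarray(posterior_probs, dtype=float)
    expected = float(np.dot(p, returns))

    # CVaR: probability-weighted mean of the worst (1 - alpha) tail of returns.
    order = np.argsort(returns)                     # worst hypotheses first
    sorted_r, sorted_p = returns[order], p[order]
    tail = 1.0 - alpha
    cum = np.cumsum(sorted_p)
    cum_prev = np.concatenate(([0.0], cum[:-1]))
    # Per-hypothesis tail weights: full mass inside the tail, partial mass
    # for the hypothesis straddling the alpha-quantile, zero beyond it.
    w = np.clip(np.minimum(cum, tail) - cum_prev, 0.0, None)
    cvar = float(np.dot(w, sorted_r) / tail)

    return lam * expected + (1.0 - lam) * cvar
```

A policy-gradient method could then ascend this scalar objective instead of the plain expected return; setting lam between 0 and 1 interpolates between risk-averse and risk-neutral behavior.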

Advisor: Ken Goldberg


BibTeX citation:

@mastersthesis{Javed:EECS-2022-48,
    Author= {Javed, Zaynah},
    Title= {Robust Imitation Learning for Risk-Aware Behavior and Sim2Real Transfer},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-48.html},
    Number= {UCB/EECS-2022-48},
    Abstract= {Learning from demonstrations circumvents the difficult and error-prone task of manually specifying a reward function. However, there are many issues that can arise. In some cases, not enough demonstration data exists. This can lead to ambiguous, imperfect demonstrations where the data gives rise to uncertainty over the true goal. There can be many different reward functions that explain this data, giving uncertainty over the true reward function that should be learned from the data. Most policy optimization approaches handle this uncertainty by optimizing for expected performance, but many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses which can scale to continuous MDPs. Another issue that may arise with demonstrations is sim2real transfer, where demonstrations and training may be done via simulation, but the robot exists in the real world. Sim2Real transfer has emerged as a successful method to train robotic control policies for a wide variety of tasks, however it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies which have been trained with very little simulation data can result in unreliable behaviors on real world hardware. On the other hand, excessive training in simulation can cause policies to overfit to the dynamics and visual appearance of the simulator. We study strategies to automatically determine when imitation learning policies trained in simulation can be reliably transferred to a physical robot. 
We study these ideas in the context of a robotic fabric manipulation task, in which successful sim2real transfer is challenging due to the difficulties of precisely modeling fabric.},
}

EndNote citation:

%0 Thesis
%A Javed, Zaynah 
%T Robust Imitation Learning for Risk-Aware Behavior and Sim2Real Transfer
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 10
%@ UCB/EECS-2022-48
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-48.html
%F Javed:EECS-2022-48