Inverse Reinforcement Learning for Dynamics

McKane Andrus

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2019-98

May 26, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-98.pdf

Inverse Reinforcement Learning (IRL), as standardly defined, entails learning an unknown reward function of a Markov Decision Process (MDP) from demonstrations. As Herman et al. [10] describe, however, learning a reward function from demonstrations relies heavily on having a correct transition function, which is not always a given. We take interest in this problem from the perspective of Human-Robot Interaction, where we can often observe demonstrations of known tasks but do not necessarily have a correct model of human capability. As such, we specifically consider the case where the goal, or reward, of the demonstrations is known, but the dynamics, or transition function, are not. We refer to this alternate formulation as Inverse Reinforcement Learning for Dynamics (IRLD). In Chapter 2 we formalize the IRLD problem statement and provide a method that can learn unobserved dynamics in environments of limited size and scope. We compare this method to one that estimates both the reward and the dynamics to show the advantages of incorporating knowledge of a goal. In Chapter 3 we propose an alternate, scalable approach to the IRLD problem statement that permits numerous variants of Maximum Likelihood Estimation algorithms using function approximators. Although we were not able to produce a generally applicable method, in Chapter 4 we provide a roadmap of the algorithms we developed and the issues that must be addressed for this approach to succeed in future work. Chapter 5 contains a reflection that positions our work in a broader social context.

Advisors: Anca Dragan
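To make the IRLD setting concrete, here is a minimal, hypothetical sketch in Python; it is not the method from the thesis. It assumes a 5-state chain with a known goal reward, a dynamics family with a single unknown parameter `p` (the probability that an intended move succeeds, otherwise the agent moves the opposite way), and a Boltzmann-rational demonstrator with an assumed rationality coefficient `BETA`. All names below (`transition`, `q_values`, `log_likelihood`, `estimate_p`, and so on) are illustrative. Because the reward is held fixed, the likelihood of the demonstrated actions depends only on `p`, so `p` can be recovered by maximum likelihood.

```python
import math
import random

# Hypothetical toy setting (not from the thesis): a 5-state chain whose
# right-most state is the goal. The reward is KNOWN; the dynamics are not.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right
GAMMA = 0.9         # assumed discount factor
BETA = 10.0         # assumed Boltzmann rationality coefficient


def clip(s):
    """Keep a state index inside the chain."""
    return min(max(s, 0), N_STATES - 1)


def transition(s, a, p):
    """Assumed dynamics family: the intended move succeeds with probability p,
    otherwise the agent moves one step in the opposite direction."""
    succ = {}
    for s2, prob in ((clip(s + a), p), (clip(s - a), 1.0 - p)):
        succ[s2] = succ.get(s2, 0.0) + prob
    return succ


def reward(s2):
    """Known reward: 1 for arriving at the (terminal) goal state, else 0."""
    return 1.0 if s2 == GOAL else 0.0


def q_values(p, iters=200):
    """Value iteration for the known reward under candidate dynamics p."""
    V = [0.0] * N_STATES
    for _ in range(iters):
        Q = [[sum(prob * (reward(s2) + GAMMA * V[s2])
                  for s2, prob in transition(s, a, p).items())
              for a in ACTIONS]
             for s in range(N_STATES)]
        # The goal is terminal, so its value stays 0.
        V = [0.0 if s == GOAL else max(Q[s]) for s in range(N_STATES)]
    return Q


def boltzmann_policy(Q, s):
    """Action probabilities of a noisily rational demonstrator in state s."""
    exps = [math.exp(BETA * q) for q in Q[s]]
    z = sum(exps)
    return [e / z for e in exps]


def simulate_demos(p_true, n, seed=0):
    """Sample (state, action_index) pairs from the demonstrator under p_true."""
    rng = random.Random(seed)
    Q = q_values(p_true)
    demos = []
    for _ in range(n):
        s = rng.randrange(GOAL)  # uniform over the non-goal states
        a_idx = rng.choices(range(len(ACTIONS)), weights=boltzmann_policy(Q, s))[0]
        demos.append((s, a_idx))
    return demos


def log_likelihood(demos, p):
    """Log-probability of the observed actions if the dynamics were p."""
    Q = q_values(p)
    pi = [boltzmann_policy(Q, s) for s in range(N_STATES)]
    return sum(math.log(pi[s][a_idx]) for s, a_idx in demos)


def estimate_p(demos, grid):
    """Maximum-likelihood estimate of p by exhaustive search over a grid."""
    return max(grid, key=lambda p: log_likelihood(demos, p))


if __name__ == "__main__":
    demos = simulate_demos(p_true=0.7, n=5000)
    grid = [i / 20 for i in range(21)]
    print(estimate_p(demos, grid))
```

Exhaustive grid search is used here only because the toy dynamics have a single scalar parameter. The scalable, function-approximator-based estimators that the abstract attributes to Chapter 3 are not reproduced by this sketch.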

BibTeX citation:

@mastersthesis{Andrus:EECS-2019-98,
    Author= {Andrus, McKane},
    Title= {Inverse Reinforcement Learning for Dynamics},
    School= {EECS Department, University of California, Berkeley},
    Year= {2019},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-98.html},
    Number= {UCB/EECS-2019-98},
    Abstract= {Inverse Reinforcement Learning (IRL), as standardly defined, entails learning an unknown reward function of a Markov Decision Process (MDP) from demonstrations. As Herman et al. [10] describe, however, learning a reward function from demonstrations relies heavily on having a correct transition function, which is not always a given. We take interest in this problem from the perspective of Human-Robot Interaction, where we can often observe demonstrations of known tasks but do not necessarily have a correct model of human capability. As such, we specifically consider the case where the goal, or reward, of the demonstrations is known, but the dynamics, or transition function, are not. We refer to this alternate formulation as Inverse Reinforcement Learning for Dynamics (IRLD).
In Chapter 2 we formalize the IRLD problem statement and provide a method that can learn unobserved dynamics in environments of limited size and scope. We compare this method to one that estimates both the reward and the dynamics to show the advantages of incorporating knowledge of a goal. In Chapter 3 we propose an alternate, scalable approach to the IRLD problem statement that permits numerous variants of Maximum Likelihood Estimation algorithms using function approximators. Although we were not able to produce a generally applicable method, in Chapter 4 we provide a roadmap of the algorithms we developed and the issues that must be addressed for this approach to succeed in future work. Chapter 5 contains a reflection that positions our work in a broader social context.},
}

EndNote citation:

%0 Thesis
%A Andrus, McKane
%T Inverse Reinforcement Learning for Dynamics
%I EECS Department, University of California, Berkeley
%D 2019
%8 May 26
%@ UCB/EECS-2019-98
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-98.html
%F Andrus:EECS-2019-98