Specifications from Demonstrations: Learning, Teaching, and Control

Marcell Vazquez-Chanlatte

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-107

May 13, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-107.pdf

This dissertation considers the problem of learning and teaching Boolean task specifications, such as automata, using demonstrations. The resulting framework bridges grammatical inference and maximum-entropy inverse reinforcement learning with applications in human-robot interaction, formal synthesis, and multi-task reinforcement learning.

In the context of inverse reinforcement learning, Boolean task specifications are a class of sparse memory augmented rewards with explicit support for temporal and Boolean composition. These properties make task specifications immune to certain classes of reward hacking bugs that emerge from ad-hoc composition or perturbations to the dynamics. Unfortunately, the discrete nature of task specifications combined with an a-priori ignorance of what historical features are needed to encode the demonstrated task make existing approaches to learning rewards from demonstrations inapplicable.

In the context of specification mining and grammatical inference, demonstrations provide an ergonomic and sample efficient means to communicate formal languages and specifications. For example, this dissertation enables a user to partially specify the desired behavior of a system as example demonstrations and then find an automata or program that explains the user's behavior. Conversely, by synthesizing pedagogic demonstrations, this dissertation enables communicating the nuances of a specification that may be hard to intuit from a formal description.

In either case, this thesis contributes a collection of algorithms and theoretical machinery for systematically mitigating combinatorial explosions inherent in (1) finding specifications that explain an agent's behavior (2) finding pedagogic demonstrations that help humans infer the specification and (3) robustly predicting the behavior of an agent adhering to a specification.

Advisors: Sanjit A. Seshia

BibTeX citation:

@phdthesis{Vazquez-Chanlatte:EECS-2022-107,
    Author= {Vazquez-Chanlatte, Marcell},
    Title= {Specifications from Demonstrations: Learning, Teaching, and Control},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-107.html},
    Number= {UCB/EECS-2022-107},
    Abstract= {This dissertation considers the problem of learning and teaching Boolean task specifications, such as automata, using demonstrations. The resulting framework bridges grammatical inference and maximum-entropy inverse reinforcement learning with applications in human-robot interaction, formal synthesis, and multi-task reinforcement learning. 

In the context of inverse reinforcement learning, Boolean task specifications are a class of sparse memory augmented rewards with explicit support for temporal and Boolean composition. These properties make task specifications immune to certain classes of reward hacking bugs that emerge from ad-hoc composition or perturbations to the dynamics. Unfortunately, the discrete nature of task specifications combined with an a-priori ignorance of what historical features are needed to encode the demonstrated task make existing approaches to learning rewards from demonstrations inapplicable.

In the context of specification mining and grammatical inference, demonstrations provide an ergonomic and sample efficient means to communicate formal languages and specifications. For example, this dissertation enables a user to partially specify the desired behavior of a system as example demonstrations and then find an automata or program that explains the user's behavior.  Conversely, by synthesizing pedagogic demonstrations, this dissertation enables communicating the nuances of a specification that may be hard to intuit from a formal description.

In either case, this thesis contributes a collection of algorithms and theoretical machinery for systematically mitigating combinatorial explosions inherent in (1) finding specifications that explain an agent's behavior (2) finding pedagogic demonstrations that help humans infer the specification and (3) robustly predicting the behavior of an agent adhering to a specification.},
}

EndNote citation:

%0 Thesis
%A Vazquez-Chanlatte, Marcell 
%T Specifications from Demonstrations: Learning, Teaching, and Control
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 13
%@ UCB/EECS-2022-107
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-107.html
%F Vazquez-Chanlatte:EECS-2022-107