Smitha Milli

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-190

August 10, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-190.pdf

Specifying the correct objective function for a machine learning system is often difficult and error-prone. This thesis focuses on learning objective functions from many kinds of human input. It comprises three parts. First, we present our formalism, reward-rational choice (RRC), which unifies reward learning from many diverse signals. The key insight is that human behavior can often be modeled as a reward-rational implicit choice: a choice, from an implicit set of options, that is approximately rational for the intended reward. We show how much of the prior work, despite using many diverse modalities of feedback, can be seen as instantiations of RRC.
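
For concreteness, here is a minimal sketch of the kind of likelihood such a model typically uses; the notation (a choice set C, a grounding function psi mapping options to trajectories, and a rationality coefficient beta) is illustrative rather than quoted from the thesis.

    % Sketch (in LaTeX) of a Boltzmann-rational, i.e. reward-rational, choice likelihood.
    % The human's feedback is read as a choice c from an (often implicit) set C, and a
    % grounding function \psi maps each option to the trajectories it implies.
    \[
      P(c \mid r) \;=\;
      \frac{\exp\bigl(\beta\, \mathbb{E}_{\xi \sim \psi(c)}[\, r(\xi)\,]\bigr)}
           {\sum_{c' \in C} \exp\bigl(\beta\, \mathbb{E}_{\xi \sim \psi(c')}[\, r(\xi)\,]\bigr)}
    \]

Under this reading, different feedback modalities (demonstrations, comparisons, corrections, turning the system off) differ only in what the choice set C and the grounding psi are.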

In the second part, we discuss implications of the RRC formalism. In particular, it allows us to learn from multiple feedback types at once. Through case studies and experiments, we show how RRC can be used to combine, and actively select from, feedback types. Furthermore, once a person has access to multiple types of feedback, even their choice of which feedback type to use provides information about the reward function. We use RRC to formalize and learn from this meta-choice.
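
As a small illustration of combining feedback types under one posterior, here is a minimal runnable sketch. It assumes a discrete hypothesis space of linear reward functions and two illustrative feedback types (a demonstration and a pairwise comparison), each modeled as a Boltzmann-rational choice from its own option set; the hypotheses, features, and choices are invented for the example.

    import numpy as np

    # Minimal sketch: Bayesian reward inference that combines two feedback types,
    # each modeled as a Boltzmann-rational choice from its own option set.
    # The reward hypotheses, option features, and choices below are invented.

    reward_hypotheses = np.array([[1.0, 0.0],   # cares only about feature 1
                                  [0.0, 1.0],   # cares only about feature 2
                                  [0.5, 0.5]])  # cares about both equally
    posterior = np.full(len(reward_hypotheses), 1.0 / len(reward_hypotheses))
    beta = 5.0  # rationality coefficient

    def choice_likelihood(chosen, options, weights):
        """P(choice | reward) when the human picks Boltzmann-rationally from `options`."""
        logits = beta * (options @ weights)      # value of each option under this reward
        probs = np.exp(logits - logits.max())
        return probs[chosen] / probs.sum()

    def update(posterior, chosen, options):
        """One Bayesian update from a single piece of feedback."""
        lik = np.array([choice_likelihood(chosen, options, w) for w in reward_hypotheses])
        posterior = posterior * lik
        return posterior / posterior.sum()

    # Feedback type 1: a demonstration, read as choosing trajectory 0 over alternatives
    # (rows are feature vectors of candidate trajectories).
    posterior = update(posterior, chosen=0,
                       options=np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]))

    # Feedback type 2: a pairwise comparison, read as choosing trajectory 1 of the pair.
    posterior = update(posterior, chosen=1,
                       options=np.array([[0.3, 0.7], [0.8, 0.2]]))

    print("posterior over reward hypotheses:", np.round(posterior, 3))

The same update applies to any feedback type once it is cast as a choice from some option set, which is what makes combining heterogeneous feedback straightforward.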

Finally, in the third part, we study settings in which the human may violate the reward-rational assumption. First, we consider the case where the human may be pedagogic, i.e., optimizing for teaching the reward function. We show that the reward-rational assumption provides robust reward inference even when the human is pedagogic. Second, we consider the case where the human may face temptation and act in ways that systematically deviate from their target preferences. We theoretically analyze such a setting and show that, with the right feedback type, one can still efficiently recover the individual’s preferences. Lastly, we consider the recommender system setting. There, it is difficult to model all user behaviors as rational, but by leveraging one strong, explicit signal (e.g., “don’t show me this”), we are still able to operationalize and optimize for a notion of “value” on these systems.
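
To illustrate the last point, the sketch below is not the procedure from the thesis; it only assumes that the explicit “don’t show me this” signal can serve as a negative label for calibrating a value score over ordinary engagement signals, and all data in it is synthetic.

    import numpy as np

    # Illustrative sketch (not the thesis's method): use the explicit
    # "don't show me this" signal as a negative label to calibrate a value
    # score built from ordinary engagement signals. All data here is synthetic.

    rng = np.random.default_rng(1)

    # Engagement signals per item: [clicked, watch_fraction, shared]
    engagement = rng.random((200, 3))
    # Synthetic explicit rejections: clickbait-like items (clicked but barely watched).
    rejected = (engagement[:, 0] > 0.8) & (engagement[:, 1] < 0.2)
    labels = (~rejected).astype(float)  # 0 = explicitly rejected, 1 = otherwise

    # Logistic regression by gradient descent: the learned weights say how much each
    # engagement signal should count toward "value" once rejections are respected.
    w, b = np.zeros(3), 0.0
    for _ in range(2000):
        p = 1.0 / (1.0 + np.exp(-(engagement @ w + b)))
        w -= 0.5 * (engagement.T @ (p - labels)) / len(labels)
        b -= 0.5 * np.mean(p - labels)

    value = 1.0 / (1.0 + np.exp(-(engagement @ w + b)))
    print("weights on engagement signals:", np.round(w, 2))
    print("mean value score of rejected items:", round(float(value[rejected].mean()), 2))

The point of the toy example is only that a single unambiguous signal can anchor how the remaining, noisier engagement signals are weighted, so the system optimizes the calibrated value score rather than raw engagement.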

Advisors: Anca Dragan and Moritz Hardt


BibTeX citation:

@phdthesis{Milli:EECS-2022-190,
    Author= {Milli, Smitha},
    Title= {Learning Objective Functions from Many Diverse Signals},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-190.html},
    Number= {UCB/EECS-2022-190},
    Abstract= {Specifying the correct objective function for a machine learning system is often difficult and error-prone. This thesis focuses on learning objective functions from many kinds of human inputs. It is comprised of three parts. First, we present our formalism, reward-rational choice (RRC), that unifies reward learning from many diverse signals. The key insight is that human behavior can often be modeled as a reward-rational implicit choice – a choice from an implicit set of options, that is approximately rational for the intended reward. We show how much of the prior work, despite using many diverse modalities of feedback, can be seen as an instantiation of RRC.

In the second part, we discuss implications of the RRC formalism. In particular, it allows us to learn from multiple feedback types at once. Through case studies and experiments, we show how RRC can be used to combine and actively select from feedback types. Furthermore, once a person has access to multiple types of feedback, even their choice of feedback type itself provides information to learn the reward function from. We use RRC to formalize and learn from a person’s meta-choice, the choice of feedback type itself.

Finally, in the third part, we study settings in which the human may violate the reward-rational assumption. First, we consider the case where the human may be pedagogic, i.e., optimizing for teaching the reward function. We show that the reward-rational assumption provides robust reward inference even when the human is pedagogic. Second, we consider the case where the human may face temptation and act in ways that systematically deviate from their target preferences. We theoretically analyze such a setting and show that, with the right feedback type, one can still efficiently recover the individual’s preferences. Lastly, we consider the recommender system setting. There, it is difficult to model all user behaviors as rational, but by leveraging one strong, explicit signal (e.g. “don’t show me this”), we are still able to operationalize and optimize for a notion of “value” on these systems.},
}

EndNote citation:

%0 Thesis
%A Milli, Smitha 
%T Learning Objective Functions from Many Diverse Signals
%I EECS Department, University of California, Berkeley
%D 2022
%8 August 10
%@ UCB/EECS-2022-190
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-190.html
%F Milli:EECS-2022-190