Designing Algorithms for Learning and Decision-Making in Societal Systems

Eric Mazumdar

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-166

July 22, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-166.pdf

The ability to learn from data and make decisions in real time has led to the rapid deployment of machine learning algorithms across many aspects of everyday life. Despite their potential to enable new services and address persistent societal issues, the widespread use of these algorithms has led to unintended consequences such as flash crashes in financial markets and price collusion on e-commerce platforms. These consequences are the inevitable result of deploying algorithms that were designed to operate in isolation into uncertain, dynamic environments in which they interact with other autonomous agents, algorithms, and human decision-makers.

To address these issues, it is necessary to develop an understanding of the fundamental limits of learning algorithms in societal-scale systems. The work in this thesis is divided into three parts, each addressing a different aspect of learning and decision-making in societal-scale systems: (i) learning in the presence of strategic agents, (ii) learning and decision-making in uncertain and dynamic environments, and (iii) learning models of human decision-making from data.

In the first part, we blend ideas from game theory and optimization to demonstrate both theoretically and empirically how current machine learning approaches fail in multi-agent settings. We then leverage our understanding of the underlying structure of competitive settings to design efficient algorithms with provable guarantees of performance.
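
To make this failure mode concrete, here is a minimal sketch (illustrative, not taken from the thesis): simultaneous gradient descent-ascent on the bilinear zero-sum game f(x, y) = xy spirals away from the equilibrium at the origin, since each step multiplies the distance to it by sqrt(1 + lr^2).

import numpy as np

x, y = 1.0, 1.0            # initial strategies of the two players
lr = 0.1                   # shared learning rate (illustrative)
trajectory = [(x, y)]
for _ in range(100):
    gx, gy = y, x          # df/dx = y and df/dy = x for f(x, y) = x * y
    x, y = x - lr * gx, y + lr * gy   # player 1 descends, player 2 ascends
    trajectory.append((x, y))

# The distance to the equilibrium (0, 0) grows every step instead of shrinking.
print([round(float(np.hypot(px, py)), 3) for px, py in trajectory[::20]])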

In the second part of this thesis, we combine ideas from statistics, namely the analysis of Langevin Markov chain Monte Carlo (MCMC) algorithms, and machine learning to design a versatile and computationally efficient model-based algorithm for the multi-armed bandit problem with guarantees of optimal performance. In particular, we develop new characterizations of posteriors arising from log-concave families of likelihoods and priors, together with finite-time convergence rates for Langevin MCMC algorithms, and we use these theoretical results to show that approximate sampling algorithms such as Langevin MCMC can be integrated into Thompson Sampling (the original multi-armed bandit algorithm) without sacrificing performance.
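
As a rough illustration of the approach (a minimal sketch under simplifying assumptions, not the thesis's algorithm or analysis), the following replaces the exact posterior draw in Thompson Sampling for a Gaussian bandit with a few steps of the unadjusted Langevin algorithm; the arm means, step-size rule, and step budget are all illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical arm means (unit-variance rewards)
K = len(true_means)
counts = np.zeros(K)                      # number of pulls of each arm
sums = np.zeros(K)                        # summed rewards of each arm

def langevin_sample(n, s, steps=50):
    """Approximate draw from the N(s/(n+1), 1/(n+1)) posterior that arises
    from a standard normal prior and a unit-variance Gaussian likelihood,
    using the unadjusted Langevin algorithm (ULA)."""
    step = 0.5 / (n + 1.0)   # step size scaled to the posterior's curvature
    theta = 0.0
    for _ in range(steps):
        grad_log_post = s - (n + 1.0) * theta        # gradient of the log posterior
        theta += step * grad_log_post + np.sqrt(2.0 * step) * rng.normal()
    return theta

for t in range(2000):
    samples = [langevin_sample(counts[k], sums[k]) for k in range(K)]
    arm = int(np.argmax(samples))             # act greedily on the sampled means
    reward = true_means[arm] + rng.normal()   # noisy reward
    counts[arm] += 1
    sums[arm] += reward

print("pull counts:", counts)   # most pulls should concentrate on the best arm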

In the final part of this thesis, we bring together ideas from behavioral economics and reinforcement learning to develop a method for inverse risk-sensitive reinforcement learning. We first develop a forward model that combines prospect theory with reinforcement learning to capture the nuances of risk-sensitive decision-making in dynamic environments. We then propose an algorithm for solving the inverse problem: learning a model of an agent's decision-making process from observations of their sequential decisions in dynamic environments.
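
A minimal sketch of the forward-model idea, with illustrative parameters rather than the thesis's model: the Kahneman-Tversky prospect-theory value function is applied to the temporal-difference error in a tabular Q-learning update, so losses move the value estimate further than equal-sized gains.

import numpy as np

def prospect_value(z, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky value function: concave over gains, convex and
    steeper (loss-averse) over losses; parameters are illustrative."""
    return z**alpha if z >= 0 else -lam * (-z)**beta

def risk_sensitive_q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.95):
    """One tabular Q-learning step with the TD error passed through the
    prospect-theory value function, so losses are weighted more heavily
    than equal-sized gains."""
    td = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += lr * prospect_value(td)
    return Q

Q = np.zeros((4, 2))     # toy table: 4 states, 2 actions
Q = risk_sensitive_q_update(Q, s=0, a=1, r=-1.0, s_next=2)
print(Q[0, 1])           # a -1 reward moves the estimate by lr * (-2.25) = -0.225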

Altogether, this thesis represents a small step in an emerging research area at the intersection of economics, statistics, machine learning, and control. We conclude with a short discussion of emerging problems and themes in this wider area.

Advisors: S. Shankar Sastry and Michael Jordan


BibTeX citation:

@phdthesis{Mazumdar:EECS-2021-166,
    Author= {Mazumdar, Eric},
    Title= {Designing Algorithms for Learning and Decision-Making in Societal Systems},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {Jul},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-166.html},
    Number= {UCB/EECS-2021-166},
Abstract= {The ability to learn from data and make decisions in real time has led to the rapid deployment of machine learning algorithms across many aspects of everyday life. Despite their potential to enable new services and address persistent societal issues, the widespread use of these algorithms has led to unintended consequences such as flash crashes in financial markets and price collusion on e-commerce platforms. These consequences are the inevitable result of deploying algorithms that were designed to operate in isolation into uncertain, dynamic environments in which they interact with other autonomous agents, algorithms, and human decision-makers.

To address these issues, it is necessary to develop an understanding of the fundamental limits of learning algorithms in societal-scale systems. The work in this thesis is divided into three parts, each addressing a different aspect of learning and decision-making in societal-scale systems: (i) learning in the presence of strategic agents, (ii) learning and decision-making in uncertain and dynamic environments, and (iii) learning models of human decision-making from data.

In the first part, we blend ideas from game theory and optimization to demonstrate both theoretically and empirically how current machine learning approaches fail in multi-agent settings. We then leverage our understanding of the underlying structure of competitive settings to design efficient algorithms with provable guarantees of performance.

In the second part of this thesis, we combine ideas from statistics, namely the analysis of Langevin Markov chain Monte Carlo (MCMC) algorithms, and machine learning to design a versatile and computationally efficient model-based algorithm for the multi-armed bandit problem with guarantees of optimal performance. In particular, we develop new characterizations of posteriors arising from log-concave families of likelihoods and priors, together with finite-time convergence rates for Langevin MCMC algorithms, and we use these theoretical results to show that approximate sampling algorithms such as Langevin MCMC can be integrated into Thompson Sampling (the original multi-armed bandit algorithm) without sacrificing performance.

In the final part of this thesis, we bring together ideas from behavioral economics and reinforcement learning to develop a method for inverse risk-sensitive reinforcement learning. We first develop a forward model that combines prospect theory with reinforcement learning to capture the nuances of risk-sensitive decision-making in dynamic environments. We then propose an algorithm for solving the inverse problem: learning a model of an agent's decision-making process from observations of their sequential decisions in dynamic environments.

Altogether, this thesis represents a small step in an emerging research area at the intersection of economics, statistics, machine learning, and control. We conclude with a short discussion of emerging problems and themes in this wider area.},
}

EndNote citation:

%0 Thesis
%A Mazumdar, Eric 
%T Designing Algorithms for Learning and Decision-Making in Societal Systems
%I EECS Department, University of California, Berkeley
%D 2021
%8 July 22
%@ UCB/EECS-2021-166
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-166.html
%F Mazumdar:EECS-2021-166