### Vidya Muthukumar

###
EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2020-174

August 18, 2020

### http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-174.pdf

This dissertation initiates fundamental lines of inquiry into better understanding learning from data provided by other agents who have a possible strategic incentive. In many use cases of modern machine learning (ML), these agents will not be acting in isolation, and it is critical for them to directly interact with other strategic agents. For example, several pioneering works in cognitive radio designed multi-agent mechanisms whose equilibrium outcome was efficient spectrum sharing among multiple cognitive radio agents. However, the attainment of these equilibria requires co-design among the agents: they have to know each other’s utility functions and basic strategic nature, which could be stochastic, adversarial, competitive, or cooperative. In the conceptualized utopia of spectrum sharing, both of these will be unknown a-priori and will need to be learned through repeated interaction. Nor does this scenario arise only in cognitive radio: applications of swarm robotics, reinforcement learning and e-commerce all involve the intersection of principles of ML with multi-agent systems that are not co-designed. The ensuing twin questions of how agents should learn from strategically generated data, as well as how such strategic behavior will manifest, are well-posed and non-trivial even under the simplest possible instance cases of ML, such as predicting a binary sequence generated by an unknown environment. We consider the first three categories of strategic nature: stochastic, adversarial, and competitive. The first part of this thesis designs algorithms for “optimal" learning in an unknown environment, where the notion of optimality is defined as being able to adapt to the nature of the environment on-the-fly without knowing this nature beforehand. The on-the-fly nature of this goal is formalized in the classical framework of online learning. While the traditional goal of online learning is regret minimization with respect to a given model class, I motivate that adapting the model class, itself, in an online fashion is essential to transition from guarantees on regret to guarantees on reward. Accordingly, we design robust approaches, inspired by seminal approaches to purely stochastic model selection, to work in both stochastic and adversarial environments for online learning with full-information and limited-information feedback. The second part of this thesis considers a strategic, but non-adversarial agent generating the data that is being used for learning. Such an agent is typically selfish and rational, rather than being simply malicious — thus, their behavior manifests in a more complex manner than an adversarial agent’s. I introduce a new frequentist framework to approximately express such an agent’s incentives and trade-offs involved in reaching the Stackelberg equilibrium of the ensuing non-zero-sum game. This is in agreement with the classical Bayesian-repeated- game asymptotic theory, now with constructive strategies for both players. Interestingly, through this framework we can show that the agent is incentivized to reveal, not obfuscate, her information to the learner. This thesis concludes by showing a surprising divergent outcome in day-to-day behavior that is fundamental to the property of no-regret online learning when deployed in multi-player, game-theoretic environments. This suggests a possible re-examination of learning dynamics, inspired by behavioral game theory, in future work.

**Advisor:** Anant Sahai

BibTeX citation:

@phdthesis{Muthukumar:EECS-2020-174, Author = {Muthukumar, Vidya}, Title = {Learning from an unknown environment}, School = {EECS Department, University of California, Berkeley}, Year = {2020}, Month = {Aug}, URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-174.html}, Number = {UCB/EECS-2020-174}, Abstract = {This dissertation initiates fundamental lines of inquiry into better understanding learning from data provided by other agents who have a possible strategic incentive. In many use cases of modern machine learning (ML), these agents will not be acting in isolation, and it is critical for them to directly interact with other strategic agents. For example, several pioneering works in cognitive radio designed multi-agent mechanisms whose equilibrium outcome was efficient spectrum sharing among multiple cognitive radio agents. However, the attainment of these equilibria requires co-design among the agents: they have to know each other’s utility functions and basic strategic nature, which could be stochastic, adversarial, competitive, or cooperative. In the conceptualized utopia of spectrum sharing, both of these will be unknown a-priori and will need to be learned through repeated interaction. Nor does this scenario arise only in cognitive radio: applications of swarm robotics, reinforcement learning and e-commerce all involve the intersection of principles of ML with multi-agent systems that are not co-designed. The ensuing twin questions of how agents should learn from strategically generated data, as well as how such strategic behavior will manifest, are well-posed and non-trivial even under the simplest possible instance cases of ML, such as predicting a binary sequence generated by an unknown environment. We consider the first three categories of strategic nature: stochastic, adversarial, and competitive. The first part of this thesis designs algorithms for “optimal" learning in an unknown environment, where the notion of optimality is defined as being able to adapt to the nature of the environment on-the-fly without knowing this nature beforehand. The on-the-fly nature of this goal is formalized in the classical framework of online learning. While the traditional goal of online learning is regret minimization with respect to a given model class, I motivate that adapting the model class, itself, in an online fashion is essential to transition from guarantees on regret to guarantees on reward. Accordingly, we design robust approaches, inspired by seminal approaches to purely stochastic model selection, to work in both stochastic and adversarial environments for online learning with full-information and limited-information feedback. The second part of this thesis considers a strategic, but non-adversarial agent generating the data that is being used for learning. Such an agent is typically selfish and rational, rather than being simply malicious — thus, their behavior manifests in a more complex manner than an adversarial agent’s. I introduce a new frequentist framework to approximately express such an agent’s incentives and trade-offs involved in reaching the Stackelberg equilibrium of the ensuing non-zero-sum game. This is in agreement with the classical Bayesian-repeated- game asymptotic theory, now with constructive strategies for both players. Interestingly, through this framework we can show that the agent is incentivized to reveal, not obfuscate, her information to the learner. This thesis concludes by showing a surprising divergent outcome in day-to-day behavior that is fundamental to the property of no-regret online learning when deployed in multi-player, game-theoretic environments. This suggests a possible re-examination of learning dynamics, inspired by behavioral game theory, in future work.} }

EndNote citation:

%0 Thesis %A Muthukumar, Vidya %T Learning from an unknown environment %I EECS Department, University of California, Berkeley %D 2020 %8 August 18 %@ UCB/EECS-2020-174 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-174.html %F Muthukumar:EECS-2020-174