Model Selection for Contextual Bandits and Reinforcement Learning

Aldo Pacchiano

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-164

June 25, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-164.pdf

In many domains ranging from internet commerce, to robotics and computational biology, many algorithms have been developed that make decisions with the objective of maximizing a reward, while learning how to make better decisions in the future. In hopes of realizing this objective a vast literature focused on the study of Bandits and Reinforcement Learning algorithms has arisen. Although in most practical applications, precise knowledge of the nature of the problem faced by the learner may not be known in advance most of this work has chiefly focused on designing algorithms with provable regret guarantees that work under specific modeling assumptions. Less work has been spent on the problem of model selection where the objective is to design algorithms that can select in an online fashion the best suitable algorithm among a set of candidates to deal with a specific problem instance.

In this thesis we provide a comprehensive set of algorithmic approaches to the problem of model selection in stochastic contextual bandits and reinforcement learning. We propose and analyze two distinct approaches to the problem. First, we introduce Stochastic CORRAL, an algorithm that successfully combines an adversarial EXP3 or CORRAL master with multiple stochastic bandit algorithms. Second, we introduce three distinct stochastic master algorithms: Explore-Commit-Eliminate (ECE), Regret Balancing, and Regret Bound Balancing and Elimination (RBBE) that recover the rates of Stochastic CORRAL under an EXP3 and a CORRAL master but with the advantage the model selection guarantees of RBBE extend to the setting of contextual linear bandits with adversarial contexts. We complement our algorithmic results with a variety of lower bounds designed to explore the theoretical limits of model selection in Contextual Bandits and Reinforcement Learning.

Advisors: Michael Jordan and Peter Bartlett

BibTeX citation:

@phdthesis{Pacchiano:EECS-2021-164,
    Author= {Pacchiano, Aldo},
    Title= {Model Selection for Contextual Bandits and Reinforcement Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {Jun},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-164.html},
    Number= {UCB/EECS-2021-164},
    Abstract= { In many domains ranging from internet commerce, to robotics and computational biology, many algorithms have been developed that make decisions with the objective of maximizing a reward, while learning how to make better decisions in the future. In hopes of realizing this objective a vast literature focused on the study of Bandits and Reinforcement Learning algorithms has arisen. Although in most practical applications, precise knowledge of the nature of the problem faced by the learner may not be known in advance most of this work has chiefly focused on designing algorithms with provable regret guarantees that work under specific modeling assumptions. Less work has been spent on the problem of model selection where the objective is to design algorithms that can select in an online fashion the best suitable algorithm among a set of candidates to deal with a specific problem instance. 

In this thesis we provide a comprehensive set of algorithmic approaches to the problem of model selection in stochastic contextual bandits and reinforcement learning. We propose and analyze two distinct approaches to the problem. First, we introduce Stochastic CORRAL, an algorithm that successfully combines an adversarial EXP3 or CORRAL master with multiple stochastic bandit algorithms. Second, we introduce three distinct stochastic master algorithms: Explore-Commit-Eliminate (ECE), Regret Balancing, and Regret Bound Balancing and Elimination (RBBE) that recover the rates of Stochastic CORRAL under an EXP3 and a CORRAL master but with the advantage the model selection guarantees of RBBE extend to the setting of contextual linear bandits with adversarial contexts. We complement our algorithmic results with a variety of lower bounds designed to explore the theoretical limits of model selection in Contextual Bandits and Reinforcement Learning.},
}

EndNote citation:

%0 Thesis
%A Pacchiano, Aldo 
%T Model Selection for Contextual Bandits and Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2021
%8 June 25
%@ UCB/EECS-2021-164
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-164.html
%F Pacchiano:EECS-2021-164