Aviral Kumar

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-223

August 11, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-223.pdf

Reinforcement learning (RL) provides a formalism for learning-based control. By attempting to learn behavioral policies that optimize a user-specified reward function, RL methods have been able to acquire novel decision-making strategies that outperform the best humans, even with highly complex dynamics and even when the space of possible outcomes is huge (e.g., robotic manipulation, chip floorplanning). Yet RL has had limited applicability in real-world scenarios compared to standard machine learning (ML). Why? The central issue is that RL relies crucially on large amounts of active, trial-and-error data collection to learn policies. Unfortunately, in the real world, active data collection is generally very expensive (e.g., running wet-lab experiments for drug design) and/or dangerous (e.g., robots operating around humans), and accurate simulators are hard to build. Overall, this means that while RL carries the potential to broadly unlock ML in real-world decision-making problems, we are unable to realize this potential via current RL techniques.

To realize this potential of RL, this dissertation develops an alternative paradigm that aims to utilize static datasets of experience for learning policies. Such a "dataset-driven" paradigm broadens the applicability of RL to a variety of decision-making problems where historical datasets already exist or can be collected via domain-specific strategies. It also brings to RL the scalability and reliability benefits that modern supervised and unsupervised ML methods enjoy. That said, instantiating this paradigm is challenging, as it requires reconciling the static nature of learning from a dataset with the traditionally active nature of RL, which gives rise to challenges of distributional shift, generalization, and optimization. After studying these challenges theoretically and empirically, we develop algorithmic ideas for addressing them and discuss several extensions that convert these ideas into practical methods capable of training modern high-capacity neural network function approximators on large and diverse datasets. Finally, we show how these techniques enable us to pre-train generalist policies for real robots and video games, and enable fast and efficient hardware accelerator design.
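To make the dataset-driven setting concrete, below is a minimal, hypothetical sketch of learning a policy purely from a fixed set of logged transitions, using simple tabular Q-iteration with a naive penalty on state-action pairs the dataset never covers. The MDP, dataset, and penalty are illustrative assumptions; this is not the dissertation's algorithm, only an example of the setting in which no further environment interaction is allowed.

import numpy as np

# Hypothetical, self-contained illustration of the dataset-driven (offline) setting:
# learn a policy from a fixed set of logged transitions, with no further interaction.
# The MDP, dataset, and penalty below are made up for exposition only.

n_states, n_actions, gamma = 5, 3, 0.9
rng = np.random.default_rng(0)

# A static dataset of (state, action, reward, next_state) transitions,
# standing in for historical experience collected by some prior policy.
dataset = [(int(rng.integers(n_states)), int(rng.integers(n_actions)),
            float(rng.standard_normal()), int(rng.integers(n_states)))
           for _ in range(200)]

# Track which (state, action) pairs the dataset actually covers.
counts = np.zeros((n_states, n_actions))
for s, a, _, _ in dataset:
    counts[s, a] += 1

Q = np.zeros((n_states, n_actions))
penalty = 1.0   # naive pessimism: discourage actions the dataset never shows
lr = 0.1

# Fitted Q-iteration over the static dataset; actions unseen in the data are
# penalized in the backup to limit the distributional shift described above.
for _ in range(500):
    for s, a, r, s2 in dataset:
        target = r + gamma * np.max(Q[s2] - penalty * (counts[s2] == 0))
        Q[s, a] += lr * (target - Q[s, a])

# The resulting greedy policy was trained entirely from logged data.
policy = np.argmax(Q - penalty * (counts == 0), axis=1)
print("greedy policy per state:", policy)

The naive penalty term here is only a stand-in for the more principled conservative strategies that the dissertation develops for high-capacity neural network function approximators.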

Advisor: Sergey Levine


BibTeX citation:

@phdthesis{Kumar:EECS-2023-223,
    Author= {Kumar, Aviral},
    Title= {Reinforcement Learning from Static Datasets: Algorithms, Analysis, and Applications},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-223.html},
    Number= {UCB/EECS-2023-223},
    Abstract= {Reinforcement learning (RL) provides a formalism for learning-based control. By attempting to learn behavioral policies that optimize a user-specified reward function, RL methods have been able to acquire novel decision-making strategies that outperform the best humans, even with highly complex dynamics and even when the space of possible outcomes is huge (e.g., robotic manipulation, chip floorplanning). Yet RL has had limited applicability in real-world scenarios compared to standard machine learning (ML). Why? The central issue is that RL relies crucially on large amounts of active, trial-and-error data collection to learn policies. Unfortunately, in the real world, active data collection is generally very expensive (e.g., running wet-lab experiments for drug design) and/or dangerous (e.g., robots operating around humans), and accurate simulators are hard to build. Overall, this means that while RL carries the potential to broadly unlock ML in real-world decision-making problems, we are unable to realize this potential via current RL techniques.

To realize this potential of RL, this dissertation develops an alternative paradigm that aims to utilize static datasets of experience for learning policies. Such a ``dataset-driven'' paradigm broadens the applicability of RL to a variety of decision-making problems where historical datasets already exist or can be collected via domain-specific strategies. It also brings to RL the scalability and reliability benefits that modern supervised and unsupervised ML methods enjoy. That said, instantiating this paradigm is challenging, as it requires reconciling the static nature of learning from a dataset with the traditionally active nature of RL, which gives rise to challenges of distributional shift, generalization, and optimization. After studying these challenges theoretically and empirically, we develop algorithmic ideas for addressing them and discuss several extensions that convert these ideas into practical methods capable of training modern high-capacity neural network function approximators on large and diverse datasets. Finally, we show how these techniques enable us to pre-train generalist policies for real robots and video games, and enable fast and efficient hardware accelerator design.},
}

EndNote citation:

%0 Thesis
%A Kumar, Aviral 
%T Reinforcement Learning from Static Datasets: Algorithms, Analysis, and Applications
%I EECS Department, University of California, Berkeley
%D 2023
%8 August 11
%@ UCB/EECS-2023-223
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-223.html
%F Kumar:EECS-2023-223