Nived Rajaraman

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2025-142

July 17, 2025

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-142.pdf

Classical supervised learning paradigms typically assume that training data samples are independently drawn from a target distribution. However, real-world scenarios often violate this assumption, presenting data that are temporally correlated, dynamically evolving, or the result of strategic interaction. Learning in these settings is often significantly more challenging, both from a theoretical and a practical viewpoint. Recent advances in reinforcement learning (RL) have shown that it is possible to train agents that can operate and generalize in settings where the number of possible outcomes is huge. However, these approaches face a number of challenges: they rely on collecting large amounts of "exploration" data through interaction with a dynamic environment. This active data collection is often prohibitively expensive in practice, mistakes made during exploration can be costly (for instance, in settings involving human interaction), and data collected this way may be hard to reuse. Mitigating these concerns requires developing new frameworks for RL.

In this dissertation, we develop algorithms and analyses for an alternative learning paradigm that trains policies from static datasets generated by a demonstrator. This broadens the applicability of RL to a variety of decision-making problems where historical datasets already exist or can be collected via domain-specific strategies. That said, instantiating this paradigm is challenging, as it requires reconciling the static nature of learning from offline datasets (against a fixed distribution of problem instances) with the traditionally interactive nature of RL.

Imitation Learning (IL) techniques have found a home in several areas, from policy initialization in game-playing agents like AlphaGo (Silver et al. 2017) to, more recently, serving as the backbone of supervised fine-tuning (SFT) for large language models (LLMs) (Brown et al. 2020). The key challenge in all of these domains is obtaining sufficiently large, diverse, and high-quality demonstration datasets. While more data typically yields better performance, expert data can be expensive to collect. This challenge manifests in several forms. In robotics and control, acquiring teleoperated or human-guided trajectories often requires specialized hardware (e.g., motion-capture rigs or force-feedback devices), limiting the scale of dataset collection (Argall et al. 2009, Ross et al. 2011). In autonomous driving, critical "edge-case" scenarios (e.g., collision avoidance in unusual weather) are inherently rare yet essential for safety; collecting them either in simulation or on-road is time-consuming and costly (Codevilla et al. 2018, Codevilla et al. 2019). And for LLMs, supervised fine-tuning relies on human-annotated data, which is hard to parallelize and incurs substantial annotation time (Stiennon et al. 2020).

Thus, it is pertinent to understand how best to utilize the available dataset and leverage favorable properties of the environment and the demonstrator. In this thesis, we will build an understanding of these questions by studying Imitation Learning from a theoretical point of view. We will formulate a statistical question and analyze the best achievable error under various feedback models. We will then scale up these algorithmic insights, leveraging the expressivity and representational power of modern function approximators to develop performant, practical algorithms. Along the way, we will develop various insights into the landscape of the IL problem, build a comprehensive understanding and unification of algorithms that have already been deployed successfully in practice, such as Behavior Cloning (Pomerleau 1988, Ross et al. 2010) and GAIL (Ho et al. 2016), and provide principled improvements to these approaches.
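To make the Behavior Cloning baseline concrete, here is a minimal illustrative sketch (not code from the thesis) of tabular behavior cloning in Python: for every state the demonstrator visited, the learner selects the most frequently demonstrated action, and it falls back to an arbitrary default action in states the demonstrations never cover. The function name and toy trajectories below are hypothetical.

import numpy as np

def behavior_cloning_tabular(demos, num_states, num_actions, default_action=0):
    # demos: list of expert trajectories, each a list of (state, action) pairs.
    # Returns a deterministic policy: an array mapping each state to an action.
    counts = np.zeros((num_states, num_actions), dtype=int)
    for trajectory in demos:
        for state, action in trajectory:
            counts[state, action] += 1  # tally how often the expert chose each action in each state
    policy = np.full(num_states, default_action, dtype=int)  # unvisited states get the default action
    visited = counts.sum(axis=1) > 0
    policy[visited] = counts[visited].argmax(axis=1)  # majority vote on visited states
    return policy

# Toy usage: two expert trajectories in a 5-state, 2-action problem.
expert_demos = [[(0, 1), (2, 0), (4, 1)], [(0, 1), (1, 0), (4, 1)]]
print(behavior_cloning_tabular(expert_demos, num_states=5, num_actions=2))  # -> [1 0 0 0 1]

States the expert never visits (state 3 in the toy example) are where such a cloned policy can drift from the demonstrator, and this is the kind of effect the statistical analysis described above seeks to quantify.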

Advisors: Kannan Ramchandran and Jiantao Jiao


BibTeX citation:

@phdthesis{Rajaraman:EECS-2025-142,
    Author= {Rajaraman, Nived},
    Title= {Bridging Demonstrations and Decisions: Theory and Algorithms for Provable Imitation Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2025},
    Month= {Jul},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-142.html},
    Number= {UCB/EECS-2025-142},
    Abstract= {Classical supervised learning paradigms typically assume that training data samples are independently drawn from a target distribution. However, real-world scenarios often violate this assumption, presenting data that are temporally correlated, dynamically evolving, or the result of strategic interaction. Learning in these settings is often significantly more challenging, both from a theoretical and a practical viewpoint. Recent advances in reinforcement learning (RL) have shown that it is possible to train agents that can operate and generalize in settings where the number of possible outcomes is huge. However, these approaches face a number of challenges: they rely on collecting large amounts of ``exploration'' data through interaction with a dynamic environment. This active data collection is often prohibitively expensive in practice, mistakes made during exploration can be costly (for instance, in settings involving human interaction), and data collected this way may be hard to reuse. Mitigating these concerns requires developing new frameworks for RL.

In this dissertation, we develop algorithms and analyses for an alternative learning paradigm that trains policies from static datasets generated by a demonstrator. This broadens the applicability of RL to a variety of decision-making problems where historical datasets already exist or can be collected via domain-specific strategies. That said, instantiating this paradigm is challenging, as it requires reconciling the static nature of learning from offline datasets (against a fixed distribution of problem instances) with the traditionally interactive nature of RL.

Imitation Learning (IL) techniques have found a home in several areas, from policy initialization in game-playing agents like AlphaGo (Silver et al. 2017) to, more recently, serving as the backbone of supervised fine-tuning (SFT) for large language models (LLMs) (Brown et al. 2020). The key challenge in all of these domains is obtaining sufficiently large, diverse, and high-quality demonstration datasets. While more data typically yields better performance, expert data can be expensive to collect. This challenge manifests in several forms. In robotics and control, acquiring teleoperated or human-guided trajectories often requires specialized hardware (e.g., motion-capture rigs or force-feedback devices), limiting the scale of dataset collection (Argall et al. 2009, Ross et al. 2011). In autonomous driving, critical ``edge-case'' scenarios (e.g., collision avoidance in unusual weather) are inherently rare yet essential for safety; collecting them either in simulation or on-road is time-consuming and costly (Codevilla et al. 2018, Codevilla et al. 2019). And for LLMs, supervised fine-tuning relies on human-annotated data, which is hard to parallelize and incurs substantial annotation time (Stiennon et al. 2020).

Thus, it is pertinent to understand how best to utilize the available dataset and leverage favorable properties of the environment and the demonstrator. In this thesis, we will build an understanding of these questions by studying Imitation Learning from a theoretical point of view. We will formulate a statistical question and analyze the best achievable error under various feedback models. We will then scale up these algorithmic insights, leveraging the expressivity and representational power of modern function approximators to develop performant, practical algorithms. Along the way, we will develop various insights into the landscape of the IL problem, build a comprehensive understanding and unification of algorithms that have already been deployed successfully in practice, such as Behavior Cloning (Pomerleau 1988, Ross et al. 2010) and GAIL (Ho et al. 2016), and provide principled improvements to these approaches.},
}

EndNote citation:

%0 Thesis
%A Rajaraman, Nived 
%T Bridging Demonstrations and Decisions: Theory and Algorithms for Provable Imitation Learning
%I EECS Department, University of California, Berkeley
%D 2025
%8 July 17
%@ UCB/EECS-2025-142
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-142.html
%F Rajaraman:EECS-2025-142