Abdus Salam Azad

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-167

August 9, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-167.pdf

Autonomous agents have seen tremendous advancements in solving sequential decision-making problems in recent years, driven primarily by Reinforcement Learning (RL) and, more recently, by Large Generative Models. The capability of these autonomous agents depends crucially on the quality and diversity of the learning environments they are trained in. This thesis presents research on designing frameworks and algorithms to formulate and systematically generate environments that improve the generalization capabilities of autonomous agents in solving sequential decision-making tasks. First, we explore the benefits of human-guided programmatic environment generation for training, testing, and debugging autonomous agents in complex real-time strategy (RTS) environments. We present a novel framework that, for the first time, demonstrates the benefits of using scenario specification languages (e.g., SCENIC) for the systematic modeling and generation of realistic and diverse RTS RL environments (e.g., soccer). Next, we discuss a class of algorithms called adaptive-teacher Unsupervised Environment Design (UED), which automatically generates training tasks with an RL teacher agent. UED shows promising zero-shot generalization by simultaneously learning a task distribution (i.e., a curriculum) and agent policies on the generated tasks. This is a non-stationary process in which the task distribution evolves along with the agent policies, creating instability over time. While prior works demonstrated the potential of such approaches, training the teacher remained a practical challenge. To this end, we introduce Curriculum Learning via Unsupervised Task Representation Learning (CLUTR): a novel unsupervised curriculum learning algorithm that decouples task representation learning from curriculum learning in a two-stage optimization, resolving the training instability by pretraining a latent task manifold.
Following that, we present MultiModal Reasoning and Critique for web navigation (MMRC), which introduces augmented environments with multimodal critic agents to enhance the performance of large multimodal foundation-model agents on autonomous web navigation tasks. Together, these approaches demonstrate the importance and usefulness of environment formulation and generation for both traditional RL-based and contemporary LLM-based agents.
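The two-stage decoupling behind CLUTR can be illustrated with a minimal sketch. This is not the thesis implementation: the "task manifold" here is a toy frozen decoder standing in for a pretrained latent autoencoder, and the teacher, student, and regret objective are simplified placeholders chosen only to show the structure (stage 1 pretrains and freezes a latent task space; stage 2 runs curriculum learning entirely within that fixed space).

```python
# Illustrative sketch of a two-stage UED-style curriculum (toy stand-ins,
# not the CLUTR implementation).
import random


# ---- Stage 1: pretrain a latent task manifold on randomly sampled tasks ----
# A real system would train an autoencoder over task parameterizations;
# here the "manifold" is just a fixed latent-to-task mapping that stays
# frozen during stage 2, which is the key stabilizing idea.
def pretrain_task_manifold(num_tasks=100, latent_dim=2, seed=0):
    rng = random.Random(seed)
    tasks = [[rng.uniform(-1, 1) for _ in range(latent_dim)]
             for _ in range(num_tasks)]

    def decode(z):  # frozen decoder: latent vector -> task parameters
        return [2.0 * v for v in z]

    return tasks, decode


# ---- Stage 2: curriculum learning in the frozen latent space ----
# The teacher proposes latent vectors; each is decoded into a concrete task;
# the student trains on it. "Regret" (how far the task outstrips the
# student's current ability) serves as a proxy teacher objective.
def run_curriculum(decode, steps=50, latent_dim=2, seed=1):
    rng = random.Random(seed)
    student_skill = 0.0
    for _ in range(steps):
        z = [rng.uniform(-1, 1) for _ in range(latent_dim)]  # teacher proposal
        task = decode(z)
        difficulty = sum(abs(p) for p in task)
        regret = max(0.0, difficulty - student_skill)
        student_skill += 0.1 * regret  # student improves on regretful tasks
    return student_skill
```

Because the decoder is pretrained and then frozen, the teacher's search space no longer shifts as the student learns, which is the stabilizing effect the abstract attributes to pretraining the latent task manifold.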

Advisor: Ion Stoica


BibTeX citation:

@phdthesis{Azad:EECS-2024-167,
    Author= {Azad, Abdus Salam},
    Title= {Environment Generation for Autonomous Agents for Sequential Decision Making},
    School= {EECS Department, University of California, Berkeley},
    Year= {2024},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-167.html},
    Number= {UCB/EECS-2024-167},
    Abstract= {Autonomous agents have seen tremendous advancements in solving sequential decision-making problems in recent years, driven primarily by Reinforcement Learning (RL) and, more recently, by Large Generative Models. The capability of these autonomous agents depends crucially on the quality and diversity of the learning environments they are trained in. This thesis presents research on designing frameworks and algorithms to formulate and systematically generate environments that improve the generalization capabilities of autonomous agents in solving sequential decision-making tasks. First, we explore the benefits of human-guided programmatic environment generation for training, testing, and debugging autonomous agents in complex real-time strategy (RTS) environments. We present a novel framework that, for the first time, demonstrates the benefits of using scenario specification languages (e.g., SCENIC) for the systematic modeling and generation of realistic and diverse RTS RL environments (e.g., soccer). Next, we discuss a class of algorithms called adaptive-teacher Unsupervised Environment Design (UED), which automatically generates training tasks with an RL teacher agent. UED shows promising zero-shot generalization by simultaneously learning a task distribution (i.e., a curriculum) and agent policies on the generated tasks. This is a non-stationary process in which the task distribution evolves along with the agent policies, creating instability over time. While prior works demonstrated the potential of such approaches, training the teacher remained a practical challenge. To this end, we introduce Curriculum Learning via Unsupervised Task Representation Learning (CLUTR): a novel unsupervised curriculum learning algorithm that decouples task representation learning from curriculum learning in a two-stage optimization, resolving the training instability by pretraining a latent task manifold.
Following that, we present MultiModal Reasoning and Critique for web navigation (MMRC), which introduces augmented environments with multimodal critic agents to enhance the performance of large multimodal foundation-model agents on autonomous web navigation tasks. Together, these approaches demonstrate the importance and usefulness of environment formulation and generation for both traditional RL-based and contemporary LLM-based agents.},
}

EndNote citation:

%0 Thesis
%A Azad, Abdus Salam 
%T Environment Generation for Autonomous Agents for Sequential Decision Making
%I EECS Department, University of California, Berkeley
%D 2024
%8 August 9
%@ UCB/EECS-2024-167
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-167.html
%F Azad:EECS-2024-167