Zhiyang He

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-229

December 20, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-229.pdf

Virtual and physical AI systems are increasingly woven into our daily lives, from self-driving cars and assistive healthcare robots to virtual assistants powered by large language models. However, the dynamic and unpredictable nature of human environments poses significant challenges to their robustness and safety. For example, children or elderly individuals interacting with a house-cleaning robot might inadvertently perform actions that result in hazardous outcomes. Similarly, virtual assistants may exhibit biased responses to certain inputs, such as demographic information, raising concerns about fairness, reliability, and inclusivity. This prompts a critical question: Can these systems operate safely and effectively amidst unpredictable human interactions?

This thesis addresses that question by drawing on insights from robotics research and applying them to Human-AI systems more broadly. Specifically, it builds on the reward-rational agent framework, a model commonly used in robotics to predict human actions, and extends it to the design of robust Human-AI systems. The work tackles two key vulnerabilities of the framework: (1) reward functions may be mislearned due to causal confusion in the data, and (2) AI policies may be exploited by adversarial human behaviors or misleading contextual information.
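For concreteness, the reward-rational framework commonly models the human as a noisily rational (Boltzmann) decision-maker. The formulation below is the standard one from the robotics literature, offered as an illustrative sketch rather than as the exact model used in the thesis:

    P(a_H | s; \theta) = \frac{\exp(\beta \, R_\theta(s, a_H))}{\sum_{a'} \exp(\beta \, R_\theta(s, a'))}

Here R_\theta is a parameterized reward function and \beta >= 0 is a rationality coefficient (as \beta grows the human becomes perfectly rational; at \beta = 0 the human acts uniformly at random); the AI infers \theta from observed human choices. Causal confusion can arise when several distinct values of \theta explain the observed choices equally well yet disagree on out-of-distribution states.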

To address these challenges, this thesis introduces techniques for active environment synthesis and active human behavior generation, enabling AI systems to anticipate and adapt to unforeseen, edge-case scenarios. It also explores test-time adaptation to accommodate out-of-distribution users (sketched below), providing greater flexibility in real-world deployments. Finally, it proposes a mechanism that lets AI systems controllably focus on the relevant aspects of their context, reducing the impact of irrelevant or misleading information. In summary, this thesis tackles the dual challenges of adaptability and robustness, presenting a suite of methods to enhance the safety and reliability of Human-AI interaction, and it paves the way for applications in which humans and AI agents collaborate seamlessly to accomplish complex and diverse tasks.
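As one illustration of what test-time adaptation to out-of-distribution users can look like under a reward-rational human model, the sketch below maintains a Bayesian posterior over reward parameters and updates it online from each observed human action. All function names, the discrete parameter grid, and the toy reward are hypothetical, chosen only to make the idea concrete; the thesis's actual algorithms may differ.

    import numpy as np

    # Illustrative sketch: Bayesian test-time adaptation of a reward
    # parameter theta from observed human actions, assuming the
    # Boltzmann-rational human model sketched above.

    def boltzmann_likelihood(action, actions, reward_fn, theta, beta=2.0):
        """P(action | theta) under a Boltzmann-rational human."""
        logits = beta * np.array([reward_fn(a, theta) for a in actions])
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        return probs[actions.index(action)]

    def update_posterior(prior, thetas, action, actions, reward_fn):
        """One online Bayes update after observing a human action."""
        likelihoods = np.array([
            boltzmann_likelihood(action, actions, reward_fn, th)
            for th in thetas
        ])
        posterior = prior * likelihoods
        return posterior / posterior.sum()

    # Toy example: theta is the user's (unknown) preferred setting, and
    # the hypothetical reward penalizes actions far from that preference.
    thetas = [0.5, 1.0, 1.5]                 # candidate preferences
    actions = [0.5, 1.0, 1.5]                # available actions
    reward_fn = lambda a, th: -abs(a - th)

    posterior = np.ones(len(thetas)) / len(thetas)  # uniform prior
    for observed in [1.5, 1.5, 1.0]:                # streamed human actions
        posterior = update_posterior(posterior, thetas, observed,
                                     actions, reward_fn)
    print(dict(zip(thetas, posterior.round(3))))    # mass shifts toward 1.5

Running the sketch shows the posterior concentrating on the preference most consistent with the observed actions, which is the basic mechanism that lets a system adapt to a user it was not trained on.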

Advisor: Anca Dragan


BibTeX citation:

@phdthesis{He:EECS-2024-229,
    Author= {He, Zhiyang},
    Title= {Towards Building Safe and Robust Human AI Systems},
    School= {EECS Department, University of California, Berkeley},
    Year= {2024},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-229.html},
    Number= {UCB/EECS-2024-229},
}

EndNote citation:

%0 Thesis
%A He, Zhiyang 
%T Towards Building Safe and Robust Human AI Systems
%I EECS Department, University of California, Berkeley
%D 2024
%8 December 20
%@ UCB/EECS-2024-229
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-229.html
%F He:EECS-2024-229