Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

David Dalrymple, Joar Skalse, Yoshua Bengio, Stuart J. Russell, Max Tegmark, Sanjit A. Seshia, Steve Omohundro, Christian Szegedy, Alessandro Abate, Joseph Halpern, Clark Barrett, Ding Zhao, Ben Goldhaber and Nora Ammann

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-45

May 4, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-45.pdf

Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. We outline a number of strategies for achieving this goal, describe the main technical challenges, and suggest a number of potential solutions. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches. Overall, despite a number of difficult technical challenges, GS AI offers a promising path for ensuring robust AI safety through formal methods.


BibTeX citation:

@techreport{Dalrymple:EECS-2024-45,
    Author= {Dalrymple, David and Skalse, Joar and Bengio, Yoshua and Russell, Stuart J. and Tegmark, Max and Seshia, Sanjit A. and Omohundro, Steve and Szegedy, Christian and Abate, Alessandro and Halpern, Joseph and Barrett, Clark and Zhao, Ding and Goldhaber, Ben and Ammann, Nora},
    Title= {Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems},
    Year= {2024},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-45.html},
    Number= {UCB/EECS-2024-45},
    Abstract= {Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. We outline a number of strategies for achieving this goal, describe the main technical challenges, and suggest a number of potential solutions. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches. Overall, despite a number of difficult technical challenges, GS AI offers a promising path for ensuring robust AI safety through formal methods.},
}

EndNote citation:

%0 Report
%A Dalrymple, David
%A Skalse, Joar
%A Bengio, Yoshua
%A Russell, Stuart J.
%A Tegmark, Max
%A Seshia, Sanjit A.
%A Omohundro, Steve
%A Szegedy, Christian
%A Abate, Alessandro
%A Halpern, Joseph
%A Barrett, Clark
%A Zhao, Ding
%A Goldhaber, Ben
%A Ammann, Nora
%T Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
%I EECS Department, University of California, Berkeley
%D 2024
%8 May 4
%@ UCB/EECS-2024-45
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-45.html
%F Dalrymple:EECS-2024-45