Trustworthy ML: Robustness and Foresight

Saurav Kadavath

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2021-245
December 1, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-245.pdf

Over the last decade, AI has empowered people, businesses, and governments to create some of the most impressive products and solve some of the most challenging problems - self-driving cars, virtual assistants, and pharmaceutical drug discovery, to name a few. It is clear that AI will have a long-lasting impact on humanity. In this context, our goal is to understand how we can ensure that this impact is as positive as possible. This is a very broad problem, but we can tease it apart into several smaller goals that fit under one of three umbrellas: (1) Making AI competent (e.g., making models robust, reliable, and able to understand humans), (2) Making AI aligned, and (3) Coping with the effects of AI (Paul Christiano: Current Work in AI Alignment 2020).

Part 1. In the first part of this report, we work on making AI competent, focusing on computer vision and improving model robustness. In Chapter 1, we develop a novel data augmentation strategy called DeepAugment. Whereas previous data augmentation strategies relied on simple image transformations such as rotations, color shifts, shearing, and resizing, DeepAugment creates augmentations using a single forward pass through a pretrained and perturbed image-to-image neural network. This allows us to generate augmented images using transformations that are much more complex than the simple transforms used in previous work. We extend DeepAugment with Noise2Net, a method of creating image augmentations using a completely randomly initialized neural network.
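To give a flavor of the Noise2Net idea, the toy sketch below passes an image through a randomly initialized linear color-mixing map kept close to the identity, standing in for a forward pass through a random network with a residual path. The single-layer simplification, the function name, and the `strength` parameter are all illustrative assumptions, not the method's actual implementation.

```python
import numpy as np

def noise2net_augment(image, strength=0.1, seed=None):
    """Toy Noise2Net-style augmentation (illustrative simplification):
    mix the image's color channels through a randomly initialized linear
    map near the identity, so the output stays a recognizable but
    perturbed version of the input."""
    rng = np.random.default_rng(seed)
    c = image.shape[-1]
    # Random mixing matrix: identity plus a small random perturbation.
    w = np.eye(c) + strength * rng.standard_normal((c, c))
    out = image.reshape(-1, c) @ w.T
    # Keep pixel values in the valid [0, 1] range.
    return np.clip(out.reshape(image.shape), 0.0, 1.0)
```

Because the map is resampled per call, each pass yields a different augmentation of the same image, which is the property the random-network construction relies on.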

We also introduce ImageNet-Renditions (ImageNet-R), a new evaluation benchmark for ImageNet models. ImageNet-R contains 30,000 test-set images of various renditions (e.g., paintings, embroidery) of ImageNet objects, covering 200 of the 1000 ImageNet classes. We show that DeepAugment and Noise2Net are effective in improving ImageNet-R performance.
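Since ImageNet-R covers only 200 of the 1000 classes, a common way to evaluate a standard 1000-way classifier on it is to mask out the logits of the absent classes before taking the argmax. The sketch below shows that restriction; the function and argument names are assumptions for illustration, not part of the benchmark's release.

```python
import numpy as np

def subset_accuracy(logits, labels, subset_classes):
    """Accuracy of a full-way classifier on a test set that covers only a
    subset of classes: logits outside the subset are set to -inf so they
    can never be predicted."""
    mask = np.full(logits.shape[-1], -np.inf)
    mask[subset_classes] = 0.0
    preds = np.argmax(logits + mask, axis=-1)
    return float(np.mean(preds == labels))
```

For example, with a 4-class toy model and a 2-class subset, a large logit on an out-of-subset class is ignored rather than counted as an error.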

Part 2. In Part 2, we develop new datasets and benchmarks to understand the performance of large state-of-the-art language models at various reasoning tasks. It is important for the AI community to carefully monitor the state of AI at advanced reasoning tasks, since models approaching human performance here could have profound social and economic implications. So far, modern large-scale language models such as GPT (Brown et al. 2020), T5 (Raffel et al. 2020), and BERT (Devlin et al. 2019) have demonstrated impressive capabilities across a variety of text-based tasks. Performance on a wide variety of benchmarks has been shown to grow steadily as model sizes increase (A. Wang et al. 2019; Zellers et al. 2019; Huang et al. 2019; Bisk et al. 2019; Hendrycks, Basart, et al. 2020; Hendrycks, Burns, Basart, Zou, et al. 2021; Hendrycks, Burns, Basart, Critch, et al. 2021).

We introduce MATH, a dataset and benchmark for mathematical problem solving. Mathematics problems are valuable tests of problem-solving ability: the ability to analyze a problem, pick out good heuristics from a large set of possibilities, and chain them together to produce an answer. This contrasts with plug-and-chug calculations, a skill that ML models can already exhibit (Henighan et al. 2020). Additionally, our benchmark does not use multiple-choice answers: models must produce the complete answer on their own.
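Grading free-form answers requires comparing generated strings rather than checking a selected option. A minimal sketch of such a check is below, assuming answers are compared after light normalization (whitespace removed, optional surrounding `$...$` stripped); the actual MATH grading logic is more involved, and this normalization is an assumption for illustration only.

```python
def exact_match(prediction, reference):
    """Compare a model's free-form answer string against the reference
    answer after a simple normalization pass."""
    def normalize(ans):
        ans = ans.strip()
        # Strip optional surrounding math delimiters, e.g. "$x$" -> "x".
        if ans.startswith("$") and ans.endswith("$") and len(ans) >= 2:
            ans = ans[1:-1].strip()
        # Whitespace-insensitive comparison.
        return ans.replace(" ", "")
    return normalize(prediction) == normalize(reference)
```

Note that string-level matching treats mathematically equal but differently written answers (e.g. `0.5` vs. `\frac{1}{2}`) as distinct, which is why answer formats must be constrained.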

We show that simply scaling up models aggressively is unlikely to solve MATH. Our results suggest that further advancements are needed to create models that can reason at a human level.

We leave the second goal, making AI aligned, to future work.

Advisor: Dawn Song


BibTeX citation:

@mastersthesis{Kadavath:EECS-2021-245,
    Author = {Kadavath, Saurav},
    Title = {Trustworthy ML: Robustness and Foresight},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-245.html},
    Number = {UCB/EECS-2021-245},
}

EndNote citation:

%0 Thesis
%A Kadavath, Saurav
%T Trustworthy ML: Robustness and Foresight
%I EECS Department, University of California, Berkeley
%D 2021
%8 December 1
%@ UCB/EECS-2021-245
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-245.html
%F Kadavath:EECS-2021-245