Rising Stars 2020:

Shibani Santurkar

PhD Candidate

Massachusetts Institute of Technology


Areas of Interest

  • Artificial Intelligence

Poster

From ImageNet to Image Classification: Contextualizing Progress on Benchmarks

Abstract

Large-scale benchmarks have been instrumental in guiding recent progress in machine learning and are typically regarded as the ground truth during model development. But is this de facto treatment of benchmarks as the gold standard truly justified?

Through a case study on the popular ImageNet dataset, we demonstrate an inherent tension between the goal of building challenging, realistic datasets and the need to rely on scalable data collection pipelines to build them. Specifically, we study how design choices in the ImageNet creation process affect the fidelity of the resulting dataset, including the introduction of biases that state-of-the-art models exploit. Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it is meant to serve as a proxy for.

Joint work with Logan Engstrom, Andrew Ilyas, Aleksander Madry and Dimitris Tsipras.

Bio

Shibani Santurkar is a PhD student in the MIT EECS Department, advised by Aleksander Mądry and Nir Shavit. Her research revolves around two broad themes: developing a precise understanding of widely used deep learning techniques, and identifying avenues to make machine learning robust and reliable. Prior to joining MIT, she received a bachelor's degree in electrical engineering from IIT Bombay, India. She is a recipient of the Google Fellowship.
