Research Scientist
Google
PhD '20 University of Oxford
Artificial Intelligence
Computer Vision
Machine Learning
Speech2Action: Cross-modal Supervision for Action Recognition
Our experience of the world is multimodal; however, deep learning networks have traditionally been designed for and trained on unimodal inputs such as images, audio segments or text. In this work we investigate the link between spoken words and actions in movies. We use a form of cross-modal supervision, in which labels from a supervision-rich modality are used to learn representations in another, supervision-starved target modality, removing the need for costly manual annotation in the target modality. By using a text-based model to predict actions from speech segments alone, we demonstrate superior action recognition performance from video on standard action recognition benchmarks, without using a single manually labelled action example.
Arsha Nagrani is a Research Scientist at Google Research. She obtained her PhD from the VGG group at the University of Oxford and her BA and MEng degrees from the University of Cambridge, UK. Her research interests lie at the intersection of computer vision and speech technology, focusing on cross-modal and multi-modal machine learning techniques for video recognition. She has also spent time as a visiting researcher at the Wadhwani AI Research non-profit organisation in Mumbai and has a keen interest in AI for Social Good. Her work has been recognised by a Best Student Paper Award at Interspeech, a Google PhD Fellowship and a Townsend Scholarship, and has been covered by major outlets such as New Scientist, MIT Technology Review and Verdict.
Personal home page
