Rising Stars 2020:

Shruti Palaskar

PhD Candidate

Carnegie Mellon University

Areas of Interest

  • Artificial Intelligence


Learning Semantic Concepts for Video Summarization


Human cognition is inherently multimodal: we watch, listen, and read semantic cues to build context and understand what is around us. Video understanding is a field of multimodal learning research that tries to imitate this learning using the video signal. While many current video understanding tasks have achieved great success, they do not model multimodal, multilevel recognition, a necessary component for emulating human multimodal learning. One contribution of my thesis is a novel video understanding task that learns multilevel semantic concepts from a video and uses these concepts to generate abstractive video summaries as a downstream task, demonstrating the usefulness of such video understanding models. We show strong benefits of fused multimodal models over unimodal ones when using the speech, text, and vision modalities together. Furthermore, the proposed semantic concept learning task yields significant gains on the downstream abstractive summarization task.


Shruti is a Ph.D. student at the Language Technologies Institute, Carnegie Mellon University, advised by Prof. Florian Metze and Prof. Alan Black. Her research aims to enable machines to learn automatically from multiple modalities of data, such as audio, video, speech, text, or semantics, as humans naturally do. Shruti is a recipient of the Facebook Fellowship and the Center for Machine Learning and Health Fellowship. Before starting her Ph.D., she received her Master's in Language Technologies from Carnegie Mellon University and her Bachelor's in Computer Engineering from Pune Institute of Computer Technology.
