Shruti Palaskar

PhD Candidate
Carnegie Mellon University

Artificial Intelligence

Learning Semantic Concepts for Video Summarization

Human cognition is inherently multimodal: we watch, listen, and read semantic cues to build context and understand what is around us. Video understanding is a field of multimodal learning research that tries to imitate this kind of learning using the video signal. Although most current video understanding tasks have achieved great success, they do not model multimodal, multilevel recognition, a necessary component for emulating human multimodal learning. One contribution of my thesis is a novel video understanding task that learns multilevel semantic concepts from a video and uses these concepts to generate abstractive video summaries as a downstream task, demonstrating the usefulness of such video understanding models. We show strong benefits of fused multimodal models over unimodal ones when the speech, text, and vision modalities are used together. Furthermore, we observe significant gains on the downstream abstractive summarization task when the proposed semantic concept learning task is used.

Shruti is a Ph.D. student at the Language Technologies Institute, Carnegie Mellon University, advised by Prof. Florian Metze and Prof. Alan Black. Her research aims to enable machines to learn automatically from multiple modalities of data, such as audio, video, speech, text, or semantics, as humans naturally do. Shruti is a recipient of the Facebook Fellowship and the Center for Machine Learning and Health Fellowship. Before starting her Ph.D., she received her Master's in Language Technologies from Carnegie Mellon University and her Bachelor's in Computer Engineering from Pune Institute of Computer Technology.