Harshayu Girase

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-13

January 17, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-13.pdf

Following the successes of deep networks on image-based tasks, there has been an emphasis on developing video models to achieve similar feats. Common tasks include video recognition, detection, and segmentation. However, only a few works study video anticipation, which requires not only video understanding but also modeling of future behavior. In this work, we propose a novel self-supervised method to anticipate future actions in the short term from a video input. Because anticipation is a real-time problem, we highlight the importance of accounting for latency when developing and evaluating such models. In the first part of this report, we describe the task of short-term anticipation, examine current methodology, and propose a new metric and model for better evaluation of this task.
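This page does not spell out the proposed latency-aware metric, so the Python sketch below is a rough illustration only. It assumes a per-frame evaluation in which a prediction computed from frames up to time t only becomes available after the model's inference latency, and is therefore scored against the correspondingly later ground-truth label. The function name and all parameters are hypothetical, not the report's actual metric.

    # Hypothetical sketch of latency-aware anticipation evaluation.
    # Assumption: a prediction made from frames up to t becomes available
    # at t + latency_frames; to still anticipate tau frames ahead of when
    # it is available, it is scored against the label at t + latency + tau.

    from typing import Sequence

    def latency_aware_accuracy(
        predictions: Sequence[int],   # predicted action id for each frame t
        labels: Sequence[int],        # ground-truth action id for each frame
        anticipation_horizon: int,    # tau: frames ahead the model predicts
        latency_frames: int,          # model inference latency, in frames
    ) -> float:
        """Accuracy with each prediction scored against the label at
        t + anticipation_horizon + latency_frames."""
        shift = anticipation_horizon + latency_frames
        correct = total = 0
        for t, pred in enumerate(predictions):
            target = t + shift
            if target >= len(labels):
                break  # no ground truth that far into the future
            correct += int(pred == labels[target])
            total += 1
        return correct / total if total else 0.0

    # Toy usage: the same outputs score worse once latency is accounted for.
    labels = [0, 0, 1, 1, 1, 2, 2, 2, 2, 3]
    preds = [0, 1, 1, 1, 2, 2, 2, 2, 3, 3]
    print(latency_aware_accuracy(preds, labels, anticipation_horizon=1, latency_frames=0))  # 1.0
    print(latency_aware_accuracy(preds, labels, anticipation_horizon=1, latency_frames=2))  # ~0.43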

We also explore a specific application of short-term action anticipation: trajectory prediction. To demonstrate how short-term anticipation models can be applied in practice, we create our own trajectory prediction dataset covering both humans and vehicles. This dataset contains not only labeled trajectories for each agent but also detailed action labels. We propose a model that performs joint trajectory and short-term future action prediction, and we show that the auxiliary action anticipation task assists and improves the primary trajectory prediction task.
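The joint model's architecture is likewise not detailed on this page. The minimal PyTorch sketch below only illustrates the general idea of a shared encoder over observed positions with separate trajectory-regression and action-classification heads, trained with a weighted multi-task loss. All layer sizes, names, and the loss weight are illustrative assumptions, not the report's actual design.

    # Hypothetical joint trajectory + action prediction model: a shared GRU
    # encoder over observed (x, y) positions feeds two heads, one regressing
    # future positions and one classifying the agent's next action.

    import torch
    import torch.nn as nn

    class JointTrajectoryActionModel(nn.Module):
        def __init__(self, horizon: int = 12, num_actions: int = 10, hidden: int = 64):
            super().__init__()
            self.encoder = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
            self.traj_head = nn.Linear(hidden, horizon * 2)    # future (x, y) points
            self.action_head = nn.Linear(hidden, num_actions)  # next-action logits
            self.horizon = horizon

        def forward(self, observed: torch.Tensor):
            # observed: (batch, obs_len, 2) past positions
            _, h = self.encoder(observed)  # h: (1, batch, hidden)
            h = h.squeeze(0)
            traj = self.traj_head(h).view(-1, self.horizon, 2)
            action_logits = self.action_head(h)
            return traj, action_logits

    # Toy usage with random data; 0.5 is an assumed multi-task loss weight.
    model = JointTrajectoryActionModel()
    past = torch.randn(4, 8, 2)  # 4 agents, 8 observed steps
    future, logits = model(past)
    traj_loss = nn.functional.mse_loss(future, torch.randn_like(future))
    act_loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (4,)))
    loss = traj_loss + 0.5 * act_loss
    print(future.shape, logits.shape, float(loss))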

Advisor: Jitendra Malik


BibTeX citation:

@mastersthesis{Girase:EECS-2023-13,
    Author= {Girase, Harshayu},
    Editor= {Mangalam, Karttikeya and Malik, Jitendra},
    Title= {Latency-Aware Short-Term Video Action Anticipation and its Application in Trajectory Prediction},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {Jan},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-13.html},
    Number= {UCB/EECS-2023-13},
    Abstract= {Following the successes of deep networks on image-based tasks, there has been an emphasis on developing video models to achieve similar feats. Common tasks include video recognition, detection, and segmentation. However, only a few works study video anticipation, which requires not only video understanding but also modeling of future behavior. In this work, we propose a novel self-supervised method to anticipate future actions in the short term from a video input. Because anticipation is a real-time problem, we highlight the importance of accounting for latency when developing and evaluating such models. In the first part of this report, we describe the task of short-term anticipation, examine current methodology, and propose a new metric and model for better evaluation of this task.

We also explore a specific application of short-term action anticipation: trajectory prediction. To demonstrate how short-term anticipation models can be applied in practice, we create our own trajectory prediction dataset covering both humans and vehicles. This dataset contains not only labeled trajectories for each agent but also detailed action labels. We propose a model that performs joint trajectory and short-term future action prediction, and we show that the auxiliary action anticipation task assists and improves the primary trajectory prediction task.},
}

EndNote citation:

%0 Thesis
%A Girase, Harshayu 
%E Mangalam, Karttikeya 
%E Malik, Jitendra 
%T Latency-Aware Short-Term Video Action Anticipation and its Application in Trajectory Prediction
%I EECS Department, University of California, Berkeley
%D 2023
%8 January 17
%@ UCB/EECS-2023-13
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-13.html
%F Girase:EECS-2023-13