Learning Factorizable Representation for Dynamical Visual Scene
Zeyu Yun and Bruno Olshausen
EECS Department, University of California, Berkeley
Technical Report No. UCB/
May 1, 2024
http://www2.eecs.berkeley.edu/Pubs/TechRpts/Hold/25e7fdc1bf4cc3bf8a190db520d27726.pdf
Understanding the efficient representation of dynamical visual scenes is a crucial open research problem in fields such as neuroscience, computer vision, and machine learning. In this thesis, we show how to build such a representation using a latent variable model and dictionary learning. We place priors on the latent variables that encourage the separation of the motion and form components of a video. We present preliminary qualitative results on the learned representation through visualization. When tested on synthetic video data with known transformations, the model learns the ideal representation and models these transformations precisely. When trained on natural movies, it learns a dictionary that resembles the receptive fields of V1 simple and complex cells. Moreover, it models some of the transformations present in natural video and factorizes them out to form a stable representation. Lastly, we discuss future directions, including hierarchical modeling of motion and form, and potential applications of the model such as video compression and video understanding.
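To make the factorization idea concrete, below is a minimal, hypothetical sketch (not the model actually developed in the thesis) of a dictionary-learning objective that separates form from motion: each video frame is reconstructed from a learned dictionary with coefficients split into slowly varying amplitudes (the form component) and quickly varying phases (the motion component). The slowness prior on amplitudes, the L1 sparsity penalty, the toy random data, the hyperparameters, and the plain gradient-descent updates are all illustrative assumptions.

    # Hypothetical sketch: dictionary learning on video with amplitude ("form")
    # and phase ("motion") coefficients, plus a slowness prior on amplitudes.
    import numpy as np

    rng = np.random.default_rng(0)

    T, P, K = 20, 64, 32                # frames, pixels per frame, dictionary size
    X = rng.normal(size=(T, P))         # stand-in for whitened video patches

    D = rng.normal(size=(P, K))
    D /= np.linalg.norm(D, axis=0)      # dictionary columns = "form" templates
    a = 0.1 * np.abs(rng.normal(size=(T, K)))         # amplitudes ("form" coefficients)
    phi = rng.uniform(0, 2 * np.pi, size=(T, K))      # phases ("motion" coefficients)

    lam_sparse, lam_slow, lr = 0.1, 1.0, 0.01

    def reconstruct(D, a, phi):
        # Each frame is a superposition of dictionary elements modulated by
        # amplitude and the cosine of phase (a real-valued stand-in for a
        # complex-coefficient model).
        return (a * np.cos(phi)) @ D.T

    for step in range(200):
        z = a * np.cos(phi)
        R = X - z @ D.T                 # reconstruction residual, shape (T, P)

        # Gradients of: 0.5*||R||^2 + lam_sparse*|a| + lam_slow*sum_t ||a_t - a_{t-1}||^2
        dz = -(R @ D)
        da = dz * np.cos(phi) + lam_sparse * np.sign(a)
        da[1:]  += 2 * lam_slow * (a[1:] - a[:-1])    # slowness prior on amplitudes
        da[:-1] -= 2 * lam_slow * (a[1:] - a[:-1])
        dphi = dz * a * (-np.sin(phi))

        a = np.maximum(a - lr * da, 0.0)              # keep amplitudes nonnegative
        phi -= lr * dphi
        D += lr * (R.T @ z)                           # dictionary gradient step
        D /= np.linalg.norm(D, axis=0) + 1e-8         # renormalize columns

    print("final reconstruction error:", np.linalg.norm(X - reconstruct(D, a, phi)))

In this toy setup the slowness penalty pushes the amplitude trajectories to stay stable across frames while the phases absorb the frame-to-frame transformation, which is the qualitative behavior the thesis's motion/form factorization aims for.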
Advisors: Bruno Olshausen and Sergey Levine
BibTeX citation:
@mastersthesis{Yun:31341,
    Author = {Yun, Zeyu and Olshausen, Bruno},
    Title = {Learning Factorizable Representation for Dynamical Visual Scene},
    School = {EECS Department, University of California, Berkeley},
    Year = {2024},
    Number = {UCB/},
    Abstract = {Understanding the efficient representation of dynamical visual scenes is a crucial open research problem in fields such as neuroscience, computer vision, and machine learning. In this thesis, we show how to build such a representation using a latent variable model and dictionary learning. We place priors on the latent variables that encourage the separation of the motion and form components of a video. We present preliminary qualitative results on the learned representation through visualization. When tested on synthetic video data with known transformations, the model learns the ideal representation and models these transformations precisely. When trained on natural movies, it learns a dictionary that resembles the receptive fields of V1 simple and complex cells. Moreover, it models some of the transformations present in natural video and factorizes them out to form a stable representation. Lastly, we discuss future directions, including hierarchical modeling of motion and form, and potential applications of the model such as video compression and video understanding.},
}
EndNote citation:
%0 Thesis
%A Yun, Zeyu
%A Olshausen, Bruno
%T Learning Factorizable Representation for Dynamical Visual Scene
%I EECS Department, University of California, Berkeley
%D 2024
%8 May 1
%@ UCB/
%F Yun:31341