Representations for Visually Guided Actions

Saurabh Gupta

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2018-104

August 8, 2018

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-104.pdf

In recent years, computer vision has made great leaps towards 2D understanding of sparse visual snapshots of the world. This is insufficient for robots that need to exist and act in the 3D world around them based on a continuous stream of multi-modal inputs. In this work, we present efforts towards bridging this gap between computer vision and robotics. We show how thinking about computer vision and robotics together brings out limitations of current computer vision tasks and techniques, and motivates joint study of perception and action. We present initial efforts towards this goal and investigate a) how we can move from 2D understanding of images to 3D understanding of the underlying scene, b) how recent advances in representation learning for images can be extended to obtain representations for varied sensing modalities useful in robotics, and c) how thinking about vision and action together can lead to more effective solutions for the classical problem of visual navigation.

Advisor: Jitendra Malik


BibTeX citation:

@phdthesis{Gupta:EECS-2018-104,
    Author= {Gupta, Saurabh},
    Title= {Representations for Visually Guided Actions},
    School= {EECS Department, University of California, Berkeley},
    Year= {2018},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-104.html},
    Number= {UCB/EECS-2018-104},
    Abstract= {In recent years, computer vision has made great leaps towards 2D understanding of sparse visual snapshots of the world. This is insufficient for robots that need to exist and act in the 3D world around them based on a continuous stream of multi-modal inputs. In this work, we present efforts towards bridging this gap between computer vision and robotics. We show how thinking about computer vision and robotics together brings out limitations of current computer vision tasks and techniques, and motivates joint study of perception and action. We present initial efforts towards this goal and investigate a) how we can move from 2D understanding of images to 3D understanding of the underlying scene, b) how recent advances in representation learning for images can be extended to obtain representations for varied sensing modalities useful in robotics, and c) how thinking about vision and action together can lead to more effective solutions for the classical problem of visual navigation.},
}

EndNote citation:

%0 Thesis
%A Gupta, Saurabh 
%T Representations for Visually Guided Actions
%I EECS Department, University of California, Berkeley
%D 2018
%8 August 8
%@ UCB/EECS-2018-104
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-104.html
%F Gupta:EECS-2018-104