Scalable Representations for Vision and Robotics

Tete Xiao

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2023-189
June 9, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-189.pdf

Artificial intelligence systems have shown remarkable advancements in recent years. However, the challenge of scalability and generalization to real-world problems remains a significant issue. In this thesis, we explore the three key components of building scalable artificial intelligence systems for computer vision, including model optimizability, learning objectives, and large-scale datasets, and apply these outcomes for robotics.

Our work begins with an examination of the optimizability of vision transformers, proposing a new set of optimizability metrics and an alternative design for their patchify stem. Next, we introduce a contrastive self-supervised learning objective that reduces inductive biases in self-supervised learning, resulting in superior performance across various datasets. We then showcase the effectiveness of self-supervised visual pre-training from real-world images for learning motor control tasks from pixels, outperforming supervised baselines and matching oracle state performance.

Expanding on this, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks, demonstrating the effectiveness of pre-trained representations across a range of tasks and embodiments. In addition, we present a sim-to-real learning-based approach for real-world humanoid locomotion using a causal Transformer, marking the first fully learning-based method for real-world full-sized humanoid locomotion. Finally, we conclude the thesis and discuss potential future directions for further research in the field.

Advisor: Trevor Darrell


BibTeX citation:

@phdthesis{Xiao:EECS-2023-189,
    Author = {Xiao, Tete},
    Title = {Scalable Representations for Vision and Robotics},
    School = {EECS Department, University of California, Berkeley},
    Year = {2023},
    Month = {Jun},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-189.html},
    Number = {UCB/EECS-2023-189},
    Abstract = {Artificial intelligence systems have shown remarkable advancements in recent years. However, the challenge of scalability and generalization to real-world problems remains a significant issue. In this thesis, we explore the three key components of building scalable artificial intelligence systems for computer vision, including model optimizability, learning objectives, and large-scale datasets, and apply these outcomes for robotics.

Our work begins with an examination of the optimizability of vision transformers, proposing a new set of optimizability metrics and an alternative design for their patchify stem. Next, we introduce a contrastive self-supervised learning objective that reduces inductive biases in self-supervised learning, resulting in superior performance across various datasets. We then showcase the effectiveness of self-supervised visual pre-training from real-world images for learning motor control tasks from pixels, outperforming supervised baselines and matching oracle state performance.

Expanding on this, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks, demonstrating the effectiveness of pre-trained representations across a range of tasks and embodiments. In addition, we present a sim-to-real learning-based approach for real-world humanoid locomotion using a causal Transformer, marking the first fully learning-based method for real-world full-sized humanoid locomotion. Finally, we conclude the thesis and discuss potential future directions for further research in the field.}
}

EndNote citation:

%0 Thesis
%A Xiao, Tete
%T Scalable Representations for Vision and Robotics
%I EECS Department, University of California, Berkeley
%D 2023
%8 June 9
%@ UCB/EECS-2023-189
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-189.html
%F Xiao:EECS-2023-189