Zhuang Liu

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-205

August 12, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-205.pdf

The successful application of ConvNets and other neural architectures to computer vision is central to the AI revolution seen in the past decade. There has been a strong need to scale vision architectures to be both smaller and larger. Small models reflect the demand for efficiency, as visual recognition systems are often deployed on edge devices; large models highlight the pursuit of scalability: the ability to utilize increasingly abundant compute and data to achieve ever-higher accuracy. Research in both directions is fruitful, producing many useful design principles, and the quest for more performant models never stops. Meanwhile, the very fast development pace in the literature can sometimes obscure the main mechanism responsible for certain methods' favorable results.

In this dissertation, we will present our research on two aspects of this area: (1) developing intuitive algorithms for efficient and flexible ConvNet model inference; and (2) studying baseline approaches to reveal what is behind popular scaling methods' success. First, we will introduce our work on one of the first anytime algorithms for dense prediction. We will then examine the effectiveness of model pruning algorithms by comparing them with an extremely simple baseline, and argue that their true value may lie in learning architectures. Finally, we will present our work questioning whether self-attention is responsible for Transformers' recent exceptional scalability in vision, by modernizing a traditional ConvNet with design techniques adapted from Transformers.

Advisor: Trevor Darrell


BibTeX citation:

@phdthesis{Liu:EECS-2022-205,
    Author= {Liu, Zhuang},
    Title= {Efficient and Scalable Neural Architectures for Visual Recognition},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-205.html},
    Number= {UCB/EECS-2022-205},
    Abstract= {The successful application of ConvNets and other neural architectures to computer vision is central to the AI revolution seen in the past decade. There has been a strong need to scale vision architectures to be both smaller and larger. Small models reflect the demand for efficiency, as visual recognition systems are often deployed on edge devices; large models highlight the pursuit of scalability: the ability to utilize increasingly abundant compute and data to achieve ever-higher accuracy. Research in both directions is fruitful, producing many useful design principles, and the quest for more performant models never stops. Meanwhile, the very fast development pace in the literature can sometimes obscure the main mechanism responsible for certain methods' favorable results.

In this dissertation, we will present our research on two aspects of this area: (1) developing intuitive algorithms for efficient and flexible ConvNet model inference; and (2) studying baseline approaches to reveal what is behind popular scaling methods' success. First, we will introduce our work on one of the first anytime algorithms for dense prediction. We will then examine the effectiveness of model pruning algorithms by comparing them with an extremely simple baseline, and argue that their true value may lie in learning architectures. Finally, we will present our work questioning whether self-attention is responsible for Transformers' recent exceptional scalability in vision, by modernizing a traditional ConvNet with design techniques adapted from Transformers.},
}

EndNote citation:

%0 Thesis
%A Liu, Zhuang 
%T Efficient and Scalable Neural Architectures for Visual Recognition
%I EECS Department, University of California, Berkeley
%D 2022
%8 August 12
%@ UCB/EECS-2022-205
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-205.html
%F Liu:EECS-2022-205