Making Reversible Transformers Accurate, Efficient, and Fast

Tyler Zhu

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-104

May 11, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-104.pdf

The increasing prevalence of a unified architecture for machine learning, i.e., the transformer, raises an important question: can a single architecture really do it all? Simultaneously, the growing size of datasets and deep learning models has made faster, more memory-efficient training crucial. One recently proposed line of work is reversible networks, which leverage reversible transformations to reconstruct inputs exactly from outputs while requiring only minimal changes to existing architectures. In this work, we present an in-depth analysis of reversible transformers and demonstrate that they can be more accurate, more efficient, and faster than their vanilla counterparts. We introduce a new method of reversible backpropagation that is faster and scales better with memory than previous techniques, and we present new results showing that reversible transformers transfer better to downstream visual tasks.
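To make the core idea concrete, the following is a minimal sketch of a two-stream reversible coupling (RevNet-style) in PyTorch. It is illustrative only, not the exact formulation studied in the report: the sub-block names f and g and the use of plain linear layers are assumptions made for the example. The point it demonstrates is that the inputs of a reversible block can be reconstructed exactly from its outputs.

import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    # Two-stream reversible coupling:
    #   y1 = x1 + f(x2),  y2 = x2 + g(y1)
    # The inverse recovers (x1, x2) exactly from (y1, y2), so intermediate
    # activations can be recomputed instead of stored during backpropagation.
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f = f  # illustrative sub-block (e.g. attention in a transformer)
        self.g = g  # illustrative sub-block (e.g. an MLP)

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

# Sanity check: the outputs invert back to the inputs (up to float error).
dim = 64
block = ReversibleBlock(nn.Linear(dim, dim), nn.Linear(dim, dim))
x1, x2 = torch.randn(8, dim), torch.randn(8, dim)
with torch.no_grad():
    y1, y2 = block(x1, x2)
    r1, r2 = block.inverse(y1, y2)
print(torch.allclose(r1, x1, atol=1e-5), torch.allclose(r2, x2, atol=1e-5))

In reversible backpropagation schemes built on this property, only the final outputs are kept in memory; during the backward pass each block's inverse is used to recompute its inputs before gradients are computed, trading extra computation for activation memory.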

Advisor: Jitendra Malik


BibTeX citation:

@mastersthesis{Zhu:EECS-2023-104,
    Author= {Zhu, Tyler},
    Title= {Making Reversible Transformers Accurate, Efficient, and Fast},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-104.html},
    Number= {UCB/EECS-2023-104},
    Abstract= {The increasing prevalence of a unified architecture for machine learning, i.e. the transformer, raises an important question: can a single architecture really do it all?
Simultaneously, the growing size of datasets and deep learning models has made faster and memory-efficient training crucial. 
One recently proposed line of work is reversible networks, which leverage reversible transformations to perfectly reconstruct inputs from outputs while requiring very minimal changes to existing architectures.
In this work, we present an in-depth analysis of reversible transformers and demonstrate that they can be more accurate, efficient, and fast than their vanilla counterparts.
We introduce a new method of reversible backpropagation which is faster and scales better with memory than previous techniques, and also demonstrate new results which show that reversible transformers transfer better to downstream visual tasks.},
}

EndNote citation:

%0 Thesis
%A Zhu, Tyler 
%T Making Reversible Transformers Accurate, Efficient, and Fast
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 11
%@ UCB/EECS-2023-104
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-104.html
%F Zhu:EECS-2023-104