Advancements in Efficient Training Strategies for Modern Deep Learning: From Implicit Deep Learning to Language Models and Beyond

Tanmay Gautam

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-60

May 7, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-60.pdf

In the rapidly evolving landscape of machine learning, the surge in computing power and data has propelled deep learning to the forefront of academic research. As the scale of models and datasets continues to expand, increasing emphasis is placed on algorithmic enhancements to tackle the growing compute and memory requirements. Moreover, owing to its success across a wide range of applications, the domain has seen a proliferation of diverse neural network architectures, each with its own training challenges. This thesis introduces efficient training methods for prevalent neural network architectures that leverage model structure for both resource and algorithmic efficiency. In the first part, we present novel training algorithms with reduced computational and memory demands for implicit deep learning models and transformer-based language models. Specifically, we start by proposing an efficient sequential training method for implicit equilibrium models, which eliminates the computationally expensive fixed-point solves and projection steps required by the existing training process. We then introduce variance-reduced zeroth-order methods to fine-tune large language models effectively using only memory-efficient inference passes. In the second part, we shift our focus to applying differentiable optimization to enhance training in meta-optimization and vector quantization. For the former, we propose using the structure afforded by differentiable convex optimization to parameterize novel first-order optimizers; for the latter, we introduce differentiable convex optimization as a technique to improve backpropagation through vector quantization layers. We hope that this work offers fresh viewpoints to the research community and serves as a foundation for further developing efficient training strategies for deep learning.
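To make the zeroth-order idea mentioned above concrete, the sketch below is a generic NumPy illustration (not the thesis's specific algorithm; all names and the toy quadratic loss are hypothetical). It estimates gradients from function evaluations only, averaging two-point finite-difference estimates over several random directions to reduce variance, and uses them in a simple gradient-descent loop.

    import numpy as np

    def zo_gradient(loss_fn, params, num_dirs=8, mu=1e-3, rng=None):
        # Two-point zeroth-order gradient estimate, averaged over several
        # random directions to reduce variance; uses only evaluations of
        # loss_fn (i.e., forward/inference passes), no backpropagation.
        rng = rng or np.random.default_rng(0)
        grad = np.zeros_like(params)
        for _ in range(num_dirs):
            u = rng.standard_normal(params.shape)          # random probe direction
            delta = loss_fn(params + mu * u) - loss_fn(params - mu * u)
            grad += (delta / (2.0 * mu)) * u               # directional derivative estimate
        return grad / num_dirs

    # Toy usage on a hypothetical quadratic "loss"; in the fine-tuning setting
    # described in the abstract, loss_fn would instead wrap an inference pass
    # of the language model, so no activations need to be stored.
    loss = lambda w: float(np.sum((w - 3.0) ** 2))
    w = np.zeros(4)
    for _ in range(200):
        w -= 0.05 * zo_gradient(loss, w)
    print(w)  # converges toward [3, 3, 3, 3]

Because the update relies only on forward evaluations, the memory footprint is essentially that of inference, which is the appeal of zeroth-order fine-tuning for large models.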


BibTeX citation:

@techreport{Gautam:EECS-2024-60,
    Author= {Gautam, Tanmay},
    Title= {Advancements in Efficient Training Strategies for Modern Deep Learning: From Implicit Deep Learning to Language Models and Beyond},
    Year= {2024},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-60.html},
    Number= {UCB/EECS-2024-60},
}

EndNote citation:

%0 Report
%A Gautam, Tanmay 
%T Advancements in Efficient Training Strategies for Modern Deep Learning: From Implicit Deep Learning to Language Models and Beyond
%I EECS Department, University of California, Berkeley
%D 2024
%8 May 7
%@ UCB/EECS-2024-60
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-60.html
%F Gautam:EECS-2024-60