Advancements in Efficient Training Strategies for Modern Deep Learning: From Implicit Deep Learning to Language Models and Beyond

Tanmay Gautam

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2024-20
April 25, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-20.pdf

In the rapidly evolving landscape of machine learning, the surge in computing power and data has propelled deep learning to the forefront of academic research. As the scale of models and datasets continues to expand, increasing emphasis is placed on algorithmic enhancements to tackle the growing compute and memory requirements. Moreover, owing to its success across a wide range of applications, the domain has seen a proliferation of diverse neural network architectures, each with its own unique training challenges. This thesis introduces efficient training methods for prevalent neural network architectures that leverage model structure for both resource and algorithmic efficiency. In the first part, we present novel training algorithms with reduced computational and memory demands for implicit deep learning models and transformer-based language models. Specifically, we start by proposing an efficient sequential training method for implicit equilibrium models, which eliminates the need to solve computationally expensive fixed-point equations and perform projection steps within the existing training process. We then introduce variance-reduced zeroth-order methods to effectively fine-tune large language models using only memory-efficient inference passes. In the second part, we shift our focus to the application of differentiable optimization to enhance training within meta-optimization and vector quantization. Specifically, for the former, we propose a means of using the structure afforded by differentiable convex optimization to parameterize novel first-order optimizers. For the latter, we introduce differentiable convex optimization as a technique to improve backpropagation through vector quantization layers. We hope that this work will offer fresh viewpoints to the research community and serve as a foundation for further developing efficient training strategies for deep learning.
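As a rough illustration of the zeroth-order fine-tuning principle mentioned above (not the thesis's specific variance-reduced method), the sketch below shows a plain SPSA-style two-point update in PyTorch: the gradient is estimated from two forward (inference) passes with a shared random perturbation, so no backward pass or activation storage is needed. The names here (zeroth_order_step, loss_fn, eps) are illustrative assumptions rather than identifiers from the thesis.

import torch

def zeroth_order_step(model, loss_fn, batch, lr=1e-6, eps=1e-3, seed=0):
    # One SPSA-style zeroth-order update: estimate the directional derivative
    # from two inference passes at theta + eps*z and theta - eps*z, then step
    # along the same random direction z. Assumes all parameters share a device.
    params = [p for p in model.parameters() if p.requires_grad]
    device = params[0].device

    def apply_direction(scale):
        # Re-seed so the same direction z is regenerated rather than stored,
        # keeping memory usage at inference level.
        gen = torch.Generator(device=device).manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen, device=device, dtype=p.dtype)
            p.add_(scale * z)

    with torch.no_grad():
        apply_direction(+eps)            # theta + eps * z
        loss_plus = loss_fn(model, batch)
        apply_direction(-2.0 * eps)      # theta - eps * z
        loss_minus = loss_fn(model, batch)
        apply_direction(+eps)            # restore theta

        # Finite-difference estimate of the loss derivative along z.
        g = (loss_plus - loss_minus) / (2.0 * eps)

        apply_direction(-lr * g)         # gradient-style step along z

The memory saving comes from re-seeding the random generator so the perturbation direction never has to be stored alongside the model parameters; the variance-reduction techniques developed in the thesis build on top of estimators of this kind.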

Advisor: Somayeh Sojoudi


BibTeX citation:

@phdthesis{Gautam:EECS-2024-20,
    Author = {Gautam, Tanmay},
    Title = {Advancements in Efficient Training Strategies for Modern Deep Learning: From Implicit Deep Learning to Language Models and Beyond},
    School = {EECS Department, University of California, Berkeley},
    Year = {2024},
    Month = {Apr},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-20.html},
    Number = {UCB/EECS-2024-20},
    Abstract = {In the rapidly evolving landscape of machine learning, the surge in computing power and data has propelled deep learning to the forefront of academic research. As the scale of models and datasets continues to expand, increasing emphasis is placed on algorithmic enhancements to tackle the growing compute and memory requirements. Moreover, owing to its success across a wide range of applications, the domain has seen a proliferation of diverse neural network architectures, each with its own unique training challenges. This thesis introduces efficient training methods for prevalent neural network architectures that leverage model structure for both resource and algorithmic efficiency. In the first part, we present novel training algorithms with reduced computational and memory demands for implicit deep learning models and transformer-based language models. Specifically, we start by proposing an efficient sequential training method for implicit equilibrium models, which eliminates the need to solve computationally expensive fixed-point equations and perform projection steps within the existing training process. We then introduce variance-reduced zeroth-order methods to effectively fine-tune large language models using only memory-efficient inference passes. In the second part, we shift our focus to the application of differentiable optimization to enhance training within meta-optimization and vector quantization. Specifically, for the former, we propose a means of using the structure afforded by differentiable convex optimization to parameterize novel first-order optimizers. For the latter, we introduce differentiable convex optimization as a technique to improve backpropagation through vector quantization layers. We hope that this work will offer fresh viewpoints to the research community and serve as a foundation for further developing efficient training strategies for deep learning.}
}

EndNote citation:

%0 Thesis
%A Gautam, Tanmay
%T Advancements in Efficient Training Strategies for Modern Deep Learning: From Implicit Deep Learning to Language Models and Beyond
%I EECS Department, University of California, Berkeley
%D 2024
%8 April 25
%@ UCB/EECS-2024-20
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-20.html
%F Gautam:EECS-2024-20