Towards Robust and Scalable Large Language Models

Paras Jain

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2023-180
May 18, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-180.pdf

This dissertation addresses two significant challenges of large language models (LLMs): robustness and scalability. First, we focus on improving LLM robustness through the lens of learning code representations, highlighting our work on ContraCode, which learns representations of code that are robust to label-preserving edits. Second, we tackle scalability challenges from a systems perspective. We present Checkmate, a system that supports training models beyond GPU memory capacity limits through optimal rematerialization. Furthermore, Skyplane, a system that optimizes bulk data transfers between cloud object stores, enables training on larger pre-training datasets in the cloud. Together, these contributions present a roadmap for enhancing the robustness and scalability of large language models.
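
To make the rematerialization idea concrete: trading recomputation for memory is the mechanism that Checkmate optimizes. The sketch below uses PyTorch's built-in activation checkpointing, a heuristic cousin of the same technique (Checkmate instead solves for an optimal recomputation schedule under a memory budget); the toy model and tensor sizes are illustrative assumptions, not taken from the dissertation.

    # Minimal sketch of rematerialization via PyTorch activation checkpointing.
    # This is a fixed heuristic; Checkmate's contribution is choosing which
    # activations to keep or recompute optimally. Model and sizes are toys.
    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint_sequential

    # A deep stack of identical blocks stands in for a large model.
    model = nn.Sequential(*[
        nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(16)
    ])
    x = torch.randn(32, 1024, requires_grad=True)

    # The forward pass stores activations only at 4 segment boundaries
    # instead of at all 16 blocks; the rest are recomputed (rematerialized)
    # on demand during the backward pass, cutting peak activation memory.
    out = checkpoint_sequential(model, 4, x, use_reentrant=False)
    out.sum().backward()

Here the memory/compute trade-off is set by the segment count; an optimal rematerialization schedule instead decides per activation whether storing or recomputing it best fits the available GPU memory.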

Advisors: Ion Stoica and Joseph Gonzalez


BibTeX citation:

@phdthesis{Jain:EECS-2023-180,
    Author = {Jain, Paras},
    Title = {Towards Robust and Scalable Large Language Models},
    School = {EECS Department, University of California, Berkeley},
    Year = {2023},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-180.html},
    Number = {UCB/EECS-2023-180},
    Abstract = {This dissertation addresses two significant challenges of large language models (LLMs): robustness and scalability. First, we focus on improving LLM robustness through the lens of learning code representations, highlighting our work on ContraCode, which learns representations of code that are robust to label-preserving edits. Second, we tackle scalability challenges from a systems perspective. We present Checkmate, a system that supports training models beyond GPU memory capacity limits through optimal rematerialization. Furthermore, Skyplane, a system that optimizes bulk data transfers between cloud object stores, enables training on larger pre-training datasets in the cloud. Together, these contributions present a roadmap for enhancing the robustness and scalability of large language models.}
}

EndNote citation:

%0 Thesis
%A Jain, Paras
%T Towards Robust and Scalable Large Language Models
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 18
%@ UCB/EECS-2023-180
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-180.html
%F Jain:EECS-2023-180