Towards Robust and Scalable Large Language Models
Paras Jain
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2023-180
May 18, 2023
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-180.pdf
This dissertation addresses two significant challenges of large language models (LLMs): robustness and scalability. First, we improve robustness through the lens of learning code representations, highlighting our work on ContraCode, which learns representations of code that are robust to label-preserving edits. Second, we tackle scalability from a systems perspective. We present Checkmate, a system that supports training models beyond GPU memory capacity limits through optimal rematerialization. We also present Skyplane, a system that optimizes bulk data transfers between cloud object stores, enabling models to be trained on larger pre-training datasets in the cloud. Together, these contributions present a roadmap for enhancing the robustness and scalability of large language models.
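To make the phrase "label-preserving edits" concrete, the sketch below pairs a program with a behavior-preserving variable renaming and scores the pair with an InfoNCE-style contrastive loss, the general recipe behind contrastive pretraining on code. This is a minimal illustrative sketch, not ContraCode's actual pipeline: the renaming transform, function names, and temperature value are assumptions.

import re
import torch
import torch.nn.functional as F

def rename_variables(source: str, mapping: dict) -> str:
    """A label-preserving edit: renaming identifiers leaves behavior unchanged."""
    for old, new in mapping.items():
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return source

def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor, tau: float = 0.07):
    """Contrastive loss: each anchor's positive is the edited version of itself."""
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    logits = anchors @ positives.t() / tau   # (B, B) pairwise similarities
    labels = torch.arange(anchors.size(0))   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

program = "def add(a, b):\n    return a + b"
edited = rename_variables(program, {"a": "x", "b": "y"})
# An encoder robust to such edits should embed `program` and `edited` nearby;
# given batched embeddings z and z_pos from a (hypothetical) encoder, a training
# step would minimize info_nce_loss(z, z_pos).

Under this setup, the encoder is pulled toward representations that are invariant to semantics-preserving edits while still distinguishing unrelated programs in the batch.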
Advisors: Ion Stoica and Joseph Gonzalez
BibTeX citation:
@phdthesis{Jain:EECS-2023-180,
    Author = {Jain, Paras},
    Title = {Towards Robust and Scalable Large Language Models},
    School = {EECS Department, University of California, Berkeley},
    Year = {2023},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-180.html},
    Number = {UCB/EECS-2023-180},
    Abstract = {This dissertation addresses two significant challenges of large language models (LLMs): robustness and scalability. First, we improve robustness through the lens of learning code representations, highlighting our work on ContraCode, which learns representations of code that are robust to label-preserving edits. Second, we tackle scalability from a systems perspective. We present Checkmate, a system that supports training models beyond GPU memory capacity limits through optimal rematerialization. We also present Skyplane, a system that optimizes bulk data transfers between cloud object stores, enabling models to be trained on larger pre-training datasets in the cloud. Together, these contributions present a roadmap for enhancing the robustness and scalability of large language models.},
}
EndNote citation:
%0 Thesis
%A Jain, Paras
%T Towards Robust and Scalable Large Language Models
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 18
%@ UCB/EECS-2023-180
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-180.html
%F Jain:EECS-2023-180