Banghua Zhu
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2025-6
February 8, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-6.pdf
Large language models (LLMs), powered by neural networks with billions to trillions of parameters, face critical challenges in training efficiency and deployment scalability due to their computational demands. This thesis addresses these challenges through two key contributions: advancing reinforcement learning from human feedback (RLHF) for post-training and optimizing LLM serving via novel caching strategies.
First, we provide a comprehensive theoretical analysis of RLHF, proposing algorithms with near-optimal sample complexity for reward learning. We validate these algorithms through real-world case studies, including the development of Starling-7B, an RLHF-aligned model that performs strongly on human preference benchmarks.
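To give a concrete sense of the reward-learning setting, the sketch below fits a reward model to pairwise human preferences via the standard Bradley-Terry maximum-likelihood objective. This is a minimal illustration, not the thesis's specific near-optimal algorithm; the backbone encoder and hidden dimension are placeholders.

# Illustrative sketch: Bradley-Terry reward learning from pairwise preferences.
# The backbone is any encoder that returns a pooled embedding per input.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, backbone, hidden_dim):
        super().__init__()
        self.backbone = backbone              # placeholder encoder
        self.head = nn.Linear(hidden_dim, 1)  # scalar reward head

    def forward(self, inputs):
        return self.head(self.backbone(inputs)).squeeze(-1)

def preference_loss(reward_model, chosen, rejected):
    # Negative log-likelihood under Bradley-Terry: the chosen response
    # should receive a higher reward than the rejected one.
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()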
Second, we design near-optimal caching algorithms tailored for LLM inference, reducing computational overhead while preserving output quality. Our framework achieves significant latency reductions in LLM serving environments.
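For intuition on the serving side, the following sketch places a simple LRU response cache in front of an LLM call. It is a deliberately generic simplification that ignores the cost- and quality-aware considerations addressed by the near-optimal caching algorithms in the thesis; the generate function is a hypothetical stand-in for the model call.

# Minimal LRU-style response cache for LLM serving (illustrative only).
from collections import OrderedDict

class ResponseCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()  # prompt -> cached completion

    def get(self, prompt):
        if prompt not in self._store:
            return None
        self._store.move_to_end(prompt)  # mark as most recently used
        return self._store[prompt]

    def put(self, prompt, completion):
        self._store[prompt] = completion
        self._store.move_to_end(prompt)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

def serve(prompt, cache, generate):
    # Answer from the cache when possible; otherwise run the (expensive)
    # LLM call and store the result for future requests.
    cached = cache.get(prompt)
    if cached is not None:
        return cached
    completion = generate(prompt)
    cache.put(prompt, completion)
    return completion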
Our work bridges theoretical analysis with practical implementation, offering insights into scalable alignment techniques and efficient deployment strategies. The results highlight the viability of RLHF for LLM post-training and the importance of system-level optimizations for sustainable LLM adoption.
Advisors: Michael Jordan and Jiantao Jiao
";
?>
BibTeX citation:
@phdthesis{Zhu:EECS-2025-6,
    Author = {Zhu, Banghua},
    Title = {Towards Principled Training and Serving of Large Language Models},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {Feb},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-6.html},
    Number = {UCB/EECS-2025-6},
    Abstract = {Large language models (LLMs), powered by neural networks with billions to trillions of parameters, face critical challenges in training efficiency and deployment scalability due to their computational demands. This thesis addresses these challenges through two key contributions: advancing reinforcement learning from human feedback (RLHF) for post-training and optimizing LLM serving via novel caching strategies. First, we provide a comprehensive theoretical analysis of RLHF, proposing algorithms with near-optimal sample complexity for reward learning. We validate these algorithms through real-world case studies, including the development of Starling-7B, an RLHF-aligned model that performs strongly on human preference benchmarks. Second, we design near-optimal caching algorithms tailored for LLM inference, reducing computational overhead while preserving output quality. Our framework achieves significant latency reductions in LLM serving environments. Our work bridges theoretical analysis with practical implementation, offering insights into scalable alignment techniques and efficient deployment strategies. The results highlight the viability of RLHF for LLM post-training and the importance of system-level optimizations for sustainable LLM adoption.}
}
EndNote citation:
%0 Thesis
%A Zhu, Banghua
%T Towards Principled Training and Serving of Large Language Models
%I EECS Department, University of California, Berkeley
%D 2025
%8 February 8
%@ UCB/EECS-2025-6
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-6.html
%F Zhu:EECS-2025-6