DiT-Serve and DeepCoder: Enabling Video and Code Generation at Scale
Rachel Xin
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2025-46
May 9, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-46.pdf
This thesis presents a unified approach to building scalable and intelligent generative systems by advancing two key areas: high-throughput inference for diffusion models and reinforcement learning-based training for large language models (LLMs). In the first part, we introduce DiT-Serve, a novel system architecture for serving video diffusion transformers under the high demands of generative workloads. We exploit denoising-level parallelism to enable efficient batching and present a new attention algorithm. By addressing staggered request handling, multi-resolution support, and GPU utilization, DiT-Serve improves the throughput and responsiveness of generative systems in production environments. In the second part, we explore how reinforcement learning (RL) can enhance the reasoning capabilities of LLMs. We present a training methodology centered on curating high-quality, verifiable coding data, together with algorithmic and system optimizations. This integration of environment-based feedback and effective reward calculation points toward transforming small language models into powerful reasoning models. We introduce DeepCoder, a code reasoning model that matches the performance of much larger models, illustrating the potential of RL-based scaling. By coupling efficient inference infrastructure with intelligent training strategies, this thesis contributes toward the democratization of generative AI, making high-performance systems more accessible, interpretable, and adaptable across domains.
Advisor: Joseph Gonzalez
BibTeX citation:
@mastersthesis{Xin:EECS-2025-46,
    Author = {Xin, Rachel},
    Title = {DiT-Serve and DeepCoder: Enabling Video and Code Generation at Scale},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-46.html},
    Number = {UCB/EECS-2025-46},
    Abstract = {This thesis presents a unified approach to building scalable and intelligent generative systems by advancing two key areas: high-throughput inference for diffusion models and reinforcement learning-based training for large language models (LLMs). In the first part, we introduce DiT-Serve, a novel system architecture for serving video diffusion transformers under the high demands of generative workloads. We exploit denoising-level parallelism to enable efficient batching and present a new attention algorithm. By addressing staggered request handling, multi-resolution support, and GPU utilization, DiT-Serve improves the throughput and responsiveness of generative systems in production environments. In the second part, we explore how reinforcement learning (RL) can enhance the reasoning capabilities of LLMs. We present a training methodology centered on curating high-quality, verifiable coding data, together with algorithmic and system optimizations. This integration of environment-based feedback and effective reward calculation points toward transforming small language models into powerful reasoning models. We introduce DeepCoder, a code reasoning model that matches the performance of much larger models, illustrating the potential of RL-based scaling. By coupling efficient inference infrastructure with intelligent training strategies, this thesis contributes toward the democratization of generative AI, making high-performance systems more accessible, interpretable, and adaptable across domains.}
}
EndNote citation:
%0 Thesis
%A Xin, Rachel
%T DiT-Serve and DeepCoder: Enabling Video and Code Generation at Scale
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 9
%@ UCB/EECS-2025-46
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-46.html
%F Xin:EECS-2025-46