Enhancing QUIC: Quality-of-Service LLM Serving and MASQUE Proxy Scheduling
Rithvik Chuppala
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2024-64
May 8, 2024
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-64.pdf
This thesis explores the QUIC network protocol, a transport-layer protocol positioned as TCP's successor in modern network architectures. Focusing on QUIC's key design aspect of stream scheduling, this research investigates two contemporary networking applications: Large Language Model Serving and Network Proxying. The first chapter presents various stream scheduling algorithms tailored to the unique demands of LLM serving, providing novel approaches to optimize data transmission and model service in resource-constrained environments. In the second chapter, this thesis demonstrates the role of stream scheduling in the context of MASQUE proxies, exploring ways to improve the performance and scalability of QUIC-based tunneling protocols. As new applications demand custom-tailored network functionalities, stream scheduling emerges as a fundamental primitive for delivering application-specific optimizations, blurring the lines between the end-host and the network infrastructure. At its core, QUIC itself departs from traditional conventions, relying on the plain datagram abstraction and assuming the responsibilities of reliable delivery, security, and application-level semantics, integrating Layers 4 through 7 in the OSI model. This paradigm shift emphasizes the importance of co-designing protocols and algorithms for application semantics. This work aims to enhance the efficacy of stream scheduling in QUIC, addressing the evolving demands of modern networking applications.
Advisor: Sylvia Ratnasamy
BibTeX citation:
@mastersthesis{Chuppala:EECS-2024-64,
    Author = {Chuppala, Rithvik},
    Title = {Enhancing QUIC: Quality-of-Service LLM Serving and MASQUE Proxy Scheduling},
    School = {EECS Department, University of California, Berkeley},
    Year = {2024},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-64.html},
    Number = {UCB/EECS-2024-64},
    Abstract = {This thesis explores the QUIC network protocol, a transport-layer protocol positioned as TCP's successor in modern network architectures. Focusing on QUIC's key design aspect of stream scheduling, this research investigates two contemporary networking applications: Large Language Model Serving and Network Proxying. The first chapter presents various stream scheduling algorithms tailored to the unique demands of LLM serving, providing novel approaches to optimize data transmission and model service in resource-constrained environments. In the second chapter, this thesis demonstrates the role of stream scheduling in the context of MASQUE proxies, exploring ways to improve the performance and scalability of QUIC-based tunneling protocols. As new applications demand custom-tailored network functionalities, stream scheduling emerges as a fundamental primitive for delivering application-specific optimizations, blurring the lines between the end-host and the network infrastructure. At its core, QUIC itself departs from traditional conventions, relying on the plain datagram abstraction and assuming the responsibilities of reliable delivery, security, and application-level semantics, integrating Layers 4 through 7 in the OSI model. This paradigm shift emphasizes the importance of co-designing protocols and algorithms for application semantics. This work aims to enhance the efficacy of stream scheduling in QUIC, addressing the evolving demands of modern networking applications.}
}
EndNote citation:
%0 Thesis
%A Chuppala, Rithvik
%T Enhancing QUIC: Quality-of-Service LLM Serving and MASQUE Proxy Scheduling
%I EECS Department, University of California, Berkeley
%D 2024
%8 May 8
%@ UCB/EECS-2024-64
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-64.html
%F Chuppala:EECS-2024-64