An Extensible Architecture for Distributed Heterogeneous Processing
Frank Luan
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2024-219
December 19, 2024
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-219.pdf
The exponential growth in artificial intelligence (AI) compute demands has significantly outpaced the advancement in single-node processing power. This widening gap has made distributed heterogeneous processing essential for modern AI applications. However, existing distributed data processing systems struggle to effectively handle the complexities of heterogeneous execution.
This dissertation argues for extensibility as the key principle in designing systems for distributed heterogeneous processing. We build two libraries on top of Ray, a distributed execution system. First, we develop the streaming batch model, which enables efficient heterogeneous execution and dynamic adaptability to varying workloads. Second, we introduce Exoshuffle, a distributed shuffle library that enables flexible control of data semantics without sacrificing performance, demonstrating that complex data operations can be implemented efficiently as application libraries rather than requiring purpose-built systems. Both libraries are integrated into the open-source framework Ray Data, which has been adopted by thousands of companies in the industry. Finally, we validate the effectiveness of this architecture through the CloudSort benchmark, in which Exoshuffle-CloudSort set a new world record for the most cost-effective sorting of data on a public cloud. These results demonstrate that this extensible architecture can deliver both high performance and scalability while providing the flexibility required for heterogeneous workloads. This work provides a foundation for building efficient distributed heterogeneous processing systems capable of meeting the continuously growing computational demands of AI applications.
Advisor: Ion Stoica
BibTeX citation:
@phdthesis{Luan:EECS-2024-219,
    Author = {Luan, Frank},
    Title = {An Extensible Architecture for Distributed Heterogeneous Processing},
    School = {EECS Department, University of California, Berkeley},
    Year = {2024},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-219.html},
    Number = {UCB/EECS-2024-219},
    Abstract = {The exponential growth in artificial intelligence (AI) compute demands has significantly outpaced the advancement in single-node processing power. This widening gap has made distributed heterogeneous processing essential for modern AI applications. However, existing distributed data processing systems struggle to effectively handle the complexities of heterogeneous execution. This dissertation argues for extensibility as the key principle in designing systems for distributed heterogeneous processing. We build two libraries on top of Ray, a distributed execution system. First, we develop the streaming batch model, which enables efficient heterogeneous execution and dynamic adaptability to varying workloads. Second, we introduce Exoshuffle, a distributed shuffle library that enables flexible control of data semantics without sacrificing performance, demonstrating that complex data operations can be implemented efficiently as application libraries rather than requiring purpose-built systems. Both libraries are integrated into the open-source framework Ray Data, which has been adopted by thousands of companies in the industry. Finally, we validate the effectiveness of this architecture through the CloudSort benchmark, in which Exoshuffle-CloudSort set a new world record for the most cost-effective sorting of data on a public cloud. These results demonstrate that this extensible architecture can deliver both high performance and scalability while providing the flexibility required for heterogeneous workloads. This work provides a foundation for building efficient distributed heterogeneous processing systems capable of meeting the continuously growing computational demands of AI applications.}
}
EndNote citation:
%0 Thesis
%A Luan, Frank
%T An Extensible Architecture for Distributed Heterogeneous Processing
%I EECS Department, University of California, Berkeley
%D 2024
%8 December 19
%@ UCB/EECS-2024-219
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-219.html
%F Luan:EECS-2024-219