Ressort: An Auto-Tuning Framework for Parallel Shuffle Kernels

Eric Love

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2015-246
December 17, 2015

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-246.pdf

This thesis presents Ressort, an auto-tuning framework for computational patterns that perform any kind of data-dependent data reordering or transformation. These programs, which we call shuffle kernels, account for large fractions of the runtime of database workloads and other applications. Hardware-conscious optimizations of shuffle kernels are all alike in that they generally consist of choosing one of many possible ways to decompose a particular kernel into a pipeline of more primitive operations: a sort might consist of a hash-based partitioning followed by an in-cache quicksort, or a join might entail sorting and then merging, for example. Ressort presents a domain-specific language (DSL) that enables the succinct expression of these compositions while also exposing the nested array-style par- allelism that they imply, and supplies a compiler to exploit it. It also includes a parallel C-like intermediate representation that assists in the generation of performant shuffle kernel implementations. We present the design and implementation of these two language layers, and evaluate the performance of their output on a variety of hardware targets under different algorithmic requirements.

Advisor: Krste Asanović


BibTeX citation:

@mastersthesis{Love:EECS-2015-246,
    Author = {Love, Eric},
    Title = {Ressort: An Auto-Tuning Framework for Parallel Shuffle Kernels},
    School = {EECS Department, University of California, Berkeley},
    Year = {2015},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-246.html},
    Number = {UCB/EECS-2015-246},
    Abstract = {This thesis presents Ressort, an auto-tuning framework for computational patterns that perform any kind of data-dependent data reordering or transformation. These programs, which we call shuffle kernels, account for large fractions of the runtime of database workloads and other applications. Hardware-conscious optimizations of shuffle kernels are all alike in that they generally consist of choosing one of many possible ways to decompose a particular kernel into a pipeline of more primitive operations: a sort might consist of a hash-based partitioning followed by an in-cache quicksort, or a join might entail sorting and then merging, for example. Ressort presents a domain-specific language (DSL) that enables the succinct expression of these compositions while also exposing the nested array-style par- allelism that they imply, and supplies a compiler to exploit it. It also includes a parallel C-like intermediate representation that assists in the generation of performant shuffle kernel implementations. We present the design and implementation of these two language layers, and evaluate the performance of their output on a variety of hardware targets under different algorithmic requirements.}
}

EndNote citation:

%0 Thesis
%A Love, Eric
%T Ressort: An Auto-Tuning Framework for Parallel Shuffle Kernels
%I EECS Department, University of California, Berkeley
%D 2015
%8 December 17
%@ UCB/EECS-2015-246
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-246.html
%F Love:EECS-2015-246