Compiler-Controlled Multithreading for Lenient Parallel Languages

Klaus Erik Schauser, David E. Culler and Thorsten von Eicken

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-91-640
July 1991

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1991/CSD-91-640.pdf

Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide.

Compiler-controlled multithreading is examined through compilation of a lenient parallel language, Id90, for a threaded abstract machine, TAM. A key feature of TAM is that synchronization is explicit and occurs only at the start of a thread, so that a simple cost model can be applied. A scheduling hierarchy allows the compiler to schedule logically related threads closely together in time and to use registers across threads. Remote communication is via message sends and split-phase memory accesses. Messages and memory replies are received by compiler-generated message handlers which rapidly integrate these events with thread scheduling.

To compile Id90 for TAM, we employ a new parallel intermediate form, dual-graphs, with distinct control and data arcs. This provides a clean framework for partitioning the program into threads, scheduling threads, and managing registers under asynchronous execution. The compilation process is described and preliminary measurements of its effectiveness are discussed. Dynamic execution measurements are obtained via a second compilation step, which translates TAM into native code for existing machines with instrumentation incorporated. These measurements show that the cost of compiler-controlled multithreading is within a small factor of the cost of control flow in sequential languages.
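
As a rough illustration of the execution model the abstract describes (synchronization only at the start of a thread, split-phase memory accesses, and message handlers that feed the scheduler), the following is a minimal sketch. It is not taken from the report: the per-thread counter of outstanding inputs and the names `Frame`, `post`, `split_phase_fetch`, `reply_handler`, and `run_demo` are assumptions made for the example.

```python
from collections import deque


class Frame:
    """Hypothetical activation frame: per-thread input counters, slots, ready queue."""

    def __init__(self, entry_counts):
        self.counts = dict(entry_counts)  # thread name -> inputs still outstanding
        self.slots = {}                   # frame-local storage
        self.ready = deque()              # enabled threads waiting to run

    def post(self, thread):
        """Signal one input of `thread`; enable it once all inputs have arrived."""
        self.counts[thread] -= 1
        if self.counts[thread] == 0:
            self.ready.append(thread)


def split_phase_fetch(memory, addr, frame, slot, thread, network):
    """Issue a request and return at once; the reply is delivered later as a message."""
    network.append((memory[addr], frame, slot, thread))


def reply_handler(value, frame, slot, thread):
    """Message handler: store the reply in the frame, then post the waiting thread."""
    frame.slots[slot] = value
    frame.post(thread)


def run_demo():
    memory = {0: 3, 1: 4}
    network = deque()   # stands in for in-flight memory replies
    trace = []          # order in which threads actually ran
    frame = Frame({"start": 1, "add": 2})

    def start(f):
        # Both fetches are issued back to back; neither one blocks the thread.
        split_phase_fetch(memory, 0, f, "a", "add", network)
        split_phase_fetch(memory, 1, f, "b", "add", network)

    def add(f):
        # Runs only after both replies arrived, so no synchronization is needed here.
        f.slots["sum"] = f.slots["a"] + f.slots["b"]

    code = {"start": start, "add": add}
    frame.post("start")
    while frame.ready or network:
        while frame.ready:
            name = frame.ready.popleft()
            trace.append(name)
            code[name](frame)             # each thread runs to completion
        if network:
            reply_handler(*network.popleft())
    return frame.slots["sum"], trace
```

In this sketch, `add` needs two inputs, so it becomes ready only after both reply messages have been handled. All waiting happens before a thread begins, which is the property the abstract credits with making a simple cost model possible.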


BibTeX citation:

@techreport{Schauser:CSD-91-640,
    Author = {Schauser, Klaus Erik and Culler, David E. and von Eicken, Thorsten},
    Title = {Compiler-Controlled Multithreading for Lenient Parallel Languages},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1991},
    Month = {Jul},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1991/6392.html},
    Number = {UCB/CSD-91-640},
    Abstract = {Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide.

Compiler-controlled multithreading is examined through compilation of a lenient parallel language, Id90, for a threaded abstract machine, TAM. A key feature of TAM is that synchronization is explicit and occurs only at the start of a thread, so that a simple cost model can be applied. A scheduling hierarchy allows the compiler to schedule logically related threads closely together in time and to use registers across threads. Remote communication is via message sends and split-phase memory accesses. Messages and memory replies are received by compiler-generated message handlers which rapidly integrate these events with thread scheduling.

To compile Id90 for TAM, we employ a new parallel intermediate form, dual-graphs, with distinct control and data arcs. This provides a clean framework for partitioning the program into threads, scheduling threads, and managing registers under asynchronous execution. The compilation process is described and preliminary measurements of its effectiveness are discussed. Dynamic execution measurements are obtained via a second compilation step, which translates TAM into native code for existing machines with instrumentation incorporated. These measurements show that the cost of compiler-controlled multithreading is within a small factor of the cost of control flow in sequential languages.}
}

EndNote citation:

%0 Report
%A Schauser, Klaus Erik
%A Culler, David E.
%A von Eicken, Thorsten
%T Compiler-Controlled Multithreading for Lenient Parallel Languages
%I EECS Department, University of California, Berkeley
%D 1991
%@ UCB/CSD-91-640
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1991/6392.html
%F Schauser:CSD-91-640