Scheduling and Optimizing Stream Programs on Multicore Machines by Exploiting High-Level Abstractions
Dai Bui
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2013-184
November 7, 2013
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-184.pdf
Real-time streaming of HD movies and TV via YouTube, Netflix, Apple TV and Xbox Live is gaining popularity. Stream programs often consume considerable amounts of energy due to their compute-intensive nature. Making stream programs energy-efficient is important, especially for energy-constrained computing devices such as mobile phones and tablets. The first part of this thesis exploits the popular Synchronous Dataflow (SDF) high-level abstraction of stream programs to design adaptive stream programs that reduce energy consumption on multicore machines. Observing that the IO rates of stream programs can vary at runtime, we make stream programs adaptive by transforming their internal structures so that the computing resources they occupy, e.g., cores and memory, follow workload changes at runtime. Our experiments show that adapting stream programs to IO rate changes can lead to significant energy reduction. In addition, we show that the modularity and static attributes of the stream program abstraction not only make it easier to map stream programs onto multicore machines but also enable energy-efficient routing schemes for high-bandwidth stream traffic on the interconnection fabric, such as networks-on-chip.
While SDF abstractions can help optimize stream programs on multicore machines, SDF is best suited to describing data-intensive stream computations such as FFT, DCT, and FIR. Modern stream applications such as MPEG2 or MP3 encoders/decoders are often more sophisticated and are composed of multiple such computations. Synchronizing operations between computations with different semantics requires control messaging. We extend previous work on control messaging and give a formal definition of control message latency via the semantics of information wavefronts. The resulting control-operation-integrated SDF (COSDF) can model sophisticated stream programs more precisely. However, the conventional scheduling method developed for SDF is not sufficient to schedule COSDF applications. To schedule COSDF applications, we develop a scheduling method that uses dependency graphs and applies periodic graph theory, based on reduced dependency graphs (RDG). This RDG scheduling method also helps extract parallelism from stream programs. The more precise abstraction of COSDF is expected to help synthesize and generate sophisticated stream programs more efficiently.
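For context, the conventional SDF scheduling step referred to above is usually built on the balance equations: for every edge, the tokens produced per schedule iteration must equal the tokens consumed. The following is a minimal Python sketch of that standard step only, not of the COSDF or RDG method developed in the thesis. The function name `repetition_vector`, the edge tuple layout, and the three-actor example graph are assumptions made for illustration.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm requires Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * produced == q[dst] * consumed.

    actors: list of actor names (hypothetical representation).
    edges:  list of (src, dst, produced, consumed) tuples.
    Returns the smallest positive integer firing counts per actor.
    """
    # Fix the first actor's relative rate to 1 and propagate along edges.
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        a = pending.pop()
        for src, dst, produced, consumed in edges:
            if src == a:
                other, r = dst, rate[a] * produced / consumed
            elif dst == a:
                other, r = src, rate[a] * consumed / produced
            else:
                continue
            if other not in rate:
                rate[other] = r
                pending.append(other)
            elif rate[other] != r:
                # Two paths demand different rates: no bounded periodic schedule.
                raise ValueError("inconsistent sample rates")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational rates to the smallest integer solution.
    scale = lcm(*(f.denominator for f in rate.values()))
    return {a: int(f * scale) for a, f in rate.items()}


# Assumed example: A produces 2 tokens per firing, B consumes 3 and produces 1,
# C consumes 2.
q = repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
print(q)  # {'A': 3, 'B': 2, 'C': 1}
```

A rate-consistent graph yields one periodic schedule iteration in which each actor fires the listed number of times. Control messages add constraints between specific firings that these equations alone do not express, which is consistent with the abstract's point that SDF scheduling is insufficient for COSDF.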
Although the SDF modularity property also improves programmability, it can come at the price of efficiency when SDF models are run in model-based design environments without being compiled. However, compiling large SDF models to mitigate this inefficiency can be prohibitive when even a small change in a model leads to large recompilation overhead. We tackle the problem by proposing a method for incrementally compiling large SDF models that faithfully captures the executions of the original SDF models, avoiding the artificial deadlocks that a naive compilation method can introduce.
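One well-known way such an artificial deadlock can arise, shown here as an illustrative sketch rather than as the thesis's own example, is compiling a composite actor into a single firing that needs all of its inputs at once. The graph layout, the actor names, and the helper `is_live` below are assumptions. The sketch simulates one schedule iteration and reports whether every actor can complete its firings.

```python
def is_live(reps, edges):
    """Simulate one SDF schedule iteration.

    reps:  dict mapping actor name -> required firings per iteration.
    edges: list of (src, dst, produced, consumed, initial_tokens) tuples.
    Returns True if every actor can complete its firings, False on deadlock.
    """
    tokens = [e[4] for e in edges]
    remaining = dict(reps)
    progress = True
    while progress:
        progress = False
        for actor in remaining:
            if remaining[actor] == 0:
                continue
            inputs = [i for i, e in enumerate(edges) if e[1] == actor]
            # An actor may fire only if every input edge holds enough tokens.
            if all(tokens[i] >= edges[i][3] for i in inputs):
                for i, (src, dst, produced, consumed, _) in enumerate(edges):
                    if dst == actor:
                        tokens[i] -= consumed
                    if src == actor:
                        tokens[i] += produced
                remaining[actor] -= 1
                progress = True
    return all(left == 0 for left in remaining.values())


# Flat model (assumed): Src -> A -> B -> Sink, where A and B sit inside one
# composite actor and the composite's first output feeds its second input.
flat_edges = [("Src", "A", 1, 1, 0), ("A", "B", 1, 1, 0), ("B", "Sink", 1, 1, 0)]
flat_live = is_live({"Src": 1, "A": 1, "B": 1, "Sink": 1}, flat_edges)

# Naive compilation (assumed): the composite becomes one atomic actor P that
# needs both inputs before firing, so the A -> B edge becomes a self-loop on P
# with no initial tokens.
clustered_edges = [("Src", "P", 1, 1, 0), ("P", "P", 1, 1, 0), ("P", "Sink", 1, 1, 0)]
clustered_live = is_live({"Src": 1, "P": 1, "Sink": 1}, clustered_edges)

print(flat_live, clustered_live)  # True False
```

The flat model completes its iteration, while the monolithically compiled version cannot fire P at all. The deadlock is artificial in the sense that it exists only in the compiled form, not in the original model.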
Advisors: Edward A. Lee
BibTeX citation:
@phdthesis{Bui:EECS-2013-184,
    Author = {Bui, Dai},
    Title = {Scheduling and Optimizing Stream Programs on Multicore Machines by Exploiting High-Level Abstractions},
    School = {EECS Department, University of California, Berkeley},
    Year = {2013},
    Month = {Nov},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-184.html},
    Number = {UCB/EECS-2013-184}
}
EndNote citation:
%0 Thesis
%A Bui, Dai
%T Scheduling and Optimizing Stream Programs on Multicore Machines by Exploiting High-Level Abstractions
%I EECS Department, University of California, Berkeley
%D 2013
%8 November 7
%@ UCB/EECS-2013-184
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-184.html
%F Bui:EECS-2013-184