Flux: An Adaptive Partitioning Operator for Continuous Query Systems

Mehul A. Shah, Joseph M. Hellerstein, Sirish Chandrasekaran and Michael J. Franklin

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-02-1205
October 2002

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2002/CSD-02-1205.pdf

The long-running nature of continuous queries (CQs) poses new scalability challenges for dataflow processing. CQ systems execute pipelined dataflows that may be shared across multiple queries. The scalability of these dataflows is limited by their constituent stateful operators, e.g., windowed joins or grouping operators. To scale such operators, a natural solution is to partition them across a shared-nothing platform. But in the CQ context, traditional, static techniques for partitioned parallelism can exhibit detrimental imbalances as workload and runtime conditions evolve. Long-running CQ dataflows must continue to function robustly in the face of these imbalances.

To address this challenge, we introduce a dataflow operator called Flux that encapsulates adaptive state partitioning and dataflow routing. Flux is placed between producer-consumer stages in a dataflow pipeline to repartition stateful operators while the pipeline is still executing. We present the Flux architecture, along with repartitioning policies that can be used for CQ operators under shifting processing and memory loads. We show that the Flux mechanism and these policies can improve throughput by several factors and average latency by orders of magnitude over the static case.
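
The idea of coupling routing with movable operator state can be illustrated with a small sketch. It is not the Flux implementation described in the report: the class name `AdaptiveRouter`, the fixed bucket count, the per-key running count standing in for operator state, and the "move one bucket from the busiest node to the idlest node" policy are all assumptions made for illustration.

```python
from collections import defaultdict


class AdaptiveRouter:
    """Toy router (hypothetical): keys map to buckets, buckets map to nodes.

    Buckets can be reassigned between nodes at runtime, and the state held
    for a bucket moves with it, so later tuples find their state.
    """

    def __init__(self, nodes, num_buckets=8):
        self.nodes = list(nodes)
        self.num_buckets = num_buckets
        # Initial static assignment: round-robin buckets over nodes.
        self.owner = {b: self.nodes[b % len(self.nodes)]
                      for b in range(num_buckets)}
        # Per-node operator state: node -> bucket -> key -> running count.
        self.state = {n: defaultdict(dict) for n in self.nodes}
        # Tuples routed to each bucket so far (the load signal).
        self.load = defaultdict(int)

    def bucket_of(self, key):
        # Integer keys and a modulo keep the sketch deterministic.
        return key % self.num_buckets

    def route(self, key):
        """Send one tuple to the node owning its bucket; update state there."""
        b = self.bucket_of(key)
        node = self.owner[b]
        counts = self.state[node][b]
        counts[key] = counts.get(key, 0) + 1
        self.load[b] += 1
        return node

    def node_load(self):
        """Total observed load per node under the current assignment."""
        totals = {n: 0 for n in self.nodes}
        for b, n in self.owner.items():
            totals[n] += self.load[b]
        return totals

    def rebalance(self):
        """Move one bucket from the busiest node to the idlest node.

        Assumed policy: among buckets whose load is smaller than the gap
        between the two nodes (so the move strictly helps), pick the one
        closest to half the gap. Returns (bucket, src, dst) or None.
        """
        totals = self.node_load()
        src = max(self.nodes, key=lambda n: totals[n])
        dst = min(self.nodes, key=lambda n: totals[n])
        gap = totals[src] - totals[dst]
        candidates = [b for b, n in self.owner.items()
                      if n == src and 0 < self.load[b] < gap]
        if not candidates:
            return None
        b = min(candidates, key=lambda c: abs(self.load[c] - gap / 2))
        # Migrate the bucket's state first, then switch the routing entry.
        self.state[dst][b] = self.state[src].pop(b, {})
        self.owner[b] = dst
        return b, src, dst
```

In this sketch a caller would invoke `route` for every tuple and `rebalance` periodically. A real system would also have to buffer or pause tuples that arrive for a bucket while its state is in transit, which the sketch sidesteps by running in a single thread.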


BibTeX citation:

@techreport{Shah:CSD-02-1205,
    Author = {Shah, Mehul A. and Hellerstein, Joseph M. and Chandrasekaran, Sirish and Franklin, Michael J.},
    Title = {Flux: An Adaptive Partitioning Operator for Continuous Query Systems},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2002},
    Month = {Oct},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2002/5689.html},
    Number = {UCB/CSD-02-1205},
    Abstract = {The long-running nature of continuous queries (CQs) poses new scalability challenges for dataflow processing. CQ systems execute pipelined dataflows that may be shared across multiple queries. The scalability of these dataflows is limited by their constituent stateful operators, e.g., windowed joins or grouping operators. To scale such operators, a natural solution is to partition them across a shared-nothing platform. But in the CQ context, traditional, static techniques for partitioned parallelism can exhibit detrimental imbalances as workload and runtime conditions evolve. Long-running CQ dataflows must continue to function robustly in the face of these imbalances. To address this challenge, we introduce a dataflow operator called Flux that encapsulates adaptive state partitioning and dataflow routing. Flux is placed between producer-consumer stages in a dataflow pipeline to repartition stateful operators while the pipeline is still executing. We present the Flux architecture, along with repartitioning policies that can be used for CQ operators under shifting processing and memory loads. We show that the Flux mechanism and these policies can improve throughput by several factors and average latency by orders of magnitude over the static case.}
}

EndNote citation:

%0 Report
%A Shah, Mehul A.
%A Hellerstein, Joseph M.
%A Chandrasekaran, Sirish
%A Franklin, Michael J.
%T Flux: An Adaptive Partitioning Operator for Continuous Query Systems
%I EECS Department, University of California, Berkeley
%D 2002
%@ UCB/CSD-02-1205
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2002/5689.html
%F Shah:CSD-02-1205