Statistical Workloads for Energy Efficient MapReduce

Yanpei Chen, Archana Sulochana Ganapathi, Armando Fox, Randy H. Katz, and David A. Patterson

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2010-6

January 21, 2010

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-6.pdf

Energy efficiency is a growing concern in modern datacenters. As Internet services increasingly rely on MapReduce workloads to fuel their flagship businesses, there is a growing need for better mechanisms to evaluate MapReduce energy efficiency. We present a statistics-driven workload generation framework that distills summary statistics from production MapReduce traces and realistically reproduces representative workloads. These workloads help us evaluate design decisions with regard to scale, configuration, scheduling, and other issues. We use this framework to identify specific suggestions to improve MapReduce energy efficiency. Our key finding is that evaluations using trace-driven workloads reverse current design priorities, which optimize for data-intensive synthetic jobs.


BibTeX citation:

@techreport{Chen:EECS-2010-6,
    Author= {Chen, Yanpei and Ganapathi, Archana Sulochana and Fox, Armando and Katz, Randy H. and Patterson, David A.},
    Title= {Statistical Workloads for Energy Efficient MapReduce},
    Year= {2010},
    Month= {Jan},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-6.html},
    Number= {UCB/EECS-2010-6},
    Abstract= {Energy efficiency is a growing concern in modern datacenters. As Internet services increasingly rely on MapReduce workloads to fuel their flagship businesses, there is a growing need for better MapReduce energy efficiency evaluation mechanisms. We present a statistics-driven workload generation framework that distills summary statistics from production MapReduce traces and realistically reproduces representative workloads. These workloads help us evaluate design decisions with regard to scale, configuration, scheduling, and other issues. We use this framework to identify specific suggestions to improve MapReduce energy efficiency. Our key finding is that evaluations using trace-driven workloads reverse current design priorities in optimizing for data-intensive synthetic jobs.},
}

EndNote citation:

%0 Report
%A Chen, Yanpei 
%A Ganapathi, Archana Sulochana 
%A Fox, Armando 
%A Katz, Randy H. 
%A Patterson, David A. 
%T Statistical Workloads for Energy Efficient MapReduce
%I EECS Department, University of California, Berkeley
%D 2010
%8 January 21
%@ UCB/EECS-2010-6
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-6.html
%F Chen:EECS-2010-6