Statistics-Driven Workload Modeling for the Cloud

Archana Sulochana Ganapathi, Yanpei Chen, Armando Fox, Randy H. Katz and David A. Patterson

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2009-160
November 30, 2009

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-160.pdf

A recent trend for data-intensive computations is to use pay-as-you-go execution environments that scale transparently to the user. However, providers of such environments must tackle the challenge of configuring their system to provide maximal performance while minimizing the cost of resources used. In this paper, we use statistical models to predict resource requirements for Cloud computing applications. Such a prediction framework can guide system design and deployment decisions such as scale, scheduling, and capacity. In addition, we present an initial design of a workload generator that can be used to evaluate alternative configurations without the overhead of reproducing a real workload. This paper focuses on statistical modeling and its application to data-intensive workloads.


BibTeX citation:

@techreport{Ganapathi:EECS-2009-160,
    Author = {Ganapathi, Archana Sulochana and Chen, Yanpei and Fox, Armando and Katz, Randy H. and Patterson, David A.},
    Title = {Statistics-Driven Workload Modeling for the Cloud},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2009},
    Month = {Nov},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-160.html},
    Number = {UCB/EECS-2009-160},
    Abstract = {A recent trend for data-intensive computations is to use pay-as-you-go execution environments that scale transparently to the user. However, providers of such environments must tackle the challenge of configuring their system to provide maximal performance while minimizing the cost of resources used. In this paper, we use statistical models to predict resource requirements for Cloud computing applications. Such a prediction framework can guide system design and deployment decisions such as scale, scheduling, and capacity. In addition, we present an initial design of a workload generator that can be used to evaluate alternative configurations without the overhead of reproducing a real workload. This paper focuses on statistical modeling and its application to data-intensive workloads.}
}

EndNote citation:

%0 Report
%A Ganapathi, Archana Sulochana
%A Chen, Yanpei
%A Fox, Armando
%A Katz, Randy H.
%A Patterson, David A.
%T Statistics-Driven Workload Modeling for the Cloud
%I EECS Department, University of California, Berkeley
%D 2009
%8 November 30
%@ UCB/EECS-2009-160
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-160.html
%F Ganapathi:EECS-2009-160