Automating Datacenter Operations Using Machine Learning

Peter Bodik

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2010-114
August 16, 2010

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-114.pdf

Today’s Internet datacenters run many complex and large-scale Web applications that are very difficult to manage. The main challenges are understanding user workloads and application performance, and quickly identifying and resolving performance problems. Statistical Machine Learning (SML) provides a methodology for quickly processing the large quantities of monitoring data generated by these applications, finding repeating patterns in their behavior, and building accurate models of their performance.

This dissertation argues that SML is a useful tool for simplifying and automating datacenter operations and demonstrates application of SML to three important problems in this area: characterization and synthesis of workload spikes, dynamic resource allocation in stateful systems, and quick and accurate identification of recurring performance problems.

Advisors: David A. Patterson, Michael Jordan, and Armando Fox


BibTeX citation:

@phdthesis{Bodik:EECS-2010-114,
    Author = {Bodik, Peter},
    Title = {Automating Datacenter Operations Using Machine Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2010},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-114.html},
    Number = {UCB/EECS-2010-114},
    Abstract = {Today’s Internet datacenters run many complex and large-scale Web applications that are very difficult to manage. The main challenges are understanding user workloads and application performance, and quickly identifying and resolving performance problems. Statistical Machine Learning (SML) provides a methodology for quickly processing the large quantities of monitoring data generated by these applications, finding repeating patterns in their behavior, and building accurate models of their performance. 

This dissertation argues that SML is a useful tool for simplifying and automating datacenter operations and demonstrates application of SML to three important problems in this area: characterization and synthesis of workload spikes, dynamic resource allocation in stateful systems, and quick and accurate identification of recurring performance problems.}
}

EndNote citation:

%0 Thesis
%A Bodik, Peter
%T Automating Datacenter Operations Using Machine Learning
%I EECS Department, University of California, Berkeley
%D 2010
%8 August 16
%@ UCB/EECS-2010-114
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-114.html
%F Bodik:EECS-2010-114