Predicting and Optimizing System Utilization and Performance via Statistical Machine Learning

Archana Sulochana Ganapathi

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2009-181
December 17, 2009

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-181.pdf

The complexity of modern computer systems makes performance modeling an invaluable resource for guiding crucial decisions such as workload management, configuration management, and resource provisioning. With continually evolving systems, it is difficult to obtain ground truth about system behavior. Moreover, system management policies must adapt to changes in workload and configuration to continue making efficient decisions. Thus, we require data-driven modeling techniques that automatically extract relationships between a system’s input workload, its configuration parameters, and its consequent performance.

This dissertation argues that statistical machine learning (SML) techniques are a powerful asset for system performance modeling. We present an SML-based methodology that extracts correlations between a workload’s pre-execution characteristics or configuration parameters and its post-execution performance observations. We leverage these correlations for performance prediction and optimization.
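To make the idea of "leveraging correlations for performance prediction" concrete, the following is a minimal illustrative sketch, not the dissertation's actual technique. It assumes a simple k-nearest-neighbor predictor: each past workload is described by a vector of pre-execution features paired with one observed post-execution metric, and a new workload's metric is estimated as the mean of its closest historical neighbors. The function names `euclidean` and `predict_performance` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_performance(
    history: List[Tuple[List[float], float]],
    features: List[float],
    k: int = 3,
) -> float:
    """Hypothetical k-NN predictor (an assumption, not the thesis's method).

    `history` holds (pre-execution feature vector, observed performance) pairs.
    The prediction is the mean observed performance of the k historical
    workloads whose feature vectors lie closest to `features`.
    Features are used unscaled here; in practice they would usually be
    normalized so that no single feature dominates the distance.
    """
    if not history:
        raise ValueError("history must be non-empty")
    k = min(k, len(history))  # never ask for more neighbors than we have
    nearest = sorted(history, key=lambda item: euclidean(item[0], features))[:k]
    return sum(perf for _, perf in nearest) / k
```

For example, with a toy history of four workloads, `predict_performance(history, [1.1, 10.5], k=2)` averages the two small workloads' observed values (2.0 and 2.4) and returns about 2.2. A nearest-neighbor scheme is chosen here only because it is the simplest way to show past observations driving a prediction for an unseen workload.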

We present three success stories that validate the usefulness of our methodology on storage- and compute-based parallel systems. In all three scenarios, we outperform state-of-the-art alternatives. Our results strongly support using SML-based performance modeling to improve the quality of system management decisions.

Advisor: David A. Patterson


BibTeX citation:

@phdthesis{Ganapathi:EECS-2009-181,
    Author = {Ganapathi, Archana Sulochana},
    Title = {Predicting and Optimizing System Utilization and Performance via Statistical Machine Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2009},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-181.html},
    Number = {UCB/EECS-2009-181},
    Abstract = {The complexity of modern computer systems makes performance modeling an invaluable resource for guiding crucial decisions such as workload management, configuration management, and resource provisioning. With continually evolving systems, it is difficult to obtain ground truth about system behavior. Moreover, system management policies must adapt to changes in workload and configuration to continue making efficient decisions. Thus, we require data-driven modeling techniques that automatically extract relationships between a system’s input workload, its configuration parameters, and its consequent performance.

This dissertation argues that statistical machine learning (SML) techniques are a powerful asset for system performance modeling. We present an SML-based methodology that extracts correlations between a workload’s pre-execution characteristics or configuration parameters and its post-execution performance observations. We leverage these correlations for performance prediction and optimization.

We present three success stories that validate the usefulness of our methodology on storage- and compute-based parallel systems. In all three scenarios, we outperform state-of-the-art alternatives. Our results strongly support using SML-based performance modeling to improve the quality of system management decisions.}
}

EndNote citation:

%0 Thesis
%A Ganapathi, Archana Sulochana
%T Predicting and Optimizing System Utilization and Performance via Statistical Machine Learning
%I EECS Department, University of California, Berkeley
%D 2009
%8 December 17
%@ UCB/EECS-2009-181
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-181.html
%F Ganapathi:EECS-2009-181