Evan Sparks

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2014-122

May 20, 2014

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-122.pdf

Model search is a crucial component of data analytics pipelines, and this laborious process of choosing an appropriate learning algorithm and tuning its parameters remains a major obstacle to the widespread adoption of machine learning techniques. Recent efforts to automate this process have treated model training itself as a black box, which limits their effectiveness on large-scale problems. In this work, we build upon these recent efforts. By inspecting the inner workings of model training and framing model search as a bandit-like resource allocation problem, we present an integrated distributed system for model search that targets large-scale learning applications. We study the impact of our approach on a variety of datasets and demonstrate that our system, named GHOSTFACE, solves the model search problem with accuracy comparable to that of basic strategies but an order of magnitude faster. We further demonstrate that GHOSTFACE can scale to models trained on terabytes of data across hundreds of machines.
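
The abstract does not spell out the allocation strategy, so the following is only a minimal sketch of what a bandit-like resource allocation over candidate models can look like. It uses successive halving, a generic bandit-style scheme that is not claimed to be the algorithm in the report: every candidate gets a small training budget, the worse half is dropped, and the survivors continue training from where they stopped. The toy loss, the learning-rate grid, and all function names are assumptions made for illustration.

```python
def train(w, lr, steps):
    """Continue gradient descent on the toy loss (w - 3)^2 for `steps` updates."""
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)
    return w


def loss(w):
    """Toy validation loss; a stand-in for real model quality."""
    return (w - 3.0) ** 2


def successive_halving(learning_rates, steps_per_round=5):
    """Return (best learning rate, total training steps spent).

    Hypothetical helper: each round trains every live candidate a little
    further, then keeps only the better half (rounded up).
    """
    weights = {lr: 0.0 for lr in learning_rates}  # partially trained models
    live = list(learning_rates)
    spent = 0
    while len(live) > 1:
        for lr in live:
            weights[lr] = train(weights[lr], lr, steps_per_round)
            spent += steps_per_round
        live.sort(key=lambda lr: loss(weights[lr]))
        live = live[: (len(live) + 1) // 2]
    return live[0], spent


if __name__ == "__main__":
    best, spent = successive_halving([0.001, 0.01, 0.1, 0.5, 1.1])
    print(best, spent)
```

The point of the sketch is that poor configurations stop consuming training budget early, which is only possible when the search procedure can pause and resume training rather than treating it as an opaque call.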

Advisor: Michael Franklin


BibTeX citation:

@mastersthesis{Sparks:EECS-2014-122,
    Author= {Sparks, Evan},
    Title= {Scalable Automated Model Search},
    School= {EECS Department, University of California, Berkeley},
    Year= {2014},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-122.html},
    Number= {UCB/EECS-2014-122},
    Abstract= {Model search is a crucial component of data analytics pipelines, and this laborious process of choosing an appropriate learning algorithm and tuning its parameters remains a major obstacle to the widespread adoption of machine learning techniques. Recent efforts to automate this process have treated model training itself as a black box, which limits their effectiveness on large-scale problems. In this work, we build upon these recent efforts. By inspecting the inner workings of model training and framing model search as a bandit-like resource allocation problem, we present an integrated distributed system for model search that targets large-scale learning applications. We study the impact of our approach on a variety of datasets and demonstrate that our system, named GHOSTFACE, solves the model search problem with accuracy comparable to that of basic strategies but an order of magnitude faster. We further demonstrate that GHOSTFACE can scale to models trained on terabytes of data across hundreds of machines.},
}

EndNote citation:

%0 Thesis
%A Sparks, Evan 
%T Scalable Automated Model Search
%I EECS Department, University of California, Berkeley
%D 2014
%8 May 20
%@ UCB/EECS-2014-122
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-122.html
%F Sparks:EECS-2014-122