Evaluating the Security of Machine Learning Algorithms
Marco Antonio Barreno
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2008-63
May 20, 2008
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-63.pdf
Two far-reaching trends in computing have grown in significance in recent years. First, statistical machine learning has entered the mainstream as a broadly useful tool set for building applications. Second, the need to protect systems against malicious adversaries continues to increase across computing applications. The growing intersection of these trends compels us to investigate how well machine learning performs under adversarial conditions. When a learning algorithm succeeds under adversarial conditions, it is an algorithm for "secure learning." The crucial task is to evaluate the resilience of learning systems and determine whether they satisfy requirements for secure learning. In this thesis, we show that the space of attacks against machine learning has a structure that we can use to build secure learning systems.

This thesis makes three high-level contributions. First, we develop a framework for analyzing attacks against machine learning systems. We present a taxonomy that describes the space of attacks against learning systems, and we model such attacks as a cost-sensitive game between the attacker and the defender. We survey attacks in the literature and describe them in terms of our taxonomy. Second, we develop two concrete attacks against a popular machine learning spam filter and present experimental results confirming their effectiveness. These attacks demonstrate that real systems using machine learning are vulnerable to compromise. Third, we explore defenses, both through a high-level discussion of defenses within our taxonomy and through a multi-level defense against attacks in the domain of virus detection. Using both global and local information, our virus defense successfully captures many viruses designed to evade detection. Our framework, exploration of attacks, and discussion of defenses provide a strong foundation for constructing secure learning systems.
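The abstract does not spell out the taxonomy's axes. As a hedged illustration only, the sketch below assumes the three axes from the authors' earlier published framework (influence, security violation, and specificity); the enum names and the example classification are illustrative, not drawn from this page.

    from enum import Enum

    # Illustrative sketch of a three-axis attack taxonomy, assuming the
    # axes from Barreno et al.'s published framework. Descriptions are
    # paraphrases for illustration, not quotations from the thesis.

    class Influence(Enum):
        CAUSATIVE = "attacker alters the training data"
        EXPLORATORY = "attacker probes a fixed, already-trained model"

    class Violation(Enum):
        INTEGRITY = "false negatives: hostile input slips past the detector"
        AVAILABILITY = "false positives: benign input is blocked"

    class Specificity(Enum):
        TARGETED = "attack focuses on one particular instance"
        INDISCRIMINATE = "attack degrades a broad class of instances"

    def classify_attack(influence: Influence, violation: Violation,
                        specificity: Specificity) -> str:
        """Place an attack in the taxonomy by its three coordinates."""
        return f"{influence.name} / {violation.name} / {specificity.name}"

    # Example: poisoning training data to cause false positives on many
    # messages would be CAUSATIVE / AVAILABILITY / INDISCRIMINATE.
    print(classify_attack(Influence.CAUSATIVE, Violation.AVAILABILITY,
                          Specificity.INDISCRIMINATE))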
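The cost-sensitive game between attacker and defender can be pictured with a toy payoff matrix. This is a minimal sketch under made-up costs, not the thesis's formal model: the defender picks a filtering threshold, the attacker picks an evasion effort, and the defender plays minimax over the resulting costs.

    # Toy cost-sensitive game (all numbers invented for illustration).
    # cost[d][a] is the defender's cost when the defender plays threshold d
    # and the attacker plays effort a; e.g., strict filtering incurs some
    # false-positive cost even when there is no attack.

    thresholds = ["strict", "moderate", "lenient"]   # defender's moves
    efforts = ["none", "low", "high"]                # attacker's moves

    cost = {
        "strict":   {"none": 3, "low": 3, "high": 4},
        "moderate": {"none": 1, "low": 2, "high": 6},
        "lenient":  {"none": 0, "low": 5, "high": 9},
    }

    # Minimax defense: minimize the worst-case cost over attacker responses.
    best = min(thresholds, key=lambda d: max(cost[d][a] for a in efforts))
    worst_case = max(cost[best][a] for a in efforts)
    print(f"minimax defense: {best} (worst-case cost {worst_case})")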
Advisor: Doug Tygar
BibTeX citation:
@phdthesis{Barreno:EECS-2008-63,
    Author = {Barreno, Marco Antonio},
    Title = {Evaluating the Security of Machine Learning Algorithms},
    School = {EECS Department, University of California, Berkeley},
    Year = {2008},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-63.html},
    Number = {UCB/EECS-2008-63}
}
EndNote citation:
%0 Thesis
%A Barreno, Marco Antonio
%T Evaluating the Security of Machine Learning Algorithms
%I EECS Department, University of California, Berkeley
%D 2008
%8 May 20
%@ UCB/EECS-2008-63
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-63.html
%F Barreno:EECS-2008-63