Evaluating the Security of Machine Learning Algorithms

Marco Antonio Barreno

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2008-63
May 20, 2008

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-63.pdf

Two far-reaching trends in computing have grown in significance in recent years. First, statistical machine learning has entered the mainstream as a broadly useful tool set for building applications. Second, the need to protect systems against malicious adversaries continues to increase across computing applications. The growing intersection of these trends compels us to investigate how well machine learning performs under adversarial conditions. When a learning algorithm succeeds under such conditions, it is an algorithm for secure learning. The crucial task is to evaluate the resilience of learning systems and determine whether they satisfy requirements for secure learning. In this thesis, we show that the space of attacks against machine learning has a structure that we can use to build secure learning systems.

This thesis makes three high-level contributions. First, we develop a framework for analyzing attacks against machine learning systems. We present a taxonomy that describes the space of attacks against learning systems, and we model such attacks as a cost-sensitive game between the attacker and the defender. We survey attacks in the literature and describe them in terms of our taxonomy. Second, we develop two concrete attacks against a popular machine learning spam filter and present experimental results confirming their effectiveness. These attacks demonstrate that real systems using machine learning are vulnerable to compromise. Third, we explore defenses: we discuss defense strategies at a high level within our taxonomy, and we present a multi-level defense against attacks in the domain of virus detection. Using both global and local information, our virus defense successfully captures many viruses designed to evade detection. Our framework, exploration of attacks, and discussion of defenses provide a strong foundation for constructing secure learning systems.
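To make the taxonomy and the cost-sensitive game concrete, the sketch below is a toy illustration, not the thesis's actual attacks or the SpamBayes filter it studies. It encodes the three attack axes from the authors' earlier taxonomy paper ("Can Machine Learning Be Secure?", ASIACCS 2006) as Python enums and stages a causative availability attack against a hypothetical naive-Bayes-style spam filter; the token sets, smoothing choices, and function names are illustrative assumptions.

    import math
    from collections import Counter
    from enum import Enum

    # Axes of the attack taxonomy (after Barreno et al., ASIACCS 2006;
    # the thesis's exact formalism may differ).
    class Influence(Enum):
        CAUSATIVE = "causative"      # attacker influences training data
        EXPLORATORY = "exploratory"  # attacker only probes the trained model

    class Violation(Enum):
        INTEGRITY = "integrity"        # false negatives: spam slips through
        AVAILABILITY = "availability"  # false positives: good mail is blocked

    class Specificity(Enum):
        TARGETED = "targeted"              # aimed at particular instances
        INDISCRIMINATE = "indiscriminate"  # degrades performance broadly

    def train(messages):
        """messages: list of (set_of_tokens, is_spam).
        Returns per-class document frequencies and class counts."""
        spam_df, ham_df = Counter(), Counter()
        n_spam = n_ham = 0
        for tokens, is_spam in messages:
            if is_spam:
                spam_df.update(tokens)
                n_spam += 1
            else:
                ham_df.update(tokens)
                n_ham += 1
        return spam_df, ham_df, n_spam, n_ham

    def log_odds(tokens, spam_df, ham_df, n_spam, n_ham):
        """Naive-Bayes-style spam score with Laplace smoothing and
        equal class priors; positive means 'classify as spam'."""
        score = 0.0
        for t in tokens:
            p_spam = (spam_df[t] + 1) / (n_spam + 2)
            p_ham = (ham_df[t] + 1) / (n_ham + 2)
            score += math.log(p_spam / p_ham)
        return score

    clean = [({"cheap", "pills"}, True), ({"meeting", "tomorrow"}, False)]

    # Causative availability attack, indiscriminate: attack emails mix
    # ordinary words into messages the filter will learn as spam, so the
    # retrained filter starts blocking legitimate mail.
    attack = (Influence.CAUSATIVE, Violation.AVAILABILITY,
              Specificity.INDISCRIMINATE)
    poison = [({"meeting", "tomorrow", "agenda"}, True)] * 5

    before = log_odds({"meeting", "tomorrow"}, *train(clean))
    after = log_odds({"meeting", "tomorrow"}, *train(clean + poison))
    print(f"before poisoning: {before:+.2f}  after: {after:+.2f}")
    # before: negative (delivered); after: positive (blocked)

Poisoning the training set with spam built from ordinary words flips the filter's score on legitimate mail from negative (delivered) to positive (blocked), which is the availability violation the abstract describes; in the game formulation, the defender's cost is this induced misclassification, purchased by the attacker at the cost of sending the poison messages.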

Advisor: Doug Tygar


BibTeX citation:

@phdthesis{Barreno:EECS-2008-63,
    Author = {Barreno, Marco Antonio},
    Title = {Evaluating the Security of Machine Learning Algorithms},
    School = {EECS Department, University of California, Berkeley},
    Year = {2008},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-63.html},
    Number = {UCB/EECS-2008-63},
}

EndNote citation:

%0 Thesis
%A Barreno, Marco Antonio
%T Evaluating the Security of Machine Learning Algorithms
%I EECS Department, University of California, Berkeley
%D 2008
%8 May 20
%@ UCB/EECS-2008-63
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-63.html
%F Barreno:EECS-2008-63