Detecting Backdoored Neural Networks with Structured Adversarial Attacks
Charles Yang
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2021-90
May 14, 2021
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-90.pdf
Deep learning models are becoming increasingly enmeshed in our digital infrastructure, unlocking our phones and powering our social media feeds. It is critical to understand the security vulnerabilities of these black-box models before they are deployed in safety-critical applications, such as self-driving cars and biometric authentication. In particular, recent literature has demonstrated the ability to install backdoors in deep learning models. This thesis uses structured optimization constraints to find adversarial attacks in order to determine whether a model is backdoored. We use the TrojAI dataset to benchmark our approach and achieve an AUC of 0.83 on the challenging round 3 dataset.
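The sketch below illustrates the general idea of structured adversarial optimization for backdoor detection; it follows trigger-reconstruction detectors in the style of Neural Cleanse rather than the thesis's exact algorithm, and all names (model, loader, target_class, lambda_l1) and hyperparameters are illustrative assumptions. A small mask and pattern are optimized to flip clean inputs to a candidate target class, with an L1 constraint keeping the mask sparse; a class that admits an anomalously small trigger is evidence of a backdoor.

    # Minimal sketch (assumed names/hyperparameters, not the thesis's exact method):
    # reconstruct a sparse trigger for a candidate target class via constrained
    # adversarial optimization, then compare trigger sizes across classes.
    import torch
    import torch.nn.functional as F

    def reconstruct_trigger(model, loader, target_class, image_shape=(3, 224, 224),
                            steps=200, lr=0.1, lambda_l1=1e-3, device="cpu"):
        """Optimize a mask + pattern that flips clean inputs to `target_class`.

        A backdoored class tends to admit an unusually small (low L1-norm) mask,
        so an outlier mask norm for one class can flag a trojaned model.
        """
        model.eval().to(device)
        # Unconstrained parameters; a sigmoid keeps mask and pattern in [0, 1].
        mask_param = torch.zeros(1, *image_shape[1:], device=device, requires_grad=True)
        pattern_param = torch.zeros(*image_shape, device=device, requires_grad=True)
        optimizer = torch.optim.Adam([mask_param, pattern_param], lr=lr)

        for _ in range(steps):
            for images, _labels in loader:
                images = images.to(device)
                mask = torch.sigmoid(mask_param)
                pattern = torch.sigmoid(pattern_param)
                # Blend the candidate trigger into each clean image.
                patched = (1 - mask) * images + mask * pattern
                logits = model(patched)
                target = torch.full((images.size(0),), target_class,
                                    dtype=torch.long, device=device)
                # Misclassification loss plus an L1 sparsity constraint on the mask.
                loss = F.cross_entropy(logits, target) + lambda_l1 * mask.abs().sum()
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

        return torch.sigmoid(mask_param).detach(), torch.sigmoid(pattern_param).detach()

In practice one would run this per candidate target class and apply an outlier test (e.g., median absolute deviation) to the resulting mask norms to decide whether the model is backdoored.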
Advisor: Michael William Mahoney
BibTeX citation:
@mastersthesis{Yang:EECS-2021-90,
    Author = {Yang, Charles},
    Title = {Detecting Backdoored Neural Networks with Structured Adversarial Attacks},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-90.html},
    Number = {UCB/EECS-2021-90},
    Abstract = {Deep learning models are becoming increasingly enmeshed in our digital infrastructure, unlocking our phones and powering our social media feeds. It is critical to understand the security vulnerabilities of these black-box models before they are deployed in safety-critical applications, such as self-driving cars and biometric authentication. In particular, recent literature has demonstrated the ability to install backdoors in deep learning models. This thesis uses structured optimization constraints to find adversarial attacks in order to determine whether a model is backdoored. We use the TrojAI dataset to benchmark our approach and achieve an AUC of 0.83 on the challenging round 3 dataset.},
}
EndNote citation:
%0 Thesis
%A Yang, Charles
%T Detecting Backdoored Neural Networks with Structured Adversarial Attacks
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 14
%@ UCB/EECS-2021-90
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-90.html
%F Yang:EECS-2021-90