Neurotoxin: Durable Backdoors in Federated Learning

Linyue Song, Zhengming Zhang, Ashwinee Panda, Yaoqing Yang, Michael Mahoney, Joseph Gonzalez and Prateek Mittal

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2022-89
May 13, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-89.pdf

Due to their decentralized nature, federated learning (FL) systems have an inherent vulnerability during their training to adversarial backdoor attacks. In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model’s outputs can be fixed to a given target for certain inputs. (As a simple example, if a user types “people from New York” into a mobile keyboard app that uses a backdoored next word prediction model, then the model could auto-complete the sentence to “people from New York are rude”). Prior work has shown that backdoors can be inserted into FL models, but these backdoors are often not durable, i.e., they do not remain in the model after the attacker stops uploading poisoned updates. Thus, since training typically continues progressively in production FL systems, an inserted backdoor may not survive until deployment. Here, we propose Neurotoxin, a simple one-line modification to existing backdoor attacks that acts by attacking parameters that are changed less in magnitude during training. We conduct an exhaustive evaluation across ten natural language processing and computer vision tasks, and we find that we can double the durability of state-of-the-art backdoors.
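To make the "attack the least-updated parameters" idea concrete, below is a minimal sketch of the masking step the abstract alludes to. It is our illustration rather than the authors' released code; the function name, the top_ratio parameter, and the PyTorch framing are assumptions. The attacker observes the previous round's aggregate (benign) update, zeroes out its own poisoned gradient on the coordinates that benign training moves the most, and uploads the masked update.

    # Illustrative sketch only -- not the authors' reference implementation.
    # Assumes the attacker has seen the previous round's aggregate (benign)
    # update and has computed its own poisoned gradient, both flattened
    # tensors of the same length.
    import torch

    def neurotoxin_mask(poison_grad: torch.Tensor,
                        benign_update: torch.Tensor,
                        top_ratio: float = 0.05) -> torch.Tensor:
        """Zero out the coordinates that benign training changes the most,
        so the poisoned update lives in rarely-updated parameters."""
        k = int(top_ratio * benign_update.numel())
        # Indices of the k coordinates with the largest benign-update magnitude.
        _, top_idx = torch.topk(benign_update.abs(), k)
        mask = torch.ones_like(poison_grad)
        mask[top_idx] = 0.0          # avoid the heavily-updated coordinates
        return poison_grad * mask    # the "one-line" projection step

Because the surviving coordinates are the ones benign clients rarely move, subsequent rounds of honest training overwrite the backdoor more slowly, which is the durability effect the report measures.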


BibTeX citation:

@mastersthesis{Song:EECS-2022-89,
    Author = {Song, Linyue and Zhang, Zhengming and Panda, Ashwinee and Yang, Yaoqing and Mahoney, Michael and Gonzalez, Joseph and Mittal, Prateek},
    Editor = {Ramchandran, Kannan and Friedland, Gerald},
    Title = {Neurotoxin: Durable Backdoors in Federated Learning},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-89.html},
    Number = {UCB/EECS-2022-89},
    Abstract = {Due to their decentralized nature, federated learning (FL) systems have an inherent vulnerability during their training to adversarial backdoor attacks. In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model’s outputs can be fixed to a given target for certain inputs. (As a simple example, if a user types “people from New York” into a mobile keyboard app that uses a backdoored next word prediction model, then the model could auto-complete the sentence to “people from New York are rude”). Prior work has shown that backdoors can be inserted into FL models, but these backdoors are often not durable, i.e., they do not remain in the model after the attacker stops uploading poisoned updates. Thus, since training typically continues progressively in production FL systems, an inserted backdoor may not survive until deployment. Here, we propose Neurotoxin, a simple one-line modification to existing backdoor attacks that acts by attacking parameters that are changed less in magnitude during training. We conduct an exhaustive evaluation across ten natural language processing and computer vision tasks, and we find that we can double the durability of state-of-the-art backdoors.}
}

EndNote citation:

%0 Thesis
%A Song, Linyue
%A Zhang, Zhengming
%A Panda, Ashwinee
%A Yang, Yaoqing
%A Mahoney, Michael
%A Gonzalez, Joseph
%A Mittal, Prateek
%E Ramchandran, Kannan
%E Friedland, Gerald
%T Neurotoxin: Durable Backdoors in Federated Learning
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 13
%@ UCB/EECS-2022-89
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-89.html
%F Song:EECS-2022-89