Adversarial Swarm Defense with Decentralized Swarms

Jason Zhou

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-81

May 14, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-81.pdf

The rapid proliferation of unmanned aerial vehicles (UAVs) in both commercial and consumer applications in recent years raises serious public security concerns, as the versatility of UAVs allows the platform to be easily adapted for malicious activities by adversarial actors. While interdiction methods exist, they are either indiscriminate or unable to disable a large swarm of drones. Recent work in wireless communications, microelectromechanical systems, fabrication, and multi-agent reinforcement learning makes UAV-based counter-UAV systems increasingly feasible: that is, defense systems consisting of autonomous drone swarms interdicting intruder drone swarms. Such a system is desirable in that it can conceivably produce a highly versatile, adaptable, and targeted response while still retaining an aerial presence.

We progress toward such a system through deep reinforcement learning in two distinct domains, which can be broadly described as 1-vs-1 and N-vs-N autonomous adversarial drone defense. In the former scenario, we learn reasonable, emergent drone dogfighting policies using soft actor-critic through 1-vs-1 competitive self-play in a novel, high-fidelity drone combat and interdiction simulation environment, and demonstrate continual, successful learning throughout multiple generations. In the latter case, we formalize permutation and quantity invariances in learned state representation methods for downstream swarm control policies. By using neural network architectures that respect these invariances when embedding messages received from, and observations made of, members of homogeneous swarms (whether friendly or adversarial), we enable parameter-sharing proximal policy optimization to learn effective decentralized perimeter defense policies against adversarial drone swarms, while models using conventional state representation techniques fail to converge to any effective policies. Two such possible embedding architectures are presented: adversarial mean embeddings and adversarial attentional embeddings.
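The mean-embedding idea above can be illustrated with a minimal sketch: embed each neighbor's observation with a shared network, then average, so the resulting fixed-size summary is invariant to both the ordering and the number of swarm members. This is a generic illustration in numpy, not the thesis's actual architecture; the dimensions, single-layer network, and tanh nonlinearity are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared embedding weights (assumed sizes: per-neighbor obs dim 4 -> embed dim 8).
W = rng.standard_normal((4, 8))
b = rng.standard_normal(8)

def embed(obs):
    """Embed one neighbor observation with the shared network."""
    return np.tanh(obs @ W + b)

def mean_embedding(neighbor_obs):
    """Permutation- and quantity-invariant summary of a variable-size
    set of neighbor observations: embed each, then average."""
    if len(neighbor_obs) == 0:
        return np.zeros(8)  # empty-swarm case: a fixed zero summary
    return np.mean([embed(o) for o in neighbor_obs], axis=0)

# Three neighbors; any ordering yields the same fixed-size summary,
# which a downstream shared policy can consume alongside the ego state.
obs = [rng.standard_normal(4) for _ in range(3)]
z1 = mean_embedding(obs)
z2 = mean_embedding(obs[::-1])
assert np.allclose(z1, z2)   # permutation invariance
assert z1.shape == (8,)      # fixed size regardless of swarm count
```

An attentional variant would replace the uniform average with learned attention weights over the same per-neighbor embeddings, preserving the invariances while letting the policy focus on the most relevant swarm members.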

Advisor: Kristofer Pister


BibTeX citation:

@mastersthesis{Zhou:EECS-2021-81,
    Author= {Zhou, Jason},
    Title= {Adversarial Swarm Defense with Decentralized Swarms},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-81.html},
    Number= {UCB/EECS-2021-81},
    Abstract= {The rapid proliferation of unmanned aerial vehicles (UAVs) in both commercial and consumer applications in recent years raises serious public security concerns, as the versatility of UAVs allows the platform to be easily adapted for malicious activities by adversarial actors. While interdiction methods exist, they are either indiscriminate or unable to disable a large swarm of drones. Recent work in wireless communications, microelectromechanical systems, fabrication, and multi-agent reinforcement learning makes UAV-based counter-UAV systems increasingly feasible: that is, defense systems consisting of autonomous drone swarms interdicting intruder drone swarms. Such a system is desirable in that it can conceivably produce a highly versatile, adaptable, and targeted response while still retaining an aerial presence.

We progress toward such a system through deep reinforcement learning in two distinct domains, which can be broadly described as 1-vs-1 and N-vs-N autonomous adversarial drone defense. In the former scenario, we learn reasonable, emergent drone dogfighting policies using soft actor-critic through 1-vs-1 competitive self-play in a novel, high-fidelity drone combat and interdiction simulation environment, and demonstrate continual, successful learning throughout multiple generations. In the latter case, we formalize permutation and quantity invariances in learned state representation methods for downstream swarm control policies. By using neural network architectures that respect these invariances when embedding messages received from, and observations made of, members of homogeneous swarms (whether friendly or adversarial), we enable parameter-sharing proximal policy optimization to learn effective decentralized perimeter defense policies against adversarial drone swarms, while models using conventional state representation techniques fail to converge to any effective policies. Two such possible embedding architectures are presented: adversarial mean embeddings and adversarial attentional embeddings.},
}

EndNote citation:

%0 Thesis
%A Zhou, Jason 
%T Adversarial Swarm Defense with Decentralized Swarms
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 14
%@ UCB/EECS-2021-81
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-81.html
%F Zhou:EECS-2021-81