Kathy Jang

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-252

December 8, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-252.pdf

Using deep reinforcement learning, we successfully train a set of two autonomous vehicles to lead a fleet of vehicles onto a roundabout and then transfer this policy from simulation to a scaled city without fine-tuning. We use Flow, a library for deep reinforcement learning in microsimulators, to train two policies: (1) a policy with noise injected into the state and action spaces, and (2) a policy without any injected noise. In simulation, the autonomous vehicles learn an emergent metering behavior under both policies that allows smooth merging. We then transfer these policies directly, without any tuning, to the University of Delaware's Scaled Smart City (UDSSC), a 1:25-scale testbed for connected and automated vehicles. We characterize the performance of the transferred policies by how faithfully the ramp-metering behavior is reproduced in UDSSC. We show that the noise-free policy results in severe slowdowns and only occasionally exhibits acceptable metering behavior. The noise-injected policy, on the other hand, consistently exhibits acceptable metering behavior, implying that the injected noise aids the zero-shot policy transfer. Finally, the transferred, noise-injected policy reduces average travel time by 5% and maximum travel time by 22% in UDSSC. Videos of the self-learning controllers can be found at https://sites.google.com/view/iccps-policy-transfer.
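As a rough illustration of the noise-injection scheme described above, the following is a minimal sketch of adding zero-mean Gaussian noise to the state and action spaces during training. It assumes a classic Gym-style environment interface with a continuous (Box) action space; the wrapper name and noise scales are hypothetical and do not reflect Flow's actual API.

```python
# Minimal sketch: Gaussian noise injection into observations and actions
# during training, for robustness under sim-to-real transfer.
# Hypothetical wrapper; not Flow's actual API.
import gym
import numpy as np

class NoisyObsActionWrapper(gym.Wrapper):
    """Adds zero-mean Gaussian noise to observations and actions."""

    def __init__(self, env, obs_noise_std=0.05, act_noise_std=0.05):
        super().__init__(env)
        self.obs_noise_std = obs_noise_std
        self.act_noise_std = act_noise_std

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        return self._perturb(obs, self.obs_noise_std)

    def step(self, action):
        # Perturb the action before it reaches the simulator ...
        noisy_action = self._perturb(action, self.act_noise_std)
        noisy_action = np.clip(noisy_action,
                               self.action_space.low, self.action_space.high)
        obs, reward, done, info = self.env.step(noisy_action)
        # ... and perturb the observation before it reaches the policy.
        return self._perturb(obs, self.obs_noise_std), reward, done, info

    @staticmethod
    def _perturb(x, std):
        return x + np.random.normal(0.0, std, size=np.shape(x))
```

Training on `NoisyObsActionWrapper(env)` rather than `env` forces the policy to tolerate the kinds of sensing and actuation errors it will encounter on a physical testbed, which is the intuition behind the noise-injected policy transferring better than the noise-free one.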

We also demonstrate a zero-shot transfer of an autonomous driving policy from simulation to UDSSC using adversarial multi-agent reinforcement learning, in which an adversary attempts to decrease the net reward by perturbing both the inputs and outputs of the autonomous vehicles during training. In simulation, we train the autonomous vehicles to coordinate with each other while crossing a roundabout in the presence of this adversary. The adversarially trained policy successfully reproduces the simulated behavior and, in terms of travel time, outperforms both a human-driven baseline and policies trained without an adversary. Finally, we demonstrate that adversarial training considerably improves the post-transfer performance of the policies compared to Gaussian noise injection.
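The adversarial setup can be pictured as a two-player zero-sum game: the adversary is rewarded with the negative of the autonomous vehicles' reward and applies bounded perturbations to their observations and actions. The sketch below is illustrative only; the function names, the perturbation budget `eps`, and the single-step structure are assumptions, not the thesis's implementation.

```python
# Minimal sketch of adversarial perturbation as a two-player zero-sum game:
# a second "adversary" agent, rewarded with the negative of the AVs' reward,
# picks bounded perturbations to the AVs' observations and actions.
# All names are illustrative, not the thesis's actual implementation.
import numpy as np

def adversarial_step(env, av_policy, adv_policy, obs, eps=0.1):
    """One environment step of the zero-sum training game."""
    # The adversary sees the same state and outputs perturbations in [-1, 1].
    delta_obs, delta_act = adv_policy(obs)
    noisy_obs = obs + eps * np.clip(delta_obs, -1.0, 1.0)

    # The AVs act on the perturbed observation; their action is perturbed too.
    action = av_policy(noisy_obs)
    noisy_action = action + eps * np.clip(delta_act, -1.0, 1.0)

    next_obs, av_reward, done, info = env.step(noisy_action)
    # Zero-sum rewards: the adversary gains exactly what the AVs lose.
    return next_obs, av_reward, -av_reward, done, info
```

Because the adversary adapts to the current policy, it tends to find the worst-case perturbations within its budget rather than random ones, which is what distinguishes this scheme from fixed Gaussian noise injection.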

Advisor: Alexandre Bayen


BibTeX citation:

@mastersthesis{Jang:EECS-2021-252,
    Author= {Jang, Kathy},
    Title= {Robust deep-reinforcement learning policies for mixed-autonomy traffic},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-252.html},
    Number= {UCB/EECS-2021-252},
    Abstract= {Using deep reinforcement learning, we successfully train a set of two autonomous vehicles to lead a fleet of vehicles onto a roundabout and then transfer this policy from simulation to a scaled city without fine-tuning. We use Flow, a library for deep reinforcement learning in microsimulators, to train two policies: (1) a policy with noise injected into the state and action spaces, and (2) a policy without any injected noise. In simulation, the autonomous vehicles learn an emergent metering behavior under both policies that allows smooth merging. We then transfer these policies directly, without any tuning, to the University of Delaware's Scaled Smart City (UDSSC), a 1:25-scale testbed for connected and automated vehicles. We characterize the performance of the transferred policies by how faithfully the ramp-metering behavior is reproduced in UDSSC. We show that the noise-free policy results in severe slowdowns and only occasionally exhibits acceptable metering behavior. The noise-injected policy, on the other hand, consistently exhibits acceptable metering behavior, implying that the injected noise aids the zero-shot policy transfer. Finally, the transferred, noise-injected policy reduces average travel time by 5% and maximum travel time by 22% in UDSSC. Videos of the self-learning controllers can be found at https://sites.google.com/view/iccps-policy-transfer.

We also demonstrate a zero-shot transfer of an autonomous driving policy from simulation to UDSSC using adversarial multi-agent reinforcement learning, in which an adversary attempts to decrease the net reward by perturbing both the inputs and outputs of the autonomous vehicles during training. In simulation, we train the autonomous vehicles to coordinate with each other while crossing a roundabout in the presence of this adversary. The adversarially trained policy successfully reproduces the simulated behavior and, in terms of travel time, outperforms both a human-driven baseline and policies trained without an adversary. Finally, we demonstrate that adversarial training considerably improves the post-transfer performance of the policies compared to Gaussian noise injection.},
}

EndNote citation:

%0 Thesis
%A Jang, Kathy 
%T Robust deep-reinforcement learning policies for mixed-autonomy traffic
%I EECS Department, University of California, Berkeley
%D 2021
%8 December 8
%@ UCB/EECS-2021-252
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-252.html
%F Jang:EECS-2021-252