On Training Robust Policies for Flow Smoothing

Kanaad Parvate

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2020-197
December 1, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-197.pdf

Flow smoothing, or the use of specially designed controllers for automated vehicles (AVs) to damp "stop-and-go" waves, has been shown to mitigate inefficiencies in human driving behavior, reducing system-wide fuel usage. Using the software framework Flow, researchers can specify traffic scenarios and train flow-smoothing controllers using deep reinforcement learning (RL). RL is a powerful technique for designing effective controllers; however, it tends to create controllers that overfit to their simulator: the resultant controllers are prone to failure when transferred to new settings.
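As a rough illustration of what such a scenario specification looks like, here is a minimal sketch following the ring-road tutorial distributed with Flow; the module paths and parameter names below reflect that tutorial and may differ across versions of the library:

    # Minimal sketch of a ring-road flow-smoothing scenario in Flow.
    # Module paths and names follow Flow's ring-road tutorial and may
    # differ across versions of the library.
    from flow.networks.ring import RingNetwork, ADDITIONAL_NET_PARAMS
    from flow.envs.ring.wave_attenuation import (WaveAttenuationPOEnv,
                                                 ADDITIONAL_ENV_PARAMS)
    from flow.core.params import (SumoParams, EnvParams, NetParams,
                                  VehicleParams, InitialConfig)
    from flow.controllers import IDMController, RLController, ContinuousRouter

    vehicles = VehicleParams()
    # 21 human drivers governed by the Intelligent Driver Model (IDM).
    vehicles.add("human",
                 acceleration_controller=(IDMController, {}),
                 routing_controller=(ContinuousRouter, {}),
                 num_vehicles=21)
    # One automated vehicle whose acceleration is chosen by the RL policy.
    vehicles.add("rl",
                 acceleration_controller=(RLController, {}),
                 routing_controller=(ContinuousRouter, {}),
                 num_vehicles=1)

    flow_params = dict(
        exp_tag="ring_flow_smoothing",
        env_name=WaveAttenuationPOEnv,   # partially observed wave-damping env
        network=RingNetwork,
        simulator="traci",               # SUMO via its TraCI interface
        sim=SumoParams(sim_step=0.1, render=False),
        env=EnvParams(horizon=3000,
                      additional_params=ADDITIONAL_ENV_PARAMS.copy()),
        net=NetParams(additional_params=ADDITIONAL_NET_PARAMS.copy()),
        veh=vehicles,
        initial=InitialConfig(),
    )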

We first present D-MALT: Diverse Multi-Adversarial Learning for Transfer. We explore the hypothesis that perturbing the dynamics of an environment with a diverse pool of adversaries can combat this failure by ensuring that the controller sees an informative set of possible perturbations. We demonstrate that the simple technique of rewarding each adversary for pushing the controller's reward into a different range is sufficient to construct a set of highly diverse adversaries. Adversaries trained with this diversity reward find potential controller failure modes, whereas domain randomization, a standard transfer technique, fails to find these modes as the problem dimensionality increases. We show, in a variety of environments, that training against the diverse adversaries leads to better transfer than playing against a single adversary or using domain randomization.
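Concretely, the diversity mechanism amounts to assigning each adversary its own band of controller returns and paying it for steering episodes into that band. The sketch below is an illustrative reconstruction of that idea, not the report's exact formulation; the uniform band partition and the out-of-band penalty are assumptions:

    import numpy as np

    def make_reward_bands(r_min, r_max, num_adversaries):
        # Partition the controller's achievable return [r_min, r_max] into
        # one contiguous band per adversary.
        edges = np.linspace(r_min, r_max, num_adversaries + 1)
        return list(zip(edges[:-1], edges[1:]))

    def adversary_reward(controller_return, band):
        # Pay an adversary for steering the controller's episode return into
        # its assigned band; otherwise penalize by the distance to the band.
        low, high = band
        if low <= controller_return <= high:
            return 1.0
        return -min(abs(controller_return - low),
                    abs(controller_return - high))

Because each adversary is paid for a different range of controller performance, the pool collectively spans perturbations from benign to severe, which is what exposes the controller to an informative set of failure modes during training.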

Secondly, we detail a set of transfer tests and evaluations designed to assess the effectiveness of flow-smoothing controllers across a variety of traffic scenarios. We present two such examples: varying the penetration rate of AVs in the system and varying the aggressiveness of human drivers. As new flow-smoothing controllers are developed in Flow, we expect D-MALT and other techniques for building robustness to be paramount for successful transfer to physical settings. Analyzing the transfer performance of these controllers will be crucial to their ultimate deployment.
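As a sketch of how such an evaluation grid might be set up (the specific penetration rates, and the choice of the IDM desired time headway T and maximum acceleration a as the "aggressiveness" knobs, are illustrative assumptions):

    from itertools import product

    # Candidate AV penetration rates (fraction of vehicles that are automated).
    penetration_rates = [0.025, 0.05, 0.10, 0.20]

    # Human-driver aggressiveness, expressed here as IDM parameters: a shorter
    # desired time headway T and a larger maximum acceleration a correspond to
    # more aggressive car-following behavior.
    idm_settings = [
        {"T": 1.5, "a": 1.0},   # nominal
        {"T": 1.0, "a": 1.5},   # aggressive
        {"T": 0.5, "a": 2.0},   # very aggressive
    ]

    # The cross product of the two axes yields the battery of transfer tests;
    # each entry would rebuild the scenario and roll out the trained policy.
    transfer_tests = [{"penetration": p, "idm_params": idm}
                      for p, idm in product(penetration_rates, idm_settings)]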

Advisor: Alexandre Bayen


BibTeX citation:

@mastersthesis{Parvate:EECS-2020-197,
    Author = {Parvate, Kanaad},
    Editor = {Bayen, Alexandre and Abbeel, Pieter},
    Title = {On Training Robust Policies for Flow Smoothing},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Dec},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-197.html},
    Number = {UCB/EECS-2020-197},
    Abstract = {Flow smoothing, or the use of specially designed controllers for automated vehicles (AVs) to damp ``stop-and-go'' waves, has been shown to mitigate inefficiencies in human driving behavior, reducing system-wide fuel usage. Using the software framework Flow, researchers can specify traffic scenarios and train flow-smoothing controllers using deep reinforcement learning (RL). RL is a powerful technique for designing effective controllers; however, it tends to create controllers that overfit to their simulator: the resultant controllers are prone to failure when transferred to new settings.

We first present D-MALT: Diverse Multi-Adversarial Learning for Transfer. We explore the hypothesis that perturbing the dynamics of an environment with a diverse pool of adversaries can combat this failure by ensuring that the controller sees an informative set of possible perturbations. We demonstrate that the simple technique of rewarding each adversary for pushing the controller's reward into a different range is sufficient to construct a set of highly diverse adversaries. Adversaries trained with this diversity reward find potential controller failure modes, whereas domain randomization, a standard transfer technique, fails to find these modes as the problem dimensionality increases. We show, in a variety of environments, that training against the diverse adversaries leads to better transfer than playing against a single adversary or using domain randomization.

Secondly, we detail a set of transfer tests and evaluations designed to assess the effectiveness of flow-smoothing controllers across a variety of traffic scenarios. We present two such examples: varying the penetration rate of AVs in the system and varying the aggressiveness of human drivers. As new flow-smoothing controllers are developed in Flow, we expect D-MALT and other techniques for building robustness to be paramount for successful transfer to physical settings. Analyzing the transfer performance of these controllers will be crucial to their ultimate deployment.}
}

EndNote citation:

%0 Thesis
%A Parvate, Kanaad
%E Bayen, Alexandre
%E Abbeel, Pieter
%T On Training Robust Policies for Flow Smoothing
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 1
%@ UCB/EECS-2020-197
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-197.html
%F Parvate:EECS-2020-197