Xingyu Lu and Stas Tiomkin and Pieter Abbeel

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-56

May 23, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-56.pdf

Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision-making problems, agents trained through RL often over-specialize to their training environments and struggle to adapt to new, unseen circumstances. In this work, our primary contribution is an information-theoretic regularization objective and an annealing-based optimization method that together promote better generalization in RL agents.

Training an agent that is resilient to unseen changes in its environment helps it cope with environmental diversity: a robot trained in an ideal setting may be required to perform under more adversarial circumstances, such as added obstacles, darker lighting, and smoother or rougher surfaces; to prevail in these unseen environments, the agent must be capable of adapting to unseen dynamics.

Our work tackles the generalization problem from an information-theoretic perspective. We hypothesize that the expressiveness of deep neural networks (DNNs) may cause memorization of inputs from the training environments, preventing the agent from learning the general dynamics. We address this issue by adding communication constraints between observations and internal representations in the form of an information bottleneck (IB). We suggest an optimization scheme, based on annealing, to obtain a family of solutions parameterized by the regularization weight.
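To make these two ideas concrete, below is a minimal sketch of an IB-regularized update with an annealed regularization weight. It is our illustration, not the thesis's code: the PyTorch usage, the unit-Gaussian prior, the stand-in task loss, and the geometric beta schedule are all assumptions made for the example.

import torch
import torch.nn as nn

class IBEncoder(nn.Module):
    """Maps an observation s to a Gaussian posterior q(z|s)."""
    def __init__(self, obs_dim: int, z_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, z_dim)
        self.log_std = nn.Linear(hidden, z_dim)

    def forward(self, s):
        h = self.body(s)
        return self.mu(h), self.log_std(h).clamp(-5.0, 2.0)

def kl_to_unit_gaussian(mu, log_std):
    # KL(N(mu, sigma) || N(0, I)): the standard variational surrogate that
    # upper-bounds the mutual information I(S; Z) under a fixed prior.
    return 0.5 * (mu.pow(2) + (2 * log_std).exp() - 2 * log_std - 1).sum(-1)

encoder = IBEncoder(obs_dim=8, z_dim=4)
optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4)

beta = 1e-4  # start with a loose bottleneck
for stage in range(5):  # each annealing stage continues from the last
    for _ in range(100):  # placeholder for the inner RL update loop
        s = torch.randn(32, 8)  # stand-in batch of observations
        mu, log_std = encoder(s)
        z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterized sample
        task_loss = z.pow(2).mean()  # stand-in for the actual RL loss
        loss = task_loss + beta * kl_to_unit_gaussian(mu, log_std).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    beta *= 10.0  # tighten the bottleneck; each stage contributes one
                  # solution to the family parameterized by beta

In an actual agent, the task loss would be a policy-gradient or value loss computed from z rather than from s, so the KL penalty bounds how much observation information the policy can consume; annealing beta upward avoids having to solve the hardest joint optimization from scratch.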

In the first part of the thesis we focus on various maze environments, which have simple dynamics and can be naturally randomized by altering their wall and goal placements. Through experiments in these environments, we study 1) the learning behavior of an agent trained directly with the IB objective; 2) the benefits of the proposed annealing scheme; 3) the characteristics of the IB's output representation; 4) the generalization benefits of a tight IB. In the second part of the thesis we turn our attention to control environments with simulated robotic agents. We demonstrate the generalization benefits of the IB over unseen goals and dynamics, comparing it with other regularization methods and with state-of-the-art deep RL baselines.

To the best of our knowledge, our work is the first to study the benefits of the information bottleneck in deep RL for general domain randomization. We tackle the resulting joint optimization problem by proposing an annealing scheme, and we study a variety of domain randomization settings, including varying maze layouts, introducing unseen goals, and changing robot dynamics.

Furthermore, this work opens the door to the systematic study of generalization from training conditions to substantially different testing settings, building on the established connections between information theory and machine learning, in particular through the lens of state compression and estimation.

Advisor: Pieter Abbeel


BibTeX citation:

@mastersthesis{Lu:EECS-2020-56,
    Author= {Lu, Xingyu and Tiomkin, Stas and Abbeel, Pieter},
    Title= {Generalization via Information Bottleneck in Deep Reinforcement Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2020},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-56.html},
    Number= {UCB/EECS-2020-56},
    Abstract= {Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision-making problems, agents trained through RL often over-specialize to their training environments and struggle to adapt to new, unseen circumstances. In this work, our primary contribution is an information-theoretic regularization objective and an annealing-based optimization method that together promote better generalization in RL agents.

Training an agent that is resilient to unseen changes in its environment helps it cope with environmental diversity: a robot trained in an ideal setting may be required to perform under more adversarial circumstances, such as added obstacles, darker lighting, and smoother or rougher surfaces; to prevail in these unseen environments, the agent must be capable of adapting to unseen dynamics.

Our work tackles the generalization problem from an information-theoretic perspective. We hypothesize that the expressiveness of deep neural networks (DNNs) may cause memorization of inputs from the training environments, preventing the agent from learning the general dynamics. We address this issue by adding communication constraints between observations and internal representations in the form of an information bottleneck (IB). We suggest an optimization scheme, based on annealing, to obtain a family of solutions parameterized by the regularization weight.

In the first part of the thesis we focus on various maze environments, which have simple dynamics and can be naturally randomized by altering their wall and goal placements. Through experiments in these environments, we study 1) the learning behavior of an agent trained directly with the IB objective; 2) the benefits of the proposed annealing scheme; 3) the characteristics of the IB's output representation; 4) the generalization benefits of a tight IB. In the second part of the thesis we turn our attention to control environments with simulated robotic agents. We demonstrate the generalization benefits of the IB over unseen goals and dynamics, comparing it with other regularization methods and with state-of-the-art deep RL baselines.

To the best of our knowledge, our work is the first to study the benefits of the information bottleneck in deep RL for general domain randomization. We tackle the resulting joint optimization problem by proposing an annealing scheme, and we study a variety of domain randomization settings, including varying maze layouts, introducing unseen goals, and changing robot dynamics.

Furthermore, this work opens the door to the systematic study of generalization from training conditions to substantially different testing settings, building on the established connections between information theory and machine learning, in particular through the lens of state compression and estimation.},
}

EndNote citation:

%0 Thesis
%A Lu, Xingyu 
%A Tiomkin, Stas 
%A Abbeel, Pieter 
%T Generalization via Information Bottleneck in Deep Reinforcement Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 May 23
%@ UCB/EECS-2020-56
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-56.html
%F Lu:EECS-2020-56