Adapting with Latent Variables in Model-Based Reinforcement Learning for Quadcopter Flight
Rachel Li
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-77
May 28, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-77.pdf
Real-world robots must adapt to a wide variety of underlying dynamics. For example, an autonomous delivery drone needs to fly with different payloads or in environmental conditions that modify the physics of flight, and a ground robot may encounter varying terrains during its runtime. This paper focuses on developing a single sample-efficient policy that adapts to time-varying dynamics, applied to a simulated quadcopter carrying a payload of varying weight and to a real mini-quadcopter carrying a payload hanging from a string of varying length. Building on the sample-efficient PETS algorithm, our approach learns a dynamics model from data together with a context variable that represents a range of dynamics. At test time, we infer the context that best explains recent data. We evaluate this method on both a simulated quadcopter and a real quadcopter, the Ryze Tello. In both settings, we demonstrate that our method adapts to changing dynamics better than traditional model-based techniques. Supplemental materials and videos can be found at our website: https://sites.google.com/view/meta-rl-for-flight.
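The core adaptation step described in the abstract is simple to state: given a dynamics model conditioned on a latent context, find the context that best explains a window of recently observed transitions. The sketch below illustrates that inference step in PyTorch. It is a minimal, hypothetical example, not the thesis's actual implementation (which builds on PETS with probabilistic ensembles); the names infer_context and dynamics, and the plain gradient-descent procedure on one-step prediction error, are illustrative assumptions.

import torch

def infer_context(dynamics, transitions, z_dim=8, steps=100, lr=1e-2):
    """Find the latent context z that best explains recent transitions.

    dynamics:    callable (s, a, z) -> predicted next state (hypothetical)
    transitions: list of (s, a, s_next) tensors gathered at test time
    """
    # Start from a neutral context and optimize it to minimize
    # one-step prediction error on the recent data.
    z = torch.zeros(z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(((dynamics(s, a, z) - s_next) ** 2).mean()
                   for s, a, s_next in transitions)
        loss.backward()
        opt.step()
    return z.detach()

In an online setting, this inference would be re-run as new transitions arrive, so the planner always conditions the dynamics model on the context that currently fits the data.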
Advisor: Sergey Levine
BibTeX citation:
@mastersthesis{Li:EECS-2020-77,
    Author = {Li, Rachel},
    Title = {Adapting with Latent Variables in Model-Based Reinforcement Learning for Quadcopter Flight},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-77.html},
    Number = {UCB/EECS-2020-77}
}
EndNote citation:
%0 Thesis
%A Li, Rachel
%T Adapting with Latent Variables in Model-Based Reinforcement Learning for Quadcopter Flight
%I EECS Department, University of California, Berkeley
%D 2020
%8 May 28
%@ UCB/EECS-2020-77
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-77.html
%F Li:EECS-2020-77