Adapting with Latent Variables in Model-Based Reinforcement Learning for Quadcopter Flight
Rachel Li
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-77
May 28, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-77.pdf
Real-world robots must adapt to a wide variety of underlying dynamics. For example, an autonomous delivery drone needs to fly with different payloads or in environmental conditions that modify the physics of flight, and a ground robot may encounter varying terrains during its runtime. This paper focuses on developing a single sample-efficient policy that adapts to time-varying dynamics, applied to a simulated quadcopter carrying a payload of varying weight and to a real mini-quadcopter carrying a payload hanging from a string of varying length. Building on the sample-efficient PETS algorithm, our approach learns a dynamics model from data together with a context variable that represents a range of dynamics. At test time, we infer the context that best explains recent data. We evaluate this method on both a simulated quadcopter and a real quadcopter, the Ryze Tello. In both settings, we demonstrate that our method adapts to changing dynamics better than traditional model-based techniques. Supplemental materials and videos can be found at our website: https://sites.google.com/view/meta-rl-for-flight.
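The core adaptation step described in the abstract is simple to state: given a dynamics model conditioned on a latent context, find the context that best explains a window of recently observed transitions. The sketch below illustrates that inference step in PyTorch. It is a minimal, hypothetical example, not the thesis's actual implementation (which builds on PETS with probabilistic ensembles); the names infer_context and dynamics, and the plain gradient-descent procedure on one-step prediction error, are illustrative assumptions.

import torch

def infer_context(dynamics, transitions, z_dim=8, steps=100, lr=1e-2):
    """Find the latent context z that best explains recent transitions.

    dynamics:    callable (s, a, z) -> predicted next state (hypothetical)
    transitions: list of (s, a, s_next) tensors gathered at test time
    """
    # Start from a neutral context and optimize it to minimize
    # one-step prediction error on the recent data.
    z = torch.zeros(z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(((dynamics(s, a, z) - s_next) ** 2).mean()
                   for s, a, s_next in transitions)
        loss.backward()
        opt.step()
    return z.detach()

In an online setting, this inference would be re-run as new transitions arrive, so the planner always conditions the dynamics model on the context that currently fits the data.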
Advisor: Sergey Levine
BibTeX citation:
@mastersthesis{Li:EECS-2020-77,
    Author = {Li, Rachel},
    Title = {Adapting with Latent Variables in Model-Based Reinforcement Learning for Quadcopter Flight},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-77.html},
    Number = {UCB/EECS-2020-77}
}
EndNote citation:
%0 Thesis
%A Li, Rachel
%T Adapting with Latent Variables in Model-Based Reinforcement Learning for Quadcopter Flight
%I EECS Department, University of California, Berkeley
%D 2020
%8 May 28
%@ UCB/EECS-2020-77
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-77.html
%F Li:EECS-2020-77