### Michael Lim

EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2023-160

May 12, 2023

### http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-160.pdf

Problems in sequential decision making under uncertainty are often modeled as partially observable Markov decision processes (POMDPs). POMDPs mathematically capture decision making at each step while accounting for the potential rewards and uncertainties an agent may encounter in the future, which makes them desirable and flexible representations of many real world problems. However, such sequential decision making problems with various sources of uncertainty are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. Furthermore, modern problem settings require sophisticated machine learning techniques to effectively handle complex data such as image, text, or audio inputs, while performing complicated reasoning such as localizing with noisy camera images or predicting the intentions and locations of other agents. Modern approaches that involve artificial intelligence and machine learning methods provide powerful computational tools that can effectively manage these challenges. Many of these decision making algorithms and machine learning techniques offer either rigorous theoretical guarantees or strong empirical performance, but few offer both.

This dissertation aims to lay the foundations for studying sequential decision making under uncertainty from multiple angles: theoretical guarantees, integration with learning, and real world applications. We strike a balance between mathematical analysis of the foundational POMDP framework and the deployment of these techniques through integration with machine learning via compositional learning.

We begin the theoretical portion of the dissertation by analyzing novel POMDP solvers and their theoretical convergence properties. This portion introduces several novel POMDP algorithms that serve as foundations for studying the convergence properties of modern POMDP algorithms when dealing with continuous observation and action spaces. Then, we cover a more general result that provides theoretical guarantees and justification for solving the particle belief approximation of a POMDP while retaining guarantees in the original POMDP. This result formally justifies a common POMDP approximation technique known as particle likelihood weighting, and it is the first of its kind to theoretically explain a family of modern POMDP algorithms that use this technique.
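As a rough illustration of particle likelihood weighting, the sketch below performs one weighted particle belief update in Python. It is not taken from the dissertation: the 1-D state, the additive-noise transition model, the Gaussian observation model, and all names (`weighted_particle_update`, `process_std`, `obs_std`) are assumptions chosen only to keep the example small.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """1-D Gaussian density; stands in for an assumed observation model Z(o | s')."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def weighted_particle_update(particles, weights, action, observation, rng,
                             process_std=0.1, obs_std=0.5):
    """Propagate each particle, then reweight it by the observation likelihood."""
    new_particles, new_weights = [], []
    for s, w in zip(particles, weights):
        # Assumed transition model: s' = s + a + Gaussian noise.
        s_next = s + action + rng.gauss(0.0, process_std)
        new_particles.append(s_next)
        # Likelihood weighting: w' is proportional to w * Z(o | s').
        new_weights.append(w * gaussian_pdf(observation, s_next, obs_std))
    total = sum(new_weights)
    if total == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        n = len(new_particles)
        return new_particles, [1.0 / n] * n
    return new_particles, [w / total for w in new_weights]


def belief_mean(particles, weights):
    """Weighted mean of the particle belief."""
    return sum(s * w for s, w in zip(particles, weights))


rng = random.Random(0)
# Prior belief: 500 equally weighted particles spread over [-5, 5].
particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)

particles, weights = weighted_particle_update(
    particles, weights, action=1.0, observation=2.0, rng=rng)
print(round(belief_mean(particles, weights), 2))
```

In this toy setting the observation is never sampled from the generative model; it only enters through the likelihood used to reweight the propagated particles, which is the general idea behind the technique named above.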

Then, we introduce approaches to integrate model-based planning with learning-based components via compositional learning for real world robotic settings. First, we study how to integrate the aforementioned POMDP planning algorithms with machine learning components by using deep generative models, which enables these algorithms to tackle visual navigation tasks. Second, we substantially extend a robotic arm manipulation algorithm for tabletop manipulation through reasoning with demonstration sequences and weighted multi-task learning.

Lastly, we propose a novel application area for sequential decision making in the ecological subfield of community state navigation. Specifically, we focus on formulating the species coexistence navigation problem as an optimal path planning problem. This approach allows us to understand the population dynamics by analyzing small perturbations to the equilibrium states and subsequently to find action sequences that allow efficient navigation. We also discuss the benefits and impact of applying the sequential decision making framework to community state navigation problems and beyond.

Finally, we summarize the main contributions and place them in context. We also discuss opportunities for future work in sequential decision making under uncertainty, in terms of new theoretical developments, alternative approaches to compositional learning, and other avenues for impactful real world applications.

**Advisor:** Claire Tomlin

BibTeX citation:

@phdthesis{Lim:EECS-2023-160, Author = {Lim, Michael}, Title = {Sequential Decision Making under Uncertainty: Optimality Guarantees, Compositional Learning, and Applications to Robotics and Ecology}, School = {EECS Department, University of California, Berkeley}, Year = {2023}, Month = {May}, URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-160.html}, Number = {UCB/EECS-2023-160}, Abstract = {Sequential decision making under uncertainty problems often deal with partially observable Markov decision processes (POMDPs). POMDPs mathematically capture making decisions at each step while accounting for potential rewards and uncertainties an agent may encounter in the future, which make them desirable and flexible representations of many real world problems. However, such sequential decision making problems with various sources of uncertainty are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. Furthermore, modern problem settings require sophisticated machine learning techniques to effectively handle complex data structures like image, text or audio inputs, while performing complicated reasoning such as localizing with noisy camera images or predicting intentions and locations of other agents. Modern approaches that involve artificial intelligence and machine learning methods provide powerful computational resources that can effectively manage the above challenges. Many of these decision making algorithms and machine learning techniques can either capture rigorous theoretical guarantees or empirical performance, but few capture both. This dissertation aims to lay the foundations to study sequential decision making under uncertainty from multiple angles: theoretical guarantees, integration with learning, and real world applications. 
We strike a balance between mathematical analysis of the foundational framework of POMDPs, and enabling and deploying these techniques via integration with machine learning techniques through compositional learning. We first begin the theoretical portion of the dissertation by analyzing novel POMDP solvers and their theoretical convergence properties. This portion introduces a several novel POMDP algorithms that serve as foundations for studying convergence properties of modern POMDP algorithms when dealing with continuous observation and action spaces. Then, we cover a more general result that provides theoretical guarantees and justification for solving the particle belief approximation of POMDPs while retaining guarantees in the original POMDP. This result formally justifies a common POMDP approximation technique known as the particle likelihood weighting, which is the first-of-its-kind in theoretically explaining a family of modern POMDP algorithms that use this technique. Then, we introduce approaches to integrate model-based planning with learning-based components via compositional learning for real world robotic settings. First, we study how to integrate the aforementioned POMDP planning algorithms with machine learning components by using deep generative models, which enables these algorithms to tackle visual navigation tasks. Second, we substantially extend a robotic arm manipulation algorithm for tabletop manipulation through reasoning with demonstration sequences and weighted multi-task learning. Lastly, we propose a novel application area of sequential decision making in ecological sub-field of community state navigation. Specifically, we focus on formulating the species coexistence navigation problem as an optimal path planning problem. This approach allows us to understand the population dynamics by analyzing small perturbations to the equilibrium states and subsequently find action sequences that allow efficient navigation. 
We also discuss the benefits and impact of applying sequential decision making framework to community state navigation problems and beyond. Afterwards, we summarize the main contributions once again and contextualize the novel contributions. We also discuss some opportunities for future works in sequential decision making under uncertainty, in terms of new theoretical developments, alternative approaches for compositional learning, and other avenues for impactful real world applications.} }

EndNote citation:

%0 Thesis %A Lim, Michael %T Sequential Decision Making under Uncertainty: Optimality Guarantees, Compositional Learning, and Applications to Robotics and Ecology %I EECS Department, University of California, Berkeley %D 2023 %8 May 12 %@ UCB/EECS-2023-160 %U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-160.html %F Lim:EECS-2023-160