Robust Action Primitives and Visual Perception Pipelines for Automation in Surgical and Industrial Robotics Applications

Kishore Srinivas

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-103

May 14, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-103.pdf

Many robotic manipulation tasks consist of the same few fundamental subtasks: perceiving the environment, building an informative state estimate, interacting with the environment to perform the desired manipulation, evaluating the success of the intended manipulation, and responding to any detected failures. Developing a reliable visual perception system and motion primitives provides the building blocks for automating the execution of these tasks. In this thesis, I present the frameworks we used to automate manipulation tasks in a variety of settings.
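The perceive, estimate, act, evaluate, and recover structure described above can be read as a simple closed control loop. The sketch below is a minimal illustration of that loop under my own assumptions; every function here is a hypothetical placeholder, not code from the thesis.

```python
# Minimal sketch of a perceive -> estimate -> act -> evaluate -> recover loop.
# All functions are hypothetical stand-ins for the subtasks named in the abstract.
from dataclasses import dataclass


@dataclass
class StateEstimate:
    pose: tuple        # e.g. an estimated 6D object pose
    confidence: float  # how much the estimate is trusted


def perceive():
    # Placeholder for image / point-cloud acquisition.
    return {"rgb": None, "depth": None}


def estimate_state(obs) -> StateEstimate:
    # Placeholder for building an informative state estimate from observations.
    return StateEstimate(pose=(0.0,) * 6, confidence=0.5)


def act(state: StateEstimate) -> None:
    # Placeholder for a motion primitive (grasp, insert, place, ...).
    print(f"executing primitive at pose {state.pose}")


def succeeded(obs) -> bool:
    # Placeholder for evaluating whether the manipulation succeeded.
    return True


def recover() -> None:
    # Placeholder failure response, e.g. retract and re-perceive.
    print("recovering from failure")


def manipulation_loop(max_attempts: int = 3) -> bool:
    for _ in range(max_attempts):
        state = estimate_state(perceive())
        act(state)
        if succeeded(perceive()):
            return True
        recover()
    return False


if __name__ == "__main__":
    manipulation_loop()
```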

First, I consider the task of surgical suturing and introduce STITCH, a novel framework combining deep learning, analytical, and sampling-based approaches to perform 6D needle pose estimation for closed-loop control. We incorporate “interactive perception” to improve needle pose estimation and correction, increasing robustness to uncertainty in perception, control, and physics. In experiments, we find that STITCH achieves an average of 4.47 successful sutures with human intervention. Next, I explore the task of tableware decluttering and present TIDY, a framework consisting of a classical vision pipeline for tableware detection and a set of action primitives leveraging multi-object grasping to efficiently clear tableware from a workspace. We develop two algorithms incorporating consolidation and multi-object grasps and find that they lead to a 1.8x improvement in the number of objects transported at once. Finally, I investigate the task of large-scale 3D scene reconstruction and introduce Room-Scale LEGS, an online multi-camera 3DGS reconstruction system for large-scale scenes that constructs a hybrid 3D semantic representation using explicit 3D Gaussians for geometry and an implicit scale-conditioned hash grid for semantics. We find that Room-Scale LEGS produces high-quality Gaussian Splats in room-scale scenes with training times 3.5x faster than baselines.
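As a rough illustration of the hybrid representation named above (explicit 3D Gaussians holding geometry and appearance, plus an implicit scale-conditioned field returning semantics for any query point), the sketch below uses a plain MLP as a stand-in for the hash-grid encoder. All class names, shapes, and parameters are my own assumptions, not the Room-Scale LEGS implementation.

```python
# Sketch: explicit Gaussians for geometry + implicit scale-conditioned semantics.
import torch
import torch.nn as nn


class ExplicitGaussians:
    """Explicit geometry/appearance: per-Gaussian means, scales, rotations, opacities, colors."""
    def __init__(self, n: int):
        self.means = torch.randn(n, 3)
        self.log_scales = torch.zeros(n, 3)
        self.rotations = torch.randn(n, 4)   # quaternions
        self.opacities = torch.zeros(n, 1)
        self.colors = torch.rand(n, 3)


class ScaleConditionedSemanticField(nn.Module):
    """Implicit semantics: embedding = f(xyz, scale).
    A multiresolution hash-grid encoder would replace the first Linear layer
    in a faithful implementation; an MLP keeps this sketch self-contained."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, 128), nn.ReLU(),     # input: (x, y, z, scale)
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, xyz: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([xyz, scale], dim=-1))


# Query semantic embeddings at the Gaussians' centers at a chosen physical scale.
gaussians = ExplicitGaussians(n=1000)
field = ScaleConditionedSemanticField()
scale = torch.full((1000, 1), 0.1)            # e.g. a 10 cm query scale
embeddings = field(gaussians.means, scale)    # (1000, 64) semantic features
```

The point of the split is only that geometry stays explicit (and fast to rasterize) while semantics stay implicit and queryable at any scale; how the semantic field is supervised is outside the scope of this sketch.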

This thesis presents the motivation, methods, and results for each of these frameworks, and briefly explores how they can be extended to other tasks in related domains.


BibTeX citation:

@mastersthesis{Srinivas:EECS-2024-103,
    Author= {Srinivas, Kishore},
    Title= {Robust Action Primitives and Visual Perception Pipelines for Automation in Surgical and Industrial Robotics Applications},
    School= {EECS Department, University of California, Berkeley},
    Year= {2024},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-103.html},
    Number= {UCB/EECS-2024-103},
    Abstract= {Many robotic manipulation tasks consist of the same few fundamental subtasks: perceiving the environment, building an informative state estimate, interacting with the environment to perform the desired manipulation, evaluating the success of the intended manipulation, and responding to any detected failures. Developing a reliable visual perception system and motion primitives provides the building blocks for automating the execution of these tasks. In this thesis, I present the frameworks we used to automate manipulation tasks in a variety of settings. 

First, I consider the task of surgical suturing and introduce STITCH, a novel framework combining deep learning, analytical, and sampling-based approaches to perform 6D needle pose estimation for closed-loop control. We incorporate “interactive perception” to improve needle pose estimation and correction, increasing robustness to uncertainty in perception, control, and physics. In experiments, we find that STITCH achieves an average of 4.47 successful sutures with human intervention. Next, I explore the task of tableware decluttering and present TIDY, a framework consisting of a classical vision pipeline for tableware detection and a set of action primitives leveraging multi-object grasping to efficiently clear tableware from a workspace. We develop two algorithms incorporating consolidation and multi-object grasps and find that they lead to a 1.8x improvement in the number of objects transported at once. Finally, I investigate the task of large-scale 3D scene reconstruction and introduce Room-Scale LEGS, an online multi-camera 3DGS reconstruction system for large-scale scenes that constructs a hybrid 3D semantic representation using explicit 3D Gaussians for geometry and an implicit scale-conditioned hash grid for semantics. We find that Room-Scale LEGS produces high-quality Gaussian Splats in room-scale scenes with training times 3.5x faster than baselines.

This thesis presents the motivation, methods, and results for each of these frameworks, and briefly explores how they can be extended to other tasks in related domains.},
}

EndNote citation:

%0 Thesis
%A Srinivas, Kishore 
%T Robust Action Primitives and Visual Perception Pipelines for Automation in Surgical and Industrial Robotics Applications
%I EECS Department, University of California, Berkeley
%D 2024
%8 May 14
%@ UCB/EECS-2024-103
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-103.html
%F Srinivas:EECS-2024-103