Nikhil Mishra

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-24

April 30, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-24.pdf

Deep learning has resulted in incredible progress in many applications of artificial intelligence. However, these techniques often fall short when applied to robotics, due to their inability to reason about the ambiguity that often arises in the real world. Much of this ambiguity stems from the real world’s long-tail visual diversity – in particular, the huge variety of objects that robots must interact with. Such shortcomings are only exacerbated by the strict requirements for autonomous, high-throughput operation that deployed systems must meet, as well as the cost and difficulty of obtaining the large-scale training datasets that modern deep learning methods require.

In this thesis, we explore two primary avenues for addressing these challenges. First, we introduce models that can better express uncertainty in challenging or ambiguous situations, across a variety of 2D and 3D perception tasks. Real-world robots can incorporate these models to reason explicitly about ambiguity, in flexible ways depending on their specific tasks. Second, we extend the capabilities of neural renderers to develop a sim2real2sim method that can drastically reduce the amount of data needed to train such models. From only a handful of in-the-wild examples, our method learns to generate synthetic scenes, targeted to specific real objects and environments, that can be used to train downstream perception models for a variety of tasks.

Advisor: Pieter Abbeel


BibTeX citation:

@phdthesis{Mishra:EECS-2024-24,
    Author= {Mishra, Nikhil},
    Title= {Object-Centric Perception for Real-World Robotics},
    School= {EECS Department, University of California, Berkeley},
    Year= {2024},
    Month= {Apr},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-24.html},
    Number= {UCB/EECS-2024-24},
    Abstract= {Deep learning has resulted in incredible progress in many applications of artificial intelligence. However, these techniques often fall short when applied to robotics, due to their inability to reason about the ambiguity that often arises in the real world. Much of this ambiguity stems from the real world’s long-tail visual diversity – in particular, the huge variety of objects that robots must interact with. Such shortcomings are only exacerbated by the strict requirements for autonomous, high-throughput operation that deployed systems must meet, as well as the cost and difficulty of obtaining the large-scale training datasets that modern deep learning methods require.

In this thesis, we explore two primary avenues for addressing these challenges. First, we introduce models that can better express uncertainty in challenging or ambiguous situations, across a variety of 2D and 3D perception tasks. Real-world robots can incorporate these models to reason explicitly about ambiguity, in flexible ways depending on their specific tasks. Second, we extend the capabilities of neural renderers to develop a sim2real2sim method that can drastically reduce the amount of data needed to train such models. From only a handful of in-the-wild examples, our method learns to generate synthetic scenes, targeted to specific real objects and environments, that can be used to train downstream perception models for a variety of tasks.},
}

EndNote citation:

%0 Thesis
%A Mishra, Nikhil 
%T Object-Centric Perception for Real-World Robotics
%I EECS Department, University of California, Berkeley
%D 2024
%8 April 30
%@ UCB/EECS-2024-24
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-24.html
%F Mishra:EECS-2024-24