High-Fidelity 3D Mesh Reconstruction of Humans and Objects

Shubham Goel

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-254

December 1, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-254.pdf

Humans perceive the world through their eyes, where the images formed on the retina are two-dimensional projections of the underlying three-dimensional world. Akin to human vision, the goal of computer vision is to extract information about the 3D world from 2D images. A fundamental problem in computer vision is to extract the 3D structure underlying such 2D images. Even though this problem is mathematically ill-posed, the ambiguity can be resolved either by using multiple 2D views or by using priors about how the world is structured.

In this thesis, I present my work on high-fidelity 3D mesh reconstruction of humans and objects from 2D images. I discuss the more classical setting of optimizing a shape/texture using multiple image inputs, as well as how we can learn priors that enable mesh reconstruction even from a single image. Specifically, I first present work on multi-view 3D reconstruction, where we reconstruct meshes of an object given few images with noisy camera poses. Then, I continue with 3D reconstruction from single images, enabled by learning category-specific shape priors from natural image datasets. Finally, I focus on learning single-view 3D human reconstruction using big models and big data. Such robust 3D reconstruction of humans enables downstream applications like 3D tracking and action recognition.

Advisors: Jitendra Malik and Angjoo Kanazawa

BibTeX citation:

@phdthesis{Goel:EECS-2023-254,
    Author= {Goel, Shubham},
    Title= {High-Fidelity 3D Mesh Reconstruction of Humans and Objects},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-254.html},
    Number= {UCB/EECS-2023-254},
    Abstract= {Humans perceive the world through their eyes, where the images formed on the retina are two-dimensional projections of the underlying three-dimensional world. Akin to human vision, the goal of computer vision is to extract information about the 3D world from 2D images. A fundamental problem in computer vision is to extract the 3D structure underlying such 2D images. Even though this problem is mathematically ill-posed, the ambiguity can be resolved either by using multiple 2D views or by using priors about how the world is structured.

In this thesis, I present my work on high-fidelity 3D mesh reconstruction of humans and objects from 2D images. I discuss the more classical setting of optimizing a shape/texture using multiple image inputs, as well as how we can learn priors that enable mesh reconstruction even from a single image. Specifically, I first present work on multi-view 3D reconstruction, where we reconstruct meshes of an object given few images with noisy camera poses. Then, I continue with 3D reconstruction from single images, enabled by learning category-specific shape priors from natural image datasets. Finally, I focus on learning single-view 3D human reconstruction using big models and big data. Such robust 3D reconstruction of humans enables downstream applications like 3D tracking and action recognition.},
}

EndNote citation:

%0 Thesis
%A Goel, Shubham
%T High-Fidelity 3D Mesh Reconstruction of Humans and Objects
%I EECS Department, University of California, Berkeley
%D 2023
%8 December 1
%@ UCB/EECS-2023-254
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-254.html
%F Goel:EECS-2023-254