Learning to Detect Geometric Structures from Images for 3D Parsing
Yichao Zhou
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-227
December 18, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-227.pdf
Recovering the 3D geometry of scenes from 2D images is one of the most fundamental and challenging problems in computer vision. On one hand, traditional geometry-based algorithms such as structure from motion (SfM) and SLAM are fragile in certain environments, and the noisy point clouds they produce are hard to process and interpret. On the other hand, recent learning-based 3D-understanding neural networks parse scenes by extrapolating patterns seen in the training data, which often limits their generalizability and accuracy.
In my dissertation, I address these shortcomings and combine the advantages of geometry-based and data-driven approaches in an integrated framework. More specifically, I apply learning-based methods to extract high-level geometric structures from images and use them for 3D parsing. To this end, I have designed specialized neural networks that understand geometric structures such as lines, junctions, planes, vanishing points, and symmetry, and that detect them accurately from images; I have created large-scale 3D datasets with structural annotations to support data-driven approaches; and I have demonstrated how to use these high-level abstractions to parse and reconstruct scenes. By combining the power of data-driven approaches with geometric principles, future 3D systems can become more accurate, reliable, and easier to implement, yielding clean, compact, and interpretable scene representations.
Advisor: Yi Ma
BibTeX citation:
@phdthesis{Zhou:EECS-2020-227,
    Author = {Zhou, Yichao},
    Title = {Learning to Detect Geometric Structures from Images for 3D Parsing},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-227.html},
    Number = {UCB/EECS-2020-227},
    Abstract = {Recovering 3D geometries of scenes from 2D images is one of the most fundamental and challenging problems in computer vision. On one hand, traditional geometry-based algorithms such as SfM and SLAM are fragile in certain environments, and the resulting noisy point-clouds are hard to process and interpret. On the other hand, recent learning-based 3D-understanding neural networks parse scenes by extrapolating patterns seen in the training data, which often have limited generalizability and accuracy. In my dissertation, I try to address these shortcomings and combine the advantage of geometric-based and data-driven approaches into an integrated framework. More specifically, I have applied learning-based methods to extract high-level geometric structures from images and use them for 3D parsing. To this end, I have designed specialized neural networks that understand geometric structures such as lines, junctions, planes, vanishing points, and symmetry, and detect them from images accurately; I have created large-scale 3D datasets with structural annotations to support data-driven approaches; and I have demonstrated how to use these high-level abstractions to parse and reconstruct scenes. By combining the power of data-driven approaches and geometric principles, future 3D systems are becoming more accurate, reliable, and easier to implement, resulting in clean, compact, and interpretable scene representations.},
}
EndNote citation:
%0 Thesis
%A Zhou, Yichao
%T Learning to Detect Geometric Structures from Images for 3D Parsing
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 18
%@ UCB/EECS-2020-227
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-227.html
%F Zhou:EECS-2020-227