Learning to Detect Geometric Structures from Images for 3D Parsing
Yichao Zhou
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2020-227
December 18, 2020
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-227.pdf
Recovering the 3D geometry of scenes from 2D images is one of the most fundamental and challenging problems in computer vision. On one hand, traditional geometry-based algorithms such as structure from motion (SfM) and SLAM are fragile in certain environments, and the noisy point clouds they produce are hard to process and interpret. On the other hand, recent learning-based 3D-understanding neural networks parse scenes by extrapolating patterns seen in the training data, which often limits their generalizability and accuracy.
In my dissertation, I address these shortcomings and combine the advantages of geometry-based and data-driven approaches in an integrated framework. More specifically, I apply learning-based methods to extract high-level geometric structures from images and use them for 3D parsing. To this end, I have designed specialized neural networks that understand geometric structures such as lines, junctions, planes, vanishing points, and symmetry, and that detect them accurately from images; I have created large-scale 3D datasets with structural annotations to support data-driven approaches; and I have demonstrated how to use these high-level abstractions to parse and reconstruct scenes. By combining the power of data-driven approaches with geometric principles, future 3D systems can become more accurate, reliable, and easier to implement, yielding clean, compact, and interpretable scene representations.
Advisor: Yi Ma
BibTeX citation:
@phdthesis{Zhou:EECS-2020-227,
    Author = {Zhou, Yichao},
    Title = {Learning to Detect Geometric Structures from Images for 3D Parsing},
    School = {EECS Department, University of California, Berkeley},
    Year = {2020},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-227.html},
    Number = {UCB/EECS-2020-227},
    Abstract = {Recovering 3D geometries of scenes from 2D images is one of the most fundamental and challenging problems in computer vision. On one hand, traditional geometry-based algorithms such as SfM and SLAM are fragile in certain environments, and the resulting noisy point-clouds are hard to process and interpret. On the other hand, recent learning-based 3D-understanding neural networks parse scenes by extrapolating patterns seen in the training data, which often have limited generalizability and accuracy. In my dissertation, I try to address these shortcomings and combine the advantage of geometric-based and data-driven approaches into an integrated framework. More specifically, I have applied learning-based methods to extract high-level geometric structures from images and use them for 3D parsing. To this end, I have designed specialized neural networks that understand geometric structures such as lines, junctions, planes, vanishing points, and symmetry, and detect them from images accurately; I have created large-scale 3D datasets with structural annotations to support data-driven approaches; and I have demonstrated how to use these high-level abstractions to parse and reconstruct scenes. By combining the power of data-driven approaches and geometric principles, future 3D systems are becoming more accurate, reliable, and easier to implement, resulting in clean, compact, and interpretable scene representations.},
}
EndNote citation:
%0 Thesis
%A Zhou, Yichao
%T Learning to Detect Geometric Structures from Images for 3D Parsing
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 18
%@ UCB/EECS-2020-227
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-227.html
%F Zhou:EECS-2020-227