Progress and Proposals: A Case Study of Monocular Depth Estimation

Khalil Sarwari and Forrest Laine and Claire Tomlin

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2021-32

May 5, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-32.pdf

Deep learning has made rapid progress over the past few years, particularly in computer vision. Deep learning models combine artificial neural networks with a supervised, semi-supervised, or unsupervised learning scheme. Larger models have neural network architectures with more parameters, often resulting from more or wider layers. In this paper, we perform a case study in the domain of monocular depth estimation and contribute both a new model and a new dataset. We propose PixelBins, a simplification of AdaBins, the existing state-of-the-art model, and obtain performance comparable to state-of-the-art methods. Our method achieves a ~20x reduction in model size and an absolute relative error of 0.057 on the popular KITTI benchmark. Furthermore, we conceptualize and justify the need for truly open datasets. Consequently, we introduce a modern, extensible dataset consisting of high-quality, cross-calibrated image+point cloud pairs across a diverse set of locations. The dataset is uniquely suited to the designation of truly open for a variety of reasons, such as a ~100x reduction in the cost of contributing a new image+point cloud pair. We make our code and dataset publicly available and provide instructions for contributing to and replicating our experiments.
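For readers unfamiliar with the metric quoted in the abstract, absolute relative error is commonly defined as the mean of |predicted depth - true depth| / true depth over pixels that have valid ground truth. The sketch below illustrates that common definition only; it is not taken from the report's code. The function name `abs_rel_error`, the flat-list input format, and the validity range (0.001 m to 80 m, a cap often used in KITTI evaluations) are assumptions made for illustration.

```python
from typing import Iterable


def abs_rel_error(
    pred: Iterable[float],
    gt: Iterable[float],
    min_depth: float = 1e-3,   # assumed lower bound for valid ground truth (metres)
    max_depth: float = 80.0,   # assumed upper bound; a common KITTI evaluation cap
) -> float:
    """Mean of |pred - gt| / gt over pixels whose ground truth is valid."""
    total = 0.0
    count = 0
    for p, g in zip(pred, gt):
        # Skip pixels with no LiDAR return (gt == 0) or out-of-range depth.
        if min_depth < g < max_depth:
            total += abs(p - g) / g
            count += 1
    if count == 0:
        raise ValueError("no valid ground-truth pixels to evaluate")
    return total / count


# Hypothetical flattened depth maps in metres; the last pixel has no ground truth.
predicted = [10.5, 19.0, 42.0, 7.0]
ground_truth = [10.0, 20.0, 40.0, 0.0]
error = abs_rel_error(predicted, ground_truth)
print(f"absolute relative error: {error:.3f}")
```

In this toy example each valid pixel is off by 5% of its true depth, so the result is approximately 0.05. Under this definition, the reported value of 0.057 corresponds to predictions that deviate from ground truth by about 5.7% on average.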

Advisors: Claire Tomlin

BibTeX citation:

@mastersthesis{Sarwari:EECS-2021-32,
    Author= {Sarwari, Khalil and Laine, Forrest and Tomlin, Claire},
    Title= {Progress and Proposals: A Case Study of Monocular Depth Estimation},
    School= {EECS Department, University of California, Berkeley},
    Year= {2021},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-32.html},
    Number= {UCB/EECS-2021-32},
    Abstract= {Deep learning has made rapid progress over the past few years, particularly in computer vision. Deep learning models combine artificial neural networks with a supervised, semi-supervised, or unsupervised learning scheme. Larger models have neural network architectures with more parameters, often resulting from more or wider layers. In this paper, we perform a case study in the domain of monocular depth estimation and contribute both a new model and a new dataset. We propose PixelBins, a simplification of AdaBins, the existing state-of-the-art model, and obtain performance comparable to state-of-the-art methods. Our method achieves a ~20x reduction in model size and an absolute relative error of 0.057 on the popular KITTI benchmark. Furthermore, we conceptualize and justify the need for truly open datasets. Consequently, we introduce a modern, extensible dataset consisting of high-quality, cross-calibrated image+point cloud pairs across a diverse set of locations. The dataset is uniquely suited to the designation of truly open for a variety of reasons, such as a ~100x reduction in the cost of contributing a new image+point cloud pair. We make our code and dataset publicly available and provide instructions for contributing to and replicating our experiments.},
}

EndNote citation:

%0 Thesis
%A Sarwari, Khalil 
%A Laine, Forrest 
%A Tomlin, Claire 
%T Progress and Proposals: A Case Study of Monocular Depth Estimation
%I EECS Department, University of California, Berkeley
%D 2021
%8 May 5
%@ UCB/EECS-2021-32
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-32.html
%F Sarwari:EECS-2021-32