LabelAR: A spatial guidance interface for fast computer vision image collection

James Smith, Michael Laielli, Giscard Biamby, Trevor Darrell and Björn Hartmann
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2019-58
May 17, 2019
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-58.pdf
Computer vision is applied in an ever-expanding range of applications, many of which require custom training data to perform well. We present a novel interface for the rapid collection and labeling of training images to improve computer-vision-based object detectors. LabelAR leverages the spatial tracking capabilities of an AR-enabled camera, allowing users to place persistent bounding volumes that stay centered on real-world objects. The interface then guides the user to move the camera to cover a wide variety of viewpoints. We eliminate the need for post-hoc manual labeling of images by automatically projecting 2D bounding boxes around objects in the images as they are captured from AR-marked viewpoints. In a user study with 12 participants, LabelAR significantly outperforms existing approaches in terms of the trade-off between model performance and collection time.
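The automatic labeling described above amounts to projecting the corners of an AR-placed 3D bounding volume into each captured frame and taking the enclosing 2D box. A minimal sketch of that projection, assuming a standard pinhole camera model with known intrinsics and world-to-camera pose (the function name and matrix conventions here are illustrative, not taken from the paper):

```python
import numpy as np

def project_box(corners_world, K, R, t):
    """Project 3D box corners into the image and return a 2D bounding box.

    corners_world: (8, 3) array of box corner positions in world coordinates.
    K: (3, 3) camera intrinsic matrix.
    R: (3, 3) world-to-camera rotation; t: (3,) translation.
    Returns (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    # Transform the corners from world coordinates into the camera frame.
    cam = corners_world @ R.T + t
    # Apply the intrinsics, then perspective-divide by depth (z).
    uvw = cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    # The axis-aligned 2D box enclosing all projected corners.
    x_min, y_min = uv.min(axis=0)
    x_max, y_max = uv.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)

# Example: a unit cube 5 m in front of a camera with focal length 500 px
# and principal point (320, 240), camera pose at the world origin.
corners = np.array([[x, y, z]
                    for x in (-0.5, 0.5)
                    for y in (-0.5, 0.5)
                    for z in (4.5, 5.5)])
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])
bbox = project_box(corners, K, np.eye(3), np.zeros(3))
```

Because the AR session tracks the camera pose continuously, this projection can run per frame, which is what removes the post-hoc manual labeling step.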
Advisor: Björn Hartmann
BibTeX citation:
@mastersthesis{Smith:EECS-2019-58,
    Author = {Smith, James and Laielli, Michael and Biamby, Giscard and Darrell, Trevor and Hartmann, Björn},
    Title = {LabelAR: A spatial guidance interface for fast computer vision image collection},
    School = {EECS Department, University of California, Berkeley},
    Year = {2019},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-58.html},
    Number = {UCB/EECS-2019-58},
    Abstract = {Computer vision is applied in an ever expanding range of applications, many of which require custom training data to perform well. We present a novel interface for rapid collection and labeling of training images to improve computer vision based object detectors. LabelAR leverages the spatial tracking capabilities of an AR-enabled camera, allowing users to place persistent bounding volumes that stay centered on real-world objects. The interface then guides the user to move the camera to cover a wide variety of viewpoints. We eliminate the need for post-hoc manual labeling of images by automatically projecting 2D bounding boxes around objects in the images as they are captured from AR-marked viewpoints. In a user study with 12 participants, LabelAR significantly outperforms existing approaches in terms of the trade-off between model performance and collection time.}
}
EndNote citation:
%0 Thesis
%A Smith, James
%A Laielli, Michael
%A Biamby, Giscard
%A Darrell, Trevor
%A Hartmann, Björn
%T LabelAR: A spatial guidance interface for fast computer vision image collection
%I EECS Department, University of California, Berkeley
%D 2019
%8 May 17
%@ UCB/EECS-2019-58
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-58.html
%F Smith:EECS-2019-58