LabelAR: A spatial guidance interface for fast computer vision image collection

James Smith and Michael Laielli and Giscard Biamby and Trevor Darrell and Björn Hartmann

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2019-58

May 17, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-58.pdf

Computer vision is applied in an ever-expanding range of applications, many of which require custom training data to perform well. We present a novel interface for rapid collection and labeling of training images to improve computer-vision-based object detectors. LabelAR leverages the spatial tracking capabilities of an AR-enabled camera, allowing users to place persistent bounding volumes that stay centered on real-world objects. The interface then guides the user to move the camera to cover a wide variety of viewpoints. We eliminate the need for post-hoc manual labeling of images by automatically projecting 2D bounding boxes around objects in the images as they are captured from AR-marked viewpoints. In a user study with 12 participants, LabelAR significantly outperforms existing approaches in terms of the trade-off between model performance and collection time.
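The automatic labeling step in the abstract amounts to a standard pinhole-camera projection: the eight corners of the AR-placed 3D bounding volume are projected into the image at capture time, and the enclosing axis-aligned rectangle becomes the 2D label. A minimal NumPy sketch of that step, assuming a 3x4 world-to-camera extrinsic matrix and a 3x3 intrinsic matrix (the function name and arguments are hypothetical, not from the report):

```python
import numpy as np

def project_bounding_volume(corners_world, extrinsic, intrinsic, img_w, img_h):
    """Project the 8 corners of a 3D bounding volume into the image and
    return the enclosing 2D box as (x_min, y_min, x_max, y_max)."""
    # Homogeneous world coordinates: (8, 4)
    ones = np.ones((corners_world.shape[0], 1))
    pts_h = np.hstack([corners_world, ones])
    # World -> camera frame via the 3x4 extrinsic [R|t]: result is (3, 8)
    pts_cam = extrinsic @ pts_h.T
    # Perspective projection through the 3x3 intrinsic matrix
    pts_img = intrinsic @ pts_cam
    pts_img = pts_img[:2] / pts_img[2]  # divide by depth
    # Enclosing axis-aligned box, clipped to the image bounds
    x_min, y_min = np.clip(pts_img.min(axis=1), 0, [img_w, img_h])
    x_max, y_max = np.clip(pts_img.max(axis=1), 0, [img_w, img_h])
    return x_min, y_min, x_max, y_max
```

In practice the AR framework supplies the camera pose and intrinsics for each captured frame, so no manual annotation is needed; this sketch omits details such as corners behind the camera or boxes that leave the frame entirely.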

Advisor: Björn Hartmann


BibTeX citation:

@mastersthesis{Smith:EECS-2019-58,
    Author= {Smith, James and Laielli, Michael and Biamby, Giscard and Darrell, Trevor and Hartmann, Björn},
    Title= {LabelAR: A spatial guidance interface for fast computer vision image collection},
    School= {EECS Department, University of California, Berkeley},
    Year= {2019},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-58.html},
    Number= {UCB/EECS-2019-58},
    Abstract= {Computer vision is applied in an ever expanding range of applications, many of which require custom training data to perform well.  We present a novel interface for rapid collection and labeling of training images to improve computer vision based object detectors. LabelAR leverages the spatial tracking capabilities of an AR-enabled camera, allowing users to place persistent bounding volumes that stay centered on real-world objects. The interface then guides the user to move the camera to cover a wide variety of viewpoints. We eliminate the need for post-hoc manual labeling of images by automatically projecting 2D bounding boxes around objects in the images as they are captured from AR-marked viewpoints. In a user study with 12 participants, LabelAR significantly outperforms existing approaches in terms of the trade-off between model performance and collection time.},
}

EndNote citation:

%0 Thesis
%A Smith, James 
%A Laielli, Michael 
%A Biamby, Giscard 
%A Darrell, Trevor 
%A Hartmann, Björn 
%T LabelAR: A spatial guidance interface for fast computer vision image collection
%I EECS Department, University of California, Berkeley
%D 2019
%8 May 17
%@ UCB/EECS-2019-58
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-58.html
%F Smith:EECS-2019-58