LabelAR: A spatial guidance interface for fast computer vision image collection

James Smith, Michael Laielli, Giscard Biamby, Trevor Darrell and Björn Hartmann

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2019-58
May 17, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-58.pdf

Computer vision is applied in an ever-expanding range of applications, many of which require custom training data to perform well. We present a novel interface for rapid collection and labeling of training images to improve computer-vision-based object detectors. LabelAR leverages the spatial tracking capabilities of an AR-enabled camera, allowing users to place persistent bounding volumes that stay centered on real-world objects. The interface then guides the user to move the camera to cover a wide variety of viewpoints. We eliminate the need for post-hoc manual labeling of images by automatically projecting 2D bounding boxes around objects in the images as they are captured from AR-marked viewpoints. In a user study with 12 participants, LabelAR significantly outperforms existing approaches in terms of the trade-off between model performance and collection time.
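The automatic-labeling step described above, projecting a user-placed 3D bounding volume into each captured frame to obtain a 2D bounding box, can be sketched with a standard pinhole camera model. This is an illustrative assumption, not the paper's actual implementation: the function name `project_box`, the simple intrinsics matrix, and the world-to-camera pose here are all hypothetical stand-ins for whatever the AR framework provides.

```python
import numpy as np

def project_box(corners_world, R, t, K):
    """Project the 8 corners of a 3D bounding volume into the image and
    return the enclosing 2D axis-aligned box (x_min, y_min, x_max, y_max).

    corners_world: (8, 3) corner points in world coordinates.
    R, t: world-to-camera rotation (3x3) and translation (3,).
    K: 3x3 camera intrinsics matrix.
    """
    # Transform corners from the world frame into the camera frame.
    cam = corners_world @ R.T + t
    # Pinhole projection: apply intrinsics, then divide by depth.
    pix = cam @ K.T
    pix = pix[:, :2] / pix[:, 2:3]
    # The 2D label is the tightest axis-aligned box around the projections.
    x_min, y_min = pix.min(axis=0)
    x_max, y_max = pix.max(axis=0)
    return x_min, y_min, x_max, y_max

# Example: a unit cube centered 5 m in front of a camera looking down +Z,
# with an identity pose and a 1280x720-style intrinsics matrix (assumed values).
corners = np.array([[x, y, z] for x in (-0.5, 0.5)
                              for y in (-0.5, 0.5)
                              for z in (4.5, 5.5)])
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
box = project_box(corners, np.eye(3), np.zeros(3), K)
```

Because every captured viewpoint has a known camera pose from AR tracking, this projection yields a label per image with no manual annotation, which is the source of the collection-time savings the study measures.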

Advisor: Björn Hartmann


BibTeX citation:

@mastersthesis{Smith:EECS-2019-58,
    Author = {Smith, James and Laielli, Michael and Biamby, Giscard and Darrell, Trevor and Hartmann, Björn},
    Title = {LabelAR: A spatial guidance interface for fast computer vision image collection},
    School = {EECS Department, University of California, Berkeley},
    Year = {2019},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-58.html},
    Number = {UCB/EECS-2019-58},
    Abstract = {Computer vision is applied in an ever-expanding range of applications, many of which require custom training data to perform well. We present a novel interface for rapid collection and labeling of training images to improve computer-vision-based object detectors. LabelAR leverages the spatial tracking capabilities of an AR-enabled camera, allowing users to place persistent bounding volumes that stay centered on real-world objects. The interface then guides the user to move the camera to cover a wide variety of viewpoints. We eliminate the need for post-hoc manual labeling of images by automatically projecting 2D bounding boxes around objects in the images as they are captured from AR-marked viewpoints. In a user study with 12 participants, LabelAR significantly outperforms existing approaches in terms of the trade-off between model performance and collection time.}
}

EndNote citation:

%0 Thesis
%A Smith, James
%A Laielli, Michael
%A Biamby, Giscard
%A Darrell, Trevor
%A Hartmann, Björn
%T LabelAR: A spatial guidance interface for fast computer vision image collection
%I EECS Department, University of California, Berkeley
%D 2019
%8 May 17
%@ UCB/EECS-2019-58
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-58.html
%F Smith:EECS-2019-58