Learning to Segment Every Thing

Ronghang Hu

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2019-174

December 17, 2019

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-174.pdf

Most methods for object instance segmentation require all training examples to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to approximately 100 well-annotated classes. This report proposes a new partially supervised training paradigm, together with a novel weight transfer function, that enables training instance segmentation models on a large set of categories, all of which have box annotations but only a small fraction of which have mask annotations. These contributions allow us to train Mask R-CNN to detect and segment 3000 visual concepts using box annotations from the Visual Genome dataset and mask annotations from the 80 classes in the COCO dataset. We evaluate our approach in a controlled study on the COCO dataset. This work is a first step towards instance segmentation models that have broad comprehension of the visual world.
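The weight transfer function named in the abstract can be made concrete with a short sketch. In the underlying paper (Hu et al., "Learning to Segment Every Thing", CVPR 2018), it is a small learned function T that maps a category's box-head (detection) weights to its mask-head weights, so that categories with only box annotations still receive mask weights. The Python module below is a minimal illustrative sketch under assumed dimensions and an assumed two-layer MLP form of T; it is not the report's actual implementation.

# Minimal sketch of a weight transfer function: predict per-category
# mask-head weights from per-category box-head weights via a small MLP
# shared across categories. All dimensions here are assumptions.

import torch
import torch.nn as nn


class WeightTransfer(nn.Module):
    """w_seg = T(w_det; theta).

    Categories with mask annotations supervise T during training; at
    inference, T produces mask weights even for categories that only
    have box annotations.
    """

    def __init__(self, det_dim: int, seg_dim: int, hidden_dim: int = 1024):
        super().__init__()
        self.transfer = nn.Sequential(
            nn.Linear(det_dim, hidden_dim),
            nn.LeakyReLU(0.1),
            nn.Linear(hidden_dim, seg_dim),
        )

    def forward(self, w_det: torch.Tensor) -> torch.Tensor:
        # w_det: (num_categories, det_dim) box-head weights, one row per class.
        # Returns (num_categories, seg_dim) predicted mask-head weights.
        return self.transfer(w_det)


# Illustrative usage: predict mask weights for 3000 categories from the
# weights of a hypothetical box head.
if __name__ == "__main__":
    num_classes, det_dim, seg_dim = 3000, 1024, 256
    w_det = torch.randn(num_classes, det_dim)  # stand-in for learned box weights
    w_seg = WeightTransfer(det_dim, seg_dim)(w_det)
    print(w_seg.shape)  # torch.Size([3000, 256])

Because T is shared across all categories, the few mask-annotated classes (e.g. COCO's 80) provide enough supervision to generalize mask weights to the thousands of box-only classes.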

Advisor: Trevor Darrell


BibTeX citation:

@mastersthesis{Hu:EECS-2019-174,
    Author= {Hu, Ronghang},
    Title= {Learning to Segment Every Thing},
    School= {EECS Department, University of California, Berkeley},
    Year= {2019},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-174.html},
    Number= {UCB/EECS-2019-174},
    Abstract= {Most methods for object instance segmentation require all training examples to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to approximately 100 well-annotated classes. This report proposes a new partially supervised training paradigm, together with a novel weight transfer function, that enables training instance segmentation models on a large set of categories, all of which have box annotations but only a small fraction of which have mask annotations. These contributions allow us to train Mask R-CNN to detect and segment 3000 visual concepts using box annotations from the Visual Genome dataset and mask annotations from the 80 classes in the COCO dataset. We evaluate our approach in a controlled study on the COCO dataset. This work is a first step towards instance segmentation models that have broad comprehension of the visual world.},
}

EndNote citation:

%0 Thesis
%A Hu, Ronghang
%T Learning to Segment Every Thing
%I EECS Department, University of California, Berkeley
%D 2019
%8 December 17
%@ UCB/EECS-2019-174
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-174.html
%F Hu:EECS-2019-174