Simultaneous Detection and Segmentation

Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik.

Abstract

We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [1]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 4 point boost (8% relative) over state-of-the-art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work.

Downloads

Paper
Code: You can download the code from the GitHub repository or as a zip.
Pretrained models: You can download the precomputed models here.
Precomputed results: You can download the precomputed results here.

You'll need to download the code to use the precomputed results and the precomputed models. Look at the README in the code for more details.

We've also published our results (detection and semantic segmentation) on PASCAL VOC 2012 Test. Look at both results here or on the VOC leaderboard.

Citation

Please cite the paper as:

@InProceedings{BharathECCV2014,
  author = "Bharath Hariharan and Pablo Arbel\'{a}ez and Ross Girshick and Jitendra Malik",
  title = "Simultaneous Detection and Segmentation",
  booktitle = "European Conference on Computer Vision (ECCV)",
  year = "2014",
}

A note on semantic segmentation results

For the semantic segmentation task, the paper compares only to O₂P [2] and R-CNN [1]. There is other semantic segmentation work that performs better. The table below gives a more representative sample of the state of the art:

Method                 Mean IU (%) on VOC 2012 Test
O₂P [2]                47.8
DivMBest [3]           48.1
Fisher CodeMaps [4]    48.3
Lin et al. [5]         50.6
C+ref (Ours)           51.6
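
For readers unfamiliar with the metric in the table, here is a minimal sketch of how mean IU (intersection over union, averaged over classes) is commonly computed from label maps. This is not the evaluation code shipped with our release or the official VOC evaluation script; the function names `confusion_matrix` and `mean_iu`, the choice of 255 as the ignored label value, and the decision to skip classes that never occur are all assumptions made for illustration.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Count pixels by (ground-truth class, predicted class).

    Rows index the ground truth, columns index the prediction.
    Pixels whose ground truth equals ignore_label are skipped
    (assumed convention for unlabeled/void pixels).
    """
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    keep = gt != ignore_label
    # Encode each (gt, pred) pair as a single index, then histogram.
    idx = gt[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iu(conf):
    """Average per-class IU = TP / (TP + FP + FN) over classes that occur."""
    tp = np.diag(conf).astype(np.float64)
    fn = conf.sum(axis=1) - tp  # ground-truth pixels of the class that were missed
    fp = conf.sum(axis=0) - tp  # pixels wrongly predicted as the class
    denom = tp + fp + fn
    present = denom > 0  # skip classes absent from both gt and pred
    return float((tp[present] / denom[present]).mean())


# Toy 2x2 image with two classes: one pixel of class 0 is mislabeled as class 1.
gt = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
score = mean_iu(confusion_matrix(gt, pred, num_classes=2))
# Class 0: IU = 1/2, class 1: IU = 2/3, so the mean is 7/12 (about 0.583).
```

For a benchmark number, the confusion matrices of all test images would be summed before calling `mean_iu`, so that each class is scored over the whole dataset rather than per image.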

References

[1] Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[2] Carreira, J., Caseiro, R., Batista, J., Sminchisescu, C.: Semantic segmentation with second-order pooling. In ECCV, 2012.
[3] Yadollahpour, P., Batra, D., Shakhnarovich, G.: Discriminative Re-Ranking of Diverse Segmentations. In CVPR, 2013.
[4] Li, Z., Gavves, E., van de Sande, K.E., Snoek, C., Smeulders, A.W.: Codemaps Segment, Classify and Search Objects Locally. In ICCV, 2013.
[5] Lin, X., Cogswell, M., Parikh, D., Batra, D.: Propose and Re-rank Semantic Segmentation via Deep Image Classification. In BigVision Workshop, 2014.
