Simultaneous Detection and Segmentation

Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik.


We aim to detect all instances of a category in an image and, for each instance, mark the pixels that belong to it. We call this task Simultaneous Detection and Segmentation (SDS). Unlike classical bounding box detection, SDS requires a segmentation and not just a box. Unlike classical semantic segmentation, we require individual object instances. We build on recent work that uses convolutional neural networks to classify category-independent region proposals (R-CNN [1]), introducing a novel architecture tailored for SDS. We then use category-specific, top-down figure-ground predictions to refine our bottom-up proposals. We show a 7 point boost (16% relative) over our baselines on SDS, a 4 point boost (8% relative) over state-of-the-art on semantic segmentation, and state-of-the-art performance in object detection. Finally, we provide diagnostic tools that unpack performance and provide directions for future work.
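To make the difference between the three tasks concrete, here is a minimal sketch. It assumes a toy representation in which an SDS output is a list of (class id, binary mask) pairs. The names `bounding_box` and `semantic_map` and the class id 15 are hypothetical and are not taken from the released code. The sketch shows that both a box detection and a semantic label map can be derived from an SDS output, but not the other way around.

```python
# Toy illustration only: a hypothetical data layout, not the authors' format.
from typing import List, Tuple

Mask = List[List[int]]  # 1 where the pixel belongs to the instance, else 0


def bounding_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Tightest (x_min, y_min, x_max, y_max) box around a non-empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return min(xs), min(ys), max(xs), max(ys)


def semantic_map(instances: List[Tuple[int, Mask]]) -> List[List[int]]:
    """Collapse (class_id, mask) pairs into one per-pixel label map (0 = background)."""
    height, width = len(instances[0][1]), len(instances[0][1][0])
    labels = [[0] * width for _ in range(height)]
    for class_id, mask in instances:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    labels[y][x] = class_id
    return labels


# Two touching instances of the same (hypothetical) class 15 in a 3x4 image.
left = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0]]
right = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 1]]
sds_output = [(15, left), (15, right)]

boxes = [bounding_box(mask) for _, mask in sds_output]  # detection-style output
labels = semantic_map(sds_output)                       # semantic-segmentation-style output
```

In `labels` the two instances merge into a single region of class 15, so the instance boundary is lost. In `boxes` the instances stay separate, but the pixel-level shape is lost. The SDS output keeps both.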


You'll need to download the code to use the precomputed results and the precomputed models. Look at the README in the code for more details.

We've also published our results (detection and semantic segmentation) on PASCAL VOC 2012 Test. Look at both results here or on the VOC leaderboard.


Please cite the paper as:

author = "Bharath Hariharan and Pablo Arbel\'{a}ez and Ross Girshick and Jitendra Malik",
title = "Simultaneous Detection and Segmentation",
booktitle = "European Conference on Computer Vision (ECCV)",
year = "2014",

A note on semantic segmentation results

For the semantic segmentation task, the paper compares only to O2P [2] and R-CNN [1]. There is other semantic segmentation work that performs better than these baselines. The table below gives a more representative sample of the state of the art:
Method                  Mean IU on VOC 2012 Test
O2P [2]                 47.8
DivMBest [3]            48.1
Fisher CodeMaps [4]     48.3
Lin et al. [5]          50.6
C+ref (Ours)            51.6
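For readers unfamiliar with the metric in the table above, mean IU is the intersection-over-union between predicted and ground-truth pixels of each class, averaged over classes. The sketch below is a simplified illustration, not the official VOC evaluation code. The function name `mean_iu` is hypothetical. It assumes label maps given as nested lists of integer class ids, skips classes that appear in neither map, and does not handle the "void" pixels that the benchmark ignores.

```python
from typing import List, Sequence


def mean_iu(pred: Sequence[Sequence[int]],
            truth: Sequence[Sequence[int]],
            num_classes: int) -> float:
    """Mean per-class intersection-over-union, as a percentage."""
    inter: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Correct pixel: counts toward both intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # Wrong pixel: a false positive for p and a false negative for t.
                union[p] += 1
                union[t] += 1
    # Assumption: classes absent from both maps are left out of the average.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return 100.0 * sum(ious) / len(ious)


# Toy 2x4 example with background (0) and one object class (1).
truth = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
pred = [[0, 0, 0, 1],
        [0, 0, 1, 1]]
score = mean_iu(pred, truth, num_classes=2)
```

In this toy example one object pixel is mislabelled as background, giving an IU of 4/5 for background and 3/4 for the object class, so the score is about 77.5.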


  1. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
  2. Carreira, J., Caseiro, R., Batista, J., Sminchisescu, C.: Semantic segmentation with second-order pooling. In ECCV, 2012.
  3. Yadollahpour, P., Batra, D., Shakhnarovich, G.: Discriminative Re-Ranking of Diverse Segmentations. In CVPR, 2013.
  4. Li, Z., Gavves, E., van de Sande, K.E., Snoek, C., Smeulders, A.W.: Codemaps Segment, Classify and Search Objects Locally. In ICCV, 2013.
  5. Lin, X., Cogswell, M., Parikh, D., Batra, D.: Propose and Re-rank Semantic Segmentation via Deep Image Classification. In BigVision Workshop, 2014.