Using Poselets for Detection and Segmentation

Lubomir Bourdev, Subhransu Maji and Jitendra Malik

Abstract

We address the classic problems of detection and segmentation using a part-based detector that operates on a novel part, which we refer to as a poselet. Poselets are tightly clustered both in appearance space (and thus are easy to detect) and in configuration space (and thus are helpful for localization and segmentation). We demonstrate that poselets are effective for detection, pose extraction and segmentation. Poselet construction requires extra annotations beyond the object bounds. To train poselets we have created H3D (Humans in 3D), a dataset of 1000 person annotations. The annotations include the joints, the extracted 3D pose, keypoint visibility and region labels. We have also annotated the people in the training and validation sets of PASCAL VOC 2007 and 2009.

Our poselet classifier achieves state-of-the-art results for the person category on PASCAL VOC 2007, 2008 and 2009 as well as on our dataset, H3D.

Results

The following are results (average precision, in %) as of July 2010 for the Person category of the PASCAL VOC detection challenges.

Challenge	Poselets	Second-highest score
VOC 2009	47.8	43.8 *
VOC 2008	54.1	43.1 **
VOC 2007	46.9	43.2 *

* P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models (Release 4, 2010)

** P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI (preprint, 2009)

In this comparison we included all methods participating in Comp 3 (trained on VOC data) and Comp 4 (trained on the participant's own data). Our method requires extra annotations, so we competed in Comp 4, although we were the only submission in that category. Some parts of our algorithm were trained on the VOC training set, while others were trained on H3D, which is richly annotated but only roughly 10% of the size of the PASCAL VOC training set.
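To make the size of the lead in the table above concrete, here is a minimal sketch that computes the absolute and relative margin of the poselet detector over the second-highest score. The numbers are copied from the table; the names `scores` and `margins` are illustrative only and are not part of the released code.

```python
# Person-category scores copied from the results table:
# challenge -> (poselets score, second-highest score)
scores = {
    "VOC 2009": (47.8, 43.8),
    "VOC 2008": (54.1, 43.1),
    "VOC 2007": (46.9, 43.2),
}


def margins(table):
    """Return {challenge: (absolute lead, relative lead in percent)}."""
    result = {}
    for challenge, (poselets, runner_up) in table.items():
        lead = poselets - runner_up
        absolute = round(lead, 1)
        relative = round(100.0 * lead / runner_up, 1)
        result[challenge] = (absolute, relative)
    return result


if __name__ == "__main__":
    for challenge, (absolute, relative) in margins(scores).items():
        print(f"{challenge}: +{absolute} points ({relative}% relative)")
```

The absolute lead is the plain difference between the two columns; the relative lead divides that difference by the second-highest score.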

Papers

Lubomir Bourdev, Subhransu Maji, Thomas Brox, Jitendra Malik, Detecting People Using Mutually Consistent Poselet Activations, ECCV 2010 (to appear)
Lubomir Bourdev, Jitendra Malik, Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, ICCV 2009

Code

Here is stand-alone code that takes an image and draws bounding boxes around the people in it. Requirements: Matlab with the Image Processing Toolbox. The code is released under a non-commercial license. The released code and trained detector are similar to the ones we used in the PASCAL 2009 competition. The code for our ECCV 2010 paper is not yet available.
Note: If you use WinZip and Matlab reports that your file is corrupt, please try WinRAR.

H3D Dataset

The dataset and the associated Matlab toolbox are available here.

H3D Annotation tool

The Java3D tool that we used to create H3D and a video tutorial are available here. There are no license restrictions on using the tool for your own annotations.

BibTeX reference

For the idea of poselets please reference:

@InProceedings{PoseletsICCV09,
  author       = "Lubomir Bourdev and Jitendra Malik",
  title        = "Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations",
  booktitle    = "International Conference on Computer Vision",
  month        = "sep",
  year         = "2009",
  url          = "http://www.eecs.berkeley.edu/~lbourdev/poselets"
}

For our latest work please reference:

@InProceedings{PoseletsECCV10,
  author       = "Lubomir Bourdev and Subhransu Maji and Thomas Brox and Jitendra Malik",
  title        = "Detecting People Using Mutually Consistent Poselet Activations",
  booktitle    = "European Conference on Computer Vision",
  month        = "sep",
  year         = "2010",
  url          = "http://www.eecs.berkeley.edu/~lbourdev/poselets"
}