The Human Annotation Tool

Lubomir Bourdev and Jitendra Malik

Updated June 17, 2011

The Human Annotation Tool (HAT) allows one to annotate people - where their arms and legs are, what their 3D pose is, which body parts are occluded, etc. A database of annotated people would be invaluable for creating computer vision algorithms to detect and localize people.

Try the Human Annotation Tool

You may run a copy of the tool by clicking on the above image and agreeing to all the disclaimers. You need Java and a reasonably good graphics card. The tool has been tested on Mac OS X and on Windows.

Tutorial

The tool supports two kinds of annotations - labeling joints and extracting the 3D pose, and labeling the regions of the body (hair, face, upper clothes, etc).

PART 1: CREATING THE 3D POSE

Here is a video tutorial that shows you how to create the 3D pose:

 

The picture above shows all the controls. To annotate, please follow these steps.

STEP 1. Navigate to the person to annotate (blue controls)

To jump to a given annotation, put its index in the "Current Entry" box. Use the arrows to go to an image containing a person that is not annotated. Pan and zoom to the person. Press the New Entry button and choose Male, Female or Child.

STEP 2. Specify keypoints (picture controls)

Move the mouse over the location of each keypoint and press the corresponding key, indicated with a picture of the body part. The right shoulder, elbow and wrist correspond to keys Q, A, and Z, and the left ones - to W, S and X. The right hip, knee and ankle correspond to E, D, and C, and the left ones - to R, F and V. Select the ears with T and U, the eyes with G and H, and the nose with Y. You may also click and drag keypoints to adjust their locations.

You can also use J for the back of the head, and P and O for the right/left toes respectively. The keys can be changed from the configuration file.
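For reference, the default bindings described above can be collected into a simple key-to-keypoint table. The snippet below is only an illustrative Python summary; the authoritative mapping is whatever the configuration file specifies.

    # Default keypoint keys as described above. "Right" and "left" are from
    # the annotated person's point of view, not the viewer's.
    KEYPOINT_KEYS = {
        "Q": "right shoulder", "W": "left shoulder",
        "A": "right elbow",    "S": "left elbow",
        "Z": "right wrist",    "X": "left wrist",
        "E": "right hip",      "R": "left hip",
        "D": "right knee",     "F": "left knee",
        "C": "right ankle",    "V": "left ankle",
        "P": "right toes",     "O": "left toes",
        "Y": "nose",           "J": "back of head",
        "T": "ear",            "U": "ear",   # one key per ear; sides as set in the config file
        "G": "eye",            "H": "eye",   # one key per eye; sides as set in the config file
    }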

If a keypoint is occluded or falls outside the image but you have a rough guess where it should be, mark it as best as you can. Leave keypoints unmarked if you have no idea where they should lie. Both shoulders, however, must always be labelled.

Shoulders, Elbows, Wrists, Hips, Knees, Ankles: Approximate limbs as cylinders. The joint location in 3D is the intersection of the axes of adjoining cylinders.

Left vs Right: The keypoint is labelled as left or right from the point of view of the labelled person, not based on its location in the image. For example, if the person is facing the camera, his or her left keypoints lie on the right in image space.

Detail

Nose Tip: The location is the tip of the nose, regardless of frontal or profile view.

Eyes: In frontal view it is the midpoint of the two eye corners. The eye location does not depend on the pupils. In profile view it is the tip of the eye surface. Even if the eye is closed, we estimate the tip of the eye surface, ignoring the eyelids.

Ears: The tip of the tragus (the small pointed eminence of the external ear).

 

STEP 3. Specify keypoint z-order and visibility (red controls)

When you hover with the mouse over a keypoint, use the red keys to specify keypoint properties or to delete it.

Press N to mark a keypoint as occluded. Occluded keypoints are shown in green. The general rule is that a keypoint is visible if and only if the ray from the keypoint location reaches the camera, with the following exceptions:

Detail

Eyes: Glasses (including dark sunglasses) do not hide the eye. Closed eyes are still considered visible.

Ears: If the tragus is overlapped by the person's hair or clothes, it is considered hidden.

Shoulders, Elbows, Wrists, Hips, Knees, Feet: Clothes over the area of the joint do not hide the joint. The joint may, however, be hidden by the torso or limbs of the same person. For example, in right profile view, the left joints of the body are often occluded by the torso and/or by the right limbs.

Most keypoints have a reference keypoint, and for each keypoint we need to specify whether it is closer to the camera than its reference keypoint, roughly equidistant, or further away. By default keypoints are considered equidistant. When you hover over a keypoint, you will see a segment connecting it to its reference keypoint. Use the B key to toggle between the three depth states of the keypoint. When a keypoint is marked Far (or Near), it is shown smaller (or larger) than usual, and the segment to its reference keypoint also changes.

STEP 4. Adjusting the 3D pose (green controls)

Before adjusting, it is a good idea to press the Save button to record all of your changes. Now adjust the keypoints to approximate the correct 3D pose of the person. Orbit the right window using the left mouse button to see the person from another viewpoint. The 3D pose is usually far from correct initially; you may need to adjust a few keypoints manually and see how that affects the 3D pose. A few tips:

STEP 5. Save!

Be sure to press the Save button before going to the next annotation or you will lose all of your changes!

A pose labelling is acceptable if:

  1. The keypoints are near their ideal positions, and definitely over the body
  2. There are no bright red segments in the 3D view
  3. The 3D pose looks reasonable from multiple viewpoints
  4. Both shoulders are labelled
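The 3D pose is recovered from the 2D keypoints along the lines of Taylor's single-view reconstruction (see the Reference at the bottom of this page). The Python sketch below shows only the core depth-recovery step, assuming scaled orthographic projection and a known relative limb length; it is illustrative and not the tool's actual code.

    import math

    def limb_depth_difference(p1, p2, limb_length, scale):
        """Magnitude of the depth difference between two joints connected by a
        limb of known 3D length, under scaled orthographic projection
        (image coordinates = scale * world X and Y).

        p1 and p2 are (u, v) image positions of the joints. The sign of the
        difference (which joint is nearer the camera) cannot be recovered from
        a single view; the annotator's near/far flags from Step 3 resolve it.
        """
        du, dv = p2[0] - p1[0], p2[1] - p1[1]
        foreshortened_sq = (du * du + dv * dv) / (scale * scale)
        if foreshortened_sq > limb_length ** 2:
            # The projected limb is longer than its assumed 3D length, so the
            # scale or the keypoint positions need adjusting.
            raise ValueError("projected limb is longer than its 3D length")
        return math.sqrt(limb_length ** 2 - foreshortened_sq)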

PART 2: LABELING THE REGIONS

Some images have associated precomputed segmentations. When a segmentation is available, you can switch to region labelling mode using the "Pose3D / Segmentation" radio button.

Segmentation Tool

In this view the left window remains the image itself, while the right window shows the image broken down into segments and the associated label for each segment. You can pan and zoom the image from either window. The segment under the cursor is highlighted in red, as shown in the picture, and its label is shown in the info bar at the bottom - in the above example the cursor is over a region labelled "LowerClothes". The labelled region selected by this segment is shown in the left image (not displayed in this example).

Some files contain thousands of small segments, and it would be too time consuming to label each of them. HAT therefore allows us to hierarchically merge segments into fewer, larger ones and label them at once. Use the up/down arrow keys to subdivide or merge segments; holding down Shift while pressing up/down arrow makes larger segmentation steps. Here is the above example at three different segmentation levels:

[figure: the example above at three different segmentation levels]
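The segmentation files encode such a hierarchy of levels. As an illustration only, the sketch below assumes the hierarchy is stored as a boundary-strength (ultrametric contour) map, the kind of output the Berkeley segmentation engine can produce, where a higher threshold yields fewer, larger segments; HAT's actual on-disk format may differ.

    from scipy import ndimage

    def segments_at_level(boundary_map, level):
        """Segments at one level of a hierarchical segmentation, assuming the
        hierarchy is an ultrametric contour map: pixels are grouped whenever
        the boundary strength separating them is below `level`.

        Returns a label image and the number of segments. Raising `level`
        merges segments; lowering it subdivides them.
        """
        interior = boundary_map < level            # True where no strong boundary
        labels, num_segments = ndimage.label(interior)
        return labels, num_segments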

A preferred way to label images containing thousands of small segments is to start labelling at low subdivision levels and then increase the subdivision and refine the labelled regions.

[figure: region labelling keys]

Segments are labelled by pressing a key while the mouse is over the segment. Most of the keys are shown on the picture to the right and are as follows:

The following keys are not shown in the illustration on the right:

Other keys:

Note the following:

The region annotation tool now supports restricted labeling, which is very helpful in making sure new edits don't damage previous annotations. Specifically, the scope of a region labeling sequence is constrained to the label under the mouse at the beginning of the sequence. For example, if you press and hold the "d" (dress) key while the mouse is over an "l" (lower clothes) region, and then move the mouse, you will be marking regions as dress, but only the ones that used to be marked lower clothes. Similarly, if you press and hold the delete key while on a labelled region, your erasing operation will only affect regions with that label.
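A minimal sketch of that restriction, with the segmentation simplified to a dictionary from segment id to label (this models the behaviour described above, not the tool's actual code):

    def relabel_restricted(segment_labels, start_segment, new_label, hovered_segments):
        """Relabel segments during a press-and-drag sequence, but only those
        whose current label matches the label under the mouse when the key was
        first pressed. Passing new_label=None models the delete key: it erases
        labels, again only within the label the sequence started on.
        """
        scope = segment_labels.get(start_segment)   # label that bounds the edit
        for seg in hovered_segments:
            if segment_labels.get(seg) == scope:
                segment_labels[seg] = new_label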

The June 2011 version fixes several bugs that were introduced with the latest Java updates. Region labeling is now possible on Macs. However, on some computers region labeling is VERY slow. If you find it slow please try it on a different computer/operating system.

Using the tool on your own data, and beyond the person category

By default the tool uses demo images and cannot save annotations for them. To use it on your own data, create a directory, for example my_annotations, and place your images in a subdirectory my_annotations/images. In addition, place a configuration file at my_annotations/person_config.xml. Here is the default configuration file. You may change it to add new keypoints or regions, change the shortcuts, or even label a new category. On startup the tool asks you to open that configuration file. The tool accepts JPG images only.

Region labeling requires segmenting the images and placing the segmentations in a directory my_annotations/segmentations. To create the segmentation image from a given JPEG image you need to download the Berkeley segmentation engine. Use this file to create the segmentation image and save it as a PNG in the segmentations directory. Restart the HAT tool and go to that image. The radio button for segmentation should now be enabled.
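For instance, the expected layout can be prepared with a few lines of Python; my_annotations is just an example name, and only the directory names come from this page.

    from pathlib import Path

    root = Path("my_annotations")
    (root / "images").mkdir(parents=True, exist_ok=True)   # your JPG images
    (root / "segmentations").mkdir(exist_ok=True)          # optional PNG segmentations
    # Place your edited copy of the default configuration file at
    # my_annotations/person_config.xml. The tool itself creates
    # my_annotations/info when you save annotations.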

The annotations are saved in a new directory my_annotations/info that the tool creates. Each annotation is a separate XML file. The XML files generated by the tool can also be read by our Matlab tools, which can use the data to create poselets or to compute various statistics.
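The exact XML schema is not documented on this page, so a schema-agnostic way to start exploring the saved annotations from Python is simply to walk the info directory and inspect each file's structure before writing a real parser.

    import xml.etree.ElementTree as ET
    from pathlib import Path

    for xml_path in sorted(Path("my_annotations/info").glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()
        # Print the file name, the root tag, and the distinct top-level child
        # tags to see what the tool actually stores for each annotation.
        print(xml_path.name, root.tag, sorted({child.tag for child in root}))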

The H3D dataset version 1.01+ is compatible with this tool. You may download it and explore its directory structure.

Code

The source code of the annotation tool is now available here. Feel free to extend it or fix any bugs and let me know if you have created an improved version that we can use instead of this one. The code requires bravery. You are on your own.

Copyright

The images used in H3D are taken from Flickr under the Creative Commons Attribution license. It allows for redistribution and derivative work for non-commercial or commercial purposes as long as the authors are attributed accordingly. Please see the license for more detail.

Feedback

If you find bugs or have suggestions on how to improve the tool, please email me at lbourdev-at-eecs. Your feedback is much appreciated!

Reference

Camillo J. Taylor, "Reconstruction of Articulated Objects from Point Correspondences in a Single Image," Computer Vision and Image Understanding, vol. 80, no. 3, pp. 349-363, Dec. 2000.