Recognition:

Objects, Humans, Activities

"I can see the cup on the table," interrupted Diogenes, "but I can't see the 'cupness'."

"That's because you have the eyes to see the cup," said Plato, "but," tapping his head with his forefinger, "you don't have the intellect with which to comprehend 'cupness'."

Teachings of Diogenes

To survive, an animal has to understand its environment: it must learn to spot predators and prey, avoid navigational obstacles, and so on. Detecting and recognizing objects is thus one of the most important uses of vision systems in nature, and is consequently highly evolved. Indeed, humans can distinguish between more than 30,000 visual categories, and can detect objects in the span of a few hundred milliseconds.

Although we are nowhere near human performance on this task, we have made considerable progress in the past few years. Our work focuses on building object detection systems that can work "in the wild", in the presence of heavy occlusion and drastic appearance changes. We work on a wide variety of systems, ranging from sliding-window detectors to those powered by regions from bottom-up segmentation. Along the way, our research has also yielded better feature descriptors and more efficient machine learning algorithms.

Object detection is hardly the end goal. With that in mind, we also focus on finer-grained tasks, such as segmenting out the pixels that belong to each object, or inferring its pose and other attributes.

Publications:

	"Learning Rich Features from RGB-D Images for Object Detection and Segmentation" S. Gupta, R. Girshick, P. Arbelaez and J. Malik. To appear, ECCV 2014 [pdf] [project page (coming soon)]
	"Simultaneous Detection and Segmentation" B. Hariharan, P. Arbelaez, R. Girshick and J. Malik. To appear, ECCV 2014 [pdf] [project page]
	"Analyzing the Performance of Multilayer Neural Networks for Object Recognition" Pulkit Agrawal, R. Girshick and J. Malik. To appear, ECCV 2014 [pdf] [supplementary]
	"Using k-poselets for detecting people and localizing their keypoints" G. Gkioxari, B. Hariharan, R. Girshick and J. Malik. (*equal contribution). CVPR 2014 [pdf] [project page]
	"Volumetric Semantic Segmentation using Pyramid Context Features" J. Barron, P. Arbelaez, S. Keranen, M. Biggin, D. Knowles and J. Malik. ICCV, 2013. [pdf] [bibtex] [movie]
	"Articulated Pose Estimation using Discriminative Armlet Classifiers" G. Gkioxari, P. Arbelaez, L. Bourdev, and J. Malik. CVPR 2013 [pdf]
	"Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images" S. Gupta, P. Arbelaez, and J. Malik. CVPR 2013 [pdf]
	"Multi-Component Models for Object Detection" C. Gu, P. Arbelaez, Y. Lin, K. Yu, and J. Malik. ECCV 2012 [pdf]
	"Discriminative Decorrelation for Clustering and Classification" B. Hariharan, J. Malik, D. Ramanan. ECCV 2012 [pdf]
	"Semantic Segmentation using Regions and Parts" P. Arbelaez, B. Hariharan, C. Gu, S. Gupta, L. Bourdev, and J. Malik. CVPR 2012 [pdf]
	"Describing People: A Poselet-Based Approach to Attribute Classification" L. Bourdev, S. Maji, and J. Malik. ICCV 2011 [pdf] [More Info]
	"Semantic Contours from Inverse Detectors" B. Hariharan, P. Arbelaez, L. Bourdev, S. Maji, and J. Malik. ICCV 2011 [pdf] [dataset]
	"Object Segmentation by Alignment of Poselet Activations to Image Contours" T. Brox, L. Bourdev, S. Maji, and J. Malik, CVPR 2011. [pdf] [More Info]
	"Action Recognition from a Distributed Representation of Pose and Appearance" S. Maji, L. Bourdev, and J. Malik, CVPR 2011. [pdf] [More Info]
	"Discriminative Mixture-of-Templates for Viewpoint Classification" C. Gu and X. Ren, ECCV 2010. [pdf]
	"Detecting People Using Mutually Consistent Poselet Activations" L. Bourdev, S. Maji, T. Brox, and J. Malik, ECCV 2010. [pdf] [More Info]
	"Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations" L. Bourdev and J. Malik. ICCV 2009. [pdf] [More Info]
	"Multi-Scale Object Detection by Clustering Lines" B. Ommer and J. Malik, ICCV 2009. [pdf]
	"Max-Margin Additive Classifiers for Detection" S. Maji and A. C. Berg, ICCV 2009. [pdf]
	"Context by Region Ancestry" J.J. Lim, P. Arbelaez, C. Gu and J. Malik. ICCV 2009. [pdf]
	"Understanding Rapid Category Detection via Multiply Degraded Images" C. Nandakumar and J. Malik. JOV 2009. [pdf]
	"Object Detection Using a Max-Margin Hough Transform" S. Maji and J. Malik. CVPR 2009. [pdf]
	"Recognition Using Regions" C. Gu, J.J. Lim, P. Arbelaez and J. Malik. CVPR 2009. [pdf] [talk] [code]
	"Classification using Intersection Kernel SVMs is Efficient" S. Maji, A. C. Berg and J. Malik. CVPR 2008. [pdf] [more info]
	"Parsing Images of Architectural Scenes" A. C. Berg, F. Grabler, J. Malik. ICCV 2007. [pdf]
	"Learning Distance Functions for Exemplar-Based Object Recognition." A. Frome, PhD Thesis, 2007. [pdf]
	"Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification" A. Frome, Y. Singer, F. Sha, J. Malik. ICCV 2007. [pdf] [talk]
	"Learning to Locate Informative Features for Visual Identification" A. Ferencz, E. Learned-Miller, J. Malik. IJCV 2008. [pdf] [more info]
	"Image Retrieval and Classification Using Local Distance Functions" A. Frome, Y. Singer, J. Malik. NIPS 2006. [pdf] [poster]
	"SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition" H. Zhang, A. C. Berg, M. Maire, J. Malik. CVPR 2006 [pdf]
	"Recovering 3D Human Body Configurations Using Shape Contexts" G. Mori, J. Malik. IEEE PAMI Jul. 2006 [pdf]
	"Efficient Shape Matching Using Shape Contexts" G. Mori, S. Belongie, J. Malik. IEEE PAMI Nov. 2005 [pdf]
	"Recovering Human Body Configurations using Pairwise Constraints between Parts" X. Ren, A. C. Berg, and J. Malik. ICCV 2005 [pdf]
	"Building a Classification Cascade for Visual Identification from One Example" A. Ferencz, E. Learned-Miller, J. Malik. ICCV 2005 [pdf] [more info]
	"Shape Matching and Object Recognition using Low Distortion Correspondence" A. C. Berg, T. L. Berg, J. Malik. CVPR 2005 [pdf] [ppt]
	"Strike a Pose: Tracking People by Finding Stylized Poses" D. Ramanan, D. A. Forsyth, and A. Zisserman. CVPR 2005 [pdf]
	"Detecting, Localizing, and Recovering Kinematics of Textured Animals" D. Ramanan, D. A. Forsyth, and K. Barnard. CVPR 2005 [pdf]
	"Object Recognition using Locality Sensitive Hashing of Shape Contexts" A. Frome, J. Malik. Nearest-Neighbor Methods in Learning and Vision, Eds. G. Shakhnarovich, T. Darrell, P. Indyk. 2005. [pdf] (formatting and pagination differ from printed version)
	"An Information Maximization Model of Eye Movements" L. Walker Renninger, J. Coughlan, P. Verghese, and J. Malik. NIPS 2004 [pdf]
	"Learning Hyper-Features for Visual Identification" A. Ferencz, E. Learned-Miller, J. Malik. NIPS 2004 [pdf] [more info]
	"Who's in the Picture" T. L. Berg, A. C. Berg, J. Edwards, and D. A. Forsyth. NIPS 2004 [pdf] [ps.gz]
	"Names and Faces in the News" T. L. Berg, A. C. Berg, J. Edwards, M. Maire, R. White, Y. W. Teh, E. Learned-Miller, and D. A. Forsyth. CVPR 2004 [pdf] [ps.gz]
	"Recovering Human Body Configurations: Combining Segmentation and Recognition" G. Mori, X. Ren, A. A. Efros and J. Malik. CVPR 2004 [pdf] [ppt]
	"Twist Based Acquisition and Tracking of Animal and Human Kinematics" C. Bregler, J. Malik, and K. Pullen. IJCV 2004. [pdf]
	"When is scene identification just texture recognition?" L. Walker Renninger and J. Malik. Vision Research 2004 [pdf] (also presented at Vision Sciences Meeting, 2002)
	"Recognizing Objects in Range Data Using Regional Point Descriptors" A. Frome, D. Huber, R. Kolluri, T. Buelow, and J. Malik. ECCV 2004 [pdf]
	"Automatic Annotation of Everyday Movements" D. Ramanan and D. A. Forsyth NIPS 2003 [pdf]
	"Recognizing Action at a Distance" A. A. Efros, A. C. Berg, G. Mori, and J. Malik. ICCV 2003. [pdf] [slides]
	"Using Temporal Coherence to Build Models of Animals" D. Ramanan and D. A. Forsyth. ICCV 2003 [pdf]
	"Finding and Tracking People from the Bottom Up" D. Ramanan and D. A. Forsyth. CVPR 2003 [pdf]
	"Learning a Discriminative Classifier using Shape Context Distances" H. Zhang, and J. Malik. CVPR 2003 [pdf]
	"Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA" G. Mori, and J. Malik. CVPR 2003 [pdf] [ppt] [more info]
	"Estimating Human Body Configurations using Shape Context Matching" G. Mori and J. Malik. ECCV 2002 [pdf] (also Workshop on Models versus Exemplars 2001 [pdf])
	"Shape Matching and Object Recognition Using Shape Contexts" S. Belongie, J. Malik, and J. Puzicha. PAMI 2002 [pdf] [more info]
	"Geometric Blur for Template Matching" A.C. Berg and J. Malik. CVPR 2001 [pdf]
	"Shape Contexts Enable Efficient Retrieval of Similar Shapes" G. Mori, S. Belongie, and J. Malik. CVPR 2001 [pdf] [more info]
	"Matching Shapes" S. Belongie, J. Malik, and J. Puzicha. ICCV 2001 [pdf] [ppt]
	"Human Tracking with Mixtures of Trees" S. Ioffe and D. A. Forsyth. ICCV 2001 [pdf]
	"Probabilistic methods for finding people" S. Ioffe and D. A. Forsyth. IJCV 2001 [pdf]
	"Shape Context: A new descriptor for shape matching and object recognition" S. Belongie, J. Malik, and J. Puzicha. NIPS 2000 [ps.gz]
	"Matching with Shape Contexts" S. Belongie and J. Malik. CBAIVL 2000 [ps.gz]
	"Finding People by Sampling" S. Ioffe and D. A. Forsyth ICCV 1999 [pdf][ps.gz]
	"Automatic Detection of Human Nudes" D. A. Forsyth and M. Fleck. IJCV 1999 [pdf]
	"Tracking People with Twists and Exponential Maps" C. Bregler and J. Malik. CVPR 1998 [ps.gz] [pdf] [More info]
	"Learning and Recognizing Human Dynamics in Video Sequences" C. Bregler. CVPR 1997 [pdf] [More info]
	"Video Rewrite: Driving Visual Speech with Audio" C. Bregler, M. Covell, and M. Slaney. SIGGRAPH 1997 [pdf] [More info]
	"Finding Naked People" M. Fleck, D. A. Forsyth, and C. Bregler. ECCV 1996 [pdf]

Questions —> Bharath Hariharan