
Ning Zhang

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2015-244

December 17, 2015

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-244.pdf

In contrast to basic-level object recognition, fine-grained categorization aims to distinguish between subordinate categories, such as different animal breeds or species, plant species, or man-made product models. The problem can be extremely challenging because related categories differ only subtly in the appearance of certain parts, and reliable identification often requires distinctions that must be conditioned on the object pose. Discriminative markings are often highly localized, so traditional object recognition approaches struggle with the large pose variations common in these domains. Face recognition is the classic case of fine-grained recognition, and it is noteworthy that the best face recognition methods jointly discover facial landmarks and extract features from those locations. We propose pose-normalized representations, which align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in camera viewing angle.
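The piecewise (per-part) form of pose normalization can be illustrated with a minimal sketch. It assumes part boxes are already known (for example from annotations or a part detector) and uses a hypothetical `mean_max_features` function as a stand-in for the deep convolutional features the thesis actually uses; `pose_normalized_descriptor` and the part names are likewise illustrative and not taken from the thesis. The idea it shows is that each slot of the output vector always describes the same semantic part, so images in different poses yield comparable descriptors.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
Image = List[List[float]]        # grayscale image stored as rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def mean_max_features(patch: Image) -> List[float]:
    """Hypothetical stand-in for a CNN descriptor: [mean, max] of the patch."""
    pixels = [p for row in patch for p in row]
    if not pixels:
        return [0.0, 0.0]
    return [sum(pixels) / len(pixels), max(pixels)]


def pose_normalized_descriptor(
    image: Image,
    part_boxes: Dict[str, Optional[Box]],
    part_order: Sequence[str],
    extract: Callable[[Image], List[float]] = mean_max_features,
    dim: int = 2,  # must equal the length of the vector returned by extract
) -> List[float]:
    """Concatenate per-part features in a fixed, pose-independent part order.

    Parts that are missing or marked None (e.g. occluded) contribute a
    zero block of length dim, so the descriptor length never changes.
    """
    descriptor: List[float] = []
    for name in part_order:
        box = part_boxes.get(name)
        if box is None:
            descriptor.extend([0.0] * dim)
        else:
            descriptor.extend(extract(crop(image, box)))
    return descriptor


if __name__ == "__main__":
    img = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    parts = {"head": (0, 0, 2, 2), "body": (2, 2, 4, 4), "tail": None}
    print(pose_normalized_descriptor(img, parts, ["head", "body", "tail"]))
```

The global (whole-object) variant would instead warp the entire object into a canonical frame before a single feature extraction; the per-part version is shown here only because it is the simpler one to write down.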

I first present methods that apply pose normalization to two related applications: human attribute classification and person recognition beyond frontal faces. Following the recent success of deep learning, we use deep convolutional features as the feature representation. Next, I introduce the part-based R-CNN method, an extension of the state-of-the-art object detection method R-CNN to fine-grained categorization. The model learns both whole-object and part detectors, and enforces learned geometric constraints between them. I also show results of using the recent compact bilinear features to generate pose-normalized representations. However, bottom-up region proposals are limited by hand-engineered features, so in the final work I present a fully convolutional deep network, trained end-to-end for part localization and fine-grained classification.
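One way to picture how whole-object and part detections can be combined under a geometric constraint is the minimal sketch below. It assumes each detector returns candidate boxes with strictly positive scores, combines scores by product, and uses only a fixed containment rule (each part may extend outside the object box by at most `eps` pixels). The function names, the margin value, and the product scoring are illustrative assumptions; the learned geometric constraints described in the thesis may be richer than this simple box test.

```python
import math
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Scored = Tuple[Box, float]               # candidate box and its detector score (> 0)


def inside_with_margin(part: Box, obj: Box, eps: float) -> bool:
    """True if part extends outside obj by at most eps pixels on every side."""
    return (part[0] >= obj[0] - eps and part[1] >= obj[1] - eps
            and part[2] <= obj[2] + eps and part[3] <= obj[3] + eps)


def best_configuration(
    objects: List[Scored],
    parts: Dict[str, List[Scored]],
    eps: float = 10.0,  # hypothetical margin in pixels
) -> Optional[Tuple[Box, Dict[str, Box], float]]:
    """Choose one object box and one box per part type maximizing the product
    of detector scores, subject to every part lying inside the object box.

    Once the object box is fixed, the containment test couples each part only
    to the object, so the best admissible box of each part type is picked
    independently. Scores are summed in log space to avoid underflow.
    """
    best = None
    best_log = -math.inf
    for obj_box, obj_score in objects:
        log_total = math.log(obj_score)
        chosen: Dict[str, Box] = {}
        feasible = True
        for name, candidates in parts.items():
            valid = [(b, s) for b, s in candidates if inside_with_margin(b, obj_box, eps)]
            if not valid:
                feasible = False  # no admissible box for this part type
                break
            box, score = max(valid, key=lambda c: c[1])
            chosen[name] = box
            log_total += math.log(score)
        if feasible and log_total > best_log:
            best_log = log_total
            best = (obj_box, chosen, math.exp(log_total))
    return best


if __name__ == "__main__":
    objs = [((0, 0, 100, 100), 0.9), ((200, 200, 300, 300), 0.6)]
    cands = {
        "head": [((10, 10, 40, 40), 0.5), ((210, 210, 240, 240), 0.95)],
        "body": [((30, 30, 90, 90), 0.7), ((220, 220, 290, 290), 0.8)],
    }
    print(best_configuration(objs, cands))
```

In this toy input the lower-scoring object box wins (0.6 × 0.95 × 0.8 = 0.456 versus 0.9 × 0.5 × 0.7 = 0.315) because its parts are detected more confidently, which is the kind of joint reasoning such a constraint enables. Adding a learned prior over relative part positions would couple the parts to each other and require a joint rather than independent search.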

Advisor: Trevor Darrell

BibTeX citation:

@phdthesis{Zhang:EECS-2015-244,
    Author= {Zhang, Ning},
    Title= {Visual Representations for Fine-grained Categorization},
    School= {EECS Department, University of California, Berkeley},
    Year= {2015},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-244.html},
    Number= {UCB/EECS-2015-244},
    Abstract= {In contrast to basic-level object recognition, fine-grained categorization aims to distinguish between subordinate categories, such as different animal breeds or species, plant species, or man-made product models. The problem can be extremely challenging because related categories differ only subtly in the appearance of certain parts, and reliable identification often requires distinctions that must be conditioned on the object pose. Discriminative markings are often highly localized, so traditional object recognition approaches struggle with the large pose variations common in these domains. Face recognition is the classic case of fine-grained recognition, and it is noteworthy that the best face recognition methods jointly discover facial landmarks and extract features from those locations. We propose pose-normalized representations, which align training exemplars, either piecewise by part or globally for the whole object, effectively factoring out differences in pose and in camera viewing angle.

I first present methods that apply pose normalization to two related applications: human attribute classification and person recognition beyond frontal faces. Following the recent success of deep learning, we use deep convolutional features as the feature representation. Next, I introduce the part-based R-CNN method, an extension of the state-of-the-art object detection method R-CNN to fine-grained categorization. The model learns both whole-object and part detectors, and enforces learned geometric constraints between them. I also show results of using the recent compact bilinear features to generate pose-normalized representations. However, bottom-up region proposals are limited by hand-engineered features, so in the final work I present a fully convolutional deep network, trained end-to-end for part localization and fine-grained classification.},
}

EndNote citation:

%0 Thesis
%A Zhang, Ning 
%T Visual Representations for Fine-grained Categorization
%I EECS Department, University of California, Berkeley
%D 2015
%8 December 17
%@ UCB/EECS-2015-244
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-244.html
%F Zhang:EECS-2015-244