Andrea Lynn Frome

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2007-98

August 8, 2007

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-98.pdf

This thesis investigates an exemplar-based approach to object recognition that learns, on an image-by-image basis, the relative importance of patch-based features for determining similarity. We borrow the idea of "family resemblances" from Wittgenstein's Philosophical Investigations and Eleanor Rosch's psychological studies to support the idea of learning the detailed relationships between images of the same category, which is a departure from some popular machine learning approaches such as Support Vector Machines that seek only the boundaries between categories.

We represent images as sets of patch-based features. To find the distance between two images, we first find for each patch its nearest patch in the other image and compute their inter-patch distance. The weighted sum of these inter-patch distances is defined to be the distance between the two images. The main contribution of this thesis is a method for learning a set-to-set distance function specific to each training image, together with a demonstration of the use of these functions for image browsing, retrieval, and classification. The goal of the learning algorithm is to assign a non-negative weight to each patch-based feature of the image such that the most useful patches are assigned large weights and irrelevant or confounding patches are given zero weights. We formulate this as a large-margin optimization, related to the soft-margin Support Vector Machine, and discuss two versions: a "focal" version that learns weights for each image separately, and a "global" version that jointly learns the weights for all training images. In the focal version, the distance functions learned for the training images are not directly comparable to one another and can be most directly applied to in-sample applications such as image browsing, though with heuristics or additional learning, these functions can be used for image retrieval or classification. The global approach, however, learns distance functions that are globally consistent and can be used directly for image retrieval and classification. Using geometric blur and simple color features, we show that both versions perform as well as or better than the best-performing algorithms on the Caltech 101 object recognition benchmark. The global version achieves the best results, a 63.2% mean recognition rate when trained with fifteen images per category and 66.6% when trained with twenty.
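
The set-to-set distance described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: the Euclidean inter-patch metric, the function names `patch_distance` and `image_distance`, and the toy two-dimensional features are all assumptions made for the example. The weights are supplied by hand here; in the thesis they come from the large-margin learning step, which is not shown.

```python
import math
from typing import List, Sequence

Patch = Sequence[float]  # one patch-based feature vector


def patch_distance(a: Patch, b: Patch) -> float:
    """Euclidean distance between two patch features (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def image_distance(focal: List[Patch],
                   weights: List[float],
                   other: List[Patch]) -> float:
    """Weighted sum, over the focal image's patches, of the distance from
    each patch to its nearest patch in the other image."""
    if len(focal) != len(weights):
        raise ValueError("need exactly one weight per focal patch")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if not other:
        raise ValueError("the other image must contain at least one patch")

    total = 0.0
    for patch, weight in zip(focal, weights):
        # Nearest-patch lookup: smallest inter-patch distance to the other image.
        nearest = min(patch_distance(patch, candidate) for candidate in other)
        total += weight * nearest
    return total


# Hypothetical toy data: two focal patches, the second given zero weight
# as if learning had judged it irrelevant.
focal_patches = [[0.0, 0.0], [1.0, 1.0]]
focal_weights = [2.0, 0.0]
other_patches = [[3.0, 4.0], [0.0, 1.0]]

print(image_distance(focal_patches, focal_weights, other_patches))  # 2.0
```

Because the weights in this sketch belong to the focal image, the distance computed from image A to image B need not equal the distance from B to A.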

Advisor: Jitendra Malik


BibTeX citation:

@phdthesis{Frome:EECS-2007-98,
    Author= {Frome, Andrea Lynn},
    Title= {Learning Local Distance Functions for Exemplar-Based Object Recognition},
    School= {EECS Department, University of California, Berkeley},
    Year= {2007},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-98.html},
    Number= {UCB/EECS-2007-98},
    Abstract= {This thesis investigates an exemplar-based approach to object recognition that learns, on an image-by-image basis, the relative importance of patch-based features for determining similarity.  We borrow the idea of "family resemblances" from Wittgenstein's Philosophical Investigations and Eleanor Rosch's psychological studies to support the idea of learning the detailed relationships between images of the same category, which is a departure from some popular machine learning approaches such as Support Vector Machines that seek only the boundaries between categories.

We represent images as sets of patch-based features.  To find the distance between two images, we first find for each patch its nearest patch in the other image and compute their inter-patch distance.  The weighted sum of these inter-patch distances is defined to be the distance between the two images.  The main contribution of this thesis is a method for learning a set-to-set distance function specific to each training image, together with a demonstration of the use of these functions for image browsing, retrieval, and classification.  The goal of the learning algorithm is to assign a non-negative weight to each patch-based feature of the image such that the most useful patches are assigned large weights and irrelevant or confounding patches are given zero weights.  We formulate this as a large-margin optimization, related to the soft-margin Support Vector Machine, and discuss two versions: a "focal" version that learns weights for each image separately, and a "global" version that jointly learns the weights for all training images.  In the focal version, the distance functions learned for the training images are not directly comparable to one another and can be most directly applied to in-sample applications such as image browsing, though with heuristics or additional learning, these functions can be used for image retrieval or classification.  The global approach, however, learns distance functions that are globally consistent and can be used directly for image retrieval and classification.  Using geometric blur and simple color features, we show that both versions perform as well as or better than the best-performing algorithms on the Caltech 101 object recognition benchmark.  The global version achieves the best results, a 63.2% mean recognition rate when trained with fifteen images per category and 66.6% when trained with twenty.},
}

EndNote citation:

%0 Thesis
%A Frome, Andrea Lynn 
%T Learning Local Distance Functions for Exemplar-Based Object Recognition
%I EECS Department, University of California, Berkeley
%D 2007
%8 August 8
%@ UCB/EECS-2007-98
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-98.html
%F Frome:EECS-2007-98