Learning to Recognize Objects from Unseen Modalities

Overview

We investigate the problem of exploiting multiple sources of information for object recognition when additional modalities, absent from the labeled training set, are available at inference time. This scenario is common in robotics sensing applications and contrasts with the assumption made by existing approaches, which require at least some labeled examples for each modality. To leverage the previously unseen features, we use the unlabeled data to learn a mapping from the existing modalities to the new ones (see Figure 1). This allows us to predict the missing data for the labeled examples and to exploit all modalities using multiple kernel learning. We demonstrate the effectiveness of our approach on several multi-modal tasks, including object recognition from multi-resolution imagery, grayscale and color images, and images and text. Our approach outperforms multiple kernel learning on the original modalities, as well as nearest-neighbor and bootstrapping schemes.
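To make the mapping step concrete, here is a minimal MATLAB sketch of one plausible instantiation: hallucinating the missing modality with kernel ridge regression trained on the unlabeled data. The variable names (Xa_u, Xb_u, Xa_l), the choice of RBF kernel, and the parameter values are illustrative assumptions, not the released implementation (see Code below).

    % Conceptual sketch only, not the released implementation.
    % Xa_u (D_a x N_u), Xb_u (D_b x N_u): both modalities on the unlabeled set.
    % Xa_l (D_a x N_l): the only modality observed on the labeled set.
    sigma  = 1;      % RBF kernel bandwidth (assumed value)
    lambda = 1e-3;   % ridge regularizer (assumed value)

    % Gaussian RBF kernel between the columns of X and Z.
    rbf = @(X, Z) exp(-(bsxfun(@plus, sum(X.^2, 1)', sum(Z.^2, 1)) ...
                        - 2 * (X' * Z)) / (2 * sigma^2));

    Ku = rbf(Xa_u, Xa_u);                            % N_u x N_u kernel matrix
    A  = (Ku + lambda * eye(size(Ku, 1))) \ Xb_u';   % ridge regression weights
    Kl = rbf(Xa_l, Xa_u);                            % N_l x N_u cross kernel
    Xb_hal = (Kl * A)';                              % hallucinated modality, D_b x N_l

The hallucinated features Xb_hal can then be combined with Xa_l in a multiple kernel learning classifier, with one base kernel per modality.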

This work will appear at the European Conference on Computer Vision (ECCV) this September (see Publications below). The features and code used in the paper are available for download below.



Figure 1: Inferring color from intensity. With our approach, the unlabeled test data is leveraged to learn a mapping between the labeled intensity channel and the unlabeled color features. The color features are then 'hallucinated' on the labeled intensity-only training set using the learned regression function and used for classification. We evaluate our approach on a variety of problem domains, including multi-resolution imagery, grayscale and color images, and images and text.

Supplementary Results

The additional results referenced in Section 4.2 of the paper, covering all feature combinations on the natural scenes dataset, are available here.

Datasets

The dataset features and train/test splits are available below in MATLAB .mat format for a subset of the datasets used in the paper. The remaining feature files are available from the linked project webpages.

The feature files below contain the following variables:
  • For each modality, X_(modality): D x N feature matrices
  • For each modality, D_(modality): N x N distance matrices
  • L: N x 1 label vector
with D the dimension of each feature vector and N the number of examples.
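As a quick sanity check after downloading, a feature file can be loaded and its dimensions verified as in the sketch below. The modality suffix 'sift' is hypothetical; consult each file (e.g., with whos('-file', ...)) for the actual variable names.

    % Minimal loading sketch; 'X_sift' and 'D_sift' are assumed names.
    data = load('birds_features.mat');
    X  = data.X_sift;   % D x N feature matrix for one modality
    Dm = data.D_sift;   % N x N distance matrix for that modality
    L  = data.L;        % N x 1 label vector

    [D, N] = size(X);
    fprintf('%d examples, %d-dim features, %d classes\n', N, D, numel(unique(L)));
    assert(isequal(size(Dm), [N N]) && numel(L) == N);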

For each dataset, two train/test split files are included: one varying the training set size and the other varying the kernel PCA dimensionality. Each split file contains the following variables:
  • N: vector of training set sizes
  • d: vector of kernel PCA dimensionalities
  • S: number of splits per training set size
  • splits: |N| x 1 cell array of training/test splits, where each cell contains an array of S split structures (see the sketch below)
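A hedged sketch of selecting one split, assuming each split structure stores its training and test indices in fields named train and test (the included README documents the actual field names). X and L are the feature matrix and labels loaded above.

    % Pick one train/test split; the 'train'/'test' field names are assumptions.
    s = load('birds_trainsize_splits.mat');   % contains N, d, S, splits

    i = 1;                      % index into the training set sizes s.N
    j = 1;                      % which of the s.S random splits to use
    split = s.splits{i}(j);     % one train/test split structure

    Xtrain = X(:, split.train); ytrain = L(split.train);
    Xtest  = X(:, split.test);  ytest  = L(split.test);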
The papers in which each dataset was originally published are listed below.
  • Robotics Dataset: K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In Proceedings of the European Conference on Computer Vision (ECCV), 2010.
  • Oxford Flowers Dataset: M. E. Nilsback and A. Zisserman. A visual vocabulary for flower classification. In Proceedings of Computer Vision and Pattern Recognition (CVPR), 2006.
  • Birds Dataset: S. Lazebnik, C. Schmid, and J. Ponce. A maximum entropy framework for part-based texture and object recognition. In Proceedings of Computer Vision and Pattern Recognition (CVPR), 2005.
  • Butterflies Dataset: S. Lazebnik, C. Schmid, and J. Ponce. Semi-local affine parts for object recognition. In Proceedings of the British Machine Vision Conference (BMVC), 2004.
  • Office Dataset: K. Saenko and T. Darrell. Filtering abstract senses from image search results. In Proceedings of Neural Information Processing Systems (NIPS), 2009.
  • Mouse Dataset: K. Saenko and T. Darrell. Unsupervised learning of visual sense models for polysemous words. In Proceedings of Neural Information Processing Systems (NIPS), 2008.

Dataset          Features                         Train/Test Splits                  Relevant Link
Robotics         robotics_features.mat            robotics_dims_splits.mat,          Domain Adaptation Project Page
                                                  robotics_trainsize_splits.mat
Oxford Flowers   distancematrices17gcfeat06.mat   flowers_dims_splits.mat,           Oxford Flowers Project Page
                 (from project page)              flowers_trainsize_splits.mat
Birds            birds_features.mat               birds_dims_splits.mat,             Ponce Group Project Page
                                                  birds_trainsize_splits.mat
Butterflies      butterflies_features.mat         butterflies_dims_splits.mat,       Ponce Group Project Page
                                                  butterflies_trainsize_splits.mat
Office           NA                               NA                                 NA
Mouse            NA                               NA                                 NA

* NA = not yet available

Code

A MATLAB implementation of the feature hallucination method described in the paper is available here.

This code has been tested with MATLAB version 7.x. Included in the archive is a README that describes its contents and usage.

Publications

  • C. Mario Christoudias, Raquel Urtasun, Mathieu Salzmann, and Trevor Darrell. Learning to Recognize Objects from Unseen Modalities. In Proceedings of the European Conference on Computer Vision (ECCV), September 2010. [pdf]