Learning to Recognize Objects from Unseen Modalities

Overview

We investigate the problem of exploiting multiple sources of information for object recognition when additional modalities, absent from the labeled training set, are available at inference time. This scenario is common in robotics sensing applications and contrasts with the assumption made by existing approaches, which require at least some labeled examples for each modality. To leverage the previously unseen features, we use the unlabeled data to learn a mapping from the existing modalities to the new ones (see Figure 1). This allows us to predict the missing data for the labeled examples and to exploit all modalities using multiple kernel learning. We demonstrate the effectiveness of our approach on several multi-modal tasks, including object recognition from multi-resolution imagery, grayscale and color images, and images and text. Our approach outperforms multiple kernel learning on the original modalities, as well as nearest-neighbor and bootstrapping schemes.
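To make the mapping step concrete, here is a minimal MATLAB sketch of one plausible instantiation: hallucinating the missing modality with kernel ridge regression trained on the unlabeled data. The variable names (Xa_u, Xb_u, Xa_l), the choice of RBF kernel, and the parameter values are illustrative assumptions, not the released implementation (see Code below).

    % Conceptual sketch only, not the released implementation.
    % Xa_u (D_a x N_u), Xb_u (D_b x N_u): both modalities on the unlabeled set.
    % Xa_l (D_a x N_l): the only modality observed on the labeled set.
    sigma  = 1;      % RBF kernel bandwidth (assumed value)
    lambda = 1e-3;   % ridge regularizer (assumed value)

    % Gaussian RBF kernel between the columns of X and Z.
    rbf = @(X, Z) exp(-(bsxfun(@plus, sum(X.^2, 1)', sum(Z.^2, 1)) ...
                        - 2 * (X' * Z)) / (2 * sigma^2));

    Ku = rbf(Xa_u, Xa_u);                            % N_u x N_u kernel matrix
    A  = (Ku + lambda * eye(size(Ku, 1))) \ Xb_u';   % ridge regression weights
    Kl = rbf(Xa_l, Xa_u);                            % N_l x N_u cross kernel
    Xb_hal = (Kl * A)';                              % hallucinated modality, D_b x N_l

The hallucinated features Xb_hal can then be combined with Xa_l in a multiple kernel learning classifier, with one base kernel per modality.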

This work will appear at the European Conference on Computer Vision (ECCV) this September (see Publications below). The features and code used in the paper are available for download below.



Figure 1: Inferring color from intensity. With our approach, the unlabeled test data is leveraged to learn a mapping between the labeled intensity channel and the unlabeled color features. The color features are then 'hallucinated' on the labeled intensity-only training set using the learned regression function and used for classification. We evaluate our approach on a variety of problem domains, including multi-resolution imagery, grayscale and color images, and images and text.

Supplementary Results

The additional results referenced in Section 4.2 of the paper, covering all feature combinations on the natural scenes dataset, are available here.

Datasets

The dataset features and train/test splits are available below in MATLAB .mat format for a subset of the datasets used in the paper. The remaining feature files are available from the linked project webpages.

The feature files below contain the following variables:
  • For each modality, X_(modality): D x N feature matrices
  • For each modality, D_(modality): N x N distance matrices
  • L: N x 1 label vector
with D the dimension of each feature vector and N the number of examples.
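As a quick sanity check after downloading, a feature file can be loaded and its dimensions verified as in the sketch below. The modality suffix 'sift' is hypothetical; consult each file (e.g., with whos('-file', ...)) for the actual variable names.

    % Minimal loading sketch; 'X_sift' and 'D_sift' are assumed names.
    data = load('birds_features.mat');
    X  = data.X_sift;   % D x N feature matrix for one modality
    Dm = data.D_sift;   % N x N distance matrix for that modality
    L  = data.L;        % N x 1 label vector

    [D, N] = size(X);
    fprintf('%d examples, %d-dim features, %d classes\n', N, D, numel(unique(L)));
    assert(isequal(size(Dm), [N N]) && numel(L) == N);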

For each dataset, two train/test split files are included: one varying the training set size and the other varying the kernel PCA dimensionality. Each split file contains the following variables:
  • N: vector of training set sizes
  • d: vector of kernel PCA dimensionalities
  • S: number of splits per training set size
  • splits: |N| x 1 cell array of training/test splits, where each cell contains an array of S split structures (see the sketch below)
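A hedged sketch of selecting one split, assuming each split structure stores its training and test indices in fields named train and test (the included README documents the actual field names). X and L are the feature matrix and labels loaded above.

    % Pick one train/test split; the 'train'/'test' field names are assumptions.
    s = load('birds_trainsize_splits.mat');   % contains N, d, S, splits

    i = 1;                      % index into the training set sizes s.N
    j = 1;                      % which of the s.S random splits to use
    split = s.splits{i}(j);     % one train/test split structure

    Xtrain = X(:, split.train); ytrain = L(split.train);
    Xtest  = X(:, split.test);  ytest  = L(split.test);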
The papers in which each dataset was originally published are listed below.
  • Robotics Dataset: K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In Proceedings of the European Conference on Computer Vision (ECCV), 2010.
  • Oxford Flowers Dataset: M. E. Nilsback and A. Zisserman. A visual vocabulary for flower classification. In Proceedings of Computer Vision and Pattern Recognition (CVPR), 2006.
  • Birds Dataset: S. Lazebnik, C. Schmid, and J. Ponce. A maximum entropy framework for part-based texture and object recognition. In Proceedings of Computer Vision and Pattern Recognition (CVPR), 2005.
  • Butterflies Dataset: S. Lazebnik, C. Schmid, and J. Ponce. Semi-local affine parts for object recognition. In Proceedings of the British Machine Vision Conference (BMVC), 2004.
  • Office Dataset: K. Saenko and T. Darrell. Filtering abstract senses from image search results. In Proceedings of Neural Information Processing Systems (NIPS), 2009.
  • Mouse Dataset: K. Saenko and T. Darrell. Unsupervised learning of visual sense models for polysemous words. In Proceedings of Neural Information Processing Systems (NIPS), 2008.

Dataset          Features                         Train/Test Splits                  Relevant Link
Robotics         robotics_features.mat            robotics_dims_splits.mat,          Domain Adaptation Project Page
                                                  robotics_trainsize_splits.mat
Oxford Flowers   distancematrices17gcfeat06.mat   flowers_dims_splits.mat,           Oxford Flowers Project Page
                 (from project page)              flowers_trainsize_splits.mat
Birds            birds_features.mat               birds_dims_splits.mat,             Ponce Group Project Page
                                                  birds_trainsize_splits.mat
Butterflies      butterflies_features.mat         butterflies_dims_splits.mat,       Ponce Group Project Page
                                                  butterflies_trainsize_splits.mat
Office           NA                               NA                                 NA
Mouse            NA                               NA                                 NA

* NA = not yet available

Code

A MATLAB implementation of the feature hallucination method described in the paper is available here.

This code has been tested with MATLAB version 7.x. Included in the archive is a README that describes its contents and usage.

Publications

  • C. Mario Christoudias, Raquel Urtasun, Mathieu Salzmann, and Trevor Darrell. Learning to Recognize Objects from Unseen Modalities. In Proceedings of the European Conference on Computer Vision (ECCV), September 2010. [pdf]