
Grouping and Perceptual Organization

Overview

Early- and intermediate-level visual processing can be modeled as a multistage process. The image is first processed by spatiotemporal receptive fields tuned to orientation, spatial frequency, opponent color, and short-range motion. This is followed by a grouping stage that forms regions of coherent brightness, color and texture, which we call 'proto-surfaces'. We model this stage as finding a partition of the image into regions such that similarity is high within each region and low across regions. This is made precise by the normalized cut criterion, which can be optimized by solving a generalized eigenvalue problem. The resulting eigenvectors provide a hierarchical partitioning of the image into regions ordered by salience. Brightness, color, texture and motion similarity, as well as proximity and good continuation, can all be encoded in this framework. We have demonstrated results using combinations of these cues to segment arbitrary gray-level images.
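As a small illustration of the criterion, the sketch below evaluates Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V). Here cut(A, B) sums the weights of edges joining A to B, and assoc(A, V) sums the weights of all edges incident to nodes of A, following the definitions in Shi and Malik's normalized cuts paper. The function name `ncut_value` and the toy affinity matrix are hypothetical. A real image graph would use sparse pixel affinities instead of a dense list of lists.

```python
def ncut_value(W, in_a):
    """Normalized cut of the bipartition (A, B) of a weighted graph.

    W    -- symmetric affinity matrix as a list of lists, W[i][j] >= 0
    in_a -- list of booleans; in_a[i] is True if node i belongs to A
    """
    n = len(W)
    cut = 0.0      # total weight of edges crossing between A and B
    assoc_a = 0.0  # total weight of edges incident to nodes in A
    assoc_b = 0.0  # total weight of edges incident to nodes in B
    for i in range(n):
        row_sum = sum(W[i])
        if in_a[i]:
            assoc_a += row_sum
            cut += sum(W[i][j] for j in range(n) if not in_a[j])
        else:
            assoc_b += row_sum
    return cut / assoc_a + cut / assoc_b


# Toy graph (hypothetical): nodes {0, 1} and {2, 3} are strongly tied,
# with weak 0.1 links between the two pairs.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
good = ncut_value(W, [True, True, False, False])  # roughly 0.18
bad = ncut_value(W, [True, False, True, False])   # roughly 1.82
```

The partition that keeps strongly connected nodes together scores much lower. Because each cut term is divided by the total association of its side, splitting off a tiny, weakly attached group is not rewarded either.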

We believe these results form a good substrate for work on object recognition as well as figure-ground processing.

RESULTS WITH COMBINED CUES OF TEXTURE AND CONTOUR (as of 11 Nov 1999)

Specifics for the results:

Each image is a grayscale image which has been partitioned into regions. The boundary of each region is shown in red. Regions are disjoint and their union is the whole image.
The images are drawn from the Corel image database. The original RGB images were converted to grayscale using the MATLAB rgb2gray function. Color is not used in these segmentations. It can readily be incorporated in our framework, but we chose not to use it here in order to provide a more stringent test of segmentation quality.
We particularly draw your attention to the fact that these images contain regions whose boundaries are defined by any combination of brightness edges, lines (as in a cartoon sketch) or texture differences.
All results have been obtained with exactly the same algorithm and parameters.
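For readers reproducing the preprocessing outside MATLAB, the sketch below applies the luminance weights documented for MATLAB's rgb2gray (0.2989 R + 0.5870 G + 0.1140 B, the rounded ITU-R BT.601 luma coefficients). The helper names are hypothetical, and the helpers return floating-point values. MATLAB keeps the input's integer class and rounds, so individual values may differ slightly.

```python
def rgb_to_gray(pixel):
    """Luma of one (R, G, B) pixel using rgb2gray-style weights, as a float."""
    r, g, b = pixel
    return 0.2989 * r + 0.5870 * g + 0.1140 * b


def image_to_gray(image):
    """Convert an image stored as rows of (R, G, B) tuples to rows of gray values."""
    return [[rgb_to_gray(p) for p in row] for row in image]


gray = image_to_gray([[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (100, 100, 100)]])
```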

Animals
People
Architecture/Interior Man-made Scenes
Natural Scenes
Paintings

Results for nearly 1000 grayscale images from the Corel image database are available here.

RELEVANT PUBLICATIONS

A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics
J. Malik, D. Martin, C. Fowlkes, and D. Tal
[Submitted to International Conference on Computer Vision, 2001.] Paper.
[Also available as Technical Report No. UCB/CSD-1-1133, Computer Science Division, University of California at Berkeley, January, 2001.] Tech Report.

This paper presents a database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes. We define an error measure which quantifies the consistency between segmentations of differing granularities and find that different human segmentations of the same image are highly consistent. Use of this dataset is demonstrated in two applications: (1) evaluating the performance of segmentation algorithms and (2) measuring probability distributions associated with Gestalt grouping factors as well as statistics of image region properties.
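The following sketch shows a refinement-tolerant error of this kind. It uses a local refinement error E(S1, S2, p) = |R(S1, p) \ R(S2, p)| / |R(S1, p)|, where R(S, p) is the region of segmentation S that contains pixel p. E is zero whenever S1's region at p lies inside S2's region, so a finer segmentation of the same structure is not penalized. The sketch aggregates E into a global error, which fixes one direction of refinement for the whole image, and a local error, which chooses the direction per pixel. We believe this mirrors the measures described in the paper, but treat the exact definitions here as assumptions. Segmentations are flat lists of region labels, and the function name is hypothetical.

```python
from collections import Counter


def consistency_errors(s1, s2):
    """Return (global_error, local_error) for two label lists of equal length."""
    n = len(s1)
    size1 = Counter(s1)              # region sizes in segmentation 1
    size2 = Counter(s2)              # region sizes in segmentation 2
    overlap = Counter(zip(s1, s2))   # |R(s1, p) intersect R(s2, p)| per label pair
    e12, e21 = [], []
    for a, b in zip(s1, s2):
        common = overlap[(a, b)]
        e12.append((size1[a] - common) / size1[a])  # E(s1, s2, p)
        e21.append((size2[b] - common) / size2[b])  # E(s2, s1, p)
    global_error = min(sum(e12), sum(e21)) / n
    local_error = sum(min(x, y) for x, y in zip(e12, e21)) / n
    return global_error, local_error


coarse = [0, 0, 0, 0, 1, 1, 1, 1]
fine = [0, 0, 1, 1, 2, 2, 3, 3]               # a pure refinement of `coarse`
print(consistency_errors(coarse, fine))       # refinement is not penalized
print(consistency_errors([0, 0, 1, 1], [0, 1, 0, 1]))  # crossing boundaries are
```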

Contour and Texture Analysis for Image Segmentation
J. Malik, S. Belongie, T. Leung, and J. Shi
[To appear in International Journal of Computer Vision, 2001.] Paper.
[Also in Perceptual Organization for Artificial Vision Systems, K.L. Boyer and S. Sarkar, editors. Kluwer Academic Publishers, 2000]

This paper provides an algorithm for partitioning grayscale images into disjoint regions of coherent brightness and texture. Natural images contain both textured and untextured regions, so the cues of contour and texture differences are exploited simultaneously. Contours are treated in the 'intervening contour' framework, while texture is analyzed using 'textons'. Each of these cues has a domain of applicability, so to facilitate cue combination we introduce a gating operator based on the texturedness of the neighborhood at a pixel. Having obtained a local measure of how likely two nearby pixels are to belong to the same region, we use the spectral graph-theoretic framework of normalized cuts to find partitions of the image into regions of coherent texture and brightness. Experimental results on a wide range of images are shown.
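Here is a sketch of the intervening contour cue. Two pixels get low affinity if the straight segment joining them crosses a likely contour. We assume the form W_IC = 1 - max over the segment of p_con(x), where p_con is a per-pixel contour probability. The gating rule `gated_affinity` weights the contour and texture affinities geometrically by a texturedness value p_texture in [0, 1]. That rule is a hypothetical illustration of gating, not the paper's exact formula. All names here are ours.

```python
def line_pixels(p, q):
    """Integer (row, col) pixels sampled along the straight segment from p to q."""
    (r0, c0), (r1, c1) = p, q
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return [p]
    return [(round(r0 + (r1 - r0) * t / steps), round(c0 + (c1 - c0) * t / steps))
            for t in range(steps + 1)]


def intervening_contour_affinity(p_con, p, q):
    """W_IC = 1 - strongest contour probability met on the way from p to q."""
    return 1.0 - max(p_con[r][c] for r, c in line_pixels(p, q))


def gated_affinity(w_ic, w_tx, p_texture):
    """Hypothetical gating: trust texture where textured, contours elsewhere."""
    return (w_ic ** (1.0 - p_texture)) * (w_tx ** p_texture)


# A 5x5 contour map with a strong vertical contour in column 2.
p_con = [[0.0, 0.0, 0.9, 0.0, 0.0] for _ in range(5)]
w_across = intervening_contour_affinity(p_con, (2, 0), (2, 4))  # crosses the contour
w_along = intervening_contour_affinity(p_con, (0, 0), (4, 0))   # stays on one side
```

With p_texture near 1 the contour cue is effectively switched off. That is the point of gating: inside a textured region, the many small edges of texture elements would otherwise break the region apart.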

Normalized Cuts and Image Segmentation
J. Shi and J. Malik
[IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), August, 2000, pp. 888-905.] Paper.

We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups and the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion. We have applied this approach to segmenting static images as well as motion sequences and found the results very encouraging.
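The eigenvector computation fits in a few lines. Start from the generalized eigenvalue formulation (D - W) y = lambda D y, where D is the diagonal degree matrix. The substitution z = D^(1/2) y turns it into an ordinary symmetric problem for M = D^(-1/2) W D^(-1/2). The second smallest lambda then corresponds to the second largest eigenvalue of M, whose top eigenvector D^(1/2) 1 is trivial. The sketch finds that eigenvector by power iteration on (I + M) / 2, projecting out the trivial direction at each step, and splits the graph at zero. Dense lists, power iteration and the zero threshold are our simplifications. Practical implementations use sparse matrices and a Lanczos-type eigensolver, and the splitting point can also be chosen to minimize Ncut. The function name is hypothetical.

```python
import math
import random


def ncut_bipartition(W, iters=500, seed=0):
    """Two-way split from the second generalized eigenvector of (D - W) y = lambda D y."""
    n = len(W)
    d = [sum(row) for row in W]                 # node degrees, assumed positive
    inv_sqrt_d = [1.0 / math.sqrt(x) for x in d]
    # Trivial top eigenvector of M = D^-1/2 W D^-1/2 (eigenvalue 1), normalized.
    u0 = [math.sqrt(x) for x in d]
    norm0 = math.sqrt(sum(x * x for x in u0))
    u0 = [x / norm0 for x in u0]

    rng = random.Random(seed)
    z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Remove the component along the trivial eigenvector.
        dot = sum(a * b for a, b in zip(z, u0))
        z = [a - dot * b for a, b in zip(z, u0)]
        # Apply A = (I + M) / 2; its eigenvalues lie in [0, 1], so power
        # iteration converges to the largest remaining one.
        mz = [inv_sqrt_d[i] * sum(W[i][j] * inv_sqrt_d[j] * z[j] for j in range(n))
              for i in range(n)]
        z = [(a + b) / 2.0 for a, b in zip(z, mz)]
        norm = math.sqrt(sum(x * x for x in z))
        z = [x / norm for x in z]
    # Map back to the generalized eigenvector y = D^-1/2 z and split at zero.
    y = [a * b for a, b in zip(z, inv_sqrt_d)]
    return [v > 0 for v in y]


# Two triangles joined by one weak edge (hypothetical affinities).
affinity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.05, 0, 0],
    [0, 0, 0.05, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
side = ncut_bipartition(affinity)
```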

Textons, Contours and Regions: Cue Combination in Image Segmentation
J. Malik, S. Belongie, J. Shi, and T. Leung
[International Conference on Computer Vision, September 1999] Paper.

This paper makes two contributions. It provides (1) an operational definition of textons, the putative elementary units of texture perception, and (2) an algorithm for partitioning the image into disjoint regions of coherent brightness and texture, where boundaries of regions are defined by peaks in contour orientation energy and differences in texton densities across the contour.

Julesz introduced the term texton, analogous to a phoneme in speech recognition, but did not provide an operational definition for gray-level images. Here we re-invent textons as frequently co-occurring combinations of oriented linear filter outputs. These can be learned using a K-means approach. By mapping each pixel to its nearest texton, the image can be analyzed into texton channels, each of which is a point set where discrete techniques such as Voronoi diagrams become applicable.
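Here is a minimal sketch of texton learning. It assumes each pixel's filter-bank responses have already been computed and stacked into a vector; the filter bank itself is not shown. Plain Lloyd's k-means clusters the vectors, and the cluster centers play the role of textons. Each pixel is then labeled with its nearest texton, which splits the image into texton channels. The function names and toy vectors are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def learn_textons(responses, k, iters=20, seed=0):
    """Cluster filter-response vectors with Lloyd's k-means; centers act as textons."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(responses, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in responses:
            nearest = min(range(k), key=lambda c: sq_dist(v, centers[c]))
            groups[nearest].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the previous center if a cluster empties out
                dim = len(members[0])
                centers[c] = [sum(m[i] for m in members) / len(members)
                              for i in range(dim)]
    return centers


def assign_textons(responses, centers):
    """Texton channel of every pixel: index of the nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(v, centers[c]))
            for v in responses]


# Toy 2-D "filter responses" from two visually distinct patches.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
textons = learn_textons(data, 2)
labels = assign_textons(data, textons)
```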

Local histograms of texton frequencies can be used with a chi-squared test for significant differences to find texture boundaries. Natural images contain both textured and untextured regions, so we combine this cue with that of the presence of peaks of contour energy derived from outputs of odd- and even-symmetric oriented Gaussian derivative filters. Each of these cues has a domain of applicability, so to facilitate cue combination we introduce a gating operator based on a statistical test for isotropy of Delaunay neighbors. Having obtained a local measure of how likely two nearby pixels are to belong to the same region, we use the spectral graph-theoretic framework of normalized cuts to find partitions of the image into regions of coherent texture and brightness. Experimental results on a wide range of images are shown.
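The texture-difference test compares texton histograms from two windows, for example on either side of a candidate boundary. It uses the chi-squared distance chi2(h_i, h_j) = 0.5 * sum_k (h_i(k) - h_j(k))^2 / (h_i(k) + h_j(k)). The sketch assumes windows are given as lists of texton labels. It leaves out how the windows are shaped and how the distance is turned into a boundary probability. Function names are hypothetical.

```python
from collections import Counter


def texton_histogram(labels, k):
    """Normalized histogram of texton labels 0..k-1 within one window."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[t] / total for t in range(k)]


def chi_squared(h1, h2):
    """0.5 * sum (h1 - h2)^2 / (h1 + h2), skipping bins empty in both histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


left = texton_histogram([0, 0, 1, 1], 3)   # [0.5, 0.5, 0.0]
right = texton_histogram([1, 1, 2, 2], 3)  # [0.0, 0.5, 0.5]
distance = chi_squared(left, right)
```

For normalized histograms the distance lies between 0 (identical texton statistics) and 1 (no texton in common), which makes a single threshold or soft mapping easy to choose.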

Normalized Cuts and Image Segmentation
J. Shi and J. Malik
[IEEE Conf. Computer Vision and Pattern Recognition, June 1997] Paper.

We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups and the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion. We have applied this approach to segmenting static images and found the results very encouraging.

Motion Segmentation and Tracking Using Normalized Cuts
J. Shi and J. Malik
[International Conference on Computer Vision, January 1998] Paper.

We propose a motion segmentation algorithm that aims to break a scene into its most prominent moving groups. A weighted graph is constructed on the image sequence by connecting pixels that are in the spatiotemporal neighborhood of each other. At each pixel, we define motion profile vectors which capture the probability distribution of the image velocity. The distance between motion profiles is used to assign a weight on the graph edges. Using normalized cuts we find the most salient partitions of the spatiotemporal graph formed by the image sequence. For segmenting long image sequences, we have developed a recursive update procedure that incorporates knowledge of segmentation in previous frames for efficiently finding the group correspondence in the new frame.
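Here is a one-dimensional sketch of motion profiles and the resulting edge weights. For each pixel, every candidate displacement dx is scored by how well a small window matches in the next frame, and the scores are normalized into a distribution over dx. Keeping a distribution rather than a single velocity estimate lets ambiguous pixels, for example in flat regions, express their uncertainty. Several pieces are our assumptions for illustration, not necessarily the paper's definitions: the exponential scoring, the histogram-intersection distance, and the weight exp(-d / sigma_m). All names are hypothetical.

```python
import math


def motion_profile(frame0, frame1, x, displacements, half=1, sigma=0.1):
    """Distribution over candidate 1-D displacements for pixel x (SSD-based)."""
    n = len(frame0)  # both frames are assumed to have the same length
    ssds = []
    for dx in displacements:
        ssd = 0.0
        for k in range(-half, half + 1):
            a = frame0[min(max(x + k, 0), n - 1)]       # window in frame 0
            b = frame1[min(max(x + k + dx, 0), n - 1)]  # shifted window in frame 1
            ssd += (a - b) ** 2
        ssds.append(ssd)
    best = min(ssds)  # subtract the best score so exp() cannot underflow to all zeros
    scores = [math.exp(-(s - best) / sigma ** 2) for s in ssds]
    total = sum(scores)
    return [s / total for s in scores]


def profile_distance(p, q):
    """1 minus histogram intersection: 0 for identical profiles, 1 for disjoint ones."""
    return 1.0 - sum(min(a, b) for a, b in zip(p, q))


def motion_weight(p, q, sigma_m=0.2):
    """Graph edge weight between two pixels from their motion profiles."""
    return math.exp(-profile_distance(p, q) / sigma_m)


frame0 = [0, 0, 0, 5, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0]
frame1 = [0, 0, 0, 0, 5, 0, 0, 0, 0, 3, 0, 0, 0, 0]  # first blob moves right by 1
dxs = [-1, 0, 1]
p_moving = motion_profile(frame0, frame1, 3, dxs)
p_static = motion_profile(frame0, frame1, 9, dxs)
```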

Contour Continuity in Region-Based Image Segmentation
T. Leung and J. Malik
[Fifth European Conference on Computer Vision, June 1998] Paper.

Region-based image segmentation techniques make use of similarity in intensity, color and texture to determine the partitioning of an image. The powerful cue of contour continuity is not exploited at all. In this paper, we provide a way of incorporating curvilinear grouping into region-based image segmentation. Soft contour information is obtained through orientation energy. Weak contrast gaps and subjective contours are completed by contour propagation. The normalized cut approach proposed by Shi and Malik is used for the segmentation. Results on a large variety of images are shown.
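Orientation energy is the squared response of a pair of odd- and even-symmetric filters, OE = (I * f_odd)^2 + (I * f_even)^2, and it peaks at edges regardless of their polarity. The sketch works on a 1-D intensity profile taken across a contour. As a simplifying assumption it uses the first and second derivatives of a Gaussian as the odd and even filters, which only approximates a true quadrature pair. Names and parameters (sigma, the 3-sigma filter radius) are hypothetical.

```python
import math


def gaussian_derivative_pair(sigma=2.0):
    """Odd (1st-derivative) and even (2nd-derivative) Gaussian taps, unit L2 norm."""
    radius = int(3 * sigma)
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2 * sigma ** 2)) for x in xs]
    odd = [-x / sigma ** 2 * gx for x, gx in zip(xs, g)]
    even = [(x * x / sigma ** 4 - 1 / sigma ** 2) * gx for x, gx in zip(xs, g)]
    mean_even = sum(even) / len(even)
    even = [e - mean_even for e in even]  # zero DC: flat regions give no response

    def unit(taps):
        norm = math.sqrt(sum(t * t for t in taps))
        return [t / norm for t in taps]

    return unit(odd), unit(even)


def filter_1d(signal, taps):
    """Correlate signal with taps centered on each sample, replicating borders."""
    n, r = len(signal), len(taps) // 2
    return [sum(t * signal[min(max(i + k - r, 0), n - 1)] for k, t in enumerate(taps))
            for i in range(n)]


def orientation_energy(signal, sigma=2.0):
    """Sum of squared odd and even filter responses at every sample."""
    odd, even = gaussian_derivative_pair(sigma)
    return [a * a + b * b
            for a, b in zip(filter_1d(signal, odd), filter_1d(signal, even))]


step = [0.0] * 20 + [1.0] * 20  # brightness step between samples 19 and 20
energy = orientation_energy(step)
peak = max(range(len(energy)), key=lambda i: energy[i])
```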

Finding Boundaries in Natural Images: A New Method Using Point Descriptors and Area Completion
S. Belongie and J. Malik
[Fifth European Conference on Computer Vision, June 1998] Paper.

There are several reasons why a satisfactory solution to image segmentation for natural scenes has remained elusive. Perhaps the foremost of these is image texture. One general methodology which shows promise for solving this problem is to characterize textured regions via their responses to a set of filters. However, this approach brings with it many open questions, including how to combine texture and intensity information into a common descriptor and how to deal with the fact that filter responses inside textured regions are generally spatially inhomogeneous. Our goal in this work is to introduce two new ideas which address these open questions and to demonstrate the application of these ideas to the segmentation of natural images. The first idea consists of a novel means of describing points in natural images and an associated distance function for comparing these descriptors. This distance function is aided in textured regions by the use of the second idea, a new process introduced here which we have termed area completion. Experimental segmentation results which incorporate our proposed approach are provided for a variety of natural images.

Self Inducing Relational Distance and its Application to Image Segmentation
J. Shi and J. Malik
[Fifth European Conference on Computer Vision, June 1998] Paper.

We propose a new feature distance derived from an optimal relational graph matching criterion. Instead of defining an arbitrary similarity measure for grouping, we use the criterion of reducing instability in the relational graph to induce a similarity measure. This similarity measure not only improves the stability of the matching but, more importantly, also captures the relative importance of relational similarity in the feature space for the purpose of grouping. We call this similarity measure the self-induced relational distance. We demonstrate the distance measure on a brightness-texture feature space and apply it to the segmentation of complex natural images.