We believe these results form a good substrate for work on object
recognition as well as figure-ground processing.
RESULTS WITH COMBINED CUES OF TEXTURE AND CONTOUR
(as of 11 Nov 1999)
Overview
Early and intermediate-level visual processing can be modeled as a
multistage process. The image is first processed by spatiotemporal
receptive fields tuned to orientation, spatial frequency, opponent
color, and short-range motion. This is followed by a grouping stage
that forms regions of coherent brightness, color, and texture, which
we call 'proto-surfaces'. We model this as the problem of finding a
partition of the image into regions such that similarity is high
within a region and low across regions. This is made precise by the
normalized cut criterion, which can be optimized by solving a
generalized eigenvalue problem. The resulting eigenvectors provide a
hierarchical partitioning of the image into regions ordered by
salience. Brightness, color, texture, motion similarity, proximity,
and good continuation can all be encoded in this framework. We have
demonstrated results using these multiple cues for segmenting
arbitrary gray-level images.
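To make the eigenvector step concrete, here is a minimal Python sketch (NumPy/SciPy) of a single two-way split. It solves the generalized eigenproblem (D - W) y = lambda D y for a small affinity matrix W, where D is the diagonal matrix of row sums, and thresholds the eigenvector with the second-smallest eigenvalue at zero. The function name `ncut_bipartition` and the zero threshold are illustrative choices; a real system would build W from image cues, use sparse eigensolvers, and recurse or use several eigenvectors to obtain the hierarchy.

```python
import numpy as np
from scipy.linalg import eigh


def ncut_bipartition(W):
    """Split the nodes of a weighted graph into two groups (illustrative sketch).

    W is a symmetric, non-negative affinity matrix whose rows all have
    positive sums, so that the degree matrix D is positive definite.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    # Generalized symmetric eigenproblem (D - W) y = lambda * D y.
    # eigh returns eigenvalues in ascending order; the smallest (0) belongs
    # to the constant vector, so the second column carries the partition.
    _, vecs = eigh(D - W, D)
    y = vecs[:, 1]
    # Threshold at zero (one simple choice of splitting point).
    return y > 0


# Two tightly connected triples joined by weak links.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
labels = ncut_bipartition(W)
print(labels)  # the first three nodes share one label, the last three the other
```

The overall sign of an eigenvector is arbitrary, so only the grouping, not which side is True, is meaningful.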
Specifics for the results:
Each image is a grayscale image that has been partitioned into
regions. The boundary of each region is shown in red. Regions are
disjoint, and their union is the whole image.
The images are drawn from the Corel image database. The original RGB
images were converted to grayscale using the MATLAB rgb2gray function.
Color is not used in these segmentations. It can readily be
incorporated in our framework, but we chose not to use it here in
order to provide a more stringent test of segmentation quality.
We particularly draw your attention to the fact that these images
contain regions whose boundaries are defined by brightness edges,
lines (as in a cartoon sketch), texture differences, or any
combination of these.
All results were obtained with exactly the same algorithm and
parameters.
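For readers reproducing the preprocessing outside MATLAB, the sketch below applies the luminance weights that MATLAB documents for rgb2gray (0.2989 R + 0.5870 G + 0.1140 B) to an H x W x 3 NumPy array. It is an approximation rather than a reimplementation: MATLAB's rounding and integer-type handling may differ slightly, and `rgb_to_gray` is a hypothetical helper name.

```python
import numpy as np

# Weighted sum documented for MATLAB's rgb2gray.
RGB2GRAY_WEIGHTS = np.array([0.2989, 0.5870, 0.1140])


def rgb_to_gray(rgb):
    """Convert an H x W x 3 RGB array (floats in [0, 1]) to H x W grayscale."""
    rgb = np.asarray(rgb, dtype=float)
    # Matrix product over the last axis: each pixel's (R, G, B) dotted with the weights.
    return rgb @ RGB2GRAY_WEIGHTS


gray = rgb_to_gray(np.random.default_rng(0).random((4, 5, 3)))
print(gray.shape)  # (4, 5)
```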
Animals
People
Architecture/Interior Man-made Scenes
Natural Scenes
Paintings
Results for nearly 1000 grayscale images from the Corel image
database are also available.
RELEVANT PUBLICATIONS
Julesz introduced the term texton, analogous to a phoneme in speech
recognition, but did not provide an operational definition for
gray-level images. Here we re-invent textons as frequently
co-occurring combinations of oriented linear filter outputs. These can
be learned using a K-means approach. By mapping each pixel to its
nearest texton, the image can be analyzed into texton channels, each
of which is a point set where discrete techniques such as Voronoi
diagrams become applicable.
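A minimal Python sketch of the texton idea: each pixel gets a vector of filter responses, K-means clusters those vectors, and every pixel is then labeled with its nearest cluster center, giving a texton map with one channel per label. The filter bank here is a small, axis-aligned set of Gaussian derivatives from `scipy.ndimage.gaussian_filter`, a stand-in assumption rather than the oriented multi-scale bank used in the papers; `texton_map`, `k`, and the plain K-means loop are illustrative.

```python
import numpy as np
from scipy import ndimage


def filter_responses(image, sigmas=(1.0, 2.0)):
    """Per-pixel responses to a small Gaussian-derivative bank (stand-in filters)."""
    image = np.asarray(image, dtype=float)
    orders = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 1)]
    feats = [ndimage.gaussian_filter(image, s, order=o) for s in sigmas for o in orders]
    return np.stack(feats, axis=-1).reshape(-1, len(feats))


def nearest_center(X, centers):
    """Index of the closest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns the cluster centers (the textons)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = nearest_center(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    return centers


def texton_map(image, k=8):
    """Label every pixel with the index of its nearest texton."""
    X = filter_responses(image)
    textons = kmeans(X, k)
    return nearest_center(X, textons).reshape(np.shape(image))


img = np.zeros((32, 32))
img[:, 16:] = np.tile([0.0, 1.0], 8)  # vertical stripes on the right half
tmap = texton_map(img, k=4)
```

Each value of `tmap` indexes a texton channel; the set of pixels carrying a given label is the point set on which discrete tools such as Voronoi diagrams can operate.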
Local histograms of texton frequencies can be used with a chi-squared
test for significant differences to find texture boundaries. Natural
images contain both textured and untextured regions, so we combine
this cue with that of the presence of peaks of contour energy derived
from outputs of odd- and even-symmetric oriented Gaussian derivative
filters. Each of these cues has a domain of applicability, so to
facilitate cue combination we introduce a gating operator based on a
statistical test for isotropy of Delaunay neighbors. Having obtained a
local measure of how likely two nearby pixels are to belong to the
same region, we use the spectral graph theoretic framework of
normalized cuts to find partitions of the image into regions of
coherent texture and brightness. Experimental results on a wide range
of images are shown.
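The chi-squared comparison of texton histograms can be sketched as follows, using the common form chi^2(h_i, h_j) = 1/2 sum_k (h_i(k) - h_j(k))^2 / (h_i(k) + h_j(k)) on normalized histograms. Converting the distance into an affinity with exp(-chi^2 / sigma) is shown as one plausible choice; `sigma`, the `eps` guard, and the function names are assumptions for illustration.

```python
import numpy as np


def texton_histogram(labels, k):
    """Normalized histogram of texton labels inside a window."""
    counts = np.bincount(np.ravel(labels), minlength=k).astype(float)
    return counts / counts.sum()


def chi_squared(h1, h2, eps=1e-12):
    """Chi-squared distance between normalized histograms (0 = identical, 1 = disjoint)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    # eps avoids 0/0 for bins that are empty in both histograms.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))


def texture_affinity(h1, h2, sigma=0.1):
    """Map the distance to a similarity in (0, 1]; sigma is a hypothetical scale."""
    return float(np.exp(-chi_squared(h1, h2) / sigma))


left = texton_histogram([0, 0, 1, 1], k=3)
right = texton_histogram([2, 2, 2, 2], k=3)
print(chi_squared(left, left), chi_squared(left, right))
```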
A Database of Human Segmented Natural Images and its Application
to Evaluating Segmentation Algorithms and Measuring Ecological
Statistics
J. Malik,
D. Martin,
C. Fowlkes, and
D. Tal
[Submitted to International Conference on Computer Vision,
2001.]
Paper.
[Also available as Technical Report No. UCB/CSD-1-1133,
Computer Science Division, University of California at Berkeley,
January, 2001.]
Tech Report.
This paper presents a database containing 'ground truth'
segmentations produced by humans for images of a wide variety
of natural scenes. We define an error measure which quantifies
the consistency between segmentations of differing granularities
and find that different human segmentations of the same image are
highly consistent. Use of this dataset is demonstrated in two
applications: (1) evaluating the performance of segmentation
algorithms and (2) measuring probability distributions associated
with Gestalt grouping factors as well as statistics of image
region properties.
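One way to make consistency between segmentations of differing granularities concrete is a refinement-tolerant error. For a pixel p, let R(S, p) be the segment containing p in segmentation S, and define the local refinement error E(S1, S2, p) = |R(S1, p) \ R(S2, p)| / |R(S1, p)|, which is zero whenever the S1 segment lies inside the S2 segment. Averaging over pixels while letting the direction of refinement be chosen once for the whole image, or separately per pixel, gives a global and a local consistency error (GCE, LCE). The sketch below is written from that definition; the function names and the label-array input format are illustrative assumptions.

```python
import numpy as np


def refinement_error(s1, s2):
    """Per-pixel E(S1, S2, p) for two label images of the same shape."""
    a = np.unique(np.ravel(s1), return_inverse=True)[1]
    b = np.unique(np.ravel(s2), return_inverse=True)[1]
    # joint[i, j] = number of pixels in segment i of S1 and segment j of S2.
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)
    size1 = joint.sum(axis=1)  # |R(S1, p)| for each S1 segment
    # |R(S1,p) minus R(S2,p)| = |R(S1,p)| - |R(S1,p) intersect R(S2,p)|
    return (size1[a] - joint[a, b]) / size1[a]


def consistency_errors(s1, s2):
    """Return (GCE, LCE); both are 0 when one segmentation refines the other."""
    e12 = refinement_error(s1, s2)
    e21 = refinement_error(s2, s1)
    n = e12.size
    gce = min(e12.sum(), e21.sum()) / n
    lce = np.minimum(e12, e21).sum() / n
    return gce, lce


coarse = np.array([[0, 0, 0, 0]])
fine = np.array([[0, 0, 1, 1]])
print(consistency_errors(fine, coarse))
```

Because a pure refinement scores zero, a coarse and a fine human segmentation of the same scene are not penalized for differing only in granularity.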
Contour and Texture Analysis for Image Segmentation
J. Malik,
S. Belongie,
T. Leung, and
J. Shi
[To appear in International Journal of Computer Vision,
2001.]
Paper.
[Also in Perceptual Organization for Artificial
Vision Systems, K.L. Boyer and S. Sarkar, editors. Kluwer
Academic Publishers, 2000]
This paper provides an algorithm for partitioning grayscale images
into disjoint regions of coherent brightness and texture. Natural
images contain both textured and untextured regions, so the cues of
contour and texture differences are exploited simultaneously.
Contours are treated in the 'intervening contour' framework, while
texture is analyzed using 'textons'. Each of these cues has a
domain of applicability, so to facilitate cue combination we introduce
a gating operator based on the texturedness of the neighborhood at a
pixel. Having obtained a local measure of how likely two nearby
pixels are to belong to the same region, we use the spectral graph
theoretic framework of normalized cuts to find partitions of the image
into regions of coherent texture and brightness. Experimental results
on a wide range of images are shown.
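The intervening-contour idea can be illustrated with a small sketch: two pixels are treated as less likely to belong to the same region if a strong contour lies on the straight line between them. Here the affinity is taken to be exp(-max E / sigma), where E is a precomputed contour-energy map sampled along the segment. The exponential form, `sigma`, and nearest-pixel sampling are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def intervening_contour_affinity(energy, p, q, sigma=0.1):
    """Affinity between pixels p and q given a contour-energy map (hypothetical form).

    energy: 2-D array, larger values mean stronger contour evidence.
    p, q: (row, col) pixel coordinates.
    """
    (r0, c0), (r1, c1) = p, q
    # Sample roughly one point per pixel along the segment from p to q.
    n = max(abs(r1 - r0), abs(c1 - c0)) + 1
    rows = np.rint(np.linspace(r0, r1, n)).astype(int)
    cols = np.rint(np.linspace(c0, c1, n)).astype(int)
    strongest = energy[rows, cols].max()
    return float(np.exp(-strongest / sigma))


E = np.zeros((10, 10))
E[:, 5] = 1.0  # a vertical contour at column 5
print(intervening_contour_affinity(E, (2, 1), (2, 3)))  # no contour crossed
print(intervening_contour_affinity(E, (2, 1), (2, 8)))  # crosses column 5
```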
Normalized Cuts and Image Segmentation
J. Shi and
J. Malik
[IEEE Transactions on Pattern Analysis and Machine
Intelligence, 22(8), August, 2000, pp. 888-905.]
Paper.
We propose a novel approach for solving the perceptual grouping
problem in vision. Rather than focusing on local features and their
consistencies in the image data, our approach aims at extracting the
global impression of an image. We treat image segmentation as a graph
partitioning problem and propose a novel global criterion, the
normalized cut, for segmenting the graph. The normalized cut criterion
measures both the total dissimilarity between the different groups as
well as the total similarity within the groups. We show that an
efficient computational technique based on a generalized eigenvalue
problem can be used to optimize this criterion. We have applied this
approach to segmenting static images as well as motion sequences and
found the results very encouraging.
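The criterion is Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), where cut(A, B) sums the edge weights between the two groups and assoc(A, V) sums the weights from nodes in A to all nodes in the graph. A direct NumPy evaluation for a given partition, useful for comparing candidate splits, might look like this (the function name is ours):

```python
import numpy as np


def ncut_value(W, in_a):
    """Normalized cut value of the partition (A, B), where in_a marks the nodes in A."""
    W = np.asarray(W, dtype=float)
    in_a = np.asarray(in_a, dtype=bool)
    cut = W[np.ix_(in_a, ~in_a)].sum()  # total weight crossing the partition
    assoc_a = W[in_a].sum()             # weight from A to every node
    assoc_b = W[~in_a].sum()            # weight from B to every node
    return cut / assoc_a + cut / assoc_b


W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
good = ncut_value(W, [True, True, True, False, False, False])
bad = ncut_value(W, [True, True, True, True, False, False])
print(good < bad)  # True
```

Separating the two tight groups scores far lower than cutting through one of them, which is the behavior the normalization is meant to reward: unlike the plain cut, it does not favor chopping off small, weakly attached sets.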
Textons, Contours and Regions: Cue Combination in Image
Segmentation
J. Malik,
S. Belongie,
J. Shi, and
T. Leung
[International Conference on Computer Vision, September 1999]
Paper.
This paper makes two contributions. It provides (1) an operational
definition of textons, the putative elementary units of texture
perception, and (2) an algorithm for partitioning the image into
disjoint regions of coherent brightness and texture, where boundaries
of regions are defined by peaks in contour orientation energy and
differences in texton densities across the contour.
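Contour orientation energy is the squared response to a pair of odd- and even-symmetric filters, OE = (f_odd * I)^2 + (f_even * I)^2, and it peaks at edges. In the sketch below, the first and second derivatives of a Gaussian along each image axis stand in for the filter pair. This is a simplifying assumption: the papers use elongated filters at many orientations, with the odd filter a Hilbert transform of the even one, whereas here only two axis-aligned orientations are computed.

```python
import numpy as np
from scipy import ndimage


def orientation_energy(image, sigma=2.0):
    """Approximate orientation energy for derivatives along axis 0 and axis 1.

    Returns an array of shape (2, H, W): index 0 responds to horizontal edges
    (intensity change down the rows), index 1 to vertical edges.
    """
    image = np.asarray(image, dtype=float)
    energies = []
    for axis in (0, 1):
        odd_order = [0, 0]
        even_order = [0, 0]
        odd_order[axis] = 1   # odd-symmetric: first derivative of Gaussian
        even_order[axis] = 2  # even-symmetric: second derivative of Gaussian
        odd = ndimage.gaussian_filter(image, sigma, order=odd_order)
        even = ndimage.gaussian_filter(image, sigma, order=even_order)
        energies.append(odd ** 2 + even ** 2)
    return np.stack(energies)


step = np.zeros((32, 32))
step[:, 16:] = 1.0  # vertical step edge between columns 15 and 16
oe = orientation_energy(step)
print(oe[1, 16].argmax())  # column of the strongest vertical-edge response
```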
Normalized Cuts and Image Segmentation
J. Shi and
J. Malik
[IEEE Conf. Computer Vision and Pattern Recognition, June 1997]
Paper.
We propose a novel approach for solving the perceptual
grouping problem in vision. Rather than focusing on local
features and their consistencies in the image data, our approach aims
at extracting the global impression of an image. We treat image
segmentation as a graph partitioning problem and propose a novel
global criterion, the normalized cut, for segmenting the
graph. The normalized cut criterion measures both the total
dissimilarity between the different groups as well as the total
similarity within the groups. We show that an efficient computational
technique based on a generalized eigenvalue problem can be used to
optimize this criterion. We have applied this approach to segmenting
static images and found the results very encouraging.
Motion Segmentation and Tracking Using Normalized Cuts
J. Shi and
J. Malik
[International Conference on Computer Vision, January 1998]
Paper.
We propose a motion segmentation algorithm that aims to break a scene
into its most prominent moving groups.
A weighted graph is constructed on the image sequence
by connecting pixels that are in the spatiotemporal
neighborhood of each other. At each pixel, we define motion profile
vectors which capture the probability distribution of the image
velocity. The distance between motion profiles is used to
assign weights to the graph edges.
Using normalized cuts we find the most salient partitions
of the spatiotemporal graph formed by the image sequence.
For segmenting long image sequences, we have developed a recursive
update procedure that incorporates knowledge of segmentation in
previous frames for efficiently finding the group correspondence in
the new frame.
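A motion profile can be sketched as a probability distribution over candidate displacements, obtained here by comparing a small patch in one frame with shifted patches in the next. The specifics are assumptions for illustration: sum-of-squared-differences matching, the exp(-SSD / sigma) weighting, the displacement range, and a profile distance of 1 - sum_k sqrt(P_i(k) P_j(k)), which is 0 for identical profiles. Pixels must lie at least `half + max_disp` pixels from the image border.

```python
import numpy as np


def motion_profile(frame0, frame1, r, c, max_disp=2, half=1, sigma=1.0):
    """Distribution over displacements (dy, dx) for pixel (r, c) in frame0."""
    patch = frame0[r - half:r + half + 1, c - half:c + half + 1]
    disps = [(dy, dx) for dy in range(-max_disp, max_disp + 1)
             for dx in range(-max_disp, max_disp + 1)]
    ssd = np.array([
        np.sum((frame1[r + dy - half:r + dy + half + 1,
                       c + dx - half:c + dx + half + 1] - patch) ** 2)
        for dy, dx in disps
    ])
    # Subtracting the minimum keeps exp() from underflowing everywhere.
    p = np.exp(-(ssd - ssd.min()) / sigma)
    return p / p.sum(), disps


def profile_distance(p, q):
    """1 minus the Bhattacharyya coefficient: 0 for identical profiles."""
    return 1.0 - float(np.sum(np.sqrt(p * q)))


rng = np.random.default_rng(0)
f0 = 10.0 * rng.random((20, 20))
f1 = np.roll(f0, 1, axis=1)  # the whole frame moves one pixel to the right
p1, disps = motion_profile(f0, f1, 8, 8)
p2, _ = motion_profile(f0, f1, 10, 12)
print(disps[int(p1.argmax())], profile_distance(p1, p2))
```

Two pixels moving together get nearly identical profiles and hence a small distance, which would translate into a strong edge weight in the spatiotemporal graph.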
Contour continuity in region based image segmentation
T. Leung and
J. Malik
[Fifth European Conference on Computer Vision, June 1998]
Paper.
Region-based image segmentation techniques make use of similarity in intensity, color and
texture to determine the partitioning of an image. The powerful cue of contour continuity is
not exploited at all. In this paper, we provide a way of incorporating curvilinear grouping into
region-based image segmentation. Soft contour information is obtained through orientation
energy. Weak contrast gaps and subjective contours are completed by contour propagation.
The normalized cut approach proposed by Shi and Malik is used for the segmentation.
Results on a large variety of images are shown.
Finding Boundaries in Natural Images: A New Method Using Point
Descriptors and Area Completion
S. Belongie and
J. Malik
[Fifth European Conference on Computer Vision, June 1998]
Paper.
There are several reasons why a satisfactory solution to image segmentation for natural
scenes has remained elusive. Perhaps the foremost of these is image texture. One general
methodology which shows promise for solving this problem is to characterize textured
regions via their responses to a set of filters. However, this approach brings with it many
open questions, including how to combine texture and intensity information into a common
descriptor and how to deal with the fact that filter responses inside textured regions are
generally spatially inhomogeneous. Our goal in this work is to introduce two new ideas
which address these open questions and to demonstrate the application of these ideas to the
segmentation of natural images. The first idea consists of a novel means of describing points
in natural images and an associated distance function for comparing these descriptors. This
distance function is aided in textured regions by the use of the second idea, a new process
introduced here which we have termed area completion. Experimental segmentation results
which incorporate our proposed approach are provided for a variety of
natural images.
Self Inducing Relational Distance and its Application to Image Segmentation
J. Shi and
J. Malik
[Fifth European Conference on Computer Vision, June 1998]
Paper.
We propose a new feature distance which is derived
from an optimal relational graph matching criterion. Instead of defining
an arbitrary similarity measure for grouping, we use
the criterion of reducing instability in the relational graph to induce a similarity
measure. This similarity measure not only improves the stability of the matching but,
more importantly, also captures
the relative importance of relational similarity in the feature space for
the purpose of grouping.
We call this similarity measure the self-induced relational distance.
We demonstrate the distance measure on a brightness-texture feature space and apply
it to the segmentation of complex natural images.