CS 294-6 Recognizing People, Objects and Actions
Jitendra Malik
Spring 2004
405 Soda Hall
Tu 4-6
This course is designed around the challenge problem of making
computers aware of the everyday visual world, i.e., processing images or
video to recognize categories such as cars, buses, tigers,
zebras, rooms, doors, telephones, faces, arms and hands as well as
actions such as running, jumping and kicking. Topics will include a
survey of human visual recognition: perception and physiology,
recognition in the presence of transformations, local matching
techniques, global matching techniques, segmentation as a front end,
motion descriptors for action recognition, as well as case studies of
recognition in different domains. I have a specific list of about 300
visual categories to focus our thoughts.
Lecture Topics
- Introduction: Characteristics of visual recognition. Prototypes
and affordances. Basic, superordinate and subordinate categories
(reference: Palmer, Chapter 9)
- Human visual system
  - Basic computations in retina, LGN, V1, V2
  - Models of receptive fields: center-surround, oriented, simple/complex
  - Cortical magnification factor, log-polar mapping
- Five approaches to handwritten digit recognition
  - LeCun's convolutional neural nets
  - Simard et al.'s Tangent Distance
  - Belongie, Malik and Puzicha: Shape Contexts
  - DeCoste & Schölkopf: Invariant SVMs
  - Amit, Geman and Wilder: Randomized Decision Trees
- Template matching using distance transform variants
  - Chamfer distance
    - Barrow et al.
    - Borgefors
    - Gavrila & Philomin
  - Hausdorff distance
    - Huttenlocher, Klanderman & Rucklidge
    - Olson & Huttenlocher
- Discussion of transformations in general
  - D'Arcy Thompson, Fischler and Elschlager, Grenander
  - Similarity and Affine transforms
  - Smooth diffeomorphisms, Thin Plate Splines
- Local scale-invariant keypoint features
  - David Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV 04
  - Tony Lindeberg, Principles for Automatic Scale Selection, CVAP KTH Tech Report
- Pose estimation, pose clustering, geometric hashing, basis
views
- Multiple view approaches to 3D objects - aspects, k-medoids
- Perceptual Organization - Grouping, figure/ground
- The Human Body
- Human Movement
- Scenes
- Project presentations
There is no required text for this course. Steve Palmer's Vision
Science and Forsyth and Ponce's Computer Vision: A Modern Approach
contain useful source material.
We will use a scribe system to make course notes available through
the semester. Each lecture, one or two students will take turns taking
notes and typing them up. I'll edit and make the notes available on
the web.
The grade will be determined by a combination of homework assignments,
scribe notes, and a final project. The project could be a
mathematical/statistical analysis of a visual task, an implementation
of an interesting algorithm, or a psychophysical experiment.
You are encouraged to work in teams on the projects and the
homework assignments.
I hope you enjoy the course!
General Papers
Lecture Notes