CS 294-6 Recognizing People, Objects and Actions

Jitendra Malik
Spring 2004
405 Soda Hall
Tu 4-6

Course Content

This course is designed around the challenge problem of making
computers aware of the everyday visual world, i.e., processing images or
video so as to recognize categories such as cars, buses, tigers,
zebras, rooms, doors, telephones, faces, arms and hands, as well as
actions such as running, jumping and kicking. Topics will include a
survey of human visual recognition (perception and physiology),
recognition in the presence of transformations, local matching
techniques, global matching techniques, segmentation as a front end,
motion descriptors for action recognition, and case studies of
recognition in different domains. I have a specific list of about 300
visual categories to focus our thoughts.

Lecture Topics

Introduction: Characteristics of visual recognition. Prototypes and affordances. Basic, superordinate and subordinate categories (reference: Palmer, Chapter 9)
Human visual system

Basic computations in retina, LGN, V1, V2
Models of receptive fields: center-surround, oriented, simple/complex
Cortical magnification factor, log-polar mapping

Five approaches to handwritten digit recognition

Template matching using distance transform variants

Chamfer distance

Hausdorff distance

Discussion of transformations in general

D'Arcy Thompson, Fischler and Elschlager, Grenander
Similarity and Affine transforms
Smooth diffeomorphisms, Thin Plate Splines

Local scale-invariant keypoint features

Pose estimation, pose clustering, geometric hashing, basis views
Multiple view approaches to 3D objects - aspects, k-medoids
Perceptual Organization - Grouping, figure/ground
The Human Body
Human Movement
Scenes
Project presentations

There is no required text for this course. Steve Palmer's Vision Science and Forsyth and Ponce's Computer Vision: A Modern Approach contain useful source material.

We will use a scribe system to make course notes available throughout the semester. For each lecture, one or two students will take notes and type them up, with the duty rotating through the class. I'll edit the notes and make them available on the web.

The grade will be determined by a combination of homework assignments, scribe notes, and a final project. The project could be a mathematical or statistical analysis of a visual task, an implementation of an interesting algorithm, or a psychophysical experiment.

You'll be encouraged to work in teams on both the projects and the homework assignments.

I hope you enjoy the course!

General Papers

Max Wertheimer, Laws of Organization in Perceptual Forms (1923)

Lecture Notes

Homework

Homework 1