Biological Vision
Neurobiology
Our visual system depends on nerve cells
called neurons. Human beings have about 10^10
neurons, each of which forms around 10^4 connections.
The neuron consists of dendrites, where input is received, axons, where
output is transmitted, the cell body, where the input is processed, and
synapses, which are the connections between neurons. Neurons generally
have several inputs from the dendrites, but only one output through the
axon. Input from the dendrites adds up in the cell body and the cell will
only fire once a threshold of activation is reached.
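The summing-and-threshold behaviour described above can be sketched in a few lines of Python. This is only an illustrative toy, not a biophysical model: the function name `neuron_fires`, the per-input `weights` (standing in for synapse strength), and the numeric values are all assumptions made for the example.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True if the summed, weighted dendritic input reaches the threshold."""
    # Inputs from all dendrites add up in the cell body.
    activation = sum(x * w for x, w in zip(inputs, weights))
    # The single axon output fires only once the threshold is reached.
    return activation >= threshold


# Hypothetical synaptic weights for three dendritic inputs (assumed values).
weights = [0.5, 0.3, 0.4]

print(neuron_fires([1, 0, 0], weights, threshold=0.8))  # one active input: below threshold
print(neuron_fires([1, 0, 1], weights, threshold=0.8))  # two active inputs: reaches threshold
```

Note that the function has several inputs but a single boolean output, mirroring the many-dendrites, one-axon arrangement.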
Neurons can be organized as:
Note: We can tell whether a sound is coming from the right or the left from the difference in the time at which the sound waves reach each ear. This time difference can be less than 10 ms, so our nervous system is capable of discriminating signals with a temporal resolution finer than 10 ms.
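As a rough computational analogue of the note above, the arrival-time difference between two ear signals can be estimated by sliding one signal against the other and keeping the shift with the largest overlap (cross-correlation). The sketch below is an assumption-laden illustration, not a claim about how the nervous system does it: the function name `estimate_delay`, the toy signals, and the 44100 Hz sample rate are all made up for the example.

```python
def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the sound reached the left ear first.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        # Correlation score for this shift, ignoring samples that fall off the ends.
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy signals: the same pulse arrives 3 samples later at the right ear.
left = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]

sample_rate = 44100  # samples per second (assumed)
lag = estimate_delay(left, right, max_lag=5)
print(lag)  # 3
print("delay in ms:", 1000 * lag / sample_rate)
print("source is on the", "left" if lag > 0 else "right" if lag < 0 else "midline")
```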
The brain is like a six-layer handkerchief made out of neurons that is crumpled and shoved inside the skull. Connections among neurons on this "handkerchief" can be:
Studying the Brain
Several disciplines have approached the problem of investigating how
the brain works.
As far as object recognition is concerned, what we can account for so far is that the measurement of optical flow and optical filtering occur in specific areas of the brain.
Ungerleider and Mishkin (1982) proposed that there
are two streams of visual processing, Dorsal and Ventral.
The Dorsal stream begins in the Primary Visual Cortex and moves up
to the Posterior Parietal Cortex. The Ventral stream begins in the Primary
Visual Cortex as well, but it moves down to the Inferotemporal Cortex.
The Dorsal stream involves the interaction of vision with the motor cortex.
Thus, the Dorsal stream may be responsible for the hand-eye coordination
required to do such activities as picking something up. The Ventral stream
is related to recognition and categorization. The functions of each stream
can be summarized as Where vs. What, that is, action vs. recognition. An
important distinction between these is that the Dorsal stream involves
feedback. From these studies, it has been assumed that object recognition
takes place in the Inferotemporal cortex.
Gross, Rolls and Perrett (1972) investigated face recognition by studying how a monkey's neurons respond to different orientations of monkey faces. Tanaka conducted a similar investigation using objects (probably monkey-specific objects) instead of faces. The results of these investigations led to the conclusion that there are areas in the brain specifically dedicated to object recognition and face recognition.
Finally, another area of study that may be related to object and face recognition is gait and expression recognition. Everyone walks in a specific way: our skeletal structure, whether we are carrying something, and whether we have a hurt leg all influence the way we walk. One way to implement gait recognition would be to track a feature vector f of the joints of the body over time.
The implementation is based on Hidden Markov Models, which have been used in language processing to determine the probability that a sound wave is a given phoneme. As the name implies, Hidden Markov Models calculate the probability that hidden data (phonemes) corresponds to the known data (sound waves). The model also takes both past and future observations into account. Applied to gait recognition, the joint data would be the known data, and the gait (walking, old person, limp leg, carrying a rock, etc.) would be the unknown data.
A similar process can be used for expressions and gestures. The difference is that an expression or gesture is not periodic, although it is still a function of time. An important application of this technology is in human-computer interaction, where our space-age house may be able to recognize that we had a bad day and dim the lights, put on some mellow music and open a chilled beer.
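To make the Hidden Markov Model idea for gait concrete, here is a minimal sketch of Viterbi decoding, one standard way to recover the most probable hidden sequence from the known data. Everything specific in it is assumed for illustration: the two hidden gaits ("walking", "limping"), the reduction of the joint feature vector to a single symbol per frame ("even" or "uneven" stride), and all of the probabilities. A real system would work with continuous joint features and probabilities learned from data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical model: all names and numbers below are illustrative assumptions.
states = ["walking", "limping"]
start_p = {"walking": 0.6, "limping": 0.4}
trans_p = {
    "walking": {"walking": 0.9, "limping": 0.1},
    "limping": {"walking": 0.1, "limping": 0.9},
}
emit_p = {
    "walking": {"even": 0.8, "uneven": 0.2},
    "limping": {"even": 0.3, "uneven": 0.7},
}

frames = ["uneven", "uneven", "even", "uneven"]
print(viterbi(frames, states, start_p, trans_p, emit_p))
# ['limping', 'limping', 'limping', 'limping']
```

The single "even" frame does not flip the decoded gait to "walking", because the frames before and after it both favour "limping". This is the sense in which the model uses past and future observations.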