Generic Visual Perception Processor Seminar Report
While computing technology is growing by leaps and bounds, the human brain continues to be the world's fastest computer. Combine brain power with seeing power, and you have the fastest, cheapest, most extraordinary processor ever: the human eye. Little wonder that research labs the world over are striving to produce a near-perfect electronic eye.
The generic visual perception processor (GVPP) was developed after 10 long years of scientific effort. It can automatically detect objects and track their movement in real time. The GVPP, which crunches 20 billion instructions per second (BIPS), models the human perceptual process at the hardware level by mimicking the separate temporal and spatial functions of the eye-to-brain system. The processor sees its environment as a stream of histograms describing the location and velocity of objects.
GVPP has been demonstrated as capable of learning-in-place to solve a variety of pattern recognition problems. It boasts automatic normalization for varying object size, orientation and lighting conditions, and can function in daylight or darkness.
This electronic "eye" on a chip can now handle most tasks that a normal human eye can, including driving safely, selecting ripe fruit, reading and recognizing things. Sadly, though modeled on the visual perception capabilities of the human brain, the chip is not a medical marvel poised to cure blindness.
BACKGROUND OF THE INVENTION
The invention relates generally to methods and devices for automatic visual perception, and more particularly to methods and devices for processing image signals using two or more histogram calculation units to localize one or more objects in an image signal using one or more characteristics of an object, such as its shape, size and orientation. Such a device can be termed an electronic spatio-temporal neuron. It is particularly useful for image processing, but may also be used for other signals, such as audio signals. The techniques of the present invention are also particularly useful for tracking one or more objects in real time.
The GVPP was invented in 1992 by BEV founder Patrick Pirim, who reasoned that it would be relatively simple for a CMOS chip to implement in hardware the separate contributions of temporal and spatial processing in the brain. The brain-eye system uses layers of parallel-processing neurons that pass the signal through a series of preprocessing steps, resulting in real-time tracking of multiple moving objects within a visual scene.
Pirim created a chip architecture that mimicked the work of the neurons, with the help of multiplexing and memory. The result is an inexpensive device that can autonomously "perceive" and then track up to eight user-specified objects in a video stream based on hue, luminance, saturation, spatial orientation, speed and direction of motion.
The GVPP tracks an "object," defined as a certain set of hue, luminance and saturation values in a specific shape, from frame to frame in a video stream by anticipating where its leading and trailing edges make "differences" with the background. That means it can track an object through varying light sources or changes in size, as when an object gets closer to the viewer or moves farther away.
The GVPP's major performance strength over current-day vision systems is its adaptation to varying light conditions. Today's vision systems dictate uniform, shadowless illumination, and even next-generation prototype systems, designed to work under "normal" lighting conditions, can be used only from dawn to dusk. The GVPP, on the other hand, adapts to real-time changes in lighting without recalibration, day or night.
HOW IT WORKS
Basically, the chip is built as a neural network modeled on the structure of the human brain. The basic element is the neuron. Each neuron has a large number of input lines and a single output line, and implements a simple function: it takes the weighted sum of its inputs and produces an output that is fed into the next layer. The weight assigned to each input is variable.
A large number of such interconnected neurons form a neural network. Every input given to the network is transmitted over the entire network via direct connections, called synaptic connections, and feedback paths. The signal thus ripples through the network, each time changing the weights associated with the inputs of every neuron. These changes gradually drive the weights toward stable values, that is, values that no longer change. At this point the information about the signal is stored as the weights of the inputs in the neural network.
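The weighted-sum behaviour of a single neuron can be sketched in a few lines of Python. This is only an illustrative model, not the GVPP's circuitry: the step activation, the threshold parameter and the function names neuron_output and layer_output are assumptions introduced here.

```python
def neuron_output(inputs, weights, threshold=0.0):
    """Weighted sum of the inputs, followed by a step activation (assumed)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def layer_output(inputs, weight_rows, threshold=0.0):
    """Feed the same inputs to several neurons; the results go to the next layer."""
    return [neuron_output(inputs, row, threshold) for row in weight_rows]


# Hypothetical example: three inputs feeding a layer of two neurons.
signals = [1, 1, 0]
layer_weights = [[2, -1, 1], [-1, -1, 3]]
print(layer_output(signals, layer_weights, threshold=1))
```

Changing a weight changes how strongly the corresponding input line influences the output, which is what makes the weights the place where learned information is stored.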
A neural network geometrizes computation. When we draw the state diagram of a neural network, the network activity burrows a trajectory in this state space. The trajectory begins with a computation problem. The problem specifies initial conditions, which define the beginning of the trajectory in the state space.
In pattern learning, the pattern to be learned defines the initial conditions, whereas in pattern recognition, the pattern to be recognized defines them. Most of the trajectory consists of transient behavior, or computation. In learning, the weights associated with the inputs gradually change to absorb the new pattern information. The trajectory ends when the system reaches equilibrium. This is the final state of the neural network. If the pattern was meant to be matched, the final neuronal state represents the stored pattern that is the closest match to the input pattern.
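The recognition case can be illustrated with a small Hopfield-style associative memory. This technique is swapped in here for illustration and is not claimed to be what the GVPP implements. In this sketch the weights are set in a single Hebbian step and then held fixed, and the trajectory is traced by the neuron states: the noisy input is the initial condition, and updating stops at equilibrium. The names train_hebbian and recall and the example patterns are assumptions.

```python
def train_hebbian(patterns):
    """Build a symmetric weight matrix from patterns of +1/-1 values."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, max_sweeps=100):
    """Update neurons one at a time until no neuron changes (equilibrium)."""
    state = list(state)  # the initial condition of the trajectory
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            total = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:  # stable state reached
            break
    return state


# Two stored patterns (hypothetical) and a copy of the first with one bit flipped.
pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
noisy_a = [-1, 1, 1, 1, -1, -1, -1, -1]

weights = train_hebbian([pattern_a, pattern_b])
print(recall(weights, noisy_a) == pattern_a)
```

The final state is the stored pattern nearest to the corrupted input, which is the "closest match" behaviour described above.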
THE PROCESSING DETAILS
The electronic eye follows the same theoretical processing steps as the real eye, with hard-wired silicon circuitry around each pixel in its sensor array. A sensor array is a set of sensors that an information-gathering device uses to collect information (usually directional in nature) that cannot be gathered from a single source, for delivery to a central processing unit. Each pixel is read by the vision chip with hardware that determines and scales luminance, tracks color, remembers movement in the previous moment, recalls the direction of previous movement, and then deduces the speed of the various detected objects from parallel phasic and tonic neural circuitry.
Basically, each parameter has an associated neuron that handles its processing tasks in parallel. In addition, each pixel has two auxiliary neurons that define the zone in which an object is located: from the direction in which the object is moving, these neurons deduce its leading and trailing edges and mark them in registers associated with the first (leading-edge) and last (trailing-edge) pixels belonging to the object. Each of these silicon neurons is built with RAM, a few registers, an adder and a comparator. Supplied as a 100-pin module, the chip accepts analogue line-level video input, with a programmable-gain input amplifier that auto-scales the signal.
In applications, each pixel may be described with respect to any of the six domains of information available to it: hue, luminance, saturation, speed, direction of motion and spatial orientation. The GVPP further subcategorizes pixels by ranges, for instance luminance between 10 percent and 65 percent, hue of blue, saturation between 20 and 25 percent, and moving upward in the scene.
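A toy software model may help picture how RAM, registers, an adder and a comparator combine into a histogram unit. It is a sketch under stated assumptions, not the chip's actual logic: the class name HistogramUnit, the helper locate_columns, and the choice to histogram column positions of pixels whose value falls in a window are all hypothetical.

```python
class HistogramUnit:
    """Toy histogram 'neuron': RAM bins, two bound registers, adder, comparator."""

    def __init__(self, num_bins, low, high):
        self.bins = [0] * num_bins  # the "RAM": one counter per position
        self.low = low              # register: lower comparator bound
        self.high = high            # register: upper comparator bound

    def accumulate(self, position, value):
        if self.low <= value <= self.high:  # comparator
            self.bins[position] += 1        # adder

    def edges(self, min_count=1):
        """First and last occupied bins, or None if nothing was counted."""
        occupied = [i for i, c in enumerate(self.bins) if c >= min_count]
        if not occupied:
            return None
        return occupied[0], occupied[-1]


def locate_columns(frame, low, high):
    """Histogram the column index of every pixel whose value is in range."""
    unit = HistogramUnit(len(frame[0]), low, high)
    for row in frame:
        for x, value in enumerate(row):
            unit.accumulate(x, value)
    return unit


# Hypothetical 4x6 frame with a bright object in columns 2 to 4.
frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 9, 9, 0],
    [0, 0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0, 0],
]
unit = locate_columns(frame, low=5, high=10)
print(unit.bins, unit.edges())
```

Which of the two returned positions counts as the leading edge and which as the trailing edge depends on the object's direction of motion, which this sketch does not model.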
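Range-based subcategorization of this kind amounts to testing each pixel attribute against a window. The sketch below uses the example ranges from the text; the numeric encodings are assumptions made here for illustration (hue in degrees with "blue" taken as 200 to 260, direction in degrees with "upward" taken as 45 to 135), as are the names TARGET_RANGES and matches.

```python
# Example windows from the text; hue and direction encodings are assumed.
TARGET_RANGES = {
    "luminance": (10, 65),    # percent
    "hue": (200, 260),        # degrees, assumed span for "blue"
    "saturation": (20, 25),   # percent
    "direction": (45, 135),   # degrees, assumed span for "moving upward"
}


def matches(pixel, ranges):
    """True if every listed attribute of the pixel lies inside its window."""
    return all(low <= pixel[name] <= high for name, (low, high) in ranges.items())


# Hypothetical pixels described by four of the six domains.
blue_rising = {"luminance": 40, "hue": 230, "saturation": 22, "direction": 90}
too_bright = {"luminance": 80, "hue": 230, "saturation": 22, "direction": 90}

print(matches(blue_rising, TARGET_RANGES), matches(too_bright, TARGET_RANGES))
```

Only pixels that pass every window would be counted toward the object, so tightening or widening a window changes which pixels belong to it.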
A set of second-level pattern recognition commands permits the GVPP to search for different objects in different parts of the scene — for instance, to look for a closed eyelid only within the rectangle bordered by the corners of the eye. Since some applications may also require multiple levels of recognition, the GVPP has software hooks to pass along the recognition task from level to level.
For instance, to detect when a driver is falling asleep — a capability that could find use in California, which is about to mandate that cars sound an "alarm" when drowsy drivers begin to nod off — the GVPP is first programmed to detect the driver's head, for which it creates histograms of head movement. The microprocessor reads these histograms to identify the area for the eye.
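The two-level search described here can be sketched as follows: row and column histograms of a motion mask give the head's bounding box, and a second-level search is then confined to a sub-rectangle of that box. The function names and, in particular, the fractions used to place the eye band inside the head box are hypothetical values chosen for illustration, not figures from the GVPP.

```python
def projection_histograms(mask):
    """Count the moving pixels in each row and in each column of a 0/1 mask."""
    rows = [sum(row) for row in mask]
    cols = [sum(col) for col in zip(*mask)]
    return rows, cols


def extent(histogram, min_count=1):
    """First and last index whose count reaches min_count, or None."""
    hits = [i for i, c in enumerate(histogram) if c >= min_count]
    return (hits[0], hits[-1]) if hits else None


def head_box(mask):
    """First level: bounding box (top, bottom, left, right) of the moving region."""
    rows, cols = projection_histograms(mask)
    r, c = extent(rows), extent(cols)
    if r is None or c is None:
        return None
    return (r[0], r[1], c[0], c[1])


def eye_region(box, top_frac=0.25, bottom_frac=0.5):
    """Second level: a band inside the head box (fractions are assumed)."""
    top, bottom, left, right = box
    height = bottom - top + 1
    return (top + int(height * top_frac), top + int(height * bottom_frac), left, right)


# Hypothetical 10x8 motion mask with a "head" in rows 1 to 8, columns 2 to 5.
mask = [[1 if 1 <= y <= 8 and 2 <= x <= 5 else 0 for x in range(8)] for y in range(10)]
box = head_box(mask)
print(box, eye_region(box))
```

A closed-eyelid detector would then only need to examine pixels inside the returned band rather than the whole frame.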