A Trainable System for Object Detection in Images and Video Sequences

Item

Title
en_US A Trainable System for Object Detection in Images and Video Sequences
Creator
en_US Papageorgiou, Constantine P.
Date
2004-10-01T13:59:58Z
Date Available
2004-10-01T13:59:58Z
Date Issued
en_US 2000-05-01
Identifier
en_US AITR-1685
en_US CBCL-186
Abstract
en_US This thesis presents a general, trainable system for object detection in static images and video sequences. The core system finds a certain class of objects in static images of completely unconstrained, cluttered scenes without using motion, tracking, or handcrafted models, and without making any assumptions about the scene structure or the number of objects in the scene. The system takes a set of positive and negative example images as training input, transforms the pixel images to a Haar wavelet representation, and uses a support vector machine classifier to learn the difference between in-class and out-of-class patterns. To detect objects in out-of-sample images, we perform a brute-force search over all the subwindows in the image. This system is applied to face, people, and car detection with excellent results. For our extensions to video sequences, we augment the core static detection system in three ways: (1) extending the representation to five frames, (2) implementing an approximation to a Kalman filter, and (3) modeling detections in an image as a density and propagating this density through time according to measured features. In addition, we present a real-time version of the system that is currently running in a DaimlerChrysler experimental vehicle. As part of this thesis, we also present a system that uses a component-based approach instead of detecting full patterns. For people detection, we find it to be more robust to occlusions, rotations in depth, and severe lighting conditions than the full-body version. We also experiment with various other representations, including pixels and principal components, and show results that quantify how the number of features, color, and gray level affect performance.
Extent
en_US 128 p.
72537763 bytes
15910731 bytes
Format
application/postscript
application/pdf
Language
en_US
Relation
en_US AITR-1685
en_US CBCL-186
Subject
en_US AI
en_US MIT
en_US Artificial Intelligence
en_US object detection
en_US pattern recognition
en_US people detection
en_US face detection
en_US car detection
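
Illustrative sketch
The core pipeline described in the abstract (Haar wavelet representation, support vector machine classifier, brute-force search over subwindows) can be illustrated with a small toy example. This is a minimal sketch under stated assumptions, not the thesis implementation: the function names, the 8x8 window, the single wavelet scale, the non-overlapping wavelet grid, and the synthetic striped "object" are all hypothetical, and a linear SVM trained by subgradient descent on the hinge loss stands in for whatever SVM formulation the thesis actually uses. The search here also covers a single window size only.

```python
import numpy as np

WIN = 8  # hypothetical window size for this toy example


def haar_features(window, scale=4):
    """Haar-like responses (vertical, horizontal, diagonal) on a
    non-overlapping grid of scale x scale blocks at a single scale."""
    h, w = window.shape
    half = scale // 2
    feats = []
    for y in range(0, h - scale + 1, scale):
        for x in range(0, w - scale + 1, scale):
            blk = window[y:y + scale, x:x + scale].astype(float)
            tl, tr = blk[:half, :half].sum(), blk[:half, half:].sum()
            bl, br = blk[half:, :half].sum(), blk[half:, half:].sum()
            feats += [(tl + bl) - (tr + br),   # left minus right
                      (tl + tr) - (bl + br),   # top minus bottom
                      (tl + br) - (tr + bl)]   # diagonal
    return np.array(feats)


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via batch subgradient descent on the regularized hinge
    loss (an assumed stand-in for the thesis's SVM training)."""
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1            # examples inside the margin
        grad_w = lam * w - X[viol].T @ y[viol] / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def detect(image, w, b, win=WIN):
    """Brute-force search: score every win x win subwindow and return
    those with a positive score, best first, as (row, col, score)."""
    hits = []
    H, W = image.shape
    for y in range(H - win + 1):
        for x in range(W - win + 1):
            score = haar_features(image[y:y + win, x:x + win]) @ w + b
            if score > 0:
                hits.append((y, x, float(score)))
    return sorted(hits, key=lambda hit: -hit[2])


# Synthetic data only: the "object" is a striped 8x8 patch,
# the negatives are uniform noise windows.
rng = np.random.default_rng(0)
pattern = np.tile([1.0, 1.0, 0.0, 0.0], (WIN, 2))
pos = [pattern + 0.05 * rng.standard_normal((WIN, WIN)) for _ in range(100)]
neg = [rng.random((WIN, WIN)) for _ in range(100)]
X = np.array([haar_features(p) for p in pos + neg])
y = np.array([1.0] * 100 + [-1.0] * 100)
w, b = train_linear_svm(X, y)

# Low-contrast background with one object pasted at row 5, column 7.
image = 0.2 * rng.random((20, 20))
image[5:5 + WIN, 7:7 + WIN] = pattern
detections = detect(image, w, b)
```

Windows that partly overlap the pasted patch can also score above zero, so a practical detector would typically merge neighboring hits; that step is omitted here, as are the multi-scale search and the video extensions.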