Multimedia Systems and Databases
Content-Based Access to Video Databases
SRI is developing tools to help analysts browse through video quickly and
detect important events. Current techniques for video manipulation treat
video as a sequence of image frames and cannot reliably detect global
video parameters such as camera motion and scene changes. We instead
interpret video as a 3-D solid whose third dimension is the time axis;
the objects in a scene then appear as cylinders within the 3-D solid.
We first designed and implemented a prototype video analyzer that displays
a 3-D representation of video; human analysts can use random access methods
to browse through the video. The video analyzer also uses cross sections
of 3-D representations of the video to detect scene changes.
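The cross-section idea can be illustrated with a minimal sketch. It assumes the video is a list of grayscale frames (each a list of pixel rows), cuts the 3-D solid at one fixed image row, and flags a scene change wherever adjacent time steps of that cross section differ sharply. The function names, the mean-absolute-difference measure, and the threshold are illustrative assumptions, not the analyzer's actual method.

```python
def time_slice(video, row):
    """Cut the (time, height, width) volume at a fixed image row.

    Returns a 2-D cross section indexed as [time][x].
    """
    return [frame[row] for frame in video]


def scene_changes(video, row, threshold):
    """Return frame indices where the cross section changes abruptly."""
    section = time_slice(video, row)
    cuts = []
    for t in range(1, len(section)):
        prev, cur = section[t - 1], section[t]
        # Mean absolute intensity difference between adjacent time steps.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(t)
    return cuts


# Synthetic video: 5 dark frames followed by 5 bright frames (4x6 pixels).
dark = [[10] * 6 for _ in range(4)]
bright = [[200] * 6 for _ in range(4)]
video = [dark] * 5 + [bright] * 5
cuts = scene_changes(video, row=2, threshold=50)
```

A single row is cheap to examine but can miss changes confined to other parts of the image; sampling several rows and columns would be a natural extension.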
We developed tools to analyze surveillance video: algorithms that
track targets in the video and compensate for camera motion. We also developed
a scenario in which an unmanned aerial vehicle (UAV) sends continuous
video, and the analyst detects objects, tracks them, and reports the video
contents. Correlation-based methods enable the analyst to track blob-like
targets and maintain their tracks over time. We used a similar
method to compensate for camera motion by tracking background objects,
and to create mosaics of the scene while the video was analyzed.
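As a rough illustration of correlation-based tracking, the sketch below takes a template around a blob in one frame and searches the next frame for the window with the highest normalized cross-correlation. The exhaustive search, the helper names, and the synthetic frames are assumptions made for this example; the text does not specify the correlation measure or search strategy actually used.

```python
import math


def patch(frame, top, left, h, w):
    """Flatten the h-by-w window whose top-left corner is (top, left)."""
    return [frame[top + r][left + c] for r in range(h) for c in range(w)]


def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0:
        return 0.0  # flat patch: correlation is undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def track(frame, template, h, w):
    """Return the (top, left) window position that best matches template."""
    best_score, best_pos = -2.0, None
    for top in range(len(frame) - h + 1):
        for left in range(len(frame[0]) - w + 1):
            score = ncc(template, patch(frame, top, left, h, w))
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos


def make_frame(top, left):
    """Synthetic 8x8 frame with a bright 2x2 blob at (top, left)."""
    frame = [[0] * 8 for _ in range(8)]
    for r in range(2):
        for c in range(2):
            frame[top + r][left + c] = 255
    return frame


frame0 = make_frame(2, 2)            # blob at (2, 2)
frame1 = make_frame(3, 4)            # blob has moved by (+1, +2)
template = patch(frame0, 1, 1, 4, 4)  # blob plus a one-pixel border
new_pos = track(frame1, template, 4, 4)
motion = (new_pos[0] - 1, new_pos[1] - 1)
```

Applying the same search to a background patch instead of a target gives a frame-to-frame offset that could be subtracted to compensate for camera motion, under the simplifying assumption that the motion is a pure translation.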
Video Imagery
Automatic identification of the contents of video imagery
would permit videos to be indexed in a convenient and meaningful way for
later reference. For the MAESTRO system we developed a text extraction capability
for video ("video OCR"). Video imagery often contains text that is semantically
related to the scene depicted in the video, especially in broadcast news
programs. Such text can be computer-generated text overlaid on the imagery
(such as captions) or text that appears as part of the video scene itself.
In video imagery, text is harder to locate and recognize than in many
other OCR applications because of small character sizes and nonuniform
backgrounds. In addition, scene text can be viewed from an oblique angle,
lie outside a single two-dimensional plane, or be blurred by motion. SRI has developed
an approach that involves binarizing individual color video frames and
then applying a commercially developed OCR engine. The accuracy of the
recognition result can be improved substantially by postprocessing the
OCR results with a lexicon of named entities extracted by MAESTRO from
the audio or closed caption tracks.
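The lexicon postprocessing step might look something like the following sketch, which replaces each OCR token with its closest lexicon entry when the two are sufficiently similar. It uses Python's standard `difflib` matcher as a stand-in; the similarity cutoff, the function name, and the sample lexicon and tokens are hypothetical and are not taken from MAESTRO.

```python
import difflib


def correct_with_lexicon(ocr_tokens, lexicon, cutoff=0.75):
    """Replace each OCR token with its closest lexicon entry, if close enough."""
    by_lower = {entry.lower(): entry for entry in lexicon}
    corrected = []
    for token in ocr_tokens:
        matches = difflib.get_close_matches(
            token.lower(), list(by_lower), n=1, cutoff=cutoff
        )
        # Keep the raw token when nothing in the lexicon is similar enough.
        corrected.append(by_lower[matches[0]] if matches else token)
    return corrected


# Hypothetical named entities, as if drawn from audio or caption tracks.
lexicon = ["Washington", "Clinton", "Pentagon"]
# Hypothetical OCR output with typical digit-for-letter confusions.
ocr_tokens = ["Wash1ngton", "C1inton", "reports", "Pentag0n"]
result = correct_with_lexicon(ocr_tokens, lexicon)
```

A cutoff that is too low would pull ordinary words toward named entities, so in practice the threshold would need tuning against real OCR output.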
Multimedia Information Systems
The use of image-based data has grown exponentially in a wide range of industries. An
example is diagnostic imagery in a medical database: a composite of imagery
and specialists’ interpretations that presents a detailed view of a patient’s
lifetime health and health care. A typical image is associated with an
annotation of the image’s context and key features. At a medium-size hospital,
tens of thousands of images and terabytes of information are collected
per year. Government and commercial users are collecting at least as much
imagery as the health care industry. The key challenge is the creation
of image-based multimedia databases to store, manage, and provide access
to image data: such databases must support content-based indexing—especially
to answer queries such as “show me other images like this one.”
We implemented a system that stores imagery in a database and enables
the user to access the data by its content. We first developed methods
of accessing image data via text annotations associated with each image,
as well as data models that describe the multimedia data. Next, we devised
methods of accessing the data by image content, which is described in
terms of feature vectors (primarily color and texture). We then worked
on a scheme that uses linear models in the parameter space to model similarities
between images. The linear models are learned via on-line learning algorithms.
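To make the last two steps concrete, here is a minimal sketch under stated assumptions: each image is reduced to a coarse color histogram (texture features are omitted), similarity is a linear function of the per-feature differences between two vectors, and the weights are adjusted one labeled pair at a time with a perceptron-style rule. The class, the update rule, and the sample pixels are illustrative choices; the text does not say which linear model or on-line algorithm was used.

```python
def color_histogram(pixels, bins=4):
    """Normalized histogram of 8-bit RGB pixels, `bins` buckets per channel."""
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value * bins // 256] += 1.0
    return [count / len(pixels) for count in hist]


def difference(f_a, f_b):
    """Per-feature absolute difference between two feature vectors."""
    return [abs(a - b) for a, b in zip(f_a, f_b)]


class OnlineSimilarity:
    """Linear model: score = bias - w . |f_a - f_b|; positive means similar."""

    def __init__(self, n_features, rate=0.5):
        self.w = [0.0] * n_features
        self.bias = 0.0
        self.rate = rate

    def score(self, f_a, f_b):
        d = difference(f_a, f_b)
        return self.bias - sum(w * x for w, x in zip(self.w, d))

    def update(self, f_a, f_b, similar):
        """Perceptron-style step taken only when the prediction is wrong."""
        label = 1 if similar else -1
        if label * self.score(f_a, f_b) <= 0:
            d = difference(f_a, f_b)
            self.w = [w - self.rate * label * x for w, x in zip(self.w, d)]
            self.bias += self.rate * label


# Hypothetical images: two reddish, one bluish (four pixels each).
f_red1 = color_histogram([(250, 10, 10)] * 4)
f_red2 = color_histogram([(240, 20, 20)] * 4)
f_blue = color_histogram([(10, 10, 250)] * 4)

model = OnlineSimilarity(n_features=len(f_red1))
for _ in range(3):  # feedback arrives one labeled pair at a time
    model.update(f_red1, f_red2, similar=True)
    model.update(f_red1, f_blue, similar=False)
```

After these updates the model scores the two reddish images as similar and the red/blue pair as dissimilar; a "show me other images like this one" query would rank stored images by this score against the query image.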