MAESTRO: Conductor of Multimedia Analysis Technologies


The SRI MAESTRO Team (2000). MAESTRO: Conductor of multimedia analysis technologies. Communications of the ACM, 43(2), 57-63.


Although keyword-based queries are now a familiar part of any user’s experience with the World Wide Web, they are of limited direct applicability to the vast and growing quantity of multimedia information becoming available in materials such as broadcast news, video teleconferences, reconnaissance data, and audio-visual recordings of corporate meetings and classroom lectures. Content-based indexing, archiving and retrieval would facilitate access to large databases of such materials.

For example, in the broadcast news domain, content-based archiving is particularly useful. Archiving can be done by exploiting the speech contained in the audio track, the images contained in the video track, and the text in video overlays. One application is to filter down huge volumes of raw news footage to create the nicely packaged news broadcasts that we watch on television. Another use is to create a news-on-demand system for viewing news more efficiently. We can create a database of news broadcasts annotated for later retrieval of news clips of interest. The query “Tell me about the recent elections in Bosnia” would bring up news clips related to the elections.

MAESTRO (Multimedia Annotation and Enhancement via a Synergy of Technologies and Reviewing Operators) is a research and demonstration system developed at SRI International for exploring the contribution of a variety of analysis technologies (for example, speech recognition, image understanding, and optical character recognition) to the indexing and retrieval of multimedia. Informedia [1] and Broadcast News Navigator [2] are similar projects that use these technologies for archiving and retrieval. The main goal of the MAESTRO project is to discover, implement, and evaluate various combinations of these technologies to achieve analysis performance that surpasses the sum of the parts. For example, British Prime Minister Tony Blair can be identified in the news by his voice, his appearance, captions, and other cues. A combination of these cues should identify the Prime Minister more reliably than any single cue on its own.
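
To make the idea of combining cues concrete, here is a minimal sketch of one common way to fuse independent evidence: adding log-odds, as in naive Bayes. This is an illustration only, not a description of how MAESTRO combines its analyses. The function name `fuse_cues`, the cue names, and the confidence values are all hypothetical, and the sketch assumes the cues are conditionally independent and that each score was computed against the same prior.

```python
import math


def logit(p):
    """Convert a probability strictly between 0 and 1 to log-odds."""
    return math.log(p / (1.0 - p))


def fuse_cues(cue_probs, prior=0.5):
    """Fuse per-cue probabilities by summing their log-odds evidence.

    Assumption (hypothetical, for illustration): each cue is conditionally
    independent of the others and reports a probability relative to `prior`.
    """
    log_odds = logit(prior)
    for p in cue_probs.values():
        # Each cue contributes only the evidence it adds beyond the prior.
        log_odds += logit(p) - logit(prior)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Invented confidence values for three cues about the same person.
cues = {"voice": 0.7, "face": 0.8, "caption": 0.6}
fused = fuse_cues(cues)
print(f"fused confidence: {fused:.3f}")
```

With these invented scores, the fused confidence is higher than any single cue, which is the effect described above. Correlated cues would violate the independence assumption and make the fused value overconfident.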

MAESTRO is a highly multidisciplinary effort, involving contributions from three laboratories across two divisions at SRI. Each of the contributing SRI technologies is described in more detail below. The integrating architecture makes it easy to combine these technologies in different ways, and to incorporate new analysis technologies developed by our team or by others.
