Advanced Optical Character Recognition (OCR) Techniques

SRI has developed OCR techniques that do not explicitly segment text into individual characters, thus eliminating the most error-prone step of character recognition. The approach is based on object recognition techniques that SRI developed for industrial machine vision applications. Because it treats the shape of each character as a two-dimensional object, the technique can recognize deformed characters and remains robust in the presence of image noise and other degradations of document quality. Because the constraints of a lexicon directly drive the recognition process, the technique achieves greater accuracy than the more conventional approach of applying the lexicon only in a postprocessing step.
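The idea of letting a lexicon drive recognition without a separate character segmentation step can be illustrated with a small sketch. This is not SRI's algorithm, which the text above does not specify in detail; it swaps in a generic dynamic-programming alignment. It assumes a hypothetical upstream classifier that emits, for each image column ("frame"), a log-score for each character it might belong to. The names `align_score`, `recognize`, and `floor` are invented for illustration.

```python
NEG_INF = float("-inf")

def align_score(frames, word, floor=-10.0):
    """Best log-score of reading `word` left to right across all frames.

    frames: list of dicts mapping a character to its log-score for that
            frame (hypothetical classifier output, an assumption here).
    Each character must cover at least one frame; characters missing
    from a frame's dict receive the `floor` score.
    """
    n = len(word)
    if n == 0 or n > len(frames):
        return NEG_INF
    # best[j]: best score with the first j characters covering the frames so far
    best = [0.0] + [NEG_INF] * n
    for frame in frames:
        new = [NEG_INF] * (n + 1)
        for j in range(1, n + 1):
            # Either stay on character j or advance from character j-1.
            prev = max(best[j], best[j - 1])
            if prev > NEG_INF:
                new[j] = prev + frame.get(word[j - 1], floor)
        best = new
    return best[n]

def recognize(frames, lexicon):
    """Return the lexicon word whose best alignment scores highest."""
    return max(lexicon, key=lambda w: align_score(frames, w))

# Toy input: the third frame slightly prefers "o", so a frame-by-frame
# reading would give "cot", which is not in the lexicon.
frames = [
    {"c": -0.2, "e": -1.8},
    {"c": -0.5, "a": -1.0},
    {"o": -0.3, "a": -0.6},
    {"t": -0.3, "l": -1.5},
    {"t": -0.2, "l": -1.7},
]
lexicon = ["cat", "car", "eat"]
print(recognize(frames, lexicon))
```

Character boundaries are never decided up front in this sketch: they fall out of the best alignment for each candidate word, so the lexicon constrains the search itself rather than correcting a finished character string afterward.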

SRI has also developed an OCR technique that applies contextual constraints through word-level language models (whole words and the relationships between sequences of words) rather than using lexicons only to recognize words in isolation. This technique was integrated into an experimental system that automatically extracts relevant information from printed documents. An intelligence analyst could use such a system to gather information quickly from diverse sources of printed material. The system segments a scanned document page into multiple text regions, determines their reading order, recognizes the characters and words in the text, and applies a finite-state-automaton-based information extraction process to populate data templates for specific topics of interest.
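To show how a word-level language model can change the outcome compared with recognizing each word in isolation, here is a minimal sketch. It is an assumption-laden illustration rather than the system described above: it uses a bigram model with a fixed backoff penalty and a Viterbi search, and the candidate lists, scores, and the names `best_sentence`, `bigram_logp`, and `backoff` are all hypothetical.

```python
def best_sentence(candidates, bigram_logp, backoff=-8.0, start="<s>"):
    """Pick the word sequence maximizing OCR score plus bigram score.

    candidates:  one dict per word position, mapping a candidate word to
                 its OCR log-score (hypothetical recognizer output).
                 Each position is assumed to have at least one candidate.
    bigram_logp: dict mapping (previous_word, word) to a log-probability;
                 unseen pairs receive the `backoff` penalty.
    """
    # paths: last word -> (total score, word sequence ending in that word)
    paths = {start: (0.0, [])}
    for options in candidates:
        new_paths = {}
        for word, ocr_logp in options.items():
            best = None
            for prev, (score, seq) in paths.items():
                total = score + ocr_logp + bigram_logp.get((prev, word), backoff)
                if best is None or total > best[0]:
                    best = (total, seq + [word])
            new_paths[word] = best
        paths = new_paths
    return max(paths.values(), key=lambda p: p[0])[1]

# Toy input: taken in isolation, "hank" and "dosed" outscore the
# intended words at positions two and three.
candidates = [
    {"the": -0.1},
    {"bank": -0.9, "hank": -0.7},
    {"closed": -0.6, "dosed": -0.5},
]
bigram_logp = {
    ("<s>", "the"): -0.5,
    ("the", "bank"): -1.0,
    ("bank", "closed"): -1.2,
}
print(best_sentence(candidates, bigram_logp))
```

In this toy case the sequence-level score overrides the slightly higher isolated scores of the misread words, which is the effect the word-sequence constraints are meant to provide.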

References

[1] G.K. Myers and C.-H. Chen, Lexicon-based word recognition without word segmentation, in Third Annual Symposium on Document Analysis and Information Retrieval, April 1993, 117-188.

[2] C.-H. Chen, Structuring a large lexicon for word recognition, presented at the Fourth Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, Nevada, 24-26 April 1995.

[3] C.-H. Chen, Lexicon-driven word recognition, in Proc. Third International Conference on Document Analysis and Recognition, Montreal, Canada, 14-16 August 1995.

[4] G.K. Myers and P.G. Mulgaonkar, Automatic extraction of information from printed documents, presented at the Fourth Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, Nevada, 24-26 April 1995.

 

©2008 SRI International 333 Ravenswood Avenue, Menlo Park, CA 94025-3493
SRI International is an independent, nonprofit corporation.