Advanced Optical Character Recognition (OCR) Techniques
SRI has developed OCR techniques that do not explicitly segment words into individual characters,
thus eliminating the most error-prone step of character recognition. This approach is based on
object recognition techniques that SRI developed for industrial machine vision applications. By
treating the shape of each character as a two-dimensional object, the technique can recognize
deformed characters and remains robust in the presence of image noise and other degradations
of document quality. Because the constraints of a lexicon directly drive the recognition
process, the technique achieves greater accuracy than the more conventional approach of
applying the lexicon only in a postprocessing step.
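The difference between lexicon-driven recognition and lexicon postprocessing can be illustrated
with a small Python sketch. It is not SRI's method, which is not specified here: the per-column
character log-probabilities, the dynamic-programming alignment, the difflib-based baseline, and
all names (FRAMES, LEXICON, align_score, and so on) are assumptions made for the example.

```python
import math
from difflib import get_close_matches

FLOOR = math.log(1e-6)  # assumed score for a character the classifier did not propose


def align_score(word, frames):
    """Best log-score of spreading `word` over all frames, left to right.

    frames: one dict per image column, mapping character -> log-probability
    (hypothetical classifier output). Every character must cover at least
    one frame; no character boundaries are supplied in advance.
    """
    n = len(word)
    if n == 0 or n > len(frames):
        return float("-inf")
    # best[i] = best score so far with the latest frame assigned to word[i]
    best = [float("-inf")] * n
    best[0] = frames[0].get(word[0], FLOOR)
    for frame in frames[1:]:
        new = [float("-inf")] * n
        for i, ch in enumerate(word):
            stay = best[i]                                   # same character continues
            advance = best[i - 1] if i > 0 else float("-inf")  # next character starts
            prev = max(stay, advance)
            if prev > float("-inf"):
                new[i] = prev + frame.get(ch, FLOOR)
        best = new
    return best[-1]


def recognize_lexicon_driven(frames, lexicon):
    """The lexicon drives the search: only valid words are ever scored."""
    return max(lexicon, key=lambda w: align_score(w, frames))


def recognize_postprocess(frames, lexicon):
    """Baseline: pick the best character per column, then repair with the lexicon."""
    raw = []
    for frame in frames:
        ch = max(frame, key=frame.get)
        if not raw or raw[-1] != ch:  # collapse runs of the same character
            raw.append(ch)
    return get_close_matches("".join(raw), lexicon, n=1, cutoff=0.0)[0]


def logp(probs):
    return {ch: math.log(p) for ch, p in probs.items()}


# Toy input: four image columns with ambiguous middle columns.
FRAMES = [
    logp({"c": 0.9, "e": 0.1}),
    logp({"a": 0.4, "o": 0.6}),
    logp({"a": 0.7, "o": 0.3}),
    logp({"t": 0.8, "r": 0.2}),
]
LEXICON = ["cat", "cot", "car"]

if __name__ == "__main__":
    print(recognize_lexicon_driven(FRAMES, LEXICON))  # cat
    print(recognize_postprocess(FRAMES, LEXICON))
```

In this toy input the per-column winners spell "coat", which is not in the lexicon, so the
baseline has to repair the string after the fact. The lexicon-driven search never forms an
invalid string; it compares the valid words directly against the image evidence.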
SRI has also developed an OCR technique that applies contextual constraints by using word-level
language models (whole words and relationships between sequences of words) rather than just using
lexicons to recognize words in isolation. This technique was integrated into an experimental system
that automatically extracts relevant information from printed documents. Such a system can be
used by an intelligence analyst to quickly gather information from diverse sources of printed
material. This system segments a scanned document page into multiple text regions. It then determines
their reading order, recognizes the characters and words in the text, and applies a
finite-state-automata-based information extraction process to populate data templates of specific
targeted topics of interest.
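The last stage of such a pipeline, finite-state extraction into a data template, can be sketched
in Python as follows. This is a minimal illustration under stated assumptions, not the system
described above: the token classes, the transition table, the "visit" template with person,
place, and date fields, and the sample text are all hypothetical.

```python
import re

DATE = re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")
VERBS = {"visited", "toured"}  # hypothetical trigger verbs for the topic


def classify(token):
    """Map a recognized word to a coarse token class."""
    if DATE.match(token):
        return "DATE"
    if token.lower() in VERBS:
        return "VERB"
    if token.lower() == "on":
        return "ON"
    if token[:1].isupper():
        return "NAME"
    return "OTHER"


# (state, token class) -> (next state, template field to fill, or None)
TRANSITIONS = {
    ("START", "NAME"): ("HAVE_PERSON", "person"),
    ("HAVE_PERSON", "VERB"): ("HAVE_VERB", None),
    ("HAVE_VERB", "NAME"): ("HAVE_PLACE", "place"),
    ("HAVE_PLACE", "ON"): ("HAVE_ON", None),
    ("HAVE_ON", "DATE"): ("START", "date"),  # accepting transition
}


def extract(text):
    """Run the automaton over the text and return every completed template."""
    templates, state, current = [], "START", {}
    for token in re.findall(r"[\w/]+", text):
        cls = classify(token)
        if (state, cls) not in TRANSITIONS:
            # Pattern broken: discard the partial template and retry from START.
            state, current = "START", {}
            if (state, cls) not in TRANSITIONS:
                continue
        state, field = TRANSITIONS[(state, cls)]
        if field:
            current[field] = token
        if field == "date":  # template is complete
            templates.append(current)
            current = {}
    return templates


TEXT = (
    "Smith visited Paris on 3/14/1994. The weather was mild. "
    "Jones toured Cairo on 10/2/1994."
)

if __name__ == "__main__":
    for template in extract(TEXT):
        print(template)
```

A table-driven automaton keeps each topic's pattern in data rather than in control flow, so a
new topic of interest would need a new transition table and template, not a new matching loop.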
References
[1] G.K. Myers and C.-H. Chen, Lexicon-based word recognition without word segmentation, in
Third Annual Symposium on Document Analysis and Information Retrieval, April 1993, 117-188.
[2] C.-H. Chen, Structuring a large lexicon for word recognition, presented at the Fourth
Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, Nevada, 24-26 April 1995.
[3] C.-H. Chen, Lexicon-driven word recognition, in Proc. Third International Conference on
Document Analysis and Recognition, Montreal, Canada, 14-16 August 1995.
[4] G.K. Myers and P.G. Mulgaonkar, Automatic extraction of information from printed
documents, presented at the Fourth Annual Symposium on Document Analysis and Information
Retrieval, Las Vegas, Nevada, 24-26 April 1995.