On the reading of tables of contents


Sarkar, P.; Saund, E. On the reading of tables of contents. Eighth IAPR International Workshop on Document Analysis Systems; 2008 September 16-19; Nara, Japan.


This paper presents a framework for understanding tables of contents (TOCs) of books, journals, and magazines. We propose a universal logical structure representation in terms of a hierarchy of entries, each of which may contain a descriptor and a locator. We enumerate graphical and perceptual cues that guide the parsing of tables of contents in terms of this formalism. We make initial suggestions about the form of evaluation metrics for comparing ground-truthed tables of contents with the output of recognition algorithms. Typical and atypical tables of contents are used throughout to illustrate significant phenomena that any general TOC interpretation scheme must handle in a principled way. Finally, we discuss the implications of our observations for the design of recognition algorithms.
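As an illustration only, the hierarchy of entries described above could be sketched as a small recursive data structure. The names `Entry`, `descriptor`, `locator`, `children`, and `walk` are assumptions chosen for this sketch, and the sample entries are invented; none of this is taken from the paper's own notation or data.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical TOC entry: optional descriptor, optional locator, sub-entries."""
    descriptor: Optional[str] = None  # e.g. a chapter or article title
    locator: Optional[str] = None     # e.g. a page number, kept as text ("vii", "25")
    children: List["Entry"] = field(default_factory=list)


def walk(entry: Entry, depth: int = 0) -> Iterator[Tuple[int, Entry]]:
    """Yield (depth, entry) pairs in reading order (pre-order traversal)."""
    yield depth, entry
    for child in entry.children:
        yield from walk(child, depth + 1)


# Invented example: a root entry with no descriptor or locator of its own,
# and a part heading that has a descriptor but no locator.
toc = Entry(children=[
    Entry("Preface", "vii"),
    Entry("Part I", None, [
        Entry("Chapter 1", "1"),
        Entry("Chapter 2", "25"),
    ]),
])

rows = [(depth, e.descriptor, e.locator) for depth, e in walk(toc)]
for depth, descriptor, locator in rows:
    print("  " * depth + f"{descriptor or '(no descriptor)'} -> {locator or '(no locator)'}")
```

Making both fields optional reflects the statement that an entry "may contain" a descriptor and a locator. How the paper itself encodes missing fields is not specified here.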
