Image based typographic analysis of documents

Title: Image based typographic analysis of documents
Publication Type: Conference Papers
Year of Publication: 1993
Authors: Doermann D, Furuta R
Conference Name: Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Date Published: 1993/10
Keywords: 2D, analysis;, attributes;, based, character, commands;, component, data, description, document, DVI, extraction;, feature, figure, file;, formatting, hierarchical, image, language;, languages;, layout;, line, margins;, page, placement;, processing;, read-order;, relationships;, representation;, spacing;, spatial, structures;, syntax;, synthesis;, typographic, understanding;

An approach to image-based typographic analysis of documents is presented. The problem requires a spatial understanding of the document layout as well as knowledge of the proper syntax. The system performs a page synthesis from the stream of formatting commands defined in a DVI file. Since the two-dimensional relationships between document components are not explicit in the page language, the authors develop a representation which preserves the two-dimensional layout, the read-order and the attributes of document components. From this hierarchical representation of the page layout, the authors extract and analyze relevant typographic features such as margins, line and character spacing, and figure placement.
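
The abstract does not give the authors' data structures, so the following is only an illustrative sketch of the general idea: a page representation that keeps each component's two-dimensional position and read-order, from which margins and line spacing can be measured. The names `Box`, `Line`, `Page`, `margins` and `line_spacings` are hypothetical, and the page is assumed to use point units with the origin at the top-left corner.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Box:
    """Axis-aligned bounding box (assumed units: points, origin top-left)."""
    x: float
    y: float
    w: float
    h: float


@dataclass
class Line:
    """A text line with its position and read-order index (hypothetical type)."""
    box: Box
    order: int


@dataclass
class Page:
    """Top of a simple hierarchy: a page that owns its lines."""
    width: float
    height: float
    lines: List[Line] = field(default_factory=list)


def margins(page: Page) -> Tuple[float, float, float, float]:
    """Return (left, right, top, bottom) margins from the extent of all lines."""
    if not page.lines:
        raise ValueError("page has no lines")
    left = min(l.box.x for l in page.lines)
    right = page.width - max(l.box.x + l.box.w for l in page.lines)
    top = min(l.box.y for l in page.lines)
    bottom = page.height - max(l.box.y + l.box.h for l in page.lines)
    return left, right, top, bottom


def line_spacings(page: Page) -> List[float]:
    """Vertical distance between the tops of consecutive lines in read-order."""
    ordered = sorted(page.lines, key=lambda l: l.order)
    return [b.box.y - a.box.y for a, b in zip(ordered, ordered[1:])]


# Made-up example: three lines on a 612 x 792 page.
page = Page(612.0, 792.0, [
    Line(Box(72.0, 72.0, 468.0, 10.0), 0),
    Line(Box(72.0, 86.0, 468.0, 10.0), 1),
    Line(Box(72.0, 100.0, 468.0, 10.0), 2),
])
page_margins = margins(page)
spacings = line_spacings(page)
```

A fuller version of this sketch would add further levels (blocks, characters, figures) to the hierarchy so that character spacing and figure placement could be measured in the same way.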