Document Image Segmentation and Layout Analysis (Special Issue on Document Analysis and Recognition)
Abstract
A system for segmenting document images and ordering text areas is described and applied to complex printed page layouts in both Japanese and English. The segmentation technique makes no assumptions about the shape of blocks, so it can handle skewed images without skew correction as well as documents whose columns are not rectangular. The technique follows a bottom-up strategy: connected components are extracted from a reduced image and classified according to their local information. The connected components classified as characters are then merged into lines, and the lines are merged into areas. Extracted text areas are classified as body, caption, header, or footer. A tree graph of the layout of the body text is built, and the text areas are ordered by preorder traversal of the graph. We introduce the concept of an influence range for each node and a procedure for handling titles, and obtain good results on various documents. The overall system is fast and compact.
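The bottom-up merging and the preorder ordering mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the `Box` and `Node` types, the `max_gap` threshold, and the greedy merge rule are hypothetical. The abstract does not say how the layout tree is built, so the sketch takes the tree as given, and it does not model component classification, influence ranges, or title handling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Box:
    """Axis-aligned bounding box of a connected component (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def merge_into_lines(chars: list[Box], max_gap: int = 3) -> list[Box]:
    """Greedily merge character boxes into horizontal line boxes.

    Assumed rule: a character joins an existing line when the two overlap
    vertically and the horizontal gap to the line's right edge is at most
    max_gap. Otherwise it starts a new line.
    """
    lines: list[Box] = []
    for c in sorted(chars, key=lambda b: b.x0):  # sweep left to right
        for i, line in enumerate(lines):
            overlaps_vertically = c.y0 < line.y1 and line.y0 < c.y1
            if overlaps_vertically and c.x0 - line.x1 <= max_gap:
                lines[i] = union(line, c)
                break
        else:
            lines.append(c)  # no suitable line found
    return lines


@dataclass
class Node:
    """Node of a layout tree; each node stands for one body text area."""
    label: str
    children: list[Node] = field(default_factory=list)


def preorder(node: Node) -> list[str]:
    """Visit the node first, then each subtree from left to right."""
    order = [node.label]
    for child in node.children:
        order.extend(preorder(child))
    return order
```

Given a tree whose root is a title area with two column areas as children, `preorder` returns the title first and then each column with its descendants, which is the sense in which a preorder traversal yields a reading order.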
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 1994-07-25
Authors
- Tachikawa Michiyoshi (Ricoh Information and Communication R&D Center)
- Saitoh Takashi (Ricoh Information and Communication R&D Center)
- Yamaai Toshifumi (Ricoh Information and Communication R&D Center)
Related papers
- A Handwritten Character Recognition System by Efficient Combination of Multiple Classifiers (Special Issue on Character Recognition and Document Understanding)
- Document Image Segmentation and Layout Analysis (Special Issue on Document Analysis and Recognition)