An Introduction to Research on Document Understanding and Character Recognition : Hitachi's Case
Abstract
Hitachi has researched document understanding and character recognition for over thirty years. This paper introduces our OCR (Optical Character Recognition) products and research work. Our products fall into two types. The first is conventional hardware OCR, comprising form OCR systems and mail sorting machines. The second is a new type of OCR product, comprising pen device products and camera-based OCR, intended to open up new opportunities for the use of OCR. In this paper, we present our research work from the point of view of these products. First, layout analysis is explained using form OCR as the example. Because form OCR is in general use, form layout analysis must be robust to low-quality images. Second, character segmentation and linguistic processing are explained using postal address recognition as the example. To resolve ambiguity in character segmentation and recognition within an address line, we carry out segmentation, classification, and linguistic interpretation in an integrated manner rather than as separate sequential steps. Third, character classification is explained as a technology common to all our OCR products. The classification work is based on directional feature extraction and statistical discriminant models. Finally, we introduce unconventional OCR technologies, namely the digital pen product and camera-based OCR. The digital pen can link electronic data to paper documents. For camera-based OCR, we have developed a small-scale Kanji character recognition engine.
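As an illustration of what directional feature extraction can look like, the following is a minimal sketch, not the method described in the paper. It assumes a grayscale character image given as a list of rows, central-difference gradients, eight quantized directions, and a 2x2 zoning grid; the function name `directional_features` and all parameters are hypothetical choices made for this example. A feature vector of this kind would then be passed to a statistical classifier.

```python
import math

def directional_features(image, n_dirs=8, grid=2):
    """Zone-wise histogram of gradient directions (illustrative sketch).

    image: list of equal-length rows of numbers (grayscale intensities).
    Returns a list of grid * grid * n_dirs values that sum to 1
    (or all zeros if the image has no gradient).
    """
    h, w = len(image), len(image[0])
    feats = [0.0] * (grid * grid * n_dirs)
    bin_width = 2 * math.pi / n_dirs
    # Skip the one-pixel border so central differences stay in bounds.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Quantize the gradient angle to the nearest of n_dirs directions.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            d = int(angle / bin_width + 0.5) % n_dirs
            # Assign the pixel to a zone of the grid x grid layout.
            zy = min(y * grid // h, grid - 1)
            zx = min(x * grid // w, grid - 1)
            feats[(zy * grid + zx) * n_dirs + d] += mag
    total = sum(feats)
    return [f / total for f in feats] if total > 0 else feats

# Usage: an 8x8 image with a vertical edge (dark left half, bright right half).
edge = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
features = directional_features(edge)
```

For this vertical edge, every nonzero gradient points in the positive x direction, so all of the weight falls into direction bin 0 of each zone.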
- 2006-11-16