Recognition-based Segmentation for Digitization of Korean Historical Document Pages (Character and document processing)
Abstract
We present a recognition-based digitization method for building a digital library from a large collection of historical archives. Digitizing historical document pages is essential for providing retrieval services and protecting the originals from damage, but accurate output requires laborious manual verification. In this paper, a split-merge approach is applied to segment overlapping and touching characters written with thick brushes. Character string images are split into primitive segments along nonlinear segmentation paths that pass through maximum-curvature points. The split segments are then merged within a single probabilistic framework that integrates layout analysis, context information, and recognition results. In our experiment, the system achieved a character recognition rate of 96.4% on the test data set, despite the obsolete characters and unique variants used in the archives. We conclude that the method can be applied to digitizing Korean historical document pages and can minimize manual verification.
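The abstract does not give the details of the probabilistic model, so the following is only a minimal sketch of the general split-merge idea, not the authors' implementation. It assumes the primitive segments are already ordered along the text line and that a hypothetical function `log_score(i, j)` returns a combined log-probability (layout plus recognition, in this toy version) that segments `i` to `j-1` form one character. Dynamic programming then picks the grouping with the highest total score. All names, widths, and confidence values here are illustrative assumptions.

```python
import math
from typing import Callable, List, Tuple


def best_merge(n_segments: int,
               log_score: Callable[[int, int], float],
               max_merge: int = 4) -> Tuple[float, List[Tuple[int, int]]]:
    """Group primitive segments 0..n-1 into characters.

    log_score(i, j) is a hypothetical combined log-probability that the
    half-open range of segments [i, j) forms a single character.
    Returns the best total score and the chosen (start, end) groups.
    """
    best = [-math.inf] * (n_segments + 1)  # best[j]: best score for segments [0, j)
    back = [0] * (n_segments + 1)          # back[j]: start of the last group ending at j
    best[0] = 0.0
    for j in range(1, n_segments + 1):
        for i in range(max(0, j - max_merge), j):
            cand = best[i] + log_score(i, j)
            if cand > best[j]:
                best[j] = cand
                back[j] = i
    # Trace back the chosen groups from the end of the line.
    groups: List[Tuple[int, int]] = []
    j = n_segments
    while j > 0:
        i = back[j]
        groups.append((i, j))
        j = i
    groups.reverse()
    return best[n_segments], groups


# Toy data (assumed, not from the paper): widths of five primitive segments
# and recognizer confidences for a few candidate merges.
widths = [10, 12, 20, 9, 11]
toy_recog = {(0, 2): 0.9, (2, 3): 0.8, (3, 5): 0.85}
EXPECTED_WIDTH = 21.0  # assumed typical character width from layout analysis
SIGMA = 5.0            # assumed spread of character widths


def toy_log_score(i: int, j: int) -> float:
    """Sum of a Gaussian-style layout term and a recognition term."""
    w = sum(widths[i:j])
    layout = -((w - EXPECTED_WIDTH) ** 2) / (2 * SIGMA ** 2)
    recog = math.log(toy_recog.get((i, j), 0.05))  # low default confidence
    return layout + recog


score, groups = best_merge(len(widths), toy_log_score)
print(groups)
```

With these toy values the search merges segments 0 and 1 into one character, keeps segment 2 on its own, and merges segments 3 and 4. A context term (for example, a character bigram model) could be added to the score, but that would require tracking the recognized label of each group and is omitted here.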
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2006-11-17