Robust n-Gram Model of Japanese Character and Its Application to Document Recognition (Special Issue on Character Recognition and Document Understanding)
Abstract
A new postprocessing method for Japanese document recognition, based on an interpolated character n-gram model, is proposed. Compared with conventional approaches, the method has the advantage of enabling high-speed, knowledge-free processing. When estimating the parameters of an n-gram model over a large vocabulary, it is difficult to obtain sufficient training samples. To overcome this sparseness of samples, two smoothing methods for a Japanese character trigram model are evaluated, and the deleted interpolation method is shown to be superior in terms of perplexity. A document recognition system based on the trigram model is constructed; it finds the maximum-likelihood solution with the Viterbi algorithm. Experiments on three kinds of documents show that performance is high when deleted interpolation is used for smoothing: 90% of OCR errors are corrected for documents similar to the training text, and 75% are corrected for documents less similar to it.
- 1996-05-25
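The abstract combines three ingredients: a character trigram model smoothed by linear interpolation, perplexity as the evaluation measure, and Viterbi search over OCR candidates. The Python sketch below illustrates how these pieces could fit together; it is not the authors' implementation. Everything beyond the abstract is an assumption: the names `InterpolatedTrigram`, `fit_lambdas`, `perplexity` and `decode`, the start-of-text padding symbol, add-one smoothing of the unigram term, and scoring candidates by the language model alone (a real OCR postprocessor would presumably also weight each candidate by its recognition score). The interpolation weights are re-estimated by EM on a single held-out text, a simplified form of deleted interpolation, which as usually described rotates the held-out portion across partitions of the training data.

```python
import math
from collections import Counter

BOS = "\u0002"  # hypothetical start-of-text padding symbol (assumption)


class InterpolatedTrigram:
    """Character trigram model: P(c|a,b) = l1*P1(c) + l2*P2(c|b) + l3*P3(c|a,b)."""

    def __init__(self, text, lambdas=(0.1, 0.3, 0.6)):
        self.lambdas = tuple(lambdas)  # hypothetical initial weights, sum to 1
        self.uni, self.bi, self.bi_ctx = Counter(), Counter(), Counter()
        self.tri, self.tri_ctx = Counter(), Counter()
        for a, b, c in self._events(text):
            self.uni[c] += 1
            self.bi[b, c] += 1
            self.bi_ctx[b] += 1
            self.tri[a, b, c] += 1
            self.tri_ctx[a, b] += 1
        self.total = sum(self.uni.values())
        self.vocab = len(self.uni) + 1  # +1 so unseen characters get nonzero probability

    @staticmethod
    def _events(text):
        """Yield (a, b, c): each character c with its two preceding characters."""
        padded = BOS * 2 + text
        for i in range(2, len(padded)):
            yield padded[i - 2], padded[i - 1], padded[i]

    def components(self, a, b, c):
        """Unigram (add-one smoothed), bigram and trigram relative frequencies."""
        p1 = (self.uni[c] + 1) / (self.total + self.vocab)
        p2 = self.bi[b, c] / self.bi_ctx[b] if self.bi_ctx[b] else 0.0
        p3 = self.tri[a, b, c] / self.tri_ctx[a, b] if self.tri_ctx[a, b] else 0.0
        return p1, p2, p3

    def prob(self, a, b, c):
        return sum(l * p for l, p in zip(self.lambdas, self.components(a, b, c)))

    def fit_lambdas(self, heldout, iters=20):
        """EM re-estimation of the interpolation weights on held-out text."""
        comps = [self.components(*e) for e in self._events(heldout)]
        lam = list(self.lambdas)
        for _ in range(iters):
            expected = [0.0, 0.0, 0.0]
            for ps in comps:
                weighted = [l * p for l, p in zip(lam, ps)]
                z = sum(weighted)  # > 0 because the unigram term is never zero
                for k in range(3):
                    expected[k] += weighted[k] / z
            lam = [e / len(comps) for e in expected]
        self.lambdas = tuple(lam)

    def perplexity(self, text):
        """2 ** (average negative log2 probability per character)."""
        logp = sum(math.log2(self.prob(*e)) for e in self._events(text))
        return 2 ** (-logp / len(text))

    def decode(self, lattice):
        """Viterbi search: pick one candidate per position maximising log P(text)."""
        # State = last two characters; value = (best log prob, best path so far).
        beams = {(BOS, BOS): (0.0, "")}
        for candidates in lattice:
            nxt = {}
            for (a, b), (score, path) in beams.items():
                for c in candidates:
                    s = score + math.log(self.prob(a, b, c))
                    if (b, c) not in nxt or s > nxt[b, c][0]:
                        nxt[b, c] = (s, path + c)
            beams = nxt
        return max(beams.values())[1]


if __name__ == "__main__":
    model = InterpolatedTrigram("東北大学工学部。東北大学理学部。")
    model.fit_lambdas("東北大学工学研究科。")  # held-out text, separate from training text
    # Hypothetical OCR candidate lists per position; 束/東 and エ/工 are look-alike pairs.
    lattice = [["束", "東"], ["北"], ["大", "太"], ["学"], ["エ", "工"], ["学"], ["部"]]
    print(model.decode(lattice))
```

Because the state is the last two characters, the search keeps only one best path per character pair, which is what makes trigram decoding linear in the document length. Candidates never seen in the training text, such as 束 or エ here, receive only the small unigram share of probability, so the decoder favours the sequence the training text supports.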
Authors
-
ASO Hirotomo
Graduate School of Engineering, Tohoku University
-
Makino Shozo
Computer Center, Tohoku University
-
Mori Hiroki
Graduate School of Engineering, Utsunomiya University
-
Mori Hiroki
Graduate School of Engineering, Hiroshima University
-
Mori Hiroki
Graduate School of Engineering, Tohoku University
Related papers
- An Approximation Method of the Quadratic Discriminant Function and Its Application to Estimation of High-Dimensional Distribution(Image Recognition and Understanding)
- Molecular Modification of 2,7-Diphenyl[1]benzothieno[3,2-b]benzothiophene (DPh-BTBT) with Diarylamino Substituents : From Crystalline Order to Amorphous State in Evaporated Thin Films
- A Discriminant Function Based on Feature Transformation Considering Normality Improvement of Distribution
- Segmenting Shape Using Deformation Information
- Analysis of vowel formant frequency variations between focus and neutral speech in Mandarin Chinese
- Robust n-Gram Model of Japanese Character and Its Application to Document Recognition (Special Issue on Character Recognition and Document Understanding)
- Decorative Character Recognition by Graph Matching(Image Recognition, Computer Vision)
- Simple Oligothiophene-Based Dyes for Dye-Sensitized Solar Cells (DSSCs) : Anchoring Group Effects on Molecular Properties and Solar Cell Performance
- Screen Pattern Removal for Character Pattern Extraction from High-Resolution Color Document Images(Image Recognition, Computer Vision)
- An analysis of switching pause duration as a paralinguistic feature in expressive dialogues
- Extraction of Structure of Silhouette Images by Weighted Minimum Common Supergraph (International Session 2)
- Dynamic aspects of aizuchi and its influence on the naturalness of dialogues