Probabilistic Automaton-Based Fuzzy English-Text Retrieval (Software Systems)
Abstract
Recognition errors made by optical character readers (OCR) are a serious problem when searching OCR-scanned documents in databases such as digital libraries. To avoid the cost of correcting these errors manually, this paper proposes fuzzy retrieval methods for English text that contains recognition errors. The proposed methods generate multiple search terms for each input query term using probabilistic automata that reflect both error-occurrence probabilities and character-connection probabilities. Experimental results of test-set retrieval indicate that, with 20 expanded search terms, one of the proposed methods improves the recall rate from 95.96% to 98.15% at the cost of a decrease in precision from 100.0% to 96.01%.
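The query-expansion idea in the abstract can be illustrated with a small sketch. This is not the paper's automaton construction: it is a simplified beam search that scores each candidate spelling by multiplying a per-character error-occurrence probability with a character-connection (bigram) probability. The tables `CONFUSION` and `BIGRAM`, the value `DEFAULT_BIGRAM`, the start marker `"^"`, and the function `expand_term` are all hypothetical names with made-up values chosen for illustration, and only one-to-one character substitutions are modeled.

```python
import heapq
import math

# Hypothetical OCR confusion table: P(recognized char | true char).
# Values are illustrative assumptions, not figures from the paper.
CONFUSION = {
    "l": {"l": 0.90, "1": 0.06, "i": 0.04},
    "i": {"i": 0.92, "l": 0.05, "1": 0.03},
    "e": {"e": 0.95, "c": 0.05},
    "o": {"o": 0.94, "0": 0.06},
}

# Hypothetical character-connection probabilities P(cur | prev).
# Pairs that are not listed fall back to DEFAULT_BIGRAM.
BIGRAM = {
    ("f", "i"): 0.25,
    ("i", "l"): 0.20,
    ("l", "e"): 0.30,
    ("l", "c"): 0.02,
}
DEFAULT_BIGRAM = 0.10


def expand_term(term, k=5, beam_width=50):
    """Return up to k (variant, probability) pairs for a query term.

    Each variant is a spelling the OCR output might contain for `term`,
    ranked by the product of error and connection probabilities.
    """
    beam = [(0.0, "")]  # (log-probability, partial variant)
    for ch in term:
        candidates = []
        # Characters without a confusion entry are assumed error-free.
        outputs = CONFUSION.get(ch, {ch: 1.0})
        for logp, prefix in beam:
            prev = prefix[-1] if prefix else "^"  # "^" marks the word start
            for out, p_err in outputs.items():
                p_conn = BIGRAM.get((prev, out), DEFAULT_BIGRAM)
                score = logp + math.log(p_err) + math.log(p_conn)
                candidates.append((score, prefix + out))
        # Keep only the most probable partial variants.
        beam = heapq.nlargest(beam_width, candidates)
    top = heapq.nlargest(k, beam)
    return [(variant, math.exp(logp)) for logp, variant in top]


if __name__ == "__main__":
    for variant, prob in expand_term("file", k=5):
        print(variant, prob)
```

Each returned variant would then be issued as an additional search term alongside the original query term. Raising `k` should retrieve more error-containing occurrences while admitting more false matches, which mirrors the recall and precision trade-off reported in the abstract.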
- Paper of The Institute of Electronics, Information and Communication Engineers (IEICE)
- 2003-09-01
Authors
-
Ohta Manabu
Graduate School Of Engineering Tokyo Metropolitan University
-
Takasu Atsuhiro
National Institute of Informatics
-
Takasu Atsuhiro
National Center For Science Information Systems
-
Adachi Jun
National Center For Science And Information Systems
Related papers
- Probabilistic Automaton-Based Fuzzy English-Text Retrieval (Software Systems)
- Load Balancing Scheme on the Basis of Huffman Coding for P2P Information Retrieval
- Special Section on Information Processing Technology for Web Utilization
- A Minimum Path Decomposition of the Hasse Diagram for Testing the Consistency of Functional Dependencies
- Margin-Based Pivot Selection for Similarity Search Indexes
- The 2nd International Conference on Japanese Information in Science, Technology, and Commerce
- Optimal Pivot Selection Method Based on the Partition and the Pruning Effect for Metric Space Indexes
- Comparison of Electrochemical Impedance Spectroscopy between Illumination and Dark Conditions