Class-Based N-Gram Language Model for New Words Using Out-of-Vocabulary to In-Vocabulary Similarity
スポンサーリンク
概要
- 論文の詳細を見る
Out-of-vocabulary (OOV) words create serious problems for automatic speech recognition (ASR) systems. Not only are they miss-recognized as in-vocabulary (IV) words with similar phonetics, but the error also causes further errors in nearby words. Language models (LMs) for most open vocabulary ASR systems treat OOV words as a single entity, ignoring the linguistic information. In this paper we present a class-based n-gram LM that is able to deal with OOV words by treating each of them individually without retraining all the LM parameters. OOV words are assigned to IV classes consisting of similar semantic meanings for IV words. The World Wide Web is used to acquire additional data for finding the relation between the OOV and IV words. An evaluation based on adjusted perplexity and word-error-rate was carried out on the Wall Street Journal corpus. The result suggests the preference of the use of multiple classes for OOV words, instead of one unknown class.
著者
-
Tsuchiya Masatoshi
Information And Media Center Toyohashi University Of Technology
-
Nakagawa Seiichi
Department of Computer Science and Engineering, Toyohashi University of Technology
-
NAPTALI Welly
Department of Computer Science and Engineering, Toyohashi University of Technology
関連論文
- Topic dependent language model based on on-line voting (言語理解とコミュニケーション)
- A transitive translation for Indonesian-Japanese CLQA (自然言語処理)
- A Machine Learning Approach for an Indonesian-English Cross Language Question Answering System(Natural Language Processing)
- Indonesian-Japanese Transitive Translation using English for CLIR
- Topic dependent language model based on on-line voting (音声)
- Topic dependent language model based on clustering of noun word history
- Word and class dependency of N-gram language model (音声言語情報処理)
- Word and class dependency of N-gram language model (言語理解とコミュニケーション・第9回音声言語シンポジウム)
- Word and class dependency of N-gram language model (音声・第9回音声言語シンポジウム)
- LVCSR based on context-dependent syllable acoustic models (Speech) -- (国際ワークショップ"Asian workshop on speech science and technology")
- Robust distant speech recognition by combining variable-term spectrum based position-dependent CMN with conventional CMN (Speech) -- (国際ワークショップ"Asian workshop on speech science and technology")
- Improving Keyword Recognition of Spoken Queries by Combining Multiple Speech Recognizer's Outputs for Speech-driven WEB Retrieval Task(Spoken Language Systems, Corpus-Based Speech Technologies)
- An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems(Spoken Language Systems, Corpus-Based Speech Technologies)
- Speaker Change Detection and Speaker Clustering Using VQ Distortion Measure
- Succeeding Word Prediction for Speech Recognition Based on Stochastic Language Model
- Relationship among Recognition Rate, Rejection Rate and False Alarm Rate in a Spoken Word Recognition System
- Distant Speech Recognition Using a Microphone Array Network
- Auditory perception versus automatic estimation of location and orientation of an acoustic source in a real environment
- Continuous Speech Recognition Using an On-Line Speaker Adaptation Method Based on Automatic Speaker Clustering (Special Issue on Speech Information Processing)
- A Spoken Dialog System for Spontaneous Conversations Considering Response Timing and Response Type
- Indonesian-Japanese Transitive Translation using English for CLIR
- Class-Based N-Gram Language Model for New Words Using Out-of-Vocabulary to In-Vocabulary Similarity