Selected Topics from LVCSR Research for Asian Languages at Tokyo Tech
Abstract
This paper presents our recent work on building Large Vocabulary Continuous Speech Recognition (LVCSR) systems for Thai, Indonesian, and Chinese. Because written Thai has no word boundaries, we have proposed a new method for automatically creating word-like units from a text corpus, and have applied topic and speaking-style adaptation to the language model in order to recognize spoken-style utterances. For Indonesian, we have applied proper-noun-specific adaptation to acoustic modeling, together with rule-based English-to-Indonesian phoneme mapping, to address the large variation in the pronunciation of proper nouns and English words in a spoken-query information retrieval system. In spoken Chinese, long organization names are frequently abbreviated, and abbreviated utterances cannot be recognized unless the abbreviations are included in the dictionary. We have proposed a new method for automatically generating Chinese abbreviations; by expanding the vocabulary with the generated abbreviations, we have significantly improved the performance of spoken query-based search.
- 2012-05-01
Related papers
- Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation (Speech and Hearing)
- Recent Progress in Corpus-Based Spontaneous Speech Recognition (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- THE USE OF FINITE-STATE TRANSDUCERS FOR MODELING PHONOLOGICAL AND MORPHOLOGICAL CONSTRAINTS IN AUTOMATIC SPEECH RECOGNITION
- Adaptation to Pronunciation Variations in Indonesian Spoken Query-Based Information Retrieval
- Committee-Based Active Learning for Speech Recognition
- Robust Gait-Based Person Identification against Walking Speed Variations
- Selected Topics from LVCSR Research for Asian Languages at Tokyo Tech
- Active Learning Using Phone-Error Distribution for Speech Modeling
- Distance-based Factor Graph Linearization and Sampled Max-sum Algorithm for Efficient 3D Potential Decoding of Macromolecules