Development of a Mandarin-English Bilingual Speech Recognition System for Real World Music Retrieval
Abstract
In recent decades, there has been a great deal of research into bilingual speech recognition, that is, developing a recognizer that can handle inter- and intra-sentential language switching between two languages. This paper presents our recent work on a grammar-constrained Mandarin-English bilingual Speech Recognition System (MESRS) for real-world music retrieval. It tackles two of the main difficulties in deploying bilingual speech recognition systems in real-world applications. One is to balance the performance and the complexity of the system; the other is to deal effectively with matrix-language accents in the embedded language. To handle intra-sentential language switching and to reduce the amount of data required to estimate the statistical models robustly, a single compact set of bilingual acoustic models, derived by phone set merging and clustering, is developed instead of using a separate monolingual model set for each language. We present a novel Two-pass phone clustering method based on Confusion Matrix (TCM) and compare it with the log-likelihood measure method. Experiments show that TCM achieves better performance. Since the native language of potential system users is Mandarin, which is regarded as the matrix language in our application, their pronunciation of English, the embedded language, usually carries a Mandarin accent. To deal with matrix-language accents in the embedded language, different non-native adaptation approaches are investigated. Experiments show that the model retraining method outperforms other common adaptation methods such as Maximum A Posteriori (MAP) adaptation.
By combining these phone clustering and non-native adaptation approaches, the Phrase Error Rate (PER) of MESRS on English utterances was reduced by 24.47% relative to the baseline monolingual English system, while the PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. On bilingual utterances, the system achieved a 22.37% relative PER reduction.
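The results above are reported as relative PER reductions. As a minimal sketch of how such a figure is usually computed, assuming the common definition (baseline error minus system error, divided by baseline error), the arithmetic looks like this. The function name `relative_reduction` and the PER values are hypothetical and chosen only for illustration; the abstract does not give absolute PER values.

```python
def relative_reduction(baseline_per: float, system_per: float) -> float:
    """Return the relative error-rate reduction, in percent.

    Assumes the common definition:
        100 * (baseline - system) / baseline
    """
    if baseline_per <= 0:
        raise ValueError("baseline_per must be positive")
    return 100.0 * (baseline_per - system_per) / baseline_per


if __name__ == "__main__":
    # Hypothetical PER values in percent, NOT taken from the paper.
    baseline = 20.0
    improved = 15.0
    reduction = relative_reduction(baseline, improved)
    print(f"{reduction:.2f}% relative PER reduction")
```

Under this definition, a relative reduction says nothing about the absolute error rate: the same relative figure can come from very different baseline values.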
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-03-01
Authors
-
Yan Yonghong
ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing
-
Zhao Qingwei
ThinkIT Speech Lab., Institute of Acoustics, Chinese Academy of Sciences
-
Shao Jian
ThinkIT Speech Lab., Institute of Acoustics, Chinese Academy of Sciences
-
Zhang Qingqing
ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing
-
Pan Jielin
ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing
-
Lin Yang
ThinkIT Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing
Related papers
- Effects of single-channel speech enhancement algorithms on Mandarin speech intelligibility (Applied Acoustics)
- Approximate Decision Function and Optimization for GMM-UBM Based Speaker Verification
- Using a Kind of Novel Phonotactic Information for SVM Based Speaker Recognition
- Robust Speaker Clustering Using Affinity Propagation
- An LVCSR Based Reading Miscue Detection System Using Knowledge of Reference and Error Patterns
- Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech
- A One-Pass Real-Time Decoder Using Memory-Efficient State Network
- Automatic Singing Performance Evaluation for Untrained Singers
- Melody Track Selection Using Discriminative Language Model
- Automatic Language Identification with Discriminative Language Characterization Based on SVM
- A two-element-microphone-array-based speech recognition system in vehicle environment (Commemoration of the Japan-China Joint Conference on Acoustics 2007 (JCA2007))
- Speech Enhancement Using Improved Adaptive Null-Forming in Frequency Domain with Postfilter
- Effects of the Temporal Fine Structure in Different Frequency Bands on Mandarin Tone Perception
- Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition
- A Hybrid Speech Emotion Recognition System Based on Spectral and Prosodic Features
- Enhancing the Robustness of the Posterior-Based Confidence Measures Using Entropy Information for Speech Recognition
- Two-Microphone Noise Reduction Using Spatial Information-Based Spectral Amplitude Estimation
- A Bayesian logistic regression approach to spoken language identification