Dynamic Bayesian Network-Based Acoustic Models Incorporating Speaking Rate Effects(Speech and Hearing)
スポンサーリンク
概要
- 論文の詳細を見る
One of the most important issues in spontaneous speech recognition is how to cope with the degradation of recognition accuracy due to speaking rate fluctuation within an utterance. This paper proposes an acoustic model for adjusting mixture weights and transition probabilities of the HMM for each frame according to the local speaking rate. The proposed model is implemented along with variants and conventional models using the Bayesian network framework. The proposed model has a hidden variable representing variation of the "mode" of the speaking rate, and its value controls the parameters of the underlying HMM. Model training and maximum probability assignment of the variables are conducted using the EM/GEM and inference algorithms for the Bayesian networks. Utterances from meetings and lectures are used for evaluation where the Bayesian network-based acoustic models are used to rescore the likelihood of the N-best lists. In the experiments, the proposed model indicated consistently higher performance than conventional HMMs and regression HMMs using the same speaking rate information.
- 社団法人電子情報通信学会の論文
- 2004-10-01
著者
-
SHINOZAKI Takahiro
Department of Computer Science, Tokyo Institute of Technology
-
Furui Sadaoki
Department Of Computer Science Graduate School Of Information Science And Engineering Tokyo Institut
-
Shinozaki Takahiro
Department Of Computer Science Tokyo Institute Of Technology
-
Furui Sadaoki
Department Of Computer Science Tokyo Institute Of Technology
関連論文
- Gait-based Person Identification Robust against Speed Variation using CHLAC features and HMMs
- Gait-based Person Identification Robust against Speed Variation using CHLAC features and HMMs
- Gait-based Person Identification Robust against Speed Variation using CHLAC features and HMMs
- Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation(Speech and Hearing)
- Robust Scene Extraction Using Multi-Stream HMMs for Baseball Broadcast(Image Processing and Video Processing)
- Automatic recognition of Indonesian declarative questions and statements using polynomial coefficients of the pitch contours
- Initial evaluation of the drivers' Japanese speech corpus in a car environment (Speech) -- (国際ワークショップ"Asian workshop on speech science and technology")
- Accent analysis for Mandarin large vocabulary continuous speech recognition (Speech) -- (国際ワークショップ"Asian workshop on speech science and technology")
- Evaluation of a Noise-Robust Multi-Stream Speaker Verification Method Using F_0 Information
- Topic Extraction Based on Continuous Speech Recognition in Broadcast News Speech
- Cervical Plexus Block Helps in Diagnosis of Orofacial Pain Originating from Cervical Structures
- Noise Robust Speech Recognition Using F_0 Contour Information(Speech Dynamics by Ear, Eye, Mouth and Machine)
- Initial evaluation of the drivers' Japanese speech corpus in a car environment
- Phonology and Morphology Modeling in a Very Large Vocabulary Hungarian Dictation System(Speech and Hearing)
- 連続発話認識のための言語モデル
- Dynamic Bayesian Network-Based Acoustic Models Incorporating Speaking Rate Effects(Speech and Hearing)
- Neural-network-based HMM adaptation for noisy speech recognition
- Speaker Verification Using MMAP Adaptation (言語理解とコミュニケーション)
- Speaker Verification Using MMAP Adaptation (音声)
- Subject Adaptation and Adaptive Training for Gait-based Person Identification
- Subject Adaptation and Adaptive Training for Gait-based Person Identification
- Two-pass Approach for Recognizing Code-Switching Speech
- Two-pass Approach for Recognizing Code-Switching Speech
- Two-pass Approach for Recognizing Code-Switching Speech
- Subject Adaptation and Adaptive Training for Gait-based Person Identification
- Speaker Verification Using MMAP Adaptation
- Two-pass Approach for Recognizing Code-Switching Speech