Unsupervised Speaker Adaptation Using All-Phoneme Ergodic Hidden Markov Network
Abstract
This paper proposes an unsupervised speaker adaptation method using an "all-phoneme ergodic Hidden Markov Network" that combines allophonic (context-dependent phone) acoustic models with stochastic language constraints. A Hidden Markov Network (HMnet) for allophone modeling and allophonic bigram probabilities derived from a large text database are combined into a single large ergodic HMM that represents arbitrary speech signals in a particular language, so that the model parameters can be re-estimated from text-unknown speech samples with the Baum-Welch algorithm. Combined with the Vector Field Smoothing (VFS) technique, this enables effective unsupervised speaker adaptation. In experiments, the method outperformed our previous unsupervised adaptation method, which used conventional phonetic HMMs and phoneme bigram probabilities, especially when the amount of training data was small.
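The construction described above can be sketched in code. This is an illustrative toy example, not the paper's implementation: a few left-to-right allophone HMMs are concatenated into one ergodic HMM whose inter-model transitions are weighted by allophone bigram probabilities, so every row of the combined transition matrix remains a valid distribution and Baum-Welch re-estimation can run on text-unknown speech. The allophone inventory, state counts, and all probability values below are hypothetical.

```python
import numpy as np

# Hypothetical toy allophone inventory and model topology.
allophones = ["a", "i", "u"]
states_per_model = 2                       # states in each left-to-right allophone HMM
n = len(allophones) * states_per_model     # total states in the combined ergodic HMM

# Hypothetical bigram probabilities P(next allophone | current allophone);
# in the paper these come from a large text database. Each row sums to 1.
bigram = np.array([
    [0.2, 0.5, 0.3],
    [0.4, 0.1, 0.5],
    [0.3, 0.6, 0.1],
])

A = np.zeros((n, n))
for m in range(len(allophones)):
    first = m * states_per_model
    last = first + states_per_model - 1
    # Within-model left-to-right transitions: stay or advance, 0.5 each.
    for s in range(first, last):
        A[s, s] = 0.5
        A[s, s + 1] = 0.5
    # The exit state of model m connects ergodically to the entry state of
    # every model m2, with its 0.5 exit mass split by the bigram weights.
    A[last, last] = 0.5
    for m2 in range(len(allophones)):
        A[last, m2 * states_per_model] = 0.5 * bigram[m, m2]

# Every row is a proper probability distribution, so standard Baum-Welch
# re-estimation applies to the combined model as a single ergodic HMM.
assert np.allclose(A.sum(axis=1), 1.0)
print(A.shape)
```

The key point is the last loop: the bigram language constraint lives entirely in the transitions between allophone sub-models, while each sub-model keeps its own acoustic parameters, which is what lets the single ergodic HMM absorb arbitrary (text-unknown) speech.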
- Published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 1995-08-25
Authors
-
Sagayama Shigeki
NTT Human Interface Laboratories
-
Matsunaga Shoichi
ATR Interpreting Telecommunications Research Laboratories
-
Takami Jun-ichi
ATR Interpreting Telecommunications Research Laboratories
-
Miyazawa Yasunaga
ATR Interpreting Telecommunications Research Laboratories
Related Papers
- Spoken Sentence Recognition Based on HMM-LR with Hybrid Language Modeling (Special Issue on Natural Language Processing and Understanding)
- LR Parsing with a Category Reachability Test Applied to Speech Recognition (Special Issue on Speech and Discourse Processing in Dialogue Systems)
- A pairwise discriminant approach using artificial neural networks for continuous speech recognition
- Speaker-Consistent Parsing for Speaker-Independent Continuous Speech Recognition
- Automatic Determination of the Number of Mixture Components for Continuous HMMs Based on a Uniform Variance Criterion
- Unsupervised Speaker Adaptation Using All-Phoneme Ergodic Hidden Markov Network
- Speech Recognition Using Function-Word N-Grams and Content-Word N-Grams
- Discriminative Training Based on Minimum Classification Error for a Small Amount of Data Enhanced by Vector-Field-Smoothed Bayesian Learning