A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System
Abstract
In a hidden Markov model (HMM), state duration probabilities decrease exponentially with time, which fails to adequately represent the temporal structure of speech. One of the solutions to this problem is integrating state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM). However, though a number of attempts to use HSMMs in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding. By avoiding these approximations using a generalized forward-backward algorithm, a context-dependent duration modeling technique and weighted finite-state transducers (WFSTs), we construct a fully consistent HSMM-based speech recognition system. In a speaker-dependent continuous speech recognition experiment, our system achieved about 9.1% relative error reduction over the corresponding HMM-based system.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-11-01
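The abstract's opening point, that an HMM's implicit state duration probabilities decay exponentially, can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the self-loop probability of 0.8, and the truncated discretized Gaussian standing in for an explicit HSMM duration distribution are all assumptions chosen for illustration.

```python
import math


def hmm_duration_pmf(self_loop_prob, max_duration):
    """Implicit duration distribution of a single HMM state.

    Staying d frames means taking the self-loop (d - 1) times and then
    leaving, so p(d) = a^(d-1) * (1 - a): a geometric distribution.
    """
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_duration + 1)]


def explicit_duration_pmf(mean, std, max_duration):
    """Hypothetical explicit duration distribution for an HSMM state.

    A Gaussian evaluated at d = 1..max_duration and renormalized.
    This is an illustrative choice, not the paper's model.
    """
    weights = [
        math.exp(-0.5 * ((d - mean) / std) ** 2)
        for d in range(1, max_duration + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    hmm = hmm_duration_pmf(self_loop_prob=0.8, max_duration=10)
    hsmm = explicit_duration_pmf(mean=5.0, std=1.5, max_duration=10)
    for d, (p_hmm, p_hsmm) in enumerate(zip(hmm, hsmm), start=1):
        print(f"d={d:2d}  HMM={p_hmm:.4f}  explicit={p_hsmm:.4f}")
```

Under these assumptions the HMM column is largest at one frame and shrinks with every additional frame, whereas the explicit distribution can place its peak at a typical segment length.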
Authors
-
ZEN Heiga
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
TOKUDA Keiichi
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
Zen Heiga
Department Of Computer Science And Engineering Nagoya Institute Of Technology
-
Tokuda K
Department Of Computer Science And Engineering Nagoya Institute Of Technology
-
Tokuda Keiichi
Department Of Computer Science And Engineering Nagoya Institute Of Technology
-
NANKAKU Yoshihiko
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
Tokuda Keiichi
The Department Of Computer Science Nagoya Institute Of Technology
-
OURA Keiichiro
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
LEE Akinobu
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
Lee Akinobu
Department Of Computer Science Nagoya Institute Of Technology
-
Lee Akinobu
Department Of Computer Science And Engineering Nagoya Institute Of Technology
-
Oura Keiichiro
Department Of Computer Science And Engineering Nagoya Institute Of Technology
-
Nankaku Yoshihiko
Department Of Computer Science And Engineering Nagoya Institute Of Technology
-
Tokuda Keiichi
Department Of Computer Science Nagoya Institute Of Technology
-
Zen Heiga
Department Of Computer Science Nagoya Institute Of Technology
Related papers
- The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
- Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005 (Speech and Hearing)
- Applying Sparse KPCA for Feature Extraction in Speech Recognition (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- On the Use of Kernel PCA for Feature Extraction in Speech Recognition(Speech and Hearing)
- A Hidden Semi-Markov Model-Based Speech Synthesis System(Speech and Hearing)
- State Duration Modeling for HMM-Based Speech Synthesis(Speech and Hearing)
- A Training Method of Average Voice Model for HMM-Based Speech Synthesis(Digital Signal Processing)
- A Context Clustering Technique for Average Voice Models (Special Issue on Speech Information Processing)
- Speaker Adaptation of Pitch and Spectrum for HMM-Based Speech Synthesis
- Multi-Space Probability Distribution HMM(Special Issue on the 2000 IEICE Excellent Paper Award)
- Vector Quantization of Speech Spectral Parameters Using Statistics of Static and Dynamic Features
- Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution (Special Issue on Biometric Person Authentication)
- A Reordering Model Using a Source-Side Parse-Tree for Statistical Machine Translation
- Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models(Speech Recognition, Statistical Modeling for Speech Processing)
- Mixture Density Models Based on Mel-Cepstral Representation of Gaussian Process(Digital Signal Processing)
- A 16kb/s Wideband CELP-Based Speech Coder Using Mel-Generalized Cepstral Analysis
- Non-Audible Murmur (NAM) Recognition Exploiting Adaptation Techniques
- Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System
- A Study of Clustering Training Speakers in Unsupervised Speaker Adaptation Based on Sufficient Statistics Using Multiple Models
- LMS-Based Algorithms with Multi-Band Decomposition of the Estimation Error Applied to System Identification (Special Section on Digital Signal Processing)
- Multi-Band Decomposition of the Linear Prediction Error Applied to Adaptive AR Spectral Estimation
- Adaptive AR Spectral Estimation Based on Wavelet Decomposition of the Linear Prediction Error
- A Covariance-Tying Technique for HMM-Based Speech Synthesis
- Unsupervised speaker adaptation for speech-to-speech translation system (Language Understanding and Communication)
- Unsupervised speaker adaptation for speech-to-speech translation system (Speech)
- Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Continuous Speech Recognition Based on General Factor Dependent Acoustic Models (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Bayesian Context Clustering Using Cross Validation for Speech Recognition
- Reformulating the HMM as a Trajectory Model
- Speech recognition based on statistical models including multiple phonetic decision trees
- A Bayesian Framework Using Multiple Model Structures for Speech Recognition
- An Extension of Separable Lattice 2-D HMMs for Rotational Data Variations
- Speaker interpolation for HMM-based speech synthesis system