A Bayesian Framework Using Multiple Model Structures for Speech Recognition
Abstract
This paper proposes an acoustic modeling technique for speech recognition based on a Bayesian framework that uses multiple model structures. The aim of the Bayesian approach is to obtain good predictions of observations by marginalizing all variables related to the generative process. Although the effectiveness of marginalizing model parameters was recently reported in speech recognition, most of these systems use only one model structure, where a model structure comprises, e.g., the topologies of the HMMs, the number of states and mixtures, the types of state output distributions, and the parameter tying structures. A single structure is insufficient to represent the true model distribution, because in most practical cases the family of such models does not include the true distribution. One solution to this problem is to use multiple model structures. Although several approaches using multiple model structures have already been proposed, a consistent integration of multiple model structures based on the Bayesian approach has not been seen in speech recognition. This paper focuses on integrating multiple phonetic decision trees based on the Bayesian framework in HMM-based acoustic modeling. The proposed method is derived from a new marginal likelihood function that includes the model structures as a latent variable in addition to the HMM state sequences and model parameters, and the posterior distributions of these latent variables are obtained using the variational Bayesian method. Furthermore, to improve the optimization algorithm, the deterministic annealing EM (DAEM) algorithm is applied to the training process. The proposed method effectively utilizes multiple model structures, especially in the early stage of training, and this leads to better predictive distributions and improved recognition performance.
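The marginal likelihood described in the abstract can be written out as a rough sketch. The symbols are assumptions chosen here for illustration, not notation taken from the paper: $O$ is the observation sequence, $Z$ an HMM state sequence, $\Lambda$ the model parameters, and $m$ an index over the candidate model structures (for example, phonetic decision trees). The second line is the standard variational lower bound that Jensen's inequality gives for any distribution $q$. How the paper factorizes $q$ and where the DAEM temperature enters are not stated in the abstract, so both are left out.

```latex
\begin{align}
p(O) &= \sum_{m} p(m) \sum_{Z} \int p(O, Z \mid \Lambda, m)\, p(\Lambda \mid m)\, d\Lambda \\
\log p(O) &\ge \sum_{m} \sum_{Z} \int q(Z, \Lambda, m)\,
  \log \frac{p(O, Z \mid \Lambda, m)\, p(\Lambda \mid m)\, p(m)}{q(Z, \Lambda, m)}\, d\Lambda
\end{align}
```

The gap between the two sides of the inequality is the Kullback-Leibler divergence from $q(Z, \Lambda, m)$ to the true posterior $p(Z, \Lambda, m \mid O)$, so maximizing the bound over $q$ yields an approximate posterior over state sequences, parameters, and model structures jointly.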
Authors
- Tokuda Keiichi, Department of Computer Science and Engineering, Nagoya Institute of Technology
- Nankaku Yoshihiko, Department of Computer Science and Engineering, Nagoya Institute of Technology
- Shiota Sayaka, Department of Computer Science, Nagoya Institute of Technology
- Tokuda Keiichi, Department of Computer Science, Nagoya Institute of Technology
- Hashimoto Kei, Department of Applied Biological Chemistry, Utsunomiya University
Related papers
- Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005(Speech and Hearing)
- Antioxidative Effects of Phenolic Acids on Lipid Peroxidation Induced by H_2O_2 in the Presence of Myoglobin
- Determination of Hydrogen Peroxide by High-Performance Liquid Chromatography with a Cation-Exchange Resin Gel Column and Electrochemical Detector
- Absorption and Metabolism of Quercetin in Caco-2 Cells
- Applying Sparse KPCA for Feature Extraction in Speech Recognition(Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- On the Use of Kernel PCA for Feature Extraction in Speech Recognition(Speech and Hearing)
- The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
- A Hidden Semi-Markov Model-Based Speech Synthesis System(Speech and Hearing)
- State Duration Modeling for HMM-Based Speech Synthesis(Speech and Hearing)
- A Training Method of Average Voice Model for HMM-Based Speech Synthesis(Digital Signal Processing)
- A Context Clustering Technique for Average Voice Models (Special Issue on Speech Information Processing)
- Multi-Space Probability Distribution HMM(Special Issue on the 2000 IEICE Excellent Paper Award)
- A Reordering Model Using a Source-Side Parse-Tree for Statistical Machine Translation
- Spectral Cosensitization in Organic Solar Cell with Mixed Film of Zinc Porphyrin and Merocyanine
- A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System
- LMS-Based Algorithms with Multi-Band Decomposition of the Estimation Error Applied to System Identification (Special Section on Digital Signal Processing)
- Multi-Band Decomposition of the Linear Prediction Error Applied to Adaptive AR Spectral Estimation
- Inhibitory Effect of Arphamenine A on Intestinal Dipeptide Transport
- Adaptive AR Spectral Estimation Based on Wavelet Decomposition of the Linear Prediction Error
- A Covariance-Tying Technique for HMM-Based Speech Synthesis
- Effects of β-Lactoglobulin on the Tight-junctional Stability of Caco-2-SF Monolayer
- Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification(Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Suppression of the Menadione-Induced Cytotoxicity toward Hepa1c1c7 Murine Hepatoma by Quinone Reductase Inducers
- Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition(Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Continuous Speech Recognition Based on General Factor Dependent Acoustic Models(Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Bayesian Context Clustering Using Cross Validation for Speech Recognition
- Speech recognition based on statistical models including multiple phonetic decision trees
- A Bayesian Framework Using Multiple Model Structures for Speech Recognition
- Speaker interpolation for HMM-based speech synthesis system
- Inhibitory Effect of Methyl Methanethiosulfinate on β-Glucuronidase Activity