A Covariance-Tying Technique for HMM-Based Speech Synthesis
Abstract
A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
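The footprint reduction described above comes from storing a single shared covariance in place of one covariance per HMM state. A minimal sketch of this idea is shown below, assuming diagonal covariances and a simple occupancy-weighted average as the tied estimate; the paper's actual clustering and re-estimation procedure may differ, and all function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def tie_covariances(covariances, occupancies):
    """Replace per-state diagonal covariances with one tied covariance.

    The tied covariance is computed here as the occupancy-weighted
    average of the state covariances -- a simple approximation of
    tying, not necessarily the estimator used in the paper.
    """
    occ = np.asarray(occupancies, dtype=float)
    weights = occ / occ.sum()
    # Weighted average over states (axis 0) of the diagonal entries.
    return np.average(covariances, axis=0, weights=weights)

# Toy example: 4 HMM states, 3-dimensional diagonal covariances.
rng = np.random.default_rng(0)
means = rng.normal(size=(4, 3))
covs = rng.uniform(0.5, 2.0, size=(4, 3))  # diagonal entries only
occ = [10.0, 20.0, 5.0, 15.0]              # state occupancy counts

tied_cov = tie_covariances(covs, occ)

# Footprint comparison: means plus one tied covariance, versus
# means plus a covariance for every state.
before = means.size + covs.size   # 24 parameters
after = means.size + tied_cov.size  # 15 parameters
print(before, after)
```

With per-state diagonal covariances, tying roughly halves the Gaussian parameter count in this toy setup; the savings grow with the number of states, which is why the abstract focuses clustering effort on the mean vectors alone.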
- A paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2010-03-01
Authors
-
ZEN Heiga
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
TOKUDA Keiichi
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
NANKAKU Yoshihiko
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
OURA Keiichiro
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
LEE Akinobu
Department of Computer Science and Engineering, Nagoya Institute of Technology
Related Papers
- The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
- Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005 (Speech and Hearing)
- Applying Sparse KPCA for Feature Extraction in Speech Recognition (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- On the Use of Kernel PCA for Feature Extraction in Speech Recognition(Speech and Hearing)
- A Hidden Semi-Markov Model-Based Speech Synthesis System(Speech and Hearing)
- State Duration Modeling for HMM-Based Speech Synthesis(Speech and Hearing)
- A Training Method of Average Voice Model for HMM-Based Speech Synthesis(Digital Signal Processing)
- A Context Clustering Technique for Average Voice Models (Special Issue on Speech Information Processing)
- Speaker Adaptation of Pitch and Spectrum for HMM-Based Speech Synthesis
- Multi-Space Probability Distribution HMM(Special Issue on the 2000 IEICE Excellent Paper Award)
- Vector Quantization of Speech Spectral Parameters Using Statistics of Static and Dynamic Features
- Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution (Special Issue on Biometric Person Authentication)
- A Reordering Model Using a Source-Side Parse-Tree for Statistical Machine Translation
- Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models(Speech Recognition, Statistical Modeling for Speech Processing)
- A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System
- Mixture Density Models Based on Mel-Cepstral Representation of Gaussian Process(Digital Signal Processing)
- A 16kb/s Wideband CELP-Based Speech Coder Using Mel-Generalized Cepstral Analysis
- Non-Audible Murmur (NAM) Recognition Exploiting Adaptation Techniques
- Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System
- A Study on Clustering of Training Speakers in Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics Using Multiple Models
- LMS-Based Algorithms with Multi-Band Decomposition of the Estimation Error Applied to System Identification (Special Section on Digital Signal Processing)
- Multi-Band Decomposition of the Linear Prediction Error Applied to Adaptive AR Spectral Estimation
- Adaptive AR Spectral Estimation Based on Wavelet Decomposition of the Linear Prediction Error
- Unsupervised Speaker Adaptation for Speech-to-Speech Translation System (Natural Language Understanding and Communication)
- Unsupervised Speaker Adaptation for Speech-to-Speech Translation System (Speech)
- Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Continuous Speech Recognition Based on General Factor Dependent Acoustic Models (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Bayesian Context Clustering Using Cross Validation for Speech Recognition
- Reformulating the HMM as a Trajectory Model
- Speech recognition based on statistical models including multiple phonetic decision trees
- A Bayesian Framework Using Multiple Model Structures for Speech Recognition
- An Extension of Separable Lattice 2-D HMMs for Rotational Data Variations
- Speaker interpolation for HMM-based speech synthesis system