The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
Abstract
We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for the annual open evaluation of text-to-speech synthesis systems named Blizzard Challenge 2006. To improve our 2005 system (Nitech-HTS 2005), we investigated new features such as mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). A combination of mel-cepstral coefficients, MLLT, and full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.
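As a rough illustration of the global variance (GV) idea named in the abstract, the sketch below assumes GV is taken as the within-utterance variance of each feature dimension, and that a "full covariance" GV pdf is a Gaussian over those GV vectors whose covariance matrix keeps its off-diagonal terms. The function names and the toy data are hypothetical and are not taken from the paper or from HTS.

```python
from statistics import pvariance


def global_variance(trajectory):
    """Return the per-dimension variance over time for one utterance.

    trajectory: list of frames, each frame a list of D floats (assumed layout).
    """
    dims = zip(*trajectory)  # transpose: D sequences of length T
    return [pvariance(values) for values in dims]


def fit_full_covariance(gv_vectors):
    """Estimate the mean vector and full covariance matrix of GV vectors."""
    n = len(gv_vectors)
    dim = len(gv_vectors[0])
    mean = [sum(v[d] for v in gv_vectors) / n for d in range(dim)]
    cov = [
        [
            sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in gv_vectors) / n
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return mean, cov


# Toy data: two utterances, three frames each, two feature dimensions.
utterances = [
    [[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]],
    [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]],
]

gv_vectors = [global_variance(u) for u in utterances]
gv_mean, gv_cov = fit_full_covariance(gv_vectors)
```

A diagonal-covariance variant would keep only `gv_cov[i][i]`; the full matrix additionally captures how the variances of different feature dimensions co-vary across utterances.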
- Published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-06-01
Authors
-
ZEN Heiga
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
TODA Tomoki
Graduate School of Information Science, Nara Institute of Science and Technology
-
TOKUDA Keiichi
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
Tokuda Keiichi
National Institute of Information and Communications Technology, Kyoto, Japan
Related papers
- The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
- Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005(Speech and Hearing)
- A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis(Speech and Hearing)
- Applying Sparse KPCA for Feature Extraction in Speech Recognition(Feature Extraction and Acoustic Modeling, Corpus-Based Speech Technologies)
- On the Use of Kernel PCA for Feature Extraction in Speech Recognition(Speech and Hearing)
- A Hidden Semi-Markov Model-Based Speech Synthesis System(Speech and Hearing)
- State Duration Modeling for HMM-Based Speech Synthesis(Speech and Hearing)
- A Training Method of Average Voice Model for HMM-Based Speech Synthesis(Digital Signal Processing)
- A Context Clustering Technique for Average Voice Models (Special Issue on Speech Information Processing)
- Multi-Space Probability Distribution HMM(Special Issue on the 2000 IEICE Excellent Paper Award)
- A Reordering Model Using a Source-Side Parse-Tree for Statistical Machine Translation
- Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method
- Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training
- Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics(Speech and Hearing)
- Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models(Speech Recognition, Statistical Modeling for Speech Processing)
- Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models(Speech Recognition, Statistical Modeling for Speech Processing)
- Designing Target Cost Function Based on Prosody of Speech Database(Speech Synthesis and Prosody, Corpus-Based Speech Technologies)
- Cross-language Voice Conversion Evaluation Using Bilingual Databases (Special Issue: Spoken Language Information Processing and Its Applications)
- A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System
- LMS-Based Algorithms with Multi-Band Decomposition of the Estimation Error Applied to System Identification (Special Section on Digital Signal Processing)
- Multi-Band Decomposition of the Linear Prediction Error Applied to Adaptive AR Spectral Estimation
- Adaptive AR Spectral Estimation Based on Wavelet Decomposition of the Linear Prediction Error
- A Covariance-Tying Technique for HMM-Based Speech Synthesis
- Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification(Feature Extraction and Acoustic Modeling, Corpus-Based Speech Technologies)
- Evaluation of Extremely Small Sound Source Signals Used in Speaking-Aid System with Statistical Voice Conversion
- Improvements of the One-to-Many Eigenvoice Conversion System
- Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models
- Adaptive Training for Voice Conversion Based on Eigenvoices
- Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition(Feature Extraction and Acoustic Modeling, Corpus-Based Speech Technologies)
- Continuous Speech Recognition Based on General Factor Dependent Acoustic Models(Feature Extraction and Acoustic Modeling, Corpus-Based Speech Technologies)
- Bayesian Context Clustering Using Cross Validation for Speech Recognition
- Speech recognition based on statistical models including multiple phonetic decision trees
- A Bayesian Framework Using Multiple Model Structures for Speech Recognition
- Speaker interpolation for HMM-based speech synthesis system