Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005 (Speech and Hearing)
Abstract
In January 2005, an open evaluation of corpus-based text-to-speech synthesis systems using common speech datasets, named Blizzard Challenge 2005, was conducted. The Nitech group participated in this challenge with an HMM-based speech synthesis system called Nitech-HTS 2005. This paper describes the technical details, building processes, and performance of our system. We first give an overview of the basic HMM-based speech synthesis system and then describe new features integrated into Nitech-HTS 2005, such as STRAIGHT-based vocoding, hidden semi-Markov model (HSMM)-based acoustic modeling, and a speech parameter generation algorithm considering global variance (GV). The constructed Nitech-HTS 2005 voices can generate speech waveforms at 0.3×RT (real-time ratio) on a 1.6 GHz Pentium 4 machine, and their footprints are less than 2 Mbytes. Subjective listening tests showed that the naturalness and intelligibility of the Nitech-HTS 2005 voices were much better than expected.
- 2007-01-01
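Parameter generation considering GV targets the over-smoothing of HMM-based synthesis: trajectories generated from state statistics tend to have a smaller per-utterance variance than natural speech. The Python sketch below illustrates that effect under loudly stated assumptions: a single feature dimension, diagonal Gaussians, a delta window of 0.5 * (c[t+1] - c[t-1]) with replicated edges, and hand-picked toy statistics. The GV step here is plain variance scaling of the generated trajectory, a simpler stand-in and not the likelihood-based algorithm the paper describes. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def mean(x: Vector) -> float:
    return sum(x) / len(x)


def variance(x: Vector) -> float:
    """Per-utterance variance of a 1-D trajectory (its 'GV' for that dimension)."""
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def delta_window(t: int, T: int) -> Vector:
    """Row of W for delta c[t] = 0.5 * (c[t+1] - c[t-1]); edges are replicated (assumption)."""
    row = [0.0] * T
    row[min(t + 1, T - 1)] += 0.5
    row[max(t - 1, 0)] -= 0.5
    return row


def mlpg(mu_s: Vector, var_s: Vector, mu_d: Vector, var_d: Vector) -> Vector:
    """Static trajectory c maximizing N([c; delta c] | mu, diag(var)).

    Solves (W^T U^-1 W) c = W^T U^-1 mu for one feature dimension.
    """
    T = len(mu_s)
    A = [[0.0] * T for _ in range(T)]
    b = [0.0] * T
    for t in range(T):
        # Static row of W is the unit vector e_t.
        A[t][t] += 1.0 / var_s[t]
        b[t] += mu_s[t] / var_s[t]
        # Delta row w of W adds w w^T / var to A and w * mu / var to b.
        w = delta_window(t, T)
        for i in range(T):
            if w[i] == 0.0:
                continue
            b[i] += w[i] * mu_d[t] / var_d[t]
            for j in range(T):
                A[i][j] += w[i] * w[j] / var_d[t]
    return solve(A, b)


def gv_scale(c: Vector, gv_target: float) -> Vector:
    """Rescale c around its mean so its variance equals gv_target.

    Simplified variance scaling, not the paper's GV-likelihood maximization.
    """
    m, v = mean(c), variance(c)
    if v == 0.0:
        return c[:]
    k = (gv_target / v) ** 0.5
    return [m + k * (x - m) for x in c]


if __name__ == "__main__":
    # Toy step contour (hypothetical numbers, e.g. a normalized log F0 segment).
    mu = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    c = mlpg(mu, [1.0] * 6, [0.0] * 6, [0.05] * 6)
    print("MLPG variance:", variance(c), "target:", variance(mu))
    print("after GV scaling:", variance(gv_scale(c, variance(mu))))
```

With these toy statistics the MLPG trajectory keeps the mean of the static means but its variance falls below theirs, which is the over-smoothing the GV term is meant to counter; the rescaled trajectory then matches the target variance while keeping its mean. In a GV-based system the target would come from a Gaussian model of utterance-level variances estimated on training data rather than from the means of one utterance.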
Authors
- ZEN Heiga, Department of Computer Science and Engineering, Nagoya Institute of Technology
- TODA Tomoki, Graduate School of Information Science, Nara Institute of Science and Technology
- TOKUDA Keiichi, Department of Computer Science and Engineering, Nagoya Institute of Technology
- NAKAMURA Masaru, Department of Computer Science and Engineering, Nagoya Institute of Technology
Related papers
- The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
- A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis(Speech and Hearing)
- Applying Sparse KPCA for Feature Extraction in Speech Recognition(Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- On the Use of Kernel PCA for Feature Extraction in Speech Recognition(Speech and Hearing)
- Exogenous Expression of Interferon-β in Cultured Brain Microvessel Endothelial Cells(Biochemistry/Molecular Biology)
- Expression and Visualization of a Human Interferon-β-Enhanced Green Fluorescent Protein Chimeric Molecule in Cultured Cells (Biochemistry/Molecular Biology)
- A Hidden Semi-Markov Model-Based Speech Synthesis System(Speech and Hearing)
- State Duration Modeling for HMM-Based Speech Synthesis(Speech and Hearing)
- A Training Method of Average Voice Model for HMM-Based Speech Synthesis(Digital Signal Processing)
- A Context Clustering Technique for Average Voice Models (Special Issue on Speech Information Processing)
- Multi-Space Probability Distribution HMM(Special Issue on the 2000 IEICE Excellent Paper Award)
- A Reordering Model Using a Source-Side Parse-Tree for Statistical Machine Translation
- Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method
- Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training
- Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics(Speech and Hearing)
- Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models(Speech Recognition, Statistical Modeling for Speech Processing)
- Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models(Speech Recognition, Statistical Modeling for Speech Processing)
- Designing Target Cost Function Based on Prosody of Speech Database(Speech Synthesis and Prosody, Corpus-Based Speech Technologies)
- Cross-language Voice Conversion Evaluation Using Bilingual Databases (Special Issue: Spoken Language Information Processing and Its Applications)
- A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System
- EXPRESSION OF ESTROGEN RECEPTORS AND STEROIDOGENIC ENZYMES DURING GONADAL DIFFERENTIATION IN A TELEOST FISH, TILAPIA NILOTICUS(Developmental Biology)(Proceedings of the Seventieth Annual Meeting of the Zoological Society of Japan)
- Fish 3β-Hydroxysteroid Dehydrogenase/Δ⁵-Δ⁴ Isomerase : Antibody Production and Their Use for the Immunohistochemical Detection of Fish Steroidogenic Tissues
- LMS-Based Algorithms with Multi-Band Decomposition of the Estimation Error Applied to System Identification (Special Section on Digital Signal Processing)
- Multi-Band Decomposition of the Linear Prediction Error Applied to Adaptive AR Spectral Estimation
- Regulation of Tissue-Type Plasminogen Activator (tPA) and Type-1 Plasminogen Activator Inhibitor (PAI-1) Gene Expression in Rat Hepatocytes in Primary Culture
- A Facile Preparation and Properties of (2E, 4E, 6E, 8E)-1-(3-Guaiazulenyl)-3, 7-dimethyl-9-(2, 6, 6-trimethyl-1-cyclohexen-1-yl)-2, 4, 6, 8-nonatetraen-1-ylium Hexafluorophosphate
- Adaptive AR Spectral Estimation Based on Wavelet Decomposition of the Linear Prediction Error
- A Covariance-Tying Technique for HMM-Based Speech Synthesis
- Energy Metabolism of Sea Urchin Spermatozoa : An Approach Based on Echinoid Phylogeny
- Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification(Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- The Ultrastructures of Atypical and Anaplastic Meningiomas
- Ultrastructural Study of Endogenous Energy Substrates in Spermatozoa of the Sea Urchins Arbacia lixula and Paracentrotus lividus
- Evaluation of Extremely Small Sound Source Signals Used in Speaking-Aid System with Statistical Voice Conversion
- Improvements of the One-to-Many Eigenvoice Conversion System
- Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models
- Adaptive Training for Voice Conversion Based on Eigenvoices
- Behavioural Studies on Schooling of Fishes II : Leading-Following Relationship of the Immature Yellowtail, Seriola quinqueradiata Temminck et Schlegel in Captivity
- Phosphatidylcholine Is an Endogenous Substrate for Energy Metabolism in Spermatozoa of Sea Urchins of the Order Echinoidea
- Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition(Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Continuous Speech Recognition Based on General Factor Dependent Acoustic Models(Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Reproductive Characteristics of Precociously Mature Triploid Male Masu Salmon, Oncorhynchus masou
- Innervation of Steroid-Producing Cells in the Ovary of Tilapia Oreochromis niloticus
- Bayesian Context Clustering Using Cross Validation for Speech Recognition
- Speech recognition based on statistical models including multiple phonetic decision trees
- Studies of the Conditioned Reflex of Fish in Groups : II. Conditioned Reflex in Two Groups of Goldfish, Carassius auratus
- Studies of the Conditioned Reflex of Fish in Groups I : Relationship between the Conditioning Speed of Individuals and the Status in Social Hierarchy in a Group of Swordtails, Xiphophorus helleri
- Significant Correlation between Chromosomal Aberration and Nuclear Morphology in Urothelial Carcinoma
- A Bayesian Framework Using Multiple Model Structures for Speech Recognition
- Speaker interpolation for HMM-based speech synthesis system
- Metabolic Effects of Sodium Valproate on Atypical Antipsychotics in Japanese Psychotic Patients
- Duloxetine-induced Hyponatremia in the Elderly