Continuous Speech Recognition Based on General Factor Dependent Acoustic Models (Feature Extraction and Acoustic Modelings, <Special Section> Corpus-Based Speech Technologies)
Abstract
This paper describes continuous speech recognition that incorporates additional complementary information, e.g., voice characteristics, speaking styles, linguistic information, and noise environment, into HMM-based acoustic modeling. In speech recognition systems, context-dependent HMMs, i.e., triphones, and tree-based context clustering have commonly been used. In recent years, several attempts have been made to utilize not only phonetic contexts but also additional complementary information, based on context (factor) dependent HMMs. However, when the additional factors for the test data are unobserved, a method for obtaining factor labels is required before decoding. In this paper, we propose a model integration technique based on general factor dependent HMMs for decoding. The integrated HMMs can be used by a conventional decoder as standard triphone HMMs with Gaussian mixture densities. Moreover, by using the results of context clustering, the proposed method can determine an optimal number of mixture components for each state depending on the degree of influence of the additional factors. Phoneme recognition experiments using voice characteristic labels show significant improvements with a small number of model parameters, and a 19.3% error reduction was obtained in noise environment experiments.
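The abstract gives no formulas, so the following is only a minimal sketch of one way the model integration idea could look, not the authors' implementation. It assumes that, for each HMM state, context clustering has mapped every value of the additional factor to a (possibly shared) Gaussian leaf, and that a prior over factor values is available. Marginalizing the unobserved factor then yields one Gaussian mixture per state, with as many components as there are distinct leaves. All names (`integrate_state`, `factor_to_leaf`, `factor_prior`, `leaf_gaussians`) and all numbers are hypothetical, and the Gaussians are one-dimensional for brevity.

```python
import math
from collections import defaultdict


def integrate_state(factor_to_leaf, factor_prior, leaf_gaussians):
    """Marginalize an unobserved factor into one Gaussian mixture for a state.

    factor_to_leaf: factor value -> id of the clustered (tied) Gaussian leaf
    factor_prior:   factor value -> assumed prior probability of that value
    leaf_gaussians: leaf id -> (mean, variance) of a 1-D Gaussian
    Returns a list of (weight, mean, variance), one entry per distinct leaf.
    """
    weights = defaultdict(float)
    for factor, leaf in factor_to_leaf.items():
        # Factor values tied to the same leaf collapse into one component.
        weights[leaf] += factor_prior[factor]
    total = sum(weights.values())
    return [(w / total, *leaf_gaussians[leaf]) for leaf, w in sorted(weights.items())]


def mixture_log_likelihood(mixture, x):
    """Log-likelihood of a scalar observation x under a 1-D Gaussian mixture."""
    density = 0.0
    for weight, mean, var in mixture:
        density += weight * math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return math.log(density)


# Hypothetical factor: noise environment with three values.
prior = {"clean": 0.5, "car": 0.25, "street": 0.25}
gaussians = {"L0": (0.0, 1.0), "L1": (2.0, 1.5), "L2": (-1.0, 0.8)}

# State A is strongly affected by the factor: clustering kept two leaves.
mix_a = integrate_state({"clean": "L0", "car": "L1", "street": "L1"}, prior, gaussians)
# State B is unaffected: all factor values share one leaf, so one component.
mix_b = integrate_state({"clean": "L2", "car": "L2", "street": "L2"}, prior, gaussians)

print(len(mix_a), len(mix_b))
print(mixture_log_likelihood(mix_a, 0.5))
```

In this sketch, a state whose clustering tree split on the factor ends up with more mixture components than a state where the factor was never used for a split, which is one reading of "the number of mixture components depends on the degree of influence of the additional factors". The resulting per-state mixtures have the same form as ordinary Gaussian mixture densities, so no factor labels are needed at decoding time.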
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2005-03-01
Authors
-
Zen Heiga
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
Tokuda Keiichi
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
KITAMURA Tadashi
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
MIYAJIMA Chiyomi
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
Suzuki Hiroyuki
Department of Applied Physics, School of Science and Engineering, Waseda University: (present address) h
-
Miyajima Chiyomi
Department of Media Science, Nagoya University
-
Kitamura Tadashi
Department of Cardiothoracic Surgery, The University of Tokyo
-
Nankaku Yoshihiko
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
Tokuda Keiichi
Department of Computer Science, Nagoya Institute of Technology
-
Zen Heiga
Department of Computer Science, Nagoya Institute of Technology
-
Kitamura Tadashi
Department of Cardiothoracic Surgery, Faculty of Medicine, University of Tokyo
-
Suzuki Hiroyuki
Department of Computer Science and Engineering, Nagoya Institute of Technology: (present address) Denso Corporation
-
Suzuki Hiroyuki
Department of Animal Science, Faculty of Agriculture, Hokkaido University
Related papers
- High expression of Pirh2, an E3 ligase for p27, is associated with low expression of p27 and poor prognosis in head and neck cancers
- Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005 (Speech and Hearing)
- Symmetry Breaking in a Frustrated Heisenberg Spin System, ZnCr_2O_4 : I. Magnetic Measurements(Condensed matter: electronic structure and electrical, magnetic, and optical properties)
- Conversion from Total Cavopulmonary Shunt to Fontan Circulation : Improved Cyanosis with an 11-Year Interval
- Role of Neurofibromin in Modulation of Expression of the Tyrosinase-Related Protein 2 Gene
- Differential Tissue-Specific Expression of Neurofibromin Isoform mRNAs in Rat
- Evidence for the Presence of Two Amino-Terminal Isoforms of Neurofibromin, a Gene Product Responsible for Neurofibromatosis Type 1
- Crossover from Magnetic to Nonmagnetic Ground State in the Kondo Alloy System Ce (Ni_Pd_x) Sn
- Magnetic Properties of New Ternary Rare Earth Compounds RPt (Pd) Sb
- Merkel Cells in the Vellus Hair Follicles of Human Facial Skin : A Study Using Confocal Laser Microscopy
- MCH-01 DEVELOPMENT OF A NOVEL METHOD FOR STRETCHING DNA FIBERS ON MICROBRIDGES FABRICATED BY SINGLE-MASK INCLINED UV LITHOGRAPHY(Micro/Nanomechatronics I,Technical Program of Oral Presentations)
- Applying Sparse KPCA for Feature Extraction in Speech Recognition (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- On the Use of Kernel PCA for Feature Extraction in Speech Recognition(Speech and Hearing)
- Strong Anisotropy of Transport Properties in Metamagnetic CePtAs
- Long-term rectal temperature measurements in a patient with menstrual-associated sleep disorder
- Effects of nocturnal bright light on saliva melatonin, core body temperature and sleep propensity rhythms in human subjects
- Reservoir Competence of the Vole, Clethrionomys rufocanus bedfordiae, for Borrelia garinii or Borrelia afzelii
- COMMON PATHOGENIC MECHANISM IN DEVELOPMENT PROGRESSION OF LIVER INJURY CAUSED BY NON-ALCOHOLIC OR ALCOHOLIC STEATOHEPATITIS
- Transcriptional Activation of the Melanocyte-Specific Genes by the Human Homolog of the Mouse Microphthalmia Protein
- Tristetraprolin (TTP) gene polymorphisms in patients with rheumatoid arthritis and healthy individuals
- Purification and Some Properties of NADH-Dependent Sulfite Reductase from Escherichia coli Harboring Plasmid pTHS1,Which Has the S-Adenosyl-L-Methionine : Uroporphyrinogen III Methyltransferase Gene of Thiobacillus ferrooxidans
- A DNA Region That Complements an Escherichia coli cysG Mutation in Thiobacillus ferrooxidans
- NADH-dependent Sulfite Reductase Activity in the Periplasmic Space of Thiobacillus ferrooxidans
- Inhibition of Sulfur Use by Sulfite Ion in Thiobacillus ferrooxidans(Microbiology & Fermentation Industry)
- Purification and Some Properties of a Hydrogen Sulfide-binding Protein That Is Involved in Sulfur Oxidation of Thiobacillus ferrooxidans (Microbiology & Fermentation Industry)
- The Nitech-NAIST HMM-Based Speech Synthesis System for the Blizzard Challenge 2006
- A Hidden Semi-Markov Model-Based Speech Synthesis System(Speech and Hearing)
- State Duration Modeling for HMM-Based Speech Synthesis(Speech and Hearing)
- A Training Method of Average Voice Model for HMM-Based Speech Synthesis(Digital Signal Processing)
- A Context Clustering Technique for Average Voice Models (Special Issue on Speech Information Processing)
- Multi-Space Probability Distribution HMM(Special Issue on the 2000 IEICE Excellent Paper Award)
- Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution (Special Issue on Biometric Person Authentication)
- Diurnal fluctuation of time perception under 30-h sustained wakefulness
- Time estimation during nocturnal sleep in human subjects
- Effects of small dose of brotizolam on P300
- A Reordering Model Using a Source-Side Parse-Tree for Statistical Machine Translation
- Clinical Experience with Cryopreserved Allografts for Aortic Infection
- Possible Involvement of 3-Dehydroteasterone in the Conversion of Teasterone to Typhasterol in Cultured Cells of Catharanthus roseus
- Serum levels of neutrophil activation cytokines in Kawasaki disease
- Evaluation of Pretreatment with Pleurotus ostreatus for Enzymatic Hydrolysis of Rice Straw(ENVIRONMENTAL BIOTECHNOLOGY)
- A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System
- Estimation of Collector Current Spreading in InGaAs SHBT Having 75-nm-Thick Collector
- First NMR Observation of Valence Fluctuation in Rare-Earth Compounds : ^Se NMR Studies of Temperature-Activated Valence Fluctuation in Sm_3Se_4
- Two nap sleep test : An easy objective sleepiness test
- Evidence-Based Infection Control in Thoracic Surgery
- Mixture Density Models Based on Mel-Cepstral Representation of Gaussian Process(Digital Signal Processing)
- Identification of Brassinolide, Castasterone, Typhasterol, and Teasterone from the Pollen of Lilium elegans
- Pseudoaneurysm Developed after Aortic Root Homograft Implantation
- Circadian fluctuation of time perception in healthy human subjects
- Comparison of OspA Serotypes for Borrelia burgdorferi Sensu Lato from Japan, Europe and North America
- Presence of Common Antigenic Epitope in Outer Surface Protein(Osp)A and OspB of Japanese Isolates Identified as Borrelia garinii
- REGULATION OF TGF-β SIGNALING AND ITS ROLES IN PROGRESSION OF TUMORS
- Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- LMS-Based Algorithms with Multi-Band Decomposition of the Estimation Error Applied to System Identification (Special Section on Digital Signal Processing)
- Multi-Band Decomposition of the Linear Prediction Error Applied to Adaptive AR Spectral Estimation
- Strategies for Treatment of Acute Aortic Dissection with Involvement of Sinus of Valsalva
- Biosynthesis of Brassinosteroids in Seedlings of Catharanthus roseus, Nicotiana tabacum, and Oryza sativa
- Adaptive AR Spectral Estimation Based on Wavelet Decomposition of the Linear Prediction Error
- A Covariance-Tying Technique for HMM-Based Speech Synthesis
- Changes in cuticle of scalp hair in mild acquired zinc deficiency: A study using scanning electron microscopy
- Lymphoepithelial Cyst in the Sublingual Region : Report of a case and review of literature
- Neoplasms in three patients following Kawasaki disease
- Characteristics of Multi-Layer Perceptron Models in Enhancing Degraded Speech
- Transforming growth factor-β signaling is differentially inhibited by Smad2D450E and Smad3D407E
- Self-Injection-Locked, Narrow-Linewidth, Flashlamp-Pumped Ti:Al_2O_3 Laser
- Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- A Case of Terminal Deletion of the Long Arm of Chromosome 4, del(4)(q33→ter)
- Non-Kramer's Doublet Ground State in PrPtBi
- Clinical Characteristics of Patients With Kawasaki Disease and Levels of Peripheral Endothelial Progenitor Cells and Blood Monocyte Subpopulations
- Construction of a System that Simultaneously Evaluates CYP1A1 and CYP1A2 Induction in a Stable Human-derived Cell Line using a Dual Reporter Plasmid
- Time estimation during sleep relates to the amount of slow wave sleep in humans
- Thermal Dissociation of Disilenes into Silylenes
- Pruritic Papular Eruptions and Candidiasis Due to HIV Infection
- Acquired Reactive Perforating Collagenosis with IgA Nephropathy
- Adaptive Nonlinear Regression Using Multiple Distributed Microphones for In-Car Speech Recognition(Speech Enhancement, Multi-channel Acoustic Signal Processing)
- A case of diaphragmatic clear cell carcinoma in a patient with a medical history of ovarian endometriosis
- Cerebrospinal Fluid Cytokines in Salmonella Urbana Encephalopathy
- Bullous Pemphigoid in an HB Virus Carrier : Interaction between Corticosteroids and HB Virus
- Edematous Changes in Mucosal Folds of the Internal Os of the Rabbit Cervix
- Lipid-rich Keratin Spherules Containing Cholesterol Crystals in an Epidermal Cyst : A Study Using Electron Microscopy
- ^Sn NMR Studies of the Heavy-Electron Compound U_3Au_3Sn_4
- Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Successful treatment of refractory warts with topical vitamin D_3 derivative (maxacalcitol, 1α, 25-dihydroxy-22-oxacalcitriol) in 17 patients
- Coarse Columnar Structure of Transformation-Grown Ferrite in Pure Iron : On Wrought Iron and Sintered Iron
- Morphological Studies on the Oviductal Mucosa of the Mare
- Effects of Prostaglandin F2α on Egg Recovery from the Vagina and Egg Transport in Superovulated Rabbits
- Ciliation in Endometrial Epithelium of the Rabbit Following Ovariectomy
- Cervical Epithelium of the Rabbit Following Ovariectomy
- Bayesian Context Clustering Using Cross Validation for Speech Recognition
- Reformulating the HMM as a Trajectory Model
- Speech recognition based on statistical models including multiple phonetic decision trees
- On the use of two-mass vocal cord model in characterizing the stress speech (Speech)
- A Bayesian Framework Using Multiple Model Structures for Speech Recognition
- Speaker interpolation for HMM-based speech synthesis system