Physiologically-Based Speech Synthesis Using Neural Networks (Special Section on Speech Synthesis: Current Technologies and Equipment)
Abstract
This paper focuses on two areas in our effort to synthesize speech from neuromotor input using neural network models that perform the transforms between the cognitive intention to speak, its physiological effects on vocal tract structures, and its subsequent realization as an acoustic signal.

The first area concerns the biomechanical transform between motor commands to the muscles and the ensuing articulator behavior. Using physiological data on muscle EMG (electromyography) and articulator movements recorded during natural English utterances, three articulator-specific neural networks learn the forward dynamics that relate the motor commands sent to the muscles to the resulting motion of the tongue, jaw, and lips. Compared with a single fully connected network that maps muscle EMG to motion for all three sets of articulators at once, this modular approach has improved performance by reducing network complexity and has eliminated some of the confounding influence of functional coupling among articulators. Network independence has also allowed us to identify and assess the effects of technical and empirical limitations on an articulator-by-articulator basis. This is particularly important for modeling the tongue, whose complex structure is very difficult to examine empirically.

The second area of progress concerns the transform between articulator motion and the speech acoustics. From the articulatory movement trajectories, a second neural network generates PARCOR (partial correlation) coefficients, which are then used to synthesize the speech acoustics. In the current implementation, articulator velocities have been added as network inputs. As a result, the model now follows the fast changes in the coefficients for consonants, even though these are generated by relatively slow articulatory movements during natural English utterances. Although much work remains to be done, progress in these areas brings us closer to our goal of emulating speech production processes computationally.
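The abstract gives no implementation details, so the following is only a minimal Python sketch of two steps it mentions, under stated assumptions. First, articulator velocities are appended to the position inputs, here as backward finite differences over a hypothetical frame interval `dt`. Second, a sequence of PARCOR coefficient frames is turned into a waveform with an all-pole lattice filter; the sign convention used is one common choice (others flip the sign of k_i). The network that maps features to coefficients is omitted, and all function names, the excitation signal, and the `hop` frame length are hypothetical, not the authors' implementation.

```python
from typing import List, Sequence


def with_velocities(positions: Sequence[Sequence[float]], dt: float) -> List[List[float]]:
    """Append backward-difference velocities to each frame of articulator positions.

    Hypothetical feature layout: [positions..., velocities...]. The first frame
    has no predecessor, so its velocity is set to zero.
    """
    feats = []
    for t, frame in enumerate(positions):
        prev = positions[t - 1] if t > 0 else frame
        vel = [(x - x0) / dt for x, x0 in zip(frame, prev)]
        feats.append(list(frame) + vel)
    return feats


def _frame_coeffs(k_frames, n, hop):
    # Hold each frame's coefficients for `hop` samples; reuse the last frame at the end.
    return k_frames[min(n // hop, len(k_frames) - 1)]


def lattice_synthesize(excitation, k_frames, hop):
    """All-pole lattice synthesis filter driven by PARCOR (reflection) coefficients.

    Assumed analysis-lattice convention:
        f_i[n] = f_{i-1}[n] + k_i * b_{i-1}[n-1]
        b_i[n] = k_i * f_{i-1}[n] + b_{i-1}[n-1]
    This function runs it in reverse, taking f_p[n] = excitation[n].
    """
    p = len(k_frames[0])
    b_prev = [0.0] * (p + 1)  # b_prev[i] holds b_i[n-1]
    out = []
    for n, e in enumerate(excitation):
        k = _frame_coeffs(k_frames, n, hop)
        f = [0.0] * (p + 1)
        f[p] = e
        for i in range(p, 0, -1):  # undo each stage, from order p down to 1
            f[i - 1] = f[i] - k[i - 1] * b_prev[i - 1]
        b = [0.0] * (p + 1)
        b[0] = f[0]
        for i in range(1, p + 1):  # backward errors needed at the next sample
            b[i] = k[i - 1] * f[i - 1] + b_prev[i - 1]
        b_prev = b
        out.append(f[0])  # synthesized sample
    return out


def lattice_analyze(signal, k_frames, hop):
    """Inverse (all-zero) lattice: recovers the excitation from the synthesized signal."""
    p = len(k_frames[0])
    b_prev = [0.0] * (p + 1)
    out = []
    for n, x in enumerate(signal):
        k = _frame_coeffs(k_frames, n, hop)
        f = [x] + [0.0] * p
        b = [x] + [0.0] * p
        for i in range(1, p + 1):
            f[i] = f[i - 1] + k[i - 1] * b_prev[i - 1]
            b[i] = k[i - 1] * f[i - 1] + b_prev[i - 1]
        b_prev = b
        out.append(f[p])  # prediction error of order p
    return out
```

With a single stage and k_1 = 0.5, an impulse produces 1, -0.5, 0.25, -0.125, since y[n] = e[n] - k_1 y[n-1]. The lattice filter is stable whenever every |k_i| < 1, which is one reason reflection coefficients are a convenient target for a network to generate frame by frame.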
- Published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 1993-11-25
Authors
- Kawato Mitsuo: ATR Human Information Processing Research Laboratories; ATR Computational Neuroscience Laboratories, Kyoto, Japan
- Hirayama Makoto: ATR Human Information Processing Research Laboratories; ULSI Laboratory, Mitsubishi Electric Corporation; LSI R&D Laboratory, Mitsubishi Electric Corporation
- Vatikiotis-Bateson Eric: ATR Human Information Processing Research Laboratories
Related Papers
- Impact of Organic Contaminants from the Environment of Electrical Characteristics of Thin Gate Oxides
- Highly Reliable SiO_2 Films Formed by UV-O_2 Oxidation
- Quantitative examinations for multi joint arm trajectory planning-using a robust calculation algorithm of the minimum commanded torque change trajectory
- A Kendama Learning Robot Based on Bi-directional Theory
- High-Quality CVD/Thermal Stacked Gate Oxide Films with Hydrogen-Free CVD SiO_2 Formed in a SiCl_4-N_2O System
- Kinetic Study of Silicon Nitride Growth from Dichlorosilane and Ammonia
- Purkinje Cell Activity in the Middle Zone of the Cerebellar Flocculus during Optokinetic and Vestibular Eye Movement in Cats
- Quantitative Examinations for Human Arm Trajectory Planning in Three-Dimensional Space
- Optical Absorption in Silicon Oxide Film Near the SiO_2/Si Interface
- Optical Absorption in Ultrathin Silicon Oxide Film (SOLID STATE DEVICES AND MATERIALS 1)
- Fabrication of Storage Capacitance-Enhanced Capacitors with a Rough Electrode
- Acquisition and contextual switching of multiple internal models for different viscous force fields
- Acquisition of Multiple Internal Models under Multiple Viscous Force Fields
- 7-6 Perceived Motion in Structure-from-Motion : Pointing Responses to the Axis of Rotation
- Refractive Index Distribution in Photoresist Thin Film Formed by the Spin Coating Method
- Adhesion Improvement of Photoresist on TiN/Al Multilayer by Ozone Treatment
- Feedforward impedance control efficiently reduce motor variability
- A mathematical analysis of the characteristics of the system connecting the cerebellar ventral paraflocculus and extraoculomotor nucleus of alert monkeys during upward ocular following responses
- A mathematical model that reproduces vertical ocular following responses from visual stimuli by reproducing the simple spike firing frequency of Purkinje cells in the cerebellum
- Introduction : 1999 Special Issue Organisation of computation in brain-like systems
- Estimation of Arm Posture in 3D-Space from Surface EMG Signals Using a Neural Network Model (Special Issue on Neurocomputing)
- A tennis serve and upswing learning robot based on bi-directional theory
- A Computational Model for Recognizing Objects and Planning Hand Shapes in Grasping Movements