Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis
Abstract
Speech synthesis based on hidden Markov models (HMMs) processes the segmental and prosodic features of speech together, frame by frame. One benefit of this approach is that the time alignment between the two kinds of features is preserved automatically. However, when the training data are limited, a frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning long stretches of time, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. A method was developed to modify F0 contours within the framework of the generation process model (henceforth, the F0 model) by referring to linguistic information of the input text (word boundaries and accent types). The method takes into account the F0 variances obtained through HMM-based speech synthesis. In a listening experiment on synthetic speech, the method was shown to yield better quality on average than HMM-based speech synthesis alone. Since the F0 model clearly relates its commands to linguistic (and para-/non-linguistic) information, the method has an additional advantage: speech styles can be changed, and further information (such as emphasis) added, simply by manipulating the commands.
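The abstract gives no equations for the F0 model. As a rough illustration only, the sketch below assumes the widely used formulation of the F0 generation process model (often called the Fujisaki model), in which log F0 is a baseline plus the responses to phrase commands (impulses) and accent commands (step-like pulses). All function names, the constants `ALPHA`, `BETA`, and `GAMMA`, and the command values are illustrative assumptions, not values taken from the paper, and the sketch does not reproduce the paper's variance-aware fitting procedure.

```python
import math

# Assumed, commonly quoted constants; not taken from the paper.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_f0, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + phrase components + accent components."""
    value = math.log(base_f0)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (accent_response(t - onset) - accent_response(t - offset))
    return value


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """F0 in Hz at each time point (seconds)."""
    return [math.exp(log_f0(t, base_f0, phrase_cmds, accent_cmds)) for t in times]


# Hypothetical utterance: one phrase command and one accent command.
times = [i * 0.005 for i in range(400)]   # 5 ms frames over 2 s
phrase_cmds = [(0.0, 0.5)]                # (onset time, magnitude)
accent_cmds = [(0.3, 0.8, 0.4)]           # (onset, offset, amplitude)
contour = f0_contour(times, 100.0, phrase_cmds, accent_cmds)

# Emphasis expressed by enlarging the accent command amplitude only.
emphasized = f0_contour(times, 100.0, phrase_cmds, [(0.3, 0.8, 0.6)])
```

Under these assumptions, the whole contour is described by a handful of command parameters tied to phrases and accented words, which is why editing a command, as in `emphasized`, changes the contour over the corresponding word rather than frame by frame.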
- Paper from the Acoustical Society of Japan
Authors
- MINEMATSU Nobuaki, Graduate School of Information Science and Technology, The University of Tokyo
- HIROSE Keikichi, Graduate School of Information Science and Technology, The University of Tokyo
- MATSUDA Tetsuya, Graduate School of Information Science and Technology, The University of Tokyo
Related papers
- Regularized Maximum Likelihood Linear Regression Adaptation for Computer-Assisted Language Learning Systems
- Speaker Verification in Realistic Noisy Environment in Forensic Science
- Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis