HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation

Source data 2009-03-01 The Institute of Electronics, Information and Communication Engineers (IEICE)

Abstract

This paper presents methods for controlling the intensity of emotional expressions and speaking styles in an arbitrary speaker's synthetic speech, using a small amount of his/her speech data, in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two different approaches are proposed for training a target speaker's MRHSMMs. The first is MRHSMM-based model adaptation, in which a pretrained MRHSMM is adapted to the target speaker's model. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second method uses simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs, which are then used to initialize the MRHSMM. From the results of a subjective evaluation using adaptation data of 50 sentences for each style, we show that the proposed methods outperform conventional speaker-dependent model training when the same amount of the target speaker's speech data is used.

Authors

Masuko Takashi Kinki University, Faculty of Pharmacy, Cell Biology
Tachibana Makoto Interdisciplinary Graduate School Of Science And Engineering Tokyo Institute Of Technology:(present
Masuko Takashi Department Of Agricultural And Biological Chemistry College Of Bioresource Sciences Nihon University
NOSE Takashi Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
Kobayashi Takao Interdisciplinary Graduate School Of Science And Engineering Tokyo Institute Of Technology
Matsuyama Taiji Department Of Pharmacy Shizuoka Kousei Hospital
