Conversational Speech Synthesis (and the need for some laughter)

Abstract

This paper describes our recent work on the synthesis of conversational speech, the result of collecting a very large corpus of expressive speech in normal everyday situations. Recent developments in concatenative techniques have let speech synthesis overcome the barrier of realistically portraying extra-linguistic information. They do this by using the actual voice of a recognisable person as the source of units, with minimal signal processing. Synthesis still faces the barrier of expressing paralinguistic information, i.e., the variety of types of speech and laughter that a person might use in everyday social interactions. Paralinguistic modification of an utterance conveys the speaker's affective states and his or her relationship with the listener through variations in the manner of speaking, by means of prosody and voice quality. These inflections are carried on the propositional content of an utterance and can perhaps be modelled by rule. However, they are also expressed through non-verbal utterances, whose complexity may be beyond the capabilities of many current synthesis methods. We show that this problem can be solved by using phrase-sized utterance units taken intact from a large corpus.
Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
2005-05-20