Automatic Generation of Lip Motion Images from Text and Speech

Abstract

This paper presents an HMM-based technique for synthesizing lip movements that are synchronized with a given utterance. In the training stage, speech-unit HMMs are trained on audio and visual parameter vector sequences that represent the speech and the mouth shapes. The speech-unit HMMs are then split into a speech part and a visual-parameter part. In the recognition stage, the input speech is converted into a transcription and a state sequence using the speech part of the HMMs. In the synthesis stage, a sentence HMM is constructed by concatenating the visual-parameter parts of the HMMs that correspond to the transcription of the given speech. An optimal parameter vector sequence in the maximum-likelihood (ML) sense is then obtained from the sentence HMM. Because the generated parameter sequence reflects the statistics of both static and dynamic features, the synthesized lip animation is smooth and natural.
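The synthesis step can be illustrated with a minimal sketch of ML parameter generation under static and dynamic feature constraints. The abstract does not give the details, so everything specific here is an assumption made for illustration: a single one-dimensional visual parameter (for example, mouth opening), diagonal Gaussian statistics already assigned to each frame by the state sequence, a central-difference delta window, and a plain dense solver. The function names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: ML trajectory generation for one visual parameter.
# Assumed (not from the abstract): 1-D parameter, per-frame Gaussian means and
# variances for static and delta features, delta[t] = 0.5 * (c[t+1] - c[t-1]).

def delta_row(t, T):
    """Row of the delta window for frame t, with indices clamped at the edges."""
    row = [0.0] * T
    row[min(t + 1, T - 1)] += 0.5
    row[max(t - 1, 0)] -= 0.5
    return row

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def generate_trajectory(static_mean, static_var, delta_mean, delta_var):
    """Return the static sequence c that maximizes the Gaussian likelihood of
    the stacked static and delta observations W c.

    Setting the gradient to zero gives the normal equations
    (W^T P W) c = W^T P mu, where P holds the per-row precisions (1 / variance).
    """
    T = len(static_mean)
    rows, means, precisions = [], [], []
    for t in range(T):  # static rows: c[t] itself
        unit = [0.0] * T
        unit[t] = 1.0
        rows.append(unit)
        means.append(static_mean[t])
        precisions.append(1.0 / static_var[t])
    for t in range(T):  # dynamic rows: delta of c at frame t
        rows.append(delta_row(t, T))
        means.append(delta_mean[t])
        precisions.append(1.0 / delta_var[t])
    A = [[sum(p * r[i] * r[j] for r, p in zip(rows, precisions))
          for j in range(T)] for i in range(T)]
    b = [sum(p * r[i] * m for r, p, m in zip(rows, precisions, means))
         for i in range(T)]
    return solve(A, b)

if __name__ == "__main__":
    # A step in the static means (closed mouth, then open mouth).
    step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    smooth = generate_trajectory(step, [0.1] * 6, [0.0] * 6, [0.01] * 6)
    print([round(v, 3) for v in smooth])
```

With a small delta variance, the dynamic-feature term penalizes abrupt frame-to-frame changes, so the step in the static means is spread over neighboring frames; with a very large delta variance the result falls back to the static means. A practical system would use multi-dimensional parameters and a banded solver rather than dense elimination.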
1998-06-01