Noise-Robust Speech Detection Using Combined Image and Audio Information

Abstract

In this paper, we propose a method that detects speech using both audio and visual modalities. It is well known that the accuracy of speech detection affects speech recognition accuracy. Because detection based on the audio modality is intrinsically disturbed by audio noise, we have been studying speech detection based on the video modality. That method is robust not only to audio noise but also to the speaker's motion and other disturbances in the video modality. However, its detection is less precise, because speech-related motion intrinsically lasts longer than the speech itself. We therefore propose a bimodal speech detection method. The proposed method is able to eliminate false detections caused by audio noise. Experiments confirm that the proposed method improves word accuracy not only in clean conditions but also in noisy conditions (SNR 10 dB).
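The abstract does not state how the two modalities are combined. As an illustration only, the sketch below assumes the simplest possible rule: a frame counts as speech when both an audio detector and a video detector mark it active. The function names `fuse_detections` and `to_segments`, the per-frame boolean masks, and the AND rule itself are hypothetical and are not taken from the paper. Under this assumption, a noise burst seen only by the audio detector is rejected, and the longer video-detected interval is trimmed to the frames where audio is also active.

```python
from typing import List, Tuple


def fuse_detections(audio_active: List[bool], video_active: List[bool]) -> List[bool]:
    """Combine per-frame decisions with a logical AND (an assumed fusion rule)."""
    if len(audio_active) != len(video_active):
        raise ValueError("audio and video masks must cover the same frames")
    return [a and v for a, v in zip(audio_active, video_active)]


def to_segments(mask: List[bool]) -> List[Tuple[int, int]]:
    """Convert a per-frame mask into (start, end) frame ranges, end exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, active in enumerate(mask):
        if active and start is None:
            start = i  # a new active run begins
        elif not active and start is not None:
            segments.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(mask)))  # run continues to the last frame
    return segments


# Hypothetical data: frame 1 is an audio-only noise burst;
# the video detector fires early (frame 3) and late (frames 7-8).
audio = [False, True, False, False, True, True, True, False, False, False]
video = [False, False, False, True, True, True, True, True, True, False]

print(to_segments(fuse_detections(audio, video)))
```

With these made-up masks, the audio-only burst at frame 1 and the video-only frames around the utterance are both discarded, leaving a single fused segment.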
Paper published by the Information Processing Society of Japan (一般社団法人情報処理学会)
2001-07-13