Voice Activity Detection Based on High Order Statistics and Online EM Algorithm
Abstract
A new online, unsupervised voice activity detection (VAD) method is proposed. The method is based on a feature derived from high-order statistics (HOS), enhanced by a second metric based on normalized autocorrelation peaks to improve its robustness to non-Gaussian noise. The feature is also designed to discriminate between close-talk and far-field speech, which yields a VAD method for human-to-human interaction that does not depend on the energy level. Classification is performed by an online variant of the Expectation-Maximization (EM) algorithm, which tracks and adapts to noise variations in the speech signal. The method is evaluated on in-house data and on CENSREC-1-C, a publicly available database used for VAD in the context of automatic speech recognition (ASR). On both test sets, the proposed method outperforms a simple energy-based algorithm and is shown to be more robust to changes in speech sparsity, SNR variability, and noise type.
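The abstract names three ingredients (an HOS feature, a normalized-autocorrelation cue, and online EM classification) but gives no formulas. The Python sketch that follows shows one way these pieces could fit together under stated assumptions; it is not the authors' method. Assumptions: excess kurtosis stands in for the HOS feature, the two cues are fused by a hypothetical sum (`frame_feature`), and classification uses stepwise online EM on a two-component Gaussian mixture with exponential forgetting, where the higher-mean component is taken to be speech. All names (`frame_signal`, `excess_kurtosis`, `autocorr_peak`, `frame_feature`, `OnlineEM2`, `detect`) and parameter values (frame length, hop, lag range, `step`) are illustrative.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])


def excess_kurtosis(frame):
    """HOS cue (assumed): excess kurtosis, near 0 for Gaussian noise."""
    c = frame - frame.mean()
    var = np.mean(c ** 2) + 1e-12
    return np.mean(c ** 4) / var ** 2 - 3.0


def autocorr_peak(frame, min_lag=20, max_lag=160):
    """Largest normalized autocorrelation over an assumed pitch-lag range."""
    c = frame - frame.mean()
    energy = np.dot(c, c) + 1e-12
    lags = range(min_lag, min(max_lag, len(c) - 1))
    return max(np.dot(c[:-k], c[k:]) / energy for k in lags)


def frame_feature(frame):
    """Hypothetical fusion of both cues into one scalar per frame."""
    return np.log1p(max(excess_kurtosis(frame), 0.0)) + autocorr_peak(frame)


class OnlineEM2:
    """Two-component 1-D Gaussian mixture fitted by stepwise (online) EM.

    Sufficient statistics are exponentially forgotten with rate `step`,
    so the noise and speech components keep adapting over time.
    """

    def __init__(self, means=(0.0, 1.0), var=1.0, step=0.05, var_floor=1e-3):
        w = np.array([0.5, 0.5])
        mu = np.asarray(means, dtype=float)
        self.step = step
        self.var_floor = var_floor
        # Per-component running averages of r, r*f and r*f^2
        self.s0 = w.copy()
        self.s1 = w * mu
        self.s2 = w * (var + mu ** 2)

    def params(self):
        """Current means and (floored) variances derived from the statistics."""
        mu = self.s1 / self.s0
        var = np.maximum(self.s2 / self.s0 - mu ** 2, self.var_floor)
        return mu, var

    def update(self, f):
        """Process one feature value; return True if the frame looks like speech."""
        mu, var = self.params()
        # E-step in the log domain: mixing weight times Gaussian likelihood
        log_lik = (np.log(self.s0) - 0.5 * np.log(2 * np.pi * var)
                   - 0.5 * (f - mu) ** 2 / var)
        r = np.exp(log_lik - log_lik.max())
        r /= r.sum()
        # Stochastic M-step: blend this frame's statistics into the running ones
        g = self.step
        self.s0 = (1 - g) * self.s0 + g * r
        self.s1 = (1 - g) * self.s1 + g * r * f
        self.s2 = (1 - g) * self.s2 + g * r * f * f
        # Assumption: the component with the larger mean feature is speech
        return bool(r[np.argmax(mu)] > 0.5)


def detect(x, frame_len=256, hop=128, step=0.05):
    """Frame-level speech/non-speech decisions for a 1-D signal."""
    model = OnlineEM2(step=step)
    frames = frame_signal(x, frame_len, hop)
    return np.array([model.update(frame_feature(fr)) for fr in frames])
```

Keeping running sufficient statistics rather than the parameters themselves means a component that receives no frames (for example, the noise component during a long speech run) keeps its mean and variance while only its weight decays. Decisions in the first frames are unreliable until both components have seen data, and `step` trades adaptation speed against stability.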
- Paper from the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-12-01
Authors
- Kawahara Tatsuya (School of Informatics, Kyoto University)
- Cournapeau David (School of Informatics, Kyoto University)
Related papers
- Modeling and automatic detection of English sentence stress for computer-assisted English prosody learning system
- Voice Activity Detection Based on High Order Statistics and Online EM Algorithm
- Language Model Adaptation Based on PLSA of Topics and Speakers for Automatic Transcription of Panel Discussions (Spoken Language Systems, Corpus-Based Speech Technologies)
- Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching (Spoken Language Systems, Corpus-Based Speech Technologies)
- Difference of acoustic modeling for read speech and dialogue speech