Omnidirectional Audio-Visual Talker Localization Based on Dynamic Fusion of Audio-Visual Features Using Validity and Reliability Criteria
Abstract
This paper proposes a robust omnidirectional audio-visual (AV) talker localizer for AV applications. The proposed localizer incorporates two innovations. The first is a set of robust omnidirectional audio and visual features: the AV features are extracted by direction-of-arrival (DOA) estimation using an equilateral triangular microphone array and by human position estimation using an omnidirectional video camera. The second is a dynamic fusion of the AV features. A validity criterion, called the audio- or visual-localization counter, validates each audio or visual feature. A reliability criterion, called the speech arriving evaluator, acts as a dynamic weight so that the fusion procedure does not depend on any prior statistical properties. Under a single, identical fusion rule, the proposed localizer achieves both talker localization during speech activity and user localization during non-speech activity. Talker localization experiments were conducted in an actual room to evaluate the effectiveness of the proposed localizer. The results confirmed that the proposed AV localizer using the validity and reliability criteria outperforms conventional localizers in talker localization.
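The abstract names the building blocks but not their equations, so the following Python/NumPy sketch is only a hypothetical illustration of the two ideas, not the authors' implementation. It assumes a far-field plane-wave model with known pairwise time differences of arrival (TDOAs) for the equilateral triangular array, replaces the audio-/visual-localization counter with a simple "detected in at least `min_count` consecutive frames" test, and treats the speech arriving evaluator's output as a score `speech_score` in [0, 1] used directly as the audio weight. All function names, the 10 cm array size, and the 10-degree likelihood width are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C
BINS = np.arange(360)   # one-degree azimuth bins (assumed resolution)


def triangle_mic_positions(side: float) -> np.ndarray:
    """2-D positions (metres) of three mics on an equilateral triangle centred at the origin."""
    r = side / np.sqrt(3.0)  # circumradius of an equilateral triangle
    ang = np.deg2rad([90.0, 210.0, 330.0])
    return np.stack([r * np.cos(ang), r * np.sin(ang)], axis=1)


def doa_from_tdoas(mics: np.ndarray, tdoas: dict) -> float:
    """Far-field azimuth in degrees [0, 360) from pairwise TDOAs by least squares.

    tdoas[(i, j)] holds t_i - t_j in seconds. For a plane wave arriving from
    unit direction u, t_i - t_j = (p_j - p_i) . u / c, which is linear in u.
    """
    rows = [mics[j] - mics[i] for (i, j) in tdoas]
    rhs = [SPEED_OF_SOUND * dt for dt in tdoas.values()]
    u, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return float(np.degrees(np.arctan2(u[1], u[0])) % 360.0)


def angular_likelihood(center_deg: float, width_deg: float = 10.0) -> np.ndarray:
    """Gaussian bump on the circle, normalised to sum to 1 (stand-in for a real feature map)."""
    d = (BINS - center_deg + 180.0) % 360.0 - 180.0  # wrapped angular distance
    lik = np.exp(-0.5 * (d / width_deg) ** 2)
    return lik / lik.sum()


def fuse(audio_lik, visual_lik, audio_count, visual_count, speech_score, min_count=3):
    """Hypothetical validity-gated, reliability-weighted AV fusion.

    Validity: a modality is used only if its (hypothetical) counter has reached
    min_count. Reliability: speech_score acts as the dynamic audio weight, so
    speech frames lean on audio (talker) and silent frames lean on video (user).
    Returns the fused azimuth bin, or None if neither modality is valid.
    """
    audio_ok = audio_count >= min_count
    visual_ok = visual_count >= min_count
    if not (audio_ok or visual_ok):
        return None
    alpha = float(np.clip(speech_score, 0.0, 1.0))
    if not visual_ok:
        alpha = 1.0  # only the audio feature is valid
    elif not audio_ok:
        alpha = 0.0  # only the visual feature is valid
    fused = alpha * audio_lik + (1.0 - alpha) * visual_lik
    return int(np.argmax(fused))


if __name__ == "__main__":
    mics = triangle_mic_positions(side=0.1)  # hypothetical 10 cm array
    true_az = 60.0
    u = np.array([np.cos(np.radians(true_az)), np.sin(np.radians(true_az))])
    t = -mics @ u / SPEED_OF_SOUND  # relative plane-wave arrival times
    tdoas = {(i, j): t[i] - t[j] for i, j in [(0, 1), (0, 2), (1, 2)]}
    a = angular_likelihood(doa_from_tdoas(mics, tdoas))
    v = angular_likelihood(120.0)  # pretend the camera sees a person at 120 degrees
    print(fuse(a, v, 5, 5, speech_score=0.9))  # speech frame: audio dominates
    print(fuse(a, v, 5, 5, speech_score=0.1))  # non-speech frame: video dominates
```

Because one weighted sum serves both cases, the same rule yields talker localization when the speech score is high and user localization when it is low, which mirrors the behaviour the abstract describes; how the actual paper computes the counter and the evaluator is not stated here.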
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-03-01
Authors
- DENDA Yuki (Graduate School of Science and Engineering, Ritsumeikan University)
- YAMASHITA Yoichi (College of Information Science and Engineering, Ritsumeikan University, Kusatsu-shi, Japan)
- NISHIURA Takanobu (College of Information Science and Engineering, Ritsumeikan University, Kusatsu-shi, Japan)
Related Papers
- CENSREC-1-C: An evaluation framework for voice activity detection under noisy environments
- AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition (Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Robust Talker Direction Estimation Based on Weighted CSP Analysis and Maximum Likelihood Estimation (Speech Enhancement, Statistical Modeling for Speech Processing)
- Multiple Sound Source Localization Based on Inter-Channel Correlation Using a Distributed Microphone System in a Real Environment
- Construction of a Test Collection for Spoken Document Retrieval from Lecture Audio Data
- Multiple-nulls-steering beamformer based on both talker and noise direction-of-arrival estimation