Audio-Visual Speech Recognition Based on Optimized Product HMMs and GMM Based-MCE-GPD Stream Weight Estimation (Special Issue on Speech Information Processing)
Abstract
In this paper, we describe an adaptive integration method for an audio-visual speech recognition system that uses not only the speaker's audio speech signal but also visual speech signals such as lip images. Human beings communicate with each other by integrating multiple types of sensory information, such as hearing and vision, and such integration can also be applied to automatic speech recognition. In integrating audio and visual speech features for speech recognition, there are two important issues: (1) a model that represents the synchronous and asynchronous characteristics between audio and visual features, and that makes the best use of a whole database including uni-modal (audio-only or visual-only) data as well as audio-visual data, and (2) the adaptive estimation of reliability weights for the audio and visual information. This paper investigates these two issues and proposes a novel method to effectively integrate audio and visual information in an audio-visual Automatic Speech Recognition (ASR) system. First, as the model that integrates audio-visual speech information, we apply a product of hidden Markov models (product HMM), i.e., the product of an audio HMM and a visual HMM. Whereas the original product HMM assumes independence between the audio and visual features, we newly propose a method that re-estimates the product HMM using audio-visual synchronous speech data so as to train the synchronicity of the audio-visual information. Second, for optimal estimation of the audio-visual information reliability weights, we propose a Gaussian mixture model (GMM) based MCE-GPD (minimum classification error and generalized probabilistic descent) algorithm, which reduces the amount of adaptation data and the amount of computation required for the estimation. Evaluation experiments show that the proposed audio-visual speech recognition system improves recognition accuracy over conventional systems even when the audio signal is clean.
- Publication date: 2003-03-01
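The abstract above describes two components: a product HMM re-estimated on synchronous audio-visual data, and GMM-based MCE-GPD estimation of the audio/visual stream reliability weights. As a rough illustration of the second component only, the Python sketch below combines per-class audio and visual log-likelihoods with a single audio stream weight and applies a sigmoid-loss GPD update per training token; all names (`av_log_likelihood`, `mce_gpd_weight_update`, `step`, `eta`) and the toy demo data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def av_log_likelihood(logp_audio, logp_visual, lam):
    """Stream-weighted audio-visual score: lam*logP_A + (1-lam)*logP_V."""
    return lam * logp_audio + (1.0 - lam) * logp_visual

def mce_gpd_weight_update(logp_audio, logp_visual, correct, lam,
                          step=0.05, eta=1.0):
    """One GPD step on the audio stream weight for a single training token."""
    scores = av_log_likelihood(logp_audio, logp_visual, lam)
    rivals = [k for k in range(len(scores)) if k != correct]
    rival = max(rivals, key=lambda k: scores[k])        # best competing class
    d = scores[rival] - scores[correct]                 # misclassification measure
    loss = 1.0 / (1.0 + np.exp(-eta * d))               # smoothed 0-1 loss
    # derivative of the misclassification measure w.r.t. the audio weight lam
    dd_dlam = ((logp_audio[rival] - logp_visual[rival])
               - (logp_audio[correct] - logp_visual[correct]))
    grad = eta * loss * (1.0 - loss) * dd_dlam
    lam = float(np.clip(lam - step * grad, 0.0, 1.0))   # keep weight in [0, 1]
    return lam, loss

if __name__ == "__main__":
    # Toy demo: the audio stream is informative and the visual stream is noise,
    # so the learned audio weight should drift toward 1.0.
    rng = np.random.default_rng(0)
    lam = 0.5
    for _ in range(200):
        correct = int(rng.integers(10))
        logp_a = rng.normal(size=10)
        logp_a[correct] += 2.0            # audio likelihood favors the true class
        logp_v = rng.normal(size=10)      # visual likelihoods carry no information
        lam, _ = mce_gpd_weight_update(logp_a, logp_v, correct, lam)
    print(f"learned audio stream weight: {lam:.2f}")
```

In the paper, per-class scores of this kind come from class-dependent GMMs rather than the toy likelihoods used here, which is what keeps the adaptation data and computation requirements small compared with HMM-based weight estimation.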
Authors
- Nakamura Satoshi (ATR Spoken Language Communication Research Laboratories, Kyoto, Japan)
- KUMATANI Kenichi (Graduate School of Information Science, Nara Institute of Science and Technology; present address: Sharp Corporation)
Related Papers
- Noise and Channel Distortion Robust ASR System for DARPA SPINE2 Task (Special Issue on Speech Information Processing)
- A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech (Special Issue on Speech Information Processing)
- AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition (Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Missing Feature Theory Applied to Robust Speech Recognition over IP Network (Speech Dynamics by Ear, Eye, Mouth and Machine)
- CENSREC-3: An Evaluation Framework for Japanese Speech Recognition in Real Car-Driving Environments (Speech and Hearing)
- A Design for a Collaborative Steering System of Microphone Array and Video Camera Toward Multi-Lingual Tele-Conference (Special Issue on Innovation and Practical Application of Interaction Technology)
- A design of adaptive beamformer based on average speech spectrum for noisy speech recognition
- A Microphone Array-Based 3-D N-Best Search Method for Recognizing Multiple Sound Sources
- The present status, progress, and usage of speech databases in Japan
- IMPROVING ACCURACY IN PARAMETER ESTIMATION IN AN EXTENDED KALMAN PARTICLE FILTERS FOR NOISY SPEECH RECOGNITION
- ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles (Speech Recognition, Statistical Modeling for Speech Processing)
- Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation (Life-like Agent and its Communication)
- Passive hybrid subtractive beamformer for near-field sound sources
- An Acoustic Modeling Method Robust against Changes of Speaking Style in Error Recovery
- A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency (Speech Recognition, Statistical Modeling for Speech Processing)
- Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework (Speech Recognition, Statistical Modeling for Speech Processing)
- A Hybrid HMM/BN Acoustic Model for Automatic Speech Recognition (Special Issue on Speech Information Processing)
- MIXTURE OF FACTOR ANALYZED HMM
- Iterative Estimation and Compensation of Signal Direction for Moving Sound Source by Mobile Microphone Array (Engineering Acoustics)
- TIME-VARYING NOISE COMPENSATION BY SEQUENTIAL MONTE CARLO METHOD
- Burst Error Recovery for Huffman Coding (Algorithm Theory)