Lip Location Normalized Training for Visual Speech Recognition
Abstract
This paper describes a method for normalizing lip position to improve the performance of a speech recognition system based on visual information. Two types of information are useful for speech recognition: the speech signal itself and the visual information from the moving lips. This paper addresses problems that arise when images of the moving lips are used, in particular the effect of variation in lip location. The proposed method is based on a lip-position search algorithm in which location normalization is integrated into model training. Speaker-independent isolated word recognition experiments were carried out on the Tulips1 and M2VTS databases. On the ten-digit word recognition task of the M2VTS database, the method achieved a recognition rate of 74.5% and an error reduction rate of 35.7%.
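The idea of folding a lip-position search into training can be illustrated with a small sketch. The abstract does not say which model or search strategy the paper uses, so everything below is an assumption made for illustration: a per-pixel Gaussian template with fixed variance stands in for the recognition model, the search is an exhaustive scan over window positions, and the names `crop`, `log_likelihood`, `best_offset`, and `train_with_location_search` are hypothetical.

```python
import math


def crop(frame, top, left, height, width):
    """Return the height x width window of `frame` with top-left corner (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def log_likelihood(window, mean, var):
    """Log-likelihood of a window under an independent per-pixel Gaussian.

    Assumption: a fixed, shared variance `var`; this is a stand-in model,
    not the model used in the paper.
    """
    total = 0.0
    for window_row, mean_row in zip(window, mean):
        for x, m in zip(window_row, mean_row):
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - m) ** 2 / var)
    return total


def best_offset(frame, mean, var=1.0):
    """Exhaustively search for the window position that best fits the model."""
    height, width = len(mean), len(mean[0])
    candidates = [
        (top, left)
        for top in range(len(frame) - height + 1)
        for left in range(len(frame[0]) - width + 1)
    ]
    return max(
        candidates,
        key=lambda pos: log_likelihood(
            crop(frame, pos[0], pos[1], height, width), mean, var
        ),
    )


def train_with_location_search(frames, height, width, iterations=3, var=1.0):
    """Alternate between re-estimating the template and re-locating each window."""
    # Start from a centered window in every frame.
    offsets = [
        ((len(f) - height) // 2, (len(f[0]) - width) // 2) for f in frames
    ]
    mean = None
    for _ in range(iterations):
        # Re-estimate the template mean from the currently selected windows.
        windows = [
            crop(f, top, left, height, width)
            for f, (top, left) in zip(frames, offsets)
        ]
        mean = [
            [sum(w[i][j] for w in windows) / len(windows) for j in range(width)]
            for i in range(height)
        ]
        # Re-locate the window in each frame under the updated template.
        offsets = [best_offset(f, mean, var) for f in frames]
    return mean, offsets
```

Like any alternating scheme of this kind, the sketch can settle in a poor local optimum when the initial windows are badly misaligned, so the initial positions matter.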
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2000-11-25
Authors
-
Tokuda Keiichi
Nagoya Institute of Technology
-
Tokuda Keiichi
The authors are with the Nagoya Institute of Technology
-
KITAMURA Tadashi
The authors are with the Department of Information Processing, Interdisciplinary Graduate School of
-
Kitamura Tadashi
Nagoya Institute of Technology
-
VANEGAS Oscar
The authors are with the Nagoya Institute of Technology
-
TOKUDA Keiichi
The author is with the Department of Computer Science, Nagoya Institute of Technology
Related papers
- Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution (Special Issue on Biometric Person Authentication)
- A 16kb/s Wideband CELP-Based Speech Coder Using Mel-Generalized Cepstral Analysis
- Unsupervised speaker adaptation for speech-to-speech translation system (Language Understanding and Communication)
- Lip Location Normalized Training for Visual Speech Recognition