An Isolated Word Speech Recognition Based on Fusion of Visual and Auditory Information Using 30-frame/s and 24-bit Color Image (Special Section on Digital Signal Processing)
Abstract
In the field of speech recognition, many researchers have proposed recognition methods using auditory information, such as the acoustic signal, or visual information, such as the shape and motion of the lips. Auditory information provides effective features for speech recognition, but recognition is difficult in noisy environments. Visual information, on the other hand, has the advantage of robustness against acoustic noise, but it is difficult to extract effective features from it. Thus, using either auditory or visual information alone, accurate speech recognition is hard to achieve. In this paper, we propose a method that fuses auditory and visual information to realize more accurate speech recognition. The proposed method consists of two processes: (1) probabilities for the auditory and visual information are calculated by HMMs, and (2) these probabilities are fused by a linear combination. We have performed speech recognition experiments on isolated words, whose auditory information (22.05 kHz sampling, 8-bit quantization) and visual information (30-frame/s sampling, 24-bit quantization) were captured with a multimedia personal computer, and have confirmed the validity of the proposed method.
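The fusion step described in the abstract can be sketched as follows. This is a minimal illustration, assuming the HMM outputs are per-word log-likelihoods and using a hypothetical weight and word list; the paper itself computes the likelihoods from real acoustic and lip-image features.

```python
def fuse_scores(audio_logprobs, visual_logprobs, weight=0.7):
    """Fuse per-word HMM log-likelihoods by linear combination.

    `weight` is the mixing coefficient for the auditory stream;
    (1 - weight) goes to the visual stream. Both inputs map
    word -> log P(observation | word HMM). The weight value here
    is an illustrative assumption, not taken from the paper.
    """
    return {
        word: weight * audio_logprobs[word]
              + (1.0 - weight) * visual_logprobs[word]
        for word in audio_logprobs
    }

def recognize(audio_logprobs, visual_logprobs, weight=0.7):
    """Return the word whose fused score is highest."""
    fused = fuse_scores(audio_logprobs, visual_logprobs, weight)
    return max(fused, key=fused.get)

# Hypothetical log-likelihoods for three candidate words.
audio = {"hello": -12.3, "world": -15.1, "apple": -14.0}
visual = {"hello": -9.8, "world": -8.5, "apple": -11.2}
print(recognize(audio, visual, weight=0.7))  # -> hello
```

Setting the weight to 1.0 or 0.0 recovers audio-only or visual-only recognition, so the linear combination lets the system slide between the two streams as acoustic noise increases.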
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 1997-08-25
Authors
-
Ogihara Akio
College of Engineering, Osaka Prefecture University
-
Asao Shinobu
College of Engineering, Osaka Prefecture University
Related Papers
- An Analysis on Minimum Searching Principle of Chaotic Neural Network (Special Section of Selected Papers from the 8th Karuizawa Workshop on Circuits and Systems)
- A Study on Mouth Shape Features Suitable for HMM Speech Recognition Using Fusion of Visual and Auditory Information
- Speech Recognition Using HMM Based on Fusion of Visual and Auditory Information : Special Section of Letters Selected from 1994 IEICE Spring Conference
- Asymmetric Neural Network and Its Application to Knapsack Problem
- An Improvement of the Pseudoinverse Rule with Diagonal Elements (Special Section of Papers Selected from JTC-CSCC'93)
- An Isolated Word Speech Recognition Using Fusion of Auditory and Visual Information (Special Section of Papers Selected from JTC-CSCC'95)
- Binary Neural Network with Negative Self-Feedback and Its Application to N-Queens Problem (Special Issue on Neurocomputing)
- An Extraction Method of Lip Shape for Independent Speaker
- A Correcting Method for Pitch Extraction Using Neural Networks (Special Section of Papers Selected from JTC-CSCC'93)