Speech Recognition Using Finger Tapping Timings (Speech and Hearing)
Abstract
Behavioral synchronization between speech and finger tapping provides a novel approach to improving speech recognition accuracy. We incorporate a sequence of finger tapping timings, recorded alongside an utterance, into recognition using two distinct methods: in the first, HMM state transition probabilities at word boundaries are controlled by the timing of the finger tapping; in the second, the probability (relative frequency) of finger tapping is used as a 'feature' and combined with MFCC in an HMM recognition system. We evaluate these methods through connected digit recognition under different noise conditions (AURORA-2J). Leveraging the synchrony between speech and finger tapping provides a 46% relative improvement in connected digit recognition experiments.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2005-03-01
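The two ways of using tap timings described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 10 ms frame shift, the Gaussian bump used to turn tap times into a per-frame score (the abstract instead mentions a relative frequency), and the names `tap_feature`, `append_tap_feature`, `boundary_log_prob`, `weight`, and `floor` are all assumptions made for illustration.

```python
import numpy as np

FRAME_SHIFT_S = 0.010  # assumed frame shift of 10 ms (not stated in the abstract)


def tap_feature(tap_times_s, n_frames, sigma_frames=5.0):
    """Per-frame tap score in [0, 1].

    Assumption: each tap contributes a Gaussian bump centred on its frame;
    the paper's relative-frequency estimate may differ.
    """
    frames = np.arange(n_frames)
    score = np.zeros(n_frames)
    for t in tap_times_s:
        center = t / FRAME_SHIFT_S
        bump = np.exp(-0.5 * ((frames - center) / sigma_frames) ** 2)
        score = np.maximum(score, bump)
    return score


def append_tap_feature(mfcc, tap_times_s):
    """Second method (sketch): add the tap score as one extra feature dimension."""
    score = tap_feature(tap_times_s, mfcc.shape[0])
    return np.hstack([mfcc, score[:, None]])


def boundary_log_prob(base_log_prob, tap_score, weight=2.0, floor=1e-3):
    """First method (sketch): per-frame log-probability of a word-boundary
    transition, left unchanged at a tap and penalised away from taps.
    `weight` and `floor` are hypothetical tuning parameters.
    """
    return base_log_prob + weight * np.log(np.maximum(tap_score, floor))


# Usage with placeholder MFCC frames (100 frames, 13 coefficients)
# and two hypothetical tap times in seconds.
mfcc = np.zeros((100, 13))
taps = [0.25, 0.75]
features = append_tap_feature(mfcc, taps)
log_p = boundary_log_prob(np.log(0.1), tap_feature(taps, 100))
```

In this sketch a decoder would consume `features` in place of the plain MFCC frames, or use `log_p` as the time-varying word-boundary transition score; how the two are weighted against the acoustic likelihood is left open.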
Authors
-
TAKEDA Kazuya
Nagoya University
-
Takeda Kazuya
Nagoya Univ.
-
MIYAJIMA Chiyomi
The authors are with the Department of Computer Science, Nagoya Institute of Technology
-
Takeda Kazuya
Nagoya Univ. Nagoya‐shi Jpn
-
MIYAJIMA Chiyomi
Nagoya University
-
Itou Katsunobu
Faculty of Computer and Information Sciences, Hosei University
-
Miyajima Chiyomi
The Graduate School of Information Science, Nagoya University
-
ITAKURA Fumitada
Graduate School of Information Engineering, Meijo University
-
Itakura Fumitada
The Faculty of Science and Technology, Meijo University
-
BAN Hiromitsu
The author is with the Graduate School of Engineering, Nagoya University
-
ITOU Katsunobu
The authors are with the Graduate School of Information Science, Nagoya University
-
TAKEDA Kazuya
The authors are with the Graduate School of Information Science, Nagoya University
-
ITAKURA Fumitada
The author is with Meijo University
-
Ban Hiromitsu
The author is with the Graduate School of Engineering, Nagoya University
Related papers
- Acoustic Feature Transformation Combining Average and Maximum Classification Error Minimization Criteria
- Acoustic Feature Transformation Based on Discriminant Analysis Preserving Local Structure for Speech Recognition
- CENSREC-1-C: An evaluation framework for voice activity detection under noisy environments
- Driver Identification Using Driving Behavior Signals (Human-computer Interaction)
- Text-Independent Speaker Identification Using Gaussian Mixture Models Based on Multi-Space Probability Distribution (Special Issue on Biometric Person Authentication)
- AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition (Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- CENSREC-3: An Evaluation Framework for Japanese Speech Recognition in Real Car-Driving Environments (Speech and Hearing)
- Evaluation of HRTFs estimated using physical features
- Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training
- Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech Enhancement (Speech Enhancement, Statistical Modeling for Speech Processing)
- Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation
- SNR and sub-band SNR estimation based on Gaussian mixture modeling in the log power domain with application for speech enhancements (6th Spoken Language Symposium)
- Driver's irritation detection using speech recognition results (Speech / 10th Spoken Language Symposium)
- Driver's irritation detection using speech recognition results (Spoken Language Information Processing)
- Driver's irritation detection using speech recognition results (Language Understanding and Communication / 10th Spoken Language Symposium)
- Predicting the Degradation of Speech Recognition Performance from Sub-band Dynamic Ranges (Special Issue: Spoken Language Information Processing and Its Applications)
- A model of perceptual distance for group delays based on ellipsoidal mapping
- An Acoustically Oriented Vocal-Tract Model
- Estimation of speaker and listener positions in a car using binaural signals
- Sound localization under conditions of covered ears on the horizontal plane
- Single-Channel Multiple Regression for In-Car Speech Enhancement
- Adaptive Nonlinear Regression Using Multiple Distributed Microphones for In-Car Speech Recognition (Speech Enhancement, Multi-channel Acoustic Signal Processing)
- Speech Recognition Using Finger Tapping Timings (Speech and Hearing)
- CIAIR In-Car Speech Corpus: Influence of Driving Status (Corpus-Based Speech Technologies)
- Construction and Evaluation of a Large In-Car Speech Corpus (Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Method for determining sound localization by auditory masking
- Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition
- CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments
- A Graph-Based Spoken Dialog Strategy Utilizing Multiple Understanding Hypotheses