Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones (Feature Extraction and Acoustic Modelings, <Special Section> Corpus-Based Speech Technologies)
Abstract
This paper describes a new multi-channel method for noisy speech recognition that estimates the log spectrum of speech at a close-talking microphone by multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) it requires neither a sensitive geometric layout, nor calibration of the sensors, nor additional pre-processing for tracking the speech source; 2) it requires very little computation; and 3) the regression weights can be statistically optimized over the given training data. Once the optimal regression weights have been obtained by regression learning, they can be used to generate the estimated log spectrum in the recognition phase, where close-talking speech is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. Compared with the nearest distant microphone and a multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2005-03-01
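The regression idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one independent set of weights per frequency bin, an added bias term, and ordinary least squares as the "statistical optimization". The function names `fit_mrls` and `apply_mrls` and the array layouts are hypothetical.

```python
import numpy as np

def fit_mrls(distant_logspec, close_logspec):
    """Learn regression weights from training data (assumed OLS fit).

    distant_logspec: array (frames, mics, bins), log spectra of the distant microphones
    close_logspec:   array (frames, bins), log spectrum of the close-talking microphone
    Returns an array (bins, mics + 1); the last column is the bias term.
    """
    frames, mics, bins = distant_logspec.shape
    weights = np.empty((bins, mics + 1))
    for k in range(bins):
        # Design matrix for bin k: one column per microphone plus a constant column.
        X = np.hstack([distant_logspec[:, :, k], np.ones((frames, 1))])
        w, *_ = np.linalg.lstsq(X, close_logspec[:, k], rcond=None)
        weights[k] = w
    return weights

def apply_mrls(distant_logspec, weights):
    """Estimate the close-talking log spectrum from the distant microphones only."""
    mics = distant_logspec.shape[1]
    # Weighted sum over microphones for every frame and bin, plus the per-bin bias.
    return np.einsum("fmk,km->fk", distant_logspec, weights[:, :mics]) + weights[:, mics]
```

In this sketch the close-talking signal is needed only by `fit_mrls`; at recognition time `apply_mrls` uses the distant-microphone log spectra and the stored weights, which matches the abstract's statement that close-talking speech is no longer required in the recognition phase.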
Authors
- TAKEDA Kazuya, Nagoya University
- TAKEDA Kazuya, Department of Nuclear Engineering, School of Engineering, Tokai University
- Takeda Kazuya, Nagoya Univ.
- Takeda Kazuya, Department Of Information Electronics Graduate School Of Engineering Nagoya University
- MIYAJIMA Chiyomi, Department of Computer Science and Engineering, Nagoya Institute of Technology
- Takeda Kazuya, Nagoya Univ. Nagoya‐shi Jpn
- MIYAJIMA Chiyomi, Nagoya University
- Li Weifeng, The Department Of Information Electronics Graduate School Of Engineering Nagoya University
- Shinde T, Department Of Information Electronics Graduate School Of Engineering Nagoya University
- LI Weifeng, Department of Information Electronics, Graduate School of Engineering, Nagoya University
- SHINDE Tetsuya, Department of Information Electronics, Graduate School of Engineering, Nagoya University
- FUJIMURA Hiroshi, Department of Media Science, Graduate School of Information Science, Nagoya University
- NISHINO Takanori, Department of Media Science, Graduate School of Information Science, Nagoya University
- ITOU Katunobu, Department of Media Science, Graduate School of Information Science, Nagoya University
- ITAKURA Fumitada, Faculty of Science and Technology, Meijo University
- Itou Katsunobu, Faculty Of Computer And Information Sciences Hosei University
- Miyajima Chiyomi, The Graduate School Of Information Science Nagoya University
- ITAKURA Fumitada, Graduate School of Information Engineering, Meijo University
- Shinde Tetsuya, Department Of Information Electronics Graduate School Of Engineering Nagoya University
- Nishino Takanori, Center For Information Media Studies Nagoya University
- Itakura Fumitada, The Faculty Of Science And Technology Meijo University
- Fujimura Hiroshi, Department Of Media Science Graduate School Of Information Science Nagoya University
- Nishino Takanori, Mie Univ. Tsu‐shi Jpn
- LI Weifeng, Department of Electronic Engineering, Graduate School at Shenzhen, Tsinghua University, Beijing, China and Shenzhen Key Laboratory of Information Science and Technology
Related papers
- Acoustic Feature Transformation Combining Average and Maximum Classification Error Minimization Criteria
- Acoustic Feature Transformation Based on Discriminant Analysis Preserving Local Structure for Speech Recognition
- Basic Experiments on a Gas Divertor Using Magnetized Sheet Plasma
- CENSREC-1-C : An evaluation framework for voice activity detection under noisy environments
- Driver Identification Using Driving Behavior Signals(Human-computer Interaction)
- On the Use of Kernel PCA for Feature Extraction in Speech Recognition(Speech and Hearing)
- IMPROVEMENT OF CHOLEDOCHOSCOPY : CHROMOENDOCHOLEDOCHOSCOPY, AUTOFLUORESCENCE IMAGING, OR NARROW-BAND IMAGING
- AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition(Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Selective Listening Point Audio Based on Blind Signal Separation and Stereophonic Technology
- Head-Related Transfer Function measurement in sagittal and frontal coordinates
- CENSREC-3: An Evaluation Framework for Japanese Speech Recognition in Real Car-Driving Environments(Speech and Hearing)
- Evaluation of HRTFs estimated using physical features
- MC-32 Development of microdrive assembly process
- Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training
- Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech Enhancement(Speech Enhancement, Statistical Modeling for Speech Processing)
- Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation
- SNR and sub-band SNR estimation based on Gaussian mixture modeling in the log power domain with application for speech enhancements (6th Spoken Language Symposium)
- Driver's irritation detection using speech recognition results (Speech, 10th Spoken Language Symposium)
- Driver's irritation detection using speech recognition results (Spoken Language Information Processing)
- Driver's irritation detection using speech recognition results (Natural Language Understanding and Communication, 10th Spoken Language Symposium)
- Parameter Sharing in Mixture of Factor Analyzers for Speaker Identification (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Lack of Interaction Between Cefdinir and Calcium Polycarbophil : In vitro and In vivo Studies
- Predicting the Degradation of Speech Recognition Performance from Sub-band Dynamic Ranges (Special Issue: Spoken Language Information Processing and Its Applications)
- A model of perceptual distance for group delays based on ellipsoidal mapping
- The effect of group delay spectrum on timbre
- Direction of Arrival Estimation Using Nonlinear Microphone Array
- Speech Enhancement Using Nonlinear Microphone Array Based on Noise Adaptive Complementary Beamforming
- Speech Enhancement Using Nonlinear Microphone Array Based on Complementary Beamforming (Special Section on Digital Signal Processing)
- Noise Robust Speech Recognition Using Subband-Crosscorrelation Analysis
- An Acoustically Oriented Vocal-Tract Model
- Estimation of speaker and listener positions in a car using binaural signals
- Sound localization under conditions of covered ears on the horizontal plane
- Single-Channel Multiple Regression for In-Car Speech Enhancement
- Adaptive Nonlinear Regression Using Multiple Distributed Microphones for In-Car Speech Recognition(Speech Enhancement, Multi-channel Acoustic Signal Processing)
- Speech Recognition Using Finger Tapping Timings(Speech and Hearing)
- CIAIR In-Car Speech Corpus : Influence of Driving Status(Corpus-Based Speech Technologies)
- Construction and Evaluation of a Large In-Car Speech Corpus(Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Blind Source Separation Using Dodecahedral Microphone Array under Reverberant Conditions
- Deterministic Annealing EM Algorithm in Acoustic Modeling for Speaker and Speech Recognition (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Continuous Speech Recognition Based on General Factor Dependent Acoustic Models (Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Method for determining sound localization by auditory masking
- On the use of two-mass vocal cord model in characterizing the stress speech (Speech)
- Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition
- CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments
- Particle Size Distribution Measurement of Free-Falling Fine Particles in a Dusty Plasma Experiment
- A Graph-Based Spoken Dialog Strategy Utilizing Multiple Understanding Hypotheses
- Relaxation behavior of laser-peening residual stress under tensile loading investigated by X-ray and neutron diffraction