Acoustic Feature Transformation Based on Discriminant Analysis Preserving Local Structure for Speech Recognition
Abstract
To improve speech recognition performance, feature transformation based on discriminant analysis has been widely used to reduce the redundant dimensions of acoustic features. Linear discriminant analysis (LDA) and heteroscedastic discriminant analysis (HDA) are often used for this purpose, and a generalization of LDA and HDA, called power LDA (PLDA), has been proposed. However, these methods may result in an unexpected dimensionality reduction for multimodal data. It is important to preserve the local structure of the data when reducing the dimensionality of multimodal data. In this paper we introduce two methods, locality-preserving HDA and locality-preserving PLDA, to reduce the dimensionality of multimodal data appropriately. We also propose an approximate calculation scheme to compute sub-optimal projections rapidly. Experimental results show that the locality-preserving methods yield better performance than the traditional ones in speech recognition.
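As a concrete reference point for the transformations discussed above, the sketch below shows plain LDA in Python/NumPy: it builds within-class and between-class scatter matrices from labeled acoustic feature vectors and keeps the leading generalized eigenvectors as the projection. All names are hypothetical, and it implements only the standard LDA baseline, not the HDA/PLDA generalizations or the locality-preserving variants proposed in the paper.

```python
import numpy as np

def lda_projection(features, labels, n_dims):
    """Project feature vectors onto the n_dims most discriminative
    directions found by standard (homoscedastic) LDA."""
    d = features.shape[1]
    mean_all = features.mean(axis=0)
    S_w = np.zeros((d, d))   # within-class scatter
    S_b = np.zeros((d, d))   # between-class scatter
    for c in np.unique(labels):
        X_c = features[labels == c]
        mean_c = X_c.mean(axis=0)
        S_w += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - mean_all)[:, None]
        S_b += len(X_c) * (diff @ diff.T)
    # Directions maximizing between-class vs. within-class scatter:
    # eigenvectors of S_w^{-1} S_b with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:n_dims]
    W = eigvecs[:, order].real          # d x n_dims projection matrix
    return features @ W

# Example: reduce 39-dimensional MFCC-like features to 20 dimensions.
X = np.random.randn(1000, 39)
y = np.random.randint(0, 10, size=1000)
X_reduced = lda_projection(X, y, n_dims=20)
```

Broadly, the locality-preserving variants modify such scatter statistics with neighborhood-based weights so that the local structure of multimodal class distributions survives the projection, which is the property the abstract emphasizes.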
- A paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2010-05-01
Authors
- SAKAI Makoto (DENSO CORPORATION)
- KITAOKA Norihide (Nagoya University)
- TAKEDA Kazuya (Nagoya University)
Related Papers
- Acoustic Feature Transformation Combining Average and Maximum Classification Error Minimization Criteria
- Acoustic Feature Transformation Based on Discriminant Analysis Preserving Local Structure for Speech Recognition
- AN INTEGRATED AUDIO-VISUAL VIEWER FOR A LARGE SCALE MULTIPOINT CAMERAS AND MICROPHONES (International Workshop on Advanced Image Technology 2007)
- CENSREC-1-C: An evaluation framework for voice activity detection under noisy environments
- Driver Identification Using Driving Behavior Signals (Human-computer Interaction)
- G_007 Arbitrary Listening-point Generation Using Acoustic Transfer Function Interpolation in A Large Microphone Array
- THE SUB-BAND SOUND WAVE RAY-SPACE REPRESENTATION (International Workshop on Advanced Image Technology 2006)
- A-16-24 3D Sound Wave Field Representation Based on Ray-Space Method (A-16. Multimedia and Virtual Environment Fundamentals; Fundamentals and Boundaries)
- AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition (Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Selective Listening Point Audio Based on Blind Signal Separation and Stereophonic Technology
- CENSREC-3: An Evaluation Framework for Japanese Speech Recognition in Real Car-Driving Environments (Speech and Hearing)
- Evaluation of HRTFs estimated using physical features
- Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones (Feature Extraction and Acoustic Modeling, Corpus-Based Speech Technologies)
- Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training
- Robust distant speech recognition by combining variable-term spectrum based position-dependent CMN with conventional CMN (Speech) -- (International workshop "Asian Workshop on Speech Science and Technology")
- Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition
- Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
- Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech Enhancement (Speech Enhancement, Statistical Modeling for Speech Processing)
- Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs
- Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation
- SNR and sub-band SNR estimation based on Gaussian mixture modeling in the log power domain with application for speech enhancements (6th Spoken Language Symposium)
- Driver's irritation detection using speech recognition results (Speech; 10th Spoken Language Symposium)
- Estimation Based on the Instantaneous Frequency of Frequency Components Contained in Sub-bands
- Predicting the Degradation of Speech Recognition Performance from Sub-band Dynamic Ranges (Special Issue: Spoken Language Information Processing and Its Applications)
- An Acoustically Oriented Vocal-Tract Model
- Comparison of acoustic measures for evaluating speech recognition performance in an automobile
- Estimation of speaker and listener positions in a car using binaural signals
- Sound localization under conditions of covered ears on the horizontal plane
- Single-Channel Multiple Regression for In-Car Speech Enhancement
- Adaptive Nonlinear Regression Using Multiple Distributed Microphones for In-Car Speech Recognition (Speech Enhancement, Multi-channel Acoustic Signal Processing)
- Speech Recognition Using Finger Tapping Timings (Speech and Hearing)
- CIAIR In-Car Speech Corpus: Influence of Driving Status (Corpus-Based Speech Technologies)
- Construction and Evaluation of a Large In-Car Speech Corpus (Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm
- Response Timing Detection Using Prosodic and Linguistic Information for Human-friendly Spoken Dialog Systems
- Method for determining sound localization by auditory masking
- Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition
- Selective Gammatone Envelope Feature for Robust Sound Event Recognition
- CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments
- A Graph-Based Spoken Dialog Strategy Utilizing Multiple Understanding Hypotheses