Nonlinear Normalization Using q-Logarithm for Robust Speech Recognition
Abstract
Speech recognition performance degrades significantly in noisy environments. Most compensation methods for improving robustness assume that speech, additive noise, and convolutive noise are uncorrelated. However, the nonlinear nature of speech and noise signals may cause noise and speech to be correlated. To deal with this problem, we propose compensation in an intermediate domain using the q-logarithmic function. The q-logarithmic function is a generalization of the natural logarithm and has recently been used to generalize Shannon entropy. Varying the q-value moves the spectral representation between the linear spectral domain and the log-spectral domain. In this paper, we implement log spectral mean normalization (LSMN) using the q-logarithmic function and call it generalized-log spectral mean normalization (GLSMN). Our experiments on the Aurora-2 database show that GLSMN improves speech recognition accuracy by 20% compared to cepstral mean normalization (CMN) in the mel-frequency domain.
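The idea in the abstract can be illustrated with a minimal sketch. It assumes the standard q-logarithm definition, ln_q(x) = (x^(1-q) - 1) / (1 - q) for q ≠ 1, which equals x - 1 at q = 0 and converges to ln x as q approaches 1. The function names `q_log` and `glsmn`, the `frames[t][channel]` layout, and the per-channel mean subtraction over all frames are illustrative assumptions; the abstract does not give the exact formulation used in the paper.

```python
import math

def q_log(x, q):
    """Standard q-logarithm; falls back to math.log(x) when q is (numerically) 1."""
    if x <= 0:
        raise ValueError("q_log is defined here for x > 0 only")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def glsmn(frames, q):
    """Apply q_log to every spectral value, then subtract each channel's mean over time.

    frames: list of frames, each a list of positive spectral energies
    (hypothetical layout: frames[t][channel]).
    """
    if not frames:
        return []
    transformed = [[q_log(v, q) for v in frame] for frame in frames]
    n_frames = len(transformed)
    n_channels = len(transformed[0])
    # Per-channel mean of the transformed spectrum across all frames.
    means = [sum(f[c] for f in transformed) / n_frames for c in range(n_channels)]
    return [[f[c] - means[c] for c in range(n_channels)] for f in transformed]

# Toy input: two frames, two channels (made-up values, not Aurora-2 data).
frames = [[1.0, 4.0], [3.0, 16.0]]
linear_domain = glsmn(frames, q=0.0)   # mean normalization of the linear spectrum
log_domain = glsmn(frames, q=1.0)      # ordinary log spectral mean normalization
intermediate = glsmn(frames, q=0.5)    # a domain between the two
```

In this sketch q is a free parameter; how the paper selects it is not stated in the abstract.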
- 2011-07-14
Authors
-
Shinoda Koichi
Department Of Computer Science Graduate School Of Information Science And Engineering Tokyo Institut
-
Iwano Koji
Faculty Of Environmental And Information Studies Tokyo City University
-
PARDEDE Hilman
Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Instit
Related papers
- Gait-based Person Identification Robust against Speed Variation using CHLAC features and HMMs
- Robust Scene Extraction Using Multi-Stream HMMs for Baseball Broadcast(Image Processing and Video Processing)
- Automatic recognition of Indonesian declarative questions and statements using polynomial coefficients of the pitch contours
- Initial evaluation of the drivers' Japanese speech corpus in a car environment (Speech) -- (International Workshop "Asian workshop on speech science and technology")
- Robust Acoustic Modeling for Speech Recognition
- Invited: Robust Acoustic Modeling for Speech Recognition (International Workshop "Beyond HMM")
- Nonlinear Normalization Using q-Logarithm for Robust Speech Recognition
- Speaker Verification Using MMAP Adaptation (Natural Language Understanding and Communication)
- Speaker Verification Using MMAP Adaptation (Speech)
- Subject Adaptation and Adaptive Training for Gait-based Person Identification
- Two-pass Approach for Recognizing Code-Switching Speech
- Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model
- A video watermarking method to objects robust against various attacks
- Speaker Verification Using MMAP Adaptation