Improved HMM Separation for Distant-Talking Speech Recognition(<Special Section>Speech Dynamics by Ear, Eye, Mouth and Machine)
スポンサーリンク
概要
- 論文の詳細を見る
In distant-talking speech recognition, the recognition accuracy is seriously degraded by reverberation and environmental noise. A robust speech recognition technique in such environments, HMM separation and composition, has been described in [1]. HMM separation estimates the model parameters of the acoustic transfer function using adaptation data uttered from an unknown position in noisy and reverberant environments, and HMM composition builds an HMM of noisy and reverberant speech, using the acoustic transfer function estimated by HMM separation. Previously, HMM separation has been applied to the acoustic transfer function based on a single Gaussian distribution. However the improvement was smaller than expected for the impulse response with long reverberations. This is because the variance of the acoustic transfer function in each frame increases, since the length of the impulse response of the room reverberation is longer than that of the spectral analysis window. In this paper, HMM separation is extended to estimate the acoustic transfer function based on the Gaussian mixture components in order to compensate for the greater variability of the acoustic transfer function, and the re-estimation formulae are derived. In addition, this paper introduces a technique to adapt the noise weight for each mel-spaced frequency in order to improve the performance of the HMM separation in the linear-spectral domain, since the use of the HMM separation in the linear-spectral domain sometimes causes a negative mean output due to the subtraction operation. The extended HMM separation is evaluated on distant-talking speech recognition tasks. The results of the experiments clarify the effectiveness of the proposed method.
- 社団法人電子情報通信学会の論文
- 2004-05-01
著者
関連論文
- Sound Source Localization Using a Profile Fitting Method with Sound Reflectors(Speech Dynamics by Ear, Eye, Mouth and Machine)
- Speech Enhancement by Profile Fitting Method (Special Issue on Speech Information Processing)
- Improved HMM Separation for Distant-Talking Speech Recognition(Speech Dynamics by Ear, Eye, Mouth and Machine)