SNR and sub-band SNR estimation based on Gaussian mixture modeling in the log power domain with application for speech enhancements (6th Spoken Language Symposium)

Abstract

This work presents a flexible blind SNR estimation method based on Gaussian mixture modeling (GMM) in the log-power domain. Treating the local noise power and the noisy speech power as two log-normally distributed random variables, the method estimates their distribution parameters with the EM algorithm and uses them to derive the segmental SNR, which is defined as the expectation distance between the two subspace distributions. A compensation mode for estimation under low-SNR conditions is also proposed. Experimental results on the AURORA2 database show that the proposed estimation method is more consistent than conventional methods in both noise conditions. The second application presented in this work is sub-band SNR estimation for speech enhancement systems. Here the GMM is applied to each frequency bin, and two methods of sub-band SNR estimation are proposed, based on maximum a posteriori (MAP) decomposition and cumulative distribution function (CDF) equalization. The sub-band SNR is then used in a Wiener filtering system. Evaluation experiments demonstrate that the proposed speech enhancement method improves both segmental SNR and ASR performance.
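The core idea of the abstract, fitting a two-component Gaussian mixture to frame log-powers with EM and reading an SNR figure off the two component means, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`frame_log_power_db`, `fit_two_gaussians`, `estimate_snr_db`), the frame length of 256 samples, and the use of the plain difference of component means (in dB) as the SNR proxy are all assumptions made for illustration. The paper's exact "expectation distance" definition, its low-SNR compensation mode, and the sub-band (per-frequency-bin) variants are not reproduced here.

```python
import numpy as np


def frame_log_power_db(signal, frame_len=256):
    """Split a 1-D signal into non-overlapping frames; return power per frame in dB."""
    signal = np.asarray(signal, dtype=float)
    n_frames = signal.size // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.mean(frames ** 2, axis=1)
    return 10.0 * np.log10(power + 1e-12)  # small floor avoids log(0)


def fit_two_gaussians(x, n_iter=200, tol=1e-8):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, variances), sorted by increasing mean.
    """
    x = np.asarray(x, dtype=float)
    # Assumed initialization: means at the 25th and 75th percentiles.
    mu = np.percentile(x, [25.0, 75.0])
    var = np.full(2, x.var() + 1e-6)
    w = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: per-sample log joint density under each component.
        diff = x[:, None] - mu
        log_p = np.log(w) - 0.5 * np.log(2.0 * np.pi * var) - 0.5 * diff ** 2 / var
        log_norm = np.logaddexp(log_p[:, 0], log_p[:, 1])
        resp = np.exp(log_p - log_norm[:, None])  # responsibilities
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        # Stop when the log-likelihood no longer improves noticeably.
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(mu)
    return w[order], mu[order], var[order]


def estimate_snr_db(signal, frame_len=256):
    """Assumed SNR proxy: distance between the two component means in dB.

    The lower-mean component is taken as noise, the higher-mean one as noisy speech.
    """
    log_power = frame_log_power_db(signal, frame_len)
    _, mu, _ = fit_two_gaussians(log_power)
    return mu[1] - mu[0]
```

Calling `estimate_snr_db(samples)` on a 1-D array of audio samples returns a single figure in dB. Because the higher-mean component models noisy speech rather than clean speech, this proxy tends to overstate the true SNR when speech and noise powers are close, which is presumably the regime the abstract's low-SNR compensation mode addresses.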
Paper published by the Information Processing Society of Japan (IPSJ)
2004-12-20