Speech Recognition in Noisy Environments Using Speaker-Normalized Spectral Subband Parameters

Abstract

This paper proposes speaker-normalized spectral subband centroids (SSCs) as supplementary features for speech recognition in noisy environments. SSCs are computed from the power spectrum of the speech signal as the frequency centroid of each subband. They can be obtained reliably even under noisy conditions because they are determined mainly by spectral peaks such as formants, whose positions remain almost unchanged in noise. However, conventional SSCs depend on each speaker's formant frequencies, so SSC distributions pooled over many speakers overlap heavily between different phones. We therefore introduce a speaker normalization technique into the SSC computation to reduce speaker variability. Spontaneous speech recognition experiments show that speaker-normalized SSCs are more effective than conventional SSCs as supplementary features for improving recognition performance. At SNR = 15 dB, we observed significant error rate reductions of 20.3% by adding speaker-normalized SSCs to the conventional features and 14.3% by incorporating speaker normalization into the conventional SSCs.
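The SSC computation described above (a power-weighted frequency centroid per subband) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which speaker normalization technique is used, so the sketch substitutes a simple linear, VTLN-style frequency warping f -> f / alpha with a hypothetical per-speaker factor `alpha`. Rectangular subbands, the optional power exponent `gamma`, the midpoint fallback for empty subbands, and the function name `subband_centroids` are likewise assumptions made for illustration.

```python
from typing import Sequence


def subband_centroids(
    freqs: Sequence[float],
    power: Sequence[float],
    edges: Sequence[float],
    alpha: float = 1.0,
    gamma: float = 1.0,
) -> list[float]:
    """Spectral subband centroids (SSCs) of one frame (illustrative sketch).

    freqs: frequency in Hz of each spectral bin.
    power: power spectrum value of each bin (same length as freqs).
    edges: subband boundaries in Hz; [0, 1000, 2000] gives the two
           rectangular subbands [0, 1000) and [1000, 2000).
    alpha: hypothetical speaker warping factor (assumption); bin
           frequencies are mapped f -> f / alpha before the centroids
           are taken. alpha = 1.0 yields conventional SSCs.
    gamma: assumed exponent applied to the power spectrum (1.0 = none).
    """
    if len(freqs) != len(power):
        raise ValueError("freqs and power must have the same length")
    if alpha <= 0:
        raise ValueError("alpha must be positive")

    # Speaker normalization (assumed linear warping): move every bin
    # onto a common, speaker-independent frequency axis.
    warped = [f / alpha for f in freqs]

    centroids = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        num = 0.0  # sum of f * P(f)^gamma inside the subband
        den = 0.0  # sum of P(f)^gamma inside the subband
        for f, p in zip(warped, power):
            if lo <= f < hi:
                w = p ** gamma
                num += f * w
                den += w
        # Empty or silent subband: fall back to the subband midpoint.
        centroids.append(num / den if den > 0 else 0.5 * (lo + hi))
    return centroids


if __name__ == "__main__":
    freqs = [250.0, 750.0, 1250.0, 1750.0]
    power = [1.0, 3.0, 2.0, 2.0]
    edges = [0.0, 1000.0, 2000.0]
    print(subband_centroids(freqs, power, edges))              # [625.0, 1500.0]
    print(subband_centroids(freqs, power, edges, alpha=1.25))  # [500.0, 1200.0]
```

Because each centroid is dominated by the strongest bins in its subband, it tracks formant-like peaks, which is the property the abstract relies on for noise robustness. In this sketch, a speaker whose formants sit 25% higher than a reference speaker would be mapped back toward the reference by `alpha = 1.25`, so the same phone yields similar centroids across speakers.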
Paper published by the Acoustical Society of Japan