Fusing deep speaker specific features and MFCC for robust speaker verification
Abstract
Acoustic representations typically used in speaker recognition are general and carry mixed information, including information that is irrelevant to the specific task of speaker recognition. Extracting the information component relevant to a desired task from the speech signal is challenging. An example is extracting the speaker information component for speaker verification. In this study, a nonlinear feature transformation, discriminatively trained to extract speaker-specific features from MFCCs, is combined with a Gaussian mixture model support vector machine (GMM-SVM) system. A regularized siamese deep network (RSDN) separates the speaker information component from non-speaker-related information in the speech signal. The RSDN learns a hidden representation that characterizes speaker information well by training a subset of its hidden units on pairs of speech segments. The hybrid RSDN/GMM-SVM system achieves about a 5% relative improvement over the baseline GMM-SVM system on text-independent speaker verification, evaluated on a subset of the NIST SRE 2006 1conv4w-1conv4w task. Speaker verification systems that fuse information typically perform better than those based on a single input modality. Score-level fusion, in which the scores of several classifiers are combined, is a commonly employed fusion method for speaker verification. This study explores several fusion methods for RSDN and MFCC information. These include score fusion and two much less widely used methods, GMM supervector fusion and feature fusion. Score fusion and GMM supervector fusion yielded further performance gains, each achieving a 6.6% relative improvement over the baseline GMM-SVM system.
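The abstract says the RSDN trains a subset of its hidden units on pairs of speech segments, with a regularizer on the network. It does not give the exact architecture or loss. The following is a minimal sketch that rests on several assumptions. It uses a single tanh hidden layer. The regularizer is a tied-weight linear reconstruction term. A standard contrastive (margin) loss is applied only to the first `k` hidden units, which are taken to carry speaker information. The names `rsdn_style_loss`, `k`, `margin`, and `lam` are hypothetical. The sketch only evaluates the loss for one pair; gradients and the optimizer are omitted.

```python
import math
import random


def encode(x, W, b):
    """One tanh hidden layer: h = tanh(W x + b)."""
    return [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


def decode(h, W, c):
    """Tied-weight linear reconstruction (assumed regularizer): x_hat = W^T h + c."""
    return [sum(W[i][j] * h[i] for i in range(len(h))) + c[j]
            for j in range(len(c))]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def rsdn_style_loss(x1, x2, same_speaker, W, b, c, k, margin=1.0, lam=0.1):
    """Hypothetical pair loss: contrastive term on h[:k] plus reconstruction penalty.

    k      -- number of hidden units assumed to carry speaker information
    margin -- minimum distance enforced between different-speaker pairs
    lam    -- weight of the reconstruction regularizer
    """
    h1, h2 = encode(x1, W, b), encode(x2, W, b)
    d = math.sqrt(sq_dist(h1[:k], h2[:k]))
    if same_speaker:
        contrastive = d ** 2                      # pull same-speaker pairs together
    else:
        contrastive = max(0.0, margin - d) ** 2   # push different speakers apart
    # Reconstruction uses all hidden units, so the non-speaker units must
    # still encode whatever the speaker units do not.
    recon = sq_dist(decode(h1, W, c), x1) + sq_dist(decode(h2, W, c), x2)
    return contrastive + lam * recon


# Toy setup: 13-dim MFCC-like inputs, 8 hidden units, first 4 treated as speaker units.
rng = random.Random(0)
dim, hidden, k = 13, 8, 4
W = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(hidden)]
b = [0.0] * hidden
c = [0.0] * dim
x1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

print(f"same-speaker pair loss:      {rsdn_style_loss(x1, x2, True, W, b, c, k):.4f}")
print(f"different-speaker pair loss: {rsdn_style_loss(x1, x2, False, W, b, c, k):.4f}")
```

The sketch restricts the contrastive term to `h[:k]`, which leaves the remaining units free to model non-speaker variability. The reconstruction term still forces the network to retain that variability. This is one plausible reading of "separating" speaker and non-speaker information, not a statement of the paper's exact objective.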
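The three fusion methods compared in the abstract differ in where the MFCC and RSDN streams are joined. Feature fusion joins them before GMM modeling. Supervector fusion joins them between GMM adaptation and the SVM. Score fusion joins them after the SVMs. The minimal sketch below makes three assumptions:

- the two feature streams are frame-synchronous;
- a supervector is formed by simply stacking the adapted GMM component means (practical GMM-SVM systems usually also scale each mean by its mixture weight and covariance);
- the score-fusion weight is fixed, although in practice it would be tuned on development data.

All function names are hypothetical. `relative_improvement` shows how a figure such as "6.6% relative improvement" is computed from an error measure. The abstract does not state which measure (for example, EER) it refers to.

```python
from typing import List, Sequence

Vector = List[float]


def feature_fusion(mfcc_frames: Sequence[Vector],
                   rsdn_frames: Sequence[Vector]) -> List[Vector]:
    """Feature-level fusion: concatenate the MFCC and RSDN vectors of each frame."""
    if len(mfcc_frames) != len(rsdn_frames):
        raise ValueError("feature streams must have the same number of frames")
    return [list(m) + list(r) for m, r in zip(mfcc_frames, rsdn_frames)]


def supervector(means: Sequence[Vector]) -> Vector:
    """Stack adapted GMM component means into one long vector (no scaling applied)."""
    return [value for mean in means for value in mean]


def supervector_fusion(mfcc_means: Sequence[Vector],
                       rsdn_means: Sequence[Vector]) -> Vector:
    """Supervector-level fusion: join the two GMM supervectors before the SVM."""
    return supervector(mfcc_means) + supervector(rsdn_means)


def score_fusion(mfcc_score: float, rsdn_score: float, weight: float = 0.5) -> float:
    """Score-level fusion: convex combination of the two SVM output scores."""
    return weight * mfcc_score + (1.0 - weight) * rsdn_score


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative error reduction; 0.066 corresponds to a 6.6% relative improvement."""
    return (baseline_error - new_error) / baseline_error


# Toy values only; none of these numbers come from the paper.
print(feature_fusion([[1.0, 2.0], [3.0, 4.0]], [[0.5], [0.25]]))
# [[1.0, 2.0, 0.5], [3.0, 4.0, 0.25]]
print(supervector_fusion([[1.0, 2.0], [3.0, 4.0]], [[5.0], [6.0]]))
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(score_fusion(2.0, 0.5, weight=0.75))
# 1.625
```

Feature fusion changes the input to every downstream stage. Score fusion needs only the two final scores, which is one reason it is the more common choice.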
- Paper published by the Information Processing Society of Japan (IPSJ)
- 2013-07-18
Authors
- Koichi Shinoda (Tokyo Institute of Technology)
- Sangeeta Biswas (Tokyo Institute of Technology)
- Ryan Price (Tokyo Institute of Technology)
Related papers
- Inter-speaker weighted MAP adaptation for GMM-supervector speaker recognition
- Optimal use of trees in structural MAP adaptation for speaker verification
- Speaker Adaptation for Dialogue Act Classification