Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition
Abstract
This paper addresses the performance degradation of speech recognition under mismatched conditions. F-Ratio analysis is used to measure the significance of different frequency bands for speech unit classification. We find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and second formants for most vowels, should be emphasized more than they are in the Mel-frequency cepstral coefficients (MFCC). This result is further observed to be stable in several typical mismatched situations. By analogy with the Mel-frequency scale, another frequency scale, called the F-Ratio scale, is therefore proposed to optimize the filter bank design for the MFCC features so that each subband carries equal significance for speech unit classification. Under comparable conditions, the modified features yield relative reductions in sentence error rate, compared with the MFCC, of 43.20% for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio) respectively, and 64.50% for the three years of 863 test data. Applying the F-Ratio analysis to the clean training set of the Aurora2 database demonstrates its robustness across languages, texts and sampling rates.
- 2010-09-01
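The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the common definition of the F-ratio (variance of the class means divided by the mean within-class variance, computed per frequency band) and a simple cumulative-sum rule for placing filter edges so that each subband holds an equal share of the total F-ratio. The function names `f_ratio` and `equal_significance_edges`, the input layout, and the linear interpolation inside each band are illustrative choices, not the authors' implementation.

```python
from statistics import mean, pvariance


def f_ratio(features_by_class):
    """Per-band F-ratio under the assumed definition.

    features_by_class maps a speech-unit label to a list of frames;
    each frame is a list of per-band energies (hypothetical layout).
    """
    n_bands = len(next(iter(features_by_class.values()))[0])
    ratios = []
    for b in range(n_bands):
        # Collect the values of band b separately for every class.
        per_class = [[frame[b] for frame in frames]
                     for frames in features_by_class.values()]
        between = pvariance([mean(v) for v in per_class])
        within = mean([pvariance(v) for v in per_class])
        ratios.append(between / within if within > 0 else 0.0)
    return ratios


def equal_significance_edges(freqs, ratios, n_filters):
    """Place n_filters + 1 edges so each subband gets an equal F-ratio share.

    freqs holds the len(ratios) + 1 boundaries of the analysis bands.
    """
    cum = [0.0]
    for r in ratios:
        cum.append(cum[-1] + r)
    total = cum[-1]
    edges = []
    for k in range(n_filters + 1):
        target = total * k / n_filters
        # Find the analysis band in which the cumulative target falls.
        b = 0
        while b < len(ratios) - 1 and cum[b + 1] < target:
            b += 1
        span = cum[b + 1] - cum[b]
        frac = (target - cum[b]) / span if span > 0 else 0.0
        # Interpolate linearly inside that band (an assumption).
        edges.append(freqs[b] + frac * (freqs[b + 1] - freqs[b]))
    return edges
```

With such a rule, bands with a high F-ratio are covered by narrower, more densely packed filters, which is one way to read the abstract's statement that the regions around 1 kHz and 3 kHz should be emphasized.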
Authors
- SUN Yanqing (Institute of Acoustics, Chinese Academy of Sciences)
- Yan Yonghong (Institute Of Acoustics Chinese Academy Of Science)
- Zhao Qingwei (Thinkit Speech Lab Institute Of Acoustics Chinese Academy Of Sciences)
- Yan Yonghong (Thinkit Speech Lab.)
- Yan Yonghong (Thinkit Speech Laboratory Institute Of Acoustics Chinese Academy Of Sciences Beijing)
- Zhou Yu (Institute For Advanced Ceramics School Of Materials Science And Engineering Harbin Institute Of Tech)
- Zhou Yu (Institute Of Acoustics Chinese Academy Of Sciences)
- Zhao Qingwei (Institute Of Acoustics Chinese Academy Of Sciences)
- Sun Yanqing (Institute Of Acoustics Chinese Academy Of Sciences)
Related papers
- Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition
- Effects of single-channel speech enhancement algorithms on Mandarin speech intelligibility (Applied Acoustics)
- ICONE11-36512 RESEARCHING ON KNOWLEDGE ARCHITECTURE OF DESIGN BY ANALYSIS BASED ON ASEME CODE
- ICONE11-36511 AN OBJECT-ORIENTED HYBRID KNOWLEDGE REPRESENTATION METHOD BASED ON THE ASME CODE
- Approximate Decision Function and Optimization for GMM-UBM Based Speaker Verification
- Using a Kind of Novel Phonotactic Information for SVM Based Speaker Recognition
- Robust Speaker Clustering Using Affinity Propagation
- An LVCSR Based Reading Miscue Detection System Using Knowledge of Reference and Error Patterns
- Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech
- A One-Pass Real-Time Decoder Using Memory-Efficient State Network