A Hybrid Speech Emotion Recognition System Based on Spectral and Prosodic Features
Abstract
In this paper, we present a hybrid speech emotion recognition system that exploits both spectral and prosodic features of speech. To capture emotional information in the spectral domain, we propose a new spectral feature extraction method that applies a novel non-uniform subband processing in place of the mel-frequency subbands used in Mel-Frequency Cepstral Coefficients (MFCC). For the prosodic features, we select a set of features that are closely correlated with emotional states in speech. Because these two kinds of features have inherently different characteristics (e.g., data size), the proposed hybrid system models the newly extracted spectral features with a Gaussian Mixture Model (GMM) and the selected prosodic features with a Support Vector Machine (SVM). The final recognition result is obtained by combining the outputs of these two subsystems. Experimental results show that (1) the proposed non-uniform spectral features are more effective than traditional MFCC features for emotion recognition, and (2) the proposed hybrid system using both spectral and prosodic features achieves a relative recognition error reduction of 17.0% over traditional systems using only spectral features, and of 62.3% over those using only prosodic features.
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2010-10-01
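The abstract says only that the results of the GMM and SVM subsystems are "combined", without specifying how. What follows is a minimal sketch of such a back end. It assumes per-emotion diagonal-covariance GMMs scored by average frame log-likelihood, precomputed per-emotion SVM decision values, and a weighted sum of softmax-normalized scores as the fusion rule. Softmax over SVM decision values is used here only as a crude normalization. All function names, the fusion rule, and the `weight` parameter are hypothetical and are not taken from the paper. The sketch also shows how a relative error reduction figure such as 17.0% is computed. The non-uniform subband feature extraction is not sketched, because the abstract does not give the band layout.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, per-dimension means, per-dimension variances).
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], means: Sequence[float],
                      variances: Sequence[float]) -> float:
    """Log-density of frame x under a diagonal-covariance Gaussian."""
    total = 0.0
    for xi, mu, var in zip(x, means, variances):
        total += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
    return total


def gmm_avg_loglik(frames: Sequence[Sequence[float]], gmm: List[Component]) -> float:
    """Average per-frame log-likelihood of an utterance under one emotion's GMM."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        # Log-sum-exp over components avoids underflow when log-densities are very negative.
        total += peak + math.log(sum(math.exp(lg - peak) for lg in logs))
    return total / len(frames)


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Map raw per-emotion scores to non-negative values that sum to 1."""
    peak = max(scores.values())
    exps = {k: math.exp(v - peak) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def fuse(gmm_scores: Dict[str, float], svm_scores: Dict[str, float],
         weight: float = 0.5) -> str:
    """Hypothetical late fusion: weighted sum of softmax-normalized subsystem scores.

    `weight` is an assumed weight on the spectral (GMM) subsystem; the prosodic
    (SVM) subsystem gets 1 - weight. Returns the emotion with the highest fused score.
    """
    p_gmm = softmax(gmm_scores)
    p_svm = softmax(svm_scores)
    fused = {e: weight * p_gmm[e] + (1.0 - weight) * p_svm[e] for e in p_gmm}
    return max(fused, key=fused.get)


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Relative error reduction in percent: 100 * (E_base - E_new) / E_base."""
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Toy 1-D frames and single-component GMMs; illustrative values only.
    gmms = {
        "angry": [(1.0, [2.0], [1.0])],
        "neutral": [(1.0, [0.0], [1.0])],
    }
    frames = [[1.8], [2.1], [2.4]]
    gmm_scores = {e: gmm_avg_loglik(frames, g) for e, g in gmms.items()}
    svm_scores = {"angry": 0.3, "neutral": -0.3}  # hypothetical SVM decision values
    print(fuse(gmm_scores, svm_scores))  # angry
    # Hypothetical error rates (not from the paper): 20.0% -> 16.6% is a 17.0% relative reduction.
    print(round(relative_error_reduction(20.0, 16.6), 1))  # 17.0
```

Score-level (late) fusion is one common way to combine subsystems whose inputs differ in size and frame rate. That difference is the reason the abstract gives for modeling spectral and prosodic features separately. In practice the fusion weight would be tuned on held-out data.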
Authors
- Sun Yanqing: Institute of Acoustics, Chinese Academy of Sciences
- Li Junfeng: School of Information Science, Japan Advanced Institute of Science and Technology; School of Aerospace, Tsinghua University
- Akagi Masato: School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai Nomi Ishik
- Yan Yonghong: Institute of Acoustics, Chinese Academy of Sciences; Thinkit Speech Laboratory, Institute of Acoustics, Chinese Academy of Sciences, Beijing
- Zhang Jianping: Institute of Acoustics, Chinese Academy of Sciences
- Zhou Yu: Institute for Advanced Ceramics, School of Materials Science and Engineering, Harbin Institute of Tech; Institute of Acoustics, Chinese Academy of Sciences
Related Papers
- Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition
- A study on the LP-based blind model in restoring bone-conducted speech (Speech) -- (International Workshop "Asian workshop on speech science and technology")
- An LP-based blind restoration method for improving intelligibility of bone-conducted speech (Speech)
- A flexible spectral modification method based on temporal decomposition and Gaussian mixture model
- Trajectory Optimization of Multi-Asteroids Exploration with Low Thrust
- A speech dereverberation method based on the MTF concept in power envelope restoration
- An improved method based on the MTF concept for restoring the power envelope from a reverberant signal
- A DOA estimation algorithm based on equalization-cancellation theory (Applied Acoustics)
- Effects of single-channel speech enhancement algorithms on Mandarin speech intelligibility (Applied Acoustics)
- Improvement of robustness using selective sound segregation for automatic speech recognition systems in noisy environments (Speech) -- (International Workshop "Asian workshop on speech science and technology")
- LP-based method of blind restoration to improve intelligibility of bone-conducted speech
- ICONE11-36512 RESEARCHING ON KNOWLEDGE ARCHITECTURE OF DESIGN BY ANALYSIS BASED ON ASEME CODE
- ICONE11-36511 AN OBJECT-ORIENTED HYBRID KNOWLEDGE REPRESENTATION METHOD BASED ON THE ASME CODE
- Approximate Decision Function and Optimization for GMM-UBM Based Speaker Verification
- Using a Kind of Novel Phonotactic Information for SVM Based Speaker Recognition
- Robust Speaker Clustering Using Affinity Propagation
- A Noise Reduction System in Localized and Non-Localized Noise Environments
- Noise reduction method based on generalized subtractive beamformer
- An LVCSR Based Reading Miscue Detection System Using Knowledge of Reference and Error Patterns
- Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech
- A One-Pass Real-Time Decoder Using Memory-Efficient State Network
- Development of a Mandarin-English Bilingual Speech Recognition System for Real World Music Retrieval
- Automatic Singing Performance Evaluation for Untrained Singers
- Melody Track Selection Using Discriminative Language Model
- Automatic Language Identification with Discriminative Language Characterization Based on SVM
- Speech Enhancement Using Improved Adaptive Null-Forming in Frequency Domain with Postfilter
- Fundamental frequency estimation for noisy speech based on instantaneous amplitude and frequency
- A Noise Reduction Method Based on a Generalized Subtractive Beamformer
- Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems
- Sub-Band Temporal Envelope Restoration for ASR in Reverberation Environment (International Workshop Frontiers in Speech and Hearing Research)
- A study on expressive speech and perception of semantic primitives: comparison between Taiwanese and Japanese (Speech)
- A flexible temporal decomposition-based spectral modification method using asymmetric Gaussian mixture model (Speech)
- A Study on Restoration of Bone-Conducted Speech with LPC-Based Model (International Workshop Frontiers in Speech and Hearing Research)
- A computational model of co-modulation masking release
- A method of signal extraction from noisy signal based on auditory scene analysis
- Modified Restricted Temporal Decomposition and Its Application to Low Rate Speech Coding
- Foreword to the special issue on "Applied Systems"
- Microstructural Characterization of Spark Plasma Sintered In Situ TiB Reinforced Ti Matrix Composite by EBSD and TEM
- Evaluations of TS-BASE for speech enhancement and binaural benefits preservation (Applied Acoustics)
- Adaptive β-order Generalized Spectral Subtraction for Speech Enhancement
- Effects of the Temporal Fine Structure in Different Frequency Bands on Mandarin Tone Perception
- A Two-Microphone Noise Reduction Method in Highly Non-stationary Multiple-Noise-Source Environments
- Enhanced Electrical Conductivities of Complex Hydrides Li_2(BH_4)(NH_2) and Li_4(BH_4)(NH_2)_3 by Melting
- Enhancing the Robustness of the Posterior-Based Confidence Measures Using Entropy Information for Speech Recognition
- Effects of single-channel speech enhancement algorithms on Mandarin speech intelligibility
- Two-Microphone Noise Reduction Using Spatial Information-Based Spectral Amplitude Estimation
- 605 Phytolith evidence for rice cultivation and spread in Mid-Late Neolithic archaeological sites in central North China
- 606 Phytolith analysis for differentiating between foxtail millet (Setaria italica) and green foxtail (Setaria viridis)
- 575 Phytolith evidence of millet agriculture during about 6000-2100 cal. aBP. in the Guanzhong Basin, China
- In Situ Resistance Measurement of Nickel-Induced Lateral Crystallization of Amorphous Silicon
- Adaptive equalization-cancellation model and its application to sound localization in noisy reverberant environments
- Loss of heterozygosity and methylation of multiple tumor suppressor genes on chromosome 3 in hepatocellular carcinoma