A flexible spectral modification method based on temporal decomposition and Gaussian mixture model
Abstract
Manipulating spectral structure often degrades speech quality, mainly because the modified spectra are insufficiently smooth between frames and because the spectral modification itself is ineffective. This paper presents a new spectral modification method that improves the quality of modified speech. If frames are processed independently, discontinuous features may be generated. Therefore, a speech analysis technique called temporal decomposition (TD), which decomposes speech into event targets and event functions, is used to model the spectral evolution effectively. Instead of modifying the speech spectra frame by frame, we only need to modify event targets and event functions. This makes the speech spectra easy to modify, and the shape of the event functions ensures the smoothness of the modified speech. To improve spectral modification, we explore Gaussian mixture model parameters (spectral-GMM parameters) to model the spectral envelope of each event target, and develop a new algorithm for modifying spectral-GMM parameters in accordance with formant scaling factors. We first evaluate the effectiveness of the proposed method in spectral modeling, and then apply it to two areas that require different amounts of spectral modification: emotional speech synthesis and voice gender conversion. Experimental results verify the effectiveness of the proposed method for both spectral modeling and spectral modification.
- Published by the Acoustical Society of Japan (日本音響学会)
- 2009-05-01
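The temporal decomposition idea in the abstract can be illustrated with a minimal sketch. It assumes the standard TD model, in which each frame is a weighted sum of event targets, y(n) = Σ_k a_k φ_k(n). Everything else here is an illustrative assumption and not the authors' algorithm: the triangular event functions, the choice of Gaussian component means (in Hz) as the event target vectors, the plain per-component scaling of those means, and all function names and numeric values.

```python
import math

def event_functions(event_frames, n_frames):
    """Hypothetical triangular event functions phi[k][n].
    Only two adjacent functions overlap, and they sum to 1 at every frame."""
    K = len(event_frames)
    phi = [[0.0] * n_frames for _ in range(K)]
    for n in range(n_frames):
        if n <= event_frames[0]:
            phi[0][n] = 1.0
        elif n >= event_frames[-1]:
            phi[-1][n] = 1.0
        else:
            for k in range(K - 1):
                left, right = event_frames[k], event_frames[k + 1]
                if left <= n < right:
                    w = (n - left) / (right - left)
                    phi[k][n] = 1.0 - w
                    phi[k + 1][n] = w
                    break
    return phi

def reconstruct(targets, phi):
    """TD synthesis: y(n) = sum_k targets[k] * phi[k][n] for every frame n."""
    n_frames, dim = len(phi[0]), len(targets[0])
    return [[sum(targets[k][d] * phi[k][n] for k in range(len(targets)))
             for d in range(dim)] for n in range(n_frames)]

def gmm_envelope(components, freqs_hz):
    """Spectral envelope modeled as a sum of weighted Gaussians over frequency.
    components: list of (weight, mean_hz, std_hz) tuples."""
    return [sum(w * math.exp(-0.5 * ((f - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in components) for f in freqs_hz]

def scale_targets(targets, factors):
    """Assumed modification rule: multiply each component mean by its scaling factor."""
    return [[m * a for m, a in zip(t, factors)] for t in targets]

# Three events at frames 0, 10 and 20; each target holds two Gaussian means in Hz.
event_frames = [0, 10, 20]
targets = [[500.0, 1500.0], [700.0, 1200.0], [300.0, 2300.0]]
phi = event_functions(event_frames, 21)

original = reconstruct(targets, phi)                        # per-frame mean trajectories
modified = reconstruct(scale_targets(targets, [1.1, 1.2]), phi)

# Envelope of one hypothetical component, sampled every 50 Hz.
freqs = list(range(0, 4000, 50))
envelope = gmm_envelope([(1.0, 500.0, 100.0)], freqs)
```

Only the three targets are edited, yet every frame changes, and the frame-to-frame trajectory stays as smooth as the event functions themselves. This is the property the abstract relies on when it says that smoothness is ensured by the shape of the event functions.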
Authors
-
Akagi Masato
School of Information Science, Japan Advanced Institute of Science and Technology
-
Nguyen Binh
School of Information Science, Japan Advanced Institute of Science and Technology
Related papers
- A DOA estimation algorithm based on equalization-cancellation theory (Applied Acoustics)
- A study on the LP-based blind model in restoring bone-conducted speech (Speech) -- (International Workshop "Asian Workshop on Speech Science and Technology")
- An LP-based blind restoration method for improving intelligibility of bone-conducted speech (Speech)
- A flexible spectral modification method based on temporal decomposition and Gaussian mixture model
- Limited error based event localizing temporal decomposition and its application to variable-rate speech coding
- A speech dereverberation method based on the MTF concept in power envelope restoration
- An improved method based on the MTF concept for restoring the power envelope from a reverberant signal
- Effects of single-channel speech enhancement algorithms on Mandarin speech intelligibility (Applied Acoustics)
- Improvement of robustness using selective sound segregation for automatic speech recognition systems in noisy environments (Speech) -- (International Workshop "Asian Workshop on Speech Science and Technology")
- LP-based method of blind restoration to improve intelligibility of bone-conducted speech
- A Noise Reduction System in Localized and Non-Localized Noise Environments
- Noise reduction method based on generalized subtractive beamformer
- Fundamental frequency estimation for noisy speech based on instantaneous amplitude and frequency
- Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems
- Sub-Band Temporal Envelope Restoration for ASR in Reverberation Environment (International Workshop: Frontiers in Speech and Hearing Research)
- A study on expressive speech and perception of semantic primitives: comparison between Taiwanese and Japanese (Speech)
- A flexible temporal decomposition-based spectral modification method using asymmetric Gaussian mixture model (Speech)
- A Study on Restoration of Bone-Conducted Speech with LPC-Based Model (International Workshop: Frontiers in Speech and Hearing Research)
- A computational model of co-modulation masking release
- A method of signal extraction from noisy signal based on auditory scene analysis
- Modified Restricted Temporal Decomposition and Its Application to Low Rate Speech Coding
- Foreword to the special issue on "Applied Systems"
- Evaluations of TS-BASE for speech enhancement and binaural benefits preservation (Applied Acoustics)
- Adaptive β-order Generalized Spectral Subtraction for Speech Enhancement
- A Two-Microphone Noise Reduction Method in Highly Non-stationary Multiple-Noise-Source Environments
- A Hybrid Speech Emotion Recognition System Based on Spectral and Prosodic Features
- Adaptive equalization-cancellation model and its application to sound localization in noisy reverberant environments