Combining perceptually-motivated spectral shaping with loudness and duration modification for intelligibility enhancement of HMM-based synthetic speech in noise
Abstract
This paper presents our entry to a speech-in-noise intelligibility enhancement evaluation, the Hurricane Challenge. The system consists of a text-to-speech (TTS) voice manipulated through a combination of enhancement strategies, each of which is known to be successful on its own: a perceptually-motivated spectral shaper based on the Glimpse Proportion measure, dynamic range compression, and adaptation to Lombard excitation and duration patterns. We achieved substantial intelligibility improvements relative to unmodified synthetic speech: 4.9 dB with a competing-speaker masker and 4.1 dB in speech-shaped noise. An analysis conducted across this evaluation and two other similar evaluations shows that the spectral shaper and the compressor (both of which boost loudness) contribute most under higher SNR conditions, particularly in speech-shaped noise. Lombard-adapted duration and excitation changes are more beneficial under lower SNR conditions and with competing-speaker noise.
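The Glimpse Proportion measure that guides the spectral shaper counts the share of spectro-temporal regions in which speech is locally dominant over the noise. The sketch below is minimal and makes several assumptions. The speech and noise are assumed to be already mapped to equal-shaped grids of levels in dB; the auditory filterbank front end used in the original measure is not reproduced here. The 3 dB local-SNR threshold is assumed as the default. The function name `glimpse_proportion` and its parameters are hypothetical, and this is not the authors' implementation.

```python
from typing import Sequence


def glimpse_proportion(
    speech_db: Sequence[Sequence[float]],
    noise_db: Sequence[Sequence[float]],
    threshold_db: float = 3.0,  # assumed local-SNR criterion for a "glimpse"
) -> float:
    """Return the fraction of time-frequency cells where speech exceeds noise
    by more than threshold_db.

    Hypothetical helper: speech_db and noise_db are (channels x frames) grids
    of levels in dB, assumed to be precomputed by some auditory front end.
    """
    if len(speech_db) != len(noise_db):
        raise ValueError("speech and noise must have the same number of channels")

    total_cells = 0
    glimpsed_cells = 0
    for speech_row, noise_row in zip(speech_db, noise_db):
        if len(speech_row) != len(noise_row):
            raise ValueError("speech and noise must have the same number of frames")
        for speech_level, noise_level in zip(speech_row, noise_row):
            total_cells += 1
            # A cell counts as a glimpse when the local SNR exceeds the threshold.
            if speech_level - noise_level > threshold_db:
                glimpsed_cells += 1

    # An empty grid has no glimpses by convention.
    return glimpsed_cells / total_cells if total_cells else 0.0


# Usage: local SNRs are [[5, 0], [0, 10]] dB, so 2 of the 4 cells are glimpses.
print(glimpse_proportion([[10, 0], [5, 20]], [[5, 0], [5, 10]]))  # 0.5
```

A spectral shaper guided by this kind of measure would redistribute speech energy across frequency to raise the proportion of glimpsed cells for a given noise. That optimization is not sketched here.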
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2013-06-06
Authors
- King Simon, Centre for Speech Technology Research, University of Edinburgh
- 山岸 順一 (Junichi Yamagishi), Digital Content and Media Sciences Research Division, National Institute of Informatics
- Valentini-Botinhao Cassia, Centre for Speech Technology Research, University of Edinburgh
- Stylianou Yannis, Institute of Computer Science, Foundation for Research and Technology Hellas
Related papers
- Asynchronous Articulatory Feature Recognition Using Dynamic Bayesian Networks