Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems
Abstract
To reduce speech degradation in reverberant environments, we previously proposed a modulation-transfer-function (MTF)-based method of speech dereverberation. By considering the temporal modulation properties of speech and the exponential decay of the power envelope of the room impulse response, we obtained the following MTF relation: the sub-band power envelope of reverberant speech can be represented as a convolution between the sub-band power envelope of clean speech and the power envelope of the room impulse response. On the basis of this relation, inverse MTF filtering can be applied to restore the power envelopes of reverberant speech. Because the power envelope of the impulse response is modeled as an exponential decay function, the room impulse response does not need to be measured for this restoration. We tested how effective this method is as a front-end for automatic speech recognition (ASR) systems in artificial and real reverberant environments. Reverberant speech signals were created by convolving clean speech (AURORA-2J database) with artificially produced or real room impulse responses. A method based on the auditory power spectrum was used as the baseline for comparison. Compared with the baseline, the proposed method produced a 35.67% relative improvement in the error reduction rate for artificial reverberant environments (on average, for reverberation times from 0.2 to 2.0 s) and a 25.78% relative improvement for real reverberant environments (43 reverberant impulse responses). The results demonstrate that our new approach can improve the robustness of speech-recognition systems in reverberant environments and that it performs better than conventional methods.
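The MTF relation and its inversion can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the exponential decay is parameterized as gain^2 * exp(-13.8 t / T60), so that the power envelope falls by about 60 dB at t = T60 (the abstract states only that the decay is exponential), and it assumes the reverberation time is already known, whereas a blind method would have to estimate it. The function names and the parameters `t60`, `fs`, and `gain` are hypothetical.

```python
import numpy as np

def decay_power_envelope(t60, fs, n_samples, gain=1.0):
    """Assumed model: gain^2 * exp(-13.8 * t / t60), i.e. about -60 dB at t = t60."""
    t = np.arange(n_samples) / fs
    return gain**2 * np.exp(-13.8 * t / t60)

def reverberant_power_envelope(clean_env, t60, fs, gain=1.0):
    """MTF relation: convolve the clean power envelope with the decay envelope."""
    clean_env = np.asarray(clean_env, dtype=float)
    h_env = decay_power_envelope(t60, fs, len(clean_env), gain)
    return np.convolve(clean_env, h_env)[:len(clean_env)]

def inverse_mtf_filter(rev_env, t60, fs, gain=1.0):
    """Invert the one-pole decay: x[n] = (y[n] - r * y[n-1]) / gain^2."""
    rev_env = np.asarray(rev_env, dtype=float)
    r = np.exp(-13.8 / (t60 * fs))        # per-sample decay factor
    restored = rev_env.copy()
    restored[1:] -= r * rev_env[:-1]
    # A power envelope cannot be negative; clip small numerical undershoot.
    return np.maximum(restored / gain**2, 0.0)

# Usage: a synthetic sub-band power envelope with 4 Hz modulation.
fs = 100.0                                # envelope sampling rate in Hz (assumed)
t = np.arange(200) / fs
clean = 1.0 + 0.5 * np.sin(2 * np.pi * 4.0 * t)
rev = reverberant_power_envelope(clean, t60=0.5, fs=fs)
restored = inverse_mtf_filter(rev, t60=0.5, fs=fs)
print(np.max(np.abs(restored - clean)))   # reconstruction error
```

With the decay modeled this way, the inverse filter needs only the reverberation time and the gain, which is why no measured impulse response is required in this sketch.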
- Paper of the Acoustical Society of Japan
Authors
-
Unoki Masashi
School of Information Science, Japan Advanced Institute of Science and Technology
-
Akagi Masato
School of Information Science, Japan Advanced Institute of Science and Technology
-
Lu Xugang
School of Information Science, Japan Advanced Institute of Science and Technology
-
Akagi Masato
School Of Information Sci. Japan Advanced Inst. Of Sci. And Technol. (jaist) 1-1 Asahidai Nomi Ishik
-
Unoki Masashi
School Of Information Science Japan Advanced Institute Of Science And Technology
-
Lu Xugang
Atr Spoken Language Communication Res. Laboratories
-
Lu Xugang
School Of Information Science Japan Advanced Institute Of Science And Technology
-
Unoki Masashi
Information School Japan Advanced Institute Of Science And Technology
-
Unoki Masashi
Japan Advanced Inst. Sci. And Technol. Ishikawa Jpn
-
Akagi Masato
School Of Information Sci. Japan Advanced Inst. Of Sci. And Technol.
Related papers
- A DOA estimation algorithm based on equalization-cancellation theory (Applied Acoustics)
- Study on a method of suppressing noise based on the MTF concept
- An MTF-based method of blind restoration for improving intelligibility of bone-conducted speech
- A study on the LP-based blind model in restoring bone-conducted speech (Speech) -- (International Workshop "Asian workshop on speech science and technology")
- An LP-based blind restoration method for improving intelligibility of bone-conducted speech (Speech)
- Robust voice activity detection based on noise eigenspace
- A flexible spectral modification method based on temporal decomposition and Gaussian mixture model
- A speech dereverberation method based on the MTF concept in power envelope restoration
- An improved method based on the MTF concept for restoring the power envelope from a reverberant signal
- Effects of single-channel speech enhancement algorithms on Mandarin speech intelligibility (Applied Acoustics)
- Improvement of robustness using selective sound segregation for automatic speech recognition systems in noisy environments (Speech) -- (International Workshop "Asian workshop on speech science and technology")
- Improvement of robustness using selective sound segregation for automatic speech recognition systems in noisy environments
- LP-based method of blind restoration to improve intelligibility of bone-conducted speech
- A model-based investigation of activations of the tongue muscles in vowel production
- A Noise Reduction System in Localized and Non-Localized Noise Environments
- Noise reduction method based on generalized subtractive beamformer
- A study on audio watermarking method based on the cochlear delay characteristics
- Fundamental frequency estimation for noisy speech based on instantaneous amplitude and frequency
- Estimation of fundamental frequency of reverberant speech by utilizing complex cepstrum analysis
- Speech Enhancement based on Noise Eigenspace Projection
- A speech enhancement framework based on noise eigenspace projection (Speech)
- Estimate of auditory filter shape using notched-noise masking for various signal frequencies
- Sub-Band Temporal Envelope Restoration for ASR in Reverberation Environment (International Workshop Frontiers in Speech and Hearing Research)
- A study on expressive speech and perception of semantic primitives: comparison between Taiwanese and Japanese (Speech)
- A flexible temporal decomposition-based spectral modification method using asymmetric Gaussian mixture model (Speech)
- A Study on Restoration of Bone-Conducted Speech with LPC-Based Model (International Workshop Frontiers in Speech and Hearing Research)
- A computational model of co-modulation masking release
- A method of signal extraction from noisy signal based on auditory scene analysis
- Modified Restricted Temporal Decomposition and Its Application to Low Rate Speech Coding
- Foreword to the special issue on "Applied Systems"
- A Model-Based Learning Process for Modeling Coarticulation of Human Speech (Knowledge, Information and Creativity Support System)
- Normalization of vocal tract shape using radial basis function (Speech)
- Normalization of vocal tract shape using radial basis function
- Optimization and Evaluation of a Coarticulation Model based on Observation and Simulation
- Parameter Optimization for a Coarticulation Model Based on Observation and Simulation (International Workshop Frontiers in Speech and Hearing Research)
- Extraction of Low Dimensional Representation of Vowels in Articulatory Space (International Workshop Frontiers in Speech and Hearing Research)
- Evaluations of TS-BASE for speech enhancement and binaural benefits preservation (Applied Acoustics)
- Adaptive β-order Generalized Spectral Subtraction for Speech Enhancement
- A Two-Microphone Noise Reduction Method in Highly Non-stationary Multiple-Noise-Source Environments
- A Hybrid Speech Emotion Recognition System Based on Spectral and Prosodic Features
- Adaptive equalization-cancellation model and its application to sound localization in noisy reverberant environments
- Study on Speech Watermarking Based on Modifications to LSFs for Tampering Detection
- Study on Blind Method of Estimating Speech Transmission Index from Noisy Reverberant Amplitude-Modulated-Signals
- Study on Semi-scramble Method for Speech Signals Based on Phonemic Restoration