Iterative mapping function estimation and environment structure refinement in the online phase of the ESSEM approach (Speech)
Abstract
Recently, we proposed an ensemble speaker and speaking environment modeling (ESSEM) approach to improve automatic speech recognition (ASR) performance under noisy testing conditions. The ESSEM approach consists of an offline phase and an online phase. In the offline phase, ESSEM prepares an environment structure; in previous studies, we developed environment clustering and environment partitioning algorithms to further improve this structure. In the online phase, ESSEM uses the offline-prepared environment structure to estimate a mapping function that yields a set of acoustic models matched to the testing condition; in previous studies, we also investigated several techniques for improving the accuracy of this mapping function estimation. In this study, we further improve the ESSEM online phase with a two-stage optimization procedure that iteratively estimates the mapping function and refines the environment structure. We evaluated the proposed method on the Aurora-2 task. Compared with the conventional single-stage ESSEM, the proposed method yields clear improvements when either the maximum likelihood (ML) or the maximum a posteriori (MAP) criterion is used for the environment structure refinement.
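The alternating two-stage procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every modeling choice in it is a hypothetical assumption: each environment in the structure is summarized by a single mean vector (a row of `M0`), the mapping function is a linear combination `M.T @ w` of those vectors, test frames are Gaussian with identity covariance, and the MAP-style refinement uses an isotropic Gaussian prior with precision `tau` centred on the offline structure. The function names and parameters are likewise hypothetical. Stage 1 estimates the weights with the structure fixed; stage 2 refines the structure with the weights fixed. Only a MAP-style refinement is shown; the ML variant mentioned in the abstract is not reproduced here.

```python
import numpy as np

# Toy sketch only (hypothetical, not the ESSEM authors' code):
# - each environment is one mean vector, stacked as rows of M (K x D)
# - the mapping function is a linear combination M.T @ w
# - test frames ~ N(M.T @ w, I); prior on M ~ N(M0, I / tau)


def estimate_weights(M, x_bar):
    """Stage 1: estimate mapping weights w with the structure M fixed.

    With identity covariance, ML on the frames equals least squares on the
    sample mean x_bar.
    """
    w, *_ = np.linalg.lstsq(M.T, x_bar, rcond=None)
    return w


def refine_structure(M0, w, x_bar, n_frames, tau):
    """Stage 2: MAP-style refinement of the structure with w fixed.

    Closed-form minimizer over M of
        n_frames * ||x_bar - M.T @ w||^2 + tau * ||M - M0||_F^2,
    which is a rank-one update of M0 along w.
    """
    a = n_frames * (x_bar - M0.T @ w) / (tau + n_frames * (w @ w))
    return M0 + np.outer(w, a)


def objective(M, M0, w, x_bar, n_frames, tau):
    """Negative log posterior, up to an additive constant and a factor of 1/2."""
    r = x_bar - M.T @ w
    return n_frames * (r @ r) + tau * np.sum((M - M0) ** 2)


def two_stage(M0, frames, tau=10.0, n_iter=5):
    """Alternate stage 1 and stage 2; return last weights, refined structure, trace."""
    x_bar = frames.mean(axis=0)
    n = frames.shape[0]
    M = M0.copy()
    history = []
    for _ in range(n_iter):
        w = estimate_weights(M, x_bar)               # stage 1
        history.append(objective(M, M0, w, x_bar, n, tau))
        M = refine_structure(M0, w, x_bar, n, tau)   # stage 2
        history.append(objective(M, M0, w, x_bar, n, tau))
    return w, M, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M0 = rng.normal(size=(3, 10))  # 3 toy "environments", 10-dim means
    target = M0.T @ np.array([0.5, 0.3, 0.2]) + 0.5 * rng.normal(size=10)
    frames = target + rng.normal(size=(200, 10))
    w, M, history = two_stage(M0, frames)
    print("single-stage objective:", history[0])
    print("after refinement:      ", history[-1])
```

Because each stage exactly minimizes the same objective over its own variables (coordinate descent), the objective trace is non-increasing. The first entry corresponds to a single-stage estimate: weights only, computed against the unrefined offline structure.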
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2011-01-20
Authors
- NAKAMURA Satoshi: Spoken Language Communication Group, Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology
- KAWAI Hisashi: Spoken Language Communication Group, National Institute of Information and Communications Technology
- TSAO Yu: Spoken Language Communication Group, National Institute of Information and Communications Technology
- ISOTANI Ryosuke: Spoken Language Communication Group, National Institute of Information and Communications Technology
Related papers
- An Improved Greedy Search Algorithm for the Development of a Phonetically Rich Speech Corpus
- Using Mutual Information Criterion to Design an Efficient Phoneme Set for Chinese Speech Recognition
- Automatic Generation of Non-uniform and Context-Dependent HMMs Based on the Variational Bayesian Approach(Feature Extraction and Acoustic Medelings, Corpus-Based Speech Technologies)
- Automatic Generation of Non-uniform HMM Topologies Based on the MDL Criterion(Speech and Hearing)
- An Unsupervised Model of Redundancy for Answer Validation