Multi-Input Feature Combination in the Cepstral Domain for Practical Speech Recognition Systems
Abstract
In this paper we describe a new framework for feature combination in the cepstral domain for multi-input robust speech recognition. Working in the cepstral domain has several advantages over working in the time or hypothesis domain. It is stable, easy to maintain, and less expensive because it does not require precise calibration. It is also easy to configure in a complex speech recognition system. However, increasing the number of inputs does not straightforwardly improve recognition performance, so we introduce the concept of variance re-scaling to compensate for the negative effect of averaging several input features. Finally, we propose exploiting another advantage of working in the cepstral domain: speech can be modeled using hidden Markov models, and the model can be used as prior knowledge. This approach is formulated as a new algorithm, referred to as Hypothesis-Based Feature Combination. The effectiveness of the various algorithms is evaluated using two sets of speech databases. We also discuss automatic optimization of some parameters in the proposed algorithms.
- 2009-04-01
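The abstract notes that averaging several input features has a negative effect that variance re-scaling compensates for. The sketch below illustrates that general idea only and is not the paper's formulation: it averages time-aligned cepstral frames across channels, then stretches each coefficient's deviation from its mean back to a target spread. The function names, the nested-list layout `channels[c][t][d]`, and the choice of target (the mean of the per-channel standard deviations) are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def average_channels(channels):
    """Average time-aligned cepstral frames across channels.

    channels[c][t][d] is coefficient d of frame t from input c
    (an assumed layout, not taken from the paper).
    """
    n_frames = len(channels[0])
    n_dims = len(channels[0][0])
    return [
        [fmean(ch[t][d] for ch in channels) for d in range(n_dims)]
        for t in range(n_frames)
    ]


def per_dim_std(frames):
    """Population standard deviation of each coefficient over time."""
    n_dims = len(frames[0])
    return [pstdev([f[d] for f in frames]) for d in range(n_dims)]


def rescale_variance(frames, target_std):
    """Scale each coefficient's deviation from its mean to match target_std."""
    n_dims = len(frames[0])
    means = [fmean(f[d] for f in frames) for d in range(n_dims)]
    stds = per_dim_std(frames)
    # Guard against a constant coefficient (zero spread): leave it unscaled.
    gains = [t / s if s > 0 else 1.0 for t, s in zip(target_std, stds)]
    return [
        [means[d] + gains[d] * (f[d] - means[d]) for d in range(n_dims)]
        for f in frames
    ]


def combine(channels):
    """Average the channels, then restore the per-coefficient spread."""
    averaged = average_channels(channels)
    # Assumed target: the mean of the per-channel standard deviations.
    channel_stds = [per_dim_std(ch) for ch in channels]
    n_dims = len(averaged[0])
    target = [fmean(s[d] for s in channel_stds) for d in range(n_dims)]
    return rescale_variance(averaged, target)
```

With two channels whose fluctuations partly cancel, for example `[[1.0], [-1.0], [0.0]]` and `[[0.0], [1.0], [-1.0]]`, the plain average has a smaller spread than either input, and `combine` scales it back to the inputs' spread while keeping the mean unchanged.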
Authors
-
OBUCHI Yasunari
Central Research Laboratory, Hitachi Ltd.
-
Obuchi Yasunari
Central Research Laboratory Hitachi Ltd.
-
Obuchi Yasunari
Hitachi Ltd. Kokubunji‐shi Jpn
-
Hataoka Nobuo
Department Of Electronics And Intelligent Systems Graduate School Of Electronics Tohoku Institute Of
Related papers
- Intentional Voice Command Detection for Trigger-Free Speech Interface
- Multi-Input Feature Combination in the Cepstral Domain for Practical Speech Recognition Systems
- Emotion Recognition using Mel-Frequency Cepstral Coefficients
- Stepwise Phase Difference Restoration Method for DOA Estimation of Multiple Sources
- Multichannel Two-Stage Beamforming with Unconstrained Beamformer and Distortion Reduction
- Noise suppression method for preprocessor of time-lag speech recognition system based on bidirectional optimally modified log spectral amplitude estimation