An HMM State Duration Control Algorithm Applied to Large-Vocabulary Spontaneous Speech Recognition
Abstract
Although hidden Markov modeling (HMM) is widely and successfully used in many speech recognition applications, duration control for HMMs remains an important issue for improving recognition accuracy, since an HMM places no constraints on duration. To compensate for this defect, several duration control algorithms that employ precise duration models have been proposed. However, they suffer from greatly increased computational complexity. This paper proposes a new state duration control algorithm that limits both the maximum and the minimum state durations. The algorithm applies to the HMM trellis likelihood calculation, not to the Viterbi calculation. The amount of computation it requires is only of order one (O(1)) with respect to the maximum state duration n; that is, the computation is independent of the maximum state duration, whereas many conventional duration control algorithms require computation of order n or order n^2. Thus, the algorithm can drastically reduce the computation needed for duration control. The algorithm exploits the property that the trellis likelihood calculation is a summation of many path likelihoods. At each frame, the likelihood of the path that exceeds the maximum duration is subtracted from the forward probability, and the likelihood of the path that satisfies the minimum duration is added to it. By iterating this procedure, the algorithm calculates the trellis likelihood efficiently. The algorithm was evaluated using a large-vocabulary speaker-independent spontaneous speech recognition system for telephone directory assistance. The average reduction in error rate for sentence understanding was about 7% when using context-independent HMMs and 3% when using context-dependent HMMs. The improvement was confirmed even though the maximum and minimum state durations were not optimized for the task (speaker-independent duration settings obtained from a different task were used).
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 1995-06-25
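The add-and-subtract update described in the abstract can be illustrated with a small sketch. The Python code below is one possible realization under stated assumptions, not the paper's own formulation: it assumes a strictly left-to-right HMM without skips, linear-domain likelihoods, strictly positive emission likelihoods and self-loop probabilities, an exit probability of 1 - a[j], and a path that starts in state 0 at frame 0 and ends in the last state at the last frame. All names (forward_o1, forward_explicit, b, a, dmin, dmax, w) are hypothetical. A cumulative product per state lets any single path be scored in constant time, so each frame needs one addition and one subtraction regardless of dmax. A brute-force version that sums over every allowed duration is included for comparison.

```python
import math


def forward_explicit(b, a, dmin, dmax):
    """Brute-force reference: sum explicitly over every allowed duration."""
    n_states, n_frames = len(b), len(b[0])
    entry = [1.0] + [0.0] * (n_frames - 1)  # paths entering state 0 at frame 0
    for j in range(n_states):
        beta = [0.0] * n_frames  # paths in state j at t with dmin <= duration <= dmax
        for t in range(n_frames):
            for d in range(dmin, dmax + 1):
                e = t - d + 1  # entry frame that gives duration d at frame t
                if e < 0:
                    break
                p = entry[e] * a[j] ** (d - 1)
                for s in range(e, t + 1):
                    p *= b[j][s]
                beta[t] += p
        # Leaving state j after frame t means entering state j + 1 at frame t + 1.
        entry = [0.0] + [beta[t] * (1.0 - a[j]) for t in range(n_frames - 1)]
    return beta[-1]


def forward_o1(b, a, dmin, dmax):
    """Same quantity, with a constant amount of work per state and frame."""
    n_states, n_frames = len(b), len(b[0])
    entry = [1.0] + [0.0] * (n_frames - 1)
    for j in range(n_states):
        # w[t + 1] = product over s <= t of a[j] * b[j][s] (assumed nonzero).
        w = [1.0]
        for t in range(n_frames):
            w.append(w[-1] * a[j] * b[j][t])

        def path(e, t):
            """Likelihood of entering state j at frame e and staying through t."""
            if e < 0:
                return 0.0
            return entry[e] * w[t + 1] / (w[e] * a[j])

        beta = [0.0] * n_frames
        prev = 0.0
        for t in range(n_frames):
            beta[t] = (a[j] * b[j][t] * prev    # extend all windowed paths by one frame
                       + path(t - dmin + 1, t)  # add the path that just reached dmin
                       - path(t - dmax, t))     # subtract the path that now exceeds dmax
            prev = beta[t]
        entry = [0.0] + [beta[t] * (1.0 - a[j]) for t in range(n_frames - 1)]
    return beta[-1]


# Toy data (made up for illustration): 3 states, 8 frames.
emissions = [
    [0.9, 0.8, 0.3, 0.2, 0.1, 0.1, 0.2, 0.1],
    [0.1, 0.2, 0.7, 0.8, 0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.3, 0.5, 0.8, 0.9, 0.7],
]
self_loop = [0.6, 0.5, 0.7]

fast = forward_o1(emissions, self_loop, dmin=2, dmax=3)
slow = forward_explicit(emissions, self_loop, dmin=2, dmax=3)
print(fast, slow, math.isclose(fast, slow, rel_tol=1e-9))
```

The two functions should agree up to floating-point rounding. Because this sketch works in the linear domain and divides by cumulative products, it would underflow on long utterances; a practical system would need some form of scaling, which is outside the scope of this illustration.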
Authors
- SHIKANO Kiyohiro (Nara Institute of Science and Technology)
- Minami Yasuhiro (NTT Human Interface Laboratories)
- Takahashi Satoshi (NTT Human Interface Laboratories)
Related papers
- Development of real-time audio localization control system (Applied Acoustics)
- EA2010-24 Development of real-time audio localization control system
- Sound reproduction based on multi-channel inverse filtering and WFS
- Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method
- Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training
- Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics(Speech and Hearing)
- Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models(Speech Recognition, Statistical Modeling for Speech Processing)
- Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models(Speech Recognition, Statistical Modeling for Speech Processing)
- Designing Target Cost Function Based on Prosody of Speech Database(Speech Synthesis and Prosody, Corpus-Based Speech Technologies)
- Designing Target Cost Function Based on Prosody of Speech Database
- A MAP Estimator for the Enhancement of Speech Signal Separated by ICA Algorithm (International Workshop: Frontiers in Speech and Hearing Research)
- Blind Separation of Speech by Fixed-Point ICA with Source Adaptive Negentropy Approximation(Blind Source Separation, Multi-channel Acoustic Signal Processing)
- A Microphone Array-Based 3-D N-Best Search Method for Recognizing Multiple Sound Sources
- Evaluation of a 3-D N-best Search Method Using the Distance between Sound Source Direction Paths for Multi-Speaker Speech Recognition
- Development and evaluation of pocket-size real-time blind source separation microphone
- Objective sound quality comparison based on higher-order statistics for nonlinear noise reduction methods (Applied Acoustics)
- Objective sound quality evaluation for combination method of beamforming and spectral subtraction (Applied Acoustics)
- Fast Convergence Blind Source Separation Using Frequency Subband Interpolation by Null Beamforming
- Rapid Compensation of Temperature Fluctuation Effect for Multichannel Sound Field Reproduction System
- Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System
- Evaluation of Extremely Small Sound Source Signals Used in Speaking-Aid System with Statistical Voice Conversion
- Improvements of the One-to-Many Eigenvoice Conversion System
- Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models
- Adaptive Training for Voice Conversion Based on Eigenvoices
- A Speech Dialogue System with Multimodal Interface for Telephone Directory Assistance
- Sound Field Reproduction by Wavefront Synthesis Using Directly Aligned Multi Point Control
- Isolated Word Recognition Using Pitch Pattern Information
- Theoretical Analysis of Amounts of Musical Noise and Speech Distortion in Structure-Generalized Parametric Blind Spatial Subtraction Array
- Speech Prior Estimation for Generalized Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator
- Comparison of Methods for Topic Classification of Spoken Inquiries