One-Pass Semi-Dynamic Network Decoding Using a Subnetwork Caching Model for Large Vocabulary Continuous Speech Recognition (Special Section: Speech Dynamics by Ear, Eye, Mouth and Machine)
Abstract
This paper presents a new decoding framework for large vocabulary continuous speech recognition that can handle a static search network dynamically. Generally, a static network decoder can use a search space that is globally optimized in advance, and therefore it can run at high speed during decoding. However, its large memory requirement, due to the large network size or the space complexity of the optimization algorithm, often makes it impractical. Our new one-pass semi-dynamic network decoding scheme aims to use such an optimized search network memory-efficiently without losing speed. In this framework, a complete search network is organized on the basis of self-structuring subnetworks and is nearly minimized using a modified tail-sharing algorithm. While the decoder runs, it caches in memory only the subnetworks needed for decoding, whereas static network decoders keep the complete network in memory. The subnetwork caching model is controlled by two levels of caches: a local cache obtained by subnetwork caching operations and a global cache obtained by subnetwork preloading operations. The model can also be controlled adaptively by using subnetwork profiling operations. Furthermore, it is made simple and fast with compactly designed self-structuring subnetworks. Experimental results on a 25k-word Korean broadcast news transcription task show that, by using the subnetwork caching model, the semi-dynamic decoder can run almost as fast as an equivalent static network decoder under various memory configurations.
- Published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2004-05-01
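
The two-level caching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state a replacement policy, so least-recently-used eviction for the local cache is an assumption, and the names `SubnetworkCache`, `load`, `preload`, `get`, and `hottest` are hypothetical. The global cache holds preloaded subnetworks that are never evicted, the local cache holds subnetworks loaded on demand, and a simple access counter stands in for profiling.

```python
from collections import Counter, OrderedDict
from typing import Callable, Dict, Iterable, List


class SubnetworkCache:
    """Hypothetical two-level cache: pinned global entries plus an LRU local cache."""

    def __init__(self, load: Callable[[int], object], local_capacity: int) -> None:
        if local_capacity < 1:
            raise ValueError("local_capacity must be at least 1")
        self._load = load                  # assumed loader: subnetwork id -> subnetwork
        self._capacity = local_capacity
        self._global: Dict[int, object] = {}
        self._local: "OrderedDict[int, object]" = OrderedDict()
        self.profile: Counter = Counter()  # access counts per subnetwork id
        self.loads = 0                     # number of calls made to the loader

    def _fetch(self, sid: int) -> object:
        self.loads += 1
        return self._load(sid)

    def preload(self, ids: Iterable[int]) -> None:
        """Pin the given subnetworks in the global cache."""
        for sid in ids:
            if sid in self._global:
                continue
            # Reuse a locally cached copy if there is one; otherwise load it.
            if sid in self._local:
                self._global[sid] = self._local.pop(sid)
            else:
                self._global[sid] = self._fetch(sid)

    def get(self, sid: int) -> object:
        """Return a subnetwork, loading it on a miss (LRU eviction is an assumption)."""
        self.profile[sid] += 1
        if sid in self._global:
            return self._global[sid]
        if sid in self._local:
            self._local.move_to_end(sid)   # mark as most recently used
            return self._local[sid]
        subnet = self._fetch(sid)
        self._local[sid] = subnet
        if len(self._local) > self._capacity:
            self._local.popitem(last=False)  # evict the least recently used entry
        return subnet

    def hottest(self, n: int) -> List[int]:
        """Most frequently accessed ids, e.g. as candidates for a later preload."""
        return [sid for sid, _ in self.profile.most_common(n)]


# Usage: subnetworks are faked as strings; a real decoder would read them from disk.
cache = SubnetworkCache(load=lambda sid: f"subnet-{sid}", local_capacity=2)
cache.preload([0])
for sid in [0, 1, 2, 1, 3, 0, 1]:
    cache.get(sid)
```

In this sketch the memory budget is set by `local_capacity` plus the size of the preloaded set, and the profile counts could be fed back into `preload` between runs, which is one possible reading of the adaptive control mentioned in the abstract.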
Authors
- Ahn Dong-hoon, Dept. of Computer Science, Sogang University
- Chung Minhwa, Dept. of Computer Science, Sogang University