A Study on Acoustic Modeling for Speech Recognition of Predominantly Monosyllabic Languages (Special Section: Speech Dynamics by Ear, Eye, Mouth and Machine)
Abstract
This paper presents a study on acoustic modeling for speech recognition of predominantly monosyllabic languages. Various speech units used in speech recognition systems are investigated. Thai is selected to evaluate the effectiveness of these acoustic models, since it is a predominantly monosyllabic language with a complex vowel system. Several experiments were carried out to find the speech unit that yields the most accurate acoustic model and the highest recognition rate, and the recognition rates obtained under the different acoustic models are reported and compared. In addition, this paper proposes a new speech unit for speech recognition, the onset-rhyme unit. Two models are proposed: the Phonotactic Onset-Rhyme Model (PORM) and the Contextual Onset-Rhyme Model (CORM). Each model comprises pairs of onset and rhyme units, and each pair makes up a syllable. An onset comprises an initial consonant and its transition towards the following vowel. The rhyme, which follows the onset, consists of a steady vowel segment and a final consonant. Experimental results show that the onset-rhyme model improves on the other speech units. For speaker-dependent systems using only an acoustic model, it improves on the accuracy of the inter-syllable triphone model by nearly 9.3% and on that of the context-dependent Initial-Final model by nearly 4.7%; for speaker-dependent systems using both an acoustic model and a language model, the improvements are 5.6% and 4.5%, respectively. The results show that the onset-rhyme models attain a high recognition rate. Moreover, they are also more efficient in terms of system complexity.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2004-05-01
Authors
-
Jitapunkul Somchai
Digital Signal Processing Research Laboratory Department Of Electrical Engineering Faculty Of Engine
-
MANEENOI Ekkarit
Digital Signal Processing Research Laboratory, Department of Electrical Engineering, Faculty of Engi
-
AHKUPUTRA Visarut
Digital Signal Processing Research Laboratory, Department of Electrical Engineering, Faculty of Engi
-
LUKSANEEYANAWIN Sudaporn
Centre for Research in Speech and Language Processing, Department of Linguistics, Faculty of Arts, C
Related papers
- Multiple Description Pattern Analysis : Robustness to Misclassification Using Local Discriminant Frame Expansions(Image Recognition and Understanding)
- A Study on Acoustic Modeling for Speech Recognition of Predominantly Monosyllabic Languages(Speech Dynamics by Ear, Eye, Mouth and Machine)