音素認識に基づく不特定話者大語彙単語音声認識システム

概要

論文の詳細を見る
This paper describes a speaker-independent spoken word recognition system for a large size vocabulary. Speech is analyzed by the filter bank, from whose logarithmic spectrum the 11 features are extracted every 10 ms. Using the features the speech is first segmented and the primary phoneme recognition is carried out for every segment using the Bayes decision method. After correcting errors in segmentation and phoneme recognition, secondary recognition of a part of the consonants is carried out and the phonemic sequence is determined. The word dictionary item having maximum likelihood to the sequence is chosen as the recognition output. The 75. 9% score for the phoneme recognition and the 92. 4% score for the word recognition are obtained for the training samples in the 212 words uttered by 10 male and 10 female speakers. For the same words uttered by 30 male and 20 female speakers different from the above speakers, the 88. 1% word recognition score is obtained.