Speaker-Independent Telephone Speech Recognition and Reference Pattern Generation

Abstract

This paper describes a speaker-independent isolated-word speech recognition method developed for telephone speech response systems. To recognize speech, input utterances are first frequency-analyzed by a bank of 19 band-pass filters (BPFs) with a frame period of 8 ms. The analyzed data then undergo logarithmic conversion, normalization of the vocal-cord sound source characteristics using a least-squares fitted line, and time normalization by linear compression or expansion to 32 frames. The resulting speech patterns are matched against multiple reference patterns generated in advance, separately for male and female speakers. To apply this method, the reference patterns must be optimized so that speech is recognized correctly despite differences in formant frequencies, differences in individual speakers' habits, variation in phonetic positions, devoicing, and slight segmentation errors. To evaluate the method, the voices of about 2,000 persons were recorded over long-distance telephone lines. The vocabulary consisted of 16 Japanese words. A total of 256 male and female reference patterns were generated from the training voice data of about 570 persons. The recognition accuracy on non-training voice data was 97.8%.
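The preprocessing chain described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives only the channel count (19) and the target length (32 frames), so the function names, the log floor, the choice to subtract the fitted line from each frame, the linear interpolation between frames, and the squared Euclidean distance used for matching are all assumptions made for the example.

```python
import math

NUM_CHANNELS = 19  # band-pass filter channels (from the abstract)
NUM_FRAMES = 32    # pattern length after time normalization (from the abstract)


def log_convert(frame, floor=1e-6):
    """Logarithmic conversion of one frame of filter-bank outputs.
    The floor is an assumption that avoids log(0)."""
    return [math.log(max(x, floor)) for x in frame]


def remove_fitted_line(log_frame):
    """Fit a least-squares line across the channel index and subtract it.
    Assumed reading of 'normalization by least-squares approximation line'."""
    n = len(log_frame)
    mean_x = (n - 1) / 2
    mean_y = sum(log_frame) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(log_frame))
    slope = sxy / sxx
    return [y - (mean_y + slope * (i - mean_x)) for i, y in enumerate(log_frame)]


def time_normalize(frames, target=NUM_FRAMES):
    """Linearly compress or expand an utterance to `target` frames,
    interpolating between neighbouring frames (interpolation is an assumption)."""
    n = len(frames)
    out = []
    for t in range(target):
        pos = t * (n - 1) / (target - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def preprocess(frames):
    """Log conversion, per-frame line removal, then time normalization."""
    return time_normalize([remove_fitted_line(log_convert(f)) for f in frames])


def nearest_word(pattern, references):
    """Return the label of the closest reference pattern.
    `references` is a list of (label, pattern) pairs; squared Euclidean
    distance is a stand-in, since the abstract does not name the measure."""
    def dist(a, b):
        return sum((x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb))
    return min(references, key=lambda ref: dist(pattern, ref[1]))[0]
```

In such a setup, each word would carry several reference patterns per speaker group, so `references` would simply contain multiple pairs with the same label.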
Paper published by the Acoustical Society of Japan (社団法人日本音響学会)