Phonemic Segmentation in Continuous Chinese Speech Recognition

Abstract

This paper proposes a phonemic segmentation algorithm to improve the performance of continuous Mandarin speech recognition systems. Two coefficients, the time variation of the spectral envelope and the time variation of the zeroth-order cepstrum, are extracted using Unbiased Estimation of Log Spectrum (UELS). The parameter curves derived from these coefficients are very smooth, so the relation between their maxima and the phoneme boundaries is easy to identify. Because the curves are smooth, the maxima themselves can serve as the criterion for delimiting phonemes, in place of the threshold used in conventional systems, which makes precise segmentation possible. In an experiment on 300 sentences, the system outperformed traditional methods: the average phoneme-deletion rate was 1.3% and the average phoneme-insertion rate was 3%. To evaluate the segmentation, its results were used in a phoneme recognition experiment, which yielded a 95.5% consonant recognition rate and a 92.5% vowel recognition rate. These results show that the approach is highly effective.
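
The contrast between a maximum-based criterion and a threshold-based one can be illustrated with a minimal sketch. This is not the paper's implementation: the UELS feature extraction is not reproduced, the function names `boundaries_from_maxima` and `boundaries_from_threshold` are hypothetical, and the `curve` values are hand-made for illustration, standing in for one smooth time-variation parameter sampled once per frame.

```python
def boundaries_from_maxima(curve, min_height=0.0):
    """Return frame indices of local maxima of a smooth parameter curve.

    A frame counts as a maximum if it rises above the previous frame and
    is not below the next one. min_height is an assumed optional floor.
    """
    peaks = []
    for i in range(1, len(curve) - 1):
        is_peak = curve[i] > curve[i - 1] and curve[i] >= curve[i + 1]
        if is_peak and curve[i] > min_height:
            peaks.append(i)
    return peaks


def boundaries_from_threshold(curve, threshold):
    """Return frame indices where the curve rises above a fixed threshold."""
    return [i for i in range(1, len(curve))
            if curve[i - 1] <= threshold < curve[i]]


# Hypothetical curve: a strong change near frame 3, a weaker one near frame 8.
curve = [0.1, 0.2, 0.6, 0.9, 0.5, 0.2, 0.1, 0.3, 0.4, 0.3, 0.1]

print(boundaries_from_maxima(curve))          # [3, 8]
print(boundaries_from_threshold(curve, 0.5))  # [2]
```

On this toy curve, the fixed threshold misses the weaker change entirely and marks the stronger one at the crossing point rather than at its peak, whereas maximum picking finds both. Maximum picking only behaves this well when the curve is smooth, which is the property the abstract attributes to the UELS-based parameters.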
Paper published by the Acoustical Society of Japan