A Forced Alignment Based Approach for English Passage Reading Assessment
Abstract
This paper presents our investigation into improving the performance of our previous automatic reading quality assessment system. The baseline system computes the average Phone Log-Posterior Probability (PLPP) over all phones in the speech to be assessed and uses this average as the reading quality assessment feature. We present three improvements. First, we cluster the triphones, compute the average normalized PLPP for each cluster separately, and use these averages as a multi-dimensional assessment feature in place of the original one-dimensional feature. This method is simple but effective: it reduces the score difference between machine scoring and manual scoring by 30.2% relative. Second, to assess reading rhythm, we train Gaussian Mixture Models (GMMs) that capture each triphone's relative duration under standard pronunciation. Using the GMMs, we compute the probability that the relative duration of each phone conforms to the standard pronunciation, and we add the average of these probabilities to the assessment feature vector as one dimension. This reduces the score difference between machine scoring and manual scoring by 9.7% relative. Third, we detect Filled Pauses (FPs) by analyzing the formant curve, compute the relative duration of the FPs, and add it to the assessment feature vector as another dimension. This reduces the score difference between machine scoring and manual scoring by a further 10.2% relative. Finally, when the features extracted by the three methods are used together, the score difference between machine scoring and manual scoring is reduced by 43.9% relative compared with the baseline system.
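The way the three improvements combine into one feature vector can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the input layout, the choice of 0.0 for clusters with no phones, and the definition of FP relative duration as total FP time divided by utterance length are all assumptions made for illustration. The per-phone normalized PLPP values, the cluster assignments, and the mean GMM duration probability are taken as precomputed inputs.

```python
def cluster_mean_plpp(phones, num_clusters):
    """Average normalized PLPP per triphone cluster.

    phones: list of (cluster_id, normalized_plpp) pairs (hypothetical layout).
    Clusters with no phones get 0.0, which is an assumption of this sketch.
    """
    sums = [0.0] * num_clusters
    counts = [0] * num_clusters
    for cluster_id, plpp in phones:
        sums[cluster_id] += plpp
        counts[cluster_id] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def relative_fp_duration(fp_segments, total_duration):
    """Fraction of the utterance occupied by filled pauses.

    fp_segments: list of (start, end) times in seconds.
    Dividing by the total utterance duration is an assumed definition.
    """
    return sum(end - start for start, end in fp_segments) / total_duration


def build_feature_vector(phones, num_clusters, mean_duration_prob,
                         fp_segments, total_duration):
    """Concatenate per-cluster PLPP means, the mean duration probability
    (assumed to come from the duration GMMs), and the FP relative duration."""
    features = cluster_mean_plpp(phones, num_clusters)
    features.append(mean_duration_prob)
    features.append(relative_fp_duration(fp_segments, total_duration))
    return features
```

With `num_clusters` clusters, the resulting vector has `num_clusters + 2` dimensions, compared with the single averaged PLPP value of the baseline system.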
Authors
- ZHAO Qingwei, Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences
- YAN Yonghong, Key Laboratory of Speech Acoustics and Content Understanding
- PAN Fuping, Key Laboratory of Speech Acoustics and Content Understanding
- ZHANG Junbo, Key Laboratory of Speech Acoustics and Content Understanding
- DONG Bin, Key Laboratory of Speech Acoustics and Content Understanding
Related papers
- Factor Analysis of Neighborhood-Preserving Embedding for Speaker Verification
- Logarithmic Adaptive Quantization Projection for Audio Watermarking
- Noise Robust Feature Scheme for Automatic Speech Recognition Based on Auditory Perceptual Mechanisms
- A Novel Discriminative Method for Pronunciation Quality Assessment
- Discriminative Approach to Build Hybrid Vocabulary for Conversational Telephone Speech Recognition of Agglutinative Languages
- Smoothing Method for Improved Minimum Phone Error Linear Regression