Recognition of Connected Digit Speech in Japanese Collected over the Telephone Network
Abstract
This paper describes experimental results on whole-word HMM-based speech recognition of connected digits in Japanese, with special focus on the training data size and the "sheep and goats" problem. The training data comprises 757,000 digits uttered by 2,000 speakers, while the testing data comprises 399,000 digits uttered by 1,700 speakers. The best word error rate for unknown-length strings was 1.64%, obtained using context-dependent HMMs. The word error rate was measured for various subsets of the training data, reduced both in the number of speakers ($s$) and in the number of utterances per speaker ($u$). As a result, the empirical formula $s\left[\{\min(0.62\,s^{0.75},\, u)\}^{0.74} + \{\max(0,\, u - 0.62\,s^{0.75})\}^{0.27}\right] = D(E_w)$ was developed, where $E_w$ and $D(E_w)$ denote the word error rate and the effective data size, respectively. Analyses were conducted on several aspects of the low-performance speakers, who account for the major part of the recognition errors. Attempts were also made to improve their recognition performance. It was found that 33% of the low-performance speakers were improved to the normal level by speaker clustering centered around each low-performance speaker.
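As an illustration of the empirical formula in the abstract, the following minimal Python sketch evaluates its left-hand side for a given number of speakers and utterances per speaker. The function name `effective_data_size` and the variable `knee` are hypothetical, and reading the first term as min(0.62 s^0.75, u) with a comma between the two arguments is an assumption based on the form of the max term. The abstract does not give the mapping from D(E_w) to the word error rate, so the sketch stops at the effective data size.

```python
def effective_data_size(s: float, u: float) -> float:
    """Evaluate s * [min(0.62*s**0.75, u)**0.74 + max(0, u - 0.62*s**0.75)**0.27].

    s: number of training speakers.
    u: number of utterances per speaker.
    The two-argument min() is an assumed reading of the abstract's formula.
    """
    if s <= 0 or u <= 0:
        raise ValueError("s and u must be positive")
    # Per-speaker utterance count at which the exponent switches from 0.74 to 0.27.
    knee = 0.62 * s ** 0.75
    below = min(knee, u) ** 0.74          # utterances up to the knee
    above = max(0.0, u - knee) ** 0.27    # utterances beyond the knee
    return s * (below + above)


if __name__ == "__main__":
    # Illustrative inputs only; these are not configurations reported in the paper.
    for speakers, utterances in [(100, 5), (100, 50), (1000, 50)]:
        d = effective_data_size(speakers, utterances)
        print(f"s={speakers}, u={utterances}: D = {d:.1f}")
```

Under this reading, utterances beyond the knee contribute with the smaller exponent 0.27, so adding more utterances per speaker past that point increases the effective data size more slowly than adding them below it.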
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2001-03-01
Authors
- Kawai H, ATR Spoken Language Translation Research Laboratories
- Kawai Hisashi, KDD R&D Laboratories Inc.
- Shimizu T, KDD R&D Laboratories Inc.
- Shimizu T, NTT Network Innovation Lab., Yokosuka-shi, Japan
- Higuchi Norio, KDD R&D Laboratories Inc.
- Shimizu Tohru, KDD R&D Laboratories Inc.
Related papers
- Detection and correction of the channel variability in a Mandarin speech corpus
- The Number of Elements in Minimum Test Set for Locally Exhaustive Testing of Combinational Circuits with Five Outputs
- A Portable Text-to-Speech System Using a Pocket-Sized Formant Speech Synthesizer (Special Section on Speech Synthesis: Current Technologies and Equipment)