Automatic Prosody Labeling Using Multiple Models for Japanese (Speech and Hearing)
Abstract
Automatic prosody labeling is the task of automatically annotating speech corpora with prosodic labels such as syllable stresses or break indices. Prosody-labeled corpora are important for speech synthesis and automatic speech understanding. However, the subtlety of the physical features makes accurate labeling difficult. Because errors in the prosodic labels can lead to incorrect prosody estimation and unnatural synthetic speech, label accuracy is a key factor for text-to-speech (TTS) systems. In particular, mora accent labels, which relate to pitch, are very important for Japanese, since Japanese is a pitch-accent language and Japanese listeners are particularly sensitive to pitch accents. Determining Japanese mora accents is, however, in some respects more difficult than detecting English stress: the context of a word changes the mora accents within that word, whereas English stress normally falls on the word's lexical primary stress. In this paper, we propose a method that accurately determines Japanese prosodic labels using both acoustic and linguistic models. A speaker-independent linguistic model provides mora-level knowledge about which accentuations are possible in Japanese, and it reduces the size of the speaker-dependent speech corpus required to train the other stochastic models. Our experiments show the effectiveness of combining the models.
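The abstract does not say how the acoustic and linguistic models are combined. As an illustration only, here is a minimal sketch assuming a simple log-linear combination for a single word: each candidate accent nucleus is mapped to a per-mora H/L pitch pattern, scored by hypothetical per-mora acoustic posteriors, and weighted by a hypothetical linguistic prior over nucleus positions. The names `hl_pattern`, `best_accent`, and `lm_weight` are invented for this sketch and are not the authors' method. The H/L mapping is the textbook Tokyo-Japanese rule for a word in isolation and deliberately ignores the context effects the abstract identifies as the hard part.

```python
import math
from typing import Dict, List, Optional, Sequence


def hl_pattern(num_morae: int, nucleus: int) -> List[str]:
    """Return the textbook Tokyo-Japanese H/L pitch pattern of a word.

    nucleus == 0 means unaccented (heiban); otherwise pitch falls right
    after mora number `nucleus` (1-based). Context effects are ignored.
    """
    if num_morae < 1 or not 0 <= nucleus <= num_morae:
        raise ValueError("invalid mora count or nucleus position")
    if nucleus == 1:
        # Initial accent: the first mora is high, every later mora is low.
        return ["H"] + ["L"] * (num_morae - 1)
    # Otherwise the first mora is low and pitch rises on the second mora.
    pattern = ["L"]
    for i in range(2, num_morae + 1):
        # High up to the nucleus (or to the end if unaccented), then low.
        pattern.append("H" if nucleus == 0 or i <= nucleus else "L")
    return pattern


def best_accent(
    acoustic: Sequence[Dict[str, float]],
    prior: Dict[int, float],
    lm_weight: float = 1.0,
) -> Optional[int]:
    """Choose the nucleus maximizing acoustic + lm_weight * linguistic log score.

    acoustic[i] holds hypothetical acoustic-model posteriors {"H": p, "L": q}
    (each > 0) for mora i; prior maps nucleus positions to hypothetical
    linguistic-model probabilities. Returns None if the prior allows nothing.
    """
    if not acoustic:
        raise ValueError("a word needs at least one mora")
    n = len(acoustic)
    best: Optional[int] = None
    best_score = -math.inf
    for nucleus in range(n + 1):
        p = prior.get(nucleus, 0.0)
        if p <= 0.0:
            continue  # the linguistic model rules this accentuation out
        labels = hl_pattern(n, nucleus)
        acoustic_score = sum(
            math.log(post[lab]) for post, lab in zip(acoustic, labels)
        )
        score = acoustic_score + lm_weight * math.log(p)
        if score > best_score:
            best, best_score = nucleus, score
    return best
```

For example, with `acoustic = [{"H": 0.4, "L": 0.6}, {"H": 0.7, "L": 0.3}, {"H": 0.45, "L": 0.55}]` and `prior = {0: 0.7, 1: 0.1, 2: 0.2}`, `best_accent(acoustic, prior, lm_weight=0.0)` returns `2` because the ambiguous final mora leans low, while `lm_weight=1.0` returns `0` because the prior favors the unaccented reading. Even at weight 0 the prior still excludes accentuations it assigns zero probability, which loosely mirrors the abstract's point that the linguistic model encodes which accentuations are possible; in practice a weight like `lm_weight` would be tuned on held-out data.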
- 2007-11-01
Authors
-
NAGANO Tohru
Tokyo Research Lab., IBM Japan
-
NISHIMURA Masafumi
Tokyo Research Laboratory, IBM Japan Ltd.
-
TACHIBANA Ryuki
Tokyo Research Lab., IBM Japan
-
KURATA Gakuto
Tokyo Research Lab., IBM Japan
-
BABAGUCHI Noboru
Graduate School of Engineering, Osaka University
Related papers
- Local Peak Enhancement for In-Car Speech Recognition in Noisy Environment
- Simultaneous Adaptation of Echo Cancellation and Spectral Subtraction for In-Car Speech Recognition (Speech Enhancement, Multi-channel Acoustic Signal Processing)
- Sound Source Localization Using a Profile Fitting Method with Sound Reflectors (Speech Dynamics by Ear, Eye, Mouth and Machine)
- Speech Enhancement by Profile Fitting Method (Special Issue on Speech Information Processing)
- Theoretical Analysis of the Performance of Anonymous Communication System 3-Mode Net
- Story Segmentation of Broadcasted Sports Videos for Semantic Content Acquisition
- Analysis of Audio-Visual Synchronous Patterns in Edited Videos: Towards an Aid for Attractive Video Editing (Videos)
- User and Device Adaptation in Summarizing Sports Videos
- Indoor Positioning System Using Digital Audio Watermarking