Spectral Features for Perceptually Natural Phoneme Replacement by Another Speaker's Speech
Abstract
The frequency regions and spectral features that can be used to measure the perceived similarity and continuity of voice quality are reported here. A perceptual evaluation test was conducted to assess the naturalness of spoken sentences in which either a vowel or a long vowel of the original speaker was replaced by that of another. Correlation analysis between the evaluation score and the spectral feature distance was conducted to select the spectral features that were expected to be effective in measuring the voice quality and to identify the appropriate speech segment of another speaker. The mel-frequency cepstrum coefficient (MFCC) and the spectral center of gravity (COG) in the low-, middle-, and high-frequency regions were selected. A perceptual paired comparison test was carried out to confirm the effectiveness of the spectral features. The results showed that the MFCC was effective for spectra across a wide range of frequency regions, the COG was effective in the low- and high-frequency regions, and the effective spectral features differed among the original speakers.
- Paper published by The Institute of Electronics, Information and Communication Engineers
- 2012-04-01
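The analysis described in the abstract (a band-limited spectral center of gravity, correlated with listener scores) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `band_cog` and `pearson`, the band edges, and all numeric values are hypothetical and do not come from the paper, which does not give its band boundaries or data in this record.

```python
import math


def band_cog(freqs_hz, power, lo_hz, hi_hz):
    """Power-weighted mean frequency (center of gravity) within [lo_hz, hi_hz)."""
    num = 0.0
    den = 0.0
    for f, p in zip(freqs_hz, power):
        if lo_hz <= f < hi_hz:
            num += f * p
            den += p
    if den == 0.0:
        raise ValueError("no spectral energy in the requested band")
    return num / den


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy power spectrum (hypothetical values, not measured data).
freqs = [100.0, 200.0, 300.0, 5000.0]
power = [1.0, 0.0, 3.0, 10.0]

# Assumed low band of 0-1000 Hz; the paper's actual band edges are not given here.
low_cog = band_cog(freqs, power, 0.0, 1000.0)

# Hypothetical COG distances between original and replacement segments,
# paired with hypothetical naturalness scores from listeners.
cog_distance = [1.0, 2.0, 3.0]
naturalness = [-2.0, -4.0, -6.0]
r = pearson(cog_distance, naturalness)
```

A strongly negative `r` would mean that a larger feature distance goes with lower perceived naturalness, which is the kind of relationship the correlation analysis in the abstract looks for when selecting features.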
Authors
-
Takagi Tohru
Science and Technical Research Laboratories
-
Seiyama Nobumasa
Science and Technical Research Laboratories
-
Segi Hiroyuki
Science and Technical Research Laboratories
-
Seiyama Nobumasa
NHK (Japan Broadcasting Corp.) Science and Technology Research Laboratories
-
TAKOU Reiko
NHK (Japan Broadcasting Corp.) Science and Technology Research Laboratories
-
SEGI Hiroyuki
NHK (Japan Broadcasting Corp.) Science and Technology Research Laboratories
-
TAKAGI Tohru
NHK Engineering Services, Inc.
-
Takagi Tohru
NHK Engineering Services, Inc.
Related papers
- Improved high-quality MPEG-2/4 advanced audio coding encoder
- Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR
- Bi-Spectral Acoustic Features for Robust Speech Recognition
- Filter Bank Subtraction for Robust Speech Recognition (Special Issue on Speech Information Processing)
- Simultaneous Subtitling System for Broadcast News Programs with a Speech Recognizer (Special Issue on the 2001 IEICE Excellent Paper Award)
- Acoustic Model Adaptation by Selective Training Using 2-Stage Clustering
- Spectral Features for Perceptually Natural Phoneme Replacement by Another Speaker's Speech