Male-to-female voice conversion experiment
Abstract
This paper describes a method of male-to-female voice conversion as an application of speech analysis and synthesis by linear prediction. The method was demonstrated at the open house of the NHK Technical Research Laboratories in 1975, where a synthesized female voice was presented; the original was a sentence from a weather forecast announcement spoken by a male announcer. The average formant frequencies of female voices are approximately 1.2 times as high as those of male voices, as shown in Fig. 2, and the average bandwidth of the first formant of female voices is approximately 1.3 times as wide as that of male voices, as shown in Fig. 3. In this experiment, both the pole frequencies and the bandwidths of the input speech spectra were multiplied by 1.3 simply by setting the sampling frequency of the D/A converter to 1.3 times that of the A/D converter. It is known that the pitch frequency of female voices is approximately twice as high as that of male voices, and that an optimal pitch frequency region exists corresponding to the formant frequencies. Therefore, we tried several multiplying factors for the pitch frequency between 1.7 and 2.5 and chose 2.1 as the best on the basis of an informal listening test. To soften the shrillness of the synthesized voice, we designed a filter to compensate for the difference in glottal waveforms between female and male voices, the input-output relation of which is given by (9). In a standard case, in which the shorter of the rising time and the falling time of the glottal waveform of female voices is twice as long as that of male voices, the difference in the glottal waveforms can be compensated by a filter with the frequency characteristics shown in Fig. 4. The spectra of the vowel segments of the male voice used in this experiment have dips around 2 kHz, which corresponds to a rising time (or falling time) of 0.5 ms.
The optimum value of the constant τ of the compensating filter appearing in (10), determined by an informal listening test, was also approximately 0.5 ms. The synthesized voice has excessive amplitude in one part, as shown in Fig. 7. To remove this defect, a saturating operation was applied to the intensity of the driving signal. By this method we obtained an almost satisfactory female voice without any phoneme-specific processing.
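The claim that a faster D/A clock scales both pole frequencies and bandwidths by the same factor can be checked with a small sketch. It uses the standard mapping from a z-plane pole to a resonance frequency and bandwidth. The 10 kHz A/D rate, the 500 Hz / 60 Hz example pole, and the function name `pole_to_formant` are illustrative assumptions, not values or names taken from the paper.

```python
import math


def pole_to_formant(radius, angle, fs):
    """Map a z-plane pole (radius, angle in radians) to
    (center frequency, bandwidth) in Hz at sampling rate fs."""
    freq = angle * fs / (2.0 * math.pi)
    bandwidth = -math.log(radius) * fs / math.pi
    return freq, bandwidth


# Assumed values for illustration only (not from the paper).
FS_IN = 10_000.0        # hypothetical A/D sampling rate in Hz
RATIO = 1.3             # D/A rate divided by A/D rate, as in the abstract
FS_OUT = RATIO * FS_IN  # D/A sampling rate in Hz

# Hypothetical pole: a 500 Hz resonance with 60 Hz bandwidth at FS_IN.
radius = math.exp(-math.pi * 60.0 / FS_IN)
angle = 2.0 * math.pi * 500.0 / FS_IN

# The pole itself is unchanged; only the clock used to interpret it differs.
f_in, b_in = pole_to_formant(radius, angle, FS_IN)
f_out, b_out = pole_to_formant(radius, angle, FS_OUT)

print(f"frequency ratio: {f_out / f_in:.3f}")
print(f"bandwidth ratio: {b_out / b_in:.3f}")
```

Because the predictor coefficients are left untouched and only the output clock changes, every pole is scaled by the same factor, which is why a single ratio of 1.3 covers both the formant frequencies and the bandwidths.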
- Paper published by the Acoustical Society of Japan
- 1976-06-01
Authors
Related papers
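- Noise spectrum on the receiver screen
- Theory of learnability (Mathematical theory of information science)
- A few improvements to a lens measuring instrument using television: II. Optics
- Linear distortion in TV transmission systems
- Gamma characteristics of the iconoscope in relation to film transmission equipment
- A method of measuring the linearity of video equipment by incremental gain
- Feature extraction of vowel spectra by multi-group linear discriminant functions
- On improvements to film transmission equipment using the iconoscope
- Fundamental frequency extraction by peak detection of the squared speech wave
- Voice output experiment for local weather forecasts by a recording-and-editing method
- Male-to-female voice conversion experiment
- Low-frequency waveform distortion of television signals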