Phonemic Quality and Voice Quality of Speech Sounds: Speech Synthesis Using an Acoustic Vocal Tract Model
Abstract
This paper describes a simple device that acoustically simulates the human vocal tract, and reports results obtained from speech sounds synthesized with it. The device is simpler and easier to handle than an electrical vocal tract simulator. The acoustic models of the vocal tract are box-shaped and made of transparent acrylic resin. The vocal tract length is 17.5 cm for the man's model, 14 cm for the woman's model (about 80% of the man's), and about 11 cm and 9 cm for the children's models. The height is 2.5 cm. The cross-sectional areas of the models are varied by moving 1-cm-thick plastic strips that are inserted closely from one side. The models also have a nasal branch. Glottal sounds are fed into one end of a model (the glottis) and let out of the other end (the mouth). Various vowels and other sustained sounds are produced according to the configuration of the model at that time. The driving unit of a horn speaker (NEC-555M, Japan) was used as the sound source. Because the acoustic impedance at the human glottis is very high, a bundle of steel wires, each 1.5 mm in diameter and 14 mm in length, was packed tightly into the throat of the loudspeaker. Consequently, the cross-sectional area of the throat is about 1.3 cm^2. Observation of sustained speech sounds reveals two features. One is the phonemic feature, which distinguishes one phoneme from another; the other is a feature that contributes to naturalness, distinguishing not only men, women, and children but also individuals from one another. We have successfully clarified these features in physical terms. When the length of the vocal tract is reduced gradually from 17.5 cm, the configuration for a phoneme can be reduced in the same proportion without spoiling the phonemic feature. As for the cross-sectional areas, only relative values are required.
We can therefore normalize the vocal tract configuration of every phoneme with respect to the vocal tract length and the cross-sectional areas. Using the normalized configurations yields normalized spectra of the phonemes. The relation between the vocal tract length and the fundamental frequency of the voice, which serves to distinguish speakers by sex and age, can also be normalized. Specifically, if the normalized pitch is defined as the ratio of the frequency whose wavelength is four times the tract length to the fundamental frequency of the voice, natural synthesized speech sounds are obtained when the normalized pitch ranges from 2.5 to 5.0. The glottal waveform used here is a sawtooth. When the decay time T_3 (msec) of the sawtooth waveform is sufficiently short compared with one cycle T, the first zero of the glottal spectrum appears at 1/T_3 (kc, i.e., kHz). The glottal sound spectrum has a great influence on the characteristics of each individual's voice. The shorter the decay time, the sharper the voice becomes. A longer decay time (about 0.6-1.0 msec) is better for a female voice, while a shorter one (about 0.2-0.5 msec) is better for a child's voice.
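The two quantitative relations stated in the abstract, the normalized pitch and the first zero of the glottal spectrum, can be illustrated with a minimal Python sketch. The abstract does not give a speed of sound, so the value of 350 m/s used here is an assumption, and all function names are hypothetical helpers introduced only for illustration.

```python
# Assumed speed of sound in cm/s (about 350 m/s); NOT stated in the abstract.
SPEED_OF_SOUND_CM_PER_S = 35_000.0


def quarter_wave_frequency_hz(tract_length_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """Frequency whose wavelength is four times the vocal tract length."""
    return c / (4.0 * tract_length_cm)


def normalized_pitch(tract_length_cm, f0_hz, c=SPEED_OF_SOUND_CM_PER_S):
    """Ratio of the quarter-wavelength frequency to the fundamental frequency."""
    return quarter_wave_frequency_hz(tract_length_cm, c) / f0_hz


def natural_f0_range_hz(tract_length_cm, low=2.5, high=5.0,
                        c=SPEED_OF_SOUND_CM_PER_S):
    """Fundamental-frequency range for which the normalized pitch lies in
    [low, high], the range the abstract reports as sounding natural."""
    fq = quarter_wave_frequency_hz(tract_length_cm, c)
    return fq / high, fq / low


def first_glottal_zero_khz(decay_time_ms):
    """First zero of the sawtooth glottal spectrum, 1/T_3 in kHz for T_3 in
    msec; valid only when T_3 is short compared with one cycle T."""
    return 1.0 / decay_time_ms


if __name__ == "__main__":
    # Tract lengths taken from the abstract (man, woman, two child models).
    for label, length_cm in [("man", 17.5), ("woman", 14.0),
                             ("child", 11.0), ("child", 9.0)]:
        f0_low, f0_high = natural_f0_range_hz(length_cm)
        print(f"{label} ({length_cm} cm): F0 about {f0_low:.0f}-{f0_high:.0f} Hz")

    # Decay times sampled from the ranges quoted in the abstract.
    for t3_ms in (0.2, 0.5, 0.6, 1.0):
        print(f"T_3 = {t3_ms} msec: first zero near "
              f"{first_glottal_zero_khz(t3_ms):.2f} kHz")
```

Under the assumed 350 m/s, the 17.5 cm model has a quarter-wavelength frequency of 500 Hz, so a normalized pitch between 2.5 and 5.0 corresponds to fundamental frequencies between 100 Hz and 200 Hz. A different assumed speed of sound scales these frequencies proportionally.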
- Paper published by the Acoustical Society of Japan (incorporated association)
- 1966-07-30
Authors
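Related papers
- Round-table discussion: Speech research (Speech Research)
- Recent topics in hearing (<Special issue> Hearing)
- Discrimination of the rate of frequency change and dynamic perception of pitch
- Hearing and recognition of sound (Cognition and perception of sound)
- On cues for detecting transient frequency changes
- Syntactic analysis for Japanese speech synthesis by rule
- Education in psychoacoustics at Kyushu Institute of Design (<Mini special issue> Acoustics education (Part 2): Prospects and challenges)
- On systematic discrepancies between note values written in the score and the durations realized by performers
- Temporal processing in hearing (Special issue on hearing) -- (Recent topics in hearing)
- Critical identification rate and information-processing capacity in the auditory system
- NRC of Canada: Acoustics laboratory
- Phonemic Quality and Voice Quality of Speech Sounds: Speech Synthesis Using an Acoustic Vocal Tract Model
- Acoustics in 1964, a review: Psychoacoustics
- Acoustics in 1963, a review: Hearing
- Synthesis of vowels using an acoustic vocal tract model
- Acoustics in 1959, a review: Hearing
- Psychological scales of loudness: the phon and the sone
- On a programming system for synthesizing the 50 sounds of the Japanese syllabary
- Analysis of vowels in continuous messages
- Influence of consonants on vowels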