Most schemes for the recognition of connected speech are based on segmenting the speech signal into separate units and then recognizing those units as individual syllables. Because the acoustical properties of these units are quite different from those of monosyllables uttered in isolation, however, it is to be expected that the units, when taken out of connected speech and presented in isolation, are perceptually different from the corresponding isolated monosyllables. If such perceptual differences are shown to exist, they may serve as important evidence that schemes for automatic recognition of connected speech need to incorporate systematic removal of contextual variations. From this point of view, investigations were made of the perceptual properties of vowels segmented from isolated monosyllables and from connected speech, as well as those of monosyllables and larger units segmented from connected speech. The following results were obtained:

1. In the case of an isolated C-V syllable, the perception of the consonant was found to be affected by the systematic removal of the initial portion of the syllable, until finally only the vowel was perceived. The perception of the vowel, however, remained unaffected (Figs. 2 and 3).
2. On the other hand, the perception of a vowel in connected speech was found to be seriously impaired by the removal of its environment. In the case of the vowel /a/, for example, only 12 out of 32 samples from clearly pronounced connected speech were judged correctly more than 50% of the time when taken out of their environments and presented in isolation. The average score of correct identification for the 32 samples was only 57% (Figs. 4 and 5). The scores for the vowels /i/, /e/, /o/ and /u/ ranged from 52% to 70%. These perceptual confusions of vowels in connected speech were found to be highly correlated with their acoustical properties in the F_1-F_2 plane (Fig. 6).
3. When monosyllabic segments were taken out of connected speech and presented in isolation, only 92 out of 219 samples were identified correctly, corresponding to a score as low as 42%. Because of the presence of the preceding consonantal environment, however, the score for the vowels in this case improved to about 80%. In the case of bisyllabic segments, the scores of correct identification for the first and the second syllables were 62% and 76% respectively (Fig. 7). In the case of trisyllabic segments, the score for the middle syllable improved further to 95%, and the score for the middle vowel was as high as 97% (Figs. 8, 9, 10 and 11).

These experimental results indicate that the perception of vowels or monosyllables in connected speech is seriously impaired by the complete removal of their environments, and that at least two syllables, one preceding and one following, are necessary to provide a perceptual environment for their correct identification.
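The identification scores quoted above are simple proportions of correct responses. As a minimal sketch of that arithmetic, the snippet below recomputes the two figures that the abstract states as raw counts. The helper name `percent` is hypothetical and not taken from the paper, which does not describe how its scores were tallied beyond the counts given.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage (hypothetical helper, not from the paper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= part <= total:
        raise ValueError("part must lie between 0 and total")
    return 100.0 * part / total


# Monosyllabic segments cut from connected speech: 92 of 219 identified correctly.
monosyllable_score = percent(92, 219)

# Vowel /a/: 12 of 32 samples were judged correctly more than 50% of the time.
# This is a share of samples, not an identification score.
share_of_a_samples_above_half = percent(12, 32)

print(f"monosyllable score: {monosyllable_score:.1f}%")
print(f"/a/ samples above 50%: {share_of_a_samples_above_half:.1f}%")
```

The first value rounds to the 42% reported in the abstract. The second shows that only about three in eight of the /a/ samples cleared the 50% mark.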
- Paper published by the Acoustical Society of Japan (社団法人日本音響学会)
- 1972-05-01
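- Construction of a Japanese speech database for research
- Voice conversion through vector quantization
- The current state of speech databases in Japan: development, management, and use in speech research
- Forty years of speech research: studies of speech perception, speaker individuality, voice conversion, and speech databases (Lecture Series for Young Researchers) (Symposium on the Fundamentals and Applications of Speech)
- Creation of a system for building XML teaching materials
- Analysis of nasalized voiced consonants (bidakuon) by waveform processing
- Temporal patterns of formant frequency change and the perception of two-vowel and three-vowel sequences
- Binaural fusion of synthetic speech and the perceptual mechanism for dynamic vowels
- Vowel perception for various temporal changes in formant frequency
- A perceptual study of phoneme boundaries using dynamic synthetic vowels
- Normalization of coarticulation effects in vowel sequences in connected speech
- Phonemic perception of segmented vowels and syllables in connected speech
- The sensitivity of the ear and the noisiness of sound
- 5. [Lecture Series for Young Researchers] Forty years of speech research: studies of speech perception, speaker individuality, voice conversion, and speech databases (Abstracts of the 309th Research Meeting)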