Reliable automatic recognition of phonemes in connected speech requires effective means of coping with the variations in their acoustic characteristics that arise from speaker idiosyncrasy and coarticulation. This paper describes a new scheme for the segmentation and recognition of connected vowels and semivowels, based on a speaker-adaptive model of the coarticulatory process. Coarticulation between adjoining phonemes in connected vowels can be modeled in the domain of formant frequencies by a smoothing system that converts the stepwise-varying target values corresponding to the successive vowels into the actual formant trajectory (Fig. 1). The characteristics of a critically damped second-order linear system are generally valid for this smoothing system, as shown by the example of the word /ie/ (Fig. 2), but further elaborations that take the continuity and coupling of resonance modes into account are required for combinations of front and back vowels, as shown by the example of the word /ai/ (Fig. 3). As its input, the proposed scheme (Fig. 4) uses the trajectories of the first three formant frequencies, extracted pitch-synchronously from the short-term frequency spectra of speech and then converted by interpolation to sample values at uniform intervals. Since initial vowels can be recognized with high accuracy by established techniques for the recognition of sustained vowels, their formant frequencies can be used to estimate the target values of the other vowels of the same speaker. The estimation is based on the average relationships found among the formant frequencies of all five vowels across many speakers, and through this estimation the coarticulatory model can be adapted to an arbitrary speaker.
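The smoothing step described above can be illustrated with a minimal sketch. It assumes the standard unit step response of a critically damped second-order system, g(t) = 1 - (1 + βt)e^(-βt) for t ≥ 0, and superposes one such response per target change. The function names, the rate constant `beta`, and the target values are illustrative assumptions and are not taken from the paper; the elaborations needed for front-back vowel combinations are not modeled.

```python
import numpy as np

def step_response(t, beta):
    """Unit step response of a critically damped second-order system.

    Returns 0 for t <= 0 and rises monotonically toward 1 without overshoot.
    """
    tp = np.clip(np.asarray(t, dtype=float), 0.0, None)
    return 1.0 - (1.0 + beta * tp) * np.exp(-beta * tp)

def formant_trajectory(t, targets, onsets, beta):
    """Smooth a stepwise target sequence into a formant trajectory.

    targets[0] holds from the start; targets[k] is commanded at onsets[k-1].
    Linearity lets each target change be added as a scaled step response.
    """
    t = np.asarray(t, dtype=float)
    traj = np.full_like(t, targets[0])
    for prev, nxt, t0 in zip(targets[:-1], targets[1:], onsets):
        traj += (nxt - prev) * step_response(t - t0, beta)
    return traj

# Hypothetical first-formant targets (Hz) for a dyad such as /ie/,
# with the second target commanded at 0.2 s and beta = 40 1/s (assumed).
time_axis = np.linspace(0.0, 1.0, 201)
f1 = formant_trajectory(time_axis, [300.0, 500.0], [0.2], beta=40.0)
```

Because the system is critically damped, the trajectory approaches each new target without overshooting it, which is the behavior the abstract reports for /ie/.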
The model can then be used to determine the underlying targets from observed formant trajectories by the method of analysis-by-synthesis, thereby accomplishing successive segmentation and recognition of each phoneme in connected vowels. The validity of the scheme was demonstrated by an overall correct recognition rate of 98.7% (Table 1) for a total of 445 utterances consisting of vowel dyads, triads, and quadruplets spoken by three male speakers. The scheme can be extended to the recognition of semivowels. The formant targets of the semivowels /j/ and /w/ were found to be quite close to those of the vowels /i/ and /u/, respectively, but their command durations are significantly different (Fig. 7). When the speech rate varies over a wide range, speech rate information, represented by the command duration of the immediately following vowel, is necessary for the accurate separation of /j/, /i/, and /ij/ (Fig. 8). Given the speech rate information, the correct recognition rate for these categories is 97.5% for a total of 270 utterances of 15 words containing semivowels, vowels, and vowel-semivowel combinations in the same context.
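The analysis-by-synthesis idea can be sketched as follows: for each candidate vowel and onset time, synthesize a trajectory with the coarticulation model and keep the candidate that best matches the observation. This is only an illustration under stated assumptions. The exhaustive grid search, the squared-error criterion, the use of two formants instead of three, the `TARGETS` values, and `beta` are all hypothetical choices and do not come from the paper.

```python
import numpy as np

# Hypothetical speaker-adapted (F1, F2) targets in Hz; illustrative values only.
TARGETS = {
    "a": (750.0, 1200.0),
    "i": (300.0, 2300.0),
    "u": (350.0, 1300.0),
    "e": (500.0, 1900.0),
    "o": (500.0, 850.0),
}

def step_response(t, beta):
    """Unit step response of a critically damped second-order system."""
    tp = np.clip(np.asarray(t, dtype=float), 0.0, None)
    return 1.0 - (1.0 + beta * tp) * np.exp(-beta * tp)

def synthesize(t, first, second, onset, beta):
    """Synthesize formant trajectories (rows: formants) for a vowel dyad."""
    a = np.array(TARGETS[first])[:, None]
    b = np.array(TARGETS[second])[:, None]
    return a + (b - a) * step_response(t - onset, beta)[None, :]

def recognize_dyad(t, observed, first, beta):
    """Find the second vowel and its command onset by exhaustive search.

    The first vowel is assumed known, as in the abstract. Every candidate
    vowel and onset is synthesized and scored by summed squared error.
    """
    best = None
    for vowel in TARGETS:
        if vowel == first:
            continue
        for onset in t:
            err = np.sum((observed - synthesize(t, first, vowel, onset, beta)) ** 2)
            if best is None or err < best[0]:
                best = (err, vowel, float(onset))
    return best[1], best[2]

# Usage: recover the second vowel of a synthetic /ie/ from its trajectories.
time_axis = np.linspace(0.0, 0.6, 61)
observed = synthesize(time_axis, "i", "e", time_axis[20], beta=40.0)
vowel, onset = recognize_dyad(time_axis, observed, "i", beta=40.0)
```

The recovered onset doubles as a segment boundary, which is how segmentation and recognition can fall out of the same search. A practical system would need a noise-tolerant criterion and a more efficient search than this grid.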
- 1978-03-01
- 音声情報処理の将来を考える
- 国際シンポジウム"音楽と情報科学"
- 近畿方言2拍単語アクセント型の分析及び知覚
- 87 獲得した知識の体系度を指標とする帰納的推論能力の育成
- 単語音声中の半母音の認識
- 調音結合過程の機能的モデルを用いた連続母音の認識
- 未知語を含む文の形態素解析システム
- ホルマント周波数上での調音結合の定式化と音声自動認識への適用
- 305 言語能力と図形検査を中心とした非言語能力の発達過程の関連性(発達12 空間認知,研究発表)
- 209 語彙及び単音からみる言語能力の発達過程(発達2,研究発表)
- 316 談話構造の分析に基づく言語発達過程の検討(幼児の言語発達,発達)
- 231 幼児に於ける語連鎖の発達(発達5,発達)
- 230 幼児に於ける音素及び語の獲得過程(発達5,発達)
- 日本語文章音声の合成のための韻律規則
- 日本語単語アクセントの基本周波数パタンとその生成機構のモデル
- 日本語連続音声のピッチパタンの合成のためのモデル
- テーマ・キー概念・キーワード間の階層構造を利用する新聞記事情報の分類・検索システム
- 新聞記事を対象とする用字調査
- 高機能な検索のできる大規模日本語データベースの構成
- 音声研究の現状と将来を語る
- SEARMA法による音声分析における観測区間の適応的制御
- 極・零モデルに基づく無声破裂音の分析と特徴抽出
- IEEE ASSP SocietyのTokyo Chapterの設立と、1984年IEEE音響・音声・信号処理国際会議(ICASSP 84)報告
- 定常母音の分析・正規化および認識
- 調音機構のモデル化に関するシンポジウム
- 意味空間での操作に基づく自立語の意味の獲得
- 声道伝達関数の極に基づく面積関数の推定
- 時領域における音声のピッチ抽出の一方式
- 日本語無声摩擦音の分析と認識
- 音声通信系における信号の最適線形処理
- 第103回アメリカ音響学会会議報告
- 1982年 IEEE 音響・音声・信号処理国際会議 : ICASSP 82
- 合成音声の弁別と言語音知覚機構のモデル
- Auditory Perception of Temporal Duration and Visual Perception of Stroke Length in Aphasic Patients