An On-line Question-Answering System by Speech
Abstract
Recently, research on Speech Understanding Systems (SUS) has attracted great interest as a new approach to continuous speech recognition. The SUS concept has three main features: (1) the content of the conversation is restricted to a defined domain; (2) emphasis is placed on understanding the meaning and content of the input speech rather than on recognizing each word or phrase; and (3) the input speech is recognized through question-answering between a computer and a user. This paper describes the SUS that the authors studied from 1974 to 1976, which can operate in on-line mode. The task performed by the system is a train seat reservation service covering 28 stations and 181 trains. Table 2 shows the seven reservation items. The input vocabulary consists of 112 words. As shown in Fig. 1, the system consists of three parts: the acoustic processor, the linguistic processor, and the audio response unit. Figure 2 illustrates the computer system on which the question-answering system is implemented. The acoustic processor and the audio response unit are implemented on a NEAC 3200/70, and the linguistic processor on a PFU-400. High-speed speech processors connected to the NEAC 3200/70 and high-speed data transmission between the two computers make on-line processing possible. The detailed construction of the system is shown in Fig. 3. The acoustic processor performs feature extraction and phoneme recognition and represents the results as a phoneme lattice. The linguistic processor grasps the meaning and content of the input speech through word recognition, syntactic analysis, and inference. Response sentences are then composed according to the recognition results, and the audio response unit synthesizes these sentences as the response to the user.
Input speech to the system must contain short pauses of more than 0.5 s between adjacent phrases. Apart from this constraint, a user may speak freely to the system without being restricted by the order of the reservation items or by the grammar. A conversation model was prepared so that the computer and the user can carry out smooth and natural question-answering. Table 3 shows the seven states of the conversation model, for each of which particular response sentences are prepared. Figure 4 shows the transitions among these states. Inference using the timetable is executed during the transitions among states, which helps reduce the number of question-answering cycles. The output speech from the system is synthesized using words or phrases as units. For this purpose, 23 sentence patterns and 460 words or phrases to be inserted into these patterns are prepared. The performance of the system was tested by on-line question-answering experiments. Eight male speakers attempted 320 seat reservations in total (40 reservations per speaker), and 99.1% of all the reservations were successfully completed. The average number of question-answering cycles needed to complete a reservation, excluding the first input, was 3.21. A detailed analysis of the content of the question-answering is shown in Table 5, which reveals that re-input due to rejection or misrecognition was infrequent. These results show that the system operates fairly well in the on-line question-answering mode. The average time for acoustic and linguistic processing is 5.0 times real time. Figure 6 shows an example of the time chart of processing.
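The dialogue behavior described in the abstract (reservation items accepted in any order, questions asked only for what is still missing, responses built from sentence patterns with inserted words) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the slot names, the two templates, and the `Reservation` class are hypothetical, since the abstract does not list the seven items of Table 2, the seven states of Table 3, or the 23 sentence patterns. Timetable inference and speech recognition are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical slot names: the abstract only says there are seven
# reservation items (Table 2); it does not name them.
SLOTS = (
    "date", "train", "origin", "destination",
    "departure_time", "seat_class", "seats",
)

# Hypothetical templates standing in for the paper's sentence patterns.
QUESTION_TEMPLATE = "Please tell me the {slot}."
CONFIRM_TEMPLATE = "Your reservation: {summary}. Is this correct?"


@dataclass
class Reservation:
    """Collects reservation items across question-answering cycles."""
    filled: dict = field(default_factory=dict)

    def update(self, recognized: dict) -> None:
        # Accept items in whatever order the user spoke them;
        # anything that is not a known slot is ignored.
        for slot, value in recognized.items():
            if slot in SLOTS:
                self.filled[slot] = value

    def missing(self) -> list:
        # Slots still unknown, in a fixed asking order.
        return [s for s in SLOTS if s not in self.filled]

    def next_response(self) -> str:
        # Ask for the first missing item, or confirm when all are known.
        missing = self.missing()
        if missing:
            return QUESTION_TEMPLATE.format(slot=missing[0].replace("_", " "))
        summary = ", ".join(f"{s}={self.filled[s]}" for s in SLOTS)
        return CONFIRM_TEMPLATE.format(summary=summary)
```

For example, after `r = Reservation()` and `r.update({"destination": "Osaka", "date": "March 1"})`, calling `r.next_response()` asks for the train, because that is the first slot not yet filled. In the system described above, timetable inference would fill some slots automatically and so shorten this loop.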
- Paper of the Acoustical Society of Japan
- 1978-03-01
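Authors
- Kenzo Itoh (伊藤 憲三), NTT Human Interface Laboratories; Iwate Prefectural University
- Ryohei Nakatsu (中津 良平), Yokosuka Electrical Communication Laboratory, Nippon Telegraph and Telephone Public Corporation
- Masaki Kohda (好田 正紀), Musashino Electrical Communication Laboratory, Nippon Telegraph and Telephone Public Corporation
- Kiyohiro Shikano (鹿野 清宏), Musashino Electrical Communication Laboratory, Nippon Telegraph and Telephone Public Corporation
- Kenzo Itoh (伊藤 憲三), Musashino Electrical Communication Laboratory, Nippon Telegraph and Telephone Public Corporation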
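Related papers
- A study on the relationship between physical characteristics and voice quality
- A-16-22 A study of an automatic interactive story generation system (A-16. Multimedia and Virtual Environment Fundamentals, General Session)
- Generation of excitation source signals based on a minimum spectral distortion criterion and speech synthesis
- IFIP recent activities report: International Federation for Information Processing
- Barrier-free speech communication (Considering sound support (sound barrier-free))
- D-12-62 Research on robots that support daily life (D-12. Pattern Recognition and Media Understanding, General Session)
- D-12-61 Robot control by body motion (D-12. Pattern Recognition and Media Understanding, General Session)
- An on-line question-answering system by speech
- A study of the acoustic and linguistic processors for conversational speech (Research on recognition systems for time-series patterns)
- On the properties of a method (WPM) for converting gray-scale pictures into line drawings
- A study of simple noise suppression processing intended for DSP implementation
- Acoustic processing in machine recognition of conversational speech
- Recognition of continuously spoken words
- Spoken word recognition using VCV syllables as units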
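- Speech recognition technology (Speech information processing)
- A study of fitting methods for hearing-aid systems intended for music listening
- 2.1 Communication and entertainment: Case studies in entertainment computing (Entertainment Computing)
- A study of the diffuse-field sensitivity of condenser microphones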
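- Effects of automatic gain control and noise suppression on speech perception by hearing-impaired listeners
- A study of a hearing-aid system for hearing-impaired listeners using noise suppression and automatic gain control
- A study of an environmental noise suppression method with speech/non-speech discrimination
- A study of an environmental noise suppression method focusing on the noise segments of the signal
- Effects of a multi-microphone sound pickup system on speech perception by hearing-impaired listeners
- Effects of power-envelope fluctuation on speech perception
- Effects of frame-based volume control on speech perception
- Fine-grained hearing-assistance technology (Barrier-free and acoustic technology)
- Generation of waveform synthesis units using a clustering method and speech synthesis
- D-5-4 A study of robot gesture generation from spoken dialogue (D-5. Language Understanding and Communication, General Session)
- B-15-4 A study of a mobile-phone robot (B-15. Mobile Multimedia Communications, General Session)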