Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System
Abstract
In this paper, the development, long-term operation and portability of a practical ASR application in a real environment are investigated. The target application is a speech-oriented guidance system installed at a local community center. The system has been exposed to ordinary people since November 2002. More than 300 hours of speech, or more than 700,000 inputs, have been collected over four years. The outcome is a rare example of a large-scale real-environment speech database. A simulation experiment is carried out with this database to investigate how the system's performance improves during the first two years of operation. The purpose is to determine empirically how much real-environment data has to be prepared to build a system with reasonable speech recognition performance and response accuracy. Furthermore, the relative importance of developing the main system components, i.e., the speech recognizer and the response generation module, is assessed. Although the figures depend on the system's modeling capacity and the domain complexity, experimental results show that overall performance stagnates after employing about 10k-15k utterances for training the acoustic model, 40k-50k utterances for training the language model and 40k-50k utterances for compiling the question and answer (Q&A) database. The Q&A database was the most important component for improving the system's response accuracy. Finally, the portability of the well-trained first system prototype to a different environment, a local subway station, is investigated. Since collecting and preparing large amounts of real data is impractical in general, only one month of data from the new environment is employed for system adaptation. While the speech recognition component of the first prototype has a high degree of portability, the response accuracy is lower than in the first environment. The main reason is a domain difference between the two systems, since they are installed in different environments. This implies that it is imperative to take the behavior of users under real conditions into account to build a system with high user satisfaction.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-03-01
Authors
-
SARUWATARI Hiroshi
Nara Institute of Science and Technology
-
SHIKANO Kiyohiro
Nara Institute of Science and Technology
-
SARUWATARI Hiroshi
Graduate School of Information Science, Nara Institute of Science and Technology
-
SHIKANO Kiyohiro
Graduate School of Information Science, Nara Institute of Science and Technology
-
CINCAREK Tobias
Graduate School of Information Science, Nara Institute of Science and Technology
-
KAWANAMI Hiromichi
Graduate School of Information Science, Nara Institute of Science and Technology
-
LEE Akinobu
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
Shikano K
Chiba University And National Institute Of Information And Communications Technology
-
Lee Akinobu
Department Of Computer Science Nagoya Institute Of Technology
-
NISIMURA Ryuichi
Faculty of Systems Engineering, Wakayama University
-
Sawada H
Graduate School Of Information Science Nara Institute Of Science And Technology
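Related papers
- Comprehensive development of natural spoken dialogue processing technology that achieves speaker and environment adaptation without burdening the user (comprehensive report)
- Proposal of automatic reading assignment for trending words using Web text mining based on parenthetical expressions
- Development of "Kita-chan", a spoken dialogue robot for real environments
- Utterance analysis of a Web search task in a spoken dialogue system and a study of a large-scale word corpus for Web search (language models)
- An attempt to evaluate the task generality of speech recognition using Google N-gram (Speech)
- 3Q-3 Recording heart sounds with a NAM microphone and improving their clarity (speech analysis and synthesis, student session, artificial intelligence and cognitive science)
- Development of real-time audio localization control system (Applied Acoustics)
- Proposal of a many-to-many minimal pattern alignment algorithm and its evaluation through automatic reading assignment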
- Stacked Generalization for Topic Classification of Spoken Inquiries
- EA2010-24 Development of real-time audio localization control system