Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System
Abstract
In this paper, the development, long-term operation and portability of a practical ASR application in a real environment are investigated. The target application is a speech-oriented guidance system installed at a local community center. The system has been exposed to ordinary people since November 2002. More than 300 hours, or more than 700,000 inputs, have been collected over four years. The outcome is a rare example of a large-scale real-environment speech database. A simulation experiment is carried out with this database to investigate how the system's performance improves during the first two years of operation. The purpose is to determine empirically the amount of real-environment data that has to be prepared to build a system with reasonable speech recognition performance and response accuracy. Furthermore, the relative importance of developing the main system components, i.e. the speech recognizer and the response generation module, is assessed. Although the results depend on the system's modeling capacity and the domain complexity, the experiments show that overall performance stagnates after about 10-15k utterances are used for training the acoustic model, 40-50k utterances for training the language model and 40-50k utterances for compiling the question and answer database. The Q&A database was the most important component for improving the system's response accuracy. Finally, the portability of the well-trained first system prototype to a different environment, a local subway station, is investigated. Since collecting and preparing large amounts of real data is impractical in general, only one month of data from the new environment is employed for system adaptation. While the speech recognition component of the first prototype has a high degree of portability, the response accuracy is lower than in the first environment. The main reason is a domain difference between the two systems, since they are installed in different environments. This implies that it is imperative to take the behavior of users under real conditions into account to build a system with high user satisfaction.
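The saturation analysis described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the function name `plateau_point`, the `min_gain` threshold and the example numbers are all hypothetical, chosen only to show how one might read off the training-set size beyond which a learning curve stops improving.

```python
def plateau_point(sizes, scores, min_gain=0.5):
    """Return the smallest training-set size after which every further
    increase in data improves the score by at most `min_gain` points.

    Returns None if the final step still improves by more than `min_gain`,
    i.e. no plateau is visible in the data supplied.
    """
    if len(sizes) != len(scores):
        raise ValueError("sizes and scores must have the same length")

    plateau = None
    # Walk backwards from the largest training set while the gains stay small.
    for i in range(len(sizes) - 1, 0, -1):
        if scores[i] - scores[i - 1] > min_gain:
            break
        plateau = sizes[i - 1]
    return plateau


# Made-up illustrative learning curve (NOT results from the paper):
# number of training utterances vs. an accuracy score in percent.
example_sizes = [5000, 10000, 20000, 40000, 80000]
example_scores = [60.0, 68.0, 72.0, 74.0, 74.2]

print(plateau_point(example_sizes, example_scores))
```

A fixed absolute threshold is the simplest possible criterion; a real evaluation would more likely look at confidence intervals or relative gains before declaring that performance has stagnated.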
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-03-01
Authors
-
SARUWATARI Hiroshi
Graduate School of Information Science, Nara Institute of Science and Technology
-
SHIKANO Kiyohiro
Graduate School of Information Science, Nara Institute of Science and Technology
-
CINCAREK Tobias
Graduate School of Information Science, Nara Institute of Science and Technology
-
KAWANAMI HIROMICHI
Graduate School of Information Science, Nara Institute of Science and Technology
-
LEE Akinobu
Department of Computer Science and Engineering, Nagoya Institute of Technology
-
Shikano K
Chiba University And National Institute Of Information And Communications Technology
-
NISIMURA Ryuichi
Faculty of Systems Engineering, Wakayama University
-
Sawada H
Graduate School Of Information Science Nara Institute Of Science And Technology