Auditory perception versus automatic estimation of location and orientation of an acoustic source in a real environment
Abstract
In this work, the perception of the position and orientation of a directional acoustic source in a real enclosed environment by blindfolded listeners is investigated and compared with a method that automatically estimates the position and orientation of the source using a T-shaped microphone array. In the subjective experiment using blindfolded listeners, a human speaker acted as an acoustic source and listeners judged the speaker’s facing angle (one out of four possible orientations shifted by 90°) and position after listening to a spoken sentence. This procedure was performed twice, before and after a training phase. In the training, listeners were allowed to remove the blindfold and verify the speaker’s position and orientation. After the training, the correct orientation ratio increased from 75.0 to 76.5% and the average position error decreased from 66.2 to 60.6 cm. In addition, a subjective experiment on orientation estimation with eight orientations shifted by 45° in the same real environment showed that orientation estimation in a real environment was more difficult than that in an anechoic environment. Artificial neural networks (ANNs) were used in the automatic estimation method. A correct orientation ratio of 68.1% and an average position error of 48.0 cm were obtained by the T-shaped microphone array located nearest to the blindfolded listener among an array network consisting of eight T-shaped microphone arrays, enabling a rough comparison between human auditory perception and the automatic estimation method (a correct orientation ratio of 67% and a better average position error of 38.6 cm were the best results obtained by a T-shaped array in the network). It was clarified that the automatic estimation method cannot surpass the auditory system in terms of correct orientation ratio; however, it yielded better results in terms of the average position error.
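The two figures of merit reported in the abstract, the correct orientation ratio and the average position error, can be illustrated with a minimal sketch. It assumes that the ratio is the percentage of trials in which the judged facing angle exactly matches the true one, and that the position error is the Euclidean distance between the true and judged positions in the horizontal plane; the abstract does not spell out either definition. The function names and trial data below are hypothetical and are not taken from the paper.

```python
import math


def correct_orientation_ratio(true_deg, judged_deg):
    """Percentage of trials whose judged facing angle equals the true one.

    Angles are compared modulo 360 so that 0 and 360 count as the same.
    """
    hits = sum((t - j) % 360 == 0 for t, j in zip(true_deg, judged_deg))
    return 100.0 * hits / len(true_deg)


def average_position_error(true_xy, judged_xy):
    """Mean Euclidean distance between true and judged positions.

    The result is in the same unit as the input coordinates (cm here).
    """
    dists = [math.dist(t, j) for t, j in zip(true_xy, judged_xy)]
    return sum(dists) / len(dists)


# Hypothetical trials: four orientations shifted by 90 degrees,
# positions in centimetres. These values are made up for illustration.
true_deg = [0, 90, 180, 270]
judged_deg = [0, 90, 270, 270]
true_xy = [(0, 0), (100, 0), (100, 100), (0, 100)]
judged_xy = [(30, 40), (100, 0), (100, 40), (0, 100)]

ratio = correct_orientation_ratio(true_deg, judged_deg)
error = average_position_error(true_xy, judged_xy)
print(f"correct orientation ratio: {ratio:.1f}%")
print(f"average position error: {error:.1f} cm")
```

The same two functions could score either listener judgments or the output of an automatic estimator, which is what makes a rough comparison like the one in the abstract possible.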
- Paper of the Acoustical Society of Japan
Authors
- Nakagawa Seiichi, Department of Information and Computer Sciences, Toyohashi University of Technology
- Nakagawa Seiichi, Toyohashi Univ. Technol., Toyohashi-shi, Jpn
- Nakagawa Seiichi, Department of Information and Computer Sciences, Toyohashi University
- Yamamoto Kazumasa, Toyohashi University of Technology
- Nakano Alberto, Department of Information and Computer Sciences, Toyohashi University of Technology
- Yamamoto Kazumasa, Department of Information and Computer Sciences, Toyohashi University of Technology
- Nakagawa Seiichi, Department of Computer Science and Engineering, Toyohashi University of Technology
- Nakano Alberto, Department of Computer Science and Engineering, Toyohashi University of Technology
- Yamamoto Kazumasa, Department of Computer Science and Engineering, Toyohashi University of Technology
Related papers
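- A study of speech recognition using phase information based on long-term analysis (Recognition, Understanding, Dialogue, General)
- A study of objective functions and hierarchical phoneme posterior probability features in speech recognition using Hidden Conditional Neural Fields
- Automatic summarization of lecture speech based on important sentence extraction
- A study of speech recognition using Hidden Conditional Neural Fields
- A fast retrieval method robust to recognition errors and unknown words using an n-gram index with distance information
- Construction of CENSREC-1-AV, an evaluation framework for multimodal speech recognition in noisy environments
- Topic dependent language model based on on-line voting (Language Understanding and Communication)
- Protection of privacy information contained in speech (Sensing Web)
- Creation and analysis of a Japanese lecture speech content corpus
- Automatic formatting of lecture speech recognition results considering multiple hypotheses
- Evaluation of speaker identification and verification methods using phase information (Poster Session, 10th Spoken Language Symposium)
- A transitive translation for Indonesian-Japanese CLQA (Natural Language Processing)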
- A Machine Learning Approach for an Indonesian-English Cross Language Question Answering System(Natural Language Processing)
- Indonesian-Japanese Transitive Translation using English for CLIR
- CENSREC-1-C : An evaluation framework for voice activity detection under noisy environments
- Topic dependent language model based on on-line voting (Speech)
- Topic dependent language model based on clustering of noun word history
- Word and class dependency of N-gram language model (Spoken Language Information Processing)
- Word and class dependency of N-gram language model (Language Understanding and Communication, 9th Spoken Language Symposium)
- Word and class dependency of N-gram language model (Speech, 9th Spoken Language Symposium)
- TEXT-INDEPENDENT SPEAKER IDENTIFICATION ON TIMIT DATABASE
- Text-Independent/Text-Prompted Speaker Recognition by Combining Speaker-Specific GMM with Speaker Adapted Syllable-Based HMM(Speaker Recognition, Statistical Modeling for Speech Processing)
- Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training
- LVCSR based on context-dependent syllable acoustic models (Speech) -- (International Workshop "Asian workshop on speech science and technology")
- Robust distant speech recognition by combining variable-term spectrum based position-dependent CMN with conventional CMN (Speech) -- (International Workshop "Asian workshop on speech science and technology")
- Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition
- Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
- LVCSR based on context-dependent syllable acoustic models
- Robust distant speech recognition by combining variable-term spectrum based position-dependent CMN with conventional CMN
- Improving Keyword Recognition of Spoken Queries by Combining Multiple Speech Recognizer's Outputs for Speech-driven WEB Retrieval Task(Spoken Language Systems, Corpus-Based Speech Technologies)
- An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems(Spoken Language Systems, Corpus-Based Speech Technologies)
- Speaker Change Detection and Speaker Clustering Using VQ Distortion Measure
- Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs
- Succeeding Word Prediction for Speech Recognition Based on Stochastic Language Model
- A Survey on Automatic Speech Recognition(Special Issue on the 2000 IEICE Excellent Paper Award)
- Relationship among Recognition Rate, Rejection Rate and False Alarm Rate in a Spoken Word Recognition System
- Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions
- Distant Speech Recognition Using a Microphone Array Network
- Auditory perception versus automatic estimation of location and orientation of an acoustic source in a real environment
- Continuous Speech Recognition Using an On-Line Speaker Adaptation Method Based on Automatic Speaker Clustering (Special Issue on Speech Information Processing)
- Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm
- A Spoken Dialog System for Spontaneous Conversations Considering Response Timing and Response Type
- Speech recognition of speech with superimposed music using NMF and VQ methods (Speech, Language, Acoustics Education, General)
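- A spoken dialogue strategy robust to misrecognition, based on retaining multiple understanding candidates and generating responses that consider efficiency and naturalness, and its evaluation (Speech, Hearing)
- An automatic estimation method using acoustic parameters for the improvement of speech intelligibility in speakers with dysarthria: through singing and vocalization rehabilitation
- Syllable lattice expansion and n-gram index reduction methods for spoken document retrieval (Spoken Document Retrieval, 13th Spoken Language Symposium)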
- Class-Based N-Gram Language Model for New Words Using Out-of-Vocabulary to In-Vocabulary Similarity
- Analysis and evaluation of a spoken dialogue system using multiple dialogue agents
- CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments
- A Combination of Electroless Plating and Sol-Gel Methods as a Novel Technique for Preparing a Honeycomb-type-structured Catalyst