A Microphone Array-Based 3-D N-Best Search Method for Recognizing Multiple Sound Sources
Abstract
This paper describes a method for hands-free speech recognition, in particular for the simultaneous recognition of multiple sound sources. The method extends the 3-D Viterbi search to a 3-D N-best search, which makes it possible to recognize multiple sound sources. The baseline system combines two existing technologies, the 3-D Viterbi search and the conventional N-best search, into a complete system. A first evaluation of this baseline 3-D N-best search-based system showed that new techniques are needed to recognize multiple sound sources simultaneously. That evaluation identified two factors that strongly affect system performance: the different likelihood ranges of the sound sources, and the direction-based separation of the hypotheses. To address these problems, we incorporated a likelihood normalization and a path distance-based clustering technique into the baseline 3-D N-best search-based system. The system was evaluated through experiments on simulated data for the case of two talkers. The experiments showed significant improvements from these two techniques. The best results were obtained by combining both techniques with a 32-channel microphone array. Specifically, the Word Accuracy for the two talkers was higher than 80%, and the Simultaneous Word Accuracy (where both sources are correctly recognized simultaneously) was higher than 70%, which is very promising.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2002-06-01
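The abstract names two additions to the baseline search, likelihood normalization and path distance-based clustering, but does not define them. The Python sketch below shows one plausible reading under stated assumptions: each N-best hypothesis is assumed to carry a per-frame direction path in degrees, the path distance is taken to be the mean absolute per-frame angle difference, clustering is a greedy leader rule with a fixed threshold, and normalization subtracts each cluster's best log-likelihood. The names `Hypothesis`, `path_distance`, `cluster_by_path`, `normalize_cluster` and `recognize_sources`, the threshold, and the toy data are all hypothetical; this is an illustration, not the authors' implementation.

```python
# Illustrative sketch only: the distance measure, clustering rule, and
# normalization below are assumptions, not the paper's exact method.
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: str             # recognized word sequence
    log_likelihood: float  # accumulated path log-likelihood from the N-best search
    path: list[float]      # hypothetical per-frame direction estimate in degrees


def path_distance(a: list[float], b: list[float]) -> float:
    """Mean absolute per-frame difference between two direction paths."""
    if len(a) != len(b):
        raise ValueError("paths must cover the same number of frames")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cluster_by_path(hyps: list[Hypothesis], threshold: float) -> list[list[Hypothesis]]:
    """Greedy leader clustering of hypotheses by direction path.

    Hypotheses are visited best-first, so the first member (leader) of each
    cluster is its highest-likelihood hypothesis. A hypothesis joins the first
    cluster whose leader path lies within `threshold` degrees of its own path;
    otherwise it starts a new cluster (a new candidate source direction).
    """
    clusters: list[list[Hypothesis]] = []
    for h in sorted(hyps, key=lambda h: h.log_likelihood, reverse=True):
        for cluster in clusters:
            if path_distance(cluster[0].path, h.path) <= threshold:
                cluster.append(h)
                break
        else:
            clusters.append([h])
    return clusters


def normalize_cluster(cluster: list[Hypothesis]) -> list[tuple[str, float]]:
    """Score each hypothesis relative to its cluster's best (best -> 0.0), so
    sources with different absolute likelihood ranges become comparable."""
    best = cluster[0].log_likelihood  # the leader is the best, see cluster_by_path
    return [(h.words, h.log_likelihood - best) for h in cluster]


def recognize_sources(hyps: list[Hypothesis], threshold: float,
                      num_sources: int) -> list[list[tuple[str, float]]]:
    """Return one normalized N-best list per detected source direction,
    taking the `num_sources` clusters with the strongest leaders."""
    clusters = cluster_by_path(hyps, threshold)
    return [normalize_cluster(c) for c in clusters[:num_sources]]


if __name__ == "__main__":
    frames = 5
    hyps = [
        Hypothesis("turn on the light", -1200.0, [30.0] * frames),
        Hypothesis("turn on the lights", -1215.0, [32.0] * frames),
        Hypothesis("what time is it", -1900.0, [120.0] * frames),
        Hypothesis("what time was it", -1930.0, [118.0] * frames),
    ]
    for i, nbest in enumerate(recognize_sources(hyps, threshold=10.0, num_sources=2)):
        print(f"source {i}: {nbest}")
    # source 0: [('turn on the light', 0.0), ('turn on the lights', -15.0)]
    # source 1: [('what time is it', 0.0), ('what time was it', -30.0)]
```

In the toy run, the second talker's raw scores are far below the first talker's, yet its hypotheses follow a clearly different direction path, so they form their own cluster instead of being crowded out of a single N-best list. After normalization, each talker's list is ranked on the same relative scale. Subtracting a per-cluster offset is only one of several possible normalizations.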
Authors
- SHIKANO Kiyohiro (Nara Institute of Science and Technology)
- NAKAMURA Satoshi (ATR Spoken Language Translation Research Labs.)
- SHIKANO Kiyohiro (Graduate School of Information Science, Nara Institute of Science and Technology)
- Shikano Kiyohiro (Graduate School Of Information Science Nara Institute Of Science And Technology)
- Shikano Kiyohiro (Chiba University And National Institute Of Information And Communications Technology)
- YAMADA Takeshi (University of Tsukuba)
- Shikano K (Chiba University And National Institute Of Information And Communications Technology)
- Nakamura S (National Institute Of Information And Communications Technology)
- HERACLEOUS Panikos (Nara Institute of Science and Technology)
- Heracleous P (Nara Inst. Sci. And Technol. Nara Jpn)
- Yamada T (University Of Tsukuba)
- Nakamura Satoshi (Atr Spoken Language Translation Res. Lab. Kyoto Jpn)
- Nakamura Satoshi (Atr Spoken Language Communication Res. Lab. Kyoto‐fu Jpn)
- Nakamura Satoshi (National Institute Of Information And Communications Technology)
Related papers
- Fuzzy Cluster Analysis and its Evaluation Method(BIOMETRICS AND ITS APPLICATIONS)
- Development of real-time audio localization control system (Applied Acoustics)
- EA2010-24 Development of real-time audio localization control system
- Combination Therapy with Vascular Endothelial Growth Factor Neutralizing Antibody and Mitomycin C on Human Gastric Cancer Xenograft
- CENSREC-1-C : An evaluation framework for voice activity detection under noisy environments
- Sound reproduction based on multi-channel inverse filtering and WFS
- Noise and Channel Distortion Robust ASR System for DARPA SPINE2 Task (Special Issue on Speech Information Processing)
- A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech (Special Issue on Speech Information Processing)
- The Cell Surface Glycoprotein of Haloarcula japonica TR-1
- Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method
- Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training
- Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics(Speech and Hearing)
- Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models(Speech Recognition, Statistical Modeling for Speech Processing)
- Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models(Speech Recognition, Statistical Modeling for Speech Processing)
- Designing Target Cost Function Based on Prosody of Speech Database(Speech Synthesis and Prosody, Corpus-Based Speech Technologies)
- Designing Target Cost Function Based on Prosody of Speech Database
- Cross-language Voice Conversion Evaluation Using Bilingual Databases (Special Issue: Spoken Language Information Processing and Its Applications)
- Characterization of a Novel Human Tumor Necrosis Factor-α Mutant with Increased Cytotoxic Activity
- A MAP Estimator for the Enhancement of Speech Signal Separated by ICA Algorithm (International Workshop: Frontiers in Speech and Hearing Research)
- Effect of Central Limit Theorem non-compliance on blind separation of speech by negentropy maximization
- Blind Separation of Speech by Fixed-Point ICA with Source Adaptive Negentropy Approximation(Blind Source Separation, Multi-channel Acoustic Signal Processing)
- Robots that can hear, understand and talk
- Probability Distribution of Time-Series of Speech Spectral Components(Audio/Speech Coding)(Applications and Implementations of Digital Signal Processing)
- Quantitative analysis of pattern of gonial proliferation during sexual maturation in Japanese scallop Patinopecten yessoensis
- GnRH-PROMOTED SPERMATOGONIAL PROLIFERATION OF SCALLOP MEDIATES THROUGH STEROIDOGENESIS(Endocrinology,Abstracts of papers presented at the 76th Annual Meeting of the Zoological Society of Japan)
- MOLECULAR CLONING OF A PUTATIVE SEROTONIN RECEPTOR EXPRESSED IN THE OVARY OF SCALLOP, PATINOPECTEN YESSOENSIS(Developmental Biology,Abstracts of papers presented at the 76th Annual Meeting of the Zoological Society of Japan)
- REGULATION OF GONIAL MULTIPLICATION BY A GnRH-LIKE FACTOR IN THE CENTRAL NERVOUS SYSTEM OF THE PATINOPECTEN YESSOENSIS(Endocrinology)(Proceedings of the Seventy-Third Annual Meeting of the Zoological Society of Japan)
- AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition(Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Missing Feature Theory Applied to Robust Speech Recognition over IP Network(Speech Dynamics by Ear, Eye, Mouth and Machine)
- CENSREC-3: An Evaluation Framework for Japanese Speech Recognition in Real Car-Driving Environments(Speech and Hearing)
- A Design for a Collaborative Steering System of Microphone Array and Video Camera Toward Multi-Lingual Tele-Conference (Special Issue: Innovation and Practical Application of Interaction Technology)
- A design of adaptive beamformer based on average speech spectrum for noisy speech recognition
- A Microphone Array-Based 3-D N-Best Search Method for Recognizing Multiple Sound Sources
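- Integration of Localization and Speech Recognition of Multiple Sound Sources Based on the 3D N-best Search Method
- Evaluation of the 3-D N-best Search Method Using Inter-Path Distances of Sound-Source Directions for Multiple-Talker Speech Recognition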
- The present status, progress, and usage of speech databases in Japan
- Thermophilic Alkaline Xylanase from Newly Isolated Alkaliphilic and Thermophilic Bacillus sp. Strain TAR-1
- Degradation of Human Hair by a Thermostable Alkaline Protease from Alkaliphilic Bacillus sp. No.AH-101
- Molecular Cloning, Nucleotide Sequence, and Expression of the Structural Gene for Alkaline Serine Protease from Alkaliphilic Bacillus sp.221
- Non-Audible Murmur (NAM) Recognition(2004 IEICE Excellent Paper Award)
- IMPROVING ACCURACY IN PARAMETER ESTIMATION IN AN EXTENDED KALMAN PARTICLE FILTERS FOR NOISY SPEECH RECOGNITION
- ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles(Speech Recognition, Statistical Modeling for Speech Processing)
- Non-Audible Murmur (NAM) Recognition Exploiting Adaptation Techniques
- An HMM State Duration Control Algorithm Applied to Large-Vocabulary Spontaneous Speech Recognition
- Development and evaluation of pocket-size real-time blind source separation microphone
- Objective sound quality comparison based on higher-order statistics for nonlinear noise reduction methods (Applied Acoustics)
- Objective sound quality evaluation for combination method of beamforming and spectral subtraction (Applied Acoustics)
- Fast Convergence Blind Source Separation Using Frequency Subband Interpolation by Null Beamforming
- Rapid Compensation of Temperature Fluctuation Effect for Multichannel Sound Field Reproduction System
- Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System
- Interface for Barge-in Free Spoken Dialogue System Using Nullspace Based Sound Field Control and Beam forming (Speech/Audio Processing, Multidimensional Signal Processing and Its Application)
- On-Line Relaxation Algorithm Applicable to Acoustic Fluctuation for Inverse Filter in Multichannel Sound Reproduction System(Sound Field Reproduction, Multi-channel Acoustic Signal Processing)
- A Study on Clustering Training Speakers for Unsupervised Speaker Adaptation Based on Sufficient Statistics Using Multiple Models
- Iterative Inverse Filter Relaxation Algorithm for Adaptation to Acoustic Fluctuation in Sound Reproduction System
- Sound Reproduction System Including Adaptive Compensation of Temperature Fluctuation Effect for Broad-Band Sound Control(Special Section on Digital Signal Processing)
- Elderly Acoustic Models for Large Vocabulary Continuous Speech Recognition
- Maximum Likelihood Successive State Splitting Algorithm for Tied-Mixture HMnet
- Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation(Life-like Agent and its Communication)
- Interface for Barge-in Free Spoken Dialogue System Combining Adaptive Sound Field Control and Microphone Array(Speech and Hearing)
- Multi-Lingual Multi-Function Multi-Media Intelligent System
- Nonparametric Speaker Recognition Method Using Earth Mover's Distance(Speaker Recognition, Statistical Modeling for Speech Processing)
- Speaker Recognition using a Non-parametric Speaker Model Representation and Earth Mover's Distance
- A Self-Generator Method for Initial Filters of SIMO-ICA Applied to Blind Separation of Binaural Sound Mixtures(Blind Source Separation, Multi-channel Acoustic Signal Processing)
- Multistage SIMO-Model-Based Blind Source Separation Combining Frequency-Domain ICA and Time-Domain ICA(Adaptive Signal Processing and Its Applications)
- Charge-Independence-Breaking Interactions in sd-Shell Nuclei : Nuclear Physics
- Direction of Arrival Estimation Using Nonlinear Microphone Array
- Evaluation of Extremely Small Sound Source Signals Used in Speaking-Aid System with Statistical Voice Conversion
- Improvements of the One-to-Many Eigenvoice Conversion System
- Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models
- Adaptive Training for Voice Conversion Based on Eigenvoices
- Passive hybrid subtractive beamformer for near-field sound sources
- Detection of Overlapping Speech in Meetings Using Support Vector Machines and Support Vector Regression(Engineering Acoustics)
- Comparative Assessment of Test Signals Used for Measuring Residual Echo Characteristics
- Learning, Generation and Recognition of Motions by Reference-Point-Dependent Probabilistic Models
- An Acoustic Modeling Method Robust against Changes of Speaking Style in Error Recovery
- Blind Separation and Deconvolution for Convolutive Mixture of Speech Combining SIMO-Model-Based ICA and Multichannel Inverse Filtering(Engineering Acoustics)
- High-Fidelity Blind Separation of Acoustic Signals Using SIMO-Model-Based Independent Component Analysis(Engineering Acoustics)
- A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency(Speech Recognition, Statistical Modeling for Speech Processing)
- Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework(Speech Recognition, Statistical Modeling for Speech Processing)
- A Hybrid HMM/BN Acoustic Model for Automatic Speech Recognition (Special Issue on Speech Information Processing)
- A Speech Dialogue System with Multimodal Interface for Telephone Directory Assistance
- Overdetermined Blind Separation for Real Convolutive Mixtures of Speech Based on Multistage ICA Using Subarray Processing(Speech/Acoustic Signal Processing)(Digital Signal Processing)
- Stable Learning Algorithm for Blind Separation of Temporally Correlated Acoustic Signals Combining Multistage ICA and Linear Prediction(Digital Signal Processing)
- Blind Source Separation of Acoustic Signals Based on Multistage ICA Combining Frequency-Domain ICA and Time-Domain ICA
- Fast-Convergence Algorithm for Blind Source Separation Based on Array Signal Processing
- An Iterative Inverse Filter Design Method for the Multichannel Sound Field Reproduction System(Special Section on Acoustic Signal Processing)
- MIXTURE OF FACTOR ANALYZED HMM
- Sound Field Reproduction by Wavefront Synthesis Using Directly Aligned Multi Point Control
- Iterative Estimation and Compensation of Signal Direction for Moving Sound Source by Mobile Microphone Array(Engineering Acoustics)
- TIME-VARYING NOISE COMPENSATION BY SEQUENTIAL MONTE CARLO METHOD
- Objective Quality Assessment of Wideband Speech Coding(Network)
- Burst Error Recovery for Huffman Coding(Algorithm Theory)
- Audio-Visual Speech Recognition Based on Optimized Product HMMs and GMM Based-MCE-GPD Stream Weight Estimation (Special Issue on Speech Information Processing)
- Theoretical Analysis of Amounts of Musical Noise and Speech Distortion in Structure-Generalized Parametric Blind Spatial Subtraction Array
- Speech Prior Estimation for Generalized Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator
- CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments
- Comparison of Methods for Topic Classification of Spoken Inquiries
- Semi-Blind Optimization Scheme of Joint Suppression of Background Noise and Late Reverberation