CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments
Abstract
We have been distributing a new collection of databases and evaluation tools called CENSREC-4, a framework for evaluating distant-talking speech recognition in reverberant environments. As in CENSREC-1, the data in CENSREC-4 are connected digit utterances. The data comprise two subsets: "basic data sets" and "extra data sets." The basic data sets are used for evaluation with speech data convolved with room impulse responses to simulate various reverberant conditions. The extra data sets consist of simulated data and the corresponding real recorded data. Evaluation tools are currently provided only for the basic data sets and will be extended to the extra data sets in the future. The CENSREC-4 task with a basic data set appears simple; however, experimental results show that CENSREC-4 provides a challenging reverberant speech-recognition task, in the sense that both a traditional technique for improving recognition and a widely used criterion for representing the difficulty of recognition perform poorly. In this context, this common framework can be an important step toward the future evolution of reverberant speech-recognition methodologies.
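The simulation idea behind the basic data sets, convolving speech with a room impulse response (RIR), can be illustrated with a minimal sketch. This is not the CENSREC-4 tooling: the function names, the 16 kHz sampling rate, the 0.5 s reverberation time, and the toy RIR (white noise under an exponential decay) are all assumptions made for illustration, and a sine tone stands in for a recorded utterance.

```python
import numpy as np


def synthetic_rir(rt60_s, sample_rate, seed=0):
    """Toy RIR (hypothetical): white noise under an exponential decay.

    The envelope drops by 60 dB (a factor of 10**-3) at t = rt60_s.
    A measured RIR would be used in practice.
    """
    rng = np.random.default_rng(seed)
    n = int(rt60_s * sample_rate)
    t = np.arange(n) / sample_rate
    envelope = 10.0 ** (-3.0 * t / rt60_s)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path component
    return rir


def reverberate(clean, rir):
    """Convolve a clean signal with an RIR.

    The output has len(clean) + len(rir) - 1 samples because the
    reverberant tail extends past the end of the clean signal.
    """
    return np.convolve(clean, rir, mode="full")


sample_rate = 16000  # assumed sampling rate in Hz
rir = synthetic_rir(rt60_s=0.5, sample_rate=sample_rate)

# One second of a 440 Hz tone as a stand-in for a clean utterance.
clean = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)

reverberant = reverberate(clean, rir)
print(len(clean), len(rir), len(reverberant))
```

Replacing the toy RIR with impulse responses measured in different rooms would yield one reverberant version of the same utterance per environment, which is the general pattern the abstract describes.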
Authors
-
TSUGE Satoru
Daido University
-
Takeda Kazuya
Nagoya Univ.
-
Kuroiwa Shingo
Chiba Univ. and National Inst. of Information and Communications Technol.
-
YAMADA Takeshi
University of Tsukuba
-
NISHIURA Takanobu
Ritsumeikan University
-
Ogawa Tetsuji
Waseda University
-
Tamura Satoshi
Gifu University
-
Nakamura Satoshi
National Inst. of Information and Communications Technol. (NICT), Kyoto-fu, Japan
-
YAMAMOTO Kazumasa
Toyohashi University of Technology
-
TAKIGUCHI Tetsuya
Kobe University
-
Fujimoto Masakiyo
NTT Communication Science Laboratories, NTT Corporation
-
Miyajima Chiyomi
Nagoya Univ.
-
Takiguchi Tetsuya
Kobe Univ.
-
Kitaoka Norihide
Nagoya Univ.
-
Nishiura Takanobu
Ritsumeikan Univ.
-
Denda Yuki
Murata Machinery Ltd.
-
Nakayama Masato
Kinki Univ.
-
Matsuda Shigeki
National Inst. of Information and Communications Technol., "Keihanna Sci. City"
-
Ogawa Tetsuji
Waseda Inst. For Advanced Study Waseda Univ.
-
Tamura Satoshi
Gifu Univ.
-
Fujimoto Masakiyo
NTT Communication Science Laboratories, NTT Corporation, "Keihanna Science City"
-
Fukumori Takahiro
Ritsumeikan University
-
Nakayama Masato
Kinki University
-
Denda Yuki
Murata Machinery, Ltd.