AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition (Speech Corpora and Related Topics, <Special Section> Corpus-Based Speech Technologies)
Abstract
This paper introduces AURORA-2J, an evaluation framework for Japanese noisy speech recognition. Speech recognition systems still need to become more robust to noisy environments, and progress toward that goal requires a standard evaluation corpus and standard assessment methods. The Aurora 2, 3, and 4 corpora and their evaluation scenarios have recently had a significant impact on noisy speech recognition research. AURORA-2J is a Japanese connected-digit corpus whose evaluation scripts are designed in the same way as those of Aurora 2, with the help of the European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, the baseline scripts, and the baseline performance. We also propose a new performance analysis method that accounts for differences in recognition performance among speakers. The method is based on per-speaker word accuracy and reveals how much recognition performance varies between individual speakers. Finally, we propose a categorization of the modifications applied to the original HTK baseline system. The categorization makes it easier to compare systems and to identify which technologies improve performance most within the same category.
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2005-03-01
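The per-speaker analysis described in the abstract rests on computing word accuracy separately for each speaker. The sketch below is illustrative only and is not the paper's published procedure.

It assumes the standard HTK definition of word accuracy, Acc = (N - S - D - I) / N, where:
- N is the number of reference words.
- S, D, and I are the substitution, deletion, and insertion counts.

It summarizes the spread of per-speaker accuracies with a mean, a population standard deviation, and a range. The `SpeakerCounts` type, the function names, and the toy counts are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class SpeakerCounts:
    """Hypothetical per-speaker alignment counts from an HTK-style scorer."""
    n_ref: int  # N: reference words spoken by this speaker
    sub: int    # S: substitution errors
    dele: int   # D: deletion errors
    ins: int    # I: insertion errors


def word_accuracy(c: SpeakerCounts) -> float:
    """Word accuracy in percent: (N - S - D - I) / N * 100 (HTK definition)."""
    if c.n_ref <= 0:
        raise ValueError("reference word count must be positive")
    return 100.0 * (c.n_ref - c.sub - c.dele - c.ins) / c.n_ref


def per_speaker_summary(counts: dict[str, SpeakerCounts]) -> dict[str, float]:
    """Pooled (corpus-level) accuracy plus the spread of per-speaker accuracies."""
    if not counts:
        raise ValueError("need at least one speaker")
    accs = [word_accuracy(c) for c in counts.values()]
    total_ref = sum(c.n_ref for c in counts.values())
    total_err = sum(c.sub + c.dele + c.ins for c in counts.values())
    return {
        "pooled": 100.0 * (total_ref - total_err) / total_ref,  # one score for all data
        "mean": mean(accs),   # average over speakers, each weighted equally
        "std": pstdev(accs),  # population standard deviation across speakers
        "min": min(accs),     # worst-recognized speaker
        "max": max(accs),     # best-recognized speaker
    }


if __name__ == "__main__":
    # Toy counts for two hypothetical speakers; not data from AURORA-2J.
    toy = {
        "spk01": SpeakerCounts(n_ref=100, sub=2, dele=1, ins=1),
        "spk02": SpeakerCounts(n_ref=100, sub=10, dele=5, ins=5),
    }
    print(per_speaker_summary(toy))
```

In this toy case the pooled accuracy is 88.0%, yet the two speakers score 96.0% and 80.0%. This illustrates how a single corpus-level figure can hide large individual differences. The spread statistics chosen here are one reasonable option and are not necessarily the ones the paper uses. When speakers contribute different numbers of words, the pooled and mean values will also differ.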
Authors
- TAKEDA Kazuya (Nagoya University)
- Takeda Kazuya (Nagoya Univ.)
- Takeda K (Nagoya Univ., Nagoya, Jpn)
- Kuroiwa S (Chiba University and National Institute of Information and Communications Technology)
- Kuroiwa Shingo (Graduate School of Advanced Integration Science, Chiba University)
- Kuroiwa Shingo (Faculty of Engineering, The University of Tokushima)
- Kuroiwa Shingo (University of Tokushima)
- NAKAMURA Satoshi (ATR Spoken Language Translation Research Labs.)
- Takeda Kazuya (Nagoya Univ., Nagoya-shi, Jpn)
- Kitaoka Norihide (Toyohashi University of Technology)
- Shikano Kiyohiro (Chiba University and National Institute of Information and Communications Technology)
- Endo Tokiko (School of Medicine, Nagoya University)
- Endo T (ATR Spoken Language Translation Research Laboratories)
- YAMAMOTO Kazumasa (Shinshu University)
- YAMADA Takeshi (University of Tsukuba)
- NISHIURA Takanobu (Ritsumeikan University)
- SASOU Akira (National Institute of Advanced Industrial Science and Technology)
- MIZUMACHI Mitsunori (ATR Spoken Language Translation Research Laboratories)
- MIYAJIMA Chiyomi (Nagoya University)
- FUJIMOTO Masakiyo (ATR Spoken Language Translation Research Laboratories)
- ENDO Toshiki (ATR Spoken Language Translation Research Laboratories)
- Nakamura S (National Institute of Information and Communications Technology)
- Miyajima Chiyomi (The Graduate School of Information Science, Nagoya University)
- Mizumachi Mitsunori (ATR Spoken Language Translation Research Laboratories; present address: Kyushu Institute of Technology)
- Fujimoto Masakiyo (Department of Electronics and Informatics, Faculty of Science and Technology, Ryukoku University)
- Yamamoto K (Toyohashi University of Technology)
- Nishiura Takanobu (Ritsumeikan Univ., Kusatsu-shi, Jpn)
- Yamada T (University of Tsukuba)
- Nakamura Satoshi (ATR Spoken Language Translation Res. Lab., Kyoto, Jpn)
- Miyajima Chiyomi (Nagoya Univ.)
- Kitaoka Norihide (Nagoya Univ.)
- Nishiura Takanobu (Ritsumeikan Univ.)
- Nakamura Satoshi (ATR Spoken Language Communication Res. Lab., Kyoto-fu, Jpn)
- Nakamura Satoshi (National Institute of Information and Communications Technology)
- FUJIMOTO Masakiyo (Nagoya University)
- MIYAJIMA Chiyomi (Shinshu University)
- MIZUMACHI Mitsunori (University of Tsukuba)
- SASOU Akira (University of Tokushima)
- NISHIURA Takanobu (Toyohashi University of Technology)
- KITAOKA Norihide (Ritsumeikan University)
- KUROIWA Shingo (National Institute of Advanced Industrial Science and Technology)
- YAMADA Takeshi (ATR Spoken Language Translation Research Laboratories)
- YAMAMOTO Kazumasa (Nagoya University)
- TAKEDA Kazuya (ATR Spoken Language Translation Research Laboratories)
- Kuroiwa S (University of Tokushima)
- Nakamura S (ATR Spoken Language Translation Research Laboratories)
- Yamamoto K (Shinshu University)
- Kitaoka N (Toyohashi University of Technology)
- Nishiura T (Ritsumeikan University)
- Sasou A (National Institute of Advanced Industrial Science and Technology)
- Mizumachi M (ATR Spoken Language Translation Research Laboratories)
- Miyajima C (Nagoya University)
- Fujimoto M (ATR Spoken Language Translation Research Laboratories)
Related Papers
- Fuzzy Cluster Analysis and its Evaluation Method(BIOMETRICS AND ITS APPLICATIONS)
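- Building a Real-Time Multimodal System for Multi-Party Conversation Scene Analysis: Integrating Face-Direction Tracking and Speaker Diarization Using a Multimodal Omnidirectional Sensor (Theme Session 2, Ambient Environmental Intelligence)
- Building a Real-Time Multimodal System for Multi-Party Conversation Scene Analysis: Integrating Face-Direction Tracking and Speaker Diarization Using a Multimodal Omnidirectional Sensor (Theme-Related Session 2)
- Speaker Determination in Multi-Party Conversation by Integrating Acoustic and Visual Information (Acoustic Processing / Speaker Identification, 10th Spoken Language Symposium)
- Activity Report of the Noisy Speech Recognition Evaluation Working Group: Evaluation Environments for Individual Factors Affecting Recognition (3) (Activity Reports of Groups within SIG-SLP)
- Activity Report of the Noisy Speech Recognition Evaluation Working Group: Evaluation Environments for Individual Factors Affecting Recognition (2) (Noise / VAD, 9th Spoken Language Symposium)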
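- Activity Report of the Noisy Speech Recognition Evaluation Working Group: Evaluation Environments for Individual Factors Affecting Recognition (8th Spoken Language Symposium)
- Activity Report of the Noisy Speech Recognition Evaluation Working Group: Evaluation Environments for Individual Factors Affecting Recognition (Session-1 Detection, 8th Spoken Language Symposium)
- Robustness in Speech Recognition: Acoustic Analysis and Acoustic Models, What Are the Issues? (Organized Session)
- CENSREC-1-C: Construction of an Evaluation Framework for Voice Activity Detection under Noisy Conditions
- SLP Noisy Speech Recognition Evaluation WG Activity Report: Evaluation Data and Evaluation Methods (Session-6 Special Session: Evaluating Noise-Robust Techniques with Common Corpora, 7th Spoken Language Symposium)
- CENSREC-3, an Evaluation Database for In-Car Speech Recognition under Real Driving Conditions, and Its Common Evaluation Baseline
- Construction of CENSREC-3, an In-Car Word Speech Database Recorded under Real Driving Conditions, and Its Common Evaluation Environment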
- Acoustic Feature Transformation Combining Average and Maximum Classification Error Minimization Criteria
- Acoustic Feature Transformation Based on Discriminant Analysis Preserving Local Structure for Speech Recognition
- Photovoltaic Effect in Schottky Junction of Poly(3-alkylthiophene)/Al with Various Alkyl Chain Lengths and Regioregularities
- Photocarrier Transport in Regioregular Poly (3-octadecylthiophene) : Optical Properties of Condensed Matter
- Alkyl Chain Length Dependence of Field-Effect Mobilities in Regioregular Poly(3-Alkylthiophene)Films
- Dependencies of Field Effect Mobility on Regioregularity and Side Chain Length in Poly (Alkylthiophene) Films (Special Issue on Organic Molecular Electronics for the 21st Century)
- Regioregularity vs Regiorandomness : Effect on Photocarrier Transport in Poly(3-hexylthiophene)
- CENSREC-1-C : An evaluation framework for voice activity detection under noisy environments
- Driver Identification Using Driving Behavior Signals(Human-computer Interaction)
- Noise and Channel Distortion Robust ASR System for DARPA SPINE2 Task (Special Issue on Speech Information Processing)
- A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech (Special Issue on Speech Information Processing)
- Breast Tumor Classification by Neural Networks Fed with Sequential-Dependence Factors to the Input Layer
- AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition(Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Missing Feature Theory Applied to Robust Speech Recognition over IP Network(Speech Dynamics by Ear, Eye, Mouth and Machine)
- Comparison of a 10 V Josephson Junction Array System and a Conventional 10 V Measuring System by Measuring Zener Reference Standard
- 1-V Josephson-Junction-Array Voltage Standard and Development of 10-V Josephson Junction Array at ETL
- Use of the Josephson Junction Array Voltage Standard in Industry
- CENSREC-3: An Evaluation Framework for Japanese Speech Recognition in Real Car-Driving Environments(Speech and Hearing)
- A Design for a Collaborative Steering System of Microphone Array and Video Camera Toward Multi-Lingual Tele-Conference (Special Issue: Innovation and Practical Application of Interaction Technologies)
- A design of adaptive beamformer based on average speech spectrum for noisy speech recognition
- A Microphone Array-Based 3-D N-Best Search Method for Recognizing Multiple Sound Sources
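- Integrating Localization and Speech Recognition of Multiple Sound Sources Based on the 3-D N-Best Search Method
- Evaluation of the 3-D N-Best Search Method Using Inter-Path Distances of Sound Source Directions for Multi-Speaker Speech Recognition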
- The present status, progress, and usage of speech databases in Japan
- IMPROVING ACCURACY IN PARAMETER ESTIMATION IN AN EXTENDED KALMAN PARTICLE FILTERS FOR NOISY SPEECH RECOGNITION
- ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles(Speech Recognition, Statistical Modeling for Speech Processing)
- Evaluation of HRTFs estimated using physical features
- Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones(Feature Extraction and Acoustic Modelings, Corpus-Based Speech Technologies)
- Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training
- Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition
- Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
- Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech Enhancement(Speech Enhancement, Statistical Modeling for Speech Processing)
- Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs
- Search computing based on Google API for QA system (Natural Language Processing)
- Search computing based on Google API for QA system (Language Understanding and Communication)
- Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation(Life-like Agent and its Communication)
- Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation
- SNR and sub-band SNR estimation based on Gaussian mixture modeling in the log power domain with application for speech enhancements (6th Spoken Language Symposium)
- Multi-Lingual Multi-Function Multi-Media Intelligent System
- Driver's irritation detection using speech recognition results (Speech, 10th Spoken Language Symposium)
- Driver's irritation detection using speech recognition results (Spoken Language Information Processing)
- Driver's irritation detection using speech recognition results (Language Understanding and Communication, 10th Spoken Language Symposium)
- Nonparametric Speaker Recognition Method Using Earth Mover's Distance(Speaker Recognition, Statistical Modeling for Speech Processing)
- Speaker Recognition using a Non-parametric Speaker Model Representation and Earth Mover's Distance
- Predicting the Degradation of Speech Recognition Performance from Sub-band Dynamic Ranges (Special Issue: Spoken Language Information Processing and Its Applications)
- A model of perceptual distance for group delays based on ellipsoidal mapping
- The effect of group delay spectrum on timbre
- Direction of Arrival Estimation Using Nonlinear Microphone Array
- Speech Enhancement Using Nonlinear Microphone Array Based on Noise Adaptive Complementary Beamforming
- Speech Enhancement Using Nonlinear Microphone Array Based on Complementary Beamforming (Special Section on Digital Signal Processing)
- Noise Robust Speech Recognition Using Subband-Crosscorrelation Analysis
- An Acoustically Oriented Vocal-Tract Model
- Estimation of speaker and listener positions in a car using binaural signals
- Sound localization under conditions of covered ears on the horizontal plane
- Single-Channel Multiple Regression for In-Car Speech Enhancement
- Adaptive Nonlinear Regression Using Multiple Distributed Microphones for In-Car Speech Recognition(Speech Enhancement, Multi-channel Acoustic Signal Processing)
- Speech Recognition Using Finger Tapping Timings(Speech and Hearing)
- CIAIR In-Car Speech Corpus : Influence of Driving Status(Corpus-Based Speech Technologies)
- Construction and Evaluation of a Large In-Car Speech Corpus(Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Translation of Japanese Noun Compounds at Super-Function Based MT System
- Passive hybrid subtractive beamformer for near-field sound sources
- Comparative Assessment of Test Signals Used for Measuring Residual Echo Characteristics
- An Acoustic Modeling Method Robust against Changes of Speaking Style in Error Recovery
- A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency(Speech Recognition, Statistical Modeling for Speech Processing)
- Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework(Speech Recognition, Statistical Modeling for Speech Processing)
- A Hybrid HMM/BN Acoustic Model for Automatic Speech Recognition (Special Issue on Speech Information Processing)
- A Model of Mental State Transition Network
- A New Question Answering System for Chinese Restricted Domain(Language,Human Communication II)
- Effects of Phoneme Type and Frequency on Distributed Speaker Identification and Verification(Speech and Hearing)
- Response Timing Detection Using Prosodic and Linguistic Information for Human-friendly Spoken Dialog Systems
- MIXTURE OF FACTOR ANALYZED HMM
- Iterative Estimation and Compensation of Signal Direction for Moving Sound Source by Mobile Microphone Array(Engineering Acoustics)
- TIME-VARYING NOISE COMPENSATION BY SEQUENTIAL MONTE CARLO METHOD
- Method for determining sound localization by auditory masking
- Objective Quality Assessment of Wideband Speech Coding(Network)
- Improving Parsing of 'BA' Sentences for Machine Translation
- Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition
- CENSREC-4: An evaluation framework for distant-talking speech recognition in reverberant environments
- A Graph-Based Spoken Dialog Strategy Utilizing Multiple Understanding Hypotheses