A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech (Special Issue on Speech Information Processing)
Abstract
This paper studies the modeling of inter-word pauses to improve the robustness of acoustic models for recognizing noisy conversational speech. When pauses are modeled with precise context dependency, their frequent occurrence and varying acoustics in noisy conversational speech make it difficult to automatically generate an accurate phonetic transcription of the training data, which is needed to develop robust acoustic models. This paper proposes exploiting the reliable phonetic heuristics of pauses in speech to aid the detection of these varying pauses. Based on this proposal, a stepwise approach to optimizing pause HMMs was applied to the data of the DARPA SPINE2 project, and a more accurate phonetic transcription was obtained. The cross-word triphone HMMs developed with this method achieved an absolute 9.2% reduction in word error rate compared with the conventional method, which uses only context-free modeling of pauses. For the same pause modeling method, using the optimized phonetic segmentation yielded an absolute 5.2% improvement.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2003-03-01
Authors
-
Matsui T
ATR Spoken Language Translation Communication Laboratories
-
Matsui T
Spoken Language Translation Research Laboratories Advanced Telecommunications Research Institute Int
-
MATSUI Tomoko
ATR Spoken Language Translation Laboratories
-
Zhang Jin-song
ATR Spoken Language Translation Communication Laboratories
-
MARKOV Konstantin
ATR Spoken Language Translation Research Labs.
-
NAKAMURA Satoshi
ATR Spoken Language Translation Research Labs.
-
Markov Konstantin
ATR Spoken Language Communication Research Laboratories
-
Matsui Tomoko
ATR Spoken Language Translation Communication Laboratories
-
Nakamura S
Laboratory Of Integrative Aquatic Biology Field Science Center Graduate School Of Agricultural Scien
-
Matsui T
Advanced Telecommunications Res. Inst. International Kyoto‐fu Jpn
-
Nakamura Satoshi
ATR Spoken Language Communication Res. Lab. Kyoto‐fu Jpn
-
Nakamura Satoshi
National Institute Of Information And Communications Technology
Related papers
- CENSREC-1-C : An evaluation framework for voice activity detection under noisy environments
- Verification of Multi-Class Recognition Decision : A Classification Approach(Spoken Language Systems, Corpus-Based Speech Technologies)
- Noise and Channel Distortion Robust ASR System for DARPA SPINE2 Task (Special Issue on Speech Information Processing)
- A Study on Acoustic Modeling of Pauses for Recognizing Noisy Conversational Speech (Special Issue on Speech Information Processing)
- The Cell Surface Glycoprotein of Haloarcula japonica TR-1
- Characterization of a Novel Human Tumor Necrosis Factor-α Mutant with Increased Cytotoxic Activity
- Quantitative analysis of pattern of gonial proliferation during sexual maturation in Japanese scallop Patinopecten yessoensis
- GnRH-PROMOTED SPERMATOGONIAL PROLIFERATION OF SCALLOP MEDIATES THROUGH STEROIDOGENESIS (Endocrinology, Abstracts of papers presented at the 76th Annual Meeting of the Zoological Society of Japan)
- MOLECULAR CLONING OF A PUTATIVE SEROTONIN RECEPTOR EXPRESSED IN THE OVARY OF SCALLOP, PATINOPECTEN YESSOENSIS (Developmental Biology, Abstracts of papers presented at the 76th Annual Meeting of the Zoological Society of Japan)
- REGULATION OF GONIAL MULTIPLICATION BY A GnRH-LIKE FACTOR IN THE CENTRAL NERVOUS SYSTEM OF THE PATINOPECTEN YESSOENSIS(Endocrinology)(Proceedings of the Seventy-Third Annual Meeting of the Zoological Society of Japan)
- AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition(Speech Corpora and Related Topics, Corpus-Based Speech Technologies)
- Missing Feature Theory Applied to Robust Speech Recognition over IP Network(Speech Dynamics by Ear, Eye, Mouth and Machine)
- Results of IPTP Character Recognition Competitions and Studies on Multi-expert System for Handprinted Numeral Recognition (Special Issue on Character Recognition and Document Understanding)
- Effects of Proteolytic Digestion on the Control Mechanism of Ciliary Orientation in Ciliated Sheets from Paramecium
- CENSREC-3: An Evaluation Framework for Japanese Speech Recognition in Real Car-Driving Environments(Speech and Hearing)
- A Design for a Collaborative Steering System of Microphone Array and Video Camera Toward Multi-Lingual Tele-Conference (Special Feature: Innovation and Practical Application of Interaction Technology)
- A design of adaptive beamformer based on average speech spectrum for noisy speech recognition
- A Microphone Array-Based 3-D N-Best Search Method for Recognizing Multiple Sound Sources
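- Integration of Localization and Speech Recognition of Multiple Sound Sources Based on the 3D N-best Search Method
- Evaluation of the 3-D N-best Search Method Using the Distance between Sound Source Direction Paths in Speech Recognition of Multiple Speakers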
- The present status, progress, and usage of speech databases in Japan
- Thermophilic Alkaline Xylanase from Newly Isolated Alkaliphilic and Thermophilic Bacillus sp. Strain TAR-1
- Degradation of Human Hair by a Thermostable Alkaline Protease from Alkaliphilic Bacillus sp. No.AH-101
- Molecular Cloning, Nucleotide Sequence, and Expression of the Structural Gene for Alkaline Serine Protease from Alkaliphilic Bacillus sp.221
- IMPROVING ACCURACY IN PARAMETER ESTIMATION IN AN EXTENDED KALMAN PARTICLE FILTERS FOR NOISY SPEECH RECOGNITION
- ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles(Speech Recognition, Statistical Modeling for Speech Processing)
- Automatic Generation of Non-uniform HMM Topologies Based on the MDL Criterion(Speech and Hearing)
- Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation(Life-like Agent and its Communication)
- Charge-Independence-Breaking Interactions in sd-Shell Nuclei : Nuclear Physics
- Passive hybrid subtractive beamformer for near-field sound sources
- Learning, Generation and Recognition of Motions by Reference-Point-Dependent Probabilistic Models
- An Acoustic Modeling Method Robust against Changes of Speaking Style in Error Recovery
- A Hybrid HMM/BN Acoustic Model Utilizing Pentaphone-Context Dependency(Speech Recognition, Statistical Modeling for Speech Processing)
- Improving Acoustic Model Precision by Incorporating a Wide Phonetic Context Based on a Bayesian Framework(Speech Recognition, Statistical Modeling for Speech Processing)
- A Hybrid HMM/BN Acoustic Model for Automatic Speech Recognition (Special Issue on Speech Information Processing)
- MIXTURE OF FACTOR ANALYZED HMM
- Iterative Estimation and Compensation of Signal Direction for Moving Sound Source by Mobile Microphone Array(Engineering Acoustics)
- TIME-VARYING NOISE COMPENSATION BY SEQUENTIAL MONTE CARLO METHOD
- Ambient Browser: Web Browser for Daily Use (Japan-Korea Joint Workshop: 1st Korea-Japan Joint Workshop on Ubiquitous Computing and Networking Systems (ubiCNS 2005))
- Burst Error Recovery for Huffman Coding(Algorithm Theory)
- Audio-Visual Speech Recognition Based on Optimized Product HMMs and GMM Based-MCE-GPD Stream Weight Estimation (Special Issue on Speech Information Processing)
- Situated Spoken Dialogue with Robots Using Active Learning