Unsupervised Speaker Normalization by Speaker Markov Model Converter for Speaker-Independent Speech Recognition Systems
Abstract
The spectral patterns of speech from different speakers reflect inter-speaker variations that arise from differences in gender, age, health condition, or emotional state. These variabilities contribute significantly to the error rate of recognition systems, which are usually trained initially on a single speaker (the reference speaker, RS). Such systems are highly inaccurate when recognizing a user (the new speaker, NS) whose features differ from those of the RS. In speaker-independent recognition systems, speaker variability is often handled by training on multiple RSs, which aim to cover a wide range of speaker individualities. However, some distortion is always inevitable. Speaker adaptation is used to eliminate or minimize inter-speaker distortions. It is a mapping from the RS to the NS (adaptation) or vice versa (normalization). There are three main issues to address in speaker adaptation: (1) maximizing the recognition rate; (2) minimizing training and computing time; and (3) preferring unsupervised training over supervised training. At present, no single approach achieves all three goals simultaneously. Our goal is to develop an optimal adaptation method that strikes a compromise among these three points without sacrificing any of them. Our approach, the Speaker Markov Model Converter (SMMC), is a spectrum-to-code mapping in which input NS spectra are converted into RS-like code sequences that go directly through the recognition system; i.e., given NS data, our system predicts how the RS would have made the same utterance in place of the NS. The conversion and recognition processes run in parallel through a feedback process. No special training data is required for the NS.
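To illustrate what a spectrum-to-code mapping means in its simplest form, the sketch below quantizes each input frame to the index of its nearest codeword in a reference codebook. This is plain nearest-neighbour vector quantization, not the SMMC itself: the abstract gives no details of the Markov model conversion or the feedback loop with the recognizer, so neither is reproduced here. The function name `quantize`, the two-dimensional "spectral" frames, and the codebook values are all hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def quantize(frames, codebook):
    """Map each frame to the index of its nearest codeword (Euclidean distance).

    Illustrative only: a generic vector quantizer, not the SMMC conversion.
    """
    return [
        min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        for frame in frames
    ]


# Hypothetical toy data: a 3-entry "RS" codebook and four "NS" frames.
rs_codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
ns_frames = [(0.1, -0.1), (0.9, 1.2), (1.8, 0.4), (1.1, 0.9)]

codes = quantize(ns_frames, rs_codebook)
print(codes)  # [0, 1, 2, 1]
```

A converter of the kind the abstract describes would presumably replace the fixed nearest-codeword rule with a speaker-dependent mapping, so that the emitted code sequence resembles what the RS would have produced; how that mapping is estimated is not stated in the abstract.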
- Paper of the Information Processing Society of Japan
- 1991-02-25
Authors
-
Fung Pascale
Hong Kong University Of Science And Technology Weniwen Technologies Inc.
-
Kawahara Tatsuya
Department Of Information Science Kyoto University
-
DOSHITA SHUJI
Department of Information Science, Kyoto University
-
Doshita S
Ryukoku Univ. Otsu‐shi Jpn
-
Doshita Shuji
Department Of Electronics And Informatics Faculty Of Science And Technology Ryukoku University
-
FUNG Pascale
Department of Information Science, Kyoto University
Related papers
- A Spoken Restaurant Search System Using a Flexible Language Model and Matching
- Modeling and automatic detection of English sentence stress for computer-assisted English prosody learning system
- Formant structure estimation using vocal tract length normalization for CALL systems
- Cooperative Spoken Dialogue Model Using Bayesian Network and Event Hierarchy
- Integrated Natural Language Analysis with an Integrated Parsing Engine IPE
- Hands-free speech recognition in real environments using microphone array and 2-levels MLLR adaptation as a front-end system for conversational TV
- Unsupervised Speaker Normalization by Speaker Markov Model Converter for Speaker-Independent Speech Recognition Systems
- Proposal of a Negotiation Protocol for Multi-user Scheduling System
- Topic Identification and Prediction Based on the Domain Plan and the Discourse Structure
- Evaluating Dialogue Strategies under Communication Errors Using Computer-to-Computer Simulation
- Comparison of discrete and continuous classifier-based HMM