Sounds of Speech Based Spoken Document Categorization : A Subword Representation Method(<Special Section>Speech Dynamics by Ear, Eye, Mouth and Machine)
Abstract
In this paper, we explore a method for spoken document categorization, the task of automatically assigning spoken documents to a set of predetermined categories. To categorize spoken documents, we use subword unit representations as an alternative to the word units generated by either keyword spotting or large vocabulary continuous speech recognition (LVCSR). One advantage of using subword acoustic unit representations for spoken document categorization is that they require no prior knowledge about the contents of the spoken documents and they address the out-of-vocabulary (OOV) problem. Moreover, the method relies on the sounds of speech rather than on exact orthography. Using subword units instead of words allows approximate matching on inaccurate transcriptions, which makes "sounds-like" spoken document categorization possible. We also examine the performance of our method when the training set contains both perfect and errorful phonetic transcriptions, in the hope that the classifiers can learn the confusion characteristics of the recognizer and the pronunciation variants of words, and so improve the robustness of the whole system. Our experiments on data sets with both artificial and real corruption show that the proposed method is more effective and robust than the word-based method.
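The abstract does not specify the exact subword units or the classifier, so the following is only a minimal sketch of the general "sounds-like" idea under stated assumptions: documents arrive as phone sequences, the subword units are overlapping phone trigrams, and a nearest-centroid classifier with cosine similarity stands in for whatever classifier the authors actually used. The function names (`phone_ngrams`, `train_centroids`, `classify`) and the ARPAbet-style phone strings are hypothetical.

```python
from collections import Counter
import math

def phone_ngrams(phones, n=3):
    """Count overlapping phone n-grams (assumed subword units; n=3 is a guess)."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def train_centroids(labeled_docs, n=3):
    """Sum the n-gram counts of all training transcriptions in each category."""
    centroids = {}
    for phones, label in labeled_docs:
        centroids.setdefault(label, Counter()).update(phone_ngrams(phones, n))
    return centroids

def classify(phones, centroids, n=3):
    """Pick the category whose centroid is most cosine-similar to the document."""
    doc = phone_ngrams(phones, n)
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))

# Hypothetical training transcriptions ("football match", "stock market").
train = [
    ("f uh t b ao l m ae ch".split(), "sports"),
    ("s t aa k m aa r k ih t".split(), "finance"),
]
centroids = train_centroids(train)

# An errorful transcription: /t/ misrecognized as /d/, so the word "football"
# would be lost, but 4 of its 7 phone trigrams still match the sports centroid.
noisy = "f uh d b ao l m ae ch".split()
print(classify(noisy, centroids))  # → sports
```

Because matching happens on overlapping phone n-grams rather than on whole words, a single misrecognized phone only removes the few n-grams that span it. In the same spirit, errorful transcriptions could simply be appended to `train` so that the centroids absorb typical recognizer confusions; whether this mirrors the authors' training setup is an assumption.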
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2004-05-01
Authors
- Shirai Katsuhiko (School of Science and Engineering, Waseda University)
- Qu Weidong (School of Science and Engineering, Waseda University)
Related papers
- Development of a Lip-Sync Algorithm Based on an Audio-Visual Corpus
- An Efficient Lip-Reading Method Robust to Illumination Variations
- Phrase Recognition in Conversational Speech Using Prosodic and Phonemic Information (Special Issue on Speech and Discourse Processing in Dialogue Systems)
- Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation
- Extraction of Human Face and Transformable Region by Facial Expression Based on Extended Labeled Graph Matching