Simultaneous Speech Recognition Based on Automatic Missing Feature Mask Generation Integrated with Sound Source Separation
Abstract
Our goal is to realize a humanoid robot capable of recognizing simultaneous speech. A humanoid robot in a real-world environment usually hears a mixture of sounds, so three capabilities are essential for robot audition: sound source localization, sound source separation, and recognition of the separated sounds. In particular, the interface between sound source separation and speech recognition is important. In this paper, we designed an interface between sound source separation and speech recognition by applying Missing Feature Theory (MFT). In this method, spectral sub-bands distorted by sound source separation are detected in the input speech as missing features. The detected missing features are masked during recognition so that they do not degrade the result. This method is therefore more flexible when noise changes dynamically and drastically. The most important issue is how to detect the distorted spectral sub-bands. To solve this issue, we used a speech feature appropriate for MFT-based automatic speech recognition (ASR) and developed automatic missing feature mask generation. As the speech feature, we used a Mel-Scale Log Spectral (MSLS) feature instead of the Mel-Frequency Cepstrum Coefficient (MFCC) commonly used for ASR. We presented a method of generating the missing feature mask automatically by using information from sound source separation. To evaluate our method, we implemented it in the humanoid robot SIG2 and performed experiments on the recognition of three simultaneous isolated words. As a result, our method outperformed conventional ASR with the MSLS feature.
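The masking idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state how the mask is derived from the separation stage, so the criterion in `make_mask` (an estimated leakage energy compared against the separated energy, with a `threshold` parameter) is a hypothetical stand-in, and all function and parameter names are assumptions. The second function shows a common form of hard-mask MFT scoring, in which a diagonal-Gaussian log-likelihood simply skips the sub-bands marked as missing.

```python
import math


def make_mask(separated, leak_estimate, threshold=0.5):
    """Return a hard mask per sub-band: 1 = reliable, 0 = missing.

    Hypothetical criterion (an assumption, not taken from the paper):
    a sub-band is reliable when its estimated leakage energy is at most
    `threshold` times the energy of the separated signal in that band.
    """
    return [1 if leak <= threshold * sep else 0
            for sep, leak in zip(separated, leak_estimate)]


def masked_log_likelihood(features, mask, means, variances):
    """Diagonal-Gaussian log-likelihood that ignores masked sub-bands."""
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:  # only reliable sub-bands contribute to the score
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Illustrative values only; they are not data from the paper.
mask = make_mask([4.0, 1.0, 3.0], [0.5, 0.9, 1.0])
score = masked_log_likelihood([0.0, 100.0, 0.0], mask,
                              [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

In this toy example the second sub-band is heavily distorted, so it is masked and its outlier value does not affect the score. Scoring of this kind works per spectral band, which fits a log-spectral feature such as MSLS; a cepstral transform mixes all bands into every coefficient, so a distortion in one band could not be masked out in the same way.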