Novel Tonal Feature and Statistical User Modeling for Query-by-Humming
Abstract
This paper describes a query-by-humming (QbH) music information retrieval (MIR) system based on a novel tonal feature and statistical modeling. Most QbH MIR systems rely on pitch extraction to obtain tonal features from the hummed input, so pitch extraction errors inevitably occur and degrade retrieval performance. The proposed system instead uses the cross-correlation function between two logarithmic frequency spectra as its tonal feature, rather than the difference between two successive pitch frequencies, and prepares probabilistic models for every tone interval that occurs in the database. Similarity scores between the hummed input and the musical pieces in the database are computed with these models. The system has two advantages: it obtains more appropriate tonal features than the pitch-based method, and its statistical approach makes it robust against inaccurate humming by the user. In experiments, the proposed method achieved a top-1 retrieval accuracy of 86.8%, more than 10 points higher than the conventional single-pitch method. Several integration methods were also applied to the proposed method under several conditions; the majority decision method gave the highest accuracy, reducing retrieval error by 5%.
- Paper published by the Information Processing Society of Japan (IPSJ)
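The tonal feature named in the abstract, a cross-correlation between two logarithmic frequency spectra, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Hann window, the semitone-spaced log-frequency axis, the `f_min` default, and the normalization are all assumptions made for illustration, since the abstract does not specify them. The idea it shows is that a pitch interval appears as a shift along the log-frequency axis, so the lag of the correlation peak reflects the interval without any explicit pitch extraction.

```python
import numpy as np


def log_frequency_spectrum(frame, sample_rate, f_min=110.0,
                           bins_per_semitone=1, n_semitones=36):
    """Magnitude spectrum of one frame, resampled onto a log-frequency axis.

    All parameters are illustrative assumptions, not values from the paper.
    """
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    linear_freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    n_bins = n_semitones * bins_per_semitone
    # Bin centers spaced evenly in log frequency (equal-tempered steps).
    log_freqs = f_min * 2.0 ** (np.arange(n_bins) / (12.0 * bins_per_semitone))
    return np.interp(log_freqs, linear_freqs, magnitude)


def interval_feature(spec_a, spec_b, max_shift):
    """Normalized cross-correlation of two log-frequency spectra.

    Returns values for lags -max_shift..+max_shift (in bins). A positive
    peak lag means spec_b looks like spec_a shifted upward in frequency.
    """
    a = spec_a - spec_a.mean()
    b = spec_b - spec_b.mean()
    full = np.correlate(b, a, mode="full")
    zero_lag = len(a) - 1  # index of lag 0 in the "full" output
    feature = full[zero_lag - max_shift: zero_lag + max_shift + 1]
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return feature / norm if norm > 0 else feature
```

In a system like the one described, a feature vector of this kind would then be scored against a probabilistic model for each tone interval; that modeling step is not sketched here because the abstract gives no details of it.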
Authors
- Makino Shozo, Graduate School of Engineering, Tohoku University
- Ito Akinori, Graduate School of Engineering, Tohoku University
- Suzuki Motoyuki, Institute of Industrial Science, University of Tokyo
- Ichikawa Takuto, Graduate School of Engineering, Tohoku University
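Related papers
- SIG-SLP/SIG-NL Joint Session: This Is How Far Speech/Language Processing Technology Has Come: Speech Edition
- This Is How Far Speech/Language Processing Technology Has Come: Speech Edition
- Overview of the Continuous Speech Recognition Consortium FY2002 Software
- Overview of the Continuous Speech Recognition Consortium FY2001 Software
- Japanese Dictation Basic Software (FY1999 Version)
- 2000-NL-137-7 / 2000-SLP-31-2 Performance Evaluation of the Japanese Dictation Basic Software (FY1999 Version)
- Japanese Dictation Basic Software: FY1997 Version
- Japanese Dictation Basic Software (FY1997 Version)
- Performance Evaluation of the Japanese Dictation Basic Software (FY1997 Version)
- Overview and Evaluation of the Continuous Speech Recognition Consortium FY2000 Software
- Panel Discussion III by New PhDs: "Research for Myself, Research That Creates Value"
- Development of a Research Infrastructure for Large-Vocabulary Japanese Continuous Speech Recognition: Construction of General-Purpose Phoneme Models
- Development of a Research Infrastructure for Large-Vocabulary Japanese Continuous Speech Recognition: Construction of Training and Evaluation Text Corpora
- Development of a Research Infrastructure for Large-Vocabulary Japanese Continuous Speech Recognition: Development of a Continuous Speech Recognition Program for Evaluation
- "Why Do People Treat Computers as Humans? The Psychology of the 'Media Equation'", by Byron Reeves and Clifford Nass, translated by Hiromichi Hosoma, Shoeisha, 2001 (A Book I Recommend, Coffee Break)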
- Stability Analysis of Continuous Culture in Diauxic Growth
- Development of a toxicity evaluation system for gaseous compounds using air-liquid interface culture of a human bronchial epithelial cell line, Calu-3
- Development of a Simple Double-layered Cell Culture System Using Caco-2 and TIG-1 Cells as a New Cytotoxicity Test
- A New Assay for Evaluating Hepatotoxicity and Cytotoxicity Using LDL-Uptake Activity of Liver Cells
- Rapid and Sensitive Neurotoxicity Test Based on the Morphological Changes of PC12 Cells with Simple Computer-Assisted Image Analysis
- Improved Reference Speaker Weighting Using Aspect Model
- Bit rate reduction of mixed excitation linear prediction coder by Lempel-Ziv segment quantization
- Selection of Optimum Vocabulary and Dialog Strategy for Noise-Robust Spoken Dialog Systems
- Pronunciation error detection for computer-assisted language learning system based on error rule clustering using a decision tree
- An Evaluation Method of Japanese Pronunciation for Korean Native Speakers
- I-069 Smile and Laugh Recognition from Natural Conversation Video
- A New HMnet Construction Algorithm Requiring No Contextual Factors
- Information Hiding for G.711 Speech Based on Substitution of Least Significant Bits and Estimation of Tolerable Distortion
- Source-filter separation for nonstationary voiced speech based on sinusoidal representation
- Fast optimization of language model weight and insertion penalty from n-best candidates
- Tissue-engineered skin using aggregates of normal human skin fibroblasts and biodegradable material
- Speech Recognition under Multiple Noise Environment Based on Multi-Mixture HMM and Weight Optimization by the Aspect Model
- The Performance Prediction on Sentence Recognition Using a Finite State Word Automaton
- Novel Tonal Feature and Statistical User Modeling for Query-by-Humming
- A grammatical error detection method for dialogue-based CALL system
- 5 What Can be Done for Cardiovascular Medicine Using Robotics and Information Technologies?(Robotics and Information Technologies (IT) in the Field of Cardiovascular Medicine,Plenary Session 6 (PL6) (H),The 70th Anniversary Annual Scientific Meeting of th
- Effect of Boundary Layer Thickness on the Photoluminescence Spectra of GaAs Grown by MOCVD
- ESTIMATION OF ADSORPTION PARAMETERS OF A BINARY SYSTEM BY APPLYING THE LEWIS RULE
- Numerical Analysis of Group-V Element Transport and Incorporation at a Growing Surface in MOCVD Reactor
- Automatic Determination Algorithm for Optimum Number of States in Discrete-Type HMnet
- Robust Transmission of Audio Signals over the Internet: An Advanced Packet Loss Concealment for MP3-Based Audio Signals
- Kinetics of biological phosphorus behavior in sequential batch reactor under anaerobic/aerobic condition.
- 20 Years of the Special Interest Group on Spoken Language Processing: A Research Review by Successive Chairs
- MOMENT ANALYSIS OF CONCENTRATION DECAY IN A BATCH ADSORPTION VESSEL
- Foreword to the special issue on ``the speech communication and its related technologies''
- CHROMATOGRAPHIC STUDY OF DIFFUSION IN MOLECULAR-SIEVING CARBON
- Liquid-to-particle mass transfer in a stirred batch adsorption tank with non-linear isotherm.
- Simulation of nonisothermal pressure swing adsorption.
- Recovery of carbon dioxide from stack gas by piston-driven ultra-rapid PSA.
- Correlation of adsorption equilibrium data of volatile chlorinated hydrocarbons from aqueous solutions to activated carbon fibers by the Dubinin-Astakhov equation.
- Robust Transmission of Audio Signals over the Internet: An Advanced Packet Loss Concealment for MP3-Based Audio Signals ( Fundamental Aspects and Recent Developments in Multimedia and VLSI Systems)