<Paper> An Attempt to Classify Books into NDC Categories

Abstract

In information retrieval, texts are usually retrieved by matching them with queries. In this study, an approach was proposed in which texts are automatically classified into categories and then retrieved by matching them with queries classified in the same way. Efficient retrieval based on automatic classification requires two components: a method for extracting words from texts and a method for matching. Several methods for extracting words from Japanese texts have been proposed in natural language processing. However, extracting significant words from Japanese texts is difficult because Japanese is written without spaces between words. As for matching, many weighting methods have been proposed, along with vector space models and probabilistic models.

This article reports the results of an experiment in classifying Japanese texts into Nippon Decimal Classification (NDC) categories based on the title information in Japanese MARC records. Three extraction methods (juman, MHSA, and n-gram) are tested on a set of 1,000 books. Four weighting methods, including relative term frequency between categories, tf·idf, and tf(max)·idf, are tested. The results indicate that the extraction method using juman performed best and that the best weighting method was relative term frequency between categories, which selected the correct classification category (the first three digits of the NDC number) for about 55.9% of the 1,000 books.
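
The abstract names the extraction and weighting methods but gives no formulas. The following is a minimal sketch of one such pipeline, not the paper's method. It assumes character bigrams as the n-gram extractor, pools all training titles of an NDC category into a single document, and uses textbook tf·idf and tf(max)·idf definitions. The paper's relative-term-frequency weighting is not defined in the abstract and is not reproduced, and juman and MHSA are omitted because they need external analyzers. The function names, sample titles, and NDC codes are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Split unsegmented Japanese text into overlapping character n-grams."""
    text = "".join(text.split())  # drop any stray whitespace
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_category_weights(training, n=2, use_max_tf=False):
    """Pool each category's titles into one document and weight n-grams by tf·idf.

    training: iterable of (title, category) pairs (hypothetical input format).
    use_max_tf: if True, divide tf by the category's largest tf (tf(max)·idf);
    otherwise use the raw count (plain tf·idf).
    """
    tf = defaultdict(Counter)  # category -> n-gram counts
    for title, category in training:
        tf[category].update(char_ngrams(title, n))

    n_categories = len(tf)
    df = Counter()  # number of categories in which each n-gram occurs
    for counts in tf.values():
        df.update(counts.keys())

    weights = {}
    for category, counts in tf.items():
        max_tf = max(counts.values(), default=1)
        weights[category] = {
            gram: (count / max_tf if use_max_tf else count)
            * math.log(n_categories / df[gram])
            for gram, count in counts.items()
        }
    return weights


def rank_categories(title, weights, n=2, top_k=3):
    """Rank categories by the summed weights of the title's n-grams."""
    grams = char_ngrams(title, n)
    scores = {
        category: sum(w.get(gram, 0.0) for gram in grams)
        for category, w in weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical training titles labelled with three-digit NDC classes.
training = [
    ("情報検索の理論", "007"),
    ("情報処理入門", "007"),
    ("日本の小説", "913"),
    ("現代小説集", "913"),
    ("医学概論", "490"),
    ("臨床医学入門", "490"),
]
weights = build_category_weights(training)
print(rank_categories("情報検索入門", weights))  # ['007', '490', '913']
```

Pooling titles per category makes idf favour n-grams concentrated in few categories, so 入門, which appears in two categories, contributes less than 情報 or 検索. Returning the top three categories mirrors the three-digit NDC target; a realistic system would also need smoothing for categories with very few titles.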
