<Article> An Attempt to Classify Books into NDC Categories
Abstract
In information retrieval, texts are usually retrieved by matching them with queries. In this study, an approach is suggested in which texts are automatically classified into categories and retrieved by matching them with queries classified in the same way. For efficient information retrieval using automatic classification, methods for extracting words from texts and matching methods are essential. Several methods for extracting words from Japanese texts have been suggested in natural language processing. However, it is difficult to extract significant words from Japanese texts because they are written without blank spaces separating words. As for matching methods, many weighting methods have been suggested, as well as vector space models and probabilistic models. This article reports the results of an experiment in classifying Japanese texts into Nippon Decimal Classification (NDC) categories based on the title information in Japanese MARC records. In this experiment, three extraction methods (juman, MHSA, and n-gram) were tested on a set of 1,000 books. Four weighting methods, including relative term frequency between categories, tf·idf, and tf(max)·idf, were tested. The results indicate that the juman extraction method performed best and that the best weighting method was the relative term frequency between categories, which selected the correct classification category (the upper three digits of NDC) for about 55.9% of the 1,000 books.
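The n-gram extraction and category-weighting idea described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the "relative term frequency between categories" is taken to be a term's count in one category divided by its count across all categories, the terms are character bigrams, and the training titles and function names (`char_ngrams`, `train`, `relative_weights`, `classify`) are hypothetical toy examples rather than the authors' data or implementation.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of a title, ignoring whitespace."""
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def train(titled_books, n=2):
    """Count n-gram frequencies per category from (title, category) pairs."""
    counts = defaultdict(Counter)
    for title, category in titled_books:
        counts[category].update(char_ngrams(title, n))
    return counts


def relative_weights(counts):
    """Assumed weighting: count of a term in a category / its count in all categories."""
    totals = Counter()
    for term_counts in counts.values():
        totals.update(term_counts)
    return {
        category: {term: freq / totals[term] for term, freq in term_counts.items()}
        for category, term_counts in counts.items()
    }


def classify(title, weights, n=2):
    """Pick the category whose summed term weights for the title are highest."""
    terms = char_ngrams(title, n)
    scores = {
        category: sum(term_weights.get(term, 0.0) for term in terms)
        for category, term_weights in weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy training set: (title, three-digit NDC category).
books = [
    ("数学入門", "410"),
    ("線形代数入門", "410"),
    ("物理学概論", "420"),
    ("量子物理学", "420"),
]
weights = relative_weights(train(books))
predicted = classify("代数学", weights)
```

In this toy run, both bigrams of the query title occur only in category 410 titles, so that category receives the highest score. A real experiment would replace the bigram extractor with a morphological analyzer where appropriate and train on a much larger set of MARC title records.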