<Article> An Attempt to Classify Books into NDC Categories
Abstract
In information retrieval, texts are usually retrieved by matching them with queries. This study proposes an approach in which texts are automatically classified into categories and retrieved by matching them with queries classified in the same way. For efficient information retrieval using automatic classification, the methods for extracting words from texts and the matching methods are essential. Several methods for extracting words from Japanese texts have been proposed in natural language processing. However, it is difficult to extract significant words from Japanese texts because Japanese is written without spaces separating words. As for matching methods, many weighting methods have been proposed, as well as vector space models and probabilistic models. This article reports the results of an experiment in classifying Japanese texts into Nippon Decimal Classification (NDC) categories based on the title information in Japanese MARC records. In this experiment, three extraction methods (JUMAN, MHSA, and n-gram) are tested on a set of 1,000 books. Four weighting methods, including relative term frequency between categories, tf·idf, and tf(max)·idf, are tested. The results indicate that extraction using JUMAN performed best and that the best weighting method was the relative term frequency between categories, which selected the correct classification category (the upper three digits of the NDC) for about 55.9% of the 1,000 books.
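To make the pipeline described in the abstract concrete, the following is a minimal sketch of one possible combination: character n-gram extraction from titles followed by tf·idf weighting per category. It is not the authors' implementation. The function names, the toy titles, the choice of bigrams, and the decision to treat each category as a single document when computing idf are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (title, upper three digits of NDC).
# 210 is Japanese history and 420 is physics in the NDC.
TOY_TITLES = [
    ("日本の歴史", "210"),
    ("日本史入門", "210"),
    ("物理学入門", "420"),
    ("量子物理学", "420"),
]


def char_ngrams(text, n=2):
    """Return overlapping character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def train(labeled_titles, n=2):
    """Build tf-idf weights per category (assumption: one category = one document)."""
    tf = defaultdict(Counter)
    for title, category in labeled_titles:
        tf[category].update(char_ngrams(title, n))
    num_categories = len(tf)
    # Category frequency: the number of categories in which each n-gram occurs.
    cf = Counter(term for counts in tf.values() for term in counts)
    return {
        category: {
            term: count * math.log(num_categories / cf[term])
            for term, count in counts.items()
        }
        for category, counts in tf.items()
    }


def classify(title, weights, n=2):
    """Return the category with the highest summed weight over the title's n-grams."""
    grams = char_ngrams(title, n)
    scores = {
        category: sum(w.get(g, 0.0) for g in grams)
        for category, w in weights.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    weights = train(TOY_TITLES)
    print(classify("物理学の基礎", weights))
```

An n-gram that occurs in every category (such as 入門 in the toy data) receives a weight of zero, so only n-grams that discriminate between categories contribute to the score. A morphological analyzer such as JUMAN could replace `char_ngrams` as the extraction step without changing the rest of the sketch.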