Automatic F-term Classification of Japanese Patent Documents Using the k-Nearest Neighborhood Method and the SMART Weighting
Abstract
Patent processing is important in various fields such as industry, business, and law. We used F-terms (Schellner 2002) to classify patent documents with the k-nearest neighborhood method. Because F-term categories are fine-grained, they are useful for classifying patent documents. Through experiments, we clarified three points: i) which variant of the k-nearest neighborhood method is best for patent classification, ii) which method of calculating similarity is best for patent classification, and iii) from which regions of a patent document terms should be extracted. In our experiments, we used the patent data from the F-term categorization task of the NTCIR-5 Patent Workshop (NTCIR committee 2005; Iwayama, Fujii, and Kando 2005). Among the variants of the k-nearest neighborhood method examined in this study, the most effective was the one that classifies a patent document by adding up the scores of the k retrieved documents. We also found that SMART (Singhal, Buckley, and Mitra 1996; Singhal, Choi, Hindle, and Pereira 1997), which is known to be effective in information retrieval, was the most effective method of calculating similarity. Finally, among all combinations of the abstract, claim, and description regions, extracting terms from the abstract and claim regions together gave the best results. These results were confirmed by a statistical test. Moreover, we varied the amount of training data and found that performance improved as more data was used, within the limit of the data provided for the NTCIR-5 Patent Workshop.
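The abstract's three findings (SMART similarity, k-NN classification by adding up the scores of the k retrieved documents, and term extraction from the abstract and claim regions) fit together as sketched below. This is a minimal Python sketch under stated assumptions, not the authors' implementation. Documents are assumed to be already tokenized into per-region term lists, so Japanese morphological analysis is out of scope. The weighting assumes the widely cited pivoted-normalization form of SMART, with a document term weight of (1 + ln(1 + ln tf)) / ((1 - s) + s * dl / avdl), an idf of ln((N + 1) / df), and a slope s = 0.2. The exact SMART variant and parameters used in the paper may differ. The class `SmartKNN`, the helper `extract_terms`, and the labels `FT-A`, `FT-B`, `FT-C` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_terms(doc, regions=("abstract", "claim")):
    """Collect pre-tokenized terms from the chosen regions of a (hypothetical) doc dict."""
    terms = []
    for region in regions:
        terms.extend(doc.get(region, []))
    return terms


class SmartKNN:
    """Hypothetical k-NN classifier with an assumed pivoted-normalization SMART weighting."""

    def __init__(self, train_docs, train_labels, slope=0.2, regions=("abstract", "claim")):
        self.regions = regions
        self.slope = slope  # assumed pivot slope s
        self.labels = train_labels  # list of F-term label lists, one per training doc
        self.tfs = [Counter(extract_terms(d, regions)) for d in train_docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avg_len = sum(self.lengths) / len(self.lengths)
        self.n_docs = len(self.tfs)
        self.df = Counter()  # document frequency of each term
        for tf in self.tfs:
            self.df.update(tf.keys())

    def similarity(self, query_tf, i):
        """Score training doc i against the query's term counts."""
        tf_d = self.tfs[i]
        # Pivoted length normalization: long documents are penalized relative to avdl.
        norm = (1 - self.slope) + self.slope * self.lengths[i] / self.avg_len
        score = 0.0
        for term, qtf in query_tf.items():
            tf = tf_d.get(term, 0)
            if tf == 0:
                continue  # term absent from this doc contributes nothing
            w_d = (1 + math.log(1 + math.log(tf))) / norm  # double-log tf damping
            idf = math.log((self.n_docs + 1) / self.df[term])
            score += w_d * qtf * idf
        return score

    def classify(self, doc, k=3):
        """Rank F-terms by summing the similarity scores of the top-k training docs."""
        query_tf = Counter(extract_terms(doc, self.regions))
        scored = [(self.similarity(query_tf, i), i) for i in range(self.n_docs)]
        top_k = sorted(scored, reverse=True)[:k]
        category_scores = defaultdict(float)
        for sim, i in top_k:
            for label in self.labels[i]:
                category_scores[label] += sim  # score adding across retrieved docs
        return sorted(category_scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    train_docs = [
        {"abstract": ["printer", "ink", "nozzle"], "claim": ["nozzle", "head"], "description": ["paper"]},
        {"abstract": ["printer", "paper", "feed"], "claim": ["roller", "feed"]},
        {"abstract": ["engine", "fuel", "injection"], "claim": ["injection", "valve"]},
    ]
    labels = [["FT-A", "FT-B"], ["FT-A"], ["FT-C"]]
    model = SmartKNN(train_docs, labels)
    query = {"abstract": ["ink", "nozzle", "printer"], "claim": ["head"]}
    print(model.classify(query, k=2))  # FT-A ranks above FT-B
```

With `k=2`, the query retrieves the first two training documents. `FT-A` is attached to both, so it accumulates both similarity scores and outranks `FT-B`. The `description` region is ignored by default, in line with the abstract's finding that the abstract and claim regions together work best.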
- Paper from the Information and Media Technologies Editorial Committee
Authors
- Isahara Hitoshi, National Institute of Information and Communications Technology, Kyoto, Japan
- Murata Masaki, National Institute of Information and Communications Technology
- Kanamaru Toshiyuki, National Institute of Information and Communications Technology
- Shirado Tamotsu, National Institute of Information and Communications Technology
Related papers
- An Alignment Model for Extracting English-Korean Translations of Term Constituents(Natural Language Processing)
- Extracting Protein-Protein Interaction Information from Biomedical Text with SVM(Natural Language Processing)
- Use of Multiple Documents as Evidence with Decreased Adding in a Japanese Question-answering System
- Automatic F-term Classification of Japanese Patent Documents Using the k-Nearest Neighborhood Method and the SMART Weighting
- Statistical-Based Approach to Non-segmented Language Processing(Knowledge, Information and Creativity Support System)
- Improving Search Performance: A Lesson Learned from Evaluating Search Engines Using Thai Queries(Knowledge, Information and Creativity Support System)
- Related Word Lists Effective in Creativity Support(Knowledge, Information and Creativity Support System)
- Toolbar to Highlight Important Expressions in Scientific Articles on Atomic and Molecular Physics
- Acquisition of Data for Plasma Simulation by Automated Extraction of Terminology from Article Abstracts