A Method to Extract Sentences with Protein Functional Information from Literature by Iterative Learning of the Corpus
Abstract
We are developing PROFESS, a system that assists in extracting protein functional site information from the literature on protein structural analysis. The system first extracts the sentences that contain functional information. This paper proposes the complementary use of protein structure data, keywords, and patterns to extract these target sentences. In the proposed method, each sentence in the literature is represented as a vector built from these three features, and the vectors are used to train a support vector machine (SVM). Because the accuracy of the SVM depends on the number of effective vector elements, we propose a method that automatically extracts patterns and adds them as new vector elements to improve accuracy. Patterns can fail to match sentences when a proper noun tag appears adjacent to a residue tag, so we defined two rules that eliminate these unnecessary tags and allow the patterns to match. The proposed method was applied to five documents on protein structural analysis to extract sentences with protein functional information; for each test document, eight papers were used for feedback. The average recall and F-measure were 0.96 and 0.69, respectively. The results confirm that increasing the number of vector elements leads to higher sentence-extraction performance.
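The pipeline described above (feature vectors from keywords, patterns, and structure data; SVM training; automatic pattern extraction that grows the vector; tag cleanup so patterns match) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PROFESS implementation: the tag names `[RESIDUE]` and `[PROPER]`, the use of token bigrams as "patterns", the single cleanup rule (the abstract does not state the paper's two rules), the Pegasos-style subgradient trainer standing in for a full SVM solver, mining patterns from the labelled training sentences rather than from separate feedback papers, and all function names and toy data are hypothetical.

```python
import random
from collections import Counter

# Hypothetical tag placeholders; the abstract does not specify the actual tag set.
RESIDUE, PROPER = "[RESIDUE]", "[PROPER]"


def drop_adjacent_proper_tags(tokens):
    """Illustrative cleanup rule (an assumption, not the paper's two rules):
    remove a proper-noun tag that sits directly next to a residue tag."""
    out = []
    for i, tok in enumerate(tokens):
        next_to_residue = (i > 0 and tokens[i - 1] == RESIDUE) or (
            i + 1 < len(tokens) and tokens[i + 1] == RESIDUE
        )
        if tok == PROPER and next_to_residue:
            continue
        out.append(tok)
    return out


def bigrams(tokens):
    """Contiguous token pairs, used here as the (hypothetical) pattern form."""
    return set(zip(tokens, tokens[1:]))


def vectorize(tokens, residues, keywords, patterns, structure_residues):
    """Binary vector: keyword hits, pattern hits, one structure-data feature, bias."""
    words, grams = set(tokens), bigrams(tokens)
    vec = [1.0 if k in words else 0.0 for k in keywords]
    vec += [1.0 if p in grams else 0.0 for p in patterns]
    # Structure feature (assumption): sentence mentions a residue present in the structure data.
    vec.append(1.0 if residues & structure_residues else 0.0)
    vec.append(1.0)  # constant bias feature
    return vec


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order, t = list(range(len(X))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regulariser
            if margin < 1.0:  # hinge loss active: move towards the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (functional sentence) or -1 from the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def mine_patterns(sentences, labels, known, min_pos=2):
    """Hypothetical pattern extraction: keep bigrams seen in >= min_pos positive
    sentences and in no negative sentence, appended as new vector elements."""
    pos, neg = Counter(), Counter()
    for (tokens, _), label in zip(sentences, labels):
        (pos if label == 1 else neg).update(bigrams(drop_adjacent_proper_tags(tokens)))
    new = [g for g, c in pos.items() if c >= min_pos and neg[g] == 0 and g not in known]
    return known + sorted(new)


def iterative_training(sentences, labels, keywords, structure_residues, rounds=2):
    """Alternate pattern mining and SVM retraining, growing the vector each round."""
    patterns, w = [], []
    for _ in range(rounds):
        patterns = mine_patterns(sentences, labels, patterns)
        X = [vectorize(drop_adjacent_proper_tags(t), r, keywords, patterns, structure_residues)
             for t, r in sentences]
        w = train_linear_svm(X, labels)
    return w, patterns


if __name__ == "__main__":
    # Made-up toy sentences: (tagged tokens, residues mentioned in the sentence).
    toy = [
        (["the", PROPER, RESIDUE, "forms", "a", "hydrogen", "bond"], {"His57"}),
        ([RESIDUE, "forms", "a", "hydrogen", "bond", "with", "the", "substrate"], {"Ser195"}),
        (["crystals", "were", "grown", "at", "room", "temperature"], set()),
        (["data", "were", "collected", "at", "100", "K"], set()),
    ]
    weights, found = iterative_training(toy, [1, 1, -1, -1], ["catalytic", "binding"],
                                        {"His57", "Ser195"})
    print(found)
```

Representing patterns as bigrams keeps the sketch short; the point it illustrates is the loop in which newly mined patterns become additional vector elements before the classifier is retrained.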
- Paper published by the Information Processing Society of Japan (IPSJ)
- 2006-11-15
Authors
- OHKAWA Takenao (Kobe University; Graduate School of Engineering, Kobe University; Graduate School of System Informatics, Kobe University)
- MUNNA Ahaduzzaman (Osaka University; presently Fujitsu Limited)
Related papers
- Entity Network Prediction Using Multitype Topic Models
- A Method to Extract Sentences with Protein Functional Information from Literature by Iterative Learning of the Corpus
- Erratum: Entity Network Prediction Using Multitype Topic Models [IEICE Transactions on Information and Systems E91.D (2008), No. 11, pp. 2589-2598]
- Reaction Structure Profile: A Comparative Analysis of Metabolic Pathways Based on Important Substructures
- Selection of Effective Sentences from a Corpus to Improve the Accuracy of Identification of Protein Names
- A Comparative Analysis of Metabolic Pathways Based on Metabolic Steady States