A Combination Method of the Tanimoto Coefficient and Proximity Measure of Random Forest for Compound Activity Prediction
Abstract
Chemical and biological activities of compounds provide valuable information for discovering new drugs. Compound fingerprints, which represent structural information related to these activities, are used to assess the similarity of candidate compounds. However, prediction based on compound structural similarity has several accuracy problems. Although the amount of compound data is growing rapidly, the number of well-annotated compounds, e.g., those in the MDL Drug Data Report (MDDR) database, has not increased quickly. Because compounds known to have activity for a given biological target class are rare in the drug discovery process, the prediction accuracy should increase as the activity decreases, or the false positive rate should be maintained, in databases that contain a large number of un-annotated compounds and a small number of compounds annotated with the biological activity. In this paper, we propose a new similarity scoring method that combines the Tanimoto coefficient and the proximity measure of random forest. The score contains two properties that are derived from unsupervised and supervised methods of partial dependence for compounds. Thus, the proposed method is expected to indicate compounds that have accurate activities. Using Linear Discriminant Analysis (LDA), we compare the prediction performance of the proposed score with that of the Tanimoto coefficient and the proximity measure used individually, and demonstrate that the proposed scoring method gives better prediction results than either of the two. We estimate the prediction accuracy on compound datasets extracted from MDDR using the proposed method. We also show that the proposed method can identify active compounds in datasets that include several un-annotated compounds.
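The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the two scores are combined, so the sketch only computes them side by side. It assumes that fingerprints are given as sets of on-bit indices and that the leaf reached by each compound in each tree of an already trained random forest is available as a list of leaf IDs. The function names `tanimoto`, `rf_proximity`, and `score_pair` are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def rf_proximity(leaves_a, leaves_b):
    """Fraction of trees in which two compounds fall into the same leaf.

    leaves_a and leaves_b hold one leaf ID per tree (assumed input format).
    """
    if len(leaves_a) != len(leaves_b):
        raise ValueError("leaf-ID lists must cover the same trees")
    if not leaves_a:
        raise ValueError("at least one tree is required")
    same = sum(1 for x, y in zip(leaves_a, leaves_b) if x == y)
    return same / len(leaves_a)


def score_pair(fp_a, fp_b, leaves_a, leaves_b):
    """Return (Tanimoto, proximity) as a two-component score for one compound pair."""
    return tanimoto(fp_a, fp_b), rf_proximity(leaves_a, leaves_b)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    query_fp, candidate_fp = {1, 4, 7, 9}, {1, 4, 8, 9}
    query_leaves, candidate_leaves = [3, 5, 2, 8], [3, 5, 6, 8]
    print(score_pair(query_fp, candidate_fp, query_leaves, candidate_leaves))
```

In this toy case the fingerprints share 3 of 5 distinct on-bits and the compounds share a leaf in 3 of 4 trees. One plausible use of such a pair, given the abstract's mention of LDA, is as a two-dimensional input to a linear discriminant, but that reading is an assumption rather than something the abstract confirms.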
- Paper of the Information Processing Society of Japan
- 2008-03-15
Authors
-
Matsuda Hideo
Graduate School of Information Science and Technology, Osaka University
-
Matsuda Hideo
Graduate School of Engineering Science, Osaka University
-
Seno Shigeto
Graduate School of Information Science and Technology, Osaka University
-
Takenaka Yoichi
Graduate School of Information Science and Technology, Osaka University
-
Kawamura Gen
Graduate School of Information Science and Technology, Osaka University
Related papers
- GXML : A Novel Method for Exchanging and Querying Complete Genomes by Representing them as Structured Documents
- Introduction of Aggregate Functions to a Language for Querying Structured Genome Documents (Summer Database Workshop 1999 (DBWS'99), Okinawa: In the Seventh Month of 1999, Massive Data Will Fall from the Sky!) -- (4C: Semi-structured Data Retrieval (2))
- Introduction of Aggregate Functions to a Language for Querying Structured Genome Documents
- Querying Molecular Biology Databases by Integration Using Multiagents (Special Issue on New Generation Database Technologies)
- Implementation of a Parallel Prolog System on a Distributed Memory Parallel Computer (Special Issue on Parallel and Distributed Supercomputing)
- Conformational Search and Analysis of β-hairpin Formation by High-Speed Exhaustive Tree Search
- Retrieving Functionally Similar Bioinformatics Workflows Using TF-IDF Filtering
- A Distributed-Processing System for Accelerating Biological Research Using Data-Staging
- A Combination Method of the Tanimoto Coefficient and Proximity Measure of Random Forest for Compound Activity Prediction
- Improved Prediction Method for Protein Interactions Using Both Structural and Functional Characteristics of Proteins
- A Method for Isoform Prediction from RNA-Seq Data by Iterative Mapping