Querying Molecular Biology Databases by Integration Using Multiagents (Special Issue on New Generation Database Technologies)
Abstract
In this paper, we propose a method for querying heterogeneous molecular biology databases. Since molecular biology data are distributed across multiple databases that represent different biological domains, it is highly desirable to integrate the data together with the correlations between the domains. However, since the total volume of these databases is very large and their contents are frequently updated, it is difficult to maintain an integration of the entire contents of the databases. Thus, we propose a method for dynamic integration based on user demand, which is expressed in an OQL-based query language. By restricting the search space according to user demand, the cost of integration can be reduced considerably. Multiple databases also exhibit much heterogeneity, such as semantic mismatches between the database schemas. For example, many databases employ their own independent terminology. For this reason, the task of integrating data based on a user demand usually has to be carried out transitively: first search each database for data that satisfy the demand, then repeatedly retrieve other data that match the previously found data across every database. To cope with this issue, we introduce two types of agents: a database agent and a user agent, which reside at each database and at the user side, respectively. The integration task is performed by the agents: user agents generate demands for retrieving data based on the previous search results returned by database agents, and database agents search their databases for data that satisfy the demands received from the user agents. We have developed a prototype system on a network of workstations. The system integrates four databases: GenBank (a DNA nucleotide database), SWISS-PROT and PIR (protein amino-acid sequence databases), and PDB (a protein three-dimensional structure database). Although the sizes of GenBank and PDB are each over one billion bytes, the system achieved good performance in searching such very large heterogeneous databases.
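The transitive search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `Record`, `DatabaseAgent`, and `UserAgent`, the `xrefs` field, and the rule that two records match when they share an identifier are all assumptions made for illustration. The OQL-based query language and the handling of terminology mismatches are not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a key plus identifiers it cross-references."""
    db: str
    key: str
    xrefs: frozenset = frozenset()


class DatabaseAgent:
    """Resides at one database and answers demands sent by a user agent."""

    def __init__(self, name, records):
        self.name = name
        self.records = list(records)

    def search(self, term):
        # Assumed matching rule: the term is the record's key or one of its xrefs.
        return [r for r in self.records if term == r.key or term in r.xrefs]


class UserAgent:
    """Turns earlier search results into new demands until nothing new appears."""

    def __init__(self, db_agents):
        self.db_agents = list(db_agents)

    def integrate(self, initial_term):
        seen_terms = {initial_term}
        found = {}
        pending = deque([initial_term])
        while pending:
            term = pending.popleft()
            for agent in self.db_agents:
                for rec in agent.search(term):
                    found[(rec.db, rec.key)] = rec
                    # Every identifier carried by a hit becomes a follow-up demand.
                    for nxt in {rec.key, *rec.xrefs}:
                        if nxt not in seen_terms:
                            seen_terms.add(nxt)
                            pending.append(nxt)
        return found
```

Because only records reachable from the initial demand are visited, the sketch also shows why restricting the search space by user demand avoids integrating the full contents of every database.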
- Paper published by the Institute of Electronics, Information and Communication Engineers (IEICE)
- 1999-01-25
Authors
-
MATSUDA Hideo
Graduate School of Engineering Science, Osaka University
-
HASHIMOTO Akihiro
Graduate School of Engineering Science, Osaka University
-
NAKANISHI Michio
Education Center for Information Processing, Osaka University
-
IMAI Takashi
Graduate School of Engineering Science, Osaka University (present: NTT Data Corporation)
Related papers
- GXML : A Novel Method for Exchanging and Querying Complete Genomes by Representing them as Structured Documents
- Introduction of Aggregate Functions to a Language for Querying Structured Genome Documents (Summer Database Workshop 1999 (DBWS'99), Okinawa: In the Seventh Month of 1999, Massive Data Will Fall from the Sky!) -- (4C: Semi-structured Data Retrieval (2))
- Introduction of Aggregate Functions to a Language for Querying Structured Genome Documents
- Computational Complexity of Finding Meaningful Association Rules
- Querying Molecular Biology Databases by Integration Using Multiagents (Special Issue on New Generation Database Technologies)
- Implementation of a Parallel Prolog System on a Distributed Memory Parallel Computer (Special Issue on Parallel and Distributed Supercomputing)
- Conformational Search and Analysis of β-hairpin Formation by High-Speed Exhaustive Tree Search
- Retrieving Functionally Similar Bioinformatics Workflows Using TF-IDF Filtering
- A Combination Method of the Tanimoto Coefficient and Proximity Measure of Random Forest for Compound Activity Prediction
- Proposal of High Performance 1.55μm Quantum Dot Heterostructure Laser Using InN
- Improved Prediction Method for Protein Interactions Using Both Structural and Functional Characteristics of Proteins
- A Distributed-Processing System for Accelerating Biological Research Using Data-Staging