Examination of Criterion for Choosing a Run Time Method in GN Hash Join Algorithm
スポンサーリンク
概要
- 論文の詳細を見る
The join operation is one of the most expensive operations in relational database systems. So far many researchers have proposed several hash-based algorithms for the join operation. In a hash-based algorithm, a large relation is first partitioned into several clusters. When clusters overflow, that is, when the size of the cluster exceeds the size of main memory, the performance of hash-based algorithms degrade substantially. Previously we proposed the GN hash algorithm which is robust in the presence of overflown clusters. The GN hash join algorithm combines the Grace hash join and hash-based nested-loop join algorithms. We analyze the performance of the GN hash join algorithm when applied to relations with a non-uniform Zipf-like data distribution. The performance is compared with other hash-based join algorithms: Grace, Hybrid, nested-loop, and simple hash join. The GN hash join algorithm is found to have higher performance on non-uniformly distributed relations. In this paper, the robustness of the GN hash algorithm from the point of choosing a run time method is verified. In the GN hash algorithm, the criterion for selecting a run time method from the two algorithm is determined by using the value calculated from the I/O cost formula of the two algorithms. This criterion cannot be guaranteed to be optimal under every data distribution, that is, the optimal criterion may change depending on the data distribution. When the data distribution is unknown, all data has to be repartitioned in order to get an accurate optimal criterion. However, from the view of choosing a method at run time, it is necessary for the GN hash algorithm to determine an appropriate criterion regardless of the data distribution. Thus, we inspect the criterion adopted in our algorithm under a simulation environment. From simulation results, we find that the range of the criterion is very wide under any data distribution and assure that the criterion determined with the assumption of a uniform data distribution can be used even when the data is highly skewed. Consequently, we can conclude that the GN hash algorithm which dynamically selects the nested-loop and Grace hash algorithms provides good performance in the presence of data skew and its performance is not sensitive to the criterion.
- 社団法人電子情報通信学会の論文
- 1996-11-25
著者
-
Nakano Miyuki
Institute Of Industrial Science The University Of Tokyo
-
Kitsuregawa Masaru
Institute Of Industrial Science The University Of Tokyo
関連論文
- Display Wall Empowered Visual Mining for CEOP Data Archive(Coordinated Enhanced Observing Period(CEOP))
- Data Analysis System Attached to the CEOP Centralized Data Archive System(Coordinated Enhanced Observing Period(CEOP))
- QUASUR : Web-based Quality Assurance System for CEOP Reference Data(Coordinated Enhanced Observing Period(CEOP))
- Initial CEOP-based Review of the Prediction Skill of Operational General Circulation Models and Land Surface Models(Coordinated Enhanced Observing Period(CEOP))
- Overview of the Super Database Computer (SDC-I) (Special Issue on Super Chip for Intelligent Integrated Systems)
- Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework(Data Mining)
- Finding Neighbor Communities in the Web Using an Inter-Site Graph(Database)
- Speculative Transaction Processing Approach for Database Systems
- An Economic Dynamic Replication Model for Mobile-P2P networks (夏のデータベースワークショップDBWS 2006)
- An Economic Dynamic Replication Model for Mobile-P2P networks
- Performance Evaluation of Flash SSDs in a Transaction Processing System
- Rank Optimization of Personalized Search
- High Performanee Parallel Query Processing on a 100 Node ATM Connected PC Cluster (Special Issue on New Generation Database Technologies)
- Web Community Chart : A Tool for Navigating the Web and Observing Its Evolution
- Detecting Hijacked Sites by Web Spammer Using Link-Based Algorithms
- A Study of Link Farm Evolution Using a Time-series of Web Snapshots
- A Study of Link Farm Evolution Using a Time-series of Web Snapshots
- Efficient Analyzing General Dominant Relationship Based on Partial Order Models
- Examination of Criterion for Choosing a Run Time Method in GN Hash Join Algorithm
- Finding Web Communities by Maximum Flow Algorithm Using Well-Assigned Edge Capacities(Information Processing Technology for Web Utilization)
- D-3 An Link-Contents Coupled Clustering for Web Search Results
- Speculative Transaction Processing in Distributed Database Systems
- Foreword to the Special Issue on Japanese Microprocessors
- Virtual Striping: A Storage Management Scheme with Dynamic Striping (Special Issue on Architectures, Algorithms and Networks for Massively parallel Computing)
- A Study on Characteristics of Topic-Specific Information Cascade in Twitter (データ工学)
- A Study on Efficient Searching Top-k Semantic Similar Sentences (データ工学)
- Efficient Classification with Conjunctive Features
- A Study on Characteristics of Topic-Specific Information Cascade in Twitter
- A Study on Efficient Searching Top-k Semantic Similar Sentences
- A Study on Graph Similarity Search
- Semi-supervised Sentiment Classification in Resource-Scarce Language : A Comparative Study
- A Study on Graph Similarity Search
- Exploration on Efficient Similar Sentences Extraction
- A Study on Similar Words Searching (データ工学)
- Semi-supervised Sentiment Classification in Resource-Scarce Language : A Comparative Study
- A Study on Graph Similarity Search
- Collective Sentiment Classification Based on User Leniency and Product Popularity
- A Study on Similar Words Searching