Examination of Criterion for Choosing a Run Time Method in GN Hash Join Algorithm

概要

論文の詳細を見る
The join operation is one of the most expensive operations in relational database systems. So far many researchers have proposed several hash-based algorithms for the join operation. In a hash-based algorithm, a large relation is first partitioned into several clusters. When clusters overflow, that is, when the size of the cluster exceeds the size of main memory, the performance of hash-based algorithms degrade substantially. Previously we proposed the GN hash algorithm which is robust in the presence of overflown clusters. The GN hash join algorithm combines the Grace hash join and hash-based nested-loop join algorithms. We analyze the performance of the GN hash join algorithm when applied to relations with a non-uniform Zipf-like data distribution. The performance is compared with other hash-based join algorithms: Grace, Hybrid, nested-loop, and simple hash join. The GN hash join algorithm is found to have higher performance on non-uniformly distributed relations. In this paper, the robustness of the GN hash algorithm from the point of choosing a run time method is verified. In the GN hash algorithm, the criterion for selecting a run time method from the two algorithm is determined by using the value calculated from the I/O cost formula of the two algorithms. This criterion cannot be guaranteed to be optimal under every data distribution, that is, the optimal criterion may change depending on the data distribution. When the data distribution is unknown, all data has to be repartitioned in order to get an accurate optimal criterion. However, from the view of choosing a method at run time, it is necessary for the GN hash algorithm to determine an appropriate criterion regardless of the data distribution. Thus, we inspect the criterion adopted in our algorithm under a simulation environment. From simulation results, we find that the range of the criterion is very wide under any data distribution and assure that the criterion determined with the assumption of a uniform data distribution can be used even when the data is highly skewed. Consequently, we can conclude that the GN hash algorithm which dynamically selects the nested-loop and Grace hash algorithms provides good performance in the presence of data skew and its performance is not sensitive to the criterion.
社団法人電子情報通信学会の論文
1996-11-25

Examination of Criterion for Choosing a Run Time Method in GN Hash Join Algorithm

スポンサーリンク

概要

著者

関連論文

スポンサーリンク