CG on GPU-enhanced Clusters

Abstract

Motivated by the high computational power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high-performance scientific computing. In this work, we describe the implementation of a mixed-precision Conjugate Gradient solver for unstructured matrices on a GPU-extended cluster. The basic computations of the solver are performed on the GPUs, and communication is managed by the CPUs. For sparse matrix-vector multiplication, which is the most time-consuming operation, the solver automatically selects the fastest among several high-performance kernels running on the GPUs. In a GPU-extended cluster, scalability is harder to obtain than in a traditional CPU cluster, since GPUs are very fast compared to CPUs. GPU-extended clusters therefore demand faster communication between computation units. We demonstrate the performance of the solver and discuss its communication bottleneck using up to 64 GPUs.
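To make the idea of automatic kernel selection concrete, the following is a minimal CPU-only sketch in plain Python, not the authors' GPU implementation. It assumes the matrix is stored in CSR format and that "selection" means timing each candidate on the actual matrix and keeping the fastest; the abstract does not state the storage formats or the selection criterion used. The two candidate kernels and all function names (spmv_csr, spmv_csr_zip, select_fastest, conjugate_gradient) are hypothetical, and the sketch uses a single precision level and no inter-node communication.

```python
import time


def spmv_csr(indptr, indices, data, x):
    """y = A @ x for A in CSR format, using explicit index loops."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s
    return y


def spmv_csr_zip(indptr, indices, data, x):
    """Same product, written with slices and sum() as a second candidate."""
    return [
        sum(a * x[j] for a, j in zip(data[indptr[i]:indptr[i + 1]],
                                     indices[indptr[i]:indptr[i + 1]]))
        for i in range(len(indptr) - 1)
    ]


def select_fastest(kernels, indptr, indices, data, x, repeats=3):
    """Time every candidate kernel on the given matrix; return the fastest."""
    best, best_time = None, float("inf")
    for kernel in kernels:
        start = time.perf_counter()
        for _ in range(repeats):
            kernel(indptr, indices, data, x)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = kernel, elapsed
    return best


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(spmv, indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite CSR matrix."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A @ x, with x = 0 initially
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = spmv(indptr, indices, data, p)   # the dominant cost per iteration
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Small SPD test matrix [[4, 1, 0], [1, 3, 1], [0, 1, 2]] in CSR form.
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]

kernel = select_fastest([spmv_csr, spmv_csr_zip], indptr, indices, data, b)
x = conjugate_gradient(kernel, indptr, indices, data, b)
```

Because the kernel is chosen once and then called in every iteration, the one-time cost of timing the candidates is amortized over the whole solve. In a multi-GPU setting each process would also have to exchange the entries of p it needs from other processes before every product, which is where the communication bottleneck discussed in the abstract would arise.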
2009-11-23