Efficient PageRank on GPU Clusters
Abstract
In this work, we report on the scalability of PageRank on multi-GPU clusters. Our target GPU clusters may contain more than one GPU accelerator per node. Iterative solvers for irregular sparse problems scale poorly as the number of processors increases, because of load imbalance and network bottlenecks. GPUs compute so fast that network performance cannot keep up: even the latest network hardware cannot provide bandwidth adequate for high-performance GPUs. In our previous work, we introduced several implementation techniques and algorithms required for scalable sparse iterative solvers on multi-GPU extended clusters and evaluated those techniques on a Conjugate Gradient solver [5], [6]. In this work, we present a GPU cluster performance evaluation of another important iterative method, PageRank. For the GPU implementation of PageRank, although we are inspired by the techniques presented in [6], we cannot use them as is, since PageRank data has very different characteristics from Krylov method data. Our PageRank implementation on GPUs is based on our work on an efficient CPU cluster algorithm [7]. In our experiments, we observe that PageRank achieves better scalability because its data size is large enough to saturate the GPUs. We observe scalability up to a hundred GPUs for PageRank, while running almost 10 times faster than the CPU cluster implementation with the same number of CPU cores.
- 2010-12-09
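The abstract treats PageRank as an iterative sparse method. As a rough illustration of what one such iteration involves, the following is a minimal single-process Python sketch of the standard power iteration, not the authors' GPU cluster implementation. The function name `pagerank`, the adjacency-list input `out_links`, the damping factor 0.85, the tolerance, and the 4-page example graph are all assumptions made for illustration; none of them come from the paper.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a small directed graph.

    out_links[i] lists the pages that page i links to (hypothetical input
    format chosen for this sketch).
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if not out_links[i])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = [base] * n
        # Sparse matrix-vector product: each page pushes rank along its links.
        for src, targets in enumerate(out_links):
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # The L1 change between successive iterates is the convergence test.
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical 4-page graph; page 3 has no out-links (a dangling page).
graph = [[1, 2], [2], [0], []]
ranks = pagerank(graph)
```

The inner loop over links is the sparse matrix-vector product that dominates each iteration. In a distributed setting, entries of the rank vector would have to be exchanged between processors at every iteration, which is where the load imbalance and network bottlenecks mentioned in the abstract come from.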
Authors
- Ali Cevahir, Rakuten Institute of Technology
- Cevdet Aykanat, Bilkent University
- Ata Turk, Bilkent University
- B. Barla Cambazoglu, Yahoo! Research
- Akira Nukada, Tokyo Institute of Technology
- Satoshi Matsuoka, Tokyo Institute of Technology
- Satoshi Matsuoka, National Institute of Informatics
Related papers
- MPI-CUDA Applications Checkpointing
- Efficient PageRank on GPU Clusters
- Low-overhead checkpoint for large-scale GPU-accelerated systems
- Web-site-based partitioning techniques for efficient parallelization of the PageRank computation (High Performance Computing)
- CG on GPU-enhanced Clusters
- Fast GPU Read Alignment with Burrows Wheeler Transform Based Index
- GPU-based approach for elastic-plastic deformation simulations
- Data Ownership Assurance in the Inter-Cloud supporting data dynamics
- Towards an Asynchronous Checkpointing System
- Towards Fast PGAS Implementation of Multithreaded Asynchronous Large-Scale Graph Traversal for Supercomputers with Local Semi-External Memory
- Towards a Dataflow FMM using the OmpSs Programming Model
- Avoiding silent data corruption in checkpoint files
- Burst SSD Buffer: Checkpoint Strategy at Extreme Scale
- Multi-level Temporal Blocking for Stencil Computation for Memory Hierarchy on TSUBAME2.5
- Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)
- Increasing GPU batch queue's utilization using rCUDA (Unrefereed Workshop Manuscript)
- Visualizing Collectives over InfiniBand Networks (Unrefereed Workshop Manuscript)