Towards Fast PGAS Implementation of Multithreaded Asynchronous Large-Scale Graph Traversal for Supercomputers with Local Semi-External Memory
Abstract
Large-scale graph processing is becoming increasingly important, but graphs larger than volatile memory are difficult to process. A single-node graph traversal algorithm has been proposed that uses flash-based storage such as SSDs as external memory and applies a multithreaded asynchronous method to hide SSD access latency. To handle truly large graphs, we also need to utilize large-scale supercomputers. Using a PGAS language should make such an algorithm easy to design; however, the implementation and the optimization techniques (e.g., graph data distribution) are not obvious. This work focuses on the design and implementation of the same algorithm on a multi-node supercomputer using X10, a high-productivity PGAS language. So far, we have implemented a similar algorithm in X10 on a single node. Our experiment shows that the X10 implementation scales about 9 times on a single 12-core compute node.
- 2012-03-19
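The multithreaded asynchronous traversal mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' X10 implementation: the function name `async_bfs`, the helper `fetch_neighbors`, and the in-memory dictionary standing in for SSD-resident adjacency lists are all assumptions made for illustration. The sketch shows only the general idea of a label-correcting breadth-first search in which worker threads proceed without a per-level barrier, so that one thread's slow storage read need not stall the others.

```python
import queue
import threading


def async_bfs(adjacency, source, num_threads=4):
    """Label-correcting BFS without per-level barriers (illustrative sketch)."""
    level = {source: 0}          # best known distance for each reached vertex
    lock = threading.Lock()      # protects `level`
    work = queue.Queue()         # shared queue of (vertex, tentative level)
    work.put((source, 0))

    def fetch_neighbors(v):
        # Hypothetical stand-in for reading an adjacency list from
        # semi-external storage such as an SSD.
        return adjacency.get(v, [])

    def worker():
        while True:
            item = work.get()
            if item is None:     # sentinel: shut this worker down
                work.task_done()
                return
            v, d = item
            try:
                with lock:
                    stale = level.get(v, float("inf")) < d
                if stale:
                    continue     # a shorter path to v was already queued
                for w in fetch_neighbors(v):
                    with lock:
                        improved = d + 1 < level.get(w, float("inf"))
                        if improved:
                            level[w] = d + 1
                    if improved:
                        work.put((w, d + 1))
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    work.join()                  # returns once every queued visit is processed
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return level
```

Because a vertex may be revisited whenever a shorter path is found, the final levels do not depend on the order in which threads pick up work. In a distributed PGAS setting the shared queue and the `level` map would have to be partitioned across nodes, which is the graph data distribution question the abstract leaves open.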
Authors
-
Satoshi Matsuoka
Tokyo Institute of Technology
-
Satoshi Matsuoka
Tokyo Institute of Technology|National Institute of Informatics|Japan Science and Technology Agency
-
Satoshi Matsuoka
National Inst. of Informatics
-
Hitoshi Sato
Tokyo Institute of Technology|JST CREST
-
Jiayeu Zhang
Tokyo Institute of Technology|JST CREST
Related papers
- MPI-CUDA Applications Checkpointing
- Efficient PageRank on GPU Clusters
- Low-overhead checkpoint for large-scale GPU-accelerated systems
- Web-site-based partitioning techniques for efficient parallelization of the PageRank computation (High Performance Computing)
- CG on GPU-enhanced Clusters
- Fast GPU Read Alignment with Burrows Wheeler Transform Based Index
- GPU-based approach for elastic-plastic deformation simulations
- Data Ownership Assurance in the Inter-Cloud supporting data dynamics
- Towards an Asynchronous Checkpointing System
- Towards Fast PGAS Implementation of Multithreaded Asynchronous Large-Scale Graph Traversal for Supercomputers with Local Semi-External Memory
- Towards a Dataflow FMM using the OmpSs Programming Model
- Avoiding silent data corruption in checkpoint files
- Burst SSD Buffer: Checkpoint Strategy at Extreme Scale
- Multi-level Temporal Blocking for Stencil Computation for Memory Hierarchy on TSUBAME2.5
- Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)
- Increasing GPU batch queue's utilization using rCUDA (Unrefereed Workshop Manuscript)
- Visualizing Collectives over InfiniBand Networks (Unrefereed Workshop Manuscript)