Low-overhead checkpoint for large-scale GPU-accelerated systems
Abstract
In HPC, applications are periodically checkpointed to stable storage to increase the success rate of long executions. Today, the overhead of remote-disk-based checkpointing is about 20% of the execution time, and in the coming years it will exceed 50% if checkpoint frequency rises along with fault frequency. Diskless checkpointing was introduced to avoid the I/O bottleneck of remote-disk-based checkpointing. However, its encoding time, its need for spare nodes, and its memory overhead are significant obstacles to its adoption. At the same time, heterogeneous computing is becoming increasingly popular in HPC, with new clusters combining CPUs and GPUs. In this work, we propose a way to checkpoint GPU applications that avoids the I/O bottleneck by writing checkpoints to SSDs in the compute nodes, which significantly improves checkpoint performance and avoids the memory overhead of classic diskless checkpointing. Our technique does not require spare nodes and can tolerate the failure of up to 50% of the processes with low checkpoint overhead. We plan to evaluate our technique and present its first results on TSUBAME 2.0.
- 2010-12-09
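The abstract does not state how checkpoint data is encoded. One common way to tolerate the loss of half the processes without spare nodes is a systematic erasure code: each of k nodes keeps its own data block plus one parity block, and any k of the 2k blocks are enough to rebuild the data. The minimal Python sketch below illustrates that idea with a Reed-Solomon-style code over the prime field GF(2^31 - 1) using Lagrange interpolation. The names `encode` and `decode`, the block-to-node placement, and the one-integer-per-block simplification are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: k-of-2k erasure code (Reed-Solomon style) over GF(P).
# Assumption: node i stores data block i and parity block k + i, so losing
# any half of the nodes still leaves k of the 2k blocks.
P = 2**31 - 1  # Mersenne prime; all arithmetic is done modulo P


def _lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data):
    """Return 2k blocks: the k data values (at x = 1..k) followed by k parity values (at x = k+1..2k)."""
    k = len(data)
    pts = [(i + 1, d % P) for i, d in enumerate(data)]
    parity = [_lagrange_eval(pts, x) for x in range(k + 1, 2 * k + 1)]
    return [d % P for d in data] + parity


def decode(survivors, k):
    """Rebuild the k data values from at least k surviving (block_index, value) pairs."""
    if len(survivors) < k:
        raise ValueError("too many losses: need at least k surviving blocks")
    pts = [(idx + 1, val) for idx, val in survivors[:k]]
    return [_lagrange_eval(pts, x) for x in range(1, k + 1)]


if __name__ == "__main__":
    data = [11, 22, 33, 44]          # one checkpoint word per node (k = 4)
    k = len(data)
    blocks = encode(data)
    failed = {0, 2}                  # half of the nodes fail
    survivors = [(i, blocks[i]) for i in range(2 * k) if i % k not in failed]
    print(decode(survivors, k) == data)
```

As a quick check, `encode([1, 2, 3])` returns `[1, 2, 3, 4, 5, 6]`, because the polynomial through (1, 1), (2, 2), (3, 3) is y = x. Production checkpoint libraries typically encode byte buffers over GF(2^8) with table-driven arithmetic; the prime field here only keeps the arithmetic readable.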
Authors
- Akira Nukada (Tokyo Institute of Technology)
- Satoshi Matsuoka (Tokyo Institute of Technology; National Institute of Informatics; Japan Science and Technology Agency)
- Leonardo Bautista Gomez (Tokyo Institute of Technology)
- Naoya Maruyama (Tokyo Institute of Technology)
- Franck Cappello (INRIA)
Related papers
- MPI-CUDA Applications Checkpointing
- Efficient PageRank on GPU Clusters
- Web-site-based partitioning techniques for efficient parallelization of the PageRank computation (High Performance Computing)
- CG on GPU-enhanced Clusters
- Fast GPU Read Alignment with Burrows-Wheeler Transform Based Index
- GPU-based approach for elastic-plastic deformation simulations
- Data Ownership Assurance in the Inter-Cloud supporting data dynamics
- Towards an Asynchronous Checkpointing System
- Towards Fast PGAS Implementation of Multithreaded Asynchronous Large-Scale Graph Traversal for Supercomputers with Local Semi-External Memory
- Towards a Dataflow FMM using the OmpSs Programming Model
- Avoiding silent data corruption in checkpoint files
- Burst SSD Buffer: Checkpoint Strategy at Extreme Scale
- Multi-level Temporal Blocking for Stencil Computation for Memory Hierarchy on TSUBAME2.5
- Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)
- Increasing GPU batch queue's utilization using rCUDA (Unrefereed Workshop Manuscript)
- Visualizing Collectives over InfiniBand Networks (Unrefereed Workshop Manuscript)