MPI-CUDA Applications Checkpointing
Abstract
We describe a method for checkpointing MPI applications that use GPUs as accelerators. Current MPI checkpointing tools such as LAM/MPI and Open MPI do not support checkpointing GPU state, which is a significant obstacle for users who want to develop hybrid MPI-CUDA applications for large-scale clusters with high failure rates. We propose a method to checkpoint MPI-CUDA applications by integrating Open MPI, BLCR, and our CUDA checkpointer. The CUDA checkpointer hooks CUDA Runtime API calls to monitor and record the CUDA resources and data an application uses on the GPU during execution, and we integrate it into the BLCR checkpoint/restart module of Open MPI. At checkpoint time, BLCR invokes our user-defined callback function, which saves the GPU state. At restart, the CUDA checkpointer restores the data and CUDA contexts on the GPU in conjunction with Open MPI's restart service. Our implementation demonstrates that MPI-CUDA applications written with the CUDA Runtime API can be checkpointed and restarted correctly and transparently. Checkpointing a 3D stencil application with a problem size of 256x256x600 running on 60 GPU-enabled nodes incurs an overhead of about 38 seconds.
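The abstract describes the CUDA checkpointer only at a high level. As a rough illustration of the bookkeeping such a tool needs, the Python sketch below records device allocations made through wrapped calls, copies every live region to host memory at checkpoint time, and reallocates and refills the regions after restart. This is not the authors' code, and all class and method names (`FakeDevice`, `CudaCheckpointer`, `checkpoint`, `restore`) are hypothetical. Two assumptions are made explicit. First, GPU memory is simulated with `bytearray`s so the flow runs without CUDA; a real checkpointer would intercept `cudaMalloc`, `cudaFree`, and `cudaMemcpy` in C and use device-to-host copies. Second, the application holds virtual handles rather than raw device pointers, because a restarted process generally cannot expect `cudaMalloc` to return the same addresses as before.

```python
import itertools


class FakeDevice:
    """Simulated GPU memory (assumption) so the sketch runs without CUDA."""

    def __init__(self, first_addr):
        self._addrs = itertools.count(first_addr, 0x100000)  # fake device addresses
        self.mem = {}  # device address -> bytearray holding the "device" data

    def malloc(self, size):
        addr = next(self._addrs)
        self.mem[addr] = bytearray(size)
        return addr

    def free(self, addr):
        del self.mem[addr]


class CudaCheckpointer:
    """Hypothetical checkpointer: tracks allocations made through its wrappers."""

    def __init__(self, device):
        self.device = device
        self._handles = itertools.count(1)
        self.table = {}  # virtual handle -> (device address, size in bytes)

    # Stand-ins for intercepted cudaMalloc / cudaFree / cudaMemcpy calls.
    def malloc(self, size):
        handle = next(self._handles)
        self.table[handle] = (self.device.malloc(size), size)
        return handle

    def free(self, handle):
        addr, _ = self.table.pop(handle)
        self.device.free(addr)

    def write(self, handle, data, offset=0):
        addr, size = self.table[handle]
        if offset + len(data) > size:
            raise ValueError("write past end of allocation")
        self.device.mem[addr][offset:offset + len(data)] = data

    def read(self, handle):
        addr, _ = self.table[handle]
        return bytes(self.device.mem[addr])

    def checkpoint(self):
        """Copy every live allocation to host memory (the checkpoint callback's job)."""
        return {h: (size, bytes(self.device.mem[addr]))
                for h, (addr, size) in self.table.items()}

    def restore(self, image, device):
        """After restart: reallocate on a fresh device and copy the saved bytes back."""
        self.device = device
        self.table = {}
        for h, (size, data) in image.items():
            addr = device.malloc(size)
            device.mem[addr][:] = data
            self.table[h] = (addr, size)
        # Continue numbering after the largest restored handle to avoid reuse.
        self._handles = itertools.count(max(image, default=0) + 1)
```

For example, after `a = ckpt.malloc(8)`, `ckpt.write(a, b"stencil!")`, and `image = ckpt.checkpoint()`, calling `ckpt.restore(image, FakeDevice(0x20000000))` makes `ckpt.read(a)` return `b"stencil!"` again, even though the data now lives at a different simulated device address. Keying the table by virtual handles is what lets the restored process keep using the same handles; whether the authors' checkpointer virtualizes device pointers this way is not stated in the abstract.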
- 2010-07-27
Authors
-
Satoshi Matsuoka
Tokyo Institute of Technology; National Institute of Informatics; Japan Science and Technology Agency
-
Naoya Maruyama
Tokyo Institute of Technology
-
Nguyen Toan
Tokyo Institute of Technology
-
Tatsuo Nomura
Tokyo Institute of Technology
-
Hideyuki Jitsumoto
University of Tokyo
-
Toshio Endo
Tokyo Institute of Technology
Related papers
- MPI-CUDA Applications Checkpointing
- Efficient PageRank on GPU Clusters
- Low-overhead checkpoint for large-scale GPU-accelerated systems
- Techniques for improving Linpack performance on heterogeneous supercomputers with different types of accelerators
- Web-site-based partitioning techniques for efficient parallelization of the PageRank computation (High Performance Computing)
- CG on GPU-enhanced Clusters
- Fast GPU Read Alignment with Burrows Wheeler Transform Based Index
- GPU-based approach for elastic-plastic deformation simulations
- Data Ownership Assurance in the Inter-Cloud supporting data dynamics
- Towards an Asynchronous Checkpointing System
- Towards Fast PGAS Implementation of Multithreaded Asynchronous Large-Scale Graph Traversal for Supercomputers with Local Semi-External Memory
- Towards a Dataflow FMM using the OmpSs Programming Model
- Avoiding silent data corruption in checkpoint files
- Burst SSD Buffer: Checkpoint Strategy at Extreme Scale
- Multi-level Temporal Blocking for Stencil Computation for Memory Hierarchy on TSUBAME2.5
- Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)
- Increasing GPU batch queue's utilization using rCUDA (Unrefereed Workshop Manuscript)
- Visualizing Collectives over InfiniBand Networks (Unrefereed Workshop Manuscript)