Increasing GPU batch queue's utilization using rCUDA (Unrefereed Workshop Manuscript)
Abstract
In heterogeneous supercomputers, a GPU job queue whose nodes each contain multiple GPUs can be under-utilized because of resource-assignment fragmentation. For example, when each node has three GPUs, as on TSUBAME2.5, a node already assigned to a job requesting two GPUs cannot be assigned to another job requesting more than one GPU until the current job leaves the node. We examine this problem on TSUBAME2.5's GPU batch-queue system and present a scheduling algorithm that uses rCUDA to alleviate it. Our simulation shows that the proposed scheduling algorithm finishes all simulated jobs on a simulated congested queue 15% to 30% faster. Moreover, using job patterns obtained from the scheduler log of the TSUBAME GPU queue, the proposed algorithm shows a 5.06% decrease in average job lifetime (from arrival until processing finishes). It also shows that even when the number of nodes in the queue is reduced by around 4%, the average job lifetime remains about the same as with the present algorithm.
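The fragmentation case in the abstract can be illustrated with a minimal Python sketch. This is not the authors' scheduling algorithm: the function names `place_local` and `place_pooled` and the greedy pooling policy are hypothetical, and the only facts taken from the abstract are the three GPUs per node and the idea that rCUDA lets a job use GPUs on other nodes.

```python
GPUS_PER_NODE = 3  # per the abstract, each TSUBAME2.5 node has three GPUs


def place_local(free, request):
    """Node-local placement: return the index of the first node that has
    at least `request` free GPUs, or None if no single node can host the job."""
    for node, free_gpus in enumerate(free):
        if free_gpus >= request:
            return node
    return None


def place_pooled(free, request):
    """Hypothetical rCUDA-style placement (an assumption for illustration):
    gather free GPUs across nodes. Returns {node: gpus_taken} or None."""
    if sum(free) < request:
        return None
    taken = {}
    remaining = request
    # Visit nodes with the most free GPUs first so the job spans few nodes.
    for node in sorted(range(len(free)), key=lambda n: -free[n]):
        if remaining == 0:
            break
        take = min(free[node], remaining)
        if take > 0:
            taken[node] = take
            remaining -= take
    return taken


# Two nodes, each already running a 2-GPU job, so each has one GPU left.
free = [GPUS_PER_NODE - 2, GPUS_PER_NODE - 2]
print(place_local(free, 2))   # no single node has two free GPUs
print(place_pooled(free, 2))  # one GPU is taken from each node
```

In this toy setting a new 2-GPU job has to wait under node-local placement even though two GPUs are idle, whereas pooled placement can start it immediately. The sketch ignores the network cost of remote GPU access, which a real scheduler would have to weigh.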
- 2014-07-21
Authors
-
Satoshi Matsuoka
Tokyo Institute of Technology
-
Akihiro Nomura
Tokyo Institute of Technology
-
Pak Markthub
Tokyo Institute of Technology
Related papers
- MPI-CUDA Applications Checkpointing
- Efficient PageRank on GPU Clusters
- Low-overhead checkpoint for large-scale GPU-accelerated systems
- Web-site-based partitioning techniques for efficient parallelization of the PageRank computation (High Performance Computing)
- CG on GPU-enhanced Clusters
- Fast GPU Read Alignment with Burrows Wheeler Transform Based Index
- GPU-based approach for elastic-plastic deformation simulations
- Data Ownership Assurance in the Inter-Cloud supporting data dynamics
- Towards an Asynchronous Checkpointing System
- Towards Fast PGAS Implementation of Multithreaded Asynchronous Large-Scale Graph Traversal for Supercomputers with Local Semi-External Memory
- Avoiding silent data corruption in checkpoint files
- Burst SSD Buffer: Checkpoint Strategy at Extreme Scale
- Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)
- Visualizing Collectives over InfiniBand Networks (Unrefereed Workshop Manuscript)