Visualizing Collectives over InfiniBand Networks (Unrefereed Workshop Manuscript)
Abstract
As the scale of high performance computing systems increases, optimizing interprocess communication becomes more challenging while being critical for ensuring good performance. Furthermore, the hardware layer abstraction provided by MPI makes it difficult to perform any application optimization that links network utilization with application communication. We overcome this barrier by extending the Peruse utility in Open MPI to track network events within MPI operations from the application layer. We also develop a non-intrusive profiling library to make use of our Peruse enhancement and show how we can use BoxFish with our profiling library to visualize the flow of application traffic over each link within large scale InfiniBand networks. The tool-chain that we describe can be used without any modification to the target application and incurs less than 1% application runtime overhead.
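The abstract describes attributing the messages an MPI application sends to the individual InfiniBand links they cross, so that per-link traffic can be drawn in BoxFish. A minimal sketch of that aggregation step is below. It assumes a message trace of (source, destination, bytes) records and a known route for every node pair. The two-switch topology, `LEAF_OF`, `route`, and `link_loads` are hypothetical illustrations, not the authors' Peruse extension or profiling library. On a real fabric the routes would come from the subnet manager's forwarding tables rather than a hard-coded rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Link = Tuple[str, str]          # directed link: (from_vertex, to_vertex)
Message = Tuple[str, str, int]  # (src_node, dst_node, nbytes)

# Hypothetical fabric: each compute node hangs off one leaf switch, and the
# leaf switches are directly connected to each other.
LEAF_OF: Dict[str, str] = {"n0": "s1", "n1": "s1", "n2": "s2", "n3": "s2"}


def route(src: str, dst: str) -> List[str]:
    """Return the vertex sequence a message follows (hypothetical routing rule)."""
    a, b = LEAF_OF[src], LEAF_OF[dst]
    if a == b:
        return [src, a, dst]       # same leaf switch: one switch hop
    return [src, a, b, dst]        # different leaves: cross the inter-switch link


def link_loads(messages: Iterable[Message]) -> Dict[Link, int]:
    """Accumulate the total bytes carried by each directed link."""
    loads: Dict[Link, int] = defaultdict(int)
    for src, dst, nbytes in messages:
        if src == dst:
            continue  # a self-send never reaches the network
        path = route(src, dst)
        for link in zip(path, path[1:]):  # each pair of consecutive vertices is one link
            loads[link] += nbytes
    return dict(loads)


if __name__ == "__main__":
    # Two flows crossing between leaf switches, plus one flow local to s1.
    trace = [("n0", "n2", 4096), ("n1", "n3", 4096), ("n0", "n1", 512)]
    loads = link_loads(trace)
    # Print links from most to least loaded.
    for link, nbytes in sorted(loads.items(), key=lambda kv: -kv[1]):
        print(link, nbytes)
```

With this trace the inter-switch link `('s1', 's2')` carries 8192 bytes, twice the volume of either crossing flow, which is the kind of shared-link hotspot a per-link view makes visible and a per-process MPI profile hides.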
- Paper published by the Information Processing Society of Japan (IPSJ)
- 2014-07-21
Authors
- Satoshi Matsuoka (Tokyo Institute of Technology)
- Kevin A. Brown (Tokyo Institute of Technology)
- Jens Domke (Tokyo Institute of Technology)
Related papers
- MPI-CUDA Applications Checkpointing
- Efficient PageRank on GPU Clusters
- Low-overhead checkpoint for large-scale GPU-accelerated systems
- Web-site-based partitioning techniques for efficient parallelization of the PageRank computation (High Performance Computing)
- CG on GPU-enhanced Clusters
- Fast GPU Read Alignment with Burrows Wheeler Transform Based Index
- GPU-based approach for elastic-plastic deformation simulations
- Data Ownership Assurance in the Inter-Cloud supporting data dynamics
- Towards an Asynchronous Checkpointing System
- Towards Fast PGAS Implementation of Multithreaded Asynchronous Large-Scale Graph Traversal for Supercomputers with Local Semi-External Memory
- Avoiding silent data corruption in checkpoint files
- Burst SSD Buffer: Checkpoint Strategy at Extreme Scale
- Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)
- Increasing GPU batch queue's utilization using rCUDA (Unrefereed Workshop Manuscript)