Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)
Abstract
Hierarchical algorithms are considered important for next-generation large-scale scientific computing. Such algorithms are typically compute-intensive and have high communication locality, both of which are beneficial on future supercomputers with much lower bytes-per-flop (B/F) ratios. However, one of the major challenges of such algorithms is that their data structures and computation/communication patterns are irregular, which makes their performance difficult to analyze and predict. In this paper, we introduce a performance modeling method for the Fast Multipole Method, a typical example of hierarchical algorithms for N-body problems, using the domain-specific performance modeling language Aspen. We show that our modeling scheme can adapt to various particle distribution parameters and provides useful information that helps application researchers optimize algorithmic parameters.
- 2014-07-21
Authors
-
Satoshi Matsuoka
Tokyo Institute of Technology
-
Naoya Maruyama
RIKEN
-
Jeremy S. Meredith
Oak Ridge National Laboratory
-
Keisuke Fukuda
Tokyo Institute of Technology
-
Jeffrey S. Vetter
Oak Ridge National Laboratory
Related papers
- MPI-CUDA Applications Checkpointing
- Efficient PageRank on GPU Clusters
- Low-overhead checkpoint for large-scale GPU-accelerated systems
- Web-site-based partitioning techniques for efficient parallelization of the PageRank computation (High Performance Computing)
- CG on GPU-enhanced Clusters
- Fast GPU Read Alignment with Burrows Wheeler Transform Based Index
- GPU-based approach for elastic-plastic deformation simulations
- Data Ownership Assurance in the Inter-Cloud supporting data dynamics
- Towards an Asynchronous Checkpointing System
- Towards Fast PGAS Implementation of Multithreaded Asynchronous Large-Scale Graph Traversal for Supercomputers with Local Semi-External Memory
- Avoiding silent data corruption in checkpoint files
- Burst SSD Buffer: Checkpoint Strategy at Extreme Scale
- Performance modeling of a hierarchical N-body algorithm for arbitrary particle distribution (Unrefereed Workshop Manuscript)
- Increasing GPU batch queue's utilization using rCUDA (Unrefereed Workshop Manuscript)
- Visualizing Collectives over InfiniBand Networks (Unrefereed Workshop Manuscript)