Towards a Dataflow FMM using the OmpSs Programming Model
Abstract
This paper describes initial efforts towards the development of a dataflow implementation of the ExaFMM Fast Multipole Method code using the OmpSs programming model. We first develop several implementations based on task decomposition which overcome load balancing problems previously identified using traditional parallelization approaches. We then add dataflow extensions to improve task throughput by extracting distant parallelism and removing barriers. Execution profiles and scalability results for a single node of the Tsubame 2.0 supercomputer are then shown.
- Paper of the Information Processing Society of Japan
- 2012-09-26
Authors
-
Naoya Maruyama
Tokyo Institute of Technology
-
Satoshi Matsuoka
Tokyo Institute of Technology | National Institute of Informatics | Japan Science and Technology Agency
-
Satoshi Matsuoka
National Institute of Informatics
-
Satoshi Matsuoka
Global Scientific Information and Computing Center, Tokyo Institute of Technology
-
Keisuke Fukuda
Department of Mathematical and Computing Sciences, Tokyo Institute of Technology
-
Miquel Pericas
Global Scientific Information and Computing Center, Tokyo Institute of Technology
-
Abdelhalim Amer
Department of Mathematical and Computing Sciences, Tokyo Institute of Technology
-
Naoya Maruyama
Advanced Institute for Computational Science, Riken
-
Rio Yokota
King Abdullah University of Science and Technology
Related papers
- MPI-CUDA Applications Checkpointing
- Efficient PageRank on GPU Clusters
- Low-overhead checkpoint for large-scale GPU-accelerated systems
- Web-site-based partitioning techniques for efficient parallelization of the PageRank computation (High Performance Computing)
- CG on GPU-enhanced Clusters
- Fast GPU Read Alignment with Burrows Wheeler Transform Based Index
- GPU-based approach for elastic-plastic deformation simulations
- Data Ownership Assurance in the Inter-Cloud supporting data dynamics
- Towards an Asynchronous Checkpointing System
- Towards Fast PGAS Implementation of Multithreaded Asynchronous Large-Scale Graph Traversal for Supercomputers with Local Semi-External Memory
- Towards a Dataflow FMM using the OmpSs Programming Model
- Avoiding silent data corruption in checkpoint files
- Burst SSD Buffer: Checkpoint Strategy at Extreme Scale
- Multi-level Temporal Blocking for Stencil Computation for Memory Hierarchy on TSUBAME2.5