Performance Models for MPI Collective Communications with Network Contention
Abstract
The paper presents a novel approach to estimating the performance of MPI collective communications. Our objective is to help researchers make appropriate decisions about their message-passing applications. For each collective communication, we attempt to apply the standard point-to-point models LogGP and P-LogP. The resulting models are compared with empirical data in order to identify the one most suitable for performance characterization of collective operations. For communications on large clusters with large messages, network contention can significantly affect performance. Hence, to reduce the relative gap between the predicted and the measured run-time, contention is also modeled, using a queuing theory analysis method, and taken into account in the total performance estimate. Experiments performed on a cluster of 64 processors interconnected by a Gigabit Ethernet network show encouraging results. For any collective operation, given a number of processors and a range of message sizes, there is at least one model that predicts the performance accurately. We achieved a gap between the predicted and the measured run-time of around 15%. Thus, by handling the contention problem, we reduced the relative gap by around 80%.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-04-01
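The abstract names LogGP as one of the point-to-point models applied to each collective operation, but it does not give the resulting formulas. The following is a minimal sketch of the textbook LogGP cost of a single message, extended to a binomial-tree broadcast as one common illustration; it is not the paper's own model. The function names, the choice of a binomial tree, and all parameter values are assumptions made here for illustration, and the queuing-based contention term mentioned in the abstract is deliberately left out because the abstract does not specify it.

```python
def loggp_p2p_time(k, L, o, G):
    """Textbook LogGP estimate for one k-byte point-to-point message.

    L: network latency, o: per-message CPU overhead (paid by sender and
    receiver), G: gap per byte for long messages. All in the same time unit.
    """
    if k < 1:
        raise ValueError("message size must be at least 1 byte")
    return L + 2 * o + (k - 1) * G


def binomial_bcast_time(P, k, L, o, G):
    """Assumed model: binomial-tree broadcast among P processes.

    The tree needs ceil(log2(P)) rounds; each round is charged one full
    point-to-point message. Contention is NOT modeled here.
    """
    if P < 1:
        raise ValueError("need at least one process")
    rounds = (P - 1).bit_length()  # equals ceil(log2(P)) for P >= 1
    return rounds * loggp_p2p_time(k, L, o, G)


def relative_gap(predicted, measured):
    """Relative gap between a predicted and a measured run-time."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    # Hypothetical parameters in microseconds, not taken from the paper.
    L, o, G = 50.0, 5.0, 0.008
    predicted = binomial_bcast_time(P=64, k=1_000_000, L=L, o=o, G=G)
    print(f"predicted broadcast time: {predicted:.1f} us")
```

A prediction obtained this way can be compared against a measured run-time with `relative_gap`, which is the quantity the abstract reports as being around 15%.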
Authors
- Murakami Kazuaki, Department of Electronics Engineering and Computer Science, Fukuoka University
- Murakami Kazuaki, Department of Informatics, Kyushu University
- Nzigou Mamadou, Department of Informatics, Kyushu University
- Nanri Takeshi, Computing and Communications Center, Kyushu University
- Nanri Takeshi, Computer Center
- Murakami Kazuaki, Department of Computer Science and Communication Engineering, Kyushu University
Related papers
- Quantitative Evaluation of State-Preserving Leakage Reduction Algorithm for L1 Data Caches
- Improving Performance and Energy Saving in a Reconfigurable Processor via Accelerating Control Data Flow Graphs
- Character Projection Mask Set Optimization for Enhancing Throughput of MCC Projection Systems
- Reliable Cache Architectures and Task Scheduling for Multiprocessor Systems
- Architectural-Level Soft-Error Modeling for Estimating Reliability of Computer Systems (VLSI Design Technology, VLSI Technology toward Frontiers of New Market)
- A Reconfigurable Functional Unit with Conditional Execution for Multi-Exit Custom Instructions
- Temperature-Aware Configurable Cache to Reduce Energy in Embedded Systems
- The potential of temperature-aware configurable cache on energy reduction (Computer Architecture)
- The potential of temperature-aware configurable cache on energy reduction (Integrated Circuits)
- Custom Instructions with Multiple Exits : Generation and Execution
- A Reconfigurable Functional Unit for Adaptable Custom Instructions
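- A Reconfigurable Functional Unit for Adaptable Custom Instructions (Processors, Parallel Processing, System LSI Architecture, and General Topics, toward Cooperation and Fusion of Integrated Circuit Technology and Architecture Technology)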
- An Adaptive Dynamic Extensible Processor
- Performance Models for MPI Collective Communications with Network Contention
- Instruction Encoding for Reducing Power Consumption of I-ROMs Based on Execution Locality
- Trends in High-Performance, Low-Power Cache Memory Architectures
- Omitting Cache Look-up for High-Performance, Low-Power Microprocessors(Special Issue on High-Performance and Low-Power Microprocessors)
- Portability in Implementing Distributed Shared Memory System on the Workstation Cluster Environment
- Hyperscalar Processor Architecture and the Preliminary Performance Evaluation
- PPRAM (Parallel Processing RAM) : A Merged-DRAM/Logic System-LSI Architecture
- A Message-Pool-Based Parallel Operating System for the Kyushu University Reconfigurable Parallel Processor : Parallel Creation of Multiple Threads
- Optimisations Techniques for the Automatic ISA Customisation Algorithm