Deterministic Message Passing for Distributed Parallel Computing
Abstract
Nondeterminism in message-passing communication complicates program debugging, testing, and fault tolerance. This paper proposes a novel deterministic message-passing implementation (DMPI) for parallel programs in distributed environments. DMPI is compatible with the standard MPI user interface and guarantees the reproducibility of messages with high performance. The basic idea of DMPI is to use logical time to resolve message races and control asynchronous transmissions, thereby eliminating the nondeterministic behavior of existing message-passing mechanisms. A buffering strategy alleviates the slowdown caused by the mismatch between logical time and physical time. To avoid deadlocks introduced by the deterministic mechanisms, DMPI is also integrated with a lightweight deadlock checker that dynamically detects and resolves them. We have implemented DMPI and evaluated it using the NPB benchmarks. The results show that DMPI guarantees determinism while incurring modest runtime overhead (14% on average).
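The idea of resolving message races with logical time can be illustrated with a minimal sketch. This is not the DMPI implementation: the function name `deliver_in_logical_order`, the `(sender, logical_time, payload)` tuple layout, and the tie-break on sender rank are all assumptions made for illustration. The sketch buffers messages that arrive in an arbitrary physical order and hands them to the application in logical-time order, so every run observes the same sequence.

```python
import heapq
import random


def deliver_in_logical_order(arrivals):
    """Deliver buffered messages by (logical_time, sender), not by arrival order.

    `arrivals` is an iterable of (sender, logical_time, payload) tuples in
    whatever physical order the network produced. The tuple layout and the
    tie-break on sender rank are assumptions of this sketch.
    """
    buffered = []
    for sender, logical_time, payload in arrivals:
        # Buffer each message under a key that does not depend on arrival time.
        heapq.heappush(buffered, (logical_time, sender, payload))

    delivered = []
    while buffered:
        # Always release the message with the smallest logical timestamp.
        delivered.append(heapq.heappop(buffered))
    return delivered


# Hypothetical messages racing toward one wildcard receive.
messages = [(0, 3, "a"), (1, 1, "b"), (2, 3, "c"), (1, 5, "d")]

runs = []
for seed in range(5):
    arrivals = messages[:]
    random.Random(seed).shuffle(arrivals)  # simulate a different physical order
    runs.append(deliver_in_logical_order(arrivals))

# Every simulated run delivers the same sequence.
assert all(run == runs[0] for run in runs)
```

The sketch sorts a complete batch, which sidesteps the harder question a real runtime faces: deciding when it is safe to deliver a message because no message with a smaller logical timestamp can still arrive. That waiting is presumably where the mismatch between logical and physical time costs performance, and where deadlocks can arise.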
Authors
- ZHANG Kai, School of Computer, National University of Defense Technology
- ZHOU Xu, School of Computer, National University of Defense Technology
- LU Kai, School of Computer, National University of Defense Technology
- WANG Xiaoping, School of Computer, National University of Defense Technology
- ZHANG Wenzhe, School of Computer, National University of Defense Technology
- LI Xu, School of Computer, National University of Defense Technology
- LI Gen, School of Computer, National University of Defense Technology
Related papers
- CMRF: a Configurable Matrix Register File for accelerating matrix operations on SIMD processors
- Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors
- Deterministic Message Passing for Distributed Parallel Computing