Controller/Precompiler for Portable Checkpointing (Parallel/Distributed Programming Models, Paradigms and Tools, <Special Section> Parallel/Distributed Computing and Networking)
Abstract
This paper presents CPPC (Controller/Precompiler for Portable Checkpointing), a checkpointing tool designed for heterogeneous clusters and Grid infrastructures through the use of portable protocols, portable checkpoint files and portable code. It works at the variable level and is user-directed, and therefore generates small checkpoint files. It allows parallel processes to checkpoint independently, without runtime coordination or message logging. Consistency is achieved at restart time by negotiating the restart point. A directive-based checkpointing precompiler has also been implemented to reduce the user's effort. CPPC was designed to work with parallel MPI programs, though it can also be used with sequential ones, and, thanks to its highly modular design, it can easily be extended to parallel programs written with other message-passing libraries. Experimental results obtained using CPPC with different test applications are presented.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2006-02-01
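
The abstract states that processes checkpoint independently and that consistency is reached at restart time by negotiating the restart point. The following is a minimal sketch of one way such a negotiation could work, not CPPC's actual protocol. It assumes that checkpoints are numbered, that the same number denotes the same point in the code on every process, and that a consistent restart point is the newest checkpoint held by all processes. The function name `negotiate_restart_point` and the input layout are hypothetical.

```python
from typing import Dict, Iterable, Optional


def negotiate_restart_point(available: Dict[int, Iterable[int]]) -> Optional[int]:
    """Return the newest checkpoint id present on every process, or None.

    `available` maps a process rank to the checkpoint ids that rank has
    stored (hypothetical layout, chosen for illustration only).
    """
    common = None
    for ids in available.values():
        ids = set(ids)
        # Keep only the checkpoint ids that every process seen so far holds.
        common = ids if common is None else common & ids
    # No processes, or no checkpoint shared by all of them: nothing to restart from.
    return max(common) if common else None


# Three processes that checkpointed independently and stopped at different points.
checkpoints = {0: [1, 2, 3], 1: [1, 2], 2: [1, 2, 3, 4]}
print(negotiate_restart_point(checkpoints))  # 2
```

In an actual message-passing program the same agreement would presumably be reached with a collective operation across the processes rather than with a dictionary gathered in one place; the sketch only illustrates the selection rule.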
Authors
-
MARTIN Maria
Computer Architecture Group, Department of Electronics and Systems, University of A Coruna
-
RODRIGUEZ Gabriel
Computer Architecture Group, Department of Electronics and Systems, University of A Coruna
-
GONZALEZ Patricia
Computer Architecture Group, Department of Electronics and Systems, University of A Coruna
-
TOURINO Juan
Computer Architecture Group, Department of Electronics and Systems, University of A Coruna
Related papers
- Controller/Precompiler for Portable Checkpointing (Parallel/Distributed Programming Models, Paradigms and Tools, Parallel/Distributed Computing and Networking)
- A Grid Portal to Support High-Performance Scientific Computing on Distributed Resources (Distributed, Grid and P2P Computing) (Hardware/Software Support for High Performance Scientific and Engineering Computing)