Evaluation of Checkpointing Mechanism on Score Cluster System (Dependable Software) (Special Issue: Dependable Computing)
Abstract
Cluster systems are becoming widely used because of their good performance/cost ratio. However, their reliability in practical environments has not been well discussed so far. As the number of commodity components in a cluster system increases, it becomes indispensable for system software to support reliability. Score cluster system software is a parallel programming environment for High Performance Computing (HPC). Score provides a checkpointing and rollback-recovery mechanism for high availability. In this paper, we analyze and quantitatively evaluate the checkpointing and rollback-recovery mechanisms of Score. The experimental results reveal that the time required for checkpointing scales very well with respect to the number of computing nodes. However, the required time is quite long because of the low effective network bandwidth. Based on these results, we modify Score and successfully make checkpointing 1.8 to 2.8 times faster and recovery 3.7 to 5.0 times faster. This is very helpful for cluster systems to achieve both high performance and high availability.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2003-12-01
Authors
-
IMAI Masashi
Research Center for Advanced Science and Technology, the University of Tokyo
-
KONDO Masaaki
Research Center for Advanced Science and Technology, the University of Tokyo
-
NAKAMURA Hiroshi
Research Center for Advanced Science and Technology, the University of Tokyo
-
NANYA Takashi
Research Center for Advanced Science and Technology, the University of Tokyo
-
Hori Atsushi
Swimmy Software Inc.
-
KONDO Masaaki
CREST, JST (Japan Science and Technology Agency)
-
HAYASHIDA Takuro
Research Center for Advanced Science and Technology, The University of Tokyo
-
Nakamura Hiroshi
Research And Development Division Technical Research Laboratory Kawasaki Dockyard Co. Ltd.
Related papers
- Design Method of High Performance and Low Power Functional Units Considering Delay Variations(Circuit Synthesis,VLSI Design and CAD Algorithms)
- A Cascade ALU Architecture for Asynchronous Super-Scalar Processors (Special Issue on Low-Power High-Performance VLSI Processors and Technologies)
- Improvement of the Uniformity of Tungsten / Carbon Multilayers by Thermal Processing
- Normal-Incidence X-Ray Microscope for Carbon Kα Radiation with 0.5 μm Resolution
- Mobile Service Control Point for Intelligent and Multimedia Mobile Communications (Special Issue on Mobile Multimedia Communications)
- Influence of Annealing Method on Microscopic One-to-One Correlation between Threshold Voltage of GaAs MESFET and Dislocation
- The Dependence of Threshold Voltage Scattering of GaAs MESFET on Annealing Method
- Improved Threshold Voltage Uniformity in GaAs MESFET Using High Purity MOCVD-Grown Buffer Layer as a Substrate for Ion Implantation
- Synthesis of Serial Local Clock Controllers for Asynchronous Circuit Design(IP Design)(VLSI Design and CAD Algorithms)
- Synthesis of Serial Local Clock Controllers for Asynchronous Circuit Design
- Crystal Growth of CuGaS_2 from Te, Te-Cu and Te-Cu-S Solutions
- Verification and Violation Correction of Timing Constraints for Gate-Level Asynchronous Circuits (Special Issue: System LSI Design Technology and Design Automation)
- Verification of Timing Constraints for Fine-Grain Pipelined Asynchronous Data-Path Circuits (Design Gaia 2000) -- (VLSI Design/Verification/Test and General)
- 3E-3 Layout Methodology for SDI Model Asynchronous Circuits
- Effects of an Oral Administration of Glucosamine-Chondroitin-Quercetin Glucoside on the Synovial Fluid Properties in Patients with Osteoarthritis and Rheumatoid Arthritis
- Evaluation of Checkpointing Mechanism on Score Cluster System(Dependable Software)(Dependable Computing)
- On the Rotary Bending Fatigue Strength of Induction Hardened Crankshaft
- A database replication middleware with fine-grained concurrency control (Database Systems)
- Synthesis of Asynchronous Circuits from Signal Transition Graph Specifications (Special Issue on Asynchronous Circuit and System Design)
- On Concurrent Error Detection of Asynchronous Circuits Using Mixed-Signal Approach (Special Issue on Asynchronous Circuit and System Design)
- Performance Comparison of Synchronous and Asynchronous VLSI Systems
- Fluorescence Chemosensor with Specific Response for Mg^
- Synthesis Algorithm for Asynchronous Circuits from STG specifications
- Finding Unique PCR Products on Distributed Databases
- Tolerating Interaction Faults Originated From External Systems
- Special Issue on Asynchronous Circuit and System Design
- Logic Optimization of Asynchronous Speed-Independent Circuits Using Transduction Methods (Special Issue: System LSI Design Technology and Design Automation)
- Special Issue on Fault-Tolerant Computing