Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems
スポンサーリンク
概要
- 論文の詳細を見る
In this paper, we propose a new scheduling algorithm to achieve fault tolerance in multiprocessor systems. This algorithm first partitions a parallel program into subsets of tasks, based on the notion of height of a task graph. For each subset, the algorithm then duplicates and schedules the tasks in the subset successively. We prove that schedules obtained by the proposed algorithm can tolerate a single processor failure and show that the computational complexity of the algorithm is O(|V|^4) where V is the set of nodes of a task graph. We conduct simulations by applying the algorithm to two kinds of practical task graphs (Gaussian elimination and LU-decomposition). The results of this experiment show that fault tolerance can be achieved at the cost of small degree of time redundancy, and that performance in the case of a processor failure is improved compared to a previous algorithm.
- 社団法人電子情報通信学会の論文
- 2002-03-01
著者
-
Tsuchiya Tatsuhiro
Graduate School Of Information Science And Technology Osaka University
-
Tsuchiya Tatsuhiro
Department Of Informatics And Mathematical Science Graduate School Of Engineering Science Osaka Univ
-
Tsuchiya Tatsuhiro
The Department Of Informatics And Mathematical Science Osaka University
-
KIKUNO Tohru
Department of Informatics and Mathematical Science, Graduate School of Engineering Science, Osaka Un
-
Kikuno Tohru
Department Of Informatics And Mathematical Science Graduate School Of Engineering Science Osaka Univ
-
HASIMOTO Koji
Hitachi Laboratory, Hitachi Ltd
-
Hasimoto Koji
Hitachi Laboratory Hitachi Ltd
-
Tsuchiya Tatsuhiro
Department Of Informatics And Mathematical Science Graduate School Of Engineering Science Osaka Univ
-
Hashimoto Koji
Hitachi Laboratory Hitachi Ltd.
関連論文
- Feature Interaction Verification Using Unbounded Model Checking with Interpolation
- Probabilistic Model Checking of the One-Dimensional Ising Model
- Feature Interaction Detection by Bounded Model Checking(Dependable Communication)(Dependable Computing)
- Exploiting Symmetric Relation for Efficient Feature Interaction Detection
- Three-Mode Failure Model for Reliability Analysis of Distributed Programs (Special Issue on Fault-Tolerant Computing)
- Verifying Fault Tolerance of Concurrent Systems by Model Checking(Special Section on Concurrent System Technology and Its Application to Multiple Agent Systems)
- A Hierarchical Approach to Dependability Evaluation of Distributed Systems with Replicated Resources
- Computing the Stabilization Times of SElf-Stabilizing Systems (Special Section on Concurrent Systems Technology)
- New Constructions for Nondominated k-Coteries
- Timed Reachability Analysis Method for Communication Protocols Modeled by Extended Finite State Machines (Special Issue on Multimedia Communication and Distributed Processing)
- New 2-Factor Covering Designs for Software Testing(Regular Section)
- A New Verification Framework of Object-Oriented Design Specification for Small Scale Software (Special Issue on Fault-Tolerant Computing)
- A BDD-based approach to reliability-optimal module allocation in networks (信頼性)
- SAT and SMT based model checking of concurrent systems (コンカレント工学)
- Parallelizing SDP(Sum of Disjoint Products)Algorithms for Fast Reliability Analysis
- An Energy-Efficient Broadcast Scheme for Multihop Wireless Ad Hoc Networks Using Variable-Range Transmission Power(Networks)
- Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems
- Error Models and Fault-Secure Scheduling in Multiprocessor Systems
- Experimental Evaluation of Team Performance in Program Development Based on a Model : Extension of a Programmer Performance Model
- Constructing Overlay Networks with Short Paths and Low Communication Cost
- On Desirable Fault-Tolerant Topology for Cluster-Based Network (Special Section on Net Theory and Its Applications)