Int J Performability Eng, 2007, Vol. 3, Issue 4: 441-451. doi: 10.23940/ijpe.07.4.p441.mag

Original articles

Reliability Analysis of Fault Tolerant Systems with Multi-Fault Coverage

GREGORY LEVITIN (1) and SUPRASAD V. AMARI (2)

(1) Israel Electric Corporation Ltd., P.O. Box 10, Haifa, 31000 Israel
(2) Relex Software Corporation, 540 Pellis Road, Greensburg, PA 15601 USA

Abstract:

Fault tolerance has been an essential architectural attribute for achieving high reliability in many critical applications of digital systems. Automatic fault and error handling mechanisms play a crucial role in implementing fault tolerance, because an uncovered (undetected) fault may lead to a system or subsystem failure even when adequate redundancy exists. Examples of this effect can be found in computing systems, electrical power distribution networks, and pipelines carrying dangerous materials. Because an uncovered fault may lead to overall system failure, an excessive level of redundancy may even reduce system reliability. An accurate analysis must therefore account not only for the system structure but also for the system's fault and error handling behavior (often called coverage behavior). The appropriate coverage modeling approach depends on the type of fault-tolerant techniques used. The recent research literature emphasizes the importance of multi-fault coverage models, in which the effectiveness of recovery mechanisms depends on the coexistence of multiple faults in a group of elements that collectively participate in detecting and recovering from the faults in that group; such groups are called fault-level coverage (FLC) groups. However, methods for solving multi-fault coverage models are limited, primarily because of the complex dependencies introduced by the reconfiguration mechanisms. The paper suggests a modification of the generalized reliability block diagram (RBD) method for evaluating reliability indices of systems with multi-fault coverage. The suggested method, based on a universal generating function technique, computes the reliability indices of complex systems with multi-fault coverage using a straightforward recursive procedure. The proposed algorithm can readily be applied when the FLC groups have a hierarchical structure. Illustrative examples are presented.
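The claim that excessive redundancy can reduce reliability can be illustrated with a minimal sketch. It is not the paper's universal generating function method and does not model multi-fault coverage; it assumes the simpler, well-known element-level coverage model for a parallel system, in which each of n identical, independent elements works with probability p, and a failed element is covered (safely isolated) with probability c. A single uncovered failure is assumed to bring down the whole system. The function name `parallel_reliability` and the numeric values p = 0.9 and c = 0.9 are illustrative assumptions, not taken from the paper.

```python
def parallel_reliability(n, p, c):
    """Reliability of n parallel elements under element-level coverage.

    Assumed model (illustrative, not the paper's method):
      p : probability that an element works
      c : probability that an element failure is covered
    The system works if no failure is uncovered and at least one
    element still works:
      R(n) = (p + q*c)**n - (q*c)**n,  with q = 1 - p.
    """
    q = 1.0 - p
    no_uncovered_failure = (p + q * c) ** n  # every element works or fails covered
    all_failed_covered = (q * c) ** n        # every element failed, all covered
    return no_uncovered_failure - all_failed_covered


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the trend.
    for n in range(1, 7):
        print(n, round(parallel_reliability(n, p=0.9, c=0.9), 6))
```

Under these assumed values the reliability rises from one element to two and then declines as further elements are added, because each extra element adds another chance of an uncovered failure. With perfect coverage (c = 1) the formula reduces to the usual parallel-system expression 1 - q^n, which increases with n.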
Received on November 26, 2006
References: 24