Int J Performability Eng, 2018, Vol. 14, Issue 7: 1487-1492. doi: 10.23940/ijpe.18.07.p12.14871492

• Original articles •

Reliability Evaluation of a Parallel Job with Real-Time Redundant Computing

Xiwei Qiu, Liang Luo, Sa Meng, and Han Xu   

  1. University of Electronic Science and Technology of China, Chengdu, 611731, China

Abstract:

In network systems, a job with a large workload is usually divided into multiple tasks to achieve parallel computing. However, if any task fails because of a random task failure or a server failure, the entire job cannot be completed. In traditional redundant computing, copies of tasks are initiated when the tasks are found to have failed, which is inefficient from a performance perspective. Real-time parallel computing, in which tasks and their copies run simultaneously, is more efficient. In this paper, to evaluate the reliability of such a job, we first describe a complicated parallel and redundant computing environment as multiple minimal-job spanning trees (MJSTs) consisting of tasks and servers. Then, we design two operators on MJSTs to systematically analyze the complicated failure correlations among multiple MJSTs. Finally, an algorithm based on Bayes' theorem is presented to evaluate the reliability of a parallel job with real-time parallel computing. Illustrative examples are provided.
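The failure correlation described above can be illustrated with a small sketch. It is not the MJST-based algorithm of the paper: it is a brute-force computation that conditions on every server up/down state (law of total probability), under the assumptions that servers fail independently of each other, that copies fail independently given the server states, and that a task succeeds if at least one of its copies succeeds. The function name `job_reliability` and the input layout are hypothetical, chosen only for this example.

```python
from itertools import product


def job_reliability(server_rel, placements):
    """Probability that every task has at least one successful copy.

    server_rel: dict mapping server name -> probability the server stays up.
    placements: dict mapping task name -> list of (server, copy_rel) pairs,
                where copy_rel is the probability that the copy itself does
                not suffer a random task failure (hypothetical input layout).
    """
    servers = sorted(server_rel)
    total = 0.0
    # Enumerate every combination of server states (True = up, False = down).
    for state in product([True, False], repeat=len(servers)):
        up = dict(zip(servers, state))
        # Probability of this particular combination of server states.
        p_state = 1.0
        for s in servers:
            p_state *= server_rel[s] if up[s] else 1.0 - server_rel[s]
        # Given the server states, tasks are conditionally independent.
        p_job = 1.0
        for copies in placements.values():
            p_all_copies_fail = 1.0
            for server, copy_rel in copies:
                if up[server]:
                    p_all_copies_fail *= 1.0 - copy_rel
                # A copy on a down server fails with probability 1.
            p_job *= 1.0 - p_all_copies_fail
        total += p_state * p_job
    return total


# Task t1 has copies on servers A and B; task t2 has a single copy on A,
# so both tasks are correlated through server A.
example = job_reliability(
    {"A": 0.9, "B": 0.8},
    {"t1": [("A", 0.95), ("B", 0.95)], "t2": [("A", 0.9)]},
)
```

For this example the result is approximately 0.80028. The enumeration grows exponentially with the number of servers, which is why a structured approach such as the one proposed in the paper is needed for realistic systems.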


Submitted on April 14, 2018; Revised on May 23, 2018; Accepted on June 22, 2018
References: 11