Int J Performability Eng ›› 2019, Vol. 15 ›› Issue (11): 3008-3015. doi: 10.23940/ijpe.19.11.p20.30083015

Survivability of Distributed Fault Detection Systems

Lijun Zhou*, Haiyan Lv, Kai Liu, and Jie Zhang   

  1. Naval Aviation University, Yantai, 264001, China
  • Contact: * E-mail address: jungle730@163.com

Abstract: A basic requirement in the design of a distributed fault detection system is that it monitor every entity in the distributed network computing system, so that the fault detection function achieves full coverage. Such distributed computing systems often involve many nodes, a large geographical span, unstable communication delays, and loose management, which makes full functional coverage very difficult. To address this problem, and building on the idea of self-organizing networks and on the realization of coverage monitoring of system nodes, this paper studies how highly dynamic nodes affect the survivability of the monitoring function and proposes a set of detection and repair methods for the cut vertices of the system, which reduce the impact of highly dynamic nodes on system monitoring.

Key words: survivability, distributed system, fault detection
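
As an illustration of what detecting cut vertices involves, the following minimal sketch applies the classic depth-first-search articulation point algorithm to a centralized copy of a monitoring topology. It is not the detection method proposed in the paper, which is not described in the abstract. The function name `find_cut_vertices` and the sample topology are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_cut_vertices(edges):
    """Return the set of cut vertices (articulation points) of an undirected graph.

    `edges` is an iterable of (u, v) pairs; this is an illustrative input
    format, not one taken from the paper.
    """
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    disc = {}    # DFS discovery time of each node
    low = {}     # lowest discovery time reachable from the node's DFS subtree
    cut = set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        disc[node] = low[node] = counter
        counter += 1
        children = 0
        for nb in graph[node]:
            if nb == parent:
                continue
            if nb in disc:
                # Back edge: the subtree can reach an earlier node.
                low[node] = min(low[node], disc[nb])
            else:
                children += 1
                dfs(nb, node)
                low[node] = min(low[node], low[nb])
                # A non-root node is a cut vertex if some child subtree
                # cannot reach any node discovered before it.
                if parent is not None and low[nb] >= disc[node]:
                    cut.add(node)
        # The DFS root is a cut vertex only if it has several DFS children.
        if parent is None and children > 1:
            cut.add(node)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return cut


# Hypothetical topology: a triangle A-B-C with a chain C-D-E attached.
topology = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
print(sorted(find_cut_vertices(topology)))  # ['C', 'D']
```

In this example, losing node C or node D would split the monitoring topology, so these are the nodes a repair step would need to protect, for instance by adding a redundant link. The recursive form is suitable only for small graphs. A large or highly dynamic system would likely need an iterative or distributed variant.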