Volume 7, No. 1
  • Editorial
    Krishna B. Misra
    2011, 7(1): 1.  doi:10.23940/ijpe.11.1.p1.mag

    In order to invigorate research and to encourage readers to contribute to new areas of importance in performability engineering, this journal has been bringing out special issues, or special sections of an issue, on topics of current interest or importance. In this first issue of the seventh year of IJPE's existence, we present to our readers the upcoming and promising area of data mining and its possible applications to reliability and risk. With this objective in view, we invited Professor Claudio Rocco of Venezuela, a member of the IJPE Editorial Board who has worked extensively in this area, to bring out a special section on data mining. He was kind enough to accept this responsibility and selected four papers, which were reviewed and revised for the section presented here. Data mining is an important area and has developed its own methodologies. It is hoped that this section will generate interest in this important area among our readership.

    The first paper of the special section, by Utkin and Coolen, describes an approach for obtaining reliability growth models in the framework of predictive learning, using Kolmogorov-Smirnov confidence limits and nonparametric inference. The largest and smallest risk measures are determined as functions of the regression parameters through a minimization process, finally yielding the pessimistic and optimistic reliability growth models. In the second paper, by Zhang and Ramirez-Marquez, an approximation to the minimal cut sets of a flow network is determined by defining an optimization problem and solving it with an evolutionary algorithm based on a data mining technique. Generally, for complex and large networks, obtaining an exact value of reliability may become prohibitive, and an approximation to the true reliability may suffice. Similarly, in the third paper, by Fuqing, Kumar and Misra, approximate system reliability is again obtained from an incomplete data set, using a Support Vector Machine and Monte Carlo simulation. In the fourth paper of the special section, by Baraldi, Compare, Zio, De Nigris and Rizzi, a technique based on the Adaboost algorithm, which has been effectively used for addressing classification problems, is proposed for identifying contradictory PD patterns within an a priori analysis aimed at improving diagnostic performance.

    I would like to thank Professor Rocco for the effort put in bringing out this special section and thank all the authors who contributed to this section.

    The next four papers of the present issue were contributed to the journal through the usual channel and represent a variety of interesting and important topics. For example, the fifth paper of this issue, by Kosmowski of Poland, considers human factors in the design of safety-related functions for a complex and hazardous installation and its protection. The next paper, by Söderholm and Karim of LTU and Candell of Saab Aerotech, Sweden, presents a methodology and a supporting toolbox for design of experiments and simulation for the identification of significant e-maintenance services. The seventh paper of this issue, by Abrahamsen and Aven of Norway, presents an interesting application of bubble diagrams to project risk management; it aims to provide a rational framework for managing risks. The last paper of this issue, by Guo, Guo and Thiart of South Africa, presents a scheme for parameter estimation and simulation which the authors claim to be foundational work for Poisson random fuzzy reliability and risk analysis. It is hoped that the readers of IJPE will find the papers presented here interesting and stimulating, which is the objective of this journal.

    Guest Editorial: Data Mining in Reliability and Risk Assessment
    Claudio M. Rocco
    2011, 7(1): 3-4.  doi:10.23940/ijpe.11.1.p3.mag

    Today, in many physical phenomena, the underlying first principles may not be known with certainty, or the system under study is so complex that a mathematical formulation is difficult to develop. However, with the growing use of computers, it has become possible to generate a great amount of data for such systems. In such cases, instead of developing models from first principles, the readily available data can be used to derive models by extracting meaningful relationships between a system's variables (i.e., unknown input-output dependencies). Complex, information-rich data sets are today common to all fields of business, science, and engineering. The ability to extract useful knowledge hidden in these data, and to act on that knowledge, is becoming increasingly important in today's competitive world. The entire process of applying a computer-based methodology for extracting knowledge or information from data is called data mining. Data mining is an iterative process: a search for new, valuable, and nontrivial information in large volumes of data.

    Basically, the objective of data mining is either prediction or description. Predictive data mining uses some variables or fields in the data set to classify records or to predict or estimate unknown values of the variables of interest. Descriptive data mining, on the other hand, finds interpretable patterns and relationships described by the data. Any data-mining activity falls into one of these two categories.
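    As a minimal illustration of the predictive side (a hypothetical sketch, not drawn from any paper in this issue), the following snippet uses a one-nearest-neighbour rule to predict a variable of interest from labelled records; all data and labels are invented for illustration:

```python
import numpy as np

# Hypothetical predictive task: label a component state as 'failed' (1) or
# 'operating' (0) from two measured variables, given labelled records.
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])
y = np.array([1, 1, 0, 0])

def predict(x, X=X, y=y):
    """1-nearest-neighbour prediction of the variable of interest."""
    return y[np.argmin(np.linalg.norm(X - x, axis=1))]

print(predict(np.array([0.85, 0.85])))  # near the 'failed' records -> 1
print(predict(np.array([0.15, 0.15])))  # near the 'operating' records -> 0
```

    The same query-against-stored-records idea underlies the nearest-neighbour techniques mentioned later in this editorial.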

    Today, data mining applications are available for computer systems of all sizes: mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million per terabyte for the largest.

    The success of a data-mining effort depends largely on the skill, knowledge, and ingenuity of the analyst; it is essentially like solving a puzzle. Data mining is one of the fastest-growing areas of science and engineering. Starting from computer science and statistics, it has quickly expanded into a field of its own. One of its greatest strengths is reflected in the wide range of methodologies and techniques it has embraced and applied to a host of problems.

    Data mining has its origins in various disciplines, of which the two most important are statistics and machine learning. In statistics there has been an emphasis on mathematical rigor, a need to establish that something is sensible on theoretical grounds before testing it in practice. The machine-learning community, on the other hand, has its origins in computer technology. This has led to a practical orientation, a willingness to test something out to see how well it performs without waiting for a formal proof of effectiveness.

    Basic modeling principles in data mining also have roots in control theory, which is primarily applied to engineering systems and industrial processes. The problem of determining a mathematical model for an unknown system by observing its input-output data pairs is generally referred to as system identification. The purpose of system identification, from the point of view of data mining, is to predict a system's behavior and to explain the interactions and relationships between the variables of a system. System identification generally involves two top-down steps:

    1. Structure identification - In this step, a priori knowledge about the system is applied to determine a class of models within which the search for the most suitable model can be conducted. Usually this class of models is denoted by a parameterized function y = f(u,t), where y is the model's output, u is an input vector, and t is a parameter vector.
    2. Parameter identification - In this step, once the structure of the model is known, we apply optimization techniques to determine the parameter vector t such that the resulting model y* = f(u, t*) can describe the system appropriately.

    In general, system identification is not a one-pass process: both structure and parameter identification need to be applied repeatedly until a satisfactory model is developed.
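    The two steps above can be sketched as follows: assuming a linear model class y = f(u, t) = t1·u + t2 (structure identification), parameter identification reduces to a least-squares fit of the parameter vector t. The data and true parameters here are purely illustrative:

```python
import numpy as np

# Hypothetical system: noisy observations of an unknown input-output map
rng = np.random.default_rng(0)
u = np.linspace(0, 10, 50)
y = 2.0 * u + 1.0 + rng.normal(0, 0.5, size=u.size)

# Structure identification (assumed): linear class y = f(u, t) = t[0]*u + t[1]
# Parameter identification: least-squares estimate of the parameter vector t
t = np.polyfit(u, y, deg=1)
y_star = np.polyval(t, u)  # model output y* = f(u, t*)

rms_residual = np.sqrt(np.mean((y - y_star) ** 2))
print(t, rms_residual)
```

    If the residual were judged unsatisfactory, one would revisit structure identification (e.g., try a quadratic class) and refit, reflecting the iterative nature noted above.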

    Several types of analytical software are available (statistical, machine learning, and neural networks), and the idea is to seek any of four types of relationships:

    • Classes: Stored data is used to locate data in predetermined groups.
    • Clusters: Data items are grouped according to logical relationships or the user's preferences.
    • Associations: Data can be mined to identify associations.
    • Sequential patterns: Data is mined to anticipate behavior patterns and trends.
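    As a small illustration of the "clusters" relationship (a hedged sketch on synthetic data, not tied to any particular software package), a minimal k-means loop groups two-dimensional data items by their distance relationships:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data items: two well-separated groups in the plane
data = np.vstack([rng.normal(0.0, 0.5, (30, 2)),
                  rng.normal(5.0, 0.5, (30, 2))])

# Minimal k-means: group items according to their distance relationships
k = 2
centers = data[[0, -1]]  # one seed point taken from each group
for _ in range(20):
    labels = np.argmin(np.linalg.norm(data[:, None] - centers, axis=2), axis=1)
    centers = np.array([data[labels == j].mean(axis=0) for j in range(k)])

print(np.round(sorted(centers[:, 0])).astype(int).tolist())  # -> [0, 5]
```

    Unlike the "classes" case, no predetermined groups are supplied; the grouping emerges from the data itself.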

    Data mining consists of five major elements:

    • Extract, transform, and load transaction data onto the data warehouse system.
    • Store and manage the data in a multidimensional database system.
    • Provide data access to business analysts and information technology professionals.
    • Analyze the data by application software.
    • Present the data in a useful format, such as a graph or table.

    The manual extraction of patterns from data has been practiced for centuries. Early methods of identifying patterns in data included Bayes' theorem (1700s) and the well-known regression analysis (1800s). The proliferation, ubiquity, and increasing power of computers have expanded data collection and storage capabilities. As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing, aided by other techniques from computer science such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s), and support vector machines (1980s). Nearest-neighbour techniques, rule induction (the extraction of useful if-then rules from data based on statistical significance), and data visualization are among the newer techniques used in data mining.

    Data mining is today the process of applying these methods to data with the intention of uncovering hidden patterns. It has been used for many years by businesses, scientists, engineers and governments to sift through volumes of data.

    Realizing the importance of data mining to the field of reliability and risk, Professor Krishna B. Misra, Editor-in-Chief of IJPE, requested me to bring out a special issue on data mining as applied to reliability and risk. Invitations were sent out to several researchers active in this field, and as a result we received four papers on data mining principles for this issue. It is hoped that these papers will act as a catalyst to generate further interest among researchers and readers in augmenting the reliability and risk literature with data mining principles.

    Claudio M. Rocco received his B.Sc. in Electrical Engineering and M.Sc. in Electrical Engineering (Power Systems) from Universidad Central de Venezuela and his Ph.D. from The Robert Gordon University, Aberdeen, Scotland, U.K. He is a full professor at Universidad Central de Venezuela in Operational Research post-graduate courses. He is a member of the editorial board of International Journal of Performability Engineering.

    Original articles
    On Reliability Growth Models Using Kolmogorov-Smirnov Bounds
    2011, 7(1): 5-19.  doi:10.23940/ijpe.11.1.p5.mag

    An approach for constructing nonparametric imprecise growth models (regression models) is proposed. The approach is based on applying sets of probability distributions of the "noise", produced by means of Kolmogorov-Smirnov bounds. The corresponding growth models are constructed by minimizing the risk functional in the framework of predictive learning and by choosing "optimal" probability distributions of the "noise" defining the minimax and minimin strategies. Numerical examples illustrate the proposed approach.
    Received on March 16, 2009, revised on June 22, 2010
    References: 16
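    As a hedged illustration of the kind of Kolmogorov-Smirnov bounds the paper builds on (not the authors' construction), the following snippet places a 95% KS confidence band around the empirical distribution function of invented data, using the large-sample critical value 1.36/√n:

```python
import numpy as np

# Invented sample of failure times (illustrative only)
data = np.sort([0.3, 0.5, 0.7, 1.1, 1.4, 1.9, 2.3, 2.8, 3.5, 4.1])
n = len(data)
ecdf = np.arange(1, n + 1) / n  # empirical CDF at the sorted points

# 95% Kolmogorov-Smirnov band: F_hat +/- 1.36/sqrt(n), clipped to [0, 1]
d = 1.36 / np.sqrt(n)
lower = np.clip(ecdf - d, 0.0, 1.0)
upper = np.clip(ecdf + d, 0.0, 1.0)

print(np.all(lower <= ecdf) and np.all(ecdf <= upper))  # -> True
```

    The set of all distribution functions lying inside such a band is one concrete example of a "set of probability distributions of the noise" over which minimax and minimin strategies can be defined.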

    Approximation of Minimal Cut Sets for a Flow Network via Evolutionary Optimization and Data Mining Techniques
    2011, 7(1): 21-31.  doi:10.23940/ijpe.11.1.p21.mag

    For the reliability analysis of networks, approaches based on minimal cut sets provide not only the necessary elements to obtain a reliability value but also insight into the importance of network components. When considering a flow network, identification of flow minimal cut sets (the equivalent of minimal cut sets in the binary case) is generally based on a priori knowledge of the binary minimal cut sets. Unfortunately, the enumeration of minimal cut sets is known to be an NP-hard problem. For complex and high-density networks, obtaining an exact value of reliability may be prohibitive; instead, an approximation to the true reliability may suffice. In this paper, for the first time, minimal cut set approximation for a flow network is performed via the formulation of an optimization problem and an evolutionary algorithm to solve it. The evolutionary algorithm is based on a data mining technique used to identify a potentially optimal set of solutions: a subset of the true set of all cut sets that can be used to create a reliability bound and identify critical components.
    Received on March 15, 2009, revised June 28, 2010
    References: 27

    Complex System Reliability Evaluation using Support Vector Machine for Incomplete Data-set
    2011, 7(1): 32-42.  doi:10.23940/ijpe.11.1.p32.mag

    Support Vector Machine (SVM) is an artificial intelligence technique that has been successfully used in data classification problems, taking advantage of its learning capacity. In systems modelled as networks, SVM has been used to classify the state of a network as failed or operating in order to approximate the network reliability. Due to lack of information or high computational complexity, the complete analytical expression of system states may be impossible to obtain; that is to say, only an incomplete data set can be obtained. Using these incomplete data sets, and depending on the amount of missing data, this paper proposes two different approaches, named the rough approximation method and the simulation-based method, to evaluate system reliability. SVM is used to complete the incomplete data set. Simulation is also employed in the so-called simulation-based method. Several examples are presented to illustrate the approaches.
    Received on March 16, 2010, revised on July 6, 2010
    References: 15
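    For readers unfamiliar with simulation-based reliability evaluation, the following generic Monte Carlo sketch (not the authors' SVM-based method) estimates the reliability of a small series-parallel system whose structure function is known; when the structure function is unavailable, a trained classifier such as an SVM would take the place of the hypothetical `system_works` below:

```python
import random

random.seed(42)
p = 0.9  # hypothetical component reliability (illustrative only)

def system_works(a, b, c):
    # Components a and b in parallel, in series with component c
    return (a or b) and c

# Monte Carlo: sample component states and count system successes
n = 100_000
hits = sum(system_works(random.random() < p,
                        random.random() < p,
                        random.random() < p) for _ in range(n))
estimate = hits / n

exact = (1 - (1 - p) ** 2) * p  # analytic value, 0.891
print(estimate, exact)
```

    With 100,000 samples the estimate lands within about ±0.003 of the analytic value; the paper's concern is the harder setting where labels for some sampled states are missing.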

    Identification of Contradictory Patterns in Experimental Datasets for the Development of Models for Electrical Cables Diagnostics
    2011, 7(1): 43-60.  doi:10.23940/ijpe.11.1.p43.mag

    The state of health of an electrical cable may be difficult to determine without destructive or very expensive tests. To overcome this, partial discharge (PD) measurement has been proposed as a relatively economic and simple-to-apply experimental technique for retrieving information on the state of health of an electrical cable. The retrieval is based on a relationship between PD measurements and the health of the cable. Given the difficulties in capturing such a relationship with analytical models, empirical modeling techniques based on experimental data have been propounded. In this view, a set of PD measurements was collected by Enea Ricerca sul Sistema Elettrico-ERSE during past campaigns for building a diagnostic system of electrical cable health state. These experimental data may contain contradictory information which remarkably reduces the performance of the state classifier if not identified a priori and possibly corrected. In the present paper, a novel technique based on the Adaboost algorithm is proposed for identifying contradictory PD patterns within an a priori analysis aimed at improving the diagnostic performance. Adaboost is a bootstrap-inspired, ensemble-based algorithm which has been effectively used for addressing classification problems.
    Received on March 30, 2010, revised on August 1, 2010
    References: 13

    Functional Safety Analysis including Human Factors
    2011, 7(1): 61-76.  doi:10.23940/ijpe.11.1.p61.mag

    In this paper, selected aspects of human factors are discussed that should be taken into account during the design of safety-related functions for a complex hazardous installation and its protections. The layer of protection analysis (LOPA) methodology is used for simplified risk analysis based on defined accident scenarios. To control the risk, safety instrumented functions (SIFs) are identified and their safety integrity levels (SILs) determined based on the results of risk assessment. A given SIF is to be realised by an electric/electronic/programmable electronic system (E/E/PES) or safety instrumented system (SIS) together with the human operator. The SIL is to be verified according to the requirements and criteria given in international standards IEC 61508 and IEC 61511. Some issues concerning alarm system (AS) design with regard to human factors and the related human reliability analysis (HRA) are outlined.
    Received on November 18, 2009, revised on July 29, 2010
    References: 30
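    As background to SIL verification (a simplified sketch of the IEC 61508 low-demand bands, not the paper's full LOPA workflow), the following function maps an average probability of failure on demand (PFDavg) to a safety integrity level:

```python
# IEC 61508 low-demand mode: SIL bands by average probability of
# failure on demand (PFDavg). Simplified illustration only; a real
# verification also covers architectural constraints and systematic
# capability, which this sketch omits.
def sil_from_pfd(pfd_avg):
    if 1e-5 <= pfd_avg < 1e-4:
        return 4
    if 1e-4 <= pfd_avg < 1e-3:
        return 3
    if 1e-3 <= pfd_avg < 1e-2:
        return 2
    if 1e-2 <= pfd_avg < 1e-1:
        return 1
    return 0  # outside the tabulated low-demand range

print(sil_from_pfd(5e-3))  # -> 2
print(sil_from_pfd(2e-4))  # -> 3
```

    In a LOPA study, the required SIL follows from the risk-reduction gap left by the other protection layers; the achieved PFDavg of the implemented SIF is then checked against the corresponding band.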

    Design of Experiment and Simulation for Identification of Significant e-Maintenance Services
    2011, 7(1): 77-90.  doi:10.23940/ijpe.11.1.p77.mag

    The purpose of this paper is to present a methodology and a supporting toolbox that identify information-based maintenance support services through an evaluation of the services' impacts on the effectiveness of complex technical systems. A hypothetical aircraft and its support system are simulated in SIMLOX. The variables included in the model, as well as their expected effects on critical measures of system effectiveness, were identified through interviews and studies of documents and the literature. The simulations were planned and analysed according to established Design of Experiment (DoE) principles supported by MATLAB, and Microsoft Access and Microsoft Visual Studio .NET have been used to integrate SIMLOX and MATLAB. The outcome of the study was scrutinised by both practitioners and statisticians. The methodology and its toolbox are useful for those involved in simulation work or in the development of information services that support maintenance activities. The proposed systematic methodology, along with its supporting toolbox, identifies information-based services that are currently lacking when a Service-Oriented Architecture (SOA) approach is applied during design, and it is considered valuable for identifying information-based maintenance support services within an e-maintenance solution.
    Received on October 21, 2009, revised May 24, 2010
    References: 31

    Safety Oriented Bubble Diagrams in Project Risk Management
    2011, 7(1): 91-96.  doi:10.23940/ijpe.11.1.p91.mag

    In project risk management, many firms use bubble diagrams to obtain a graphical presentation of a project's most uncertain attributes. The bubble diagrams, and the procedures used to put attributes into them, are seen to provide a rational framework for managing risks. In this paper we review and discuss the use of these diagrams and procedures, with special attention given to the way safety is treated. We show that the standard use of bubble diagrams is not adequate for the identification and follow-up of critical activities that affect safety. The main problem is that the present structure means that uncertainty is not properly taken into account. A reformulated bubble diagram is suggested that better reflects safety-related uncertainties. The offshore oil and gas industry is the starting point, but the discussion is to a large extent general.
    Received on November 18, 2009, revised August 3, 2010
    References: 12

    Hybrid Poisson Processes with Fuzzy Rate
    2011, 7(1): 97-106.  doi:10.23940/ijpe.11.1.p97.mag

    Poisson processes, particularly their time-dependent extension, play important roles in reliability and risk analysis. It should be fully recognized that Poisson modeling in the current reliability engineering and risk analysis literature is merely an ideology under which random uncertainty governs the phenomena. In other words, current Poisson models generate meaningful results only if the randomness assumptions hold. However, real-world phenomena often face the co-existence of random and fuzzy uncertainty, and thus purely probabilistic Poisson modeling practices may be very doubtful. In this paper, we define the random fuzzy Poisson process, explore the related average chance distributions, and propose a scheme for parameter estimation as well as a simulation scheme. It is expected that this establishes foundational work for Poisson random fuzzy reliability and risk analysis.
    Received on November 18, 2009, revised August 23, 2010
    References: 14
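    As a crude proxy for the hybrid idea (not the authors' average chance construction), one can treat a fuzzy event rate's triangular membership as a sampling profile for the rate and then draw a Poisson count, giving a two-stage "hybrid" simulation; all numbers here are invented:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical fuzzy rate "about 4 events per year", read as a
# triangular profile over [2, 6] with mode 4 (illustrative proxy only)
low, mode, high = 2.0, 4.0, 6.0

# Stage 1: sample rate realisations from the triangular profile
rates = rng.triangular(low, mode, high, size=50_000)
# Stage 2: sample a Poisson count for each sampled rate
counts = rng.poisson(rates)

mean_count = counts.mean()
print(mean_count)
```

    The sampled mean count sits near the triangular mean (2 + 4 + 6)/3 = 4, while the count variance exceeds that of a plain Poisson(4) process, reflecting the extra rate uncertainty.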

ISSN 0973-1318