Written by Claudio M. Rocco   

Guest Editorial:
Data Mining in Reliability and Risk Assessment

Volume 7, Number 1, January 2011, pp. 3-4

Claudio M. Rocco

Universidad Central de Venezuela

 

      In many physical phenomena, the underlying first principles may not be known with certainty, or the system under study is so complex that a mathematical formulation is difficult to develop. However, with the growing use of computers, it has become possible to generate a great amount of data for such systems. In such cases, instead of developing models from first principles, the readily available data can be used to derive models by extracting meaningful relationships between a system's variables (i.e., unknown input-output dependencies). Complex, information-rich data sets are becoming common in all fields of business, science, and engineering. The ability to extract useful knowledge hidden in these data and to act on that knowledge is becoming increasingly important in today's competitive world. The entire process of applying a computer-based methodology for extracting knowledge or information from data is called data mining. Data mining is an iterative process: a search for new, valuable, and nontrivial information in large volumes of data.

      Basically, the objective of data mining is either prediction or description. Predictive data mining uses some variables or fields in the data set to classify, predict, or estimate unknown values of the variables of interest. Descriptive data mining, on the other hand, involves finding interpretable patterns and relationships in the data. Data-mining activities therefore fall into one of these two categories.

      Today, data mining applications are available for computer systems of all sizes, viz., mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest.

      The success of a data-mining effort depends largely on the skill, knowledge, and ingenuity of the analyst. It is essentially like solving a puzzle. Data mining is one of the fastest growing fields in science and engineering. Starting from computer science and statistics, it has quickly expanded into a field of its own. One of its greatest strengths is the wide range of methodologies and techniques that it has embraced and applied to a host of problems.

      Data mining has its origins in various disciplines, of which the two most important are statistics and machine learning. In statistics, there has been an emphasis on mathematical rigor, a need to establish that something is sensible on theoretical grounds before testing it in practice. The machine-learning community, on the other hand, has its origins in computer technology. This has led to a practical orientation, a willingness to test something out to see how well it performs, without waiting for a formal proof of effectiveness.

      Basic modeling principles in data mining also have roots in control theory, which is primarily applied to engineering systems and industrial processes. The problem of determining a mathematical model for an unknown system by observing its input-output data pairs is generally referred to as system identification. The purposes of system identification, from the point of view of data mining, are to predict a system's behavior and to explain the interactions and relationships between the variables of a system. System identification generally involves two top-down steps:

  1. Structure identification - In this step, a priori knowledge about the system is applied to determine a class of models within which the search for the most suitable model can be conducted. Usually this class of models is denoted by a parameterized function y = f(u,t), where y is the model's output, u is an input vector, and t is a parameter vector.
  2. Parameter identification - In this step, once the structure of the model is known, we need to apply optimization techniques to determine the parameter vector t such that the resulting model y* = f(u,t*) can describe the system appropriately.

      In general, system identification is not a one-pass process: both structure and parameter identification need to be applied repeatedly until a satisfactory model is developed.
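      The two steps above can be illustrated with a minimal sketch. It is not taken from the editorial: it assumes, purely for illustration, that structure identification has selected a linear model class y = f(u,t) = t[0] + t[1]*u with a scalar input u, and that parameter identification is done by ordinary least squares. The function names and the synthetic data are hypothetical.

```python
import random


def model(u, t):
    """Assumed model class from structure identification: y = t[0] + t[1] * u."""
    return t[0] + t[1] * u


def identify_parameters(inputs, outputs):
    """Parameter identification: closed-form ordinary least squares for the linear class."""
    n = len(inputs)
    mean_u = sum(inputs) / n
    mean_y = sum(outputs) / n
    s_uu = sum((u - mean_u) ** 2 for u in inputs)
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(inputs, outputs))
    slope = s_uy / s_uu
    intercept = mean_y - slope * mean_u
    return (intercept, slope)


def mean_squared_error(inputs, outputs, t):
    """Average squared gap between observed outputs and model predictions."""
    return sum((y - model(u, t)) ** 2 for u, y in zip(inputs, outputs)) / len(inputs)


# Synthetic input-output pairs standing in for observations of the unknown system
# (illustrative only: true intercept 2.0, true slope 0.5, small Gaussian noise).
rng = random.Random(0)
inputs = [i / 10 for i in range(50)]
outputs = [2.0 + 0.5 * u + rng.gauss(0, 0.05) for u in inputs]

t_star = identify_parameters(inputs, outputs)      # estimated parameter vector t*
mse = mean_squared_error(inputs, outputs, t_star)  # fit quality of y* = f(u, t*)
print(t_star, mse)
```

      If the resulting error were judged unsatisfactory, one would return to the first step and try a different model class (for example, a higher-order polynomial), which is the repeated application described above.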

      Several types of analytical software are available (statistical, machine learning, and neural networks), and the idea is to seek any of four types of relationships:

  • Classes: Stored data is used to locate data in predetermined groups.
  • Clusters: Data items are grouped according to logical relationships or the user's preferences.
  • Associations: Data can be mined to identify associations.
  • Sequential patterns: Data is mined to anticipate behavior patterns and trends.
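      As an illustration of one of the relationship types listed above, the following sketch mines associations from a small set of transactions. The editorial does not prescribe a method; the sketch assumes the common support and confidence measures, and the basket data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: count / len(baskets) for pair, count in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Among baskets containing `antecedent`, the fraction that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(1 for b in with_antecedent if consequent in b) / len(with_antecedent)


support = pair_support(transactions)
print(support[("bread", "milk")])                 # support of the pair {bread, milk}
print(confidence(transactions, "bread", "milk"))  # confidence of the rule bread -> milk
```

      Pairs with high support and rules with high confidence are the candidate associations an analyst would then examine further.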

      Data mining consists of five major elements:

  • Extract, transform, and load transaction data onto the data warehouse system.
  • Store and manage the data in a multidimensional database system.
  • Provide data access to business analysts and information technology professionals.
  • Analyze the data by application software.
  • Present the data in a useful format, such as a graph or table.

      The manual extraction of patterns from data has been practiced for centuries. Early methods of identifying patterns in data included Bayes' theorem (1700s) and the well-known regression analysis (1800s). The proliferation, pervasiveness, and increasing power of computers have increased data collection and storage capabilities. As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing. This has been aided by other useful techniques in computer science, such as neural networks, clustering, genetic algorithms (1950s), decision trees (1960s) and support vector machines (1980s). The nearest-neighbour technique, rule induction (the extraction of useful if-then rules from data based on statistical significance), and data visualization are among the newer techniques used in data mining.

      Today, data mining is the process of applying these methods to data with the intention of uncovering hidden patterns. It has been used for many years by businesses, scientists, engineers and governments to sift through large volumes of data.

      Realizing the importance of data mining to the field of reliability and risk, Professor Krishna B. Misra, Editor-in-Chief of IJPE, asked me to bring out a special issue on data mining as applied to reliability and risk. Invitations were sent out to several researchers active in this field, and we received only four papers relating to data mining principles for this issue. However, it is hoped that these papers will act as a catalyst, encouraging researchers and readers to come forward and augment the reliability and risk literature with data mining principles.

 


 

Claudio M. Rocco received his B.Sc. in Electrical Engineering and M.Sc. in Electrical Engineering (Power Systems) from Universidad Central de Venezuela and his Ph.D. from The Robert Gordon University, Aberdeen, Scotland, U.K. He is a full professor at Universidad Central de Venezuela in Operational Research post-graduate courses. He is a member of the editorial board of International Journal of Performability Engineering.

 
