Int J Performability Eng ›› 2010, Vol. 6 ›› Issue (4): 301-.doi: 10.23940/ijpe.10.4.p301.mag

• Editorial •

Editorial: Designing for Maintainability and Optimization of Maintenance Policy: An Important Step in Achieving Performability

Krishna B. Misra   

  1. Editor-in-Chief, International Journal of Performability Engineering


The next important attribute of performability, after quality and reliability (see the Editorials in Vol. 6, Nos. 1-3, 2010), is maintainability, which needs to be designed and built into any system or product and is realized through maintenance during its operational phase. In BS 4778-3.1:1991, BS 3811:1993 and MIL-STD-721B, maintenance is defined as a "process of maintaining an item in an operational state by either preventing its transition to a failed state or by restoring it to an operational state following its failure". The primary aim of maintenance is therefore to prolong the functioning state of an item by not allowing its condition to deteriorate. There are several approaches to maintenance, based on the expected use and maintenance schedule of an item. Economic considerations are closely linked to maintenance and the system life cycle: failure to consider the effects of design on maintenance, and vice versa, can have adverse effects on profit. Design and maintenance are therefore planned simultaneously in order to ensure efficient and cost-effective operation over the life of a product.

Generally, three types of maintenance are in use, viz., preventive (PM), corrective (CM) and predictive maintenance (PdM). Maintenance can also be classified according to the degree to which the maintenance work restores the equipment relative to its original state. "Perfect Maintenance" restores the equipment to as-good-as-new condition. "Minimal Maintenance" leaves the equipment with the same failure rate as it had before the maintenance action was initiated; this is called the as-bad-as-old state. "Imperfect Maintenance" does not restore the equipment to as good as new but makes it relatively younger (a state between as good as new and as bad as old). "Worse Maintenance" results (unintentionally) in an increase in the equipment's failure rate or actual age but does not result in breakdown, while maintenance that results in the equipment's breakdown is termed "Worst Maintenance". Accordingly, any PM or CM action belongs to one of the above categories.
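The classification above can be illustrated with a simple virtual-age sketch. This is only one possible way to formalize it and is not taken from the editorial: the restoration factor `q`, the function names and the Weibull parameters are all illustrative assumptions. A maintenance action is assumed to scale the effective age of the equipment by `q`, and the hazard rate is then read off a Weibull model at that effective age.

```python
def weibull_hazard(age, beta=2.5, eta=1000.0):
    """Hazard rate of a Weibull(beta, eta) item at a given effective age.

    beta and eta are illustrative values, not data from the editorial.
    """
    return (beta / eta) * (age / eta) ** (beta - 1)


def age_after_maintenance(age_before, q):
    """Effective age just after a maintenance action with restoration factor q.

    q = 0      -> perfect maintenance (as good as new)
    0 < q < 1  -> imperfect maintenance (younger, but not new)
    q = 1      -> minimal maintenance (as bad as old)
    q > 1      -> worse maintenance (effective age increased)
    """
    if q < 0:
        raise ValueError("q must be non-negative")
    return q * age_before


def classify(q):
    """Map a restoration factor to the category names used in the text."""
    if q < 0:
        raise ValueError("q must be non-negative")
    if q == 0:
        return "perfect"
    if q < 1:
        return "imperfect"
    if q == 1:
        return "minimal"
    return "worse"


if __name__ == "__main__":
    age = 800.0  # hypothetical operating hours before maintenance
    for q in (0.0, 0.5, 1.0, 1.2):
        new_age = age_after_maintenance(age, q)
        print(classify(q), new_age, weibull_hazard(new_age))
```

"Worst Maintenance" has no counterpart in this sketch, since it ends in a breakdown rather than in a change of effective age.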

PM is a schedule of planned maintenance actions aimed at preventing equipment failure before it actually occurs, keeping the equipment working and/or extending its life. It is performed on a regular basis. For example, mechanical systems are lubricated after a certain number of operating hours, and lightning arresters in jet engines are replaced after a certain number of lightning strikes. PM is designed to enhance equipment reliability by replacing worn components before they actually fail, and includes activities such as equipment checks, partial or complete overhauls at specified periods, oil changes, lubrication and so on. An ideal preventive maintenance program is one which prevents all equipment failures before they occur.

Preventive maintenance is a logical choice if the following two conditions are satisfied:

  • The equipment has an increasing hazard rate, thereby implying a wear-out situation.
  • The overall cost of the preventive maintenance actions (which include ancillary tangible and/or intangible costs, such as downtime costs, loss of production costs, lawsuits over the failure of a safety-critical item, loss of goodwill, etc.) must be less than the overall cost of a corrective action.

If an item has an increasing failure rate, then a carefully designed PM program is likely to improve system availability. Otherwise, the costs of PM might actually outweigh the benefits. It must also be made explicitly clear that if an item has a constant failure rate, PM will have no effect on the item's failure occurrences. A good preventive maintenance program should either minimize the overall costs (or downtime, etc.) or meet the reliability/availability goals. To achieve this, an optimum time interval must be determined for the scheduled maintenance. Long-term benefits of preventive maintenance include improved system reliability, decreased cost of replacement, decreased system downtime and better spares inventory management. Thus long-term effects and cost comparisons usually favor preventive maintenance over performing maintenance actions only when the system fails.
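One way to see how an optimum interval arises is the classical age-replacement cost-rate model, sketched below. The editorial does not prescribe this model. The Weibull life distribution, the cost figures and the function names are assumptions made for illustration. The item is replaced preventively at age T (cost `c_pm`) or correctively at failure (cost `c_cm`), and the expected cost per unit time is the expected cost of one cycle divided by the expected cycle length.

```python
import math


def reliability(t, beta, eta):
    """Weibull survival function R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(T, beta, eta, c_pm, c_cm, steps=2000):
    """Expected cost per unit time when replacing at age T or at failure."""
    h = T / steps
    # Trapezoidal rule for the expected cycle length, the integral of R on [0, T]
    area = 0.5 * (reliability(0.0, beta, eta) + reliability(T, beta, eta))
    area += sum(reliability(i * h, beta, eta) for i in range(1, steps))
    expected_cycle = area * h
    r_T = reliability(T, beta, eta)
    expected_cost = c_pm * r_T + c_cm * (1.0 - r_T)
    return expected_cost / expected_cycle


def best_interval(beta, eta, c_pm, c_cm, candidates):
    """Candidate interval with the lowest expected cost rate."""
    return min(candidates, key=lambda T: cost_rate(T, beta, eta, c_pm, c_cm))


# Hypothetical figures: characteristic life 1000 h, CM ten times as costly as PM
candidates = [50.0 * k for k in range(1, 101)]  # 50 h to 5000 h
wear_out = best_interval(2.5, 1000.0, 1.0, 10.0, candidates)  # increasing hazard
constant = best_interval(1.0, 1000.0, 1.0, 10.0, candidates)  # constant hazard

if __name__ == "__main__":
    print("increasing hazard: best candidate interval =", wear_out)
    print("constant hazard:   best candidate interval =", constant)
```

Under these assumptions the wear-out case yields a finite best interval well inside the candidate range, whereas with a constant hazard rate (shape parameter 1) the cost rate keeps falling as T grows, so the search simply returns the largest candidate. This is consistent with the two conditions listed above.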

Predictive maintenance (PdM), or condition-based maintenance (CBM), is carried out only after collecting and evaluating enough physical data on the performance or condition of equipment, such as temperature, vibration or particulate matter in oil, through periodic or continuous (on-line) equipment monitoring. The collected data are then analyzed to prepare an appropriate maintenance plan. PdM technologies used to collect information on equipment condition include infrared thermography, acoustic methods (partial discharge and airborne ultrasonic), corona detection, vibration analysis, sound level measurements, oil analysis, motor current analysis and other specific on-line tests. The basic aim of PdM is to perform maintenance at a scheduled point in time when the maintenance activity is most cost-effective but before the equipment fails in service. Most PdM inspections are performed while the equipment is in service, thereby minimizing disruption of normal system operations. This type of maintenance is generally carried out on mechanical systems for which the failure modes are known and historical data are available for validating the performance and maintenance models.

Although sophisticated techniques for condition monitoring are available these days, the main determinant of the frequency of condition monitoring is the PF interval, which is the lead time between the point at which an incipient failure can first be detected and the point at which functional failure occurs. Even today, the PF interval can only be estimated approximately. Any error tends to be on the conservative (i.e., too frequent) side. Nevertheless, there are cases of bearing failures that have occurred undetected, despite the bearings being monitored at these conservative frequencies. Smart sensor technology is likely to reduce the complexity of linking sensor outputs to current process control systems, so that more and more equipment can be monitored continuously, on-line, and control room operators will be able to assess quickly and easily the current condition of the bearings, alignment, balance or gears of a particular machine. Several expert systems for fault diagnosis are available today. At present, however, these are still essentially rule-based systems and, like all rule-based systems, their results are only as good as the rules that have been established within them.
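The link between the PF interval and the monitoring frequency can be sketched as follows. This is an illustrative simplification, not a method given in the editorial. It assumes inspections at a fixed interval, a detection window equal to the PF interval, and an independent per-inspection detection probability `p_detect`; all names and figures are hypothetical. At least floor(PF / inspection interval) inspections are guaranteed to fall inside the window, which gives a lower bound on the chance of catching the incipient failure.

```python
import math


def inspections_in_window(pf_interval, inspection_interval):
    """Guaranteed number of inspections falling inside the PF window."""
    if pf_interval < 0 or inspection_interval <= 0:
        raise ValueError("pf_interval must be >= 0 and inspection_interval > 0")
    return math.floor(pf_interval / inspection_interval)


def detection_probability(pf_interval, inspection_interval, p_detect):
    """Lower bound on the probability of detection before functional failure.

    Assumes each inspection independently detects an existing incipient
    failure with probability p_detect.
    """
    n = inspections_in_window(pf_interval, inspection_interval)
    return 1.0 - (1.0 - p_detect) ** n


if __name__ == "__main__":
    # Hypothetical bearing: PF interval of 30 days, 90 % detection per inspection
    for interval in (10.0, 15.0, 30.0, 40.0):
        print(interval, detection_probability(30.0, interval, 0.9))
```

Under these assumptions an inspection interval longer than the true PF interval gives no guaranteed chance of detection, which is one way a failure can go unnoticed when the PF interval has been overestimated.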

Corrective maintenance (CM) consists of the actions taken to restore failed equipment or a failed system to an operational state. It usually involves replacing or repairing the component that caused the failure of the overall system. CM can be performed only at unpredictable intervals because an item's failure time is not known a priori. An item becomes operational again after CM or repairs have been performed. Corrective maintenance is carried out in three steps:

  • Diagnosis of the fault: It is the process of locating the fault or failed parts, or otherwise satisfactorily assessing the cause of the equipment or system failure.
  • Repair or replacement of faulty components: Once the cause of a failure has been established, action is taken to remove the cause, usually by replacing or repairing the failed components.
  • Verification of the repair action: After the faulty components have been repaired or replaced, the repair crew must verify that the system is again successfully operating.

The total time taken to restore the equipment is known as downtime (DT), and the uptime of an equipment or system is the time during which it is available or operating. DT is the sum of the administrative time, the logistic time and the actual repair time. The administrative time is the time spent in organizing repairs. It excludes the logistic time, which is the portion of downtime during which the repair activity is suspended or delayed on account of non-availability of spare parts or replacements. The actual repair time, or active repair time, is the time during which the repairmen are working on the equipment to effect the repairs. It is in turn the sum of the time taken to locate and identify the fault or faults, the fault correction time and the time taken for testing and recommissioning the equipment. The repairability, which is the probability that the equipment or system will be restored to an operable state within a specified active repair time, evidently depends on the training and skill of the repair crew as well as on the design of the equipment. For example, the ease of accessibility of components in equipment has a direct effect on the active repair time. However, human factors to a large extent govern the duration of active repair time.
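The decomposition of downtime described above can be written down directly. The following is a minimal sketch in which the record layout, the field names and the sample figures are hypothetical. Repairability is estimated empirically as the fraction of recorded repairs whose active repair time fell within a specified limit, so no particular repair-time distribution is assumed.

```python
from dataclasses import dataclass


@dataclass
class RepairRecord:
    """One corrective maintenance event, all times in hours (hypothetical layout)."""
    administrative: float    # organizing the repair
    logistic: float          # waiting for spares or replacements
    fault_location: float    # locating and identifying the fault
    fault_correction: float  # repairing or replacing the faulty parts
    checkout: float          # testing and recommissioning

    @property
    def active_repair(self):
        """Time during which the crew is actually working on the equipment."""
        return self.fault_location + self.fault_correction + self.checkout

    @property
    def downtime(self):
        """DT = administrative time + logistic time + active repair time."""
        return self.administrative + self.logistic + self.active_repair


def empirical_repairability(records, limit):
    """Fraction of repairs completed within `limit` hours of active repair time."""
    if not records:
        raise ValueError("at least one repair record is required")
    return sum(1 for r in records if r.active_repair <= limit) / len(records)


# Made-up sample data, for illustration only
records = [
    RepairRecord(1.0, 4.0, 0.5, 1.0, 0.5),
    RepairRecord(0.5, 0.0, 1.0, 2.5, 0.5),
    RepairRecord(2.0, 8.0, 0.25, 0.5, 0.25),
]

if __name__ == "__main__":
    for r in records:
        print("active repair:", r.active_repair, "downtime:", r.downtime)
    print("repairability within 2 h:", empirical_repairability(records, 2.0))
```

Keeping administrative and logistic time separate from active repair time makes it visible when downtime is dominated by delays rather than by the repair itself.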

Statistically speaking, uptimes and downtimes are random variables with their own distributions. Based on these distributions, one can compute the mean uptime (MUT) and mean downtime (MDT). Mean uptime reflects how good the inherent design or built-in reliability is, and mean downtime reflects how good the maintainability is. There are other measures of performance of maintained equipment, such as point availability and interval availability.
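For reference, these measures can be written in terms of MUT and MDT as follows. The steady-state expression holds for general uptime and downtime distributions. The closed form for point availability is only a sketch under the added assumptions that uptimes and downtimes are exponentially distributed and that the item is up at t = 0; the symbols λ, μ and T are introduced here and do not appear in the editorial. Interval availability is taken here as the average of the point availability over (0, T), which is one common definition.

```latex
\begin{align}
  % Steady-state availability from the two means
  A &= \frac{\mathrm{MUT}}{\mathrm{MUT} + \mathrm{MDT}} \\
  % Point availability, assuming exponential uptimes and downtimes
  % with \lambda = 1/\mathrm{MUT}, \mu = 1/\mathrm{MDT}, item up at t = 0
  A(t) &= \frac{\mu}{\lambda + \mu}
        + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu)\, t} \\
  % Interval (average) availability over (0, T)
  \bar{A}(T) &= \frac{1}{T} \int_{0}^{T} A(t)\, dt
\end{align}
```

As t grows, A(t) tends to μ/(λ + μ), which equals MUT/(MUT + MDT), so the second expression is consistent with the first.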

Reliability Centered Maintenance (RCM) is an approach that helps in deciding what maintenance tasks must be performed at any given point of time. The value of RCM lies in its recognition that the consequences of failures are far more important than their technical characteristics. In fact, it recognizes that the reason for doing any kind of proactive maintenance is not to avoid failures per se, but to avoid or at least minimize their consequences.

Total Productive Maintenance (TPM) is a proactive equipment maintenance strategy designed to improve overall equipment effectiveness. It breaks down the barrier between the maintenance department and the production department of a company, and is an approach to optimizing the effectiveness of the means of production in a structured manner.
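The editorial does not define overall equipment effectiveness (OEE). It is conventionally computed as the product of an availability rate, a performance rate and a quality rate, and the sketch below follows that convention. The function name, parameter names and shift figures are hypothetical.

```python
def oee(planned_time, downtime, ideal_cycle_time, total_count, good_count):
    """Overall equipment effectiveness as availability x performance x quality.

    planned_time and downtime are in the same time unit as ideal_cycle_time,
    which is the ideal time needed to produce one unit.
    """
    run_time = planned_time - downtime
    if planned_time <= 0 or run_time <= 0 or total_count <= 0:
        raise ValueError("planned time, run time and total count must be positive")
    availability = run_time / planned_time
    performance = (ideal_cycle_time * total_count) / run_time
    quality = good_count / total_count
    return availability * performance * quality


if __name__ == "__main__":
    # Hypothetical shift: 480 min planned, 60 min down, 1 min ideal cycle,
    # 378 units produced of which 360 are good
    print(oee(480.0, 60.0, 1.0, 378, 360))
```

Because downtime enters through the availability term and output enters through the performance and quality terms, the measure depends on both the maintenance and the production departments, which fits the point made above about breaking the barrier between them.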

A Computerized Maintenance Management System (CMMS) is sometimes also referred to as Enterprise Asset Management (EAM), but the two differ. A CMMS is a stand-alone computer program for managing maintenance work, labour and inventory in a company, whereas an EAM system not only performs all the functions of a CMMS but also integrates with the company's financial, human resource, material management and other ERP (Enterprise Resource Planning) applications. In the past, stand-alone CMMS had an advantage over EAM in terms of features, ease of use and functionality. A detailed description of maintenance models, strategies and analysis is given in [1].

1. K. B. Misra (Ed.), Handbook of Performability Engineering, 76 chapters, 1315 pp., Springer, 2008.