The Essentials of Reliability-Centered Maintenance

John E. Skog P.E.

Doble Consultant

 

 

 

Introduction

Traditional scheduled maintenance is based on the premise that every item in a complex system has a "right age" at which a complete overhaul is necessary to ensure safety and operating reliability. Historically, we have discovered that many types of failures cannot be prevented, nor their frequency reduced, by such maintenance activities, no matter how intensively maintenance is performed. In response to this problem, design engineers have mitigated the consequences of such failures by making their designs "failure-tolerant". The recognition that the relation between aging and maintenance is not simple or straightforward, together with the implementation of improved equipment designs, has forced maintenance personnel to re-evaluate the concepts of traditional scheduled maintenance.

Reliability-Centered Maintenance (RCM) is a process that was developed as a result of the above conditions. The process was so named to emphasize the role that reliability theory and practice play in properly focusing preventive maintenance (PM) activities on the retention of the equipment’s original design reliability. RCM allows one to obtain the full design operating ability of the equipment. It does not necessarily identify a new series of maintenance tasks. Instead, it identifies those tasks that are most applicable and effective, and at the same time provides a framework for developing an optimal preventive maintenance program.

The Essentials of RCM

Five key features characterize the RCM methodology. These features generally set RCM apart from other maintenance planning processes.

    • Preservation of Function
    • Identification of Failure Modes that Compromise Functionality
    • Understanding the Effects and Consequences of Failure
    • Determining the Cause of Failure
    • Selection of Applicable and Effective PM Tasks

Each of these features will be discussed in the following sections.

Function

Function may be defined as the normal or expected response of a system or equipment to known inputs; it is generally defined in terms of performance capabilities. As an example, in substations, the functions of a transformer include insulating the high voltage conductors, transforming power from one voltage/current level to another, and regulating the voltage within a prescribed range, as well as numerous other activities.

Maintenance has traditionally focused on preservation of equipment, while RCM has shifted its focus to preservation of function. The change of focus from equipment preservation to preservation of function is generally difficult for many maintenance engineers to accept. While the initial focus is on function rather than on the equipment itself, the ultimate result is that critical operations, and thus critical equipment, are preserved through RCM. By addressing function first rather than equipment, we are much better able to identify those key functions that are of strategic and operational importance to the substation and utility system. The identification and ranking of functions enable the RCM analyst to decide systematically, at later stages of the process, just what equipment supports each function, and to avoid assuming that all equipment is of equal importance, a tendency that has pervaded traditional PM planning approaches.

Let us look at an example of a function-based approach to maintenance. In the previous example of the power transformer, one of the functions listed was transformation of power from one voltage/current level to another at a prescribed level. In order to achieve this functional requirement, oil pumps and fans may be used to ensure that the transformer is operating within prescribed temperature limits. If the transformer has multiple stages of cooling and multiple combinations of fans and pumps, it may be determined that the cooling function is not significantly compromised if a pump or fan fails to operate. The functional analysis reveals that the cooling function can be preserved even if one of the cooling components fails. The net result is a cooling maintenance program that can allow the loss of one or more cooling components without the loss of the whole cooling function. If the analysis were to focus only on the equipment, loss of a single fan or pump would be considered an unacceptable failure and a PM program for pumps and fans would be required.
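The difference between the two views can be illustrated with a short Python sketch. It is only an illustration: the component names, the class and function names, and the assumed minimum of three running components are all hypothetical and would in practice come from the transformer's cooling design.

```python
from dataclasses import dataclass

@dataclass
class CoolingComponent:
    name: str        # hypothetical label, e.g. "fan-1"
    operating: bool  # True if the fan or pump currently runs

def cooling_function_preserved(components, min_operating):
    """Return True if enough components run to keep the cooling function."""
    running = sum(1 for c in components if c.operating)
    return running >= min_operating

# Hypothetical cooling bank with one failed fan.
bank = [
    CoolingComponent("fan-1", True),
    CoolingComponent("fan-2", False),
    CoolingComponent("pump-1", True),
    CoolingComponent("pump-2", True),
]

# Equipment view: any failed component counts as a failure.
equipment_failure = any(not c.operating for c in bank)

# Functional view: the function is lost only below the assumed minimum.
function_preserved = cooling_function_preserved(bank, min_operating=3)

print("Equipment failure present:", equipment_failure)
print("Cooling function preserved:", function_preserved)
```

Under these assumptions the equipment view reports a failure, while the functional view reports that cooling is still preserved.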

Failure Modes

Since the primary objective of maintenance is to preserve system function, the loss of a function, or "functional failure", is the second item of consideration in RCM. Function preservation means avoidance of functional failures. The first step in avoiding functional failures is identifying how functions may be defeated. By understanding how functions may be lost, we can ascertain the actions required to prevent, mitigate or detect the onset of a functional loss.

A failure mode describes, in a general way, the manner or sequence of events that leads to the loss of a particular function. Failure modes describe how a function is lost without offering specific reasons or causes for the failure. Failure modes are usually more than just a single, simple statement of functional loss. Most functions have two or more failure modes. For example, one failure mode for a transformer cooling pump may be failure to start, a second may be failure to run and a third may be failure to pump at rated capacity.

The distinctions among failure modes are essential so that the proper importance ranking for each function can later be determined. In the RCM process, where the primary objective is to preserve function, there is an opportunity to decide, in a very systematic way, how to prioritize maintenance budgets and effort. In other words, "all functions are not created equal," and therefore all functional failures and their related components and failure modes are not created equal. Thus, by using a decision tree process, failure modes are prioritized.
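The ranking idea can be sketched in a few lines of Python. This is a simplified illustration, not the RCM logic tree itself: the safety, operational and economic categories are taken from the consequences discussed in the following section, while the order of the questions, the priority numbers and the flags assigned to each example failure mode are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    function: str
    description: str
    safety: bool = False       # assumed flag: could the failure endanger people?
    operational: bool = False  # assumed flag: does it interrupt or limit service?
    economic: bool = False     # assumed flag: does it add repair or replacement cost?

def priority(mode):
    """Walk a simplified, assumed decision tree; a lower number ranks higher."""
    if mode.safety:
        return 1
    if mode.operational:
        return 2
    if mode.economic:
        return 3
    return 4  # no significant consequence: PM is hard to justify

# Failure modes of the cooling pump example, with hypothetical flags.
modes = [
    FailureMode("cooling", "pump fails to pump at rated capacity", economic=True),
    FailureMode("cooling", "pump fails to start", operational=True),
    FailureMode("cooling", "pump fails to run", operational=True),
]

ranked = sorted(modes, key=priority)
for m in ranked:
    print(priority(m), m.function, "-", m.description)
```

Because the sort is stable, failure modes with the same priority keep their original order, so ties still need an analyst's judgment.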

 

Failure Effects and Consequences

It is the consequences of a failure that generally determine whether preventive maintenance should be considered. If a failure takes place and no operational, economic or safety consequence results, the need for preventive maintenance is extremely difficult to justify. Conversely, if significant consequences result, effective preventive maintenance procedures must be identified and implemented.

When the loss of a function occurs, the effects of the failure may differ depending on one’s frame of reference. In order to understand the consequences of a failure, one must understand the effects that result at the local, system and remote levels.

Local effects are the consequences observed at the failure site. Local effects include interactions between the failed component and the surrounding equipment. As an example, the loss of oil from a transformer may have negative effects on the local environment if it contaminates other equipment or enters a storm water drainage system.

System effects are those impacts that the loss of function poses to the substation or electrical system. As a general rule, all failures that impact the functionality of the system being analyzed are system effects. Continuing with our previous transformer example, the loss of cooling may result in a system consequence of reduced transformer output.

Remote effects are those effects on equipment and systems outside the boundaries being analyzed with the RCM process. Remote effects have broader impacts on the substation and utility and may be beyond the immediate scope of the RCM analysis. Continuing with our previous transformer example, the loss of cooling may result in the need to overload other transformers in adjacent substations.

 

Failure Causes

When a failure mode results in a significant failure consequence, the failure mode is considered critical and preventive maintenance is justified. In order to prevent the failure from occurring, the specific events that lead to the loss of function must be determined. It is the cause of failure, not the failure mode, that must be addressed by preventive maintenance.

When the exact cause of a functional failure is determined, an appropriate and cost-effective PM task can then be identified. Generally, a PM task will be considered only when the failure cause is dominant. If the cause is not dominant, the chances of the PM task being effective are small. The resulting preventive maintenance tasks are focused on predicting, preventing or resolving the cause of the failure.
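One way to make "dominant" concrete is to look at the share of recorded failures attributed to each cause. The Python sketch below is an assumption-laden illustration: the article does not define a numeric test for dominance, so the 50 percent threshold, the function name and the example failure records are all hypothetical.

```python
from collections import Counter

def dominant_cause(causes, threshold=0.5):
    """Return the most frequent cause if its share of the recorded
    failures meets the assumed threshold; otherwise return None."""
    if not causes:
        return None
    cause, count = Counter(causes).most_common(1)[0]
    return cause if count / len(causes) >= threshold else None

# Hypothetical causes recorded for the failure mode "pump fails to run".
records = [
    "bearing wear",
    "bearing wear",
    "winding fault",
    "bearing wear",
    "seal leak",
]

print(dominant_cause(records))
```

With these made-up records, bearing wear accounts for three of five failures, so a PM task aimed at bearing wear would be the one to consider.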

Task Selection

Task selection involves the identification of appropriate tasks to address the causes of critical failure modes identified during the RCM analysis. In a pure sense, selecting tasks involves the identification of technically effective and cost-effective maintenance actions that are best suited to predicting, preventing or mitigating the effect of a functional failure. In a broader sense, task selection also includes redesign, failure-finding procedures and corrective maintenance. The RCM process for identifying effective PM tasks is structured and rigorous.

The Keys of RCM

The five features discussed above are the essentials of RCM. Identifying and understanding those functions critical to the operation of the substation or electrical system is the key to RCM. Once understood, continuous availability of these critical and important functions then becomes the foremost goal of the maintenance program. By fully understanding the events or causes that lead to the loss of these critical functions, one can develop focused maintenance activities that predict, prevent or resolve these failures. Finally, by determining the operational effects resulting from the loss of these critical functions, one can then quantify the value of preventing the loss and thus ensure that all preventive and mitigating actions are cost-effective.
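The cost-effectiveness check in the last sentence can be sketched as simple arithmetic. The Python below assumes one possible way to quantify the value of prevention (expected failures per year, times cost per failure, times the fraction of failures the task prevents); this formula, the function names and every number are hypothetical, not figures from an actual RCM study.

```python
def annual_benefit(failures_per_year, cost_per_failure, fraction_prevented):
    """Expected yearly loss avoided by the PM task (assumed formula)."""
    return failures_per_year * cost_per_failure * fraction_prevented

def is_cost_effective(annual_task_cost, failures_per_year,
                      cost_per_failure, fraction_prevented):
    """A task is treated as cost-effective if it avoids more loss than it costs."""
    benefit = annual_benefit(failures_per_year, cost_per_failure,
                             fraction_prevented)
    return benefit > annual_task_cost

# Hypothetical inputs: one failure every four years, 40,000 per failure,
# and a PM task assumed to prevent half of those failures.
benefit = annual_benefit(0.25, 40_000, 0.5)
print("Expected annual benefit:", benefit)
print("Worth 2,000 per year?", is_cost_effective(2_000, 0.25, 40_000, 0.5))
print("Worth 6,000 per year?", is_cost_effective(6_000, 0.25, 40_000, 0.5))
```

A fuller analysis would also weigh safety and environmental consequences, which do not reduce to a single cost figure as easily.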

In the Next Doble Exchange: Task Selection Logic

Editor’s Note: John Skog is a Doble Consultant in the area of Maintenance Management. He introduced the Client Group to RCM in 1993. John performs RCM training and consulting services through Doble and is available to assist clients in the refinement of their current maintenance programs.