PREDICTIVE MAINTENANCE ACTIVITIES IN THE CONTEXT OF A MAINTENANCE STRATEGY REVIEW
Dr. R A Platfoot
Covaris Pty Ltd
28 Joseph Street
Ashfield NSW 2131
Summary: This paper presents material associated with assessing the effectiveness
of a variety of condition monitoring approaches within the context of
a predictive/preventative maintenance strategy. It promotes the concept
that earlier thinking on the application of RCM may need to be assessed
in the light of the growth of lower cost predictive maintenance and life
forecasting technologies, and hence whether or not primary outcomes of
tools such as FMECA need to be further skewed to embrace more fully the
details and attributes of various life assessment possibilities. The application
of RCM Turbo as an automated/semi-electronic diary approach to employing
RCM is tested for relevant aspects such as the application of its criticality
scores to assigning equipment criticality rankings within the maintenance
system, thereby assisting the scheduling and prioritization of predictive
maintenance tasks.
1. INTRODUCTION
Reliability-centred maintenance (RCM) has been a flagship process for
improving preventative maintenance systems for close on four decades [1],
with the emphasis on moving from either a breakdown maintenance approach
or a totally scheduled discard approach, to a considered balance of run
to failure, scheduled tasks and predictive maintenance, [2]. The original
users of RCM, aircraft manufacturers such as Boeing, have access to detailed
and comprehensive materials and structural data, and the in-house
competence to assess and act on a very detailed appreciation of the physics
and chemistry of the damage modes that are intended to be prevented by
maintenance tasks designed in accordance with RCM. Hence RCM has its roots
in companies that included life assessment and detailed failure mode analysts,
which is a capability that is rarely included in modern companies’
maintenance teams, [3].
The USAF distinguishes between two types of failure modes, [4]:
• Functional failure – the loss or part-loss of the intended design
capability of a maintenance unit (a maintenance unit is an item to be
maintained using something like a preventative maintenance work card; it
can be a part or a system)
• Engineering failure – sometimes called the damage mode, although
in fact the damage mode is only one attribute of the engineering failure.
It is the precise mechanism by which degradation of the maintenance
unit arises, and may be described as a type of damage (eg fatigue cracking,
abrasion wear, weld defects, etc) together with the environmental drivers
that promote the rate of damage
The work described in this paper closely considers the engineering failure
mode, and uses considerations from this assessment to design predictive
maintenance tasks to determine the presence of the relevant failure mode
and assess whether or not remedial action is required on the basis of
the extent of the flaw.
The FMECA MIL STD 1629A [5] calls for failure detection activities to
be entered into the assessment table towards the end of the FMECA process.
In contemporary maintenance engineering, with the growth of diagnostic
engineering including condition monitoring (determining failure modes
with the unit in operation or recently in operation), non-destructive
testing (NDT - determining likely time for material failure, usually associated
with static plant not in operation) and remote surveillance (automated
implementation of either condition monitoring or NDT), the failure detection
task is rapidly becoming dominant due to its affordability and the breadth
of the spectrum of damage modes that can be measured by the technologies.
Predictive Maintenance Activities in the Context of a Maintenance Strategy
Review
R Platfoot
1
Covaris Pty Ltd Strategic Reliability Conference, 2002
A common criticism of the RCM process is the time required to complete
the analysis and create worthwhile maintenance tasks. In general this
results from two problems: imperfect understanding of the process leading
to inefficiencies in its applications, and lack of project management
skills to control a long-term project within the scope of day-to-day operations,
[6]. However with the introduction of new predictive maintenance technologies,
plus consideration that insufficient people in most maintenance teams
have a good understanding of damage modes, the traditional process for
implementing RCM should be challenged. This is fully in keeping with the
spirit of the developers of RCM as described in Nowlan and Heap’s
source documentation, [3].
This paper considers the spectrum of predictive maintenance technologies
now available and where predictive maintenance fits within the traditional
RCM decision tree. Recent developments by the author and his colleagues
in implementing FMECAs with a focus on the failure detection task
are considered. Finally, an important outcome of an RCM analysis, namely
the equipment criticality ranking, is considered and its use in
prioritization of scheduling and risk management is demonstrated.
2. PREDICTIVE MAINTENANCE AND LIFE FORECASTING
The process of predictive maintenance and life forecasting is shown in
Figure 2.1. It is based on the principles of remnant life analysis and
integrates the original FMECA thinking with the conduct of condition-based
maintenance and the analysis of the inspection or surveillance data ensuing
from that maintenance, [7].
The principles of Figure 2.1 are aligned with common RCM literature whereby
a time at which it is possible to detect the failure is distinguished from a
remaining time to address the failure, [3,6]. The challenging part, which
is rarely described in detail in the same literature, is how to determine
the magnitude of these time increments.
Life forecasting is based on understanding the accumulation of damage
within a part or a system, and then based on an estimate of the rate of
that accumulation with a specified operating profile, calculating the
remaining time to failure. Undertaken properly this is a complex analysis
and can only be undertaken for critical, high cost capital equipment items
by domain experts.
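As a rough illustration of the calculation described above, the sketch below forecasts remaining time to failure from an accumulated damage measure and an estimated accumulation rate. It assumes damage grows linearly under a fixed operating profile, which is a simplification of the expert analysis the paper has in mind; the function name, parameters and example numbers are hypothetical.

```python
def remaining_life(current_damage: float, damage_limit: float, damage_rate: float) -> float:
    """Forecast operating time until accumulated damage reaches its limit.

    Assumption: damage accumulates linearly at `damage_rate` per unit of
    operating time for the specified operating profile.
    """
    if damage_rate <= 0:
        raise ValueError("damage_rate must be positive")
    # A part already at or past its limit has no remaining life.
    return max(damage_limit - current_damage, 0.0) / damage_rate


# Hypothetical example: 2.0 mm of wall thinning so far, 5.0 mm allowed,
# thinning at 0.5 mm per 1000 operating hours.
hours_left = remaining_life(2.0, 5.0, 0.5 / 1000.0)
```

The result is expressed in the same time unit as the rate, here operating hours. A real assessment would replace the linear rate with a damage model suited to the damage mode.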
Predictive maintenance, for the purposes of this paper, is categorized
as follows:
• Scheduled inspections – operator and maintenance staff checks
• Condition monitoring – a specialised form of check that ascertains
the presence of a damage mode within an operating item of equipment
• Non-destructive testing – a specialised check that ascertains the
presence of degradation within the material of an item of equipment,
usually conducted on a static item of plant
• Remote surveillance – feedback from installed equipment that indicates
either an unwanted mode of operation that will accelerate damage or the
presence of a damage mode itself
Predictive maintenance technologies are becoming more readily available
and more precise, and greater confidence can be given that they will find
damage modes, [8]. Further, the supporting software for many of the technologies
is delivering an “engineer in a box” so that diagnostic capabilities
are being better disseminated in the wider industry. As a result predictive
maintenance is getting cheaper and more accurate, improving its capability
to supersede other alternatives in the standard RCM decision making tree.
[Figure 2.1 flowchart, not reproduced. Recoverable elements: asset
specification; damage modes; operations impact (eg load, temperature, wear
particles); environmental acceleration (eg salt, ambient temperature);
functional failure considerations; design including service life and
purpose; a detection decision with yes/no branches leading to a conservative
maintenance strategy, failure of asset, scheduled task (discard, overhaul)
or selection of inspection task; inspection task types (simple inspection,
condition monitoring, non-destructive testing, performance analysis);
specification of task (periodicity, operating condition of asset,
internal/external access, skills, equipment, methods sheets); preventative
maintenance schedule; execution of task; return of check sheets (written
sheets, electronic data transfer); analysis (comparison to change-out
limits, trend analysis, life assessment); condition-based maintenance task;
forecast time to next task; forecast time to capital replacement;
criticality of task. The chart is annotated as including key elements of
the diagnostic-focused FMEA.]
Figure 2.1 Predictive maintenance
3. DIAGNOSTICS-FOCUSED FMEA
Within this section we will only cover FMEA rather than the FMECA. The
treatment of criticality will be undertaken in Section 5 of the paper.
The FMEA process adopted for the purposes of this paper, and which is
consistent with definitions within the MIL STD, comprises four stages:
1. Identification of failure modes – including both the functional
failure mode which is consistent with RCM requirements and the engineering
failure mode which includes definition of the damage mode and assists
with step 3 below
2. Identification of failure effects at different system levels
3. Determination of means to ascertain presence of the failure mode
4. Other information to be sought, including expert opinion to check assumptions
The emphasis on steps 3 and 4 is a variant from the emphasis or focus
of the MIL STD and general FMEA literature, which anticipate a back-to-back
RCM decision analysis process to be also undertaken. In this departure,
we are spending more time thinking on an appropriate surveillance task
than a scheduled change-out or discard task, or even run to failure, which
are all further options in the RCM decision logic.
The further variant is that the design change option in the RCM logic
is modified to a design change to allow improved surveillance rather than
modification to achieve function. The reasoning here is that in many cases,
the need for a design change is to handle transient peaks in operating
duty, which were unforeseen by the original designer. We may argue that
improved surveillance can assess the onset of such peaks and allow corrective
operation, or we can detect such peaks and analyse the loss of life subsequent
to the event. In the case that it is expected that the service life will
be reduced, the consequent design change or early retirement is a condition-based
corrective task in the future – it does not have to be considered
as part of the analysis.
The diagnostics-focused FMEA is therefore not so much concerned with prognostics
(the understanding of future problems) which is its traditional thrust
[9], but rather with ascertaining appropriate diagnostics strategy, which
is understanding how to detect problems well in advance of when necessary
action is required. Hence we are moving from a focus on functional failure
and what causes it, to determining the presence of engineering failure
modes. In this work engineering failure modes are defined as a damage
mode for which we understand the environmental parameters that initiate
the mode and then cause its propagation. This two-stage life is important:
in the case of common kinds of fatigue cracks, the crack initiation process
will take up perhaps 80% of the service life of the asset. Hence often
we can retain cracked items in service and simply monitor possible propagation
of those cracks. This is called tolerable flaw analysis. The FMEA is therefore
obliged to also collate understanding of the tolerable limit of the damage
mode before condition-based action is necessary.
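A minimal sketch of how a tolerable flaw limit might be used in practice is given below. It fits a straight line to a history of flaw size measurements and extrapolates to the tolerable limit. The linear trend, the function name and the inspection data are assumptions made for illustration only; real crack propagation is generally not linear and would need a fracture mechanics assessment.

```python
from typing import Optional, Sequence


def time_to_limit(times: Sequence[float], sizes: Sequence[float], limit: float) -> Optional[float]:
    """Forecast the time at which a monitored flaw reaches its tolerable limit.

    Fits a least-squares straight line to the inspection history and
    extrapolates it. Returns None when no growth is measured.
    """
    n = len(times)
    if n < 2 or n != len(sizes):
        raise ValueError("need at least two matching inspection records")
    mean_t = sum(times) / n
    mean_s = sum(sizes) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("inspection times must differ")
    slope = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, sizes)) / sxx
    if slope <= 0:
        return None  # no measurable propagation: keep monitoring
    intercept = mean_s - slope * mean_t
    return (limit - intercept) / slope


# Hypothetical history: crack length in mm at 0, 6 and 12 months,
# compared against an example tolerable limit of 10 mm.
forecast = time_to_limit([0.0, 6.0, 12.0], [4.0, 5.0, 6.0], limit=10.0)
```

The returned value is on the same time axis as the inspections, so it can be compared directly with the date of the next planned inspection.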
The fields used in a diagnostics-focused FMEA are set out below:
Item: Component name
Meaning: Specific item for which failure modes are being considered - may
be a subcomponent of a larger system

Item: Function
Meaning: The operation and purpose for the component - what it is supposed
to do and within what limits

Item: Function failure mode loss
Meaning: In what way can the function not be provided when required
Comment: Functional Failure is defined by USAF in their application of
FMECA as "The failure of an item to perform its normal or characteristic
actions within specified limits". We interpret "between specified limits"
as being a partial loss of availability - see next cell below. Functional
failure is included in specification of failure mode, however we also had
to identify the engineering failure mode information - hence the use of
terminology
Item: Function failure mode impaired
Meaning: In what way the function, which may still be available when
required, is not to the standard, output or capacity for which the component
was designed
Comment: Impairment of the function is a functional failure under the
definition provided in the Standard. The maintenance approach must deliver
quality and performance. However it is also important to distinguish between
total loss of capability (an availability loss) and a partial loss of
capability (or partial loss of availability)

Item: Functional failure mode operating condition
Meaning: When functional failure occurs, what is the operating condition
of the component - dormant, steady operation, transient, startup, shutdown
Comment: This is part of the failure mode info, and is a special field that
isolates this info - in fact the above two rows plus this one are all
about info rolled into the failure mode column of a traditional FMECA.
We split them out for readability and to ensure that the analyst
provides the comment

Item: Engineering failure mode condition criteria
Meaning: What is the nature of degradation, damage or decay which
constitutes the reason for the loss or impairment of function or divergence
from acceptable or safe condition
Comment: Nature of the EFM is cracking, thinning, or some form of loss of
integrity - this gets more complicated when we go into tolerable flaw
limits, eg crack length greater than 10 mm etc - hence the damage mode may
be present but does not yet represent a failure - many structures crack or
deform under first loading and are then OK for their remaining service life

Item: Engineering failure mode - damage mode
Meaning: Of the 22 known damage modes (eg creep, fatigue, corrosion, wear
etc) what is the precise name for the prevailing damage mode that has led
to engineering failure
Comment: Cracking may be due to thermal fatigue, cyclic loading, high
cycle/low cycle etc - hence we are seeking the metallurgical cause of the
failure condition criteria

Item: Engineering failure mode - internal/external
Meaning: Is the engineering failure mode evident from the outside of the
component and system which contains the component or does a machine or
system need to be opened up to expose the engineering failure mode
Comment: This is a critical field and is not necessarily shown up in failure
effects - we need to have defined whether this mode is internal (need a
strip down or some special technology to detect) or external (can possibly
be detected by a simple inspection). The reason it is here is that it is
part of the process in thinking about the failure mode

Item: Engineering failure mode - cause
Meaning: What are the physical processes, environmental or other parameters
that have contributed to the progress of the damage mode
Comment: Need to ask the question this way since we are still working on
failure mode identification and need to determine accelerants such as the
environment etc. Yes the idea is to prevent it, but we are not yet at the
step of prevention: we remain at the step of listing all information
identifying the failure modes

Item: Failure effect - local effect
Meaning: What is the consequence of the failure mode on the actual component
being studied

Item: Failure effect - next level effect
Meaning: What is the consequence of the failure mode on the machine or
system that contains the component

Item: Failure effect - system effect
Meaning: What is the consequence of the failure on the overall process,
production line, utility or business unit in which the component's system
or machine is located
Comment: As per all FMECA references including RCM II, we need to list a
number of levels of effects within a system - instead of lumping them all
into one field called failure effects we list them for different levels in
different fields. This helps a lot in evaluation of criticality

Item: Failure effect - consequence of failure
Meaning: Standard RCM consequences - safety & environment, operating loss,
no real loss, hidden failure - something else is now exposed to significant
loss if another component or system fails
Comment: This field is not strictly compliant with the MIL STD but we put
it in since we wanted somewhere to list the four RCM/RCM II consequences -
this is a field normally found on the RCM spreadsheet (eg MIL STD 2173,
RCM II) but it does no harm over here.

Item: Detection mode - inspection
Meaning: If a simple visual and measurement check is suitable, what is the
means by which the damage mode and the status of the failure mode can be
detected
Comment: We are not writing maintenance tasks yet. What we want is to
elaborate the traditional Failure Detection Means field into a number of
categories. This is a departure from the FMECA literature and is where we
consider this approach a new variant to FMEA

Item: Detection mode - overhaul
Meaning: If an overhaul is required to detect the damage mode, what is
required to be done

Item: Detection mode - condition monitoring
Meaning: If condition monitoring (eg vibration checks, temperature checks,
current analysis) is required using expert methods, what is the method that
will detect the damage mode (eg accelerometers, thermography, infrared
tachometry, oil analysis)
Comment: Primarily used for rotating failure modes as well as some
electrical modes

Item: Detection mode - NDT
Meaning: If NDT (eg thickness checks, crack checks) is required using expert
methods, what is the method that will detect the damage mode, eg
ultrasonics, dye penetrant, fluorescent, radiography, EMAT.
Comment: Primarily used for static failure modes

Item: Detection mode - does the equipment need opening up
Meaning: Based on past answers, simple yes/no

Item: Expert opinion sought - metallurgist
Meaning: Is an expert metallurgical opinion on the component required -
typically static plant type problems such as material loss, cracking,
corrosion, micrography
Comment: Major concern as to whether the general maintenance analyst is
competent to precisely define the metallurgical phenomena of the damage mode
and thereby define an appropriate predictive maintenance task. This is also
very helpful in tolerable flaw analysis: do we replace at the first sign of
damage or after damage has propagated beyond a tolerable limit

Item: Expert opinion sought - Process engineer
Meaning: Is an expert opinion on the process required - eg what are process
signs of a failure, what are the consequences of a failure
Comment: Important for feedback on environmental drivers of the damage mode,
with special considerations for likely transients or external factors on the
item to be maintained

Item: Expert opinion sought - condition monitoring
Meaning: Is an expert condition monitoring opinion required - typically
rotating plant type problems such as vibration, current analysis on heavy
drives, but also thermography, oil analysis etc.
Comment: New options available in condition monitoring or whether there is
a cost benefit for in situ monitoring to be established

Item: Compensating provisions - redundancy
Meaning: If the failure arises, is there redundancy in the process so that
the consequences are limited to either the component or the component's
immediate surrounding system or machine

Item: Compensating provisions - operational
Meaning: Can the operating staff continue to meet their production or
operating requirements through some work around or adjusting of the process
or production schedule

Item: Compensating provisions - design
Meaning: Are there aspects about the design that limit the consequences
(ie impact) of the failure mode
Comment: This is not covered in the failure effect and is a standard FMECA
field. Compensating provisions are used in most military applications of
FMECA for example, including USAF
The considerations within the diagnostics-focused FMEA listed above easily
transfer into a study of the condition monitoring strategy for major assets.
This is covered in the next section.
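One way to hold the fields of a diagnostics-focused FMEA in software is sketched below. This is an illustrative data structure only: the class name, the attribute names and the example record are hypothetical and carry a subset of the fields listed above, and the gap check simply reports whether any detection category has been filled in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiagnosticFmeaRecord:
    """One row of a diagnostics-focused FMEA (field names are illustrative)."""

    component: str
    function: str
    functional_failure: str   # loss or impairment of the function
    operating_condition: str  # dormant, steady operation, transient, startup, shutdown
    damage_mode: str          # eg fatigue, creep, corrosion, wear
    internal: bool            # True if the mode is hidden inside the equipment
    cause: str                # environmental or process drivers of the damage
    detection_inspection: Optional[str] = None
    detection_overhaul: Optional[str] = None
    detection_condition_monitoring: Optional[str] = None
    detection_ndt: Optional[str] = None
    experts_required: List[str] = field(default_factory=list)

    def detection_methods(self) -> List[str]:
        """Return the detection categories that have been filled in."""
        candidates = {
            "inspection": self.detection_inspection,
            "overhaul": self.detection_overhaul,
            "condition monitoring": self.detection_condition_monitoring,
            "NDT": self.detection_ndt,
        }
        return [name for name, method in candidates.items() if method]

    def has_detection_gap(self) -> bool:
        """True when no means of detecting the failure mode is recorded."""
        return not self.detection_methods()


# Hypothetical record for a pump bearing.
record = DiagnosticFmeaRecord(
    component="Pump drive-end bearing",
    function="Support the shaft under radial load",
    functional_failure="Bearing seizes and the pump stops",
    operating_condition="steady operation",
    damage_mode="wear",
    internal=True,
    cause="Lubricant contamination",
    detection_condition_monitoring="vibration analysis",
)
```

Records for which `has_detection_gap()` is true are candidates for the expert opinion fields, since no surveillance task has yet been identified for them.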
4. CONDITION MONITORING ASSESSMENT
The key issue in assessing a condition monitoring program is the need
for value assessment of the types of inspections and technologies adopted.
Value can be measured in terms of the following:
• Risk reduction – early warning of failure
• Reliability improvement – the maintenance tasking (i.e.
condition-based maintenance) is effective in preventing failures
• Cost improvement – the timing of maintenance tasks is optimized
and sufficient advanced planning assures minimum overall logistics costs
(including materials, transport, storage and time lost in waiting for
parts)
• Capital improvement – optimum timing for replacement of
capital assets based on pure reliability and business risk considerations,
not including capability development or generation of new capability
Hence there are two parts to the assessment – did we know what value
we were acquiring when establishing the condition monitoring program (this
would be answered by a diagnostics-focused FMEA or equivalent), and can
we measure the outcomes of the condition monitoring program (and general
predictive maintenance approach) in terms of the above criteria.
What has been implemented in this company is a selection of best practice
methods identified from a range of sources, not least the alliance with
Honeywell. This has been a commendable investment in policy, implementation
of technology and commitment to best practice. However the time has come
for so-called best practice methods to be challenged.
4.1 Percentage Maintenance Work Type
A preliminary analysis of value is identifying equipment, subject to well
understood and relatively common forms of condition monitoring, which
only achieve a degree of proactive maintenance of less than an optimum
percentage of the total maintenance hours allocated to them. In this case
proactive maintenance is a combination of scheduled preventative maintenance
and condition-based maintenance. The key issue for such equipment is how
much condition-based maintenance is actually driven by the scheduled inspection
and condition monitoring work.
Generic work types include:
BD: Breakdown maintenance - urgent repair to an equipment stoppage where
the process is impeded
CM: Corrective maintenance - scheduled task that is generated by an
unscheduled event such as an observation, a breakdown that can wait for
repair or an ad hoc request for maintenance
PM: Scheduled maintenance task, commonly called preventative maintenance,
but which covers all calendar or runtime scheduled tasks including predictive
maintenance tasks such as inspections and condition monitoring procedures
CBM: Condition-based maintenance - tasks generated as a result of another
scheduled (ie PM) task, typically an inspection or condition monitoring
procedure
Plant improvement: Ad hoc enhancement task or major scheduled refurbishment
conducted within the maintenance budget
From time to time other maintenance tasks may be necessary such as overheads,
investigations and so on.
If the work type definitions are used the proportion of PM to CBM will
indicate where PM inspections are providing value. The metrics used in
this work to analyse maintenance effectiveness include:
$\frac{BD + CM}{\text{All work types}}$: Proportion of reactive work to
combined proactive and reactive work
$\frac{CBM}{PM}$: Proportion of corrective action driven by scheduled
inspections to the total amount of scheduled work
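The two metrics, (BD + CM) / all work types and CBM / PM, can be computed directly from work order counts or hours per work type. The sketch below is illustrative: the function names and the example counts are hypothetical, and the dictionary keys simply follow the work type codes defined above.

```python
from typing import Mapping


def reactive_ratio(work: Mapping[str, float]) -> float:
    """(BD + CM) / all work types."""
    total = sum(work.values())
    if total <= 0:
        raise ValueError("no work recorded")
    return (work.get("BD", 0.0) + work.get("CM", 0.0)) / total


def inspection_yield(work: Mapping[str, float]) -> float:
    """CBM / PM: corrective work driven by scheduled inspections."""
    pm = work.get("PM", 0.0)
    if pm <= 0:
        raise ValueError("no PM work recorded")
    return work.get("CBM", 0.0) / pm


# Hypothetical annual work order counts by work type.
counts = {"BD": 100, "CM": 300, "PM": 500, "CBM": 50, "Plant improvement": 50}
reactive = reactive_ratio(counts)
yield_from_pm = inspection_yield(counts)
```

A low CBM / PM value alongside a high reactive ratio is the pattern the paper flags: many scheduled inspections, few resulting condition-based tasks, and continued reliance on corrective work.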
Two sample analyses are provided in Figure 4.1 for facilities where the
degree of scheduled inspections far outweighs their benefit in driving
corrective maintenance. Figure 4.1(a) is a standard KPI chart for assessing
the value of maintenance, and indicates two opportunities: reducing unnecessary
PM checks and reducing high reliance on CM work.
[Figure 4.1(a): bar chart of number of work orders (axis 0 to 1200) by work
type (PM, HK, CBM, CM, CA), annotated "Wasteful and inefficient PM’s" and
"High level of repair and replace work".]
(a) Proportion of annual work in various work categories. The total
expenditure on inspection work is known to be high compared to the value
being obtained. The chart provides insight into overall maintenance
performance: the proportion of PM:CBM is too high.
[Figure 4.1(b): "% P Tasks per Equipment Item, 8/99 - 2/01", showing % of
total work (axis 0 to 120) by equipment area, with regions marked "Check for
over maintenance", "Considered to be efficient work" and "Reliability
concerns".]
(b) Coverage of many asset types by proportion of PM to CBM work. Many
assets are over inspected in a known over-inflated maintenance budget. The
chart identifies key equipment areas where the maintenance approach may
need review: too many equipment areas are located in the region of high
%PM tasks.
Figure 4.1 Work type analysis
In Figure 4.1(b) the percentage of total tasks which may be designated
PM was plotted for each equipment area within a facility compared to
the total number of PM and CBM tasks. Working from the left hand side
of the graph, successive equipment areas may be checked for the possibility
of over maintenance or the too frequent scheduling of PM inspections and
condition monitoring procedures.
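The ranking behind a chart of this kind can be sketched as follows: for each equipment area, PM tasks are expressed as a percentage of PM plus CBM tasks, and the areas are sorted from highest to lowest so that the left hand side of the list is checked first. The area names, task counts and the 90% review threshold are hypothetical and chosen only to show the calculation.

```python
from typing import Dict, List, Tuple


def pm_share_by_area(tasks: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Return (area, %PM) pairs sorted from highest to lowest %PM.

    `tasks` maps each equipment area to (PM task count, CBM task count).
    Areas with no PM or CBM tasks are skipped.
    """
    shares = []
    for area, (pm, cbm) in tasks.items():
        total = pm + cbm
        if total > 0:
            shares.append((area, 100.0 * pm / total))
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Hypothetical task counts per equipment area: (PM, CBM).
areas = {"Area 1": (60, 40), "Area 2": (95, 5), "Area 3": (30, 70)}
ranked = pm_share_by_area(areas)

# Assumed review threshold, not taken from the paper.
REVIEW_THRESHOLD = 90.0
to_review = [area for area, share in ranked if share > REVIEW_THRESHOLD]
```

Areas at the top of the ranking generate little condition-based work from their scheduled tasks and are the first candidates for a review of inspection frequency.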
A more precise correlation with condition monitoring procedures was achieved
in another case study, where the proportion of proactive tasks for different
types of condition monitoring procedures applied in a hydro power utility
are tabulated:
Relevant Equipment
%Proactive Work
Condition monitoring checks
Bearings
23
Vibration analysis
Oil sampling – spectrographic
Governor
Intake systems
Turbine
Bearings
29
39
23
23
Oil sampling – particle count
Battery
59
DGA
Generator
52
PDA
Battery
83
Battery inspection
Rotor winding insulation resistance checks
Generator
52
Rotor winding impedance checks
The indications from this case study are that the effectiveness of the
condition monitoring procedures in driving down reliance on corrective
maintenance has some way to improve.
4.2 Failure Mitigation
A second form of analysis appraises failure modes that have occurred in
recent times and the ability of the technologies within the condition
monitoring approach to effectively detect these modes and generate tasks
to prevent them. Gaps between the condition monitoring strategy and the
failure mode history to date will indicate where a rethink of the techniques
may be warranted.
In one case study, the coverage of condition monitoring routines on historical
causes of stoppage is listed below. Using some approximation for partial
cover of some trip causes, condition monitoring addresses approximately
24% of the reasons as to why the targeted set of facilities stopped in
the 12 months of the data sample set. This number in itself represents
a KPI to be used for assessing the value of the condition monitoring.
Failure Code | % No. of Stops | Comment | Condition monitoring implications
Other | 19.4 | Undiagnosed trips or incomplete records kept - these trips were kept outside the statistical analysis | -
Electronic circuits and software | 11.8 | Hardware, software and integrated circuits. Remnant bugs from past capital work included in the data set. | Testing of software. Ongoing program is very good at identifying problems
Low voltage circuit failures | 9.0 | Could be a broken wire or random failure of a connection | -
Protection | 8.1 | False stops | -
Gauges | 7.6 | Random failures | No condition monitoring
Exciter | 5.7 | Typically wear out failures | Yes
High voltage circuit failures | 5.2 | - | -
Valves | 4.7 | - | No condition monitoring
Bearings | 4.3 | - | Oil Analysis, VA
Turbines | 3.8 | - | Some VA
Governor oil | 3.3 | Governor oil pump | Oil Analysis
Governor | 3.3 | - | Some
Pumps | 2.8 | Only big pumps | Not a lot of VA
Transformers | 1.9 | Over a specific size | DGA
General hazard | 1.4 | Health and safety issues | -
Pump sets | 1.4 | Motor and pump | No condition monitoring
Battery | 0.9 | - | Yes
Heat exchangers | 0.9 | Transformer and bearing cooling oil | -
Hydraulics | 0.9 | - | No condition monitoring
Lubricants | 0.9 | - | Yes
Mechanical fixtures | 0.9 | Mechanical links, levers, arms | -
Filters | 0.5 | Replaced on fixed time basis | No condition monitoring
Generator | 0.5 | - | Yes
Vessels | 0.5 | Inspection only | No condition monitoring
A considerable amount of additional data has been suppressed in the interests
of confidentiality, but tables of this kind provide an insight into the
likely impact of condition monitoring on current failure modes afflicting
a facility or organisation.
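The coverage KPI described above can be sketched as a weighted sum over the stop causes. In the example below the percentages for four failure codes are taken from the table, but the cover weights (1.0 for full cover, 0.5 for partial cover, 0.0 for none) are assumptions, as is the choice to normalise by the stops included in the sample. Because only a subset of rows is used, the result is not expected to match the approximately 24% quoted for the full data set.

```python
from typing import Iterable, Tuple

# (failure code, % of stops, cover weight between 0.0 and 1.0)
StopRecord = Tuple[str, float, float]


def cm_coverage(records: Iterable[StopRecord]) -> float:
    """Percentage of recorded stops addressed by condition monitoring."""
    rows = list(records)
    total = sum(pct for _, pct, _ in rows)
    if total <= 0:
        raise ValueError("no stops recorded")
    covered = sum(pct * weight for _, pct, weight in rows)
    return 100.0 * covered / total


# Subset of the table; the weights are assumed for illustration.
sample = [
    ("Exciter", 5.7, 1.0),   # condition monitoring: yes
    ("Valves", 4.7, 0.0),    # no condition monitoring
    ("Bearings", 4.3, 1.0),  # oil analysis, VA
    ("Turbines", 3.8, 0.5),  # some VA, treated as partial cover
]
coverage = cm_coverage(sample)
```

Tracking this figure from year to year shows whether changes to the condition monitoring strategy are closing the gaps against the failure history.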
In another analysis, the distribution of failure modes in a population
of pumps was considered. The results are plotted below.
[Figure: bar chart of total cost ($) of corrective work actions by category:
drive electrical, rotating assembly, gland and sealing, instrumentation,
valve and positioner, spools and lines, belts, bearing.]
Figure 4.2 Cost of corrective actions for a grouping of pumps
The analysis of the failure modes indicates a very high percentage of
failure associated with the electrical drives. Problems with the rotating
assembly and bearing failures are considerably less. The primary option
for condition monitoring suggested by this type of analysis is performance
surveillance, where issues associated with over-driving of the pumps and
operating pumps bogged with product need to be monitored. VA remains a
relevant option for condition monitoring, and in the case above is meticulously
undertaken, but will not prevent much of the failure rate indicated in
the figure.
5. CRITICALITY RANKING AND ITS APPLICATION
Assessment of the criticality ranking of equipment is an important part
of the establishment of a maintenance system. It is particularly significant
in the design of a predictive maintenance strategy for the following reasons:
1. Scheduled maintenance may lapse under the pressure of corrective and
breakdown work at poorly scheduled or under-resourced sites
2. The periodicity of the task is based on a perception of risk –
we do not know precisely when items will fail – and criticality
is an effective metric of risk
3. Investment in external services for advanced condition monitoring or
in capital equipment for in situ surveillance may be justified on the
basis of risk being addressed
The RCM Turbo tool provides a means to establish the overall criticality
of an item, based on answers to 12 questions. The rankings for the analysis
include:
1. High criticality/statutory item
2. Medium criticality item
3. Low criticality item
4. Non critical item
An extract from the results of one study using RCM Turbo for criticality
analysis is shown below:
ID | Comment | Risk Level | Turbo RCM Code
BEN04/BAC10/BB | Accumulator and Pipe Work | 1 | 12 332232
BEN04/BAY/010 | Protection tripping relay | 1 | 11 122231
BEN04/BAY/020 | Protection Measuring Relay | 1 | 11 111231
BEN04/LP14 | Draft tube system | 4 | 333331331 2322333223
BEN04/MKA30/BU040 | Slip Rings | 4 | 331232111 2222323212
BEN04/MKA10/HA020 | Stator windings | 5 | 321132111 11122231
BEN04/MKA30/HB001 | Rotor poles 1- 36 | 5 | 321132111 111 12231
As a consequence RCM Turbo is able to set an equipment criticality ranking.
Another process used by the author in establishing equipment criticality
is a simple ranking basis according to the table below:
Criticality Level | Plain English description of Criticality
(Least consequence)
A | No immediate problem when item fails AND Can be fixed when convenient
B | Nuisance when item fails AND Can be fixed when convenient
(Target level for a maintainable item)
C | Process slows when item fails OR Can usually be fixed in less than a week
D | Quality of the product deteriorates when item fails OR Cannot be fixed within a week, but can be fixed within 4 weeks AND No impact on downstream manufacturing processes
(Greatest consequence)
E | Product supply to client as scheduled stops OR OH&S or environmental consequences OR Greater than 4 weeks to repair
In considering what constitutes an E level item, the following issues
should be considered:
• The contingency cost of failure – if a downstream process
is to be delayed, such as assembly, supply to client or even final assembly,
then the contingency costs of urgency to deliver or produce need to be
considered
• Risk to OH&S and environment – the maintenance policy
is to treat all such threats as high criticality, irrespective of the
level of harm, with the exception of minor cuts and injuries that can
be treated with a first aid kit.
This represents a departure from some authorities [10] where multiple
criteria are ranked against each other, e.g. an OH&S incident of differing
severity is matched against an appropriate level of financial loss.
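One way to read the A to E table is as a rule that takes the more severe of two criteria, the operational effect of the failure and the time to repair, with OH&S or environmental consequences always ranked E. The sketch below is a minimal illustration of that reading, not the author's procedure. The effect labels, the function names, the treatment of a repair of exactly one week as level C, and the escalation of a D item to E when downstream processes are affected are all assumptions made for the example.

```python
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    A = 1
    B = 2
    C = 3
    D = 4
    E = 5


# Hypothetical labels for the operational effect column of the table.
EFFECT_LEVEL = {
    "none": Level.A,
    "nuisance": Level.B,
    "process_slows": Level.C,
    "quality_deteriorates": Level.D,
    "supply_stops": Level.E,
}


def repair_level(repair_weeks: Optional[float]) -> Level:
    """Map repair time to a level; None means 'can be fixed when convenient'."""
    if repair_weeks is None:
        return Level.A
    if repair_weeks <= 1:   # assumption: exactly one week is still level C
        return Level.C
    if repair_weeks <= 4:
        return Level.D
    return Level.E


def classify(effect: str,
             repair_weeks: Optional[float] = None,
             safety_or_environment: bool = False,
             downstream_impact: bool = False) -> Level:
    """Return the criticality level as the more severe of effect and repair time."""
    if safety_or_environment:
        return Level.E
    level = max(EFFECT_LEVEL[effect], repair_level(repair_weeks))
    # Assumption: level D requires no downstream impact, so escalate otherwise.
    if level >= Level.D and downstream_impact:
        return Level.E
    return level


if __name__ == "__main__":
    print(classify("process_slows", repair_weeks=0.5).name)
    print(classify("quality_deteriorates", downstream_impact=True).name)
```

Taking the maximum of the two criteria reproduces the AND wording for levels A and B (both criteria must be benign) and the OR wording for levels C to E (either criterion is enough).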
5.1 Use of Criticality – Work Backlog
A possible policy for the acceptable limits for backlog data is defined
and presented in the table below. The risk levels indicated are based
on a consideration of the combined equipment criticality and task criticality
associated with each work order. Equipment criticality is as described
above and can be set at the establishment of the maintenance policy. The
task criticality may be assigned by the maintenance planner, and refers
to the perceived urgency of the task, ranging from correcting a hazardous
situation to updating drawings or painting a non-critical item.
Risk Level \ Asset Criticality | A | B | C | D | E
1 | 1 | 2 | 3 | 4 | 5
2 | 2 | 4 | 6 | 8 | 10
3 | 3 | 6 | 9 | 12 | 15
4 | 4 | 8 | 12 | 16 | 20
5 | 5 | 10 | 15 | 20 | 25
Risk Levels – 1, 2 and 3
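The entries in the matrix above are the product of the two rankings, with asset criticality A to E taken as 1 to 5. The sketch below regenerates the matrix on that basis. It reads the row index as the task criticality described in Section 5.1, which is an interpretation, and the function and variable names are hypothetical. It does not reproduce the grouping of cells into risk levels 1, 2 and 3.

```python
# Asset criticality letters mapped to the numeric rank used in the matrix.
ASSET_RANK = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


def combined_score(asset_criticality: str, task_criticality: int) -> int:
    """Combined score = asset rank (A..E as 1..5) x task criticality (1..5)."""
    if not 1 <= task_criticality <= 5:
        raise ValueError("task criticality must be between 1 and 5")
    return ASSET_RANK[asset_criticality] * task_criticality


def score_matrix() -> dict:
    """Build the full 5 x 5 matrix keyed by task criticality, then asset letter."""
    return {
        task: {asset: combined_score(asset, task) for asset in ASSET_RANK}
        for task in range(1, 6)
    }


if __name__ == "__main__":
    print("    " + "".join(f"{asset:>4}" for asset in ASSET_RANK))
    for task, row in score_matrix().items():
        print(f"{task:>4}" + "".join(f"{score:>4}" for score in row.values()))
```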
This policy can be presented graphically in a backlog report to distinguish
between acceptable and risky performance. The report also identifies the
specific tasks that either threaten the assets or need their criticality
reviewed over time.
Level | Preferred Maximum Time in Backlog | Recommended Response to Remove from Backlog
1 | 2 months | Review if task is necessary – delete if not; increase task criticality if it is
2 | 1 month | Increase task criticality to expedite attention
3 | 1 week | Urgent priority to address task
Policy for Addressing the Backlog
The graphical interpretation of the above policy is presented in Figure
5.1.
[Figure: Backlog Example. Horizontal axis: Days (0 to 200); vertical axis: 0 to 30.]
Figure 5.1 Graphical presentation of the Policy for Addressing the Backlog
The area to the left of the policy line is the acceptable performance
area, where tasks in backlog represent an acceptable level of risk. The
area to the right of the policy line identifies tasks that are risks to
the assets or require their criticality to be reviewed over time.
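The backlog policy can be sketched as a simple check of each work order against the preferred maximum time for its risk level. The maximum times and recommended responses below come from the policy table. The band edges that convert a combined score into risk level 1, 2 or 3 are NOT given in the text and are placeholders only, a month is approximated as 30 days, and the WorkOrder structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Preferred maximum time in backlog per risk level (month taken as 30 days).
MAX_DAYS_IN_BACKLOG = {1: 60, 2: 30, 3: 7}

RESPONSE = {
    1: "Review if task is necessary; delete if not, increase task criticality if it is",
    2: "Increase task criticality to expedite attention",
    3: "Urgent priority to address task",
}

# PLACEHOLDER band edges: the paper does not state which scores fall in which level.
LEVEL_2_MIN_SCORE = 8
LEVEL_3_MIN_SCORE = 15


@dataclass
class WorkOrder:
    order_id: str
    combined_score: int   # asset criticality rank x task criticality
    days_in_backlog: int


def risk_level(score: int) -> int:
    """Map a combined score to risk level 1, 2 or 3 using the placeholder edges."""
    if score >= LEVEL_3_MIN_SCORE:
        return 3
    if score >= LEVEL_2_MIN_SCORE:
        return 2
    return 1


def overdue_response(order: WorkOrder) -> Optional[str]:
    """Return the recommended response if the order exceeds its limit, else None."""
    level = risk_level(order.combined_score)
    if order.days_in_backlog > MAX_DAYS_IN_BACKLOG[level]:
        return RESPONSE[level]
    return None


if __name__ == "__main__":
    for wo in (WorkOrder("WO-1", 4, 45), WorkOrder("WO-2", 20, 10)):
        print(wo.order_id, overdue_response(wo) or "within policy")
```

Orders for which the function returns None sit to the left of the policy line; the others are the tasks a backlog report would flag.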
5.2 Criticality of Condition Monitoring Task
In the material provided in Section 5.1, a key feature in assigning the
required time for task completion is assessing a combined risk score based
on both equipment and task criticality. It has been shown that fundamental
RCM thinking, managed through a tool such as RCM Turbo, can assign equipment
criticality. It is less clear at what level the task criticality
should be set for a condition monitoring task assigned to an item of
equipment.
Periodicity of the task can be set by one of the following:
1. Subjective consideration of the damage mode being investigated
2. Failure rates such as derived from the material provided in Section
4.2 of this paper, covering recent history of failures within a time period
3. Loss of reliability analysis – periodicity set on probability
of failure plus economic return from inspection, [2] – this is a
process rarely followed due to inadequate reliability information
MIL-STD-1629A [5] provides the following formula for assigning task criticality:
$$C_m = \beta \, \alpha \, \lambda_p \, t$$
$C_m$ is the criticality number of the failure mode, $\beta$ is the conditional
probability of mission loss, $\alpha$ is the failure mode ratio, $\lambda_p$ is the part
failure rate and $t$ is the duration of the applicable mission phase, expressed
either in hours or cycles. $\beta$ and $\alpha$ are probability factors between
0 and 1, and may be set subjectively. The key item of information is $\lambda_p$,
which is an empirical measure of the expected number of failures of a
single mode of failure within an idealized mission of duration $t$.
It is therefore extremely valuable if a value for $\lambda_p$ can be ascertained
at the time of establishing a condition monitoring task, since this will
provide a credible basis for ranking all tasks within a predictive
maintenance strategy.
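As a worked illustration, the MIL-STD-1629A criticality number $C_m = \beta \alpha \lambda_p t$ can be computed for each candidate condition monitoring task and used to order the tasks. The sketch below assumes a mission duration of one year of continuous operation (8760 hours). The task names and every numeric value are hypothetical and chosen only to show the arithmetic; they are not data from the studies described in this paper.

```python
def criticality_number(beta: float, alpha: float, lambda_p: float, t: float) -> float:
    """Failure mode criticality number C_m = beta * alpha * lambda_p * t."""
    for name, value in (("beta", beta), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    if lambda_p < 0 or t < 0:
        raise ValueError("lambda_p and t must be non-negative")
    return beta * alpha * lambda_p * t


# Hypothetical tasks: (name, beta, alpha, lambda_p per hour, t in hours).
TASKS = [
    ("Stator winding insulation test", 0.9, 0.3, 2e-5, 8760),
    ("Slip ring brush inspection", 0.3, 0.5, 1e-4, 8760),
    ("Accumulator pressure check", 0.1, 0.4, 5e-5, 8760),
]


def rank_tasks(tasks):
    """Return (name, C_m) pairs sorted from highest to lowest criticality number."""
    scored = [(name, criticality_number(b, a, lam, t)) for name, b, a, lam, t in tasks]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, c_m in rank_tasks(TASKS):
        print(f"{name}: C_m = {c_m:.4f}")
```

Because $\beta$ and $\alpha$ may be set subjectively, the ranking is only as credible as the estimate of $\lambda_p$, which is the point made in the text.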
6 CONCLUSION
This paper has interwoven three key themes:
1. Decision theory for selecting the correct maintenance procedure
2. The growth of predictive maintenance and its future directions in the
area of prognostics
3. Assignment of equipment criticality and its use in management of tasks
The outcome of the work to date indicates that RCM processes retain their
effectiveness in determining the correct maintenance approach. However, the
time the maintenance analyst spends on some of the housework associated with
determining scheduled replacement and discard tasks should be tested for
effectiveness against moving directly to the assumption of a predictive
maintenance approach and a greater reliance on condition-based maintenance.
The traditional issue with this is that at times it can be more expensive
to survey and inspect than to simply replace or run to failure. The steady
introduction of remote surveillance and lower cost condition monitoring
processes that track a wider spectrum of damage modes can be seen to challenge
this thinking.
Having said this, the RCM processes of assessing criticality should be
considered to have greater relevance in a condition-based maintenance
regime, owing to the reliance on criticality rankings to schedule priority
tasks within a shrinking resource base of maintenance tradespeople. This
is a second adjustment to the traditional thinking in the application
of RCM [11], where the criticality data is a scheduling tool and is also
used in site management of the risk associated with backlog.
The original RCM paradigms remain as relevant today as when they were
first established in the late 1960s, simply because they were founded
on common sense and a practical appreciation of maintenance realities.
The intent of this paper was to present some contemporary thinking on
the application of RCM standards and tools in a condition-based maintenance
environment where there is a strong take-up of condition monitoring technologies.
A number of key recommendations are made in this paper:
1. The failure mode process in the FMECA needs to be accompanied by a
strong analysis of failure detection tasks.
2. The strengthening of the failure detection task analysis in the FMECA
may bias the RCM decision making tree to fault finding tasks with less
time considering alternative scheduled replacement and discard tasks,
and even run to failure considerations. The option for redesign as the
ultimate end point if all else fails in the RCM decision making process
may indeed be redesign for improved surveillance rather than redesign
to support the intended function.
3. RCM asks a series of key questions, the answers to which may be couched
in terms of criticality contributions. This is how RCM Turbo implements
this distinctive and essential aspect of RCM. The output from these questions
can be provided as an overall condition criticality ranking that is essential
for improved scheduling and risk management monitoring with the equipment
in service.
4. With the removal of scheduled replacement and discard tasks, the apparent
risk of failure in the equipment may increase as conservative time-to-replace
guidelines are superseded. The reliance on surveillance can be associated
with increased risk where the original maintenance systems designer did not
anticipate failure modes that arise in the future. Hence processes to mitigate
risk such as reliability engineering and criticality analysis of backlog
become critical – the owner of the assets may exploit condition
monitoring to reduce the total size of the maintenance support team, but
should balance these reductions with human investment in analysis and
scheduling processes.
The final comment that can be made is a challenge as to whether the RCM
implementation team sufficiently distinguishes between functional failure
and, for want of a better term, engineering failure modes (which we may
also call damage modes), which may be taken to mean the precise physical
mechanism of degradation (e.g. fatigue, abrasion, weld failure, etc.) that
a failure represents. Only through a very precise understanding of the
damage mode can an effective surveillance and condition monitoring process
be implemented with appropriate timing of tasks to assure against unnecessary
risk.
ACKNOWLEDGEMENT
The author acknowledges the assistance provided by many companies, his
colleagues at Covaris and many experts within the New Zealand and Australian
maintenance community to the research described in this paper.
The support of Meridian Energy on aspects of the condition monitoring
material and RCM Turbo data is greatly appreciated.
Comments on this paper can be emailed to r.platfoot@covaris.com.au.
REFERENCES
1. JV Jones, Integrated Logistics Support Handbook, McGraw Hill 2ed, 1994
2. MIL-STD-2173 (AS), Reliability-Centered Maintenance Requirements for
Naval Aircraft, Weapons Systems and Support Equipment, AMSC N3769, DoD
(US)
3. FS Nowlan & HF Heap, Reliability-Centered Maintenance, DTIC (US),
1978
4. IRCMS Overview, presentation by Naval Aviation Systems Team, USAF,
2000
5. MIL-STD-1629A, Procedures for Performing a Failure Mode, Effects and
Criticality Analysis, AMSC N3074, DoD (US)
6. J Moubray, Reliability-centred Maintenance, Butterworth Heinemann 2ed,
1997
7. H Dupow & G Blount, A Review of Reliability Prediction, Aircraft
Engineering and Aerospace Technology, 69, 1997
8. Preventative Maintenance Strategies using Reliability-Centred Maintenance
(RCM), NASA Technique PM-4, 1995
9. G Hinchcliffe, Taking the Mystery out of Reliability-Centered Maintenance,
ASME 94-JPGC-NE-5, 1994
10. Risk Management Guidelines, Airservices Australia, Draft version 1.0,
1997
11. SAE Surface Vehicle Aerospace Standard JA1011, Evaluation Criteria
for Reliability-Centred (RCM) Processes, 1999