The Benefits of Practicing Enterprise Risk Reduction
R.A. Platfoot
University of New South Wales
Sydney NSW 2052
Risk management is a pervasive tool which both production and maintenance
groups can employ as a basis for communicating a perception of risk and
for setting priorities in the absence of detailed information. Used
correctly, a quantifiable description of risk, such as a risk level or
safety index, can streamline work and the consequent investment needed to
carry out that work. This paper describes a simple approach to quantifiable
risk assessment which can be tailored for smaller companies and form the
basis for more sophisticated approaches by larger organisations. Examples
and systems are provided to show how the recognition of risk can be
exploited in optimising maintenance systems.
1. Introduction
There is an increasing recognition of the need to move away from prescriptive
maintenance systems, which are procedure-heavy and usually unwieldy to
operate, towards more goal-oriented planning and execution of maintenance.
Zebroski [1] noted some symmetries in the disasters of Chernobyl (1986),
Three Mile Island (1979) and Challenger (1986):
• Long term successful engineering program with considerable expert
knowledge available.
• Major military involvement.
• Repetitive service by the asset.
• Compartmentalisation of data and introduction of in-house politics.
These aspects were repeated in the Piper Alpha disaster (1988) [2] where
a strong “mechanistic” maintenance approach was in place.
Bringing these large disasters back to an everyday manufacturing and utility
context, prescriptive maintenance is a recipe for the following problems:
1. Inefficient work
2. Ignorance of a hazard
3. Stultification of people's thinking
4. The means to the objective becomes the objective. An example arises
where completing work order cards overtakes plant reliability as the objective
of maintenance.
Risk management is the technique which can break the fixed mould of programmed
maintenance, introducing an element of common sense into decision making.
The assessment of risk has four primary elements [3]:
1. Definition of the scope of technical issues associated with the possibility
of the hazard such as plant type, what is affected and criteria for setting
criticality.
2. The contribution of human perception and understanding such as the
subjective judgment of risk, and company policy and work culture.
3. Operational demands which require specific levels of plant availability,
and the high costs for assembling capital for plant replacement.
4. The greater demands introduced by environmental legislation, work place
safety, the handling of hazardous goods and insurance liability.
In other words: where are the problems, how will people react to them,
what will their effect be on the business operations, and what penalties
may be inflicted upon the company in the event of an incident?
There are two areas in which risk particularly impacts on maintenance
management. The first is that, like most quality based systems, risk
management places a high focus on record keeping and documentation. As a
consequence, maintenance work such as surveys and plant checking will be
required to provide information, in the same way that production logs
contribute to the process. The second area is that risk assessment can
provide a proactive basis for maintenance planning. As risks are identified,
maintenance plans and objectives may be modified so that work is
implemented to assist in reducing hazards. This type of work is directed
towards replacing or refurbishing faulty equipment, redesigning a potential
problem area, and fitting safety equipment.
2. Systems Development
The systems associated with the maintenance of a company’s assets
are shown in Figure 1. This section covers general comments associated
with these systems and how they must interact in a Reliability-centred
maintenance (RCM) style environment. RCM recognises the following tenets:
• The reliability of a part cannot be improved by maintenance activity
beyond the inherent level set by its design. For example, no level of
maintenance attention is going to improve the behavior of a low alloy
steel component in an aqueous environment compared to a redesign with
the part being replaced by stainless steel.
• The reliability of the part is dependent on the operating environment
within which it carries out its specified function. It should be noted
that the reliability may be increased with improvements to that environment
whereas it will deteriorate as the conditions become more aggressive.
• Maintenance management, and in particular the planning process,
is subject to systematic strategies which may be generic for a wide range
of circumstances. In this manner the planning process may be elevated
above a sequence of responses based solely on long term experience.
[Figure 1: flowchart. Box labels: Plant identification (CMMS plant
dictionary); Manual/SCADA availability/performance monitoring; Plant
inspection program and condition monitoring; CMMS database; Maintenance
work order procedures; RCM audit of work procedures (BD, PDM, PM*);
Detailed risk audit, breaking the plant down into key areas; Hierarchical
plant dictionary; Spreadsheet of procedures; All procedures are entered
into the CMMS; Work completely scheduled from the CMMS; MW consider
improved utilisation and analysis of production management information;
What are key trouble spots?; Business risk analysis, what are the most
important facilities?; Prioritise the surveillance program; Can we
tolerate a failure?; Can we inspect while in service?; Can we inspect
while out of service?; Minimise planned maintenance work; Remove PMs;
Edit the CMMS procedural base; Upgrade the reporting and analysis of the
CMMS; Capture plant condition and risk information.]
* BD - Breakdown work, PDM - Predictive maintenance, PM - Preventative
Maintenance
Figure 1 Integration of risk with maintenance information systems
In one US power industry study [4], 123,057 cyclic work orders were studied
using an RCM approach, identified as the RCM audit of work procedures in
Figure 1. The outcome of this study was as follows:
Share     Tasks     Outcome
3.62%     4,455     PM tasks added - predictive methods replacing cyclic work
17.91%    22,039    PM tasks deleted
12.46%    15,333    Frequency changed for remaining cyclic PM tasks
33.99%    41,827    PM tasks modified or changed in some way (TOTAL)
Clearly a benefit can be found using RCM to modify an existing, well defined
cyclic maintenance program. This example demonstrates the commercial benefit
of integrating a risk approach to setting priority for maintenance.
The biggest factor associated with the success of an RCM analysis is the
availability of data. If there is little data within the company then
experience suggests that there is insufficient basis for setting or prioritising
work orders. Without this knowledge the company should initially establish
a cyclic program [5], which will at least capture accurate information
about asset integrity.
The mistake that can be made is to compile volumes of data to little purpose,
so that maintenance improvement becomes a data collection exercise rather
than one which establishes immediate performance improvements in the plant.
The way to avoid this mistake is to employ a business risk study very
early in the process in order to prioritise what information needs to
be collated.
3. Risk Assessment
Risk management is central to the RCM approach, ensuring that the review
of maintenance practice is conducted according to business priorities
and that the process is not sunk by the need to handle large volumes of
data. A five level risk ranking was established for one site which was
explicit in terms of differentiating between the various levels:
1. No problems
2. Cosmetic problems, e.g. needs painting.
3. Requires work but is not process critical, e.g. hoses cracked.
4. Requires maintenance as soon as convenient, e.g. major oil leak.
5. Imminent breakdown, e.g. no oil left in the bearing.
Using such simple language, the level of risk should have meaning to all
people associated with the organisation. This is important since the risk
system needs to be utilised by a wide cross section of people. The suggested
activities for maintenance and production are tabulated:
Risk Level  Maintenance                                  Production
1           Do not schedule work                         OK for use
2           Do not schedule work                         OK for use
3           Schedule work into a normal program by       OK for use
            bundling together work in that area
            CYCLIC MAINTENANCE
4           Schedule work into a production window and   Provide production window
            seek that window as soon as possible.        Not preferred to operate
            OPPORTUNISTIC MAINTENANCE
5           Dispatch people now.                         Provide isolation to allow BD task
            BREAKDOWN MAINTENANCE                        Operate only in emergency
It is necessary that these rankings are agreed between both operations
and maintenance staff such that each group may have reasonable expectations
of the other. In addition it is important that the meanings are identical,
irrespective of whether the assets are mechanical, electrical or civil.
One of the above levels is a score which may be employed in the following
circumstances:
1. Setting priority for a work order
2. Acting as a qualitative measure from a visual inspection
3. Condition of the plant either as found at the start of a job, or as
left when the trades person completes a job
For example, an operator or maintenance person may identify a piece of
equipment as in condition 4, resulting from an inspection of the plant.
Hence a work order is raised with a risk level of 4. That carries meaning
to both production and maintenance as per the table set out above. When
the trades person acquits the work, they should sign off that the condition
of the plant has been reset to a level 1 or 2.
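The work order flow described above can be sketched in code. This is a minimal illustration only: the `RiskLevel`, `MAINTENANCE_RESPONSE` and `WorkOrder` names are hypothetical and do not come from any particular CMMS, and the asset name is invented.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class RiskLevel(IntEnum):
    """The five-level ranking described in this section."""
    NO_PROBLEMS = 1
    COSMETIC = 2
    NOT_PROCESS_CRITICAL = 3
    AS_SOON_AS_CONVENIENT = 4
    IMMINENT_BREAKDOWN = 5


# Maintenance response for each level, paraphrased from the table above.
MAINTENANCE_RESPONSE = {
    RiskLevel.NO_PROBLEMS: "do not schedule work",
    RiskLevel.COSMETIC: "do not schedule work",
    RiskLevel.NOT_PROCESS_CRITICAL: "cyclic maintenance",
    RiskLevel.AS_SOON_AS_CONVENIENT: "opportunistic maintenance",
    RiskLevel.IMMINENT_BREAKDOWN: "breakdown maintenance",
}


@dataclass
class WorkOrder:
    """Hypothetical work order carrying as-found and as-left condition."""
    asset: str
    as_found: RiskLevel
    as_left: Optional[RiskLevel] = None

    def acquit(self, as_left: RiskLevel) -> None:
        """Sign off the job; the plant is expected to be reset to level 1 or 2."""
        if as_left > RiskLevel.COSMETIC:
            raise ValueError("plant should be left at risk level 1 or 2")
        self.as_left = as_left


# An inspection finds the asset in condition 4, so the order carries level 4.
order = WorkOrder(asset="conveyor gearbox", as_found=RiskLevel(4))
print(MAINTENANCE_RESPONSE[order.as_found])
order.acquit(RiskLevel.NO_PROBLEMS)
```

Rejecting a sign-off above level 2 is one possible policy choice; a site may prefer to record the as-left level and flag it for review instead.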
The status of a plant needs to be reported so that operators are continually
aware of any threats to production. The levels of reporting which were
suggested included:
Risk Level  Production Management                        Operators
1           -                                            -
2           -                                            -
3           All level 3 jobs tabulated and the           -
            summary considered
4           All level 4 jobs separately tabulated with   All level 4 jobs identified
            some being scrutinised
5           All level 5 jobs described                   All level 5 jobs identified
What this table means is that a log has to be kept with the operators,
identifying when an item of plant is subject to a risk condition 4 or
5. When an item is repaired and returned to a level 1, 2 or 3 state the fact
must be reported to the operators so that they know that the asset is
free for use with no restrictions. A trades person returning an asset
from a level 4 or 5 condition should be obliged to complete a visual inspection
and report to ensure against secondary damage or other undetected damage.
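The operators' log described above can be sketched as follows. This is an illustrative outline under simple assumptions: plant condition is held in a plain dictionary keyed by asset name, and the function and asset names are hypothetical.

```python
def restricted_assets(condition):
    """Assets the operators' log must flag: those at risk level 4 or 5."""
    return sorted(asset for asset, level in condition.items() if level >= 4)


def record_condition(condition, asset, new_level):
    """Store a new level for the asset.

    Returns True if the asset has just been released from a level 4 or 5
    restriction back to level 1, 2 or 3, which is the event that must be
    reported to the operators.
    """
    was_restricted = condition.get(asset, 1) >= 4
    condition[asset] = new_level
    return was_restricted and new_level <= 3


plant = {"boiler feed pump": 4, "cooling fan": 2, "main conveyor": 5}
print(restricted_assets(plant))
released = record_condition(plant, "boiler feed pump", 1)
```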
Production management will need to be aware of the following:
1. The balance of work which is at risk levels 3, 4 and 5. A high proportion
of work at risk level 4 or 5 is a KPI indicating that maintenance needs to
improve its effectiveness. In addition, such high levels would correlate
with high expenditure per job, which is something we are trying to avoid.
2. Access to a summary of level 4 jobs on request. This is because they
may need an explanation as to why the job crept over the level 3 mark
and, secondly, they may be aware of production circumstances which may
raise a level 4 to a level 5.
3. Detailed review of level 5 since these are direct threats to their
business performance. Each level 5 needs to be reviewed in detail with
KGFM to ascertain how it can be prevented in the future. The number of
level 5 tasks is as significant a KPI as the downtime rate.
4. The number of level 5 jobs called in by operators. MW need to ensure
that their operators do not needlessly call in a level 5 job request which
would incur the financial penalty of a call out.
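The first and third indicators in this list reduce to simple counts over the risk levels of the work orders. A minimal sketch, assuming the levels are available as a plain list of integers (the function name and sample data are hypothetical):

```python
from collections import Counter


def risk_kpis(job_levels):
    """Share of jobs at levels 3, 4 and 5, plus the count of level 5 jobs."""
    counts = Counter(job_levels)
    total = len(job_levels)
    share = {
        level: (counts[level] / total if total else 0.0)
        for level in (3, 4, 5)
    }
    return {"share_by_level": share, "level_5_count": counts[5]}


# Hypothetical risk levels taken from one reporting period's work orders.
kpis = risk_kpis([3, 3, 4, 5, 1, 2, 3, 4])
print(kpis)
```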
4. Quantitative Risk Assessment
Criticality is a direct measure of the priority which should be allocated
to a plant asset, whether that attention be through maintenance work or
capital upgrade. It is a product, calculated as follows:
Criticality = Hazard × Likelihood
where hazard is a measure of the undesirability of an outcome and likelihood
is the probability that the outcome will actually occur. This approach has
also been used by the US petroleum industry for risk-based inspection
programs [6].
A criticality ranking can be used for the following purposes:
• Set priority within a list of tasks
• Set priority for budget items
• Set priority for capital spending
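As an illustration of the first use, the sketch below ranks a short task list by criticality. It assumes hazard and likelihood are each scored from 1 to 10, as in the tables later in this section; the task descriptions and scores are invented for the example.

```python
def criticality(hazard, likelihood):
    """Criticality = Hazard x Likelihood, each scored from 1 to 10."""
    for name, score in (("hazard", hazard), ("likelihood", likelihood)):
        if not 1 <= score <= 10:
            raise ValueError(f"{name} score must be between 1 and 10")
    return hazard * likelihood


# Hypothetical tasks: (description, hazard score, likelihood score).
tasks = [
    ("repaint handrail", 2, 3),
    ("replace leaking seal", 6, 8),
    ("refurbish standby pump", 8, 4),
]

# Highest criticality first.
ranked = sorted(tasks, key=lambda t: criticality(t[1], t[2]), reverse=True)
for description, hazard, likelihood in ranked:
    print(f"{criticality(hazard, likelihood):3d}  {description}")
```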
To determine the level of criticality, a technique has been established
to independently assess the hazard and likelihood levels. It is important
that these levels are established without reference to each other, in
order to guard against the following problems:
1. High consequence of failure items being exaggerated in terms of being
in poor condition.
2. Low consequence of failure items being ignored and inadvertently
left out of planning for maintenance, even though the likelihood of failure
has raised their level of criticality.
It is common to apply a range of criteria for each issue, and sample criteria
are listed below. It is also common for the criteria for hazards to differ
from site to site since these are closely dependent on the business requirements
for the plant. However, it is not so common for the likelihood criteria
to differ, irrespective of how diverse the industry applications may be.
Hazard criteria
• Occupational health and safety
• Environment
• Risk of loss of capital
• Unavailability for service/production
• Inability to accommodate changes to plan, lead time for ordering spare
parts, inconvenience to users
Likelihood criteria
• Recent failure rate
• Inspection results
• Design robustness
• Level of utilisation
• Level of surveillance
A technique is required to ensure that the score out of 10 applied for
each issue takes into account the relevant criteria, across which a judgement
has to be applied. As part of this, the assessment has to make some judgement
regarding the corresponding magnitude of severity relevant to each criterion.
This is achieved by taking the maximum score across the criteria in each
of the Hazard and Likelihood tables shown in Figure 2.
Level  OH&S                     Environment                             Capital Replacement  Availability Cost              Efficiency Loss
1      No risk at all           -                                       <$100                Redundant item                 1 shift to organise
2      Unlikely to impose risk  -                                       <$500                -                              > 2 shifts to organise
3      Irritation               Litter                                  <$1,000              Minor part availability loss   -
4      -                        -                                       <$5,000              Exposed if another part fails  -
5      Minor Wound              Loud noise                              <$10,000             Non Process Critical           2 months to organise
6      Wound                    -                                       -                    Process Critical               -
7      Loss of health           Minor release                           <$100,000            -                              6 months to organise
8      Disability               Major toxic release                     <$250,000            Whole process down             1 year to organise
9      Major Impairment         Serious threat to people & environment  <$500,000            Factory down                   -
10     Death                    Population health endangered            <$1,000,000          -                              -
Hazard table - a possibility
Level  Condition               History of Failure       Design Severity                     Working Environment                  Level of Use
1      As new                  Once in 5 years          Stable/robust design                Friendly environment                 Hardly ever
2      Refurbished completely  Once in 3 years          -                                   -                                    Twice per day
3      Regularly maintained    Once per year            Average                             -                                    One full day per week
4      Minor part problem      Once per 6 months        -                                   Minor erosion/corrosion              2 full days per week
5      -                       Once per month           -                                   Lack of protection - corrosion/wear  -
6      -                       -                        Low speed machinery                 -                                    2 shifts by 5 days
7      Major part problem      -                        Average speed machinery             -                                    Permanently
8      -                       Once per week            Highly dynamic machinery            Inappropriate work environment       -
9      -                       More than once per week  Slim, sensitive and highly dynamic  Corrosion/erosion                    -
10     Hazardous               -                        -                                   Inappropriate corrosion/erosion      -
Likelihood table
Figure 2 Hazard and likelihood tables
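One reading of this scoring step is that each issue takes the worst (highest) score found across its criteria, and the two resulting scores are multiplied to give criticality. The sketch below follows that reading; the per-criterion scores are invented for a hypothetical asset and are not taken from any real assessment.

```python
def issue_score(criterion_scores):
    """Overall score for one issue: the maximum across its criteria."""
    if not criterion_scores:
        raise ValueError("at least one criterion score is required")
    return max(criterion_scores.values())


# Hypothetical scores out of 10, one per column of the tables in Figure 2.
hazard_scores = {
    "OH&S": 3,
    "Environment": 1,
    "Capital Replacement": 5,
    "Availability Cost": 6,
    "Efficiency Loss": 5,
}
likelihood_scores = {
    "Condition": 4,
    "History of Failure": 3,
    "Design Severity": 6,
    "Working Environment": 4,
    "Level of Use": 7,
}

hazard = issue_score(hazard_scores)
likelihood = issue_score(likelihood_scores)
asset_criticality = hazard * likelihood
print(hazard, likelihood, asset_criticality)
```

Because hazard and likelihood are scored from separate tables, neither score can influence the other, which is the independence the method asks for.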
In the case that a number of hazard levels are possible under the same
criteria, the calculation should issue a number of criticality possibilities
which are individually logged on to the cumulative frequency analysis.
This is because the relationship between these possibilities is an OR
function which is additive in probability theory. Hence a substation transformer
may have a criticality logged for explosive failure and another one for
leakage of oil.
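A minimal sketch of logging one criticality entry per failure mode is given below. It assumes, for simplicity, that the failure modes of an asset share a single likelihood score and differ only in hazard; the scores are invented and the function names are hypothetical.

```python
def log_failure_modes(log, asset, likelihood, hazard_by_mode):
    """Append one criticality entry per failure mode of the asset."""
    for mode, hazard in hazard_by_mode.items():
        log.append((asset, mode, hazard * likelihood))


def entries_at_or_above(log, threshold):
    """Cumulative count of logged entries with criticality >= threshold."""
    return sum(1 for _, _, crit in log if crit >= threshold)


log = []
log_failure_modes(
    log,
    "substation transformer",
    likelihood=4,
    hazard_by_mode={"explosive failure": 10, "leakage of oil": 6},
)
for entry in log:
    print(entry)
```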
A risk audit should be conducted across all assets at least once per 5
years. 5 years represents a half life between major refurbishment for
most industrial plant. The audit must embrace all aspects of the asset
including static aspects which normally do not receive any attention during
their service life. These could include storm water run-off, the structural
beams of a building, foundation plates for machinery and so on. For more
critical items which are known to be more prone to failure, the audit
may need to be conducted on an annual basis. A sample range of frequencies
is shown below:
                Annual                                                      2-3 years            5 years
Rotating plant  Electric motors                                             Turbomachines        -
Static plant    Load bearing elements, flexible hangers, material handling  Furnaces, pipe work  Machine bases
Civil plant     Lighting, roadways                                          Sewage, fresh water  Bridgework, building frame, storm water
An aspect of an organisation’s maintenance policy document is to
considerably extend this table in order to set guidelines for the design
of preventative maintenance programs.
The conduct of an audit should combine information from the following
sources:
1. Criticality assessment
2. Downtime records
3. Condition assessment
4. Maintenance history
It may be supplemented by a fifth piece of information:
5. Design review - status of latest technology
Due diligence is demonstrated when a company completes a criticality audit
on its outstanding capital works program. If the criticality work sheets
indicated above are applied to individual jobs, each of which costs a
sum of $x_i, then the following plot may be created.
[Figure 3 panels: Individual Costs ($ against Criticality) and Cumulative
Totals ($ against Criticality).]
Figure 3 Capital works criticality diagram
The statistical analysis of these diagrams provides a level of risk reduction
which may be associated with a total budget expenditure. The demonstration
of due diligence comes from the comparison of successive capital analyses
as shown in Figure 4.
[Figure 4 plot: cumulative $ against Criticality, with one curve for Year 1
and one for Year 2.]
Figure 4 Comparison of annual capital budgets
In comparing the example set in Figure 4, a total spending of $500K will
eradicate risk within the asset base up to the following levels for each
year:
Year 1   82%
Year 2   72%
Hence diligence is shown by the decreasing level of risk cut-off which
is being addressed by the capital program. Another way of looking at the
result is to consider what is the necessary expenditure to complete all
projects with a criticality level of 80%:
Year 1   $657K
Year 2   $273K
It should be noted that the example which has been developed for this
paper demonstrates significant improvements only at the higher levels
of the criticality range. This would follow in practice, with capital
probably only applying to either high consequence issues or issues of
immediate urgency which address a likely hazard.
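The two readings of the cumulative diagram used above (the criticality cut-off reachable for a fixed budget, and the expenditure needed to clear everything above a chosen criticality) can be sketched as follows. The job list is invented for illustration and does not reproduce the data behind Figures 3 and 4; costs are in $K.

```python
def cost_to_clear(jobs, threshold):
    """Total cost of all jobs with criticality at or above the threshold."""
    return sum(cost for crit, cost in jobs if crit >= threshold)


def cutoff_for_budget(jobs, budget):
    """Lowest criticality level that can be cleared, working from the top.

    Returns the lowest level c such that every job with criticality >= c
    fits within the budget, or None if even the most critical level
    cannot be funded.
    """
    cutoff = None
    for level in sorted({crit for crit, _ in jobs}, reverse=True):
        if cost_to_clear(jobs, level) > budget:
            break
        cutoff = level
    return cutoff


# Hypothetical capital jobs: (criticality, cost in $K).
jobs = [(90, 120), (85, 200), (80, 150), (60, 300), (40, 80)]

print(cutoff_for_budget(jobs, 500))
print(cost_to_clear(jobs, 80))
```

Repeating the calculation on each year's capital list gives the year-on-year comparison the text uses to demonstrate due diligence.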
5. Acceptable Level of Risk
The levels of risk are shown in Figure 5 [7]. The key point of this diagram
is the region described as ALARP - as low as reasonably practicable. To
assist a person exploiting this approach, the grades of the diagram may be
expressed in terms of the criticality function described above.
[Figure 5 diagram. Regions: Intolerable Level (risk cannot be justified on
any grounds); ALARP Region (risk is undertaken for an identifiable
benefit); Broadly Acceptable Region (no need for detailed risk
examination). Annotations: Tolerable only if cost is grossly
disproportionate to improvement gained; Tolerable if cost of reduction
exceeds benefit to be gained; Level of acceptable risk?]
Figure 5 Level of risk
The management of work to counter risk which falls into the ALARP range
is based on a judgement as to what constitutes acceptable risk. In one
approach, company policy can be established on F-N criteria, as shown in
Figure 6.
[Figure 6 plot: frequency of failures per year (10^-1 to 10^-6) against
number of fatalities (10 to 1000). Lines: A - Unacceptable limit, B -
Acceptable limit, C - UK nuclear industry risk target.]
Figure 6 F-N criteria for risk
The term fatalities may be replaced by casualties. A company may employ
a quantitative graph such as Figure 6 as part of its policy on risk containment.
This requires interpretation of the hazard criteria described above as
a measurable value such as number of fatalities or number of casualties
per year. Hence the definition of an
acceptable level of risk is dependent on the context of the operating
environment and is subject to an individual company’s policy. There
is no definitive recommendation on acceptable risk as yet released in
the public literature. The F-N curves provide a quantifiable method for
tracking a domain in which this level may be said to lie.
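An F-N check of this kind can be sketched as below. The limit lines are modelled as straight lines on log-log axes, F = F10 × (N/10)^(-slope), which is a common form for such criteria. The anchor frequencies and the slope are placeholders chosen for illustration; they are not values read from Figure 6 and would be set by company policy.

```python
def fn_limit(n, f_at_10, slope=1.0):
    """Limit frequency per year for n fatalities on a log-log straight line."""
    return f_at_10 * (n / 10.0) ** (-slope)


def classify(n, frequency, unacceptable_at_10=1e-3, acceptable_at_10=1e-5):
    """Place an (N, F) point relative to two hypothetical limit lines."""
    if frequency > fn_limit(n, unacceptable_at_10):
        return "unacceptable"
    if frequency <= fn_limit(n, acceptable_at_10):
        return "acceptable"
    return "between limits"


# Hypothetical scenarios: (number of fatalities, frequency per year).
for n, f in [(100, 1e-3), (100, 1e-7), (10, 1e-4)]:
    print(n, f, classify(n, f))
```

Points falling between the two lines correspond to the region where a judgement on acceptable risk is still required.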
6. Conclusion
In a recent review of a medium-sized manufacturing company, a colleague
of the author wrote:
Current maintenance practices within the company are not conducive to
sound planning. Workshops are operating close to maximum capacity and
are squarely focused on output. Assets are not being made available for
routine preventative maintenance and in some instances, for lubrication.
As a result, approximately 80% of the Maintenance Section’s actions
are breakdown responses. Only 20 to 25% of plant and equipment have planned
preventative maintenance routines and there is no ongoing programme to
develop more. Most of the planned maintenance activities occur during
the two week Christmas shut-down period.
This is a classic appraisal of a company with no element of risk management
within its maintenance planning approach. Risk management provides anticipation
of problems which underlies a preventative maintenance approach. In addition,
a commitment to risk management will ensure that the ongoing program referred
to in this comment will become part of the company’s management
policy.
Risk analysis is fundamental in identifying necessary maintenance work
which may be otherwise overlooked. If a safety item such as a release
valve, an isolating valve or a moving guard breaks down, then production
is not likely to be affected. As a consequence there is some incentive
to either tolerate or even miss the fault. However, the failure of such
equipment then places the company in a dangerous position, particularly
in the event of an incident. Risk analysis provides the audit process
to determine if the company has taken every precaution to avoid unnecessary
hazards. In the event that an incident does occur, then the company is
in some way protected from both insurance and legal implications.
As one last incentive, at the current stage of industrial development,
the majority of companies have poor records concerning the reliability
of their equipment. They do not have a base of information to lead off
maintenance improvement or ensure that their current mix of preventative
maintenance procedures is effective or thorough. In the absence of reliability
information, the mix of information included in the criticality assessment
procedure described in this paper can provide a reasonable basis for maintenance
improvement.
References
1. E. Zebroski, Sources of common cause failures in decision-making involved
in man-made catastrophes, Advances in Risk Analysis, Volume 7, 1989, Plenum
Publishing.
2. W.E. Gale et al, Human factors in operational reliability of offshore
production platforms: The Fire and Life Safety Assessment Index Methodology
(FLAIM), ASME Pressure Vessel and Piping Conf., PVP-Vol 296/SERA-Vol 3
Risk and Safety Assessment: Where is the Balance?, Hawaii, 1995.
3. W.F. Kenney, Process Risk Management Systems, VCH Publishers Inc.,
1993.
4. L. Hutchinson, Application of an integrated software program for optimising
preventive maintenance programs, Fossil Plant Maintenance Conference,
Electric Power Research Institute, Baltimore, 1996.
5. R.A. Platfoot, Reduction of plant downtime due to informed maintenance
planning and tailoring the maintenance system for production, Maintenance
Management Strategies, IIR Pty Ltd, Sydney, February, 1997.
6. J.E. Aller et al, Risk based inspection for the petrochemical industry,
ASME Pressure Vessel and Piping Conf., PVP-Vol 296/SERA-Vol 3 Risk and
Safety Assessment: Where is the Balance?, Hawaii, 1995.
7. J. Cross, Risk Management, Master of Business and Technology course,
University of New South Wales, 1996.