The Benefits of Practicing Enterprise Risk Reduction


R.A. Platfoot
University of New South Wales
Sydney NSW 2052
Risk management is a pervasive tool which both production and maintenance groups can employ as a basis for communicating a perception of risk and for setting priorities in the absence of detailed information. Used correctly, a quantifiable description of risk, such as a risk level or safety index, can streamline work and the consequent investment needed to carry out that work. This paper describes a simple approach to quantifiable risk assessment which can be tailored for smaller companies and can form the basis for more sophisticated approaches by larger organisations. Examples and systems are provided to show how the recognition of risk can be exploited in optimising maintenance systems.
1. Introduction
There is an increasing recognition of the need to move away from prescriptive maintenance systems, which are procedure-heavy and usually unwieldy to operate, towards a more goal-oriented planning and execution of maintenance. Zebroski [1] noted some symmetries in the disasters of Chernobyl (1986), Three Mile Island (1979) and Challenger (1986):
• Long term successful engineering program with considerable expert knowledge available.
• Major military involvement.
• Repetitive service by the asset.
• Compartmentalisation of data and introduction of in-house politics.
These aspects were repeated in the Piper Alpha disaster (1988) [2] where a strong “mechanistic” maintenance approach was in place. Bringing these large disasters back to an everyday manufacturing and utility context, prescriptive maintenance is a recipe for the following problems:
1. Inefficient work
2. Ignorance of a hazard
3. Stultification of people's thinking
4. The means to the objective becomes the objective. An example arises where completing work order cards overtakes plant reliability as the objective of maintenance.
Risk management is the technique which can break the fixed mould of programmed maintenance, introducing an element of common sense into decision making. The assessment of risk has four primary elements [3]:
1. Definition of the scope of technical issues associated with the possibility of the hazard such as plant type, what is affected and criteria for setting criticality.
2. The contribution of human perception and understanding such as the subjective judgment of risk, and company policy and work culture.
3. Operational demands which require specific levels of plant availability, and the high costs for assembling capital for plant replacement.
4. The greater demands introduced by environmental legislation, work place safety, the handling of hazardous goods and insurance liability.
In other words: where are the problems, how will people react to them, what will their effect be on the business operations, and what penalties may be inflicted upon the company in the event of an incident?
There are two areas in which risk particularly impacts on maintenance management. The first is that, as in most quality-based systems, there is a strong focus on record keeping and documentation. As a consequence, maintenance work such as surveys and plant checking will be required to provide information, in the same way that production logs contribute to the process. The second area is that risk assessment can provide a proactive basis for maintenance planning. As risks are identified, maintenance plans and objectives may be modified so that work is implemented to assist in reducing hazards. This type of work is directed towards replacing or refurbishing faulty equipment, redesigning a potential problem area, and fitting safety equipment.
2. Systems Development
The systems associated with the maintenance of a company’s assets are shown in Figure 1. This section covers general comments associated with these systems and how they must interact in a Reliability-centred maintenance (RCM) style environment. RCM recognises the following tenets:
• The reliability of a part cannot be improved by maintenance activity beyond the inherent level set by its design. For example, no level of maintenance attention is going to improve the behavior of a low alloy steel component in an aqueous environment compared to a redesign with the part being replaced by stainless steel.
• The reliability of the part is dependent on the operating environment within which it carries out its specified function. It should be noted that the reliability may be increased with improvements to that environment whereas it will deteriorate as the conditions become more aggressive.
• Maintenance management, and in particular the planning process, is subject to systematic strategies which may be generic for a wide range of circumstances. In this manner the planning process may be elevated above a sequence of responses based solely on long term experience.
[Figure 1 is a flow diagram. Its box labels are: Plant identification; CMMS - plant dictionary; Manual/SCADA availability/performance monitoring; Plant inspection program and condition monitoring; CMMS database; Maintenance work order procedures; RCM audit of work procedures - BD, PDM, PM*; Detailed risk audit - breaking the plant down into key areas; Hierarchical plant dictionary; Spreadsheet of procedures; All procedures are entered into the CMMS; Work completely scheduled from the CMMS; MW consider improved utilisation and analysis of production management information; What are key trouble spots?; Business risk analysis - what are the most important facilities?; Prioritise the surveillance program; Can we tolerate a failure?; Can we inspect - while in service?; Can we inspect - while out of service?; Minimise planned maintenance work; Remove PM's; Edit the CMMS procedural base; Upgrade the reporting and analysis of the CMMS; Capture plant condition and risk information.]
* BD - Breakdown work, PDM - Predictive maintenance, PM - Preventative Maintenance
Figure 1 Integration of risk with maintenance information systems
In one US power industry study [4], 123,057 cyclic work orders were reviewed using an RCM approach, which is identified as the RCM audit of work procedures in Figure 1. The outcome of this study was as follows:
| Share of work orders | Number | Outcome |
|---|---|---|
| 3.62% | 4,455 | PM tasks added - predictive methods replacing cyclic work |
| 17.91% | 22,039 | PM tasks deleted |
| 12.43% | 15,333 | Frequency changed for remaining cyclic PM tasks |
| 33.99% | 42,827 | PM tasks modified or changed in some way (TOTAL) |
Clearly a benefit can be found using RCM to modify an existing, well defined cyclic maintenance program. This example demonstrates the commercial benefit of integrating a risk approach to setting priority for maintenance.
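The percentages in the study can be cross-checked against the 123,057 work orders with a few lines of arithmetic. The sketch below is only a check on the figures quoted above; the variable and function names are illustrative. On this arithmetic the three row counts sum to 41,827, which is 33.99% of the total, so the printed total of 42,827 may be a typographical slip, and the third row recomputes to about 12.46% rather than the printed 12.43%.

```python
# Counts as quoted from the study [4]; names here are illustrative only.
TOTAL_WORK_ORDERS = 123_057

changes = {
    "PM tasks added": 4_455,
    "PM tasks deleted": 22_039,
    "Frequency changed": 15_333,
}


def share(count, total=TOTAL_WORK_ORDERS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Percentage share of each outcome and of all modified tasks together.
shares = {name: share(count) for name, count in changes.items()}
modified_total = sum(changes.values())
modified_share = share(modified_total)

for name, pct in shares.items():
    print(f"{name}: {changes[name]} ({pct}%)")
print(f"Modified in some way: {modified_total} ({modified_share}%)")
```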
The biggest factor associated with the success of an RCM analysis is the availability of data. If there is little data within the company then experience suggests that there is insufficient basis for setting or prioritising work orders. Without this knowledge the company should initially establish a cyclic program, [5] which will at least capture accurate information about asset integrity.
The mistake that can be made is to compile volumes of data to little purpose, so that maintenance improvement becomes a data collection exercise rather than one which establishes immediate performance improvements in the plant. The way to avoid this mistake is to employ a business risk study very early in the process in order to prioritise what information needs to be collated.
3. Risk Assessment
Risk management is central to the RCM approach, ensuring that the review of maintenance practice is conducted according to business priorities and that the process is not sunk by the need to handle large volumes of data. A five level risk ranking was established for one site which was explicit in terms of differentiating between the various levels:
1. No problems
2. Cosmetic problems, e.g. needs painting.
3. Requires work but is not process critical, e.g. hoses cracked.
4. Requires maintenance as soon as convenient, e.g. major oil leak.
5. Imminent breakdown, e.g. no oil left in the bearing.
Using such simple language, the level of risk should have meaning to all people associated with the organisation. This is important since the risk system needs to be utilised by a wide cross section of people. The suggested activities for maintenance and production are tabulated:
| Risk Level | Maintenance | Production |
|---|---|---|
| 1 | Do not schedule work | OK for use |
| 2 | Do not schedule work | OK for use |
| 3 | Schedule work into a normal program by bundling together work in that area. CYCLIC MAINTENANCE | OK for use |
| 4 | Schedule work into a production window and seek that window as soon as possible. OPPORTUNISTIC MAINTENANCE | Provide production window. Not preferred to operate |
| 5 | Dispatch people now. BREAKDOWN MAINTENANCE | Provide isolation to allow BD task. Operate only in emergency |
It is necessary that these rankings are agreed between both operations and maintenance staff such that each group may have reasonable expectations of the other. In addition it is important that the meanings are identical, irrespective of whether the assets are mechanical, electrical or civil.
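Because the same five levels must carry one agreed meaning for both groups, the table lends itself to a single shared lookup. The following is a minimal sketch, assuming the table is held as a dictionary keyed by risk level; the names `Response`, `RESPONSES` and `response_for` are hypothetical and not part of any CMMS.

```python
from typing import NamedTuple


class Response(NamedTuple):
    """Agreed response of each group to a given risk level."""
    maintenance: str
    production: str


# Wording condensed from the risk level table; structure is illustrative.
RESPONSES = {
    1: Response("Do not schedule work", "OK for use"),
    2: Response("Do not schedule work", "OK for use"),
    3: Response("Schedule into the normal program, bundled by area (cyclic)",
                "OK for use"),
    4: Response("Schedule into the earliest production window (opportunistic)",
                "Provide production window; not preferred to operate"),
    5: Response("Dispatch people now (breakdown)",
                "Provide isolation for the BD task; operate only in emergency"),
}


def response_for(level: int) -> Response:
    """Look up the agreed response, rejecting levels outside 1 to 5."""
    if level not in RESPONSES:
        raise ValueError(f"risk level must be 1 to 5, got {level!r}")
    return RESPONSES[level]


if __name__ == "__main__":
    for lvl in sorted(RESPONSES):
        r = response_for(lvl)
        print(f"Level {lvl}: maintenance -> {r.maintenance}; production -> {r.production}")
```

Keeping one table for mechanical, electrical and civil assets is what makes the level on a work order mean the same thing to everyone who reads it.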
One of the above levels is a score which may be employed in the following circumstances:
1. Setting priority for a work order
2. Acting as a qualitative measure from a visual inspection
3. Condition of the plant either as found at the start of a job, or as left when the trades person completes a job
For example, an operator or maintenance person may identify a piece of equipment as in condition 4, resulting from an inspection of the plant. Hence a work order is raised with a risk level of 4. That carries meaning to both production and maintenance as per the table set out above. When the trades person acquits the work, they should sign off that the condition of the plant has been reset to a level 1 or 2.
The status of a plant needs to be reported so that operators are continually aware of any threats to production. The levels of reporting which were suggested included:
| Risk Level | Production Management | Operators |
|---|---|---|
| 1 | | |
| 2 | | |
| 3 | All level 3 jobs tabulated and the summary considered | |
| 4 | All level 4 jobs separately tabulated with some being scrutinised | All level 4 jobs identified |
| 5 | All level 5 jobs described | All level 5 jobs identified |
What this table means is that a log has to be kept with the operators, identifying when an item of plant is subject to a risk condition of 4 or 5. When an item is repaired and returned to a level 1, 2 or 3 state, the fact must be reported to the operators so that they know that the asset is free for use with no restrictions. A trades person returning an asset from a level 4 or 5 condition should be obliged to complete a visual inspection and report, to guard against secondary or other undetected damage.
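The operator log described here can be sketched as a small register of restricted assets. This is an illustration only, assuming assets are identified by a name string; the class `OperatorLog` and the asset tag used in the example are hypothetical.

```python
class OperatorLog:
    """Register of assets currently held at risk level 4 or 5."""

    def __init__(self):
        self._restricted = {}  # asset name -> current risk level (4 or 5)

    def report(self, asset, level):
        """Record a reported condition and return the message for operators."""
        if level not in (1, 2, 3, 4, 5):
            raise ValueError(f"risk level must be 1 to 5, got {level!r}")
        if level >= 4:
            # Levels 4 and 5 restrict operation and must be visible to operators.
            self._restricted[asset] = level
            return f"{asset}: restricted at level {level}"
        # Levels 1 to 3 carry no restriction; clear any earlier entry.
        was_restricted = self._restricted.pop(asset, None) is not None
        if was_restricted:
            return (f"{asset}: returned to level {level}, free for use; "
                    "visual inspection report required")
        return f"{asset}: level {level}, no restriction"

    def restricted(self):
        """Return a copy of the current restricted-asset register."""
        return dict(self._restricted)


if __name__ == "__main__":
    log = OperatorLog()
    print(log.report("Pump P-101", 4))
    print(log.report("Pump P-101", 2))
```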
Production management will need to be aware of the following:
1. The balance of work which is at risk levels 3, 4 and 5. A high proportion of level 4 or 5 work is a KPI indicating that maintenance needs to improve its effectiveness. In addition, such high levels would correlate with high expenditure per job, which is something we are trying to avoid.
2. Access to a summary of level 4 jobs on request. Firstly, they may need an explanation as to why the job crept over the level 3 mark and secondly, they may be aware of production circumstances which may raise a level 4 to a level 5.
3. Detailed review of level 5 since these are direct threats to their business performance. Each level 5 needs to be reviewed in detail with KGFM to ascertain how it can be prevented in the future. The number of level 5 tasks is as significant a KPI as the downtime rate.
4. The number of level 5 jobs called in by operators. MW need to ensure that their operators do not needlessly call in a level 5 job request which would incur the financial penalty of a call out.
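The four items above reduce to a handful of counts over the work order list. The sketch below assumes each work order is available as a `(risk_level, raised_by)` pair, which is a simplification introduced for illustration; the function name and the `"operator"` label are hypothetical.

```python
from collections import Counter


def risk_kpis(work_orders):
    """Summarise work orders given as (risk_level, raised_by) pairs."""
    orders = list(work_orders)
    total = len(orders)
    levels = Counter(level for level, _ in orders)
    # Share of all work orders sitting at levels 3, 4 and 5.
    balance = {
        lvl: (levels[lvl] / total if total else 0.0) for lvl in (3, 4, 5)
    }
    # Level 5 requests raised by operators attract a call-out cost.
    level5_by_operators = sum(
        1 for level, raised_by in orders
        if level == 5 and raised_by == "operator"
    )
    return {
        "balance": balance,
        "level5_count": levels[5],
        "level5_by_operators": level5_by_operators,
    }


if __name__ == "__main__":
    sample = [
        (3, "maintenance"), (3, "operator"), (4, "operator"),
        (5, "operator"), (5, "maintenance"), (1, "maintenance"),
        (2, "operator"), (3, "maintenance"),
    ]
    print(risk_kpis(sample))
```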
4. Quantitative Risk Assessment
Criticality is a direct measure of the priority which should be allocated to a plant asset, whether the attention is paid through maintenance work or capital upgrade. It is a product, calculated as follows:
Criticality = Hazard × Likelihood
where hazard is a measure of the undesirability of an outcome and likelihood is the probability that the outcome will actually occur. This approach has also been used by the US petroleum industry for risk-based inspection programs [6].
A criticality ranking can be used for the following purposes:
• Set priority within a list of tasks
• Set priority for budget items
• Set priority for capital spending
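As a minimal sketch of the ranking, assume hazard and likelihood are each scored from 1 to 10, so that criticality runs from 1 to 100. The task names and scores below are invented purely for illustration.

```python
def criticality(hazard, likelihood):
    """Criticality as the product of hazard and likelihood (each 1 to 10)."""
    for name, score in (("hazard", hazard), ("likelihood", likelihood)):
        if not 1 <= score <= 10:
            raise ValueError(f"{name} score must be between 1 and 10")
    return hazard * likelihood


# Hypothetical tasks: (description, hazard score, likelihood score).
tasks = [
    ("Repaint handrail", 2, 3),
    ("Replace pump seal", 6, 7),
    ("Refurbish relief valve", 9, 4),
]

# Highest criticality first, giving the priority order for the task list.
ranked = sorted(tasks, key=lambda t: criticality(t[1], t[2]), reverse=True)

for description, hazard, likelihood in ranked:
    print(f"{criticality(hazard, likelihood):3d}  {description}")
```

The same ordering applies whether the list holds maintenance tasks, budget items or capital projects.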
To determine the level of criticality, a technique has been established to independently assess the hazard and likelihood levels. It is important that these levels are established without reference to each other, in order to ensure the following:
1. That high consequence of failure items are not exaggerated in terms of being in poor condition.
2. That low consequence of failure items are not ignored and inadvertently left out of planning for maintenance, even though the likelihood of failure has raised their level of criticality.
It is common to apply a range of criteria for each issue, and sample criteria are listed below. It is also common for the criteria for hazards to differ from site to site since these are closely dependent on the business requirements for the plant. However, it is not so common for the likelihood criteria to differ, irrespective of how diverse the industry applications may be.
| Hazard criteria | Likelihood criteria |
|---|---|
| Occupational health and safety | Recent failure rate |
| Environment | Inspection results |
| Risk of loss of capital | Design robustness |
| Unavailability for service/production | Level of utilisation |
| Inability to accommodate changes to plan, lead time for ordering spare parts, inconvenience to users | Level of surveillance |
A technique is required to ensure that the score out of 10 applied for each issue takes into account the relevant criteria, across which a judgement has to be applied. As part of this, the assessment has to make some judgement regarding the corresponding magnitude of severity relevant to each criterion.
This is achieved by taking the maximum score across the criteria in each of the Hazard and Likelihood tables, as shown in Figure 2.
Hazard table - a possibility
Criteria (columns): OH&S; Environment; Capital Replacement; Availability Cost; Efficiency Loss. Entries against each score follow the column order; not every criterion has an entry at every score.
1: No risk at all; <$100; Redundant item; 1 shift to organise
2: Unlikely to impose risk; <$500; > 2 shifts to organise
3: Irritation; Litter; <$1,000; Minor part availability loss
4: <$5,000; Exposed if another part fails
5: Minor wound; Loud noise; <$10,000; Non process critical; 2 months to organise
6: Wound; Process critical
7: Loss of health; Minor release; <$100,000; 6 months to organise
8: Disability; Major toxic release; <$250,000; Whole process down; 1 year to organise
9: Major impairment; Serious threat to people & environment; <$500,000; Factory down
10: Death; Population health endangered; <$1,000,000
Likelihood table
Criteria (columns): Condition; History of Failure; Design Severity; Working Environment; Level of Use. Entries against each score follow the column order; not every criterion has an entry at every score.
1: As new; Once in 5 years; Stable/robust design; Friendly environment; Hardly ever
2: Refurbished completely; Once in 3 years; Twice per day
3: Regularly maintained; Once per year; Average; One full day per week
4: Minor part problem; Once per 6 months; Minor erosion/corrosion; 2 full days per week
5: Once per month; Lack of protection - corrosion/wear
6: Low speed machinery; 2 shifts by 5 days
7: Major part problem; Average speed machinery; Permanently
8: Once per week; Highly dynamic machinery; Inappropriate work environment
9: More than once per week; Slim, sensitive and highly dynamic; Corrosion/erosion
10: Hazardous; Inappropriate corrosion/erosion
Figure 2 Hazard and likelihood tables
In the case that a number of hazard levels are possible under the same criteria, the calculation should issue a number of criticality possibilities which are individually logged on to the cumulative frequency analysis. This is because the relationship between these possibilities is an OR function which is additive in probability theory. Hence a substation transformer may have a criticality logged for explosive failure and another one for leakage of oil.
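The scoring rule and the separate logging of failure modes can be sketched together. The example assumes the overall hazard and likelihood scores are each the maximum across their criteria, and it reuses one likelihood assessment for both failure modes to keep the example short; all scores are invented and the function names are hypothetical.

```python
def overall_score(scores_by_criterion):
    """Overall score is the maximum of the individual criterion scores."""
    if not scores_by_criterion:
        raise ValueError("at least one criterion score is required")
    return max(scores_by_criterion.values())


def criticality_from_tables(hazard_scores, likelihood_scores):
    """Product of the overall hazard and overall likelihood scores."""
    return overall_score(hazard_scores) * overall_score(likelihood_scores)


# Hypothetical likelihood assessment for a substation transformer.
likelihood = {
    "Condition": 4,
    "History of Failure": 3,
    "Design Severity": 6,
    "Working Environment": 4,
    "Level of Use": 7,
}

# Each failure mode is assessed against the hazard criteria separately.
failure_modes = {
    "Explosive failure": {"OH&S": 9, "Environment": 7, "Capital Replacement": 8},
    "Oil leakage": {"OH&S": 3, "Environment": 7, "Capital Replacement": 4},
}

# One criticality entry is logged per failure mode rather than one per asset.
criticality_log = [
    (mode, criticality_from_tables(hazard, likelihood))
    for mode, hazard in failure_modes.items()
]

for mode, value in criticality_log:
    print(f"{mode}: criticality {value}")
```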
A risk audit should be conducted across all assets at least once per 5 years. 5 years represents a half life between major refurbishment for most industrial plant. The audit must embrace all aspects of the asset including static aspects which normally do not receive any attention during their service life. These could include storm water run-off, the structural beams of a building, foundation plates for machinery and so on. For more critical items which are known to be more prone to failure, the audit may need to be conducted on an annual basis. A sample range of frequencies is shown below:
Audit intervals (columns): Annual; 2-3 years; 5 years. Entries against each plant type follow the column order.
Rotating plant: Electric motors; Turbomachines
Static plant: Load bearing elements, flexible hangers, material handling; Furnaces, pipe work; Machine bases
Civil plant: Lighting, roadways; Sewage, fresh water; Bridgework, building frame, storm water
An aspect of an organisation’s maintenance policy document is to considerably extend this table in order to set guidelines for the design of preventative maintenance programs.
The conduct of an audit should combine information from the following sources:
1. Criticality assessment
2. Downtime records
3. Condition assessment
4. Maintenance history
It may be supplemented by a fifth piece of information:
5. Design review - status of latest technology
Due diligence is demonstrated when a company completes a criticality audit on its outstanding capital works program. If the criticality work sheets indicated above are applied to individual jobs, each of which costs a sum of $x_i, then the following plot may be created.
[Figure 3 has two panels, "Individual Costs" and "Cumulative Totals", each plotting cost ($) against criticality in bands from 5 to 95.]
Figure 3 Capital works criticality diagram
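A cumulative total of this kind can be built from a list of jobs in a few lines. The sketch assumes the costs are accumulated from the highest criticality downwards, which is how the budget comparison later in this section reads; the job data are invented for illustration.

```python
def cumulative_by_criticality(jobs):
    """Return (criticality, running cost) pairs, highest criticality first.

    jobs is an iterable of (criticality, cost) pairs.
    """
    ordered = sorted(jobs, key=lambda job: job[0], reverse=True)
    running = 0.0
    curve = []
    for crit, cost in ordered:
        running += cost
        curve.append((crit, running))
    return curve


if __name__ == "__main__":
    # Hypothetical capital jobs: (criticality, cost in $K).
    jobs = [(45, 120.0), (85, 300.0), (65, 80.0)]
    for crit, total in cumulative_by_criticality(jobs):
        print(f"criticality >= {crit}: ${total:.0f}K")
```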
The statistical analysis of these diagrams provides a level of risk reduction which may be associated with a total budget expenditure. The demonstration of due diligence comes from the comparison of successive capital analyses as shown in Figure 4.
[Figure 4 plots cost ($) against criticality, from 5 to 100, with one curve each for Year 1 and Year 2.]
Figure 4 Comparison of annual capital budgets
In comparing the example set in Figure 4, a total spending of $500K will eradicate risk within the asset base up to the following levels for each year:
Year 1: 82%
Year 2: 72%
Hence diligence is shown by the decreasing level of risk cut-off which is being addressed by the capital program. Another way of looking at the result is to consider what is the necessary expenditure to complete all projects with a criticality level of 80%:
Year 1: $657K
Year 2: $273K
It should be noted that the example which has been developed for this manual demonstrates significant improvements only at the higher levels of the criticality range. This would follow in practice with capital probably only applying to either high consequence issues or issues of immediate urgency which address a likely hazard.
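Both readings of the capital comparison, the criticality cut-off reached by a fixed budget and the spending needed to reach a chosen cut-off, can be sketched as follows. The job lists are invented and are not the data behind Figure 4; the function names are hypothetical, and jobs are assumed to be funded strictly in order of decreasing criticality.

```python
def cutoff_for_budget(jobs, budget):
    """Lowest criticality fully funded when spending from the top down.

    jobs is an iterable of (criticality, cost) pairs. Returns None if even
    the most critical job exceeds the budget.
    """
    spent = 0.0
    cutoff = None
    for crit, cost in sorted(jobs, key=lambda job: job[0], reverse=True):
        if spent + cost > budget:
            break
        spent += cost
        cutoff = crit
    return cutoff


def cost_to_reach(jobs, cutoff):
    """Total cost of all jobs at or above the given criticality cut-off."""
    return sum(cost for crit, cost in jobs if crit >= cutoff)


if __name__ == "__main__":
    # Hypothetical programs: (criticality, cost in $K).
    year1 = [(95, 300), (85, 150), (75, 200), (60, 100)]
    year2 = [(90, 100), (80, 150), (70, 200), (55, 120)]
    for label, jobs in (("Year 1", year1), ("Year 2", year2)):
        print(label, cutoff_for_budget(jobs, 500), cost_to_reach(jobs, 70))
```

A cut-off that falls from one year to the next, for the same budget, is the pattern the text describes as evidence of diligence.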
5. Acceptable Level of Risk
The levels of risk are shown in Figure 5 [7]. The key point of this diagram is the region described as ALARP: as low as reasonably practicable. To assist a person exploiting this approach, the grades of the diagram may be expressed in terms of the criticality function described above.
[Figure 5 shows three bands of risk, from highest to lowest: Intolerable Level (risk cannot be justified on any grounds); ALARP Region (risk is undertaken for an identifiable benefit); Broadly Acceptable Region (no need for detailed risk examination). Annotations within the ALARP region: "Tolerable only if cost is grossly disproportionate to improvement gained" and "Tolerable if cost of reduction exceeds benefit to be gained". A marker asks "Level of acceptable risk?".]
Figure 5 Level of risk
The management of work to counter risk which falls into the ALARP range is based on a judgement as to what constitutes acceptable risk. In one approach, company policy can be established on F-N criteria, as shown in Figure 6.
[Figure 6 plots frequency of failures per year (10^-1 down to 10^-6) against number of fatalities (10, 100, 1000), with three lines: A - unacceptable limit; B - acceptable limit; C - UK nuclear industry risk target.]
Figure 6 F-N criteria for risk
The term fatalities may be replaced by casualties. A company may employ a quantitative graph such as Figure 6 as part of its policy on risk containment. This requires interpretation of the hazard criteria described above as a measurable value such as number of fatalities or number of casualties per year. Hence the definition of an acceptable level of risk is dependent on the context of the operating environment and is subject to an individual company’s policy. There is no definitive recommendation on acceptable risk as yet released in the public literature. The F-N curves provide a quantifiable method for tracking a domain in which this level may be said to lie.
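One way a company might encode such a policy is to treat each limit as a straight line on log-log axes and test where an assessed point falls. The sketch below is an assumption throughout: the slope of -1, the anchor frequencies at 10 fatalities and the region labels are hypothetical placeholders, not values read from Figure 6 or recommended by any source.

```python
def fn_limit(n, frequency_at_anchor, anchor_n=10.0, slope=-1.0):
    """Frequency limit at n fatalities for a straight line on log-log axes."""
    if n <= 0:
        raise ValueError("number of fatalities must be positive")
    return frequency_at_anchor * (n / anchor_n) ** slope


def classify(n, frequency, unacceptable_at_anchor, acceptable_at_anchor):
    """Place an (n, frequency per year) point relative to the two limits."""
    if frequency > fn_limit(n, unacceptable_at_anchor):
        return "unacceptable"
    if frequency <= fn_limit(n, acceptable_at_anchor):
        return "acceptable"
    # Between the two limits a judgement on acceptable risk is required.
    return "ALARP"


if __name__ == "__main__":
    # Hypothetical policy: limits quoted as frequency per year at 10 fatalities.
    UNACCEPTABLE_AT_10 = 1e-3
    ACCEPTABLE_AT_10 = 1e-5
    for n, f in ((10, 1e-2), (100, 1e-5), (1000, 1e-8)):
        print(n, f, classify(n, f, UNACCEPTABLE_AT_10, ACCEPTABLE_AT_10))
```

A company adopting this approach would substitute the limits set out in its own risk policy.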
6. Conclusion
In a recent review of a medium-sized manufacturing company, a colleague of the author wrote:
Current maintenance practices within the company are not conducive to sound planning. Workshops are operating close to maximum capacity and are squarely focused on output. Assets are not being made available for routine preventative maintenance and in some instances, for lubrication. As a result, approximately 80% of the Maintenance Section’s actions are breakdown responses. Only 20 to 25% of plans and equipment have planned preventative maintenance routines and there is no ongoing programme to develop more. Most of the planned maintenance activities occur during the two week Christmas shut-down period.
This is a classic appraisal of a company with no element of risk management within its maintenance planning approach. Risk management provides anticipation of problems which underlies a preventative maintenance approach. In addition, a commitment to risk management will ensure that the ongoing program referred to in this comment will become part of the company’s management policy.
Risk analysis is fundamental in identifying necessary maintenance work which may be otherwise overlooked. If a safety item such as a release valve, an isolating valve or a moving guard breaks down, then production is not likely to be affected. As a consequence there is some incentive to either tolerate or even miss the fault. However, the failure of such equipment then places the company in a dangerous position, particularly in the event of an incident. Risk analysis provides the audit process to determine if the company has taken every precaution to avoid unnecessary hazards. In the event that an incident does occur, then the company is in some way protected from both insurance and legal implications.
As one last incentive, in the current stage of development of the industrial society, the majority of companies have poor records concerning the reliability of their equipment. They do not have a base of information to lead off maintenance improvement or ensure that their current mix of preventative maintenance procedures is effective or thorough. In the absence of reliability information, the mix of information included in the criticality assessment procedure described in this paper can provide a reasonable basis for maintenance improvement.
References
1. E. Zebroski, Sources of common cause failures in decision-making involved in man-made catastrophes, Advances in Risk Analysis, Volume 7, 1989, Plenum Publishing.
2. W.E. Gale et al, Human factors in operational reliability of offshore production platforms: The Fire and Life Safety Assessment Index Methodology (FLAIM), ASME Pressure Vessel and Piping Conf., PVP-Vol 296/SERA-Vol 3 Risk and Safety Assessment: Where is the Balance?, Hawaii, 1995.
3. W.F. Kenney, Process Risk Management Systems, VCH Publishers Inc., 1993.
4. L. Hutchinson, Application of an integrated software program for optimising preventive maintenance programs, Fossil Plant Maintenance Conference, Electric Power Research Institute, Baltimore, 1996.
5. R.A. Platfoot, Reduction of plant downtime due to informed maintenance planning and tailoring the maintenance system for production, Maintenance Management Strategies, IIR Pty Ltd, Sydney, February, 1997.
6. J.E. Aller et al, Risk based inspection for the petrochemical industry, ASME Pressure Vessel and Piping Conf., PVP-Vol 296/SERA-Vol 3 Risk and Safety Assessment: Where is the Balance?, Hawaii, 1995.
7. J. Cross, Risk Management, Master of Business and Technology course, University of New South Wales, 1996.