With an indelible marker, start labeling failed items and associated
items that are to be removed from the system.
Items, such as pipes, that have no clear orientation after removal
should be labeled in a way that their orientation is clear after removal,
i.e., top, bottom, horizontal, vertical, etc.
If there are several parts, label each one of them, with a simple
system such as "A", "B", "C", etc.; keep a logbook and write down what
each represents.
Use a digital camera to take area shots of the equipment with the
failed component and the surrounding areas and gradually work toward the
failed part.
Finally, take close-ups of the actual failure from every angle possible.
After taking the above pictures, remove the failed component without
disturbing any of its surroundings. Take pictures as the equipment is
disassembled. Do not clean off any contamination, unless a safety risk is
involved.
After removing the parts, do not attempt to put the faces of the
failed components back together. This can dislodge the real evidence
and create damage on the surface that will have to be explained during the
investigation.
Wrap the parts in plastic bags, tape them shut and label them using
the same convention you used to label the parts ("A", "B", "C", etc.).
Gather them all together and take several overall pictures to show what is
being transported to the laboratory.
Make a copy of your notes and your digital pictures and give them to
the lab along with the parts.
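The labeling and logbook steps above can also be kept electronically. The following is a minimal sketch only: the class and field names (`EvidenceLog`, `LabeledPart`, `orientation`, `photos`) are hypothetical, chosen for illustration, and assume the simple "A", "B", "C" convention described above.

```python
from dataclasses import dataclass, field
from string import ascii_uppercase


@dataclass
class LabeledPart:
    label: str          # "A", "B", "C", ... as marked on the part and its bag
    description: str    # what the label represents (the logbook entry)
    orientation: str    # e.g. "top", "bottom", "horizontal", "vertical"
    photos: list = field(default_factory=list)  # photo file names, in order taken


class EvidenceLog:
    """Hypothetical electronic logbook for parts removed from the system."""

    def __init__(self):
        self.parts = {}  # label -> LabeledPart, kept in the order parts were added

    def add_part(self, description, orientation="not recorded"):
        # Assign the next unused letter; the simple scheme stops at "Z".
        if len(self.parts) >= len(ascii_uppercase):
            raise ValueError("more than 26 parts: extend the labeling scheme")
        label = ascii_uppercase[len(self.parts)]
        part = LabeledPart(label, description, orientation)
        self.parts[label] = part
        return part

    def add_photo(self, label, file_name):
        # Record a picture against the part it shows.
        self.parts[label].photos.append(file_name)

    def bag_labels(self):
        # Labels to write on the plastic bags, matching the labels on the parts.
        return list(self.parts)
```

For example, `log = EvidenceLog()` followed by `log.add_part("pipe elbow with leak", "vertical")` would assign the label "A" to the elbow, and the same label would go on its bag. A copy of this log can be given to the lab with the notes and pictures.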
Background
You just had a major
piece of equipment go down because of the failure of a metal component. How
do you get the most value out of what may otherwise be a disaster? In short,
can you make "lemonade" out of lemons? Should you conduct a failure
analysis, or perhaps, a Root Cause Investigation? What do you have to gain
by doing so? And, if you proceed, how do you do it right?
Introduction
Over $300 billion a year is lost by the process
industries due to corrosion-induced equipment failures and the downtime,
system re-engineering, equipment replacement, personal injury liability, and
other costly circumstances that may result. Additional failures result from
mechanical causes. When a failure occurs, process liquids may be released
from the system, violating the Clean Water Act, among other laws. There can be
civil and criminal charges.
When the failure of a small metal component causes your
plant to shut down, short term, you have to get your plant back up and
running. What you may not realize is that you have been given a golden
opportunity to improve the reliability of your plant. If you learn from this
failure, you have a chance to improve the reliability of your process, your
internal procedures, and even your management abilities.
Criticality of Metal Failures
Our examples will be primarily about metal failures
caused by corrosion, but much of what is discussed here will be valid for
any type of metal or material failure. Metal failures of sophisticated
parts, such as impeller blades, can be debilitating, as delivery times of
metal parts from foundries, fabrication shops, welding shops, etc. can take
months, sometimes years in today's market. In the time it takes to get your
plant back on line, a profitable use of your time is to understand why this
metal part failed and address the root cause of the failure; otherwise, your
plant will face similar situations in the future.
Table 2.
Examples of Other Supporting Data
Recent process logs, physical or electronic.
Data sheets with original equipment specifications.
Interviews with anyone present during the failure.
Incident Reports of the failure or previous failures of the same
equipment or process.
From Quality Control, control limits, control reports, and excursions
for the last three months.
Have process personnel explain the QC excursions and their cause.
If a mechanical component has failed, such as a shaft, get operating
speeds, motor amps, noise and vibration data if available, and again
compare them to the "normal" values.
From engineering, design or equipment specifications and/or data
sheets for the failed part.
Related Vendor data.
Failure Analysis, as the First Step
in a Root Cause Investigation
Providing services to understand
metal failures and to prevent their recurrence is a specialized and growing
field. However, obtaining a specialized failure analysis from
a metallurgist and identifying and correcting the root cause of the failure
are two related, but different, activities.
Every metal failure represents a unique opportunity to
improve the technical, production and management systems in your company.
Although each response must be tailored to the circumstances, some general
steps that must be taken to ensure that your failure analysis is the first
step in your root-cause investigation include:
Table 3.
Typical Questions To Ask During the Laboratory
Investigation.
What is the mechanism of failure? Some examples include:
A Mechanical Failure
(tensile, shear, fatigue, etc.)
A Corrosion Failure
(under-deposit, stress corrosion cracking, etc.)
A Fatigue Failure
Are there defects in the metal? Its assembly or
fabrication?
If it is a fatigue failure, are stress concentrators
present?
Did the part fail because of corrosion?
What type of corrosion?
What was the mechanism?
What is the quality of welds (if present)?
Protect the evidence, Table 1.
Conduct a failure analysis to identify the physical
cause of the failure and to characterize the type of failure.
Develop an action plan to address replacing the present
part, by necessity, usually before all the facts are gathered.
Review the process conditions, especially deviations in
the process, and make a determination of their role in the failure.
Review how management policies, company procedures, and
quality control, or the lack thereof, contributed to the failure. Use the
failed part as an example of how your company works. How was the metal
chosen, for what conditions, and how did this compare to actual process
conditions? If you could do it over, how would you prevent this failure from
happening? Now apply this knowledge to the future.
If appropriate, review the economic trade-offs between
process changes and material changes to prevent a future failure of the
same system.
Based on the insights gained from this failure, develop
an initiative to address: (1) preventing a future failure in the same
system, (2) modification of management policies, systems, and procedures
to reduce the probability of failures in similar locations in the rest of
the plant, and (3) early warning systems (corrosion monitoring,
non-destructive testing, inspections) to determine the progress of
corrosion in high-risk areas. If possible, employ an independent
consultant to play devil's advocate throughout the entire reformation.
Table 4.
Typical Questions Asked in a Root Cause Investigation
of the Failure
What is the purpose of this investigation?
Identifying who caused the problem?
Correcting the problem?
Reducing downtime? expense?
Are there legal ramifications?
Are there secondary damages?
Was someone hurt or killed?
Are there other parties with standing in the case?
What is the most appropriate report format?
Who is the Report For?
Is the process being operated outside of design conditions?
Are there inadequate controls in place to assure quality?
Is the part under-designed? poorly designed?
Is the part being exposed to loads/conditions it was not designed
for?
Was the correct material specified? Supplied? Used?
Was the process properly defined? Was a Process and/or a Material
Design Criteria Developed?
Was the process running within design specifications?
Did the process change?
Did process upsets contribute to the corrosion?
Are more parts in the system subject to the same conditions?
Protecting the Physical Evidence
When your plant goes down because of a component failure, what you do in the
first few days, and even hours, after the failure can protect evidence that
may be crucial in understanding why the component failed. Imagine your next
metal failure as a crime scene. Whether the failed part is a leaking pipe, a
shaft on a motor, a leaking heat exchanger, a discolored stainless steel
fixture, or any other failure, it contains evidence of what caused the
failure.
If possible, get the metallurgist performing the failure
analysis involved in collecting this data. Let him know up front that he is
to be involved in a root cause investigation, not just a characterization of a
failure.
Collecting Supporting Data
Next, set up a box, either physical or electronic, to
collect data on the operating conditions, Table 2. It is important to
understand how the process or equipment was working compared to how it was
designed to work.
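Once the process logs and design data sheets are in the box, comparing "as operated" with "as designed" can be partly automated. The sketch below is illustrative only: the function name, the parameter names, and all numbers are made up for the example and do not come from any real plant or specification.

```python
def find_excursions(readings, design_limits):
    """Return (parameter, value, low, high) for each reading outside its design range."""
    excursions = []
    for parameter, value in readings:
        if parameter not in design_limits:
            continue  # no design value on file for this parameter
        low, high = design_limits[parameter]
        if value < low or value > high:
            excursions.append((parameter, value, low, high))
    return excursions


# Illustrative values only (hypothetical design ranges and log entries).
design_limits = {"temperature_C": (20, 80), "pH": (6.0, 9.0)}
readings = [("temperature_C", 75), ("temperature_C", 95), ("pH", 4.5)]

for parameter, value, low, high in find_excursions(readings, design_limits):
    print(f"{parameter} = {value} is outside the design range {low} to {high}")
```

Each excursion found this way is a candidate question for the process group, not a conclusion about the cause of the failure.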
The laboratory receiving this information and samples
should review the documents furnished and be advised on what the
expectations are for the investigation. Any safety or potential legal
issues with the samples should be understood. There are ASTM guidelines
regarding the handling of evidence if a legal case is involved, ASTM E 860
and ASTM E 1188. Once this discussion occurs, the laboratory receiving the
documents and samples should log each sample and document received from the
client into a log, again physical or electronic. The laboratory should take
its own pictures and label each sample as it is removed from its
container. It is now possible to start the failure analysis and/or root
cause investigation with as much of the evidence intact as possible and all
the background information necessary to understand the failure, not only of
the component, but possibly of the system. The above is an ideal scenario.
In most cases, the investigator gets a broken piece of metal and a
three-line email. It is not hard to imagine which approach has the potential of
improving future performance of your plant.
Table 5.
Typical Questions That Should Be Addressed
Immediately
What is the most economical way to resolve the
problem?
Upgrade the alloy?
Control the process better?
Are there secondary benefits of controlling the process better?
What is the most expeditious option?
What materials are available today?
Will changing alloys lead to galvanic problems?
What Design Life is Reasonable?
What are the Life Cycle Costs of the Options Under
Consideration?
Are spare parts available?
Can the Part Be Purchased or Made Quickly? What
material options exist?
What is the strategy? Get back on line now and deal
with long-term issues later?
If the same material is put back in service, how much
time do we have to effect needed changes in the process?
Laboratory Characterization of the
Failure, the Classic "Failure Analysis"
There are
literally hundreds of failure modes and an infinite number of combinations of
failure modes that can cause a piece of metal to fail. Fortunately, metals have
been used and studied for thousands of years, and the great majority of failure
analyses can be performed by simply collecting evidence
from the sample and using the background information and some common sense to
match the evidence with what is described in the literature. (It is the
exception that makes the failure analyst's job interesting.) Although each
failure analysis has to be tailored to the job, some typical steps in
collecting the laboratory evidence include:
Detailed photography at the site of the failure, before,
during, and after any cutting or actions that may potentially destroy some of
the evidence on the samples. Many failure analyses are solved by going back
to pictures taken at an earlier stage in the investigation.
Table 6.
Typical Questions That Should Be Asked to Learn From
the Failure
Is the organization capable of dealing with the
Material Challenges Facing It?
Do some organizational changes need to be made to
deal with Material Challenges?
Do some departments need to be strengthened?
Is this our core competency? Do we need outside help
on an ongoing or part-time basis?
What is the best long-term solution?
Is this a repetitive problem? A typical problem?
Corrective Action could include:
Reducing the Corrosivity of the Process Through
Improved Process Control or Process Modification
Alternative Materials of Construction
Improved Use of Inhibitors
Monitoring of Corrosive Conditions with
Electrochemical Probes or Corrosion Coupons
Field Measurement of the Rate of Corrosion with
Electrochemical Probes
Using corrosion probes as an indicator of process
upsets
Stereoscopic Examination and Photography,
before, during and after cleaning the failure site. It is important in many
corrosion failures to understand where a corrosion product is originating
and its composition. Also, the origin of the failure on the part can usually
be assessed at this stage. For example, beach marks on a fatigue failure can
often be followed on the fracture surface back to the point of origin.
The sample is cross-sectioned in the area of the failure and polished to a
mirror finish to look at the microstructure of the sample. Many corrosion
modes, such as stress corrosion cracking, will progress far below the
surface similar to a tree, and are best studied under a metallograph.
Chemical analysis of deposits and residues taken from the failure site
often provides evidence on the corrodent involved in the failure. Normally
anion analyses are the most useful, since the cations in the corrosion
products typically come from the metal being corroded. Halides, chlorides
especially, sulfates, and oxides are the most common corrodents. The
electrochemical corrosion process can often concentrate anions, especially
chlorides, inside of cracks and provide evidence about the bulk environment
that may not be known.
Metallographic examination of key cross sections near the failure site
can be used to characterize the failure and the microstructure near the
failure site.
A scanning electron microscope (SEM) can be used to characterize the failure
site/fracture site and the failure mode. For instance,
intergranular attack, metal cleavage and ductile fractures can be
characterized by looking at the fracture surface on the SEM. An SEM equipped
with EDX (energy-dispersive X-ray spectroscopy) can be used to identify corrosion
products in the cracks, on fracture faces, or to identify segregation of
alloying agents or the concentration of various elements between the grains
and the grain boundaries.
Summary
All of these tools enable the metallurgist to characterize
the alloy and failure and to identify any irregularities. Evaluation of the
evidence allows the investigator to begin answering some questions, Table 3. If
a Root Cause Investigation is to take place, the questions broaden to include
questions about the process, your engineering, your company’s quality control
department, etc., Table 4. After the Root Cause Investigation identifies the
probable cause of the failure, and possibly contributing factors, there is still
the question of what actions should be taken immediately, Table 5, and what
longer-term improvements need to be made in the way your business is conducted,
Table 6.
As a tool during a Root Cause Investigation, a decision tree
can be developed for your company. The decision tree can be organized around the
type of failure or it can be organized around your functional departments and
management groups. Whatever the route, focus must be centered on improving the
quality and functionality of your organization, not on placing blame. A
simplified Root Cause Diagram is given in Figure 1 illustrating the elements
that go into a Root Cause Decision Tree.
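A decision tree of the kind described above can be held as a simple nested structure. The sketch below is not a reproduction of Figure 1; it is a hypothetical tree organized around functional departments, with branch names chosen for illustration and questions borrowed from Table 4.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str                                     # a branch name or a question
    children: list = field(default_factory=list)  # sub-branches or questions


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.text]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


def leaf_questions(node):
    """Return the questions at the ends of the branches."""
    if not node.children:
        return [node.text]
    questions = []
    for child in node.children:
        questions.extend(leaf_questions(child))
    return questions


# Hypothetical tree organized by department; questions taken from Table 4.
tree = Node("Why did the part fail?", [
    Node("Process", [
        Node("Was the process running within design specifications?"),
        Node("Did process upsets contribute to the corrosion?"),
    ]),
    Node("Engineering", [
        Node("Was the correct material specified? Supplied? Used?"),
        Node("Is the part under-designed? poorly designed?"),
    ]),
    Node("Quality Control", [
        Node("Are there inadequate controls in place to assure quality?"),
    ]),
])

print("\n".join(outline(tree)))
```

The same structure could instead be organized around the type of failure, with the mechanisms from Table 3 as the first level of branches. Either way, the branches hold questions about systems and conditions rather than names of individuals.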