Alarm management
Alarm management is the application of human factors (or ergonomics, as the field is referred to outside the U.S.), along with instrumentation engineering and systems thinking, to the design of an alarm system in order to increase its usability. Most often the major usability problem is that too many alarms are annunciated during a plant upset. This condition is commonly called an alarm flood, by analogy with a flood caused by excessive rainfall draining through a basically fixed outlet capacity. However, an alarm system can also suffer from other problems, such as poorly designed alarms, improperly set alarm points, ineffective annunciation, and unclear alarm messages.
From their conception, large chemical, refining, power generation, and other processing plants required a control system to keep the process operating successfully and producing products. Because the control components were fragile compared to the process, these control systems often required a control room to protect them from the elements and process conditions. Early control rooms used what were referred to as "panel boards", which were loaded with control instruments and indicators. These were tied to sensors located in the process streams and on the outside of process equipment. The sensors relayed their information to the control instruments via 4-20 mA current loops carried on twisted-pair wiring. At first these systems merely yielded information, and a well-trained operator was required to make adjustments, either by changing flow rates or by altering energy inputs, to keep the process within its designed limits.
Alarms were added to alert the operator to a condition that was about to exceed, or had already exceeded, a design limit. Additionally, Emergency Shutdown (ESD) systems were employed to halt a process that was in danger of exceeding safety, environmental, or economically acceptable process limits. Alarms were indicated to the operator by annunciator horns and by lights of different colors (for instance, green meant OK, yellow meant not OK, and red meant bad). Panel boards were usually laid out to replicate the process flow in the plant, so the instrumentation for each operating unit within the plant was grouped together for ease of recognition and problem solving. It was a simple matter to look at the entire panel board and discern whether any section of the plant was running poorly. This was due to both the design of the instruments and the implementation of the alarms associated with them. Instrumentation companies put a lot of effort into the design and individual layout of the instruments they manufactured. To do this they employed behavioral psychology practices, which revealed how much information a human being could collect in a quick glance. More complex plants had more complex panel boards, and therefore often more human operators or controllers.
Thus, in the early days of panel board systems, alarms were constrained by both real estate and cost. In essence, they were limited by the amount of available board space and by the cost of running wiring and hooking up an annunciator (horn), an indicator (light), and switches to acknowledge the alarm and clear it once resolved. It was often the case that if you wanted a new alarm, you had to decide which old one to give up.
As technology developed, control systems and control methods were expected to deliver a higher degree of plant automation with each passing year. Highly complex material processing called for highly complex control methodologies. Global competition also pushed manufacturing operations to increase production while using less energy and producing less waste. In the days of the panel boards, a special kind of engineer was required, one who understood the electronic equipment associated with process measurement and control, the control algorithms necessary to control the process (PID basics), and the actual process used to make the products. Around the mid-1980s, the digital revolution arrived. Distributed control systems (DCS) built on digital computers were a boon to the industry. The engineer could now control the process without having to understand the equipment necessary to perform the control functions. Panel boards were no longer required, because all of the information that once came across analog instruments could be digitized, fed into a computer, and manipulated to achieve the same control actions once performed with amplifiers and potentiometers.
As a side effect, alarms also became easy and cheap to configure and deploy: you simply typed in a location and a value to alarm on, and set the alarm to active. The unintended result was that soon people alarmed everything. Initial installers habitually set alarms at 80% and 20% of the operating range of every variable. Another unfortunate consequence of the digital revolution was that what once covered several square yards of panel board now had to fit on a 17-inch computer monitor. Multiple pages of information were thus used to replicate the information on the replaced panel board. Alarms were used to tell an operator to go look at a page he was not viewing. Alarms were used to tell an operator that a tank was filling. Every mistake made in operations usually resulted in a new alarm. With the implementation of the OSHA 1910 regulations, HAZOP studies usually requested several new alarms. Alarms were everywhere. Incidents began to accrue as too much data collided with too little useful information.
Alarm Management History
Recognizing that alarms were becoming a problem, users banded together and formed the Alarm Management Workgroup, an alliance of operating companies from the chemical, petrochemical, and refining industries. They met and wrote a document on the issues associated with alarm management. The group quickly realized that alarm problems were simply a subset of a larger problem, and formed the ASM consortium (ASM is a registered trademark of Honeywell and stands for Abnormal Situation Management). The ASM consortium was originally chartered jointly by NIST (the National Institute of Standards and Technology) and the group of users. Essentially, they realized that alarms exist because of a broader problem of operator situation awareness.
The ASM consortium produced documents on best practices in alarm management. It also produced documentation on other best practices in operator situation awareness, operator effectiveness, and other operator-oriented issues. Some of these documents are available on its website.
The ASM consortium funded the alarm management guidelines published by EEMUA (the Engineering Equipment and Materials Users' Association) in the UK. By providing data from its member companies and contributing to the editing of the guidelines, the consortium helped produce EEMUA 191, "Alarm Systems - A Guide to Design, Management and Procurement".
Several institutions and societies are producing standards on alarm management to help their members apply best practices in the use of alarms in industrial manufacturing systems. Among them are the ISA (ISA SP-18), API (API 1167), and NAMUR (NAMUR NA 102). Several companies also offer software packages to help users deal with alarm management issues, among them DCS manufacturers and third-party vendors who offer add-on systems.
The fundamental purpose of alarm annunciation is to alert
the operator to deviations from normal operating conditions, i.e.
abnormal operating situations. The ultimate objective is to prevent,
or at least minimize, physical and economic loss through operator
intervention in response to the condition that was alarmed. For most
digital control system users, losses can result from situations that
threaten environmental safety, personnel safety, equipment
integrity, economy of operation, and product quality control as well
as plant throughput. A key factor in operator response effectiveness
is the speed and accuracy with which the operator can identify the
alarms that require immediate action.
By default, the assignment of alarm trip points and alarm priorities constitutes basic alarm management. Each individual alarm is designed to provide an alert when its process indication deviates from normal. The main problem with basic alarm management is that these features are static: the resultant alarm annunciation does not respond to changes in the mode of operation or the operating conditions.
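To make the limitation concrete, here is a minimal Python sketch of basic, static alarm management. It assumes each alarm is nothing more than a fixed high or low trip point with a fixed priority; the class `StaticAlarm`, the function `evaluate_static`, and the tag names are hypothetical. Because nothing in the alarm definition refers to operating mode, a planned pump shutdown still annunciates the low-flow and low-pressure alarms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticAlarm:
    tag: str           # hypothetical process tag, e.g. "FI-101"
    trip_point: float  # fixed alarm setpoint, never changes with plant mode
    direction: str     # "high" or "low"
    priority: int      # fixed priority, 1 = most urgent


def evaluate_static(alarms, readings):
    """Return (tag, priority) for every alarm whose fixed trip point is crossed.

    Operating mode is never consulted, which is exactly the weakness of
    basic alarm management.
    """
    active = []
    for alarm in alarms:
        value = readings[alarm.tag]
        if alarm.direction == "high":
            crossed = value > alarm.trip_point
        else:
            crossed = value < alarm.trip_point
        if crossed:
            active.append((alarm.tag, alarm.priority))
    # Most urgent first; priority is the only ranking a static system offers.
    return sorted(active, key=lambda item: item[1])


alarms = [
    StaticAlarm("FI-101", 20.0, "low", 2),    # pump outlet flow low
    StaticAlarm("PI-102", 5.0, "low", 3),     # pump discharge pressure low
    StaticAlarm("TI-103", 350.0, "high", 1),  # heater outlet temperature high
]

# Readings after a planned pump shutdown: low flow and pressure are expected.
shutdown_readings = {"FI-101": 0.0, "PI-102": 0.5, "TI-103": 120.0}
print(evaluate_static(alarms, shutdown_readings))
# [('FI-101', 2), ('PI-102', 3)]
```

Sorting by priority is the only help the operator gets here; it orders the alarms but cannot recognize that both are expected consequences of the shutdown.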
When a major piece of process equipment like a charge
pump, compressor, or fired heater shuts down, many alarms become
unnecessary. These alarms are no longer independent exceptions from
normal operation. They indicate, in that situation, secondary,
non-critical effects and no longer provide the operator with
important information. Similarly, during startup or shutdown of a
process unit, many alarms are not meaningful. This is often the case
because the static alarm conditions conflict with the required
operating criteria for startup and shutdown.
In all cases of major equipment failure, startups, and
shutdowns, the operator must search alarm annunciation displays and
analyze which alarms are significant. This wastes valuable time when
the operator needs to make important operating decisions and take
swift action. If the resultant flood of alarms becomes too great for
the operator to comprehend, then the basic alarm management system
has failed as a system that allows the operator to respond quickly
and accurately to the alarms that require immediate action. In such
cases, the operator has virtually no chance to minimize, let alone
prevent, a significant loss.
In short, one needs to extend the objectives of alarm
management beyond the basic level. It is not sufficient to utilize
multiple priority levels because priority itself is often dynamic.
Likewise, alarm disabling based on unit association or suppressing
audible annunciation based on priority do not provide dynamic,
selective alarm annunciation. The solution must be an alarm
management system that can dynamically filter the process alarms
based on the current plant operation and conditions so that only the
currently significant alarms are annunciated.
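One way to sketch such dynamic filtering, assuming the current operating mode is already known and that each alarm was assigned, during design, the set of modes in which it is meaningful. The `RELEVANT_MODES` table, the mode names, and the tags are hypothetical; a real system would typically infer the mode from process conditions rather than receive it as an argument.

```python
# Hypothetical design table: operating modes in which each alarm is meaningful.
RELEVANT_MODES = {
    "FI-101 LOW": {"normal"},                          # expected during startup/shutdown
    "PI-102 LOW": {"normal"},
    "TI-103 HIGH": {"normal", "startup", "shutdown"},  # relevant in every mode
}


def annunciate(raw_alarms, mode):
    """Keep only the raw alarms that are significant in the current mode.

    Alarms missing from the table are always annunciated.
    """
    return [tag for tag in raw_alarms
            if mode in RELEVANT_MODES.get(tag, {mode})]


raw = ["FI-101 LOW", "PI-102 LOW", "TI-103 HIGH", "LI-104 HIGH"]
print(annunciate(raw, "shutdown"))  # ['TI-103 HIGH', 'LI-104 HIGH']
print(annunciate(raw, "normal"))    # all four alarms
```

Annunciating alarms that are missing from the table is a deliberate choice in this sketch: a gap in the mode table should never silently hide an alarm.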
The fundamental purpose of dynamic alarm annunciation is to alert the operator to relevant abnormal operating situations. These include situations that have a necessary or possible operator response to ensure:
-
Personnel and Environmental Safety,
-
Equipment Integrity,
-
Product Quality Control.
The ultimate objectives are no different from those of basic alarm annunciation management. Dynamic alarm annunciation management focuses the operator’s attention by eliminating extraneous alarms, providing better recognition of critical problems, and ensuring swifter, more accurate operator response.
Alarm management is usually necessary in a process manufacturing environment that is controlled by an operator using a control system, such as a distributed control system (DCS) or a programmable logic controller (PLC). Such a system may have hundreds of individual alarms that, until very recently, were probably designed with only limited consideration of other alarms in the system. Since humans can only do one thing at a time and can pay attention to only a limited number of things at a time, there needs to be a way to ensure that alarms are presented at a rate that a human operator can assimilate, particularly when the plant is upset or in an unusual condition. Alarms also need to be capable of directing the operator's attention to the most important problem that he or she needs to act upon, for instance by using a priority to indicate degree of importance or rank. A good example of this problem comes from the old US sitcom M*A*S*H. A common scene was Radar O'Reilly slipping a requisition for something that Hawkeye wanted into the stack of papers for Colonel Potter to sign. In much the same way, if alarms are not prioritized, the important ones can be mixed in with lower-value nuisance ones.
The techniques for achieving rate reduction range from the
extremely simple ones of reducing nuisance and low value alarms to
redesigning the alarm system in a holistic way that considers the
relationships among individual alarms.
The first step in a continuous improvement program is often to measure the alarm rate and resolve any chronic problems, such as alarms that have no use (often described as alarms that do not require the operator to take an action).
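Measuring the alarm rate can start with simply counting annunciations per fixed time window in the alarm log. This sketch assumes the log is a list of `(timestamp, tag)` pairs; the 10-minute window, the `threshold` parameter, and the function names are illustrative assumptions, not values taken from any standard.

```python
from collections import Counter
from datetime import datetime, timedelta


def alarm_rate(log, window=timedelta(minutes=10)):
    """Count alarms per fixed window, keyed by each window's start time."""
    if not log:
        return {}
    start = min(ts for ts, _tag in log)
    counts = Counter()
    for ts, _tag in log:
        index = (ts - start) // window  # timedelta // timedelta -> int
        counts[start + index * window] += 1
    return dict(sorted(counts.items()))


def overloaded_windows(rates, threshold):
    """Start times of windows whose count exceeds an assumed operator limit."""
    return [start for start, n in rates.items() if n > threshold]


t0 = datetime(2024, 1, 1, 8, 0)
log = [(t0 + timedelta(minutes=m), "FI-101") for m in (0, 1, 2, 3, 15)]
rates = alarm_rate(log)  # 4 alarms in the 08:00 window, 1 in the 08:10 window
print(overloaded_windows(rates, threshold=2))  # only the 08:00 window
```

Windows here are measured from the first logged alarm rather than aligned to the clock, which keeps the code short; aligning them to whole hours would be a small change.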
The next step involves documenting the methodology, or philosophy, of how to design alarms. It can include things such as what to alarm, standards for alarm annunciation and text messages, how the operator will interact with the alarms, etc.
Documentation and Rationalization
This phase is a detailed review of all alarms to document their design purpose and to ensure that they are selected and set properly and meet the design criteria. Ideally this stage will reduce the number of alarms, but it does not always do so.
Advanced Methods
The above steps will often still fail to prevent an alarm flood in an operational upset, so advanced methods such as alarm suppression under certain circumstances are then necessary. As an example, shutting down a pump will always cause a low-flow alarm on the pump outlet flow, so the low-flow alarm may be suppressed while the pump is shut down: it adds no value for the operator, who already knows it was caused by the pump being shut down. This technique can of course get very complicated and requires considerable care in design. In the above case, for instance, it can be argued that the low-flow alarm does add value, as it confirms to the operator that the pump has indeed stopped.
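The pump example can be sketched as a conditional suppression rule. This is a minimal illustration, assuming the pump's running status and the outlet flow are both available to the alarm logic; `PlantState`, `LOW_FLOW_TRIP`, and `low_flow_alarm` are hypothetical names. Returning "SUPPRESSED" rather than nothing is a design choice that keeps the condition available for logging and later review.

```python
from dataclasses import dataclass
from typing import Optional

LOW_FLOW_TRIP = 20.0  # hypothetical low-flow setpoint for the pump outlet


@dataclass
class PlantState:
    pump_running: bool  # hypothetical running status, e.g. from the motor starter
    outlet_flow: float  # pump outlet flow, same units as LOW_FLOW_TRIP


def low_flow_alarm(state: PlantState) -> Optional[str]:
    """Evaluate the outlet low-flow alarm with state-based suppression.

    Returns "ACTIVE" if flow is low while the pump is running,
    "SUPPRESSED" if flow is low only because the pump is stopped,
    and None if there is no alarm condition.
    """
    if state.outlet_flow >= LOW_FLOW_TRIP:
        return None
    return "ACTIVE" if state.pump_running else "SUPPRESSED"


print(low_flow_alarm(PlantState(pump_running=True, outlet_flow=5.0)))   # ACTIVE
print(low_flow_alarm(PlantState(pump_running=False, outlet_flow=0.0)))  # SUPPRESSED
```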
Alarm management becomes more and more necessary as the complexity and size of manufacturing systems increase. Much of the need for alarm management also arises because alarms can be configured on a DCS at nearly zero incremental cost, whereas in the past, on physical control panel systems that consisted of individual pneumatic or electronic analog instruments, each alarm required expenditure and control panel real estate, so more thought usually went into the need for an alarm. Numerous disasters, such as the Three Mile Island and Chernobyl accidents, have established a clear need for alarm management.
The Seven Steps to Alarm Management
Step 1: Create and Adopt an Alarm Philosophy
A comprehensive design and guideline document that makes
it clear “exactly how to do alarms right.”
Step 2: Alarm Performance Benchmarking
Analyze the alarm system to determine its strengths and
deficiencies, and effectively map out a practical solution to
improve it.
Step 3: “Bad Actor” Alarm Resolution
Experience shows that around half of the entire alarm load usually comes from relatively few alarms. The methods for making them work properly are documented, and can be applied with minimum effort and maximum performance improvement.
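Identifying the bad actors is essentially a Pareto analysis of the alarm log. A minimal sketch, assuming the log has been reduced to a list of alarm tags with one entry per annunciation; the function name and tags are hypothetical, and the default 50% share simply mirrors the rule of thumb above.

```python
from collections import Counter


def bad_actors(log_tags, share=0.5):
    """Return the fewest most-frequent tags that together cover `share` of all alarms."""
    counts = Counter(log_tags)
    total = sum(counts.values())
    selected, covered = [], 0
    for tag, n in counts.most_common():  # highest count first
        if covered >= share * total:
            break
        selected.append((tag, n))
        covered += n
    return selected


log_tags = ["LI-200"] * 40 + ["FI-101"] * 30 + ["TI-103"] * 20 + ["PI-102"] * 10
print(bad_actors(log_tags))  # [('LI-200', 40), ('FI-101', 30)]
```

Taking tags in descending order of frequency guarantees the shortest list that reaches the target share, which is the list worth fixing first.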
Step 4: Alarm Documentation and Rationalization (D&R)
A full overhaul of the alarm system to ensure that each
alarm complies with the alarm philosophy and the principles of good
alarm management.
Step 5: Alarm System Audit and Enforcement
DCS alarm systems are notoriously easy to change and generally lack proper security. Methods are needed to ensure that the alarm system does not drift from its rationalized state.
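The audit half of this step can be sketched as a comparison between the live alarm configuration and the rationalized master database produced during D&R. This assumes both are available as dictionaries mapping an alarm tag to its attributes; the function name, tags, and attribute names are hypothetical, and enforcement (writing the master values back to the DCS) is not shown.

```python
def audit(master, current):
    """Compare live alarm settings with the rationalized master database.

    Both arguments map alarm tag -> dict of attributes (e.g. trip point, priority).
    Returns human-readable findings; an empty list means no drift.
    """
    findings = []
    for tag in sorted(master.keys() - current.keys()):
        findings.append(f"{tag}: missing from control system")
    for tag in sorted(current.keys() - master.keys()):
        findings.append(f"{tag}: not in master database (unrationalized)")
    for tag in sorted(master.keys() & current.keys()):
        for attr, expected in sorted(master[tag].items()):
            actual = current[tag].get(attr)
            if actual != expected:
                findings.append(f"{tag}.{attr}: expected {expected!r}, found {actual!r}")
    return findings


master = {"FI-101": {"priority": 2, "trip": 20.0},
          "TI-103": {"priority": 1, "trip": 350.0}}
current = {"FI-101": {"priority": 3, "trip": 20.0},   # priority changed on the DCS
           "LI-104": {"priority": 2, "trip": 90.0}}   # added without rationalization
for finding in audit(master, current):
    print(finding)
```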
Step 6: Real-Time Alarm Management
More advanced alarm management techniques are often needed
to ensure that the alarm system properly supports, rather than
hinders, the operator in all operating scenarios. These include
Alarm Shelving, State-Based Alarming, and Alarm Flood Suppression
technologies.
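Of the techniques listed, alarm shelving is the simplest to illustrate: the operator temporarily removes a nuisance alarm from the active display. This sketch assumes shelving is time-limited so that a shelved alarm returns automatically; the `ShelvingManager` class, the one-hour maximum, and passing the current time in explicitly (to keep it testable) are all assumptions rather than features of any particular DCS.

```python
from datetime import datetime, timedelta


class ShelvingManager:
    """Tracks shelved alarms and when each shelf period expires."""

    def __init__(self, max_duration=timedelta(hours=1)):
        self.max_duration = max_duration
        self._shelved = {}  # tag -> expiry time

    def shelve(self, tag, now, duration=None):
        """Shelve `tag`, capping the period at the configured maximum."""
        if duration is None:
            duration = self.max_duration
        self._shelved[tag] = now + min(duration, self.max_duration)

    def unshelve(self, tag):
        self._shelved.pop(tag, None)

    def visible(self, active_tags, now):
        """Active alarms to display; expired shelves are released first."""
        self._shelved = {t: exp for t, exp in self._shelved.items() if exp > now}
        return [t for t in active_tags if t not in self._shelved]


t0 = datetime(2024, 1, 1, 8, 0)
shelf = ShelvingManager()
shelf.shelve("FI-101", t0, timedelta(minutes=30))
print(shelf.visible(["FI-101", "TI-103"], t0 + timedelta(minutes=10)))  # ['TI-103']
print(shelf.visible(["FI-101", "TI-103"], t0 + timedelta(minutes=45)))  # both again
```

Capping the shelf period is what keeps shelving from turning into a permanent, undocumented disable, which is the kind of drift Step 5 is meant to catch.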
Step 7: Control and Maintain Alarm System Performance
Proper management of change, longer-term analysis, and KPI monitoring are needed to ensure that the gains achieved by performing the steps above do not dwindle away over time. Otherwise they will; the principle of “entropy” definitely applies to an alarm system.
-
Jensen, Leslie D. "Dynamic Alarm Management on an Ethylene Plant". Retrieved 2008-05-22.
-
"Better alarm handling". British Government Health and Safety Executive (HSE).
-
EEMUA 191: Alarm Systems - A Guide to Design, Management and Procurement (1999). ISBN 0-85931-076-0.
-
"Principles for alarm system design". Norwegian Petroleum Directorate.
-
Draft ISA SP18.02: Management of Alarm Systems for the Process Industries.