ORCHSE Book Review: “Organizing for Reliability”
A Guide for Research and Practice
MIT’s John Carroll explains that “reliability represents an intersection of effectiveness, safety, and resilience…” He cites the organizational and cultural attributes of high-reliability organizations (HROs) codified by Karl Weick and Kathleen Sutcliffe. The five attributes are:
• Preoccupation with Failure
• Reluctance to Simplify
• Sensitivity to Operations
• Commitment to Resilience
• Deference to Expertise
For more information on these concepts see their book Managing the Unexpected: Sustained Performance in a Complex World, 2015 edition.
Carroll also points out that in complex operations, no one person can be aware of every potential failure mode, because the big picture exceeds any individual’s knowledge. Low-frequency, high-consequence events compound the problem: a given event may never have occurred before.
“Expecting a person to interpret and respond to a unique event in the moment is like blaming the goalie in soccer or ice hockey for every goal—reducing shots on goal is everyone’s job in a team sport, whereas the goalie is the last line of defense”, says Carroll.
Dr. Carroll examines lessons from process safety and the 2005 BP Texas City refinery disaster that killed 15 and injured 180. “Process safety hazards are often invisible and can involve combinations of multiple pieces of equipment, materials in process, human actions, and computer software that cannot be understood just by looking at the screen. Nor will everyone doing what is in the procedure manual necessarily avoid accidents, since procedures are frequently missing, incomplete, confusing or wrong,” he explains.
Peter Madsen (BYU) and Vinit Desai (University of Colorado, Denver) explain that incentives for efficiency and short-term profitability are ever-present while major incidents are rare. This can result in drift away from valuing reliability and safety until a major mishap occurs. The Deepwater Horizon disaster is an example of this dynamic. In some organizations, the importance of learning from events is judged by their impact on profit after a major incident takes place. HROs are on guard to manage conflicting production and reliability goals by continuously verifying that the “real goals of the organization are the same as public goals”.
Madsen explains that because HROs rarely experience disasters firsthand, learning only from their own incidents yields relatively little. Organizations that seek high reliability must learn from incidents and near misses at other companies as well as in other sectors around the world.
After the 1979 Three Mile Island accident, the nuclear power industry formed the Institute of Nuclear Power Operations (INPO), which facilitates broad-based learning that promotes reliability. INPO functions as a center for safety excellence that goes much further than compliance with regulations. After the Deepwater Horizon disaster, the President’s Oil Spill Commission recommended that the oil industry create its own INPO-type organization.
Paul Schulman (Mills College) and Emery Roe (University of California, Berkeley) discuss the role of safety regulators for reliability-seeking organizations and sectors. They observe that effective regulations and regulators can raise performance standards across a sector. This helps to prevent some rivals from undercutting safety in order to gain a competitive advantage. An effective regulator also impacts operational practices and culture regarding reliability.
The authors go on to explain that safety regulators should continuously examine and improve their own performance. Regulators need to evaluate the effectiveness of their regulations and inspection activities to see if they are promoting primarily minimum compliance rather than actual improvements in safety and reliability. The self-evaluation by regulators should also ask “to what degree might adversarial relations between regulators and organizations lead to formalization and rigidity of safety management?” Regulators should also ask if their approach results in reliance on a small set of compliance metrics that are retrospective rather than prospective.
Nuclear power plants in the U.S. are most often viewed as examples of high-reliability organizations. In terms of near-term avoidance of major incidents and unplanned outages, their operations are reliable. However, if the time frame is expanded and we include environmental sustainability, the chronic unsolved problem of nuclear waste generation and storage can force us to rethink what we mean by reliability. Schulman and Roe explain that “reliability analysis, cast on an extended time frame, must inevitably address risks and consequences that current ‘reliable’ operations are likely to impose on future generations.”