PPRuNe Forums - Merged: Erebus site launched
Brian Abraham, 7th Jul 2009
Given that we are treading old ground, it may be timely to reproduce the following in order to gain some understanding of how accidents occur.

Sidney Dekker
Associate Professor
Centre for Human Factors in Aviation, IKP
Linköping Institute of Technology
SE - 581 83 Linköping
Sweden

Punishing People or Learning from Failure?
The choice is ours
Disinheriting Fitts and Jones '47
Abstract

In this paper I describe how Fitts and Jones laid the foundation for aviation human factors by trying to understand why human errors made sense given the circumstances surrounding people at the time. Fitts and Jones remind us that human error is not the cause of failure, but a symptom of failure, and that "human error"—by any other name or by any other human—should be the starting point of our investigations, not the conclusion. Although most in aviation human factors embrace this view in principle, practice often leads us back to the old view of human error, which sees human error as the chief threat to system safety. I discuss two practices by which we quickly regress into the old view and disinherit Fitts and Jones: (1) the punishment of individuals, and (2) error classification systems. In contrast, real progress on safety can be made by understanding how people create safety, and by understanding how the creation of safety can break down in resource-limited systems that pursue multiple competing goals. I argue that we should de-emphasize the search for causes of failure and concentrate instead on mechanisms by which failure succeeds, by which the creation of safety breaks down.

Keywords: human error, mechanisms of failure, safety culture, human factors, classification, creation of safety

Introduction
The groundwork for human factors in aviation lies in a couple of studies done by Paul Fitts and his colleague Jones right after World War II. Fitts and Jones (1947) found how features of World War II airplane cockpits systematically influenced the way in which pilots made errors. For example, pilots confused the flap and gear handles because these typically looked and felt the same and were co-located. Or they mixed up the locations of throttle, mixture and propeller controls because these kept changing across different cockpits. Human error was the starting point for Fitts' and Jones' studies—not the conclusion. The label "pilot error" was deemed unsatisfactory, and used as a pointer to hunt for deeper, more systemic conditions that led to consistent trouble. The idea these studies convey to us is that mistakes actually make sense once we understand features of the engineered world that surrounds people. Human errors are systematically connected to features of people's tools and tasks. The insight, then as now, was profound: the world is not unchangeable; systems are not static, not simply given. We can re-tool, re-build, re-design, and thus influence the way in which people perform. This, indeed, is the historical imperative of human factors: understanding why people do what they do so we can tweak and change the world in which they work, and so shape their assessments and actions accordingly.

Years later, aerospace human factors extended the Fitts and Jones work. Increasingly, we realized how trade-offs by people at the sharp end are influenced by what happens at the blunt end of their operating worlds: their organizations (Maurino et al., 1995). Organizations make resources available for people to use in local workplaces (tools, training, teammates) but at the same time put constraints on what goes on there (time pressures, economic considerations), which in turn influences the way in which people decide and act in context (Woods et al., 1994; Reason, 1997). Again, what people do makes sense on the basis of the circumstances surrounding them, but now circumstances that reach far beyond their immediate engineered interfaces. This realization has put the Fitts and Jones premise to work in organizational contexts, for example by changing workplace conditions, reducing working hours or de-emphasizing production to encourage safer trade-offs on the line (e.g. the "no fault go-around policy" held by many airlines today, where no (nasty) questions will be asked if a pilot breaks off his attempt to land). Human error is still systematically connected to features of people's tools and tasks, and, as acknowledged more recently, their operational and organizational environment.

Two views of human error
These realizations of aviation human factors pit one view of human error against another. In fact, these are two views of human error that are almost totally irreconcilable. If you believe one or pursue countermeasures on its basis, you truly are not able to embrace the tenets and putative investments in safety of the other. The two ways of looking at human error are that we can see human error as a cause of failure, or we can see human error as a symptom of failure (Woods et al., 1994). The two views have recently been characterized as the old view of human error versus the new view (Cook, Render & Woods, 2000; AMA, 1998; Reason, 2000) and painted as fundamentally irreconcilable perspectives on the human contribution to system success and failure.

In the old view of human error:

• Human error is the cause of many accidents.
• The system in which people work is basically safe; success is intrinsic. The chief threat to safety comes from the inherent unreliability of people.
• Progress on safety can be made by protecting the system from unreliable humans through selection, proceduralization, automation, training and discipline.
This old view was the one that Fitts and Jones remind us to be skeptical of. Instead, implicit in their work was the new view of human error:
• Human error is a symptom of trouble deeper inside the system.
• Safety is not inherent in systems. The systems themselves are contradictions between multiple goals that people must pursue simultaneously. People have to create safety.
• Human error is systematically connected to features of people's tools, tasks and operating environment. Progress on safety comes from understanding and influencing these connections.

Perhaps everyone in aviation human factors wants to pursue the new view. And most people and organizations certainly posture as if that is exactly what they do. Indeed, it is not difficult to find proponents of the new view—in principle—in aerospace human factors. For example:

"...simply writing off aviation accidents merely to pilot error is an overly simplistic, if not naive, approach.... After all, it is well established that accidents cannot be attributed to a single cause, or in most instances, even a single individual. In fact, even the identification of a 'primary' cause is fraught with problems. Instead, aviation accidents are the result of a number of causes..." (Shappell & Wiegmann, 2001, p. 60).

In practice, however, attempts to pursue the causes of system failure according to the new view can become retreads of the old view of human error. In practice, getting away from the tendency to judge instead of explain turns out to be difficult; avoiding the fundamental attribution error remains very hard; we tend to blame the man-in-the-loop. This is not because we aim to blame—in fact, we probably intend the opposite. But roads that lead to the old view in aviation human factors are paved with intentions to follow the new view. In practice, we all too often choose to disinherit Fitts and Jones '47, frequently without even knowing it. In this paper, I try to shed some light on how this happens, by looking at the pursuit of individual culprits in the wake of failure, and at error classification systems. I then move on to the new view of human error, extending it with the idea that we should de-emphasize the search for causes and instead concentrate on understanding and describing the mechanisms by which failure succeeds.

The Bad Apple Theory I: Punish the culprits
Progress on safety in the old view of human error relies on selection, training and discipline— weeding and tweaking the nature of human attributes in complex systems that themselves are basically safe and immutable. For example, Kern (1999) characterizes "rogue pilots" as extremely unreliable elements, which the system, itself safe, needs to identify and contain or exile:

"Rogue pilots are a silent menace, undermining aviation and threatening lives and property every day.... Rogues are a unique brand of undisciplined pilots who place their own egos above all else—endangering themselves, other pilots and their passengers, and everyone over whom they fly. They are found in the cockpits of major airliners, military jets and in general aviation...just one poor decision or temptation away from fiery disaster."

The system, in other words, contains bad apples. In order to achieve safety, it needs to get rid of them, limit their contribution to death and destruction by discipline, training or taking them to court (e.g. Wilkinson, 1994). In a recent comment, Aviation Week and Space Technology (North, 2000) discusses Valujet 592, which crashed after take-off from Miami airport because oxygen generators in its forward cargo hold had caught fire. The generators had been loaded onto the airplane without shipping caps in place by employees of a maintenance contractor, who were subsequently prosecuted. The editor:

"...strongly believed the failure of SabreTech employees to put caps on oxygen generators constituted willful negligence that led to the killing of 110 passengers and crew. Prosecutors were right to bring charges. There has to be some fear that not doing one's job correctly could lead to prosecution." (p. 66)

Fear as investment in safety? This is a bizarre notion. If we want to know how to learn from failure, the balance of scientific opinion is quite clear: fear doesn't work. In fact, it corrupts opportunities to learn. Instilling fear does the opposite of what a system concerned with safety really needs: learn from failure by learning about it before it happens. This is what safety cultures are all about: cultures that allow the boss to hear bad news. Fear stifles the flow of safety-related information—the prime ingredient of a safety culture (Reason, 1997). People will think twice about going to the boss with bad news if the fear of punishment is hanging over their heads. Many people believe that we can punish and learn at the same time. This is a complete illusion. The two are mutually exclusive. Punishing is about keeping our beliefs in a basically safe system intact.

Learning is about changing these beliefs, and changing the system. Punishing is about seeing the culprits as unique parts of the failure. Learning is about seeing the failure as a part of the system. Punishing is about stifling the flow of safety-related information. Learning is about increasing that flow. Punishing is about closure, about moving beyond the terrible event. Learning is about continuity, about the continuous improvement that comes from firmly integrating the terrible event in what the system knows about itself. Punishing is about not getting caught the next time. Learning is about countermeasures that remove error-producing conditions so there won't be a next time.

The construction of cause
Framing the cause of the Valujet disaster as the decision by maintenance employees to place unexpended oxygen generators onboard without shipping caps in place immediately implies a wrong decision, a missed opportunity to prevent disaster, a disregard of safety rules and practices.

Framing the cause as a decision leads to the identification of the people responsible for that decision, which in turn leads to their legal pursuit as culprits. The Bad Apple Theory reigns supreme. It also implies that cause can be found, neatly and objectively, in the rubble. The opposite is true. We don't find causes. We construct cause. "Human error", if there were such a thing, is not a question of individual single-point failures to notice or process—not in this story and probably not in any story of breakdowns in flight safety. Practice that goes sour spreads out over time and in space, touching all the areas that usually make practitioners successful. The "errors" are not surprising brain slips that we can beat out of people by dragging them before a jury. Instead, errors are series of actions and assessments that are systematically connected to people's tools, tasks and environment; actions and assessments that often make complete sense when viewed from inside their situation. Were one to trace "the cause" of failure, the causal network would fan out immediately, like cracks in a window, with only the investigator determining when to stop looking, because the evidence will not do it for him or her. There is no single cause. Neither for success, nor for failure.

The SabreTech maintenance employees inhabited a world of boss-men and sudden firings. It was a world of language difficulties—not just because many were Spanish speakers in an environment of English engineering language, as described by Langewiesche (1998, p. 228):

"Here is what really happened. Nearly 600 people logged work time against the three Valujet airplanes in SabreTech's Miami hangar; of them 72 workers logged 910 hours across several weeks against the job of replacing the "expired" oxygen generators—those at the end of their approved lives. According to the supplied Valujet work card 0069, the second step of the sevenstep process was: 'If the generator has not been expended install shipping cap on the firing pin.' This required a gang of hard-pressed mechanics to draw a distinction between canisters that were 'expired', meaning the ones they were removing, and canisters that were not 'expended', meaning the same ones, loaded and ready to fire, on which they were now expected to put nonexistent caps. Also involved were canisters which were expired and expended, and others which were not expired but were expended. And then, of course, there was the simpler thing—a set of new replacement canisters, which were both unexpended and unexpired."

And, oh by the way, as you may already have picked up: there were no shipping caps to be found in Miami. How can we prosecute people for not installing something we do not provide them with? The pursuit of culprits disinherits the legacy of Fitts and Jones. One has to side with Hawkins (1987, p. 127) who argues that exhortation (via punishment, discipline or whatever measure) "is unlikely to have any long-term effect unless the exhortation is accompanied by other measures... A more profound inquiry into the nature of the forces which drive the activities of people is necessary in order to learn whether they can be manipulated and if so, how". Indeed, this was Fitts's and Jones's insight all along. If researchers could understand and modify the situation in which humans were required to perform, they could understand and modify the performance that went on inside of it. Central to this idea is the local rationality principle (Simon, 1969; Woods et al., 1994). People do reasonable, or locally rational things given their tools, their multiple goals and pressures, their knowledge and their limited resources. Human error is a symptom—a symptom of irreconcilable constraints and pressures deeper inside a system; a pointer to systemic trouble further upstream.

The Bad Apple Theory II: Error classification systems
In order to lead people (e.g. investigators) to the sources of human error as inspired by Fitts and Jones '47, a number of error classification systems have been developed in aviation (e.g. the Threat and Error Management Model (e.g. Helmreich et al., 1999; Helmreich, 2000) and the Human Factors Analysis and Classification System (HFACS, Shappell & Wiegmann, 2001)). The biggest trap in both error methods is the illusion that classification is the same as analysis. While classification systems intend to provide investigators more insight into the background of human error, they actually risk trotting down a garden path toward judgments of people instead of explanations of their performance; toward shifting blame higher and further into or even out of organizational echelons, but always onto others. Several false ideas about human error pervade these classification systems, all of which put them onto the road to The Bad Apple Theory.

First, error classification systems assume that we can meaningfully count and tabulate human errors. Human error "in the wild", however—as it occurs in natural complex settings—resists tabulation because of the complex interactions, the long and twisted pathways to breakdown and the context-dependency and diversity of human intention and action. Labeling certain assessments or actions in the swirl of human and social and technical activity as causal, or as "errors" and counting them in some database, is entirely arbitrary and ultimately meaningless. Also, we can never agree on what we mean by error:

• Do we count errors as causes of failure? For example: This event was due to human error.
• Or as the failure itself? For example: The pilot's selection of that mode was an error.
• Or as a process, or, more specifically, as a departure from some kind of standard? This may be operating procedures, or simply good airmanship.

Depending on what you use as standard, you will come to different conclusions about what is an error.

Counting and coarsely classifying surface variabilities is protoscientific at best. Counting does not make science, or even useful practice, since interventions on the basis of surface variability will merely peck away at the margins of an issue. A focus on superficial similarities blocks our ability to see deeper relationships and subtleties. It disconnects performance fragments from the context that brought them forth, from the context that accompanied them; that gave them meaning; and that holds the keys to their explanation. Instead it renders performance fragments denuded: as uncloaked, context-less, meaningless shrapnel scattered across broad classifications in the wake of an observer's arbitrary judgment.

Second, while the original Fitts and Jones legacy lives on very strongly in human factors (for example in Norman (1994), who calls technology something that can make us either smart or dumb), human error classification systems often pay little attention to the systematic and detailed nature of the connection between error and people's tools. According to Helmreich (2000), "errors result from physiological and psychological limitations of humans. Causes of error include fatigue, workload, and fear, as well as cognitive overload, poor interpersonal communications, imperfect information processing, and flawed decision making" (p. 781). Gone are the systematic connections between people's assessments and actions on the one hand, and their tools and tasks on the other. In their place are purely human causes—sources of trouble that are endogenous; internal to the human component. Shappell and Wiegmann, following the original Reason (1990) division between latent failures and active failures, merely list an undifferentiated "poor design" only under potential organizational influences—the fourth level up in the causal stream that forms HFACS. Again, little effort is made to probe the systematic connections between human error and the engineered environment that people do their work in. The gaps that this leaves in our understanding of the sources of failure are daunting.

Third, Fitts and Jones remind us that it is counterproductive to say what people failed to do or should have done, since none of that explains why people did what they did (Dekker, 2001). Despite the intention of explaining why people did what they did, error classification systems help investigators label errors as "poor decisions", "failures to adhere to brief", "failures to prioritize attention", "improper procedure", and so forth (Shappell & Wiegmann, 2001, p. 63). These are not explanations; they are judgments. Similarly, they rely on fashionable labels that do little more than say "human error" over and over again, re-inventing it under a more modern guise:

• Loss of CRM (Crew Resource Management) is one name for human error—the failure to invest in common ground, to share data that, in hindsight, turned out to have been significant.
• Complacency is also a name for human error—the failure to recognize the gravity of a situation or to adhere to standards of care or good practice.
• Non-compliance is a name for human error—the failure to follow rules or procedures that would keep the job safe.
• Loss of situation awareness is another name for human error—the failure to notice things that in hindsight turned out to be critical.
Instead of explanations of performance, these labels are judgments. For example, we judge people for not noticing what we now know to have been important data in their situation, calling it their error—their loss of situation awareness.