PPRuNe Forums - View Single Post - Automation Bogie raises it's head yet again
18th Jan 2011, 07:18
PBL
 
Join Date: Sep 2000
Location: Bielefeld, Germany
Posts: 955
Terpster, Dozy, fdr,

there is such a lot to say that it has taken me some time to think about how to reduce it to a few sentences. Let me first say that I am glad you all take these issues seriously. I think that Cali could provide a good test example for thoughts and tropes about automation and human operation.

First, terpster asked for references to our work on Cali. Here is an extended answer. The paper AG RVS - Analysing the Cali Accident With a WB-Graph was presented at the first Human Error and Systems Development workshop, organised by Chris Johnson in Glasgow in March 1997. That workshop is now called Human Error, Safety, and Systems Development and has had seven meetings; the eighth is due, but I haven't heard anything yet. Chris also started a workshop on Accident Analysis, whose first two meetings, in Glasgow and Virginia (organised by John Knight and Michael Holloway), were superb, but the series then petered out. The problem seemed to be that everyone wanted to come and talk, but no one would or could submit a paper (we got three submissions the next year!). That is, everyone is interested in accidents (witness the explosion of threads here on recent accidents), but few actually want to work on them, that is, submit an analysis in public and subject it to open criticism. There is no open forum for accident analysis, not even in the context of the premier system safety conferences. One should ask why this is. I was on the Editorial Board of the UK IMechE Journal of Risk and Reliability for a number of years, responsible for system safety and accident analysis, and we got not one submission. Even the people I worked closely with on the Bieleschweig Workshops submitted not one paper on accident analysis during the entire time. Bieleschweig was all just talk and slides (except for the papers I produced) - a good example of the PowerPoint syndrome.

The paper was incorporated into the first draft of a Causal System Analysis text in 2001:
http://www.rvs.uni-bielefeld.de/publ...i_accident.pdf.

This text will not appear as it is. It has been split (and extended) into a text on system safety for computer-based systems, which is in draft form, and there will be a separate text on WBA. The original turned out to be simply too hard to teach from. It includes some logic, and proofs of correctness of an accident analysis (according to explicit criteria), and it turns out that no one who is interested in accident analysis has the background in logic to be able to read this, let alone apply it themselves. Even after ten years. So I have given up this approach to teaching.

The last paper is AG RVS - Comments on Confusing Conversation at Cali, which will be incorporated in extended form in an article on safety-critical communication protocols in the Handbook of Technical Communication, ed. Gibbon and Mehler, Mouton de Gruyter, Berlin, to appear 2011.

Terpster, if you look at these carefully you will see that we knew about your work when you wrote it! Thanks for including some of it again above.

Second, I would like to emphasise and expand the view proposed by fdr. There is a context in which the human pilots played out their fatal game. Terpster says earlier here "the pilots did this and this wrong" and puts responsibility solely there, with the human pilots. But in his older writing which he quotes, he acknowledges the context (namely, what is "common acceptable behavior" in US operations) and puts the responsibility for encouraging/allowing that context to develop solely in the hands of the US regulatory authority (FAA). I want to say: thank you for making my point for me!

Third, let me say explicitly what terpster repeatedly points out: it is true that the pilots did not follow good, safe, TERPS-consistent procedure. But then, as terpster keeps pointing out, this is endemic in US operations. Not only that, but there are parts of the world (dare one say, Colombia?) which are not necessarily regulated by TERPS (indeed, one might say anywhere outside the US). Pilots cannot be expected to know all the approach-design criteria in use all over the world, just as those pilots regularly flying across German airspace cannot be expected to have read and understood German air law (first of all, it's in German; second, even if you know German, German-legal-speak is a different language with some - and I emphasise some - syntax in common).

There is no point of disagreement with the fact that the Cali pilots did not follow advisable, safe procedure. But I disagree strongly that that is the only factor (even terpster must back away from that claim, as his indictment of the FAA shows). I would even doubt that it is the most important factor, given that that kind of behavior is pervasive, as terpster points out, and most people behaving like that don't crash. There is a line of thinking about explanation, which I shall call "contrastive explanation" after the late Peter Lipton (Inference to the Best Explanation, 2nd edition, Routledge, London, 2004), which proposes that the explanatory causal factors are those factors which were different in the (in this case) fatal case from how they are in all the non-fatal cases. If we are to explain contrastively (and Lipton gives deep arguments why we do and should, which I have not completely digested), then the contrast lies not in how these pilots accepted a clearance and tried to clarify what that clearance was, but in the naming conventions that misled them and in the misleading "affirmative"s uttered by the controller when he knew that "negative" was the correct procedural response. (Terpster, if we are to lay responsibility on pilots for not following defined procedure, why not on the controller for not following defined procedure? The only answer could be: the controller is there to ensure separation only. But of course that is not his only role: there was only one aircraft around, so it cannot be. He is also issuing approach clearances, distributing critical information, and, one would forlornly hope, trying to ensure the approach is more or less followed.)
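To make the contrastive idea concrete, here is a toy sketch in Python. The factor labels are mine, invented purely for illustration, not findings from the accident report: given the factors present in the fatal flight and those present in comparable flights that landed safely, the contrastive candidates are just those factors the non-fatal flights do not share.

# Toy sketch of contrastive selection of explanatory factors.
# The factor labels below are invented for illustration only.

fatal_case = {
    "late descent clearance accepted",
    "fix selected in FMS by ambiguous single-letter identifier",
    "controller said 'affirmative' where 'negative' was the procedural answer",
    "approach flown off the published procedure",
}

# Comparable flights that did not end in an accident (again, invented).
non_fatal_cases = [
    {"late descent clearance accepted",
     "approach flown off the published procedure"},
    {"approach flown off the published procedure"},
]

# A factor is a contrastive candidate if it was present in the fatal case
# but absent from every one of the non-fatal comparison cases.
present_in_some_non_fatal = set().union(*non_fatal_cases)
contrastive_candidates = fatal_case - present_in_some_non_fatal

for factor in sorted(contrastive_candidates):
    print(factor)
# Prints the naming-convention and "affirmative" factors; the pervasive
# off-procedure flying drops out because the non-fatal cases share it.

The point is not the arithmetic, of course, but that contrastive explanation directs attention away from what everybody does and towards what was different in this case.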

Fourth, just as the pilots obtained misleading information through their miscommunication with the controller, the pilots also obtained misleading information from the nav database, whose detailed and sometimes whimsical design they were not completely familiar with (and, indeed, it takes a computer expert to be completely familiar with such things. That's my day job). Now, one may want to argue: who is responsible for that? The pilots who were misled and "should have known better", or the DB designer who should have thought through the safety consequences of the design decision (routinely: hazard analysis, risk analysis, elimination and mitigation. I would bet you that no hazard analysis as we system safety people teach it was performed on that DB design before it was used)? The answer, surely, is that assigning responsibility is a different question from determining causality. As a causal factor, it is irrefutably there. Similarly with the FMS. Let me also say that the manufacturer, Honeywell, is very concerned with such questions, not particularly as a consequence of the Cali accident and the adverse court decision but because they have some very smart people there who take such things very seriously indeed.
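For readers who have not dug into the detail: the widely reported instance is that entering "R" for the Rozo NDB retrieved the Romeo beacon near Bogota instead, a consequence of the database's naming conventions. What follows is a schematic Python sketch of that general class of hazard and of one obvious mitigation. It is not anyone's actual FMS or database code; the records, coordinates and threshold are invented.

# Schematic sketch of the identifier-ambiguity hazard and a possible
# mitigation. Not any vendor's actual FMS or database code; the records,
# coordinates and threshold are invented for illustration.

from dataclasses import dataclass
from math import hypot

@dataclass
class Fix:
    ident: str   # identifier as stored in the database
    name: str    # plain-language name
    lat: float   # degrees
    lon: float   # degrees

DATABASE = [
    Fix("R",    "Romeo (near Bogota)", 4.7, -74.1),
    Fix("ROZO", "Rozo (near Cali)",    3.6, -76.4),
]

def lookup_first_match(ident):
    """Hazardous design: return the first record whose stored identifier
    matches what the crew typed, with no plausibility check at all."""
    for fix in DATABASE:
        if fix.ident == ident.upper():
            return fix
    return None

def lookup_with_confirmation(ident, own_lat, own_lon, max_deg=0.5):
    """Less hazardous design: gather every fix the entry could plausibly
    mean (stored identifier or name prefix) and refuse to choose silently
    if there is more than one, or if the best match is implausibly far away."""
    matches = [f for f in DATABASE
               if f.ident == ident.upper() or f.name.upper().startswith(ident.upper())]
    if not matches:
        return None
    # crude flat-earth distance in degrees: good enough for a sketch
    best = min(matches, key=lambda f: hypot(f.lat - own_lat, f.lon - own_lon))
    if len(matches) > 1 or hypot(best.lat - own_lat, best.lon - own_lon) > max_deg:
        raise ValueError(f"Ambiguous or distant match for {ident!r}: crew must confirm")
    return best

# Near Cali the crew expects Rozo; the first-match design silently hands back
# the beacon near Bogota that happens to own the bare identifier.
print(lookup_first_match("R").name)   # -> Romeo (near Bogota)
# lookup_with_confirmation("R", 3.5, -76.3) would instead demand confirmation.

Whether the second design is operationally acceptable is exactly the kind of question a hazard analysis is supposed to force before the thing is fielded, not after.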

There is lots more to say, but let me (almost) quit here. The final thing is that it is futile to continue to put the majority of the responsibility on people not following procedure to the letter. They never do. People working their roles in complex systems optimise their roles according to criteria local to them (e.g., "I can get my job done faster with less fuss, and have more time to think about the *important* things, that is, what I consider important"). This is a pervasive phenomenon which has been identified independently in two noteworthy works and is probably about as permanent a feature of human operations in complex systems as there is. You cannot wish it away by saying "people should have followed procedure to the letter", because they never, or almost never, do. The phenomenon was identified first by Jens Rasmussen and called "migration to the boundary" (in his Accimaps paper from 1997). It was also rediscovered, ostensibly independently, by Scott Snook in his work on the Iraq Black Hawk friendly-fire shootdown, where he called it "practical drift".

So in some sense terpster's admonition that people should be sticking to procedure is tilting at windmills, if you take it at face value. The only way to change the human operators' habits is to create a context which does not allow them the latitude to "optimise" their work to the point at which safety is diminished.

Such contexts could be created at the carrier by, for example, instituting a rule that all approaches are to be flown as published. Then up go the fuel bills! Up go the flight times! Controllers in busy airspace such as NY, SF and LA are terminally aggravated! In short, the whole way in which a major carrier uses airspace and gets along with the rest of the system is radically changed. Won't work.

In contrast, fixing a DB design or an FMS design is easy.

Terpster says some of that design is still with us. I have my opinions on that, and I am working hard in standards circles to see that things evolve for the better.

PBL