PPRuNe Forums - Another 777 uncommanded engine rollback
21st Dec 2008, 09:11
AnthonyGA
 
Unfortunately, aviation software engineering isn't as different from other types of software engineering as many people in aviation would like to believe. The list of problems with aviation software is extremely long, although it is not widely publicized.

Most software fails because of design defects rather than implementation bugs, although bugs certainly cause their share of problems. Aviation software is just as prone to design defects as any other type of software, because a design defect lives in the specification itself, not in the code. Checking code against a spec will not catch a design defect; the code faithfully does the wrong thing. Having three versions of the code written by three independent teams will not catch it either, since all three teams build to the same flawed spec. Even verifying bit patterns in firmware will not catch it.

Design defects exist in mechanical systems, too. The problem is that defects in software manifest in a completely different way. Software tends to fail catastrophically because of the disconnect between bits and bytes and the physical systems they control. Mechanical systems typically do not fail catastrophically, because physical limitations keep failures inside a certain envelope: physical laws limit the magnitude and nature of mechanical failures. No such laws protect software.

A classic, simple example: suppose you have a throttle lever that can mechanically travel between two stops, corresponding to idle and full power. A number of physical limitations prevent isolated failures from having dramatic effects. If the lower stop fails, the worst that can happen is that the throttle descends past idle; since the engine is already at idle, that doesn't change much of anything. If the upper stop fails, the throttle can move a bit past full power, but other factors limit the effect: the lever can only travel so far, and the engines cannot run at two or three times full power no matter what the lever commands. There's a lot of inevitable physical interaction that limits the effects of failures and design flaws. This interaction doesn't have to be built in by the designers; it's simply a consequence of the laws of physics.

Now consider a fly-by-wire (FBW) throttle. The lever provides inputs to a computer, which then decides what setting to use for the actual engine throttle. The engine throttle setting is represented by a three-digit number, from 000 (idle) to 999 (full power). The throttle lever in the cockpit merely commands changes in that setting: pushing the lever forward tells the computer to increase the throttle control number, and pulling it back tells the computer to decrease the number.
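To make that concrete, here is a minimal sketch in C of the naive logic just described. All names here are hypothetical, invented for illustration, not taken from any real system; the % 1000 models a three-digit register that, like a car's odometer, silently rolls over at either end.

```c
#include <stdint.h>

static uint16_t throttle_setting = 0;  /* 000 = idle, 999 = full power */

/* Called once per click of lever movement: delta is +1 (advance)
   or -1 (retard).  The naive update just adds, modulo 1000, so the
   register wraps: 999 + 1 becomes 000, and 000 - 1 becomes 999. */
void on_lever_click(int delta)
{
    throttle_setting = (uint16_t)((throttle_setting + delta + 1000) % 1000);
}
```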

Now you have a situation where the FBW throttle can cause a catastrophic failure that would be impossible with the mechanical one. If the pilot advances the throttle repeatedly or very aggressively, the computer keeps adding to the throttle control number: 500, 501, 506, 600, and so on. What happens when the number reaches 999? Well, that's the problem. If the designers of the software haven't foreseen this case, adding one to 999 will produce 000: a classic wraparound defect. Which means that if the captain advances the throttle too far, the engines will abruptly be set to idle. That is catastrophic behavior that can cause a crash. And it WILL happen, UNLESS the designers foresee the possibility and program the software never to add to a throttle control number that is already at 999. The same problem exists in the other direction: retard the throttles too far and the setting wraps from 000 to 999, commanding full power.
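The fix is a few lines of defensive code: saturate at the limits instead of letting the register wrap. Continuing the hypothetical sketch above:

```c
/* Defensive version: clamp to the 000-999 range instead of wrapping. */
void on_lever_click_clamped(int delta)
{
    int next = (int)throttle_setting + delta;
    if (next < 0)   next = 0;    /* already at idle: stay at idle */
    if (next > 999) next = 999;  /* already at full power: stay there */
    throttle_setting = (uint16_t)next;
}
```

Nothing in the hardware forces the second version over the first; a designer has to think of it.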

In a mechanical system, there's no way for a throttle to snap back to idle if you push it past full power. In an FBW system, it's easy, and in fact it's guaranteed to happen unless the people who build the system specifically protect against the possibility.

That's why software is dangerous. Designers often overlook things, and the things they overlook can kill you a lot more easily than any mechanical failure could, because there are no laws of physics to limit the magnitude of a malfunction.

And while aviation is certainly more careful about software than many other fields, it's not nearly careful enough. I've been following the use of software in aviation for years, and it's grim. Way too much software is put in place with way too little thought, and periodically that causes problems, sometimes deadly problems.

The weird thing is that people tend to trust computers more than mechanical systems, when it should be the other way around. Yes, in theory, computers can do a better job, but as long as they have to depend on flawed software designed and written by human beings, they may actually be more dangerous than the mechanical systems they replace. It's interesting to see the blinders that people put on when they are looking at computer-controlled systems.

They say that the rules of aviation are written in blood, but apparently that is only true when the blood is shed by mechanical systems. If software is at fault, people try to hide the problem, downplay the problem, and even resort to falsifying data to conceal the problem; they seem very unwilling to simply acknowledge and fix it, and equally unwilling to take responsibility for their design flaws and bugs. That kind of attitude might be more understandable when the only possible consequence is a lost day of bookkeeping, but I'm surprised to see it still in full force even when people die.