American investigates as 777 engine fails to respond to throttle


ppppilot
29th Feb 2008, 16:12
http://www.flightglobal.com/articles/2008/02/29/221923/american-investigates-as-777-engine-fails-to-respond-to-throttle.html
Tailwinds

BerksFlyer
29th Feb 2008, 16:14
Oh dear. Amazing how this particular problem has all of a sudden started coming up on the 777s isn't it?

Very strange.

HalloweenJack
29th Feb 2008, 16:43
All of American’s 777-200ERs are fitted with Rolls-Royce Trent 800


Not to be a conspiracy post, but weren't the engines on BA's 777 also Trents? Maybe a software glitch?

Longtimer
29th Feb 2008, 16:47
So far, at least according to reports, only on RR powerplants. Hmmmmmm

I wonder if we will hear about any problems with the other powerplant options.

Powerplant (777-200) two Pratt & Whitney PW4077 turbofans
or two Rolls-Royce Trent 877 turbofans
or two General Electric GE90-77B turbofans
(777-200ER) two Pratt & Whitney PW4084/PW4090 turbofans
or two Rolls-Royce Trent 884/892/895 turbofans
or two General Electric GE90-85B/92B/94B turbofans
(777-200LR) two General Electric GE90-110B1 turbofans
(777-300) two Pratt & Whitney PW4098 turbofans
or two Rolls-Royce Trent 892 turbofans
(777-300ER) two General Electric GE90-115B turbofans

glad rag
29th Feb 2008, 16:56
As has been said here before, don't forget that to all intents and purposes it's two engines flying in close formation...................:}

Hmm how does the saying go, once is an incident, twice a coincidence, thrice???????????

Worrying for both operators, PAX and RR I would imagine.

Wiley
29th Feb 2008, 17:04
Wouldn't be at all surprised to find a directive coming out very soon telling every operator to revert to a software set that's at least three months old.

Zorst
29th Feb 2008, 18:44
As has been said here before, don't forget that to all intents and purposes it's two engines flying in close formation...................


Yes, it has been said before.

Sadly it's not true. Many systems are identical but paired, a few are much closer to each other than even the manufacturer believed...

Rightbase
29th Feb 2008, 19:02
Either it was a long shot coincidence losing both (assuming losing one is not a regular occurrence!)

or losing both is a different problem altogether,

or it is one problem that can affect one or both engines.

There can't be many places to look for the source of a problem like that ... Perhaps the approach configuration/conditions ...

Puzzling. But I'm not an expert in how things go wrong.

ChristiaanJ
29th Feb 2008, 19:47
Something similar to BA038 happening again may concentrate minds wonderfully.

Zorst
29th Feb 2008, 19:52
Errr, I don't think there's been any lack of concentration - just an absence of results...

relax.jet
29th Feb 2008, 20:50
I'm going to Cuba next month, hope it's not going to be a 777 with RR :} (just joking)

gas path
29th Feb 2008, 22:56
Er! Before you all get too excited...... Autothrottle:hmm:

Hmmm, I see it's already drifted into the other thread!

aarrboy1
1st Mar 2008, 03:21
I'm told the FO had his left arm on the autothrottle as he was holding the speed brake lever. This prevented the throttle from moving. No similarity with BA. This is standard AA procedure.

maui
1st Mar 2008, 03:40
aarrboy1

That would have to be some strangely configured arm that guy has. He'd probably be able to scratch his balls and pick his nose at the same time whilst playing the Star-Spangled Banner on a piccolo, with a weapon like that.

M

aarrboy1
1st Mar 2008, 04:08
Well, everything's bigger in Texas.

maui
1st Mar 2008, 04:11
I'm truly in awe!

M

Xeque
1st Mar 2008, 04:44
Way back in the original BA038 thread I (tongue in cheek) said "Windows fatal error". Could it be? :hmm:

ppppilot
1st Mar 2008, 08:33
There are rumors that Jeppesen is preparing new approach plates for the main airports: one or two engines at idle.

Airmotive
1st Mar 2008, 11:34
The FO has to reach over the throttles in order to reach the speed brake handle. Not only is it possible to interfere with auto-throttle movement, it's very easy to do and HAS HAPPENED numerous times before...

Usually the FO says, "Oops. Did I do that?"
After BA, they say, "Argh! Engine Failure! Engine Failure!"

Tags
1st Mar 2008, 11:57
Got a mate who is an engineer at an overseas station. He recently recalled a story of a Continental 777 a few years ago: whilst on approach, one of the engines shut itself down! Despite checks, no fault found!

Huck
1st Mar 2008, 12:40
It's considered bad form to have spoilers out with anything above idle thrust, at least in the Mighty Dog 11. With spoilers deployed I keep my forearm against the PLA's, just so I know if the autothrottles are coming up. Yes, from the right seat.

Orestes
1st Mar 2008, 13:28
So far, most of the speculation on the cause of the BA038 crash has focused on computer control (EMI? Software?) or fuel issues (Low fuel? Cold fuel? Water/ice? Waxing?).
This new info regarding the American Airlines 777 engine that failed to throttle up (and the little anecdote above regarding a mysterious Continental engine shutdown on approach) has led me to wonder if we've been ignoring a possible aerodynamic cause - i.e. engine inlet conditions? These incidents seem to have all occurred at the same point in the flight envelope - low & slow, high AOA on approach.

Any opinions?

Oluf Husted
1st Mar 2008, 15:59
Dear Orestes,

A possible aerodynamic cause could be ice on the back side of the front end core engine compressor blades, we don't know if this crew, or the BA38 crew, even had the engine anti-ice system "on" ?

Even if they had it "on" it does not work when power is at idle from top of descent (as was the case with BA38) to 2000 or 600 feet.

This kind of ice can/will cause the engines to "hesitate" but will be long gone when you start looking for it, if the airport temperature is well above freezing.

The AAIB's conclusion that (birds and) ice are ruled out is way premature.

Read more about engine icing here: www.whistleblowers.dk (http://www.whistleblowers.dk)

Enjoy your reading and thinking.

AnthonyGA
1st Mar 2008, 17:50
If these two incidents are indeed as described, it sounds like a software problem. A mechanical problem would not show coincidentally and abruptly like this after years of presumably trouble-free operation, since the mechanical parts and designs of engines remain unchanged for relatively long periods. Software, on the other hand, is updated much more frequently, and if a recent update contained a bug …

cribble
1st Mar 2008, 21:23
Oluf
In my (Trent powered) 777s the norm is for engine anti-ice to be in "Auto" (I assume other operators do it similarly?): when the ice detection system decrees, the A/I valves are commanded open or closed.

MEL aside, "on" is normally only used on the ground, when conditions dictate.

Even if, in an idle descent, there is not enough bleed air from the LP bleed, there is always the HP bleed- that is, pretty much, why it is there. Even if HP bleed were not enough, when the A/I valves are commanded open the EECs command "approach idle" and a good bit more bleed air is available. So no problem with lack of bleed air for engine A/I in a descent, I think.

Having said that, there is an FCOM caution to avoid prolonged operation in moderate or severe icing conditions.

Oluf Husted
1st Mar 2008, 23:00
Dear Cribble.

Thanks for your technical "run down" on your TRENT-engine!

The wx during the approach of BA38 was broken cloud with a ground temperature of 11 degrees C. That gives possible severe icing, since: "The higher the temperature the more severe is the icing condition in 100% humid air" (up to plus 10 degrees C).

How was the wx. at Los Angeles, where American Airlines had "hesitation" in a Trent?

And according to the latest Norwegian research, by meteorologist Aasmund Rabbe, freezing rain is far more dangerous to jet engines than freezing fog (by a factor of 100) due to droplet size. This leaves the FAA AD of 23 Jan. 2008, sent out to "Avoid engine failures on short final", needing to explain why the reduction from 1000 m to 300 m in freezing precipitation during ground operation (before "run-ups" shall be performed) can be a "tool" to have fewer (all) engine failures on short final.

Engine anti-ice procedures should be stressed and improved or Boeing 777s with RR engines grounded until a better reason is found.

sky9
2nd Mar 2008, 09:23
Even if, in an idle descent, there is not enough bleed air from the LP bleed, there is always the HP bleed- that is, pretty much, why it is there. Even if HP bleed were not enough, when the A/I valves are commanded open the EECs command "approach idle" and a good bit more bleed air is available. So no problem with lack of bleed air for engine A/I in a descent, I think.
Cribble

When I used to fly 757s (RB211) I was amazed at the number of times that I had to fault the failure of the HP valve to open on descent. I had the reputation with the engineers for snagging the valves more than the rest of the pilot force combined. On the 757 there was no EICAS message; is there one on the 777?

BusyB
2nd Mar 2008, 10:17
Cribble,

The SOP in at least one other Airline is to change from "auto" to "on" on entering icing conditions and back to "auto" on leaving them.:ok:

ACMS
2nd Mar 2008, 11:11
At CX on the 777 we have been running the Eng Anti-ice manually for the last 6 + years.

Auto is a backup only.

av8tor94
2nd Mar 2008, 11:49
As a long-time cold weather operator (Canadian, eh) and colleague for years (read decades) of Scandinavian, US and European pilots, I can say the accepted practice is to use the AUTO feature only as a backup. The system is selected on before entering known icing conditions. Maybe that's why it is called Nacelle Anti-Ice; wings are a different story of course.

Oluf Husted
2nd Mar 2008, 12:47
When a student at Cranfield Institute of Technology, named Javid Karim, conducted "An Investigation Of Aircraft Accidents And Incidents Attributed To Icing, And Cold Weather Operations" among 60 major airlines, he concluded (Sep. 1995):

"There is a general lack of crew awareness and training concerning winter operations"

I think that this is still valid. I can tell you that my airline's AOM on the DC9 stated: "Idle power is sufficient" (for engine anti-ice operations on the ground and when airborne). This was also valid for the MD80 until 1990, when the procedure was sharpened to: "You SHALL do "run-ups" at least just before T/O and once every ten minutes, to as high a power setting for at least 15 sec. (min. 70% N1) also during taxiing in"

:ugh: It seems that only a few pilots, and in the Cranfield study only one of sixty airlines, namely Finnair, take "winter" (all year) operations seriously enough. P & W did not warn their customers before Sep. 1994, at least 30 years late!!

Let's hear more experiences about engines "hesitating" and "stalling" on the ground and when airborne.

Read more about this on: www.whistleblowers.dk

ppppilot
2nd Mar 2008, 13:03
Severe icing is very usual during winter at Geneva. It has a very high MSA and, when traffic is intense, they tend to hold you high, close to Mont Blanc. Holding at -20° in rain is severe icing, and we used to deploy some spoiler and increase the power, ever since one of our pilots declared an emergency because of the high vibrations. The vibrations were not just high, they were mighty. 11° at the surface and just a layer of cloud is not enough to blame icing, and the records don't point that way for BA38.
Sky9, I have flown a B757 RB211 with the same problem as yours. Changing from IP to HP as the engine reduced to idle, there was a synchronization fault, and that produced a stall at the compressor inlet with a strong explosion and a yaw push. The explosion indicates that the flame was out for a while. In such a case, even with a very small stall, the recorded parameters must show it.
The same explanation can apply to aerodynamics due to AOA.
Don't forget that all the records showed normal operation. At least if Boeing has shown all the data in its pocket. ¿?:suspect:

lomapaseo
2nd Mar 2008, 14:14
I sense that this thread is straying far afield from the original subject title.

This is not the place to discuss weather and icing issues beyond what is stated in FCOMs. Perhaps a new general thread in the technical section might suffice.

Just to illustrate the complexity of the discussion:

Icing can cause engine problems due to probe blockage; or accretion and sheds resulting in damage; or vibration from uneven sheds; or sheds which momentarily disrupt the airflow.

Simply citing icing as a speculative cause without relating specific factual support applicable for an incident under discussion is like looking for a red herring.

Orestes
3rd Mar 2008, 01:41
Sorry if my earlier comment caused the thread to stray - I was just wondering about other possible causes for a lack of throttle response. I hadn't thought of the possibility of an icing issue. Instead, I was wondering if a slow/sticky variable stator vane actuation system in the engine compressor might be a possible culprit.

Anyway, this is all just speculation in the absence of info. I guess we'll just have to be patient and wait for the results of the investigations to find out what's really going on.

petermcleland
3rd Mar 2008, 13:54
Orestes...

I asked in the BA thread if it was possible that the compressors were stalled due to high AOA, and mentioned that the TGT/JPT would be excessive in that case, with failure to produce thrust but lots of noise! I was concerned about several ground witnesses mentioning unusually loud noises from the engines.

I don't think I have ever had compressors stall in an airliner (except possibly those loud bangs on the ground roll with reverse thrust), but I certainly did in a jet fighter with a Rolls Royce Avon engine.

I felt sure that the recorders would show excessive TGT/JPTs in this case but nobody but you seems to have mentioned compressors at high AOA.

Anyway, my post was immediately removed so nobody saw it and replied :ouch:

ChristiaanJ
3rd Mar 2008, 14:09
Peter...
Do you mean aircraft AoA or compressor blade AoA as Orestes says?

If aircraft AoA, you'd be putting the cart before the horse.... the high aircraft AoA happened after the engine problem, as they were trying to drag her over the fence.

petermcleland
3rd Mar 2008, 17:04
Yes, I meant aircraft AOA but I meant later in the flight as coming over the road with the engines making an unusually loud noise. The aircraft AOA was certainly high then and I was wondering if the blades were stalled at that time and preventing any increase in thrust. The throttles would have been manually set hard open then, but to no avail.

bill_s
4th Mar 2008, 05:11
What's being posited here is some compressor flow degradation short of the classic compressor stall, where laminar flow collapses violently, with backwards flow due to the instantaneous combustion pressure.

Compressor degradation without total collapse, and with unimpaired fuel flow, should yield lots of smoke - has this been noted in any of these incidents?

gas path
4th Mar 2008, 06:36
From the QAR/DFDR both engines were 'right on the money' with regard to N1, N2, N3, vane position and bleeds.

Halfnut
5th Mar 2008, 17:02
Thursday afternoon, a B777, aircraft 7BH, en route from MIA to LAX, descended at flight idle power on the profile descent into LAX. At approximately 2000 feet, slowing to 170kts, the autothrottles moved forward to maintain the selected airspeed. The right engine responded normally, but the left engine remained at flight idle for approximately 10-15 secs. The left engine then responded to the demand for increased thrust. The rest of the flight continued with normal engine operation and the flight landed without incident. The aircraft was taken out of service and has been moved to a hangar for further inspection. In light of the recent events at British Airways, we have placed the highest priority on this investigation. Representatives from Boeing and Rolls Royce will participate. We will keep you informed of our findings.

-- Initial review of DFDR data by Rolls Royce indicates a very different event than what British Airways experienced. There were a number of markers in the BA event that are not present in our DFDR data. As it is a concerning event you will be kept informed as more details become available.

-- Boeing, Rolls Royce, AA Maintenance, TAESL Engineering and Flight Test have been unable to uncover any mechanical discrepancies which may have caused the left engine's delayed thrust during approach into LAX. All parties reviewed DFDR data and performed extensive fuel and systems testing. At the time the left engine thrust failed to advance, the First Officer (PF) had the speedbrake extended slowing the aircraft from 220kts to 170kts, descending through approximately 3000ft. The DFDR data, which measures actual throttle angle, indicated the left throttle was at or near flight idle while the right throttle advanced.

The only theory that we could prove, which was done in the simulator and again during flight test on Saturday, was the possibility the First Officer may have had his left hand resting on the left throttle for leverage while holding the speedbrake extended, slightly impeding the forward movement of the throttle. Tests on both throttles during flight indicated 1.5 pounds of pressure was required to keep each throttle from advancing. With the left hand on top of the throttle, maintaining back pressure on the speedbrake handle and focused on a very demanding approach, the opposite throttle could advance to maintain airspeed momentarily unnoticed by the crew until a yaw was experienced from differential thrust. The TAC system and autopilot compensation with rudder make the onset almost unnoticeable. Timed from RT throttle advancement and immediate recognition by the crew to the left throttle awakening and advancing to join the right throttle, approximately 12 secs elapsed.

During 2.9 hours of flight test with 2 TUL flight test pilots, the Fleet Captain, and TAESL and Rolls Royce engineers all autothrottle functions and protections worked properly. Minus an unknown mechanical malfunction this scenario was the only plausible way we could recreate this event to match what the crew experienced and what the DFDR data and testing indicated at the time of the event.

This crew did a fabulous job handling this situation and we appreciate their cooperation in this investigation. We certainly are not placing blame, just attempting to explain a scenario that may have produced this set of observed and recorded circumstances.

lomapaseo
5th Mar 2008, 18:33
Halfnut

I can't for the life of me figure out what you posted.

Are these your words or did you lift them from some place else and forget to put a time and place on them?

CONF iture
5th Mar 2008, 19:30
Yes please Halfnut, your source ?
"This crew did a fabulous job handling this situation"
... probably not the most appropriate adjective for a simultaneous use of thrust and speed brakes ?

Halfnut
5th Mar 2008, 23:34
The entire post is a direct unembellished quote from on High. Nothing more and nothing less. "Just the facts, ma'am."

Airmotive
8th Mar 2008, 15:35
So this was the hottest topic as long as everyone thought it was equipment failure, but as soon as it was seen to be pilot error, it drops dead....not a single post...other than to challenge the source of the information suggesting pilot error.

Early theories of pilot error were quickly scorned, if not outright ridiculed.

That's interesting, as pilot error is the ONE thing flight crews have direct and immediate control over; but nobody seems interested in talking about that. It's highly frustrating from a safety standpoint.

That's all. I am speaking outside my range of experience but I was hoping to learn a few things by observing the discussion here. Instead, the lack of discussion is deafening.

skiesfull
8th Mar 2008, 16:55
Surely the use of power against speedbrake would have brought up an EICAS caution "speedbrake"?
I would have thought that it would be instinctive to advance the lagging thrust lever in line with the other lever?? I think that there is more to this incident than has been reported by the publisher of the above theory (as reported in Flightglobal.com today).

Old Fella
9th Mar 2008, 06:10
Never did have any faith in FBW systems. Let's get back to manual control of FCUs and know that when the throttle is moved the message gets to the FCU via mechanical linkages/cables.

Smilin_Ed
9th Mar 2008, 17:09
Seems kind of silly doesn't it? Take a very simple and reliable system like wires, push-rods, and bell-cranks to get a control input out to whatever needs to be moved and we replace it with a computer, wires, and actuators which are dependent, among other things, on the availability of electric power. Oh well, that's the price we pay for progress.:rolleyes:

sevenstrokeroll
9th Mar 2008, 17:16
its not just the fly by wire stuff.

pilots simply are so overwhelmed with gadgets that they are not flying the plane.

a gadget to keep the plane going straight if an engine quits is great! but if the plane in question had started going sideways, the copilot would have hit the rudder pedals and noticed something wasn't right...some pilots are always using rudder and never get the throttles/engines equal.

after getting tired of subconsciously holding rudder, he or she would have looked down to see that the throttles/engines weren't right and fixed it.

demanding approach to LAX? puhleese. then all approaches are demanding.

makes one wonder about pilot induced errors on many mysterious engine problems.

simpler is better!

deltayankee
9th Mar 2008, 17:52
The simplicity vs complexity argument was already an old discussion before Pprune but for younger readers someone has to answer the claim that simpler is always more reliable.

This comes from the mathematical "law" that if something breaks every ten years and you have ten of them then you will get *on average* one failure per year. But this is only applicable to stuff that is independent. When the complexity is part of a system then adding more functions can add reliability.

A simple example is multi engine A/C. If I add a second engine to my airplane maybe I double the chances of one engine failing but I improve dramatically the chances of being able to finish the flight with at least one engine. Having two engines doesn't mean I end up in the sea twice as often.

You can also see it from common experience. Old, simple cars used to be supplied with tool kits and breakdowns were common. Modern cars packed with electronics and extra systems yet they fail much less often.

Adding complexity often adds benefits that outweigh any parts count calculations. And be honest with yourself, does the complicated 777 really have a safety record worse than say the DC3?
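
The arithmetic behind the ten-parts and two-engine arguments above can be sketched in a few lines. This is only an illustration under stated assumptions: failures are independent, the aircraft can complete the flight on one engine, and the per-flight failure probability `p` is a made-up number, not a real engine statistic.

```python
# Minimal sketch of the reliability arithmetic; all numbers are hypothetical.

def expected_failures_per_year(n_units: int, rate_per_unit_per_year: float) -> float:
    """Independent units: expected failures simply add up."""
    return n_units * rate_per_unit_per_year

def p_at_least_one_fails(p: float, n_engines: int) -> float:
    """Chance that one or more of n independent engines fails on a flight."""
    return 1.0 - (1.0 - p) ** n_engines

def p_all_fail(p: float, n_engines: int) -> float:
    """Chance that every one of n independent engines fails on a flight."""
    return p ** n_engines

if __name__ == "__main__":
    p = 0.01  # hypothetical per-flight failure probability, for illustration only
    print("ten parts, one failure per ten years each:",
          expected_failures_per_year(10, 0.1), "failures/year")
    print("twin, some engine fails:", p_at_least_one_fails(p, 2))
    print("twin, both engines fail:", p_all_fail(p, 2))
    print("single, engine fails:   ", p_all_fail(p, 1))
```

With `p = 0.01` the twin sees an engine failure on roughly 2% of flights, about double the single, but loses all thrust on only 0.01% of flights. If the failures share a common cause, the independence assumption breaks and the second figure no longer holds.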

barit1
9th Mar 2008, 18:12
Simply adding a second engine DOES NOT assure redundancy!

Look at FAR 23.49 (http://www.flightsimaviation.com/data/FARS/part_23-49.html) - if a multi-engine ship cannot meet a specified OEI climb profile, then it is constrained to a Vso no greater than 61 kt., same as a single, because it is assumed a forced landing is likely. :uhoh:

Wartime C-47s were often flown at TOGW 20% above the civil limit, and their OEI performance at that weight was nil - the plane was going DOWN. Thus the second engine doubled their risk; they would have been better served by a single R-2800!

Granted, no modern transport is so constrained, but this example serves as a reminder: multiple systems DO NOT automatically provide redundancy.

sevenstrokeroll
9th Mar 2008, 18:26
deltayankee

your comparison of the dc3 and the 777 is unfair. they have different kinds of engines...so at least compare jets with jets.

compare the 727 and the 777.

bubbers44
11th Mar 2008, 03:04
It seems, after reading the report, that maintenance did an extensive check of all pertinent systems and found no faults, then did an extensive flight check trying to duplicate the problem and found everything normal. Trying to duplicate the LAX landing profile, they found that by keeping your hand on the speedbrake handle while deployed it was possible to restrict the left throttle from advancing with just 1.5 lbs resistance to that thrust lever. This procedure was standard after the Cali crash. If the autothrottle was on and the speed knob was set at or above the speed the a/c slowed to, the throttles would advance even though the pilot didn't intend them to, since the speed brake was out. They decided the FO's left arm might have restricted the left thrust lever inadvertently, causing the delay in spool-up. Sometimes with all that automation you have conflicting results you don't expect, but you sort it out when it happens. I have seen similar things happen dozens of times, but it is easy to fix once you see the conflict. The 777 automatically adds rudder with asymmetrical thrust, so it would be difficult to notice with no yaw.

HF3000
11th Mar 2008, 04:54
Wartime C-47s were often flown at TOGW 20% above the civil limit, and their OEI performance at that weight was nil - the plane was going DOWN. Thus the second engine doubled their risk; they would have been better served by a single R-2800!

Hmm... I'd rather driftdown at 100fpm on one engine than glide at 1500fpm on none!

Might have a slightly better chance of survival... especially at night.
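
To put rough numbers on that comparison, here is a small sketch. It assumes a constant descent rate all the way down and a starting height of 5,000 ft, which is a hypothetical figure chosen only for illustration; the 100 fpm and 1500 fpm rates are the ones quoted above.

```python
# Rough time-in-the-air comparison; starting height is a hypothetical value.

def minutes_until_ground(height_ft: float, descent_rate_fpm: float) -> float:
    """Time available before reaching the ground at a constant descent rate."""
    if descent_rate_fpm <= 0:
        raise ValueError("descent rate must be positive")
    return height_ft / descent_rate_fpm

if __name__ == "__main__":
    height_ft = 5000.0  # assumed starting height above ground
    for label, rate in (("drift down on one engine", 100.0),
                        ("glide on no engines", 1500.0)):
        print(f"{label}: {minutes_until_ground(height_ft, rate):.1f} min")
```

Under those assumptions the drift down leaves about 50 minutes to find somewhere to land, against a little over 3 minutes for the glide.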

Kiwiguy
11th Mar 2008, 06:51
Shoot just give them all glider time... problemo solved :eek:

J.O.
15th Mar 2008, 00:28
An update indicates that the investigation found no aircraft defects. It goes on to say that there is potential for the F/O's arm to restrict thrust lever movement when using the speed brake lever.

chksix
15th Mar 2008, 07:57
Cheaper to blame it on pilot error.... :E

HarryMann
15th Mar 2008, 11:50
Modern cars packed with electronics and extra systems yet they fail much less often.

Not because they're more complex though, because they're made to higher standards... make an older simpler system to modern higher standards and they'd rarely ever fail. Garages are full of modern cars with electronic ecu and other failures ... requiring expensive 'diagnoses by substitution'.

...and many taking say a modern diesel Tdi overland in hostile environments where failure cannot be tolerated, replace the Tdi injection system with a mechanical diesel injection pump with just a couple of 12V feeds, rather than using a black box that when it fails, can leave you stranded without a chance in the world of a fix.

glad rag
15th Mar 2008, 12:20
Overall a very informative thread, I'm not wishing to upset any of the "pro's", but would this happen on, say, a FBT system?:E

Two sides to every argument I say...........

barit1
15th Mar 2008, 14:13
Back in Zorst's post #8 (http://www.pprune.org/forums/showpost.php?p=3947695&postcount=8):
... Many systems are identical but paired, a few are much closer to each other than even the manufacturer believed...

To me - this is strangely analogous to the GPS navigation situation that led to the GOL 1907 midair. The more identical the two paired systems become, the better the chance of the holes in the cheese lining up...:uhoh:

chksix
15th Mar 2008, 14:31
A small step in prevention would be to prohibit updating the software in both ECU's on the aircraft simultaneously.

lomapaseo
16th Mar 2008, 00:02
A small step in prevention would be to prohibit updating the software in both ECU's on the aircraft simultaneously.

Would you put two different tires on a dragster race car, or two different brands of spark plugs in a rotary engine.

The answer is obviously dependent on the reliability vs the differences in performance. In the case of FADECs the differences in performance are sure to be noticed and need to be accommodated by specific pilot workload changes (not practical), while the change in reliability was expected to be within a tolerance band way below the threshold of being noticed over the lifespan of the two FADECs. The decision is weighted and obvious given these conditions.

HarryMann
16th Mar 2008, 00:52
Would you put two different tires on a dragster race car, or two different brands of spark plugs in a rotary engine.

IMHO, quite dreadful analogies :O

Not that I necessarily subscribe to the original suggestion either... but it deserves a better rebuttal I think.

chksix
16th Mar 2008, 09:28
Was just suggesting it as a precaution against installing fresh software with an unknown bug on both engines simultaneously.
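
As a sketch of what such a precaution could look like as a maintenance-planning rule: the function, the version labels and the idea of a "proven" list below are all hypothetical, invented for illustration, and do not describe any real operator's or manufacturer's procedure.

```python
# Hypothetical staggered-update rule: never leave every engine on unproven software.

def update_allowed(versions: dict[str, str], engine: str,
                   new_version: str, proven: set[str]) -> bool:
    """Allow loading new_version on one engine only if the version is already
    proven in service, or at least one other engine stays on a proven version."""
    if engine not in versions:
        raise KeyError(f"unknown engine position: {engine}")
    if new_version in proven:
        return True
    others = [v for pos, v in versions.items() if pos != engine]
    return any(v in proven for v in others)

if __name__ == "__main__":
    versions = {"left": "A", "right": "A"}  # assumed current software loads
    proven = {"A"}                          # versions with service history

    print(update_allowed(versions, "left", "B", proven))   # first engine may go to B
    versions["left"] = "B"
    print(update_allowed(versions, "right", "B", proven))  # second must wait
    proven.add("B")                                        # B has now earned its history
    print(update_allowed(versions, "right", "B", proven))
```

The trade-off is that the two engines run different software for a while, so the rule only makes sense if that mixed configuration is itself acceptable to fly.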

aviate1138
16th Mar 2008, 10:00
chksix said

"A small step in prevention would be to prohibit updating the software in both ECU's on the aircraft simultaneously."

Aviate opines....

Spot-on! Whenever do updates work perfectly from the start? Some snag Always crops up doesn't it?

BEagle
16th Mar 2008, 12:08
In Fate is the Hunter, Ernest K Gann describes an alarming incident in a fully-laden C-54 out of La Guardia when 3 out of 4 engines either failed or were on the point of failing - backfiring and overrevving like crazy. Somehow they got it on the ground after something like 3 minutes total flight time, never climbing above 50 ft.

It seems that the engineers had changed the plugs on 3 engines for a new experimental type...

And they later apologised because they hadn't had time to change the plugs on the fourth...:(

NEVER do simultaneous upgrades - whether hardware or software!!

Although the RN had a novel QA technique for double engine changes at sea - they would 'invite' the engineering officer to occupy the looker's seat for the flight test...:ok:

llondel
16th Mar 2008, 15:40
It seems that the engineers had changed the plugs on 3 engines for a new experimental type...

And they later apologised because they hadn't had time to change the plugs on the fourth...:(

NEVER do simultaneous upgrades - whether hardware or software!!

What about changing the oil on all engines at the same time? I'm sure that got tried and screwed up once so now with perfect hindsight it's not done to all engines at the same time. There are reasons why space shuttle flight computers have two different implementations of the spec - usually one contractor provides most of them and a second contractor does an independent implementation to the same spec. in case there are bugs. Of course, if the spec is wrong...

boaclhryul
16th Mar 2008, 17:03
Whenever do updates work perfectly from the start? Some snag Always crops up doesn't it?

I don't read Tech Log all that often, but I don't recall seeing a raft of entries on aircraft software update issues. Are you making a broad comment that covers MS Windows etc., or is it truly your experience that all aircraft updates demonstrate some sort of "snag"? Not being critical, just curious...

aviate1138
16th Mar 2008, 19:10
boaclhryul asked if I was making a broad comment....

It was broad but look at the Chinook, JSF and other machines having problems with electronics software updates.

The 777......
"If you read the entire AD, what it says is that the anomaly was introduced by an error in a certain software update for the FADEC on the LR. However, because the -300 has an identical software update, it could presumably cause the same problems on that aircraft as well......"
"Some would be surprised that the FAA should allow an ETOPS aircraft with a defect that reduces engine power by up to 77 percent on takeoff to be considered serviceable. In theory, an engine with FADEC version A.0.4.5 installed has a defect that can't be cleared and is therefore unserviceable. Others might wonder how such safety critical software can make it through the validation and verification regime into world-wide fleet service. Overall, it's shades of the previous GE90 "rollback" and IFSDs (inflight shutdowns) from earlier days. The only difference was in those cases, it was in cruise and was caused by moisture freezing in the P3B and PS3 lines to the FADEC, and it was resolved by increasing the tubing diameters. Perhaps the software now needs uppercase zeroes and ones in its coding -- or a larger pitch font."

Air Safety Week, Oct 9, 2006

PBL
17th Mar 2008, 21:21
There are reasons why space shuttle flight computers have two different implementations of the spec - usually one contractor provides most of them and a second contractor does an independent implementation to the same spec. in case there are bugs. Of course, if the spec is wrong...

Not so. It is quad-redundant with identical SW. There is a fifth computer, with different HW and SW, which can conduct an abort should everything go pear-shaped.

For the architecture of the space shuttle primary control, see various articles at NASA Office of Logic Design (http://klabs.org/DEI/Processor/shuttle/), in particular The Space Shuttle Primary Computer System (http://klabs.org/DEI/Processor/shuttle/shuttle_primary_computer_system.pdf), Communications of the ACM, 27(9), September 1984; Madden and Rone, Design, Development, Integration: Space Shuttle Primary Flight Software System (http://klabs.org/DEI/Processor/shuttle/madden_rone.pdf), also CACM 27(9), September 1984. This is direct from the people who designed and built this system.

For a very much shorter comment, see Epstein, Risks 24.71 (http://catless.ncl.ac.uk/Risks/24.71.html#subj5.1), Morton, Risks 24.73 (http://catless.ncl.ac.uk/Risks/24.73.html#subj7.1), but especially Neumann, Risks 24.73 (http://catless.ncl.ac.uk/Risks/24.73.html#subj7.2) and Passy, Risks 24.74 (http://catless.ncl.ac.uk/Risks/24.74.html#subj3.1)

PBL

PBL
19th Mar 2008, 08:29
aviate1138 quoted an article from Air Safety Week about SW problems. I don't recall the article, and David often talked to me before writing about computer-related things, so it may have been written after he retired from ASW.

There are some facts about SW quality which you won't find in textbooks and which it may be worthwhile to mention here.

First, most problems with critical SW (which includes all that covered by DO178B Levels A and B) occur during a mismatch between operational environment and the environments envisaged in the specification of the (sub)system requirements. A study by Robyn Lutz in the early 1990s of mission-critical failures of NASA systems came up with a figure of over 95%; studies by the UK HSE of all types of critical systems (simple to complex, mechanical to computer-based) showed 70%.

Second, the quality of SW itself (that is, where the SW will *not* fulfil its requirements) for critical code lies typically in the region of at least one error per KLOC (thousand lines of source code). The very best measured quality of which I know has been attained by SW written by the UK system house Praxis HIS, which has a largish system which has demonstrated only one error per 25 KLOC in service. They also have one small system (10K-ish LOC) in service which has demonstrated no errors in a number of years of operation. Praxis uses so-called "Correct by Construction" (CbC) methods.

So in the EECs of the B777, which I am told contain on the order of hundreds of KLOC, we can expect in every release, as quite normal, a handful of errors, somewhere between ten and a hundred. That is the reality. Most of those errors will not be discovered.
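
As a rough back-of-envelope check on those figures, the arithmetic can be sketched as below. The 250 KLOC size is purely a hypothetical stand-in for "hundreds of KLOC", not an actual EEC figure; the two densities are the ones quoted above.

```python
def expected_errors(kloc: float, errors_per_kloc: float) -> float:
    """Expected number of residual errors in a code base of `kloc` thousand lines."""
    return kloc * errors_per_kloc


# Densities quoted in the post above.
TYPICAL_DENSITY = 1.0      # about one error per KLOC for typical critical code
BEST_DENSITY = 1.0 / 25.0  # one error per 25 KLOC, the best measured figure cited

# Hypothetical size standing in for "hundreds of KLOC" (an assumption).
ASSUMED_KLOC = 250

low = expected_errors(ASSUMED_KLOC, BEST_DENSITY)
high = expected_errors(ASSUMED_KLOC, TYPICAL_DENSITY)
print(f"{ASSUMED_KLOC} KLOC: roughly {low:.0f} to {high:.0f} residual errors")
```

Under that assumed size the span runs from about ten errors at the best measured density to a few hundred at the typical one, so an estimate of ten to a hundred corresponds to code toward the better end of the range.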

Many of us are not content with this reality, and are actively working to propagate different approaches to the quality assurance of critical SW-based systems than are currently used.

PBL

arcniz
20th Mar 2008, 08:47
As PBL describes, real errors will creep into software designs and updates with a certain inexorability, despite well-funded efforts by the smartest and best to prevent them during development and find them during deployment. The raw complexity of possible interactions in even moderately demanding real-time environments, combined with the fallibility of testing models, the limits of schedule and the push of operational pressures, increases the likelihood that slightly erroneous or simply mismatched software components will crawl into mission-critical systems. The staffing levels vs frequency of ops in Space and Military environments allow much higher levels of scrutiny than in Commercial and General aviation, but even the most careful systems teams occasionally discover unhappy surprises in software-hardware computing system designs and updates.

After an initial learning curve of months to years, the residual probability of in-use discovery of serious design flaws comes mostly from mods, changes and revisions. These are often installed "in the field" by workers who are less well trained and equipped than their fellows "at the factory". Variability of individual aircraft (differences between units creating subtle compatibility issues) is inherently greater in a working fleet for many reasons. Testing of mods and updates prior to release and during installation is necessarily limited and does not begin to approach the exhaustive top-to-bottom tests applied to system designs before initial operational release.

We have just about reached the point where electronic, component, and software technologies can provide high enough physical density and low enough incremental cost (per million gates, per terraflop or whatever) so as to permit greater redundancy in the mission-critical systems for commercial aircraft. Some of this bounty should be applied to simple redundancy - at least 3-way for anything important - but some of it also should be applied to more novel approaches to reliability. One of these would be to make systems more generally self-aware, detecting and tracking even small inconsistencies in operation among peers for local alarm sensing and more comprehensive later analysis by info-grinding systems elsewhere. If a triple-redundant control system were made quadruple or quintuple in design, then it might well be possible to have copies of one or two levels of prior maintenance revision running alongside the current one in a "monitoring" capacity (i.e. not in the active control triad, but still able to flag inconsistencies) for validating the correctness (or at least consistency) of the system. A single large aircraft might contain hundreds of quintuply redundant systems like this, with the extra processing engines serving mostly as monitors and quality-control devices but also potentially available to provide critical decision data when things are not entirely in whack. Whether one would ever want to default back to an earlier revision of a software release while in mid-mission is a question that only could be answered in the context of systems with that capability installed. Likely some fuzzy logic would be needed in the mix for similarity analysis in process streams that are ever so slightly differently tasked.

The real gold in higher levels of redundancy for self diagnosis during operation would come from the 'consistency' log data, I reckon, and the occasional rare advisory to stay on the ground and sort something out before it can get you.
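
A minimal sketch of that arrangement, for illustration only: an active triad is voted as usual, while copies of earlier revisions run in a monitoring capacity and merely write to a consistency log. All names here (`vote`, `step`, the two control laws, the tolerance) are hypothetical, and the simple tolerance check is a crude stand-in for the fuzzier similarity analysis mentioned above.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

ControlLaw = Callable[[float], float]
LogEntry = Tuple[str, float, float]  # (monitor name, sensor value, difference)


def vote(outputs: List[float]) -> float:
    """Majority vote over the active channels; raises if no majority exists."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among active channels")
    return value


def step(active: List[ControlLaw], monitors: Dict[str, ControlLaw],
         sensor: float, tolerance: float, log: List[LogEntry]) -> float:
    """Compute the voted command; monitors only log, they never steer."""
    command = vote([law(sensor) for law in active])
    for name, law in monitors.items():
        shadow = law(sensor)
        if abs(shadow - command) > tolerance:
            log.append((name, sensor, shadow - command))
    return command


# Hypothetical control laws: the current revision and the one before it.
def current_rev(x: float) -> float:
    return min(2.0 * x, 100.0)


def previous_rev(x: float) -> float:
    return min(2.0 * x, 90.0)


consistency_log: List[LogEntry] = []
triad = [current_rev, current_rev, current_rev]
shadows = {"rev_n-1": previous_rev}

quiet = step(triad, shadows, sensor=10.0, tolerance=1.0, log=consistency_log)
flagged = step(triad, shadows, sensor=48.0, tolerance=1.0, log=consistency_log)
print(quiet, flagged, consistency_log)
```

The design point is that the monitors have no path into the command: a disagreement produces a log entry for later analysis, not a change in what the active triad outputs.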

boaclhryul
20th Mar 2008, 09:52
...per terraflop...

Now you're referring to BA38 :) .

Michael

arcniz
20th Mar 2008, 17:02
Quote:
Originally Posted by arcniz
...per terraflop...
Now you're referring to BA38 .

Michael

... or could be a moniker for a noticeably awkward Danish gymnast ...


We apologise for our excessive r's.

For those normal folks who do not dwell amidst 1's and 0's, some clarification may be appropriate: Teraflop is computerese for a process capability of 1,000,000,000,000 floating-point operations per second. My comment above was slightly hyperbolic, in that an extra gigaflop or two would do very nicely in nearly any practical control system of current style, and a megaflop is often more than enough. With teraflops you're in the present-day world of the code-breakers, but it mayn't be so long before they're ubiquitous in things that can use 'em.

Many controls are designed (for cost, simplicity, reliability) to deliberately avoid floating-point math in calculations; the appropriate metrics for them are MIPS, GIPS, TIPS, etc.
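
For anyone curious what avoiding floating point looks like in practice, here is a small sketch using fixed-point integers. The Q16.16 format and the helper names are assumptions chosen for illustration; real controllers pick their own scaling, and on a fixed-width machine the multiply needs a wider intermediate than Python's unbounded integers require.

```python
FRAC_BITS = 16        # Q16.16: 16 integer bits, 16 fractional bits (assumed format)
ONE = 1 << FRAC_BITS  # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert at the edge of the system; the control math itself stays integer."""
    return int(round(x * ONE))


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values using only integer multiply and shift."""
    return (a * b) >> FRAC_BITS


def to_float(a: int) -> float:
    """Convert back for display only."""
    return a / ONE


gain = to_fixed(0.75)
error = to_fixed(12.5)
command = fixed_mul(gain, error)
print(to_float(command))  # 9.375
```

Everything between the two conversions is integer arithmetic, which is why instruction-rate metrics describe such a controller better than flops do.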

The advantage of increasing speed per processor is that it slices time into ever finer chunks, so that diagnostic and comparison housekeeping can be woven more extensively into the execution stream without affecting the mainstream control program. This is accomplished by stealing a microsecond now and again for the overhead stuff in an invisible and demonstrably non-interfering manner. A similar effect can be achieved with multiple processors sharing a single process stream as if they were one processor. When time-stealing is done right, the revenue process sees only a slightly slower execution universe, with no possibility of interaction and therefore no need for expensive case-by-case testing of changes to the monitoring process. It is there, watching things and doing work, but the time-slicing mechanism acts as an effective firewall. (insert theory/practice disclaimer here)
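
A toy simulation of the time-stealing idea, under loudly stated assumptions: time is counted in abstract ticks, the control task gets its full allotment in every frame, and the monitor only ever runs in whatever slack is left. The function names and the tick counts are hypothetical, and this shows the scheduling shape only, not a real-time implementation.

```python
from typing import Callable, Iterator, List


def monitor_work(samples: List[int], findings: List[str]) -> Iterator[None]:
    """Housekeeping cut into one-tick chunks; it only reads the control outputs."""
    index = 0
    while True:
        if index < len(samples):
            if samples[index] < 0:
                findings.append(f"negative sample at index {index}")
            index += 1
        yield  # hand the processor back after one tick of work


def run_frames(control: Callable[[int], int], control_ticks: int,
               frame_ticks: int, frames: int,
               outputs: List[int], monitor: Iterator[None]) -> None:
    """Run the control task first in each frame, then spend the slack on monitoring."""
    slack = frame_ticks - control_ticks
    if slack < 0:
        raise ValueError("control task does not fit in the frame")
    for frame in range(frames):
        outputs.append(control(frame))  # the revenue process always runs in full
        for _ in range(slack):
            next(monitor)               # stolen ticks; never touches control state


def control_law(frame: int) -> int:
    return 3 * frame - 4  # hypothetical control computation


outputs_with: List[int] = []
findings: List[str] = []
run_frames(control_law, control_ticks=8, frame_ticks=10, frames=4,
           outputs=outputs_with, monitor=monitor_work(outputs_with, findings))

# Same run with no slack at all: the monitor never gets a tick.
outputs_without: List[int] = []
run_frames(control_law, control_ticks=8, frame_ticks=8, frames=4,
           outputs=outputs_without, monitor=monitor_work(outputs_without, []))

print(outputs_with == outputs_without, findings)
```

The control outputs are identical with or without the monitor running, which is the firewall property described above in miniature; whether a real scheduler can guarantee it is where the theory/practice disclaimer comes in.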