PPRuNe Forums

Malaysian Airlines MH370 contact lost (https://www.pprune.org/rumours-news/535538-malaysian-airlines-mh370-contact-lost.html)

YYZjim 19th Apr 2014 02:19

ADIRU failure once again?
 
In 2005, a Malaysian Airlines B777-200 (9M-MRG) was on a flight from Perth to Kuala Lumpur when it experienced a failure of its navigation system. The airplane suddenly climbed to FL410, then dropped 4000 feet, then climbed 2000 feet. The pilots flew the airplane manually back to Perth. Australian authorities investigated the incident. They determined that the failure was in the "operating software of the air data inertial reference unit (ADIRU), a device that supplies acceleration figures to the aircraft's flight computer." The device was manufactured by Honeywell and contained the fourth version of the operating system. A review of the software showed that the error did exist on the first three versions of the software, but had been suppressed by other features of the software. These other features were removed during the transition from the third version to the fourth version.

The problem was serious enough for the FAA to issue an emergency airworthiness directive in August 2005 to all B777 operators to revert to version three of the operating system.

Note that the airplane lost on March 8, 2014, was 9M-MRO, apparently a sister ship.

theAP 19th Apr 2014 04:33

YYZjim wrote,

Note that the airplane lost on March 8, 2014, was 9M-MRO, apparently a sister ship.
One of the most relevant posts I've seen in a long time, and most probably you have hit the nail on the head, IMHO.

ampclamp 19th Apr 2014 05:11

YYZjim, re the sister ship ADIRU mishap. Investigation: 200503722 - In-flight upset; Boeing 777-200, 9M-MRG, 240 km NW Perth, WA

kayej1188 19th Apr 2014 05:51

First off, terrific post. I'm struggling to fathom how this is the first time this incident has been mentioned. Maybe it isn't. It seems a number of parallels can already be drawn between the two flights. Could someone with more knowledge than me say whether a faulty ADIRU could correspond to ACARS and the transponder being disabled?

albatross 19th Apr 2014 06:09

Well considering it took place in 2005:
One would hope, given that there was an AD, that the present-day software version would preclude a repeat, especially as there has not been another event in 9 years.

harrryw 19th Apr 2014 07:33

@ampclamp
Except possibly as another comment on how useless the CVR is, as it only had 5 minutes of relevant information because it had not been switched off on the ground.

Lookleft 19th Apr 2014 07:48


The problem was serious enough for the FAA to issue an emergency airworthiness directive in August 2005 to all B777 operators to revert to version three of the operating system.
If Boeing thought that this incident was similar in nature and they had no answers, then the 777 fleet would be grounded (a situation which very nearly occurred after the 2005 incident). The fact that Boeing have not issued any ADs to operators (to my knowledge) suggests that they are not concerned that the aircraft has an inherent fault that could cause another 777 to disappear. The crew in the 2005 incident were able to override the automatics and recover the aircraft. For something similar to have occurred there would have to be another undetected software failure followed by a double incapacitation. Something which IMHO would be an order of magnitude beyond 10^-9.

Rightbase 19th Apr 2014 08:56

Mistakeology
 

It would be a very long bow to draw to link the 2 incidents in any way whatsoever.
Logic errors can remain undetected in programmed systems for a long time.

The protocol of flying on with 'redundant' units defective is such a 'program' that by definition does not create an accident but equally obviously does erode safety margins.

When it is the integrity of the 'intelligence' between pilot and aircraft that is jeopardised by such a program it then puts at risk the strategy of having a human in ultimate control.

Programmer humility deficiency might be a common root cause.

mmurray 19th Apr 2014 09:33

A week to finish current search?
 
A report here that the current search will take 5-7 days to complete if the weather and the bluefin 21 holds up.

Malaysia Airlines MH370: Underwater search at 'very critical juncture', could be completed this week - ABC News (Australian Broadcasting Corporation)

Ian W 19th Apr 2014 09:49


Originally Posted by Rightbase (Post 8441107)
Logic errors can remain undetected in programmed systems for a long time.

The protocol of flying on with 'redundant' units defective is such a 'program' that by definition does not create an accident but equally obviously does erode safety margins.

When it is the integrity of the 'intelligence' between pilot and aircraft that is jeopardised by such a program it then puts at risk the strategy of having a human in ultimate control.

Programmer humility deficiency might be a common root cause.

I know that there is a wish to find an answer but this is not it.

Logic errors can remain undetected, but this one was detected: the quotes are from an investigation into an event that it caused, and an AD was very publicly issued to return to the previous version of the software.

So now are you really suggesting that Honeywell, having been told of the fault in their software in unequivocal terms, forgot about it? Then over the 9 years since the incident that they have not updated the ADIRU software to fix the fault? To use a quote from tennis - You cannot be serious.

And of course this ADIRU software fault would also need to disconnect ACARS, switch off all three redundant VHF radios, incapacitate the crew, and then recover itself and fly the aircraft in uneventful cruise to the southern Indian Ocean.

Perhaps you would like to revisit your logic?

Rightbase 19th Apr 2014 10:28

Mistakeology
 

Logic errors can remain undetected - but this one was detected
Your post kindly emphasised 'was', making the point that the logic error has been detected.

My point is that the logic error of flying on with a tolerated defect in a system, with the danger that a second defect could mislead the pilot, is a critical vulnerability.

The vulnerability does not go away now that this one has been detected.

mseyfang 19th Apr 2014 14:00


If Boeing thought that this incident was similar in nature and they had no answers, then the 777 fleet would be grounded (a situation which very nearly occurred after the 2005 incident). The fact that Boeing have not issued any ADs to operators (to my knowledge) suggests that they are not concerned that the aircraft has an inherent fault that could cause another 777 to disappear. The crew in the 2005 incident were able to override the automatics and recover the aircraft. For something similar to have occurred there would have to be another undetected software failure followed by a double incapacitation. Something which IMHO would be an order of magnitude beyond 10^-9.
I tend to agree with this, but I have to admit that my first thought upon learning of this incident was ADIRU failure and/or an EE bay fire. The latter still explains everything known about the incident except for one important issue -- how the plane wound up headed in the general direction of Perth and the supposed track around Indonesia (still not entirely convinced of that as established fact given the source).

As for Boeing, in the absence of evidence that there is a fault in the aircraft (and theories aren't evidence), there are ample economic and liability/legal reasons to do nothing unless/until concrete evidence of a fault is discovered. Grounding the 777 fleet would be an enormous hardship for a number of carriers for which this aircraft type is the backbone of their long-haul intercontinental fleets, a group that includes the three legacy US carriers.

You don't ground a fleet of aircraft in the absence of specific evidence of a design problem. Prior groundings such as the Comet I (c. 1952), Lockheed Electra (c. 1959), the DC-10 (1979) and 787 were based on physical evidence of a potentially catastrophic problem with the aircraft. In this case, such physical evidence is, to date, completely lacking.

Ian W 19th Apr 2014 14:14


Originally Posted by Rightbase (Post 8441210)
Your post kindly emphasised 'was', making the point that the logic error has been detected.

My point is that the logic error of flying on with a tolerated defect in a system, with the danger that a second defect could mislead the pilot, is a critical vulnerability.

The vulnerability does not go away now that this one has been detected.

You have obviously not worked developing safety critical software.

The software in the ADIRU is not developed as if it were a video game or a university project: it is developed in line with RTCA DO-178 and ARINC 653. These are very strict standards with a lot of testing. However, despite all the testing, some faults may/will be found, and in most cases the system is designed so that a fault in one module will be contained, as part of a Failure Mode and Effects Analysis (FMEA). It would appear that a fault was successfully contained and then unmasked when another module was updated.

Now at that stage, with safety critical software, the FAA and Honeywell used an AD to revert to the previous version, which had worked without a problem. Honeywell would then have had a 'MUST FIX' top emergency software fix to carry out. In many organizations that means NO new software version can be delivered unless that fault is fixed.

Your attitude that they would have left it on the old version as that was 'good enough' is just not the way the industry works.

I would expect that the fault was fixed within days and then after recertification testing with the FAA and Boeing, Honeywell would have delivered a new ADIRU software build with all known bugs including this one fixed. The longest part of that effort will have been testing, and the particular issue that caused the ADIRU to fail would be included in the new acceptance test suite. Almost certainly there would also have been some effort to defend against ADIRU faults in the FMC software as part of the FMEA work.

High availability safety critical software development demands getting things right, designing systems to be resilient to subsystem faults, and rapid resolution of any faults found.

2dPilot 19th Apr 2014 16:09

@Ian W,
Unfortunately the best testing can only test for what you are looking for.
Test scripts will only be based on what are considered possible scenarios.
Furthermore, the IT industry is now full of people who have been taught to program, not learnt to program.
One would like to think an operating system like Windows would be fully tested, yet a host of bug-fixes are released every month, for years.

holdatcharlie 19th Apr 2014 19:38

CVR question
 
One of the many frustrating and ironic twists in this baffling accident is that, if and when the CVR is finally found (and recovered), it will in all probability just contain two hours of silence, because only the last two hours of cockpit audio are retained on the CVR. All preceding audio is over-written, thus denying investigators arguably their best clues to this bizarre tragedy.

My question is this - How complete is that over-writing?

We all know that deleted data from a PC hard drive can still be read - if you know the right (maybe that should be 'wrong') people. We have already seen this demonstrated with regard to the Captain's flight sim. data.

Could this also be made easier by being over-written by continuous silence? Could there be recoverable 'soft' data under that pure white over-write? Or are there some subtle technical differences between delete, erase and over-write?
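To make the over-write part of the question concrete, here is a minimal sketch of a loop recorder, assuming a simple fixed-capacity buffer in which each new sample displaces the oldest one. The function name, the capacity and the sample values are hypothetical, and the sketch says nothing about the physical memory inside a real CVR, which is where any question of recoverable remnants would have to be answered.

```python
from collections import deque


def record(samples, capacity):
    """Keep only the most recent `capacity` samples, like a looping recorder."""
    buffer = deque(maxlen=capacity)  # the oldest entry is dropped when full
    for sample in samples:
        buffer.append(sample)
    return list(buffer)


# Hypothetical timeline: 1 = audio of interest, 0 = silence.
timeline = [1, 1, 1] + [0] * 5

retained = record(timeline, capacity=4)
print(retained)  # [0, 0, 0, 0]: the early samples are no longer in the buffer
```

In this logical model, over-writing with silence is no different from over-writing with anything else: the old samples are simply gone from the buffer. Whether traces survive below that level is a property of the storage hardware, which this sketch does not model.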

YYZjim 19th Apr 2014 19:48

Sudden climb may cause loss of windscreen
 
I have speculated that the ADIRU failure experienced by 9M-MRG, which led to a sudden climb, was experienced by 9M-MRO (on route MH370) on March 8, 2014. The sudden increase in the pressure differential across the hull may have led to another problem recently experienced by two other B777-200s.

On April 13, 2012, an Alitalia B777-200 (EI-ISB on route AZ-8320) flying from Rome to Dubai at FL370 declared an emergency near Athens. The first officer's windscreen had cracked. The crew descended rapidly to 6000 feet and diverted to Athens.

On July 3, 2012, an Air France B777-200 (F-GSPL on route AF-85) flying from San Francisco to Paris at FL370 declared an emergency over Hudson Bay. The windscreen had cracked and the crew reported problems maintaining pressurization in the cabin. The crew descended to 10,000 feet and diverted to Montreal.

All three aircraft are of pretty much the same vintage:
Alitalia EI-ISB first flight December 18, 2002
Air France F-GSPL first flight June 12, 2000
Malaysian 9M-MRO first flight May 14, 2002

One would need to know the number of cycles, rather than simply the calendar age, to determine if windscreen problems are fatigue-related, and possibly represent a systemic problem which is just now coming to light in the B777 fleet.

Rightbase 19th Apr 2014 20:14

Mistakeology
 
My concern is a systems concern.

The software engineer works in an environment that makes assumptions about its upstream inputs (eg, a sensor might fail - et al,) and downstream consequences.

Within that is the acknowledgement that the downstream resources (eg. fault condition SOPs - et al.) cannot cater for all eventualities and so must rely on pilot professional competence & expertise.

The assumptions (eg middle value is safe - et al.) exploiting multiple redundancy can render the remaining two of three working transducers worse than a singleton, since failure of either would give an erroneous result - the Australian 777 episode exemplified this.

In that case, flying with a faulty third channel was worse than the system having no redundancy. Has this now been built into all triple redundancy middle value systems?
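The masking effect can be sketched with a generic mid-value (median) voter. This is an illustration only, under assumed numbers: it is not the actual ADIRU fault-handling logic, and the function name and all readings are hypothetical.

```python
from statistics import median


def mid_value_select(a, b, c):
    """Return the middle of three redundant sensor readings."""
    return median([a, b, c])


STUCK_HIGH = 99.0  # hypothetical output of a channel already known to be faulty

# All three channels healthy: the voter returns a sensible value.
healthy = mid_value_select(10.0, 10.1, 9.9)

# One channel stuck high, two healthy: the fault is masked by the voter.
one_fault = mid_value_select(STUCK_HIGH, 10.1, 9.9)

# A second channel then fails on the same side as the stuck one:
# the failed reading becomes the middle value and is passed downstream.
two_faults_same_side = mid_value_select(STUCK_HIGH, 50.0, 9.9)

# A second channel fails on the opposite side: the healthy channel
# happens to remain the middle value.
two_faults_opposite = mid_value_select(STUCK_HIGH, 0.0, 9.9)

print(healthy, one_fault, two_faults_same_side, two_faults_opposite)
```

In this toy model the tolerated fault is harmless until a second reading drifts to the same side, at which point the voter hands on a bad value with no indication that anything is wrong. A second failure on the opposite side is still masked, so the exposure depends on the direction of the second fault.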

In both that case and the ill-fated Air France episode, incorrect transducer readings were not sufficiently visible to the pilots - the last resort safety system - for it to be obvious to them just what was happening. Even the last resort 'hand fly the beast' option has to be negotiated with a software system that is already perceived - at least partially - as working otherwise than as intended.

It is the combination of reliance on the pilot and being unable to guarantee to present the information the pilot needs that gives the total system a level of vulnerability that can make a safely redundant system dangerous in the presence of a known failure.

An MEL that says a defective component can be tolerated must demonstrate a safe system (including a suitably informed pilot) in the event of ANY subsequent failure.

And when the statisticians do their sums, making standard 'independence' assumptions, they must be obsessive about them, as must everybody from people buying components to the authors of safety procedures.
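As a rough illustration of why the independence assumption matters, the sketch below compares the chance of two redundant channels both failing when failures are treated as independent with the same chance under a simple beta-factor common-cause model, in which a fraction beta of each channel's failures is assumed to strike both channels at once. The failure probability and the value of beta are invented for the example and are not taken from any aircraft system.

```python
def both_fail_independent(p):
    """Probability that two channels both fail, assuming full independence."""
    return p * p


def both_fail_common_cause(p, beta):
    """Beta-factor model: a fraction `beta` of failures is shared by both channels."""
    shared = beta * p                   # one common cause takes out both channels
    separate = ((1.0 - beta) * p) ** 2  # both fail for unrelated reasons
    return shared + separate


P_CHANNEL = 1e-4  # hypothetical per-flight failure probability of one channel
BETA = 0.01       # hypothetical: 1% of failures have a common cause

independent = both_fail_independent(P_CHANNEL)
with_common_cause = both_fail_common_cause(P_CHANNEL, BETA)

print(f"independent:       {independent:.3e}")
print(f"with common cause: {with_common_cause:.3e}")
print(f"ratio:             {with_common_cause / independent:.1f}")
```

With these made-up numbers, a 1% common-cause fraction makes the double failure roughly a hundred times more likely than the independent sum suggests, which is the sense in which the independence assumption has to be policed all the way from purchasing to procedures.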

JamesGV 19th Apr 2014 20:30

Windscreen failure.

Accepted. At FIR handover ? Divert then to Penang.

Northern route ? Then change to Southern route at 180/182 ?
We forget "no trans", "no acars/satcom".

olasek 19th Apr 2014 20:30


incorrect transducer readings were not sufficiently visible to the pilots
Actually they were. The rest of your post is so convoluted that it is impossible to follow.

GHOTI 19th Apr 2014 22:01

A small point of order
 
"Prior groundings such as the Comet I (c. 1952), Lockheed Electra (c. 1959), the DC-10 (1979) and 787 were based on physical evidence of a potentially catastrophic problem with the aircraft. In this case, such physical evidence is, to date, completely lacking."

The L-188 Electra fleet was never grounded. Restrictions on max IAS were imposed until the flutter problem was worked out, but unlike the DC-10s and Comets, they continued to operate. That was a decision based on the economics of an airline whose sole equipment was the L-188, and also because no-one knew whether the two mid-air breakups they suffered had related causes.


All times are GMT.

