Ethiopian airliner down in Africa

31st Mar 2019, 16:26

#2821 (permalink)

patplan

Join Date: Nov 2018

Location: Vancouver

Posts: 68

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by GarageYears

Firstly, as it stands the cause of the second crash is unknown. Fingers pointing at MCAS are speculation, at least until the interim report is published. It may well be a better narrative than other options.

Notwithstanding that, the Ethiopian environment is very different from the one Lion Air was operating in. Check out the MSL altitude of both departure airfields...

Finally, the same AOA sensor is flying in several thousand 737NGs today. It doesn’t seem the sensor is likely to be to blame.

Truth is the Lion Air aircraft shouldn’t have been in service, given the maintenance log and lack of accurate documentation of issues with the aircraft on previous flights. As for Ethiopian we just don’t know any facts, other than the actual crash.

- GY

Well, actually this Ethiopian investigation is almost as leaky as the Indonesian one... The main suspects are very much the same: AOA reading and MCAS.

Since MCAS, in its past iteration, when fed erroneous data by a single AOA vane, has a knack for driving the trim mechanism to the end of the jackscrew, essentially doing its job as programmed spectacularly "well", that will leave the AOA vane as the fall guy.

Except... it is ALMOST IMPOSSIBLE that vanes which have been in use since forever, and are thought to be very reliable, would be implicated as the cause of two crashes within the span of 5 months. This leaves us with something else as the more plausible cause, one that has been largely ignored: the Max-8 flight control system or something therein...

31st Mar 2019, 17:11

#2822 (permalink)

hjd10

Join Date: Mar 2018

Location: UK

Posts: 2

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by GordonR_Cape

A third AOA would only have worked if the Boeing 737 MAX had new flight control computers like other models (including Airbus). That was never going to happen, due to the huge cost, certification and training issues. I never implied that 3 AOA sensors have no function, but unless the system architecture can process and vote on them, the third one has no purpose.

I bet Boeing wished that they had spent that extra cash now!
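To illustrate the voting point in the quote, here is a minimal sketch of mid-value selection with three sensors, compared with what two sensors allow. The function names and the 5 degree tolerance are my own assumptions for illustration only, not anything from Boeing's or Airbus's flight control software.

```python
from statistics import median

def vote_three(a, b, c, tolerance_deg=5.0):
    """Mid-value select: return the median reading and the indices of outliers.

    tolerance_deg is a hypothetical threshold chosen for this example.
    """
    readings = (a, b, c)
    mid = median(readings)
    outliers = [i for i, v in enumerate(readings) if abs(v - mid) > tolerance_deg]
    return mid, outliers

def compare_two(a, b, tolerance_deg=5.0):
    """Two sensors can detect a disagreement but cannot say which one is wrong."""
    if abs(a - b) > tolerance_deg:
        return None  # disagree: no basis for picking a winner
    return (a + b) / 2.0

print(vote_three(4.8, 26.0, 5.1))  # the bad middle reading is outvoted
print(compare_two(4.8, 26.0))      # None: detected, but not resolved
```

The third vane only helps if something downstream runs logic like `vote_three`, which is the architectural point being made in the quote.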

31st Mar 2019, 17:45

#2823 (permalink)

EDLB

Join Date: Aug 2005

Location: EDLB

Posts: 363

Likes: 8

Received 4 Likes on 3 Posts

I take bets that it has something to do with the signal wiring from the AoA vane to the flight computer (ADIRU), like shorting out one half of the SIN or COS symmetric signal and thereby creating something around a 45 degree/2 offset. If the Ethiopian Airlines FDR does show a similar problem, then there is some latent harness, connector or ADIRU problem which will show up in the other 737 MAX made in a similar timeframe. So if that is established, the investigation might look into some of the grounded planes built in a similar timeframe.

31st Mar 2019, 17:57

#2824 (permalink)

Daft01

Join Date: Mar 2019

Location: Usa

Posts: 1

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by Blythy

As an example, on the Space shuttle, there were four identical computers which voted against each other in the case of discrepancy. However, there was a 5th computer (limited to ascent and reentry only) which was different hardware and different software in the event of something which had the same root cause in the software / hardware.

Not entirely true. All 5 computers are AP-101s. The 5th ran a different subset of functions for ascent and descent, written by Rockwell (IBM was the main contractor for the hardware and flight software). It wasn't a complete rewrite of the flight software. The reason given for not having different hardware was that it would have cost too much. The software itself was an OS written in assembly, and the main code written in HAL/S, possibly on different versions of the compiler.

Has there ever really been an aircraft with 2 completely separate hardware and software teams?

Info was gotten from
Computers in Spaceflight: The NASA Experience Chapter 4-3
and The Space Shuttle Primary Computer System, Communications of the ACM September 1984 Volume 27 Issue 9

31st Mar 2019, 18:28

#2825 (permalink)

VicMel

Join Date: Jun 2009

Location: Dorset

Posts: 31

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by safetypee

Whilst all of you Tech ‘bit’ people provide valuable information and possible scenarios, could you please consider why ‘failures’ appear to be very rare and so far only relate to two aircraft / three vanes.

How something fails does not necessarily explain why (when) it failed.
Random, probabilistic, bit count, world clock ?

OK, having considered why the MCAS failures only appear on some flights, possible candidates (the holes in the cheese) that could trigger a software fault in the processing of an AoA correction table (bearing in mind that the ADIRU software was developed only to a “non-safety critical standard”) are:-
1) pin fault: How does an ADIRU recognise it is L or R? Similar systems I’ve worked on in the past had a fixed pin in the harness connector of one box to designate it as L. If on the problem 737 Max flights the pin became a bad connection, or was bent/missing, the ADIRUs would think they are both L (or both R).
2) IAS fault: As pointed out by patplan in #2799 there was an “IAS & ALT Disagree shown after take off”. If the correction table is indexed by IAS (or has some dependency on it), the software could have used bad IAS data as an index and read garbage data for the correction items.
3) interrupt corruption: A problem that has bitten me hard on a few occasions over the years is with the software that handles interrupts. Typically, interrupt software has to save data held in registers, perform its actions, and then restore the registers to their entry values. A latent problem can exist (just waiting for the right holes to line up) where the interrupt routine uses another register that its spec writer was unaware was in use elsewhere. Or, more likely, a software update was made (e.g. MCAS data preparation update) that used another register. So after 100s or 1000s of flight hours the holes line up, the interrupt pings off in the middle of the new software, something (could be a data value, a status flag, a jump address or ….) is corrupted. The consequential behaviour could be any of many surprises!
4) processor overload:

Quote:

Originally Posted by patplan

I have a suspicion that in Boeing 737 Max 8 [B38M] perhaps the LEFT/CAPT ADIRU is constantly being overwhelmed by new routines [i.e. MCAS/AOA related programmings] which may from time to time corrupt the system.

I agree, as the (old tech) processors become more and more loaded, there will come a point where, given the right circumstance of several things needing to be computed in one cycle, a software routine will not complete. Just as an example, the start up stage will be quite busy. I would expect the ADIRU to determine its L or R status (perhaps read a pin) and store the result for other software routines to use. So if this action does not complete the ADIRU L or R status will stay at the default value; ADIRUs would both stay as L (or R).
5) flap position: From the preliminary report (Fig. 5 on accident flight & Fig. 7 on previous flight) there is a difference in when the flap position changes. Fig 5 shows a change well after rotation, Fig 7 shows a change at the point of rotation. Could this difference have affected how MCAS subsequently behaved?
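
As a toy illustration of candidate 3 (interrupt corruption), the sketch below models registers as a Python dict. It is purely hypothetical: the register names, the values and the `run` function are invented for this example and say nothing about how any ADIRU is actually coded. The interrupt routine saves and restores r0 and r1 only, but a later "update" makes it scribble on r2 as well, which the main routine happens to be using.

```python
def run(interrupt_at, isr_saves=("r0", "r1")):
    """Sum 1..5 in register r2; an interrupt fires just before step `interrupt_at`.

    Pass interrupt_at=None for a run with no interrupt at all.
    """
    regs = {"r0": 0, "r1": 0, "r2": 0}

    def isr():
        saved = {name: regs[name] for name in isr_saves}  # save on entry
        regs["r0"] = 111   # scratch use the original ISR author planned for
        regs["r2"] = 999   # scratch use added later, never added to the save list
        regs.update(saved)  # restore only what was saved

    for step in range(1, 6):
        if step == interrupt_at:
            isr()
        regs["r2"] += step  # main routine accumulates in r2
    return regs["r2"]

print(run(None))                    # 15: no interrupt, correct sum
print(run(3))                       # 1011: r2 clobbered mid-calculation
print(run(3, ("r0", "r1", "r2")))   # 15: saving r2 as well removes the fault
```

The result is only wrong when the interrupt lands inside the window where r2 is live, which is why this kind of fault can stay hidden for a long time.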

31st Mar 2019, 19:40

#2826 (permalink)

DaveReidUK

Join Date: Jan 2008

Location: Reading, UK

Posts: 15,820

Likes: 6

Received 201 Likes on 93 Posts

Quote:

Originally Posted by VicMel

So after 100s or 1000s of flight hours the holes line up, the interrupt pings off in the middle of the new software, something (could be a data value, a status flag, a jump address or ….) is corrupted. The consequential behaviour could be any of many surprises!

That sounds like a description of a random failure, rather than something that would manifest itself over several consecutive flights, as was the case with Lion Air.

31st Mar 2019, 19:46

#2827 (permalink)

GlobalNav

Join Date: Aug 2013

Location: Washington.

Age: 74

Posts: 1,077

Likes: 278

Received 151 Likes on 53 Posts

Quote:

Originally Posted by bill fly

Well I don’t agree, for me AoA is an analog value, which can be related directly to vane angle much more easily on a dial, than yet another strip display.

The AoA display should be designed to support the pilot’s proper and effective use of it (whatever that is). It should be considered, not in isolation, but in the context of the rest of the flight display(s) and the instrument scan which the pilots are expected to conduct. Does any airline which has aircraft equipped with the AoA display have approved pilot procedures for its use? If a check ride was conducted, in which phases of flight would a pilot be faulted for failure to maintain awareness of the AoA display? If one were to evaluate the AoA display design, what measure of performance would be used? Considering the approved Boeing EFIS with the AoA in the upper right corner, how does that fit Basic T flight display philosophy? Of course it doesn’t, because AoA was never part of the Basic T. But if there was a logical, task performance-based purpose for the AoA display, why would it be placed above the altitude display, about as far from the airspeed and attitude indications as it could be? Yet, we suppose it enhances safety?

31st Mar 2019, 19:58

#2828 (permalink)

ecto1

Join Date: Nov 2018

Location: madrid

Posts: 47

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by EDLB

In theory (older 737s), each computer takes the three wires (the two analog signals), amplifies them, then demodulates them (turns them into DC) using the reference AC current that powers the vane, then filters them, and finally passes them to an A/D converter. The values are stored in a memory block, and then a software block reads them and translates them into an AOA in degrees (atan(sin/cos)), which is filtered once more (so two AOA values are available, raw and filtered).

I would be really surprised if the software block did not, at that point, perform the plausibility check (sin^2+cos^2=vmax). (for instance, it shows a warning if the vane didn't move more than 3 degrees for a period of time).

But even if it didn't, shorting one of the signals to vref would produce 9 degrees offset at the vane. Does that translate to 22 degrees airplane AOA?
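
For anyone who wants to see the decode and plausibility check spelled out, here is a minimal sketch. It assumes the demodulated voltages are already normalised so that a healthy signal has amplitude `v_max`, and it checks the amplitude sqrt(sin^2+cos^2) against `v_max` (equivalent to comparing the squares). The function name, the 5% tolerance and the 1:1 mapping between electrical and vane angle are my assumptions, not the actual ADIRU logic.

```python
import math

def decode_resolver(v_sin, v_cos, v_max=1.0, tolerance=0.05):
    """Return (angle_deg, plausible) from demodulated sin/cos voltages."""
    angle_deg = math.degrees(math.atan2(v_sin, v_cos))
    amplitude = math.hypot(v_sin, v_cos)  # sqrt(sin^2 + cos^2)
    plausible = abs(amplitude - v_max) <= tolerance * v_max
    return angle_deg, plausible

true_rad = math.radians(40.0)

# Healthy signal: angle recovered, amplitude check passes.
print(decode_resolver(math.sin(true_rad), math.cos(true_rad)))

# Sine channel lost (reads zero): the angle collapses to 0 and the
# amplitude drops to cos(40 deg), so the check flags it.
print(decode_resolver(0.0, math.cos(true_rad)))
```

One caveat with this simple check: near 0 degrees the sine channel contributes very little amplitude, so losing it there could still pass a loose tolerance.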

31st Mar 2019, 21:01

#2829 (permalink)

Wilderone

Join Date: Feb 2017

Location: Adelaide

Posts: 3

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by patplan

Except the AOA is not hooked up to MCAS on the NG

31st Mar 2019, 21:32

#2830 (permalink)

safetypee

Join Date: Dec 2002

Location: UK

Posts: 2,451

Likes: 0

Received 9 Likes on 5 Posts

VicMel, #2858, thanks for the reply.

So with my simplistic view the problem appears to be random, chance. Alternatively, as a sceptic, why 2 aircraft in 4 months, whereas the remaining fleet …
OK, so this is the nature of probability, together with the ever-increasing fleet size.

Thus the next question is where a ‘good’ AoA software fix could - should be made, but if not … at least the output of MCAS should be limited.
And if AoA is not fixed (still probability), there could still be problems with speed pressure error correction, air-data disagree, feel, and low speed awareness, but will these events be no more than experience in previous 737s, or if it still is an issue with the Max (inadequate software / FGC ADIRU overloaded) then there will be an increase in disagree alerts due to ‘corrupt’ AoA.

I may be dancing around the same tree as in my post in the other thread - #485 Boeing 737 Max Software Fixes Due to Lion Air Crash Delayed
Where is the value of AoA sampled by the FDR? Would this clarify current understanding?

31st Mar 2019, 22:00

#2831 (permalink)

armchairpilot94116

Join Date: Jul 2007

Location: the City by the Bay

Posts: 547

Likes: 0

Received 0 Likes on 0 Posts

Is the Boeing 737 MAX Worth Saving?

can Boeing save it?

31st Mar 2019, 22:41

#2832 (permalink)

fdr

Join Date: Jun 2001

Location: 3rd Rock, #29B

Posts: 2,956

Likes: 470

Received 861 Likes on 257 Posts

Quote:

Originally Posted by GordonR_Cape

It is deeply ironic that the issue MCAS was designed to cater for was never flight critical, and might never have occurred during the lifetime of the aircraft. Instead the fix ended up killing hundreds of people.

This highlights the underlying issue that the industry has processes that can bite back. To achieve compliance with a particular rule a simple fix is implemented, and that has a potential for unintended consequences. The failure mode of the compliance fix has its own unknown interaction with the operating system at the man machine interface; somewhere along the way recognition failed as to the underlying cause, for a crew that had never heard of the "fix" and for another crew that had learnt of the problem due to the revelations of the first crew's misfortune. In fact, the knowledge gained in the flight preceding JT610 was lost on the next crew as well; the system doesn't allow for the timely transfer of information, and it probably cannot do so under any process that validates the information and the output to avoid errant information being introduced.

The constant offset once in motion would appear inconsistent with a loss of the sin or cos output alone as far as I understand the use of those functions to derive the A-D output state. Contend as previously commented that the sensor itself is unlikely to be the component that has the fault, which leads to the install, wiring, or processing of the signal as being the point of failure. The loss of a single resolved output is intriguing, giving an erroneous result but it would appear that the offset error would alter with the change of actual AOA. The aircraft was operated from low speed, through to high speed, with substantial change in actual AOA, but the offset appears to be constant.
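
A quick numerical sketch of why a degraded sin or cos channel should not give a constant offset. It assumes the simplest possible model: indicated angle = atan2(sin, cos) with one channel scaled by a gain, and a 1:1 relation between electrical and vane angle. The function names and the 0.5 gain are made up for illustration.

```python
import math

def indicated_angle(true_deg, sin_gain=1.0, cos_gain=1.0):
    """Angle computed from sin/cos channels that may be attenuated."""
    t = math.radians(true_deg)
    return math.degrees(math.atan2(sin_gain * math.sin(t), cos_gain * math.cos(t)))

def offset_errors(true_angles_deg, sin_gain=1.0, cos_gain=1.0):
    """Indicated minus true angle for each true angle."""
    return [indicated_angle(a, sin_gain, cos_gain) - a for a in true_angles_deg]

angles = [2, 5, 10, 15]
for true_deg, err in zip(angles, offset_errors(angles, sin_gain=0.5)):
    print(f"true {true_deg:5.1f} deg -> error {err:+.2f} deg")
```

Under this model the error grows with the true angle rather than staying fixed, which is the inconsistency with the recorded constant offset.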

31st Mar 2019, 23:04

#2833 (permalink)

Livesinafield

Join Date: Feb 2011

Location: Leeds

Posts: 1

Likes: 0

Received 0 Likes on 0 Posts

I think the publicity that the Max has generated over the last few weeks, especially since the grounding, could kill the airframe off... there may be no way back for it. The public are very powerful and could refuse to fly on it even after Boeing has "updated IOS" or whatever they are doing. I personally won't be going near one with my family even after it's "fixed", and I fly for a living, so Joe Public will be even more cautious.

I honestly think it's crazy that we are now in a situation where a plane is crashing and we are saying we need a software update... it's gone too far.

People accept that humans make errors and pilots sometimes screw up; what they won't accept is software in charge of their lives.

31st Mar 2019, 23:37

#2834 (permalink)

TryingToLearn

Join Date: Mar 2019

Location: Bavaria

Posts: 20

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by fdr

The constant offset once in motion would appear inconsistent with a loss of the sin or cos output alone as far as I understand the use of those functions to derive the A-D output state. Contend as previously commented that the sensor itself is unlikely to be the component that has the fault, which leads to the install, wiring, or processing of the signal as being the point of failure. The loss of a single resolved output is intriguing, giving an erroneous result but it would appear that the offset error would alter with the change of actual AOA. The aircraft was operated from low speed, through to high speed, with substantial change in actual AOA, but the offset appears to be constant.

There are not many failure modes which may cause such a constant deviation. If normal checks are in place, it rules out everything except wrong calculation within or after atan(sin/cos). Cabling, loss of ground, ADC error... there are checks for such problems and they do not cause a constant offset.
Again, there are 2 possibilities left if I didn't miss something (I'm evaluating such resolvers for 2 safety relevant systems within electric cars at the moment):
-> Electromagnetic interference (EMI) at exactly the frequency the sensor is working on (or the sensor locks on to the interference frequency with its resonator). I tried to find a correlation between engine running/rpm and sensor failure but could not find any. EMI from the new engines would have been a nice one.
-> Error within the calculation after sin/cos and the plausibility check (sin²+cos²=1) -> Software change / bug?

The sin and cos voltage is simply the x and y of a 2D unit vector (look for 'unit vector' on English Wikipedia, I'm not allowed to post a link). The receiver simply checks if the unit vector has a length of 1. If a cable breaks or the ADC has an error, it won't.
If there is no need to measure 360°, the electrical full circle is often a fraction of the mechanical one. So the electrical vector would make 2/3/4 turns on one mechanical revolution of the fin. Therefore 22.5° deviation could come from 90° signal error or calculation error. Those 22° somehow smell like some 90° computational error (e.g. wrong sign). Especially since atan calculations in old software only have a table for one quadrant of the unit vector and then switch signs or add/subtract 90°/180°/270°.
Switching cables (sin/cos) would btw. invert the angle (90°-x).
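
The arithmetic above can be checked with a small sketch. It assumes 4 electrical turns per mechanical revolution (the post only says 2/3/4, so `ratio=4` is an assumption picked to match the 22.5° figure), and the `decode` function is hypothetical, not the real signal path.

```python
import math

def decode(true_mech_deg, elec_error_deg=0.0, swap=False, ratio=4):
    """Mechanical angle reported for a given true mechanical angle.

    elec_error_deg models a computational error added after atan (e.g. a
    wrong quadrant correction); swap models exchanged sin/cos cables.
    """
    e = math.radians(true_mech_deg * ratio)   # electrical angle
    s, c = math.sin(e), math.cos(e)
    if swap:
        s, c = c, s
    elec_deg = math.degrees(math.atan2(s, c)) + elec_error_deg
    return elec_deg / ratio

# A 90 deg electrical error gives the same 22.5 deg mechanical offset
# at every true angle, i.e. a constant offset.
for mech in (2.0, 5.0, 12.0):
    print(mech, decode(mech, elec_error_deg=90.0) - mech)

# Swapped cables with a 1:1 ratio: the reported angle is 90 - x.
print(decode(5.0, swap=True, ratio=1))
```

Unlike a lost or attenuated channel, an error added after the atan step is independent of the true angle, which fits a constant offset.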

Still: If this sensor design is so bad, why has it stayed the same for the last decades? What changed on the MAX that altered the probability of this error so much?

Without an answer to this question I would not trust the AoA signals (and many other) at all! (...and I'm a functional safety consultant)

This SW fix tries to fix the impact of the failure, the root cause seems to be still unknown. But without knowing the cause, other side effects cannot be identified.

1st Apr 2019, 00:11

#2835 (permalink)

megan

Join Date: Mar 2005

Location: N/A

Posts: 5,944

Likes: 35

Received 394 Likes on 209 Posts

Quote:

I honestly think it's crazy that we are now in a situation where a plane is crashing and we are saying we need a software update... it's gone too far

This is not a first, software code has been responsible for prior accidents, Iberia A320 being one.

Quote:

The design of the flight control system was such that the actions of both pilots over the flight controls were ignored by the logic of the control system and prevented the aircraft from flaring.

The cause of the accident was the activation of the angle of attack protection system which, under a particular combination of vertical gusts and windshear and the simultaneous actions of both crew members on the sidesticks, not considered in the design, prevented the aeroplane from pitching up and flaring during the landing.

Fixed by a code modification.

http://www.fomento.es/NR/rdonlyres/8...006_A_ENG1.pdf

1st Apr 2019, 00:59

#2836 (permalink)

b1lanc

Join Date: Mar 2015

Location: North by Northwest

Posts: 476

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by megan

This is not a first, software code has been responsible for prior accidents, Iberia A320 being one.
Fixed by a code modification.

Many years ago while working on a fire-control system, we were evaluating test methodologies between the F-16's Westinghouse, General Dynamics Phalanx fire-control, and Airbus fly-by-wire. The Airbus strategy (as I recall, which was about 4 decades back) was to deliver the code to 3 companies in three different countries, none of whom knew of the others' existence. AB expected each would find some unique code exceptions by doing so. Not so. Well over 90% were identified by multiple vendors, including all deemed critical bugs save maybe one. The rest were not considered major flight control errors.

Maybe Gums could chime in here, but we had heard rumors (maybe urban legend) that some of the early F-16 deployments in Germany with look-down did on occasion lock on to low flying Mercedes on the Autobahn. As a designer, how many would consider that possibility?

There will never be a perfect balance between automation and human interaction. Automation is programmed by humans - mistakes will happen on both ends.

Last edited by b1lanc; 1st Apr 2019 at 11:26.

1st Apr 2019, 01:48

#2837 (permalink)

CurtainTwitcher

Join Date: Jul 2014

Location: Harbour Master Place

Posts: 662

Likes: 0

Received 0 Likes on 0 Posts

I'm not a software person. However, I have been interested in automation and human factors since a Computer Science friend put me on a lead in the early 1990s with the Therac-25 accidents. This led to reading more by Nancy Leveson: High-Pressure Steam Engines and Computer Software. This is a great introduction to the larger picture of sophisticated hardware racing well beyond the much slower and riskier software engineering, set in historical context against the engineering of the steam engine versus dangerous and lagging boiler tech. Public pressure forced the formation of safety laws to protect the end users from dangerously engineered devices. Although written in 1992, I believe it still has many insights that make it relevant. She has also written much on safe software development techniques.

Her homepage: Nancy Leveson Professor of Aeronautics and Astronautics

1st Apr 2019, 05:32

#2838 (permalink)

armchairpilot94116

Join Date: Jul 2007

Location: the City by the Bay

Posts: 547

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by Livesinafield

The Max is all in for Boeing. Fastest selling Boeing jet, yada yada. Boeing is going to do whatever it takes to get that bird back in the air. Too much at stake. The only way for the airframe to end production is if the majority of the orders evaporate overnight. This probably won't happen. So if the software patch works, airlines don't cancel orders and the general public goes back to flying it, all will be well. But Boeing would be wise to learn that it is time to get cracking on an all new 737, based loosely on the 757 perhaps. Or better, basically copy the A320. And to amortize the costs of the Max quickly so the line can end as soon as the new one is ready. When will it be? Ten years? Now if the Max has another accident within the next few years, no matter the cause, that may be it. Boeing can't afford another Max going down.

1st Apr 2019, 06:33

#2839 (permalink)

bill fly

Join Date: Feb 2015

Location: The woods

Posts: 5

Likes: 0

Received 3 Likes on 2 Posts

Quote:

Originally Posted by GlobalNav

Hi Nav,
The purpose of the AoA indicator on the Max is not to read as an additional flight instrument.
It is a position indicator.
Therefore there is no requirement for it to be included in the scan etc.
If you get a disagree warning you can tell quickly which signal is the troublemaker.
To me it makes sense and yes, it is a safety feature.
If, together with the mod on MCAS travel and necessary information to converting crews, it had been incorporated from the beginning, then it would have given a valuable clue and could have prevented these tragic events.
I am still not a fan of the MCAS as a solution to the control force requirement. That doesn’t make me criticise every move Boeing makes, however, when they try to learn from their mistake.

1st Apr 2019, 07:01

#2840 (permalink)

ManaAdaSystem

Join Date: Aug 2000

Posts: 1,501

Likes: 0

Received 0 Likes on 0 Posts

This could very well be related to something else than just the AOA sensors.
What is different in the MAX AOA system compared to the NG? The sensors are the same.
I have never seen an AOA disagree caution on the NG, so why does it fail on the MAX?
The sensor may have been installed wrongly on the Lion Air MAX, but that was not the case on the Ethiopian MAX.