MAX’s Return Delayed by FAA Reevaluation of 737 Safety Procedures

2nd Aug 2019, 15:44

#1701 (permalink)

OldnGrounded

Thread Starter

Join Date: Apr 2015

Location: Under the radar, over the rainbow

Posts: 788

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by Mad (Flt) Scientist

What appears (from the outside) to be delaying a return to flight status isn't the complexity of the task, frankly. It's FAA now going into complete CYA mode and every other decision during the MAX certification being dragged out and placed under a microscope.

The clearly-flawed certification process and the number and severity of issues, including but not limited to those related to MCAS, being encountered essentially require that the FAA (and the other national and supranational authorities) "go into complete CYA mode." The consequences of additional catastrophic losses following return to service would be . . . catastrophic. And we should remember that it is not only regulatory A's that are being covered -- quite properly.

Quote:

With the people looking through the microscope (who are not just the FAA, or even industry authorities, but every politician or journo sensing a news opportunity) sometimes having little conception of how the delegated/overseen certification process is supposed to work. (And has worked well for years)

Substantial evidence now before us suggests that the process has not worked as well as some believe/would like to believe. And those politicians and journalists have every right to question the process. Please remember for whose benefit it is supposed to operate and who, ultimately, is responsible for ensuring that it does what is intended.

Last edited by OldnGrounded; 2nd Aug 2019 at 15:54. Reason: Typo.

2nd Aug 2019, 15:58

#1702 (permalink)

B737MAX Documentary

Join Date: Jul 2019

Location: London

Posts: 3

Likes: 0

Received 0 Likes on 0 Posts

Thanks for this, thcrozier. It's an interesting connection to make. My understanding is that NASA officials said that the chance of failure of the shuttle was about 1 in 100,000; Feynman found that this number was closer to 1 in 100.

2nd Aug 2019, 16:04

#1703 (permalink)

B737MAX Documentary

Join Date: Jul 2019

Location: London

Posts: 3

Likes: 0

Received 0 Likes on 0 Posts

Thanks for this, I will dig through them. I agree, Dominic Gates's reporting has been terrific and is an excellent resource.
Kind regards, Clementine

2nd Aug 2019, 20:42

#1704 (permalink)

Tomaski

Join Date: Jun 2019

Location: VA

Posts: 210

Likes: 0

Received 0 Likes on 0 Posts

The Seattle Times article made this important point:

Quote:

“While it’s a theoretical failure mode that has never been known to occur, we cannot prove it can’t happen,” he said. “So we have to account for it in the design.”

He added that early published accounts of the fault suggesting that the microprocessor had been overwhelmed and its data-processing speed slowed, causing the pilot-control column thumb switches that move the stabilizer to respond slowly, were inaccurate.

Lemme said he was happy to learn this because those accounts hadn’t made sense technically.

When the first reports of this testing came out there was a tremendous amount of discussion and speculation about the FCC processors, how they got overwhelmed and the "fact" that the electric trim switches were slow to respond. It appears that discussion was all based on some reporter's misunderstanding, so before the current discussion goes too far afield it might be worth remembering this. Even the best news outlets may not have the best information, and it is very hard to understand deep technical details of this system without direct access to the experts and their data. Certainly frustrating as our understanding continues to evolve, but a good caution against becoming too wedded to any particular theory.

2nd Aug 2019, 20:49

#1705 (permalink)

sooty655

Join Date: Jan 2008

Location: Weedon, UK

Age: 77

Posts: 125

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by B737MAX Documentary

In thcrozier's link, Appendix F is the Feynman observations. His final sentence reads -
"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled."
Something which Boeing might do well to note today.

2nd Aug 2019, 20:54

#1706 (permalink)

Tomaski

Join Date: Jun 2019

Location: VA

Posts: 210

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by RickNRoll

I was scratching my head. A neutron has no charge but it has energy. A proton or antiproton has charge.

If a neutron makes a direct impact in the right spot, the kinetic energy gets deposited and can cause a change in the state of the bit. It doesn't need a charge to do so.

2nd Aug 2019, 21:35

#1707 (permalink)

Fly Aiprt

Join Date: Mar 2019

Location: French Alps

Posts: 326

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by Tomaski

If a neutron makes a direct impact in the right spot, the kinetic energy gets deposited and can cause a change in the state of the bit. It doesn't need a charge to do so.

More precisely, due to not being charged, neutrons cannot change the state of a circuit by themselves.
But within a certain energy range, they can be captured by a nucleus in a semiconductor, eventually provoking a nuclear fission with emission of charged particles (alpha, etc.) and gamma rays, which themselves may cause a change of electromagnetic state in a small-scale semiconductor.

2nd Aug 2019, 22:02

#1708 (permalink)

etudiant

Join Date: May 2011

Location: NEW YORK

Posts: 1,352

Likes: 0

Received 1 Like on 1 Post

If there has not been any incident related to this purported cosmic ray upset risk thus far in the 737 experience, it seems a very low risk indeed.
What it suggests is that the regulators are actually doing their job diligently, catching up after an extended period of business as usual.
Whether the MAX will survive the process remains to be seen. Imho it will probably become the ritual sacrifice burned to purify Boeing of its past sins.

2nd Aug 2019, 23:33

#1709 (permalink)

Bend alot

Join Date: Oct 2017

Location: Tent

Posts: 916

Likes: 3

Received 19 Likes on 12 Posts

"As the FAA re-evaluates and recertifies the updated flight-control systems, it has specifically rejected Boeing’s assumption that the plane’s pilots can be relied upon as the backstop safeguard in scenarios such as the uncommanded movement of the horizontal tail involved in both the Indonesian and Ethiopian crashes. That notion was ruled out by FAA pilots in June when, during testing of the effect of a glitch in the computer hardware, one out of three pilots in a simulation failed to save the aircraft."

If this is fact - it puts a very large spotlight on what training will be required as part of the re-certification of the MAX. (another shadow on the NG?)

It will also be interesting to know the training/experience levels of these pilots that passed/failed to "save the aircraft" in the simulator.

Were there only 3 simulations carried out? or 6/9?
As for the use of the word "pilots": is "crew" not the word that should be used?

2nd Aug 2019, 23:40

#1710 (permalink)

Speed of Sound

Join Date: Jul 2002

Location: Ireland

Posts: 596

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by B737MAX Documentary

Feynman found that this number was closer to 1 in 100.

The actual failure rate turned out to be 1 in 65!

3rd Aug 2019, 00:57

#1711 (permalink)

krismiler

Join Date: Jul 2010

Location: Asia

Posts: 1,534

Likes: 8

Received 49 Likes on 31 Posts

If the MAX gets back in the air, its pilots will have to know more about the trim system than anything else on the aircraft and will spend hours in the simulator practicing every conceivable NON NORMAL from a totally revised manual. This is assuming Boeing can modify the aircraft to a standard which satisfies several regulatory authorities.

After a few flights with the airline's CEO and families onboard, there will be a slow, phased reintroduction as aircraft are modified and crews trained. Passenger confidence will eventually return as long as there are no further incidents. Boeing will need to chop the MAX as soon as they can and switch future orders to an up-to-date design with modern safety features. Getting the MAX back into the air is a short-term solution to give Boeing cash flow and breathing space while they come up with a new aircraft; I can't see 5000 of them being produced. As soon as a new type is ready, the B737's days are over.

3rd Aug 2019, 01:20

#1712 (permalink)

Notanatp

Join Date: Jul 2019

Location: Mass

Posts: 23

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by Speed of Sound

The actual failure rate turned out to be 1 in 65!

Interestingly, none of the reporting has said just how likely the test scenario was to occur in flight, other than to say it was "esoteric," or "theoretical," or "extremely improbable."

The test simulated the effect of 5 bits being simultaneously flipped in the FCC's memory due to cosmic rays or some other unnamed cause. The 5 bits were independent. One bit was flipped to tell the FCC that MCAS was active when it wasn't (this disabled the yoke cut-off switches). Another bit told the FCC to incorrectly issue a nose-down trim command. The other 3 bits weren't described but apparently were necessary to create the runaway trim scenario.

Assuming a 5-bit event, the odds of these 5 specific bits being flipped depend on the size of the memory. If the FCC has 1 megabyte of RAM, then the odds of those particular 5 bits being flipped in a 5-bit event should be on the order of 10**-32. I don't know how frequently the FAA estimates a 5-bit event will occur on an aircraft with a 1-meg memory, but even if it approaches 1 (i.e., it can be expected to happen each flight), the likelihood of this particular 5-bit event occurring on any single flight is on the order of 10**-32. You could fly all the 737 Max's that will ever be made 3 times a day for a billion years, and you'd still be looking at probabilities on the order of 10**-16.
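
A quick back-of-envelope check of that arithmetic, as a minimal Python sketch. It assumes, as above, 1 MB of RAM, exactly one 5-bit upset per flight, and upsets landing uniformly at random anywhere in memory. The fleet size of 5,000 aircraft is a round illustrative number, not a figure from the FAA or Boeing.

```python
from math import comb

# Assumptions (illustrative only, not real FCC data)
RAM_BYTES = 1024 * 1024        # 1 MB of RAM, as supposed in the post
TOTAL_BITS = RAM_BYTES * 8
BITS_FLIPPED = 5               # size of the upset event

def p_specific_pattern(total_bits: int, k: int) -> float:
    """Chance that a k-bit upset, spread uniformly at random over
    total_bits, hits one particular set of k bits."""
    return 1 / comb(total_bits, k)

# Probability per flight, assuming one 5-bit upset on every flight
p_flight = p_specific_pattern(TOTAL_BITS, BITS_FLIPPED)

# Exposure of a hypothetical fleet flown for a billion years
FLEET_SIZE = 5000              # assumed round number
FLIGHTS_PER_DAY = 3
YEARS = 1e9
total_flights = FLEET_SIZE * FLIGHTS_PER_DAY * 365 * YEARS

expected_events = p_flight * total_flights

print(f"per-flight probability: {p_flight:.2e}")
print(f"expected events over the whole fleet: {expected_events:.2e}")
```

Under these assumptions the figures come out at roughly 3e-33 per flight and 1.6e-17 for the billion-year fleet, slightly below the rounded orders of magnitude quoted, which does not change the conclusion.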

As for the rest of the reporting, I cannot find it now but I've read one account of the test that said all three test pilots recovered the airplane if it was assumed they recognized the problem within 3 seconds (the time to respond to an autopilot runaway pitch trim). The FAA wasn't satisfied with that, so it ran additional tests where it allowed the failure to go longer before being recognized, and one of the three pilots either couldn't recover or didn't recover fast enough. My understanding is that it was on the basis of this additional test that it was assumed that only an exceptional pilot would have recovered, hence the catastrophic classification. Has anyone else seen this account, and if it is correct, then doesn't it sound like this is rigged against Boeing?

3rd Aug 2019, 01:22

#1713 (permalink)

Loose rivets

Psychophysiological entity

Join Date: Jun 2001

Location: Tweet Rob_Benham Famous author. Well, slightly famous.

Age: 84

Posts: 3,270

Likes: 11

Received 33 Likes on 16 Posts

Which makes being a customer an unattractive proposition.

3rd Aug 2019, 01:55

#1714 (permalink)

Loose rivets

Psychophysiological entity

Join Date: Jun 2001

Location: Tweet Rob_Benham Famous author. Well, slightly famous.

Age: 84

Posts: 3,270

Likes: 11

Received 33 Likes on 16 Posts

Quote:

Even the best news outlets may not have the best information and it is very hard to understand deep technical details of this system without direct access to the experts and their data.

It's been mentioned in a past thread how much work is needed to post perhaps only, say, 300 words in a readable and technically accurate form. The poster in mind must have spent many, many hours on input valued by this forum's contributors.

Even the choice of valid English words can be a challenge; that definitive Seattle Times summation used the word 'instrument' for what was obviously the AoA detector vane. While this is broadly correct, of course in aviation we are strongly biased to imagine an instrument as being a display device. I for one would find writing and proof-reading an entire documentary more demanding than I care to imagine. Add to this, information pouring in anew.

In the last months we've agonised over recorder details and not been totally sure about some of the readouts. The ANU electric trim input pulses being a case in point. Now, all this time later, there is the suggestion that some of the electronic processing may, just possibly, be inhibiting some vital inputs. Just imagine if this turns out to be true of the electronics in use at the time of the accidents. The pilot-competence v Boeing argument would become frighteningly weighted with a new bias. When findings radically move goal posts, it's easy to see how an entire chapter can be rendered obsolete.

3rd Aug 2019, 01:56

#1715 (permalink)

ARealTimTuffy

Join Date: Jun 2019

Location: Tdot

Posts: 48

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by Bend alot

According to the Seattle Times, there were 3 FAA pilots: 2 test pilots, typically ex-military, and one typically a former airline pilot. They tested 33 different scenarios. After feeling uncomfortable with the reaction time necessary during a test, they delayed the response a bit to allow a more realistic scenario. One of the pilots did not recover. It was the former airline pilot who had trouble, according to a source.

It is in the Seattle Times article.

One would hope that by identifying this now and rewriting most of the flight control software, these particular problems won’t occur anymore. Training could then be spent on more generic stabilizer cases and not the rare cases they are hopefully rectifying.

3rd Aug 2019, 04:06

#1716 (permalink)

GlobalNav

Join Date: Aug 2013

Location: Washington.

Age: 74

Posts: 1,077

Likes: 277

Received 151 Likes on 53 Posts

Quote:

Originally Posted by Mad (Flt) Scientist

I think some people are overestimating the difficulty of a software design change.

Having been part of a significant software redesign of a flight control system for a part 25 aircraft, which addressed a multitude of failure cases (including some we found in the course of the redesign and the associated design reviews) and which included some fundamental architectural changes, easily of greater scope than going from flip-flop alternating single input to dual inputs, and which took us from incident, through grounding, return to test flight, (re)certification and EIS inside a 12 month period, with frankly an order of magnitude less resources than Boeing can put on this task, I have to say that the timescales are more than achievable.

What appears (from the outside) to be delaying a return to flight status isn't the complexity of the task, frankly. It's FAA now going into complete CYA mode and every other decision during the MAX certification being dragged out and placed under a microscope. With the people looking through the microscope (who are not just the FAA, or even industry authorities, but every politician or journo sensing a news opportunity) sometimes having little conception of how the delegated/overseen certification process is supposed to work. (And has worked well for years)

From the standpoint of software functionality, I would not strongly disagree. But considering how this aspect of the airplane was certified, I can hardly imagine the certified software has the appropriate Design Assurance Level. I say this because the entire design of the system (including interfaces) demonstrates an underestimation of the hazard classification of malfunctions, particularly an invalid AoA input. A single AoA input with no comparisons with other sensors to validate the integrity of the input implies a low hazard classification - Minor or (less likely) Major, but certainly not Catastrophic as history has established. To bring the software up to DAL A from what was likely no more than C is NOT a trivial matter and amounts to starting from scratch to meet all the certification requirements.

Of course, considering the certification processes of the last several years, it would not surprise me if a way to avoid this complication by some sort of justification will be attempted by Boeing, in concert with FAA. The attention now given to this by other national/multi-national aviation authorities may prevent the FAA management from allowing such a shortcut, if indeed the FAA was tempted to try it.

3rd Aug 2019, 04:56

#1717 (permalink)

Water pilot

Join Date: Mar 2015

Location: Washington state

Posts: 209

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by mrdeux

Some years ago, I discovered a way to reliably, and repeatedly, make a 767, with autopilot engaged in VNAV, fly through the MCP altitude. I reported this to our tech people, who passed it along to Boeing. Very quickly I heard that they’d been able to replicate it in a system sim, and a red bulletin was soon issued. It was fixed in an update a few months later.

Fast forward ten years, and I was now flying the 747. An update came out, and lo and behold, the MCP bug had reappeared. Apparently the software had simply been modified to bypass the offending code, and a later update had removed the bypass.

The point is that the software fix itself was not permanent.

That is a very important point, and something that I have unfortunately noticed in navigation systems recently. A lot of software these days is done with temporary contractors, so there is no institutional memory about why the code is done the way that it is -- and in my experience many programmers (especially newer ones) do not bother to look through the change history of the code to understand it. In addition, many programmers (both new and old) don't bother to document their decisions properly, despite all of the automated nag systems, management reviews, etc. (You tend to get a lot of boilerplate.)

On one machine that I worked on, there was a nexus of software bugs that had been revolving around each other for about a decade. There were three symptoms, all moderately difficult to fix. In condition 1, the machine would (very rarely) lock up. In condition 2, a critical table in the system was (very rarely) corrupted. In condition 3, the transaction log (used for distributed processing) was corrupted. I don't remember the exact relationship but it was something like if you fixed 1 & 2 you got 3, and if you fixed three you got 2, and if you fixed 2 and 3 you got 1. I think I encountered condition 2, but checking the change log (if any programmers are here, take note) I noticed that my fix which I was proud of was actually identical to code that had been put in place years ago.

It ended up being a real bear to more or less fix, but this was in the dark ages and the two original programmers of the system were still with the company, and we were all able to put our heads together and understand what was going on. It was one of those cases where the actual fix probably would have been to completely redesign the system but that was not an option (sound familiar?). Nowadays, I think the newly minted programmer contractor would simply fix the condition that they were presented with, the next newly minted programmer contractor would fix the next condition, and (as you experienced) the end result would be bugs that pop in and out of the system at each release cycle. It would not surprise me all that much if all of the vaunted changes to MCAS get rolled back sometime in the future by newly minted programmers who never heard of Lion Air.

3rd Aug 2019, 06:21

#1718 (permalink)

BDAttitude

Join Date: Apr 2019

Location: EDSP

Posts: 334

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by Tomaski

The Seattle Times article made this important point:

When the first reports of this testing came out there was a tremendous amount of discussion and speculation about the FCC processors, how they got overwhelmed and the "fact" that the electric trim switches were slow to respond. It appears that discussion was all based on some reporter's misunderstanding, so before the current discussion goes too far afield it might be worth remembering this. Even the best news outlets may not have the best information, and it is very hard to understand deep technical details of this system without direct access to the experts and their data. Certainly frustrating as our understanding continues to evolve, but a good caution against becoming too wedded to any particular theory.

Unfortunately the information blobs still don't line up. So the fault scenario seems to be a bit flip. I'm not sure about the number five. I don't really believe they tested for five flips coinciding. Maybe a total of five scenarios involving a bit flip somewhere. Anyway, the AND trim output persists and does not seem to be stoppable without cutting off. This seems odd. One would expect that either the FCC power latches after detecting a memory parity error, taking a safe output state during power latch, or continues operating, overwriting the flipped bit with new information when the state changes, i.e. when manual electric trim is applied. Obviously neither happens, so this could very well be the point where the "overwhelmed", "sluggish" part somehow comes into play.
The Seattle Times articles by Mr. Gates are quality-wise way ahead of the products of other news outlets. However, after this one I do not have fewer questions than before.
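
To make the two expected behaviours concrete, here is a minimal Python sketch of a parity-protected memory word. It is purely illustrative: the class `ParityWord`, the exception `ParityError` and the value `SAFE_TRIM_COMMAND` are invented names, and nothing here is claimed to reflect how the real FCC stores or checks its data.

```python
class ParityError(Exception):
    """Raised when a stored word no longer matches its parity bit."""

def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") & 1

class ParityWord:
    """A 16-bit memory word stored together with one parity bit."""

    def __init__(self, value: int) -> None:
        self.write(value)

    def write(self, value: int) -> None:
        # A fresh write recomputes the parity, clearing any earlier upset.
        self.value = value & 0xFFFF
        self.check = parity(self.value)

    def read(self) -> int:
        if parity(self.value) != self.check:
            raise ParityError("stored word does not match its parity bit")
        return self.value

SAFE_TRIM_COMMAND = 0  # hypothetical "no trim" output used as the safe state

def trim_output(cell: ParityWord) -> int:
    """Return the stored command, or the safe state on a parity error."""
    try:
        return cell.read()
    except ParityError:
        return SAFE_TRIM_COMMAND

cell = ParityWord(0b1010)
cell.value ^= 0b0001           # simulate a single-bit upset
print(trim_output(cell))       # parity mismatch, so the safe state is returned

cell.write(7)                  # new information overwrites the corrupted word
cell.value ^= 0b0011           # simulate a two-bit upset
print(trim_output(cell))       # parity still matches, corruption goes unnoticed
```

The last case shows the known limit of a single parity bit: an even number of flips in the same word passes the check, which is one reason multi-bit scenarios are treated separately.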

3rd Aug 2019, 08:37

#1719 (permalink)

GordonR_Cape

Join Date: Dec 2015

Location: Cape Town, ZA

Age: 62

Posts: 424

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by Notanatp

The Seattle Times article explains the story quite well, although we may never have all of the details. The link may be paywalled, though most of the content was posted previously (Edit: by Zeffy on 1 Aug) on this thread: https://www.seattletimes.com/busines...ight-controls/

The starting point for the retrospective FAA analysis is not what could go wrong, but rather that we already know something did go terribly wrong, so zoom in on each of the possible failure modes that could lead to this specific condition. Once we start from that perspective, no error in the FCC is ever acceptable, no matter how improbable. Fixing software does not add weight to an aircraft, and does not impact on the fuel efficiency, unlike hardware where there are permissible tradeoffs. AFAIK if there is a known catastrophic software fault, it must be eliminated.

IMO your estimates of bit-flipping probability are backwards. Since we are a-priori interested only in these 5 bits, the size of RAM is irrelevant. For this evaluation, the probability of each bit being flipped is the only important parameter. These may or may not be independent, and this scenario is still extremely unlikely, but not to the extent that you calculate.
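
A minimal Python sketch of that alternative estimate. The per-bit flip probabilities are invented placeholders chosen only to show the shape of the calculation; no real upset rate for the FCC is known here, and the independence of the five bits is an assumption.

```python
def p_all_flip(p_bit: float, k: int = 5) -> float:
    """Probability that k specific bits all flip during one flight,
    assuming each flips independently with probability p_bit.
    The size of the RAM does not appear anywhere."""
    return p_bit ** k

# Hypothetical per-bit, per-flight flip probabilities (placeholders)
for p_bit in (1e-9, 1e-6, 1e-3):
    print(f"p_bit = {p_bit:.0e} -> all 5 bits flip: {p_all_flip(p_bit):.0e}")
```

If the flips are not independent, for example if a single event can disturb several neighbouring cells at once, the product rule no longer applies and the combined probability can be much closer to that of the single triggering event.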

AFAIK the 3 second and 1 second limits are not assumptions, but mandated by FAA regulations. Allowing the runaway to continue for more than 3 seconds seems an entirely rational test, knowing what we do about the recent crashes, and the difficulty of detecting runaway trim.

IMO this does not sound rigged at all, merely being tested properly, the way it should have been done. Boeing have not resisted, but seem to accept that this is the right way to fix the problem.

3rd Aug 2019, 08:40

#1720 (permalink)

yoganmahew

Join Date: Aug 2007

Location: Tullamore

Posts: 27

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by Water pilot

It was one of those cases where the actual fix probably would have been to completely redesign the system but that was not an option (sound familiar?). Nowadays, I think the newly minted programmer contractor would simply fix the condition that they were presented with, the next newly minted programmer contractor would fix the next condition, and (as you experienced) the end result would be bugs that pop in and out of the system at each release cycle. It would not surprise me all that much if all of the vaunted changes to MCAS get rolled back sometime in the future by newly minted programmers who never heard of Lion Air.

What a depressingly familiar story, Water pilot.

Add in a management that underestimates the limitations of software solutions (especially ones that need to ALWAYS make smart decisions) and way over-estimates the speed at which they *should* be implemented, and it begins to sound a bit like Boeing.