MAX’s Return Delayed by FAA Reevaluation of 737 Safety Procedures
Thread Starter
Join Date: Apr 2015
Location: Under the radar, over the rainbow
Posts: 788
What appears (from the outside) to be delaying a return to flight status isn't the complexity of the task, frankly. It's the FAA now going into complete CYA mode, and every other decision made during the MAX certification being dragged out and placed under a microscope.
And the people looking through the microscope (who are not just the FAA, or even industry authorities, but every politician or journo sensing a news opportunity) sometimes have little conception of how the delegated/overseen certification process is supposed to work (and has worked well for years).
Last edited by OldnGrounded; 2nd Aug 2019 at 15:54. Reason: Typo.
Join Date: Jul 2019
Location: London
Posts: 3
Thanks for this, thcrozier. It's an interesting connection to make. My understanding is that NASA officials said the chance of failure of the shuttle was about 1 in 100,000; Feynman found that this number was closer to 1 in 100.
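To see why the gap between those two figures matters, here is a minimal sketch that assumes every flight is an independent trial with the same per-flight loss probability p (a deliberate simplification, not how NASA or Feynman modelled it). The chance of at least one loss over n flights is then 1 - (1 - p)^n; the 100-flight horizon is an arbitrary choice for illustration.

```python
def p_at_least_one_loss(p_per_flight: float, flights: int) -> float:
    """Chance of at least one loss over `flights` independent flights,
    each with the same per-flight loss probability."""
    return 1.0 - (1.0 - p_per_flight) ** flights


# Per-flight figures as quoted above; 100 flights is an arbitrary horizon.
for label, p in (("1 in 100,000", 1 / 100_000), ("1 in 100", 1 / 100)):
    print(f"{label}: {p_at_least_one_loss(p, 100):.4f} over 100 flights")
```

Under these assumptions the two figures give roughly a 0.1% versus a 63% chance of at least one loss over 100 flights.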
Join Date: Jun 2019
Location: VA
Posts: 210
The Seattle Times article made this important point:
When the first reports of this testing came out there was a tremendous amount of discussion and speculation about the FCC processors, how they got overwhelmed, and the "fact" that the electric trim switches were slow to respond. It appears that discussion was all based on some reporters' misunderstanding, so before the current discussion goes too far afield it might be worth remembering this. Even the best news outlets may not have the best information and it is very hard to understand deep technical details of this system without direct access to the experts and their data. It is certainly frustrating as our understanding continues to evolve, but it is a good caution against becoming too wedded to any particular theory.
“While it’s a theoretical failure mode that has never been known to occur, we cannot prove it can’t happen,” he said. “So we have to account for it in the design.”
He added that early published accounts of the fault suggesting that the microprocessor had been overwhelmed and its data-processing speed slowed, causing the pilot-control column thumb switches that move the stabilizer to respond slowly, were inaccurate.
Lemme said he was happy to learn this because those accounts hadn’t made sense technically.
Join Date: Jan 2008
Location: Weedon, UK
Age: 77
Posts: 125
"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled."
Something which Boeing might do well to note today.
Join Date: Jun 2019
Location: VA
Posts: 210
If a neutron makes a direct impact in the right spot, its kinetic energy gets deposited and can cause a change in the state of the bit. It doesn't need a charge to do so.
Join Date: Mar 2019
Location: French Alps
Posts: 326
But within a certain energy range, they can be captured by a nucleus in a semiconductor, provoking a nuclear reaction that emits charged particles (alpha, etc.) and gamma rays, which in turn may change the state of a small-scale semiconductor device.
If there has not been any incident related to this purported cosmic ray upset risk thus far in the 737 experience, it seems a very low risk indeed.
What it suggests is that the regulators are actually doing their job diligently, catching up after an extended period of business as usual.
Whether the MAX will survive the process remains to be seen. Imho it will probably become the ritual sacrifice burned to purify Boeing of its past sins.
"As the FAA re-evaluates and recertifies the updated flight-control systems, it has specifically rejected Boeing’s assumption that the plane’s pilots can be relied upon as the backstop safeguard in scenarios such as the uncommanded movement of the horizontal tail involved in both the Indonesian and Ethiopian crashes. That notion was ruled out by FAA pilots in June when, during testing of the effect of a glitch in the computer hardware, one out of three pilots in a simulation failed to save the aircraft."
If this is fact - it puts a very large spotlight on what training will be required as part of the re-certification of the MAX. (another shadow on the NG?)
It will also be interesting to know the training/experience levels of these pilots that passed/failed to "save the aircraft" in the simulator.
Were there only 3 simulations carried out? or 6/9?
On the use of the word "pilots": is "crew" not the word that should be used?
If the MAX gets back in the air, its pilots will have to know more about the trim system than anything else on the aircraft and will spend hours in the simulator practicing every conceivable NON NORMAL from a totally revised manual. This assumes Boeing can modify the aircraft to a standard that satisfies several regulatory authorities.
After a few flights with the airline CEOs and their families on board, there will be a slow, phased reintroduction as aircraft are modified and crews trained. Passenger confidence will eventually return as long as there are no further incidents. Boeing will need to chop the MAX as soon as they can and switch future orders to an up-to-date design with modern safety features. Getting the MAX back into the air is a short-term solution to give Boeing cash flow and breathing space while they come up with a new aircraft; I can't see 5000 of them being produced. As soon as a new type is ready, the B737's days are over.
Join Date: Jul 2019
Location: Mass
Posts: 23
Interestingly, none of the reporting has said just how likely the test scenario was to occur in flight, other than to say it was "esoteric," or "theoretical," or "extremely improbable."
The test simulated the effect of 5 bits being simultaneously flipped in the FCC's memory due to cosmic rays or some other unnamed cause. The 5 bits were independent. One bit was flipped to tell the FCC that MCAS was active when it wasn't (this disabled the yoke cut-off switches). Another bit told the FCC to incorrectly issue a nose-down trim command. The other 3 bits weren't described but apparently were necessary to create the runaway trim scenario.
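A toy illustration of how one flipped bit can turn into a false indication, assuming such states are held as packed flags in a status word. The flag names, bit positions and the cut-out rule below are hypothetical and only mirror the description above; nothing here reflects the actual FCC software, and the three undescribed bits are not modelled.

```python
# Hypothetical flag layout, invented for illustration; not the real FCC format.
MCAS_ACTIVE = 1 << 0          # "MCAS is active" flag
NOSE_DOWN_TRIM_CMD = 1 << 1   # "command nose-down trim" flag


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by toggling one bit of a status word."""
    return word ^ (1 << bit)


def yoke_cutout_available(status: int) -> bool:
    """Per the post, a false 'MCAS active' indication disabled the yoke
    cut-off switches; this toy rule models only that dependency."""
    return not (status & MCAS_ACTIVE)


healthy = 0                                  # nothing active, no trim command
upset = flip_bit(flip_bit(healthy, 0), 1)    # two independent flips

print("MCAS active:", bool(upset & MCAS_ACTIVE))
print("nose-down trim commanded:", bool(upset & NOSE_DOWN_TRIM_CMD))
print("yoke cut-off available:", yoke_cutout_available(upset))
```

After the two flips the word reports MCAS active and a nose-down command, and the toy rule reports the cut-off as unavailable, even though nothing "real" changed.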
Assuming a 5-bit event, the odds of these 5 specific bits being flipped depend on the size of the memory. If the FCC has 1 megabyte of RAM, then the odds of those particular 5 bits being flipped in a 5-bit event should be on the order of 10**-32. I don't know how frequently the FAA estimates a 5-bit event will occur on an aircraft with a 1-meg memory, but even if it approaches 1 (i.e., it can be expected to happen each flight), the likelihood of this particular 5-bit event occurring on any single flight is on the order of 10**-32. You could fly all the 737 MAXs that will ever be made 3 times a day for a billion years, and you'd still be looking at probabilities on the order of 10**-16.
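The arithmetic in the post can be reproduced as a quick sketch, under its own assumptions: "1 megabyte" taken as 2**20 bytes, every 5-bit upset equally likely to land on any 5 bits, and one such upset on every flight. The fleet size of 5000 aircraft is an assumption borrowed from the figure mentioned elsewhere in the thread.

```python
from math import comb

# Assumptions: 1 MB = 2**20 bytes, and a 5-bit upset is equally likely to
# hit any 5 of those bits (uniform, exactly 5 flips per event).
BITS = 8 * 2**20

# Probability that one 5-bit event hits exactly the 5 bits of interest.
p_specific = 1 / comb(BITS, 5)

# Hypothetical fleet exposure: 5000 aircraft, 3 flights a day, a billion years.
flights = 5000 * 3 * 365 * 10**9

# Worst case from the post: one 5-bit event on every flight. The union bound
# (flights * p) caps the chance of the specific event ever happening.
p_fleet = min(1.0, flights * p_specific)

print(f"per event: {p_specific:.1e}")
print(f"fleet bound: {p_fleet:.1e}")
```

With these inputs the per-event figure comes out near 3 × 10^-33 and the fleet bound near 1.6 × 10^-17, roughly in line with the post's orders of magnitude. This only reproduces the arithmetic; whether a uniform, memory-wide model is the right one is a separate question.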
As for the rest of the reporting, I cannot find it now, but I've read one account of the test that said all three test pilots recovered the airplane if it was assumed they recognized the problem within 3 seconds (the time to respond to an autopilot runaway pitch trim). The FAA wasn't satisfied with that, so it ran additional tests where it allowed the failure to go on longer before being recognized, and one of the three pilots either couldn't recover or didn't recover fast enough. My understanding is that it was on the basis of this additional test that it was assumed only an exceptional pilot would have recovered, hence the catastrophic classification. Has anyone else seen this account, and if it is correct, doesn't it sound like this is rigged against Boeing?
Psychophysiological entity
Even the best news outlets may not have the best information and it is very hard to understand deep technical details of this system without direct access to the experts and their data.
Even the choice of valid English words can be a challenge; that definitive Seattle Times summation used the word 'instrument' for what was obviously the AoA detector vane. While this is broadly correct, in aviation we are of course strongly biased to imagine an instrument as being a display device. I for one would find writing and proof-reading an entire documentary more demanding than I care to imagine. Add to this the new information that keeps pouring in.
In the last few months we've agonised over recorder details and not been totally sure about some of the readouts, the ANU electric trim input pulses being a case in point. Now, all this time later, there is the suggestion that some of the electronic processing may, just possibly, be inhibiting some vital inputs. Just imagine if this turns out to be true of the electronics in use at the time of the accidents: the pilot-competence v Boeing argument would become frighteningly weighted with a new bias. When findings radically move the goal posts, it's easy to see how an entire chapter can be rendered obsolete.
Join Date: Jun 2019
Location: Tdot
Posts: 48
"As the FAA re-evaluates and recertifies the updated flight-control systems, it has specifically rejected Boeing’s assumption that the plane’s pilots can be relied upon as the backstop safeguard in scenarios such as the uncommanded movement of the horizontal tail involved in both the Indonesian and Ethiopian crashes. That notion was ruled out by FAA pilots in June when, during testing of the effect of a glitch in the computer hardware, one out of three pilots in a simulation failed to save the aircraft."
If this is fact - it puts a very large spotlight on what training will be required as part of the re-certification of the MAX. (another shadow on the NG?)
It will also be interesting to know the training/experience levels of these pilots that passed/failed to "save the aircraft" in the simulator.
Were there only 3 simulations carried out? or 6/9?
On the use of the word "pilots": is "crew" not the word that should be used?
It is in the Seattle Times article.
One would hope that, by identifying this now and rewriting most of the flight control software, these particular problems won’t occur anymore. Training could then be spent on more generic stabilizer cases rather than the rare cases they are hopefully rectifying.
I think some people are overestimating the difficulty of a software design change.
Having been part of a significant software redesign of a flight control system for a part 25 aircraft, which addressed a multitude of failure cases (including some we found in the course of the redesign and the associated design reviews) and which included some fundamental architectural changes, easily of greater scope than going from flip-flop alternating single input to dual inputs, and which took us from incident, through grounding, return to test flight, (re)certification and EIS inside a 12 month period, with frankly an order of magnitude less resources than Boeing can put on this task, I have to say that the timescales are more than achievable.
What appears (from the outside) to be delaying a return to flight status isn't the complexity of the task, frankly. It's the FAA now going into complete CYA mode, and every other decision made during the MAX certification being dragged out and placed under a microscope. And the people looking through the microscope (who are not just the FAA, or even industry authorities, but every politician or journo sensing a news opportunity) sometimes have little conception of how the delegated/overseen certification process is supposed to work (and has worked well for years).
Of course, considering the certification processes of the last several years, it would not surprise me if Boeing, in concert with the FAA, attempted to avoid this complication through some sort of justification. The attention now given to this by other national/multinational aviation authorities may prevent FAA management from allowing such a shortcut, if indeed the FAA was tempted to try it.
Join Date: Mar 2015
Location: Washington state
Posts: 209
Some years ago, I discovered a way to reliably, and repeatedly, make a 767, with autopilot engaged in VNAV, fly through the MCP altitude. I reported this to our tech people, who passed it along to Boeing. Very quickly I heard that they’d been able to replicate it in a system sim, and a red bulletin was soon issued. It was fixed in an update a few months later.
Fast forward ten years, and I was now flying the 747. An update came out, and lo and behold, the MCP bug had reappeared. Apparently the software had simply been modified to bypass the offending code, and a later update had removed the bypass.
The point is that the software fix itself was not permanent.
On one machine that I worked on, there was a nexus of software bugs that had been revolving around each other for about a decade. There were three symptoms, all moderately difficult to fix. In condition 1, the machine would (very rarely) lock up. In condition 2, a critical table in the system was (very rarely) corrupted. In condition 3, the transaction log (used for distributed processing) was corrupted. I don't remember the exact relationship, but it was something like: if you fixed 1 and 2 you got 3, if you fixed 3 you got 2, and if you fixed 2 and 3 you got 1. I think I encountered condition 2, but on checking the change log (if any programmers are here, take note) I noticed that my fix, which I was proud of, was actually identical to code that had been put in place years earlier.
It ended up being a real bear to more or less fix, but this was in the dark ages and the two original programmers of the system were still with the company, so we were all able to put our heads together and understand what was going on. It was one of those cases where the actual fix probably would have been to completely redesign the system, but that was not an option (sound familiar?). Nowadays, I think the newly minted contract programmer would simply fix the condition they were presented with, the next newly minted contract programmer would fix the next condition, and (as you experienced) the end result would be bugs that pop in and out of the system at each release cycle. It would not surprise me all that much if all of the vaunted changes to MCAS got rolled back sometime in the future by newly minted programmers who never heard of Lion Air.
Join Date: Apr 2019
Location: EDSP
Posts: 334
The Seattle Times article made this important point:
When the first reports of this testing came out there was a tremendous amount of discussion and speculation about the FCC processors, how they got overwhelmed, and the "fact" that the electric trim switches were slow to respond. It appears that discussion was all based on some reporters' misunderstanding, so before the current discussion goes too far afield it might be worth remembering this. Even the best news outlets may not have the best information and it is very hard to understand deep technical details of this system without direct access to the experts and their data. It is certainly frustrating as our understanding continues to evolve, but it is a good caution against becoming too wedded to any particular theory.
The Seattle Times articles by Mr. Gates are, quality-wise, way ahead of the products of other news outlets. However, after this one I do not have fewer questions than before.
Join Date: Dec 2015
Location: Cape Town, ZA
Age: 62
Posts: 424
Interestingly, none of the reporting has said just how likely the test scenario was to occur in flight, other than to say it was "esoteric," or "theoretical," or "extremely improbable."
The test simulated the effect of 5 bits being simultaneously flipped in the FCC's memory due to cosmic rays or some other unnamed cause. The 5 bits were independent. One bit was flipped to tell the FCC that MCAS was active when it wasn't (this disabled the yoke cut-off switches). Another bit told the FCC to incorrectly issue a nose-down trim command. The other 3 bits weren't described but apparently were necessary to create the runaway trim scenario.
Assuming a 5-bit event, the odds of these 5 specific bits being flipped depend on the size of the memory. If the FCC has 1 megabyte of RAM, then the odds of those particular 5 bits being flipped in a 5-bit event should be on the order of 10**-32. I don't know how frequently the FAA estimates a 5-bit event will occur on an aircraft with a 1-meg memory, but even if it approaches 1 (i.e., it can be expected to happen each flight), the likelihood of this particular 5-bit event occurring on any single flight is on the order of 10**-32. You could fly all the 737 MAXs that will ever be made 3 times a day for a billion years, and you'd still be looking at probabilities on the order of 10**-16.
As for the rest of the reporting, I cannot find it now, but I've read one account of the test that said all three test pilots recovered the airplane if it was assumed they recognized the problem within 3 seconds (the time to respond to an autopilot runaway pitch trim). The FAA wasn't satisfied with that, so it ran additional tests where it allowed the failure to go on longer before being recognized, and one of the three pilots either couldn't recover or didn't recover fast enough. My understanding is that it was on the basis of this additional test that it was assumed only an exceptional pilot would have recovered, hence the catastrophic classification. Has anyone else seen this account, and if it is correct, doesn't it sound like this is rigged against Boeing?
The starting point for the retrospective FAA analysis is not what could go wrong, but rather that we already know something did go terribly wrong, so we zoom in on each of the possible failure modes that could lead to this specific condition. Once we start from that perspective, no error in the FCC is ever acceptable, no matter how improbable. Fixing software does not add weight to an aircraft and does not affect fuel efficiency, unlike hardware, where there are permissible trade-offs. AFAIK, if there is a known catastrophic software fault, it must be eliminated.
IMO your estimates of bit-flipping probability are backwards. Since we are a priori interested only in these 5 bits, the size of RAM is irrelevant. For this evaluation, the probability of each bit being flipped is the only important parameter. These flips may or may not be independent, and this scenario is still extremely unlikely, but not to the extent that you calculate.
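The alternative framing in this reply can be sketched too: if each of the five targeted bits flips independently with some per-flight probability p_bit, the joint probability is p_bit^5 and the RAM size never enters. The per-bit rates below are placeholders, not measured or published figures, and if the flips are not independent (for example, one particle strike upsetting several neighbouring cells) a simple product is not valid.

```python
def p_specific_bits_flipped(p_bit: float, n_bits: int = 5) -> float:
    """Joint probability that n specific bits all flip in the same flight,
    assuming each flips independently with per-flight probability p_bit.
    Total memory size does not appear anywhere in this model."""
    return p_bit ** n_bits


# Placeholder per-bit upset rates; illustrative only, not measured values.
for p_bit in (1e-6, 1e-8, 1e-10):
    print(f"p_bit={p_bit:.0e}  joint={p_specific_bits_flipped(p_bit):.1e}")
```

The result is dominated entirely by the assumed per-bit rate, which is the parameter this reply identifies as the one that matters.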
AFAIK the 3-second and 1-second limits are not assumptions but are mandated by FAA regulations. Allowing the runaway to continue for more than 3 seconds seems an entirely rational test, given what we know about the recent crashes and the difficulty of detecting runaway trim.
IMO this does not sound rigged at all, merely being tested properly, the way it should have been done. Boeing have not resisted, but seem to accept that this is the right way to fix the problem.
Join Date: Aug 2007
Location: Tullamore
Posts: 27
It was one of those cases where the actual fix probably would have been to completely redesign the system, but that was not an option (sound familiar?). Nowadays, I think the newly minted contract programmer would simply fix the condition they were presented with, the next newly minted contract programmer would fix the next condition, and (as you experienced) the end result would be bugs that pop in and out of the system at each release cycle. It would not surprise me all that much if all of the vaunted changes to MCAS got rolled back sometime in the future by newly minted programmers who never heard of Lion Air.
Add in a management that underestimates the limitations of software solutions (especially ones that need to ALWAYS make smart decisions) and way over-estimates the speed at which they *should* be implemented, and it begins to sound a bit like Boeing.