
MAX’s Return Delayed by FAA Reevaluation of 737 Safety Procedures

Rumours & News Reporting Points that may affect our jobs or lives as professional pilots. Also, items that may be of interest to professional pilots.

Old 5th Aug 2019, 22:23
  #1781 (permalink)  
 
Join Date: Jan 2008
Location: Medically Grounded
Posts: 95
Back in ancient times I designed memory systems to be resistant to bit flipping involving cosmic rays. It was in conjunction with a computer that used a 286 processor, the same one in use in the suspect flight systems. Our solution was to use error detecting and correcting memory. This architecture used extra memory bits that would allow any single bit error in a memory word to be corrected on the fly. The technology was mature at the time the flight control computers were developed. Does anyone know if memory correction technology was used on the Boeing flight control computers? If so it would rule out random bit flips as an error condition.
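For anyone who has not met ECC before, here is a minimal sketch of the single-error-correction idea, using a textbook Hamming(7,4) code in Python. It is an illustration only: the function names are made up for this example, and nothing here is taken from the actual FCC hardware, which would protect a much wider word and do it in hardware.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit i holds position i+1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    # Codeword positions: 1=p1, 2=p2, 3=d0, 4=p4, 5=d1, 6=d2, 7=d3
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(codeword: int) -> Tuple[int, int]:
    """Return (corrected nibble, syndrome); syndrome 0 means no error seen."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # and equals the position of the flipped bit after a single upset.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # correct the single flipped bit
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    upset = stored ^ (1 << 4)  # simulate one flipped bit (position 5)
    print(hamming74_decode(upset))
```

Flipping any one of the seven stored bits still returns the original nibble. Two flips in the same word defeat this plain code (it "corrects" to the wrong value), which is why practical ECC memory adds an extra parity bit so that double errors are at least detected.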
Piper_Driver is offline  
Old 5th Aug 2019, 23:17
  #1782 (permalink)  
 
Join Date: May 2008
Location: denmark
Posts: 1
Originally Posted by Piper_Driver View Post
Our solution was to use error detecting and correcting memory. This architecture used extra memory bits that would allow any single bit error in a memory word to be corrected on the fly.
…. Does anyone know if memory correction technology was used on the Boeing flight control computers? If so it would rule out random bit flips as an error condition.
No, this only protects against bit flips in memory; there is also a risk that the CPU registers get corrupted.
For this you need two ordinary CPUs, or a lock-step CPU like the TMS570.
A lock-step CPU has error correction on memory, and two CPUs running as one.
The second CPU runs the same machine code instructions as the first one, but one clock cycle delayed.
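A toy model of the lock-step idea, in Python, assuming a made-up two-instruction "CPU" with a single 16-bit accumulator. The class and function names are hypothetical, the one-cycle delay is not modelled, and this is not how any particular part implements it; it only shows that a comparator on the two cores' outputs catches a register upset that memory ECC would never see.

```python
class Core:
    """A trivial core: one 16-bit accumulator, two instructions."""

    def __init__(self):
        self.acc = 0

    def step(self, instr):
        op, operand = instr
        if op == "add":
            self.acc = (self.acc + operand) & 0xFFFF
        elif op == "xor":
            self.acc ^= operand
        else:
            raise ValueError("unknown instruction: %r" % (op,))
        return self.acc


def run_lockstep(program, fault_at=None, fault_bit=0):
    """Run the same program on two cores and compare results every step.

    If fault_at is given, one bit of the primary core's register is flipped
    just before that step to simulate an upset.
    """
    primary, shadow = Core(), Core()
    for i, instr in enumerate(program):
        if i == fault_at:
            primary.acc ^= 1 << fault_bit  # injected register bit flip
        if primary.step(instr) != shadow.step(instr):
            return ("mismatch", i)  # comparator trips: raise an error signal
    return ("ok", primary.acc)


if __name__ == "__main__":
    program = [("add", 5), ("xor", 3), ("add", 10)]
    print(run_lockstep(program))
    print(run_lockstep(program, fault_at=1, fault_bit=2))
```

Note that the comparator only detects the disagreement; it cannot say which core is right, so the usual response is to flag the fault and drop to a safe state.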
HighWind is offline  
Old 5th Aug 2019, 23:24
  #1783 (permalink)  
 
Join Date: Dec 2015
Location: Cape Town, ZA
Age: 58
Posts: 415
Originally Posted by Piper_Driver View Post
Back in ancient times I designed memory systems to be resistant to bit flipping involving cosmic rays. It was in conjunction with a computer that used a 286 processor, the same one in use in the suspect flight systems. Our solution was to use error detecting and correcting memory. This architecture used extra memory bits that would allow any single bit error in a memory word to be corrected on the fly. The technology was mature at the time the flight control computers were developed. Does anyone know if memory correction technology was used on the Boeing flight control computers? If so it would rule out random bit flips as an error condition.
Please read the link quoted above, it gives a lot of useful details:
Originally Posted by BDAttitude View Post
http://www.cs.toronto.edu/~bianca/pa...gmetrics09.pdf
For those interested in memory corruption.
Specifically:
There are error correction schemes that can detect and correct single bit flips (and schemes for multiple bit errors).
DRAM errors come in many types, some correctable, others not, and no scheme is foolproof.
In some cases there may be permanently stuck bits (on or off).
There is a substantial correlation between errors, such that they are neither random nor independent.
Cosmic rays are only one potential cause, simple ageing of hardware is also a factor.

Given these facts, it seems unreasonable to rely on statistical improbability, to excuse a single point of failure, with catastrophic consequences.

All of the news reports indicate that Boeing have accepted the FAA ruling on this. IMO further argument is futile, though more details might be enlightening.
GordonR_Cape is offline  
Old 6th Aug 2019, 00:27
  #1784 (permalink)  
 
Join Date: May 2011
Location: NEW YORK
Posts: 567
Before everyone gets side tracked on fixing hugely unlikely problems such as 5 bits flipping exactly wrong, should one not focus on the elephants in the room, such as the trim wheel that is too hard to turn and too slow to act, or the prospect of sensor failure/misinstallation/miswiring etc.?
I don't believe there is any indication that either crash was due to a flight computer upset, so it is irrelevant whether the flight computer will be improved until the rest of the system has been brought up to snuff.
Yet there is thus far no substantial public comment on the proposed fixes of the lethal defects demonstrated to exist.
If Boeing wants to ensure the plane does not return to service, they are going about it the right way.
etudiant is offline  
Old 6th Aug 2019, 02:48
  #1785 (permalink)  
 
Join Date: Mar 2015
Location: antipodies
Posts: 62
as far as i am concerned the "elephant in the room" has been so long obscured that it has been forgotten about ... that is the state of alarms and displays when computer gives up. why is no one talking about this? why is this not regulated? this matter underlies so many major accidents both A and B and yet it is not being addressed. there NEEDS to be a STANDARD display and alarm state for computer disconnect . it is ridiculous to expect pilots to troubleshoot IT systems before they can start on aviate navigate communicate. how is it possible that aircraft are allowed to dump control without silencing every alarm that might possibly be spurious and blanking every display that may be incorrect. what is going on !
phylosocopter is online now  
Old 6th Aug 2019, 02:55
  #1786 (permalink)  
 
Join Date: Jun 2019
Location: VA
Posts: 210
Interesting technical discussion as to how the faults happen but as an operator I'm more interested in what I'm likely to see if the FCC starts going sideways and what I can do about it.

Okay first stipulating that the recent test failure had nothing to do with what happened to the accident aircraft, it does bring up an interesting issue. The test simulated a runaway trim with the A/P engaged. The first sign of this problem would likely be a "Stab Out of Trim" light illuminating on the forward instrument panel. This light indicates that the elevator deflection has exceeded a certain amount because the stabilizer is not properly trimmed. The structure of the non-normal checklist seems to assume that the problem may be that the A/P is not trimming sufficiently for the airspeed changes and NOT that there is a runaway trim. Specifically, if the trim wheel is moving the checklist basically says do nothing because it assumes that the A/P is catching up with the trim changes. Otherwise if the trim is not moving, then disengage the A/P and trim with the yoke switches. This action assumes the A/P was not trimming when it should have, like a change in speed, turn, etc., rather than trimming when it shouldn't have as in a runaway trim.

Thus the current procedures actually lead down the path of not intervening if the trim wheel is moving with the A/P engaged and a "Stab Out of Trim" light illuminated which would actually delay the pilot response to a runaway trim. At some point the pilot would have to make the determination that it is actually a runaway OR the A/P will eventually disconnect with a gross out of trim condition. I could see how this could create a potentially hazardous condition that would actually be harder to recover from than a runaway with the A/P off. Perhaps this NNC now needs to be reevaluated.
Tomaski is offline  
Old 6th Aug 2019, 02:57
  #1787 (permalink)  
 
Join Date: Jun 2019
Location: VA
Posts: 210
Originally Posted by phylosocopter View Post
as far as i am concerned the "elephant in the room" has been so long obscured that it has been forgotten about ... that is the state of alarms and displays when computer gives up. why is no one talking about this? why is this not regulated? this matter underlies so many major accidents both A and B and yet it is not being addressed. there NEEDS to be a STANDARD display and alarm state for computer disconnect . it is ridiculous to expect pilots to troubleshoot IT systems before they can start on aviate navigate communicate. how is it possible that aircraft are allowed to dump control without silencing every alarm that might possibly be spurious and blanking every display that may be incorrect. what is going on !
^^^^^^^^^^^^
Absolutely needs to be addressed. Nothing more annoying than a false alert that cannot be silenced.
Tomaski is offline  
Old 6th Aug 2019, 07:48
  #1788 (permalink)  
 
Join Date: Dec 2002
Location: UK
Posts: 1,892
#1786, “At some point the pilot would have to make the determination that it is actually a runaway…”
This is central to the concern about piloting contributions in alleviating malfunctions.
How is that point determined? How long does it take, how severe is the malfunction during that period, and by what means is the failure detected: explicit alerting, deduction, knowledge, … experience?
Training does not guarantee that the required activity will be recalled or actioned, particularly with rare and surprising situations.
It is difficult to change the human condition; better, then, to eliminate the extreme situations to which pilots could be exposed.
safetypee is offline  
Old 6th Aug 2019, 08:40
  #1789 (permalink)  
 
Join Date: Jul 2019
Location: Berlin
Posts: 6
This might be a stupid question, but anyway: as far as I understand, the two FCCs of any 737 in service do not monitor each other. Boeing now wants to change this so that both FCCs are constantly checking each other. If so, won't they have twice the workload, and won't they need to be checked/serviced twice as often as before (twice the running time)?

There are so many more aspects which would scare me, just because they now need to change a proven system. And changing things always carries the danger of new faults.
Seamless is offline  
Old 6th Aug 2019, 08:45
  #1790 (permalink)  
 
Join Date: Aug 2017
Location: London
Posts: 86
Originally Posted by phylosocopter View Post
as far as i am concerned the "elephant in the room" has been so long obscured that it has been forgotten about ... that is the state of alarms and displays when computer gives up. why is no one talking about this? why is this not regulated? this matter underlies so many major accidents both A and B and yet it is not being addressed. there NEEDS to be a STANDARD display and alarm state for computer disconnect . it is ridiculous to expect pilots to troubleshoot IT systems before they can start on aviate navigate communicate. how is it possible that aircraft are allowed to dump control without silencing every alarm that might possibly be spurious and blanking every display that may be incorrect. what is going on !
You have hit a particularly important nail right on the head, Sir. Ever seen a stage magician at work? Everything they do is misdirection. They can fool people right up close to them, or even an entire audience of hundreds. Humans will always fall for it. We are terrible at noticing the gorilla in the room.

With ET and Lion Air, the misdirection was the alarms and stick shaker, while MCAS sneakily ran the trim down in the background. If the computers had dumped control with a sensible, calm "AOA disagree" message, I think there's a good chance those crews would have saved the aircraft.

This does not absolve Boeing in any way of responsibility for the MCAS (and other systems) design and implementation shambles. But it is a very good point that phylosocopter makes, and one that urgently needs addressing.
PerPurumTonantes is offline  
Old 6th Aug 2019, 12:01
  #1791 (permalink)  
 
Join Date: Jun 1999
Location: Queensland
Posts: 383
Is anybody actually considering the full outcome of the MAX never returning to service?
autoflight is offline  
Old 6th Aug 2019, 12:09
  #1792 (permalink)  

Keeping Danny in Sandwiches
 
Join Date: May 1999
Location: UK
Age: 71
Posts: 1,274
Originally Posted by autoflight View Post
Is anybody actually considering the full outcome of the MAX never returning to service?
Airbus perhaps?
sky9 is offline  
Old 6th Aug 2019, 12:29
  #1793 (permalink)  
 
Join Date: Mar 2006
Location: Vance, Belgium
Age: 57
Posts: 165
Originally Posted by etudiant View Post
Before everyone gets side tracked on fixing hugely unlikely problems such as 5 bits flipping exactly wrong,
If you re-read the Seattle Times article, you'll notice that only 2 bit flips are required to enact the scenario.
Five bits were actually flipped because this is the standard procedure for this category of tests: flipping what is considered the most extreme and improbable data corruption, 5 bits simultaneously.
Luc Lion is offline  
Old 6th Aug 2019, 12:29
  #1794 (permalink)  
 
Join Date: Jun 2008
Location: Cambridge UK
Posts: 148
Originally Posted by GordonR_Cape View Post
AFAIK the dual FCC redesign on the MAX, would revert to the NG (grandfather) scenario with the trim wheels. What the fix should do, is stop any computer generated runaway trim (whether from MCAS or autopilot).
As SLF and retired s/w engineer I find the discussion of high-reliability computer systems interesting, but feel that in this case it is misplaced.
MCAS's use of trim to provide feel may have "seemed like a good idea at the time", a cheap and cheerful sticking plaster. But once you consider the costs of implementing it to certifiable levels of safety, it's simply a blind alley.

If you don't use trim to solve a feel requirement, then MCAS-triggered trim runaway is impossible. Use a stick-pusher type mechanism and surely inappropriate MCAS activation becomes a minor embarrassment. (All the crews seem to have managed the situation for several minutes, if all they had to do was pull back strongly on the stick...)

Regards, Peter

PS Of course this would still leave a few issues that need attention, such as:
- Due process for any criminal/professional derelictions of duty.
- Reducing the myriad of warnings triggered by a failing AoA probe, or at least simplifying their handling.
- Telling the pilots about MCAS (and providing at least minimally adequate training).
- Fix the non-functioning AoA-disagree warning (and ensure that maintenance protocols would actually test that it works?).
- Discover why MAX AoA probes seem to be failing so frequently (and if this high rate invalidates any probability-based safety assessments).
- Fix the emergency-trim system so it can be operated in emergencies.

PPS If you want to argue that the MAX needs high-reliability computers to run any software handling of the stab, surely this argument also applies to other variants of the 737.
Peter H is offline  
Old 6th Aug 2019, 12:57
  #1795 (permalink)  
 
Join Date: May 2010
Location: Boston
Age: 68
Posts: 430
Originally Posted by Luc Lion View Post
If you re-read the Seattle Times article, you'll notice that only 2 bit flips are required to enact the scenario.
Five bits were actually flipped because this is the standard procedure for this category of tests: flipping what is considered the most extreme and improbable data corruption, 5 bits simultaneously.
Also, before getting deep into the probabilities of N bits flipping, it seems to me that the '5 bit flip' test is also a rational stand-in for "something happens we have not thought about".

In other words, it is impossible to contemplate, model and test every failure path, but it is still vital to know the outcome of failures no matter what the triggering event.
There could be any number of paths that result in the Seattle Times scenario, most of which would be hard or impossible to predict in advance.

---
On the dual FCC fix:
Reading between the lines I sense that this might be implemented with simple "both say move or nothing happens" logic, either internal or could even be done with external HW (relay or other) close to the trim motor inputs.
This would be a lot less work than truly coupling the 2 in a high reliability cross checking mode.
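To make the "both say move or nothing happens" idea concrete, here is a minimal Python sketch of that gating logic. It is purely hypothetical: the names and the three-valued command encoding are invented for illustration, and nothing is known here about how Boeing will actually implement the fix.

```python
# Hypothetical command encoding for one trim channel.
NOSE_DOWN, NONE, NOSE_UP = -1, 0, 1


def gated_trim_command(fcc_a: int, fcc_b: int) -> int:
    """Pass a trim command to the motor only when both FCCs agree on it."""
    if fcc_a == fcc_b:
        return fcc_a
    return NONE  # disagreement: the motor gets no command


if __name__ == "__main__":
    # One FCC is corrupted and demands nose-down trim; the other is healthy.
    print(gated_trim_command(NOSE_DOWN, NONE))
    # Both agree, so the command goes through.
    print(gated_trim_command(NOSE_UP, NOSE_UP))
```

The trade-off in a scheme like this is availability: a fault in either channel can also block a legitimate automatic trim command, so the crew would need to be told that automatic trim has dropped out.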


MurphyWasRight is online now  
Old 6th Aug 2019, 13:33
  #1796 (permalink)  
 
Join Date: Jul 2019
Location: Mass
Posts: 21
Originally Posted by Luc Lion View Post
If you re-read the Seattle Times article, you'll notice that only 2 bit flips are required to enact the scenario.
Five bits were actually flipped because this is the standard procedure for this category of tests: flipping what is considered the most extreme and improbable data corruption, 5 bits simultaneously.
That's not what the article says:

"For these simulations, the five bits flipped were chosen in light of the two deadly crashes to create the worst possible combinations of failures to test if the pilots could cope."

"In one scenario, the bits chosen first told the computer that MCAS was engaged when it wasn’t. This had the effect of disabling the cut-off switches inside the pilot-control column, . . . A second bit was chosen to make the horizontal tail, also known as the stabilizer, swivel upward uncommanded by the pilot, which has the effect of pitching the plane’s nose down. Other bits were flipped to add three more complications."

So the article clearly says that all five bits were specifically selected to create the simulated scenario. The three bits were not selected at random.

What is the basis for your statement that flipping five bits "is the standard procedure for this category of tests"?

As for MurphyWasRight's statement that the five-bit failure was a "rational stand-in for 'something happens we have not thought about'," my understanding is that the regulatory issue is that you cannot have a single failure that is catastrophic, and the FAA considered five simultaneous bit flips a single failure. If the only way to create the Seattle Times-reported scenario is to fail multiple components, then is it still a single failure?
Notanatp is offline  
Old 6th Aug 2019, 13:48
  #1797 (permalink)  
 
Join Date: Feb 2015
Location: The woods
Posts: 1
Originally Posted by Peter H View Post
As SLF and retired s/w engineer I find the discussion of high-reliability computer systems interesting, but feel that in this case it is misplaced.
MCAS's use of trim to provide feel may have "seemed like a good idea at the time", a cheap and cheerful sticking plaster. But once you consider the costs of implementing it to certifiable levels of safety, it's simply a blind alley.

If you don't use trim to solve a feel requirement, then MCAS-triggered trim runaway is impossible. Use a stick-pusher type mechanism and surely inappropriate MCAS activation becomes a minor embarrassment. (All the crews seem to have managed the situation for several minutes, if all they had to do was pull back strongly on the stick...)

Regards, Peter

PS Of course this would still leave a few issues that need attention, such as:
- Due process for any criminal/professional derelictions of duty.
- Reducing the myriad of warnings triggered by a failing AoA probe, or at least simplifying their handling.
- Telling the pilots about MCAS (and providing at least minimally adequate training).
- Fix the non-functioning AoA-disagree warning (and ensure that maintenance protocols would actually test that it works?).
- Discover why MAX AoA probes seem to be failing so frequently (and if this high rate invalidates any probability-based safety assessments).
- Fix the emergency-trim system so it can be operated in emergencies.

PPS If you want to argue that the MAX needs high-reliability computers to run any software handling of the stab, surely this argument also applies to other variants of the 737.
Been said many times on these threads Peter. The logic of a separate dedicated feel trim compensator is indisputable.
But the software specialists are keeping their claws in deep.
Way back, British Airways helped certify the first European Cat III system. It took many actual approaches and a lot of flight hours until the reliability was proved (and incidentally held many folk up behind their min speed approaches) and the whole thing ran on relays.
Surprising what can/could be achieved without software - even in this day and age...
bill fly is offline  
Old 6th Aug 2019, 15:49
  #1798 (permalink)  
 
Join Date: Aug 2013
Location: Washington.
Age: 69
Posts: 494
Originally Posted by bill fly View Post


Been said many times on these threads Peter. The logic of a separate dedicated feel trim compensator is indisputable.
But the software specialists are keeping their claws in deep.
Way back, British Airways helped certify the first European Cat III system. It took many actual approaches and a lot of flight hours until the reliability was proved (and incidentally held many folk up behind their min speed approaches) and the whole thing ran on relays.
Surprising what can/could be achieved without software - even in this day and age...
Bottom line? MCAS could have been designed with satisfactory architecture, integrity and reliability. It was not. A CAT I approach system cannot be made CAT III capable, certainly not by a few software changes or even the addition of a CPU or sensor. It has to be designed for the task from the ground up, with qualified subsystems, redundancies, and appropriate design assurance levels of software and complex hardware. I am not implying that MCAS is equivalent to a CAT III autoland system, but neither will MCAS become what it must be by a few tweaks.
GlobalNav is offline  
Old 6th Aug 2019, 18:03
  #1799 (permalink)  
Thread Starter
 
Join Date: Apr 2015
Location: Under the radar, over the rainbow
Posts: 483
Originally Posted by Notanatp View Post
As for MurphyWasRight's statement that the five-bit failure was a "rational stand-in for 'something happens we have not thought about'," my understanding is that the regulatory issue is that you cannot have a single failure that is catastrophic, and the FAA considered five simultaneous bit flips a single failure. If the only way to create the Seattle Times-reported scenario is to fail multiple components, then is it still a single failure?
Well, it's fairly likely that five simultaneous bit flips would have a single cause (cosmic rays/EMP), so yes, it should be considered a single failure. Pretty rare (at least), of course.

OldnGrounded is offline  
Old 6th Aug 2019, 18:45
  #1800 (permalink)  
 
Join Date: May 2008
Location: denmark
Posts: 1
Originally Posted by Peter H View Post
If you don't use trim to solve a feel requirement, then MCAS-triggered trim runaway is impossible.
PPS If you want to argue that the MAX needs high-reliability computers to run any software handling of the stab, surely this argument also applies to other variants of the 737.
If you remove the code for MCAS then it can't generate a runaway.
But the Flight Control System has been connected to the trim for decades, and a bit flip in other functions of the Flight Control Computers might be able to generate a 'continuous' runaway. (Way easier to diagnose quickly than an intermittent runaway.)
I see MCAS as a change that moved the reliability of the FCS from being several orders of magnitude better than specified by DAL C to about the limit of DAL C.

You can't complain when a DAL C design changes from e.g. DAL B to DAL C reliability following a design change. (The same could have happened due to e.g. a die shrink.)
Using one AoA sensor might be sufficient for DAL C, but not for the required DAL A.
HighWind is offline  
