PPRuNe Forums - View Single Post - MAX’s Return Delayed by FAA Reevaluation of 737 Safety Procedures
Old 6th Aug 2019, 23:44
  #1806
fergusd
 
Join Date: Jan 2008
Location: Wintermute
Posts: 76
Originally Posted by Notanatp
Interestingly, none of the reporting has said just how likely the test scenario was to occur in flight, other than to say it was "esoteric," or "theoretical," or "extremely improbable."

The test simulated the effect of 5 bits being simultaneously flipped in the FCC's memory due to cosmic rays or some other unnamed cause. The 5 bits were independent. One bit was flipped to tell the FCC that MCAS was active when it wasn't (this disabled the yoke cut-off switches). Another bit told the FCC to incorrectly issue a nose-down trim command. The other 3 bits weren't described but apparently were necessary to create the runaway trim scenario.

Assuming a 5-bit event, the odds of these 5 specific bits being flipped depend on the size of the memory. If the FCC has 1 megabyte of RAM, then the odds of those particular 5 bits being flipped in a 5-bit event should be on the order of 10**-32. I don't know how frequently the FAA estimates a 5-bit event will occur on an aircraft with a 1-meg memory, but even if it approaches 1 (i.e., it can be expected to happen each flight), the likelihood of this particular 5-bit event occurring on any single flight is on the order of 10**-32. You could fly all the 737 MAXs that will ever be made 3 times a day for a billion years, and you'd still be looking at probabilities on the order of 10**-16.
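As an aside, the quoted order of magnitude is broadly right on its own assumptions. A rough back-of-envelope check, treating 1 MB as 8,388,608 bits and a "5-bit event" as a uniformly random choice of 5 distinct bits (the snippet and figures are mine, not from any of the reporting):

Code:
/* Rough sanity check only: assumes 1 MB of RAM (8,388,608 bits) and a
 * "5-bit event" that flips a uniformly random set of 5 distinct bits. */
#include <stdio.h>

int main(void)
{
    const double n = 8.0 * 1024.0 * 1024.0;   /* bits in 1 MB */

    /* Number of distinct 5-bit subsets: C(n,5) = n(n-1)(n-2)(n-3)(n-4)/5! */
    const double subsets = n * (n - 1.0) * (n - 2.0)
                             * (n - 3.0) * (n - 4.0) / 120.0;

    printf("distinct 5-bit patterns : %.2e\n", subsets);        /* ~3.5e32  */
    printf("P(one specific pattern) : %.2e\n", 1.0 / subsets);  /* ~2.9e-33 */
    return 0;
}

So call it somewhere around 10**-32 to 10**-33 per event, in line with the figure above.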
Arithmetic aside, no competent implementor of safety-critical software (or hardware) goes without both hardware AND software protection against memory corruption, where hardware protection is available (sometimes it is not, depending on your hardware limitations). Unless the hardware and software on this aircraft are being audited against grandfathered safety standards from the 1960s, the failure you describe must be one that would be deemed unacceptable . . .

Bit-level corruption in any part of memory would be detected and the corrupt data not acted upon; the action taken when corruption is detected is defined (largely) by the system requirements and the safety rating, e.g. whether the system is fail safe or fail functional. There are many well-understood mechanisms for performing this function to varying levels of integrity, because not all safety cases require the most computationally and physically expensive solution.
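A minimal sketch of one of the cheaper patterns, purely illustrative (the names are mine and this is not any real FCC design): each safety-critical value is stored alongside its bitwise complement, and nothing acts on it unless the pair still agrees.

Code:
/* Illustrative sketch only: names are mine, this is not any real FCC design.
 * One common low-cost pattern - store each safety-critical value together
 * with its bitwise complement and refuse to act on it if they disagree. */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t value;       /* the protected datum, e.g. a mode flag */
    uint32_t complement;  /* always written as ~value              */
} protected_u32_t;

static void protected_write(protected_u32_t *p, uint32_t v)
{
    p->value = v;
    p->complement = ~v;
}

/* Copies the value out only if the pair is still consistent; on a
 * mismatch the caller must take its defined fail-safe action instead. */
static bool protected_read(const protected_u32_t *p, uint32_t *out)
{
    if (p->value != (uint32_t)~p->complement) {
        return false;     /* corruption detected: do not act on the data */
    }
    *out = p->value;
    return true;
}

What happens on a mismatch - reset the channel, fall back to a default, annunciate a fault - is exactly the "defined by the system requirements" part.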

No competent implementor of safety-critical software will create software in which critical data is co-located in the manner described. That kind of design would be explicitly illegal under the design and coding rules, and those rules would be strictly enforced, usually both by independent manual review of the software and by automated static analysis. Given that, the likelihood of an external stimulus causing this kind of issue must be well outside the statistical range in which it is in any way likely, ever, to happen (not that that would stop the protection being applied under ALARP, and bearing in mind that memory corruption happens for many reasons which are not as 'sci fi' as high-energy particle bit flips - you guys, really, back to reality eh ?).
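To illustrate what "not co-located" can mean in practice (GCC/Clang section attributes, hypothetical section and variable names, nothing taken from any real FCC data layout): independent critical items are forced into distinct linker sections, which the linker script then maps to physically separate RAM, so no single localised upset can touch them all.

Code:
/* Illustrative only: hypothetical names, not a real FCC layout.
 * Each independent safety-critical item lives in its own linker section;
 * the build maps .crit_region_a and .crit_region_b to physically
 * separate RAM, so one localised corruption event cannot hit both. */
#include <stdint.h>

__attribute__((section(".crit_region_a")))
volatile uint32_t mcas_active_flag;      /* also stored with its complement */

__attribute__((section(".crit_region_b")))
volatile int32_t  commanded_trim_units;  /* range-checked before every use  */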

All of the safety-critical software work I have done assumes that data corruption (by whatever means) is possible and must be protected against: in all types of memory, within the processor control registers themselves, within off-chip peripheral devices, everything. In the type of software my team creates there is no 'human' to stop the system from killing people. Perhaps my expectations of the quality the aviation industry claims to enforce are very misplaced . . . from where I'm sitting today it looks fairly shoddy and second-rate at best . . .
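For a flavour of what "everything" means for an off-chip peripheral, here is a sketch with a made-up register address and function name (not any real device or driver): write the control register, read it back, and treat a mismatch as corruption rather than pressing on.

Code:
/* Sketch only: MY_PERIPH_CTRL and fail_safe_shutdown() are made up for
 * illustration, not a real device or driver. */
#include <stdint.h>
#include <stdbool.h>

#define MY_PERIPH_CTRL (*(volatile uint32_t *)0x40001000u)  /* hypothetical */

extern void fail_safe_shutdown(void);   /* system-defined fail-safe action */

static bool periph_write_checked(uint32_t ctrl_value)
{
    MY_PERIPH_CTRL = ctrl_value;

    /* Read back and confirm the device holds what was commanded;
     * a mismatch is treated as corruption, not ignored. */
    if (MY_PERIPH_CTRL != ctrl_value) {
        fail_safe_shutdown();
        return false;
    }
    return true;
}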

The FAA should turn Boeing's safety and design records over to a genuinely independent review body, one that is not part of the aviation cabal, and see what they think . . . Clearly the FAA are not capable of this kind of work, and clearly Boeing do not do it (well enough) to be allowed to self-certify.

When the Chinook FADEC software was handed over for independent analysis, the company doing the analysis thought they had been supplied with the wrong software and stopped analysing it; the quality was that unacceptable. The same happened to Toyota, and . . . and . . . and . . .

I'd put a shiny tenner on the table betting that this is exactly what would happen in this case; I might even stretch to a crisp twenty. But that will never happen - money is more important than people's lives.

Lastly, the ways in which complex software fails are often very, very subtle and complex, and, with the greatest respect, way, way, way beyond anything the masses on here are even vaguely capable of conceptualising, from what I can see.