PPRuNe Forums - View Single Post - Ethiopian airliner down in Africa
Old 20th Apr 2019, 20:55
#4165
TryingToLearn
 
Join Date: Mar 2019
Location: Bavaria
Posts: 20
Originally Posted by MemberBerry
Just as there seems to be a shortage of pilots in the aviation world, I think there is generally a shortage of good software developers, and it's getting worse. I think the quality of software took a nosedive during the last decade. Software from a decade ago was far more polished than what I see today, and this is very frustrating.
Please don't make the mistake of putting your best programmers on functional-safety coding. Trust me, they will quit!

That's the whole point about safety: even if a brain-dead person, a Greenpeace airplane-hating terrorist or an ape had programmed the code, you would find out before the first passenger boards the plane. There is a complete description of the functionality within several layers of requirements, every requirement has its own validation criterion and test cases, and in the end you have 100% test coverage at software-module, software-system, system, item (flight control) and vehicle/airplane/machine/... level.
As sad as it seems, the MCAS software worked exactly as specified. There is no programmer to blame.

At the time of the initial safety analysis, someone defined 0.6° as the maximum impact this system is allowed to have.
At that time, someone wrote, or should have written, a requirement specifying that MCAS must never move the trim by more than this (together with a minimum interval in case of repeated activation).
Every final software build, every MCAS SW module, every configuration should have been checked against this requirement and its validation criterion by appropriate automated and documented test cases.
That's how safety works!
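To make this concrete, such a requirement-based test could look roughly like the sketch below. Everything in it (the names, the 100 Hz loop, the mcas_step() interface) is my invention for illustration, not Boeing's actual code: feed the module a stuck-high AoA for a minute and check that the accumulated trim never leaves the 0.6° budget.

Code:
/* Requirement-based test sketch. All identifiers and the mcas_step()
 * interface are hypothetical -- invented for this post, not real
 * flight code. Requirement under test: MCAS shall never command more
 * than 0.6 deg of trim in total, whatever the AoA input does. */
#include <assert.h>

#define MAX_MCAS_AUTHORITY_DEG 0.6    /* budget from the safety analysis */

double mcas_step(double aoa_deg, double dt_s);   /* module under test */

int main(void)
{
    double total_trim_deg = 0.0;

    /* Worst case from the accidents: AoA vane stuck at a huge value.
     * Run 60 s at 100 Hz and check the criterion every cycle. */
    for (int cycle = 0; cycle < 6000; ++cycle) {
        total_trim_deg += mcas_step(74.5, 0.01);
        assert(total_trim_deg <= MAX_MCAS_AUTHORITY_DEG + 1e-9);
    }
    return 0;
}

/* Toy stand-in so the sketch compiles and runs; the real test would
 * link against the actual MCAS module instead. */
double mcas_step(double aoa_deg, double dt_s)
{
    static double commanded = 0.0;
    if (aoa_deg > 15.0 && commanded < MAX_MCAS_AUTHORITY_DEG) {
        double inc = 0.06 * dt_s;              /* 0.6 deg over 10 s */
        if (commanded + inc > MAX_MCAS_AUTHORITY_DEG)
            inc = MAX_MCAS_AUTHORITY_DEG - commanded;
        commanded += inc;
        return inc;
    }
    return 0.0;
}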

Now, how did it happen? I don't know:
a) They knew, but management forced them -> safety culture problem, on purpose...
b) They never wrote the tests -> process problem; nobody should release anything for production without finishing all test runs.
c) They ignored the test results -> safety culture problem, process problem.
d) They changed the requirement after changing the SW but did not touch the safety analysis -> traceability problem; that's what ALM software systems are made for. Of course you have a problem if 99% of your requirements are blueprints from 1968... (-> grandfather rights, which should not apply to anything that is not 100% proven in use in exactly the old configuration).
e) The trim motor was supposed to turn slowly (0.6° over 10 s, i.e. 0.06°/s), but instead it turned four times faster -> item-level testing, hardware-in-the-loop testing? (see the sketch after this list)
f) They never wrote down their analysis assumptions as requirements -> fire the system requirements engineers, not the programmer. Ask yourself what your reviews are worth; are your reviewers just interested in the cookies/donuts?
etc.
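For e), an item-level check on the test bench could look like the sketch below (the toy plant model and the 0.06°/s figure derived from "0.6° in 10 s" are my illustrative assumptions). The point: if the delivered actuator runs four times faster than the analysis assumed, the discrepancy shows up on the rig and not in the air.

Code:
/* Item-level rate check sketch for hypothesis e). The "plant" is a toy
 * model standing in for the real trim actuator on a HIL rig; the rate
 * figures are illustrative assumptions, not Boeing's values. */
#include <math.h>
#include <stdio.h>

#define ASSUMED_RATE_DEG_PER_S 0.06     /* 0.6 deg over 10 s */

static double trim_deg = 0.0;

static void run_actuator_for(double seconds, double rate_deg_per_s)
{
    trim_deg += seconds * rate_deg_per_s;
}

int main(void)
{
    /* Suppose the delivered actuator actually runs four times faster
     * than the safety analysis assumed... */
    run_actuator_for(10.0, 4.0 * ASSUMED_RATE_DEG_PER_S);

    /* ...then the check against the analysis assumption fails here,
     * on the bench, instead of surprising everyone in flight. */
    if (fabs(trim_deg) > 10.0 * ASSUMED_RATE_DEG_PER_S + 1e-9)
        printf("FAIL: trim moved %.2f deg in 10 s, assumption allows 0.60\n",
               trim_deg);
    return 0;
}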

Safety programming is brain-dead translation of UML into code. There is absolutely no room for creativity or interpretation. The main job within safety is sitting (and thinking, not just physical presence) in reviews and questioning every single line somebody wrote as a requirement (on functional, system, SW-system or SW-module level), several times over. Never rely on the genius of a requirement author, programmer or test engineer. Everything goes through several reviews, assessments etc.
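For flavour, this is what that brain-dead translation looks like in practice: every clause of the requirement maps to exactly one line of code, nothing more, nothing less, and a reviewer can tick the clauses off one by one. The requirement ID, its wording and the threshold are all invented for illustration:

Code:
#include <stdbool.h>
#include <stdio.h>

/* SW-REQ-4711 (invented ID and wording): "If AoA exceeds AOA_LIMIT_DEG
 * while the flaps are up and the autopilot is disengaged, MCAS
 * activation shall be requested." */
#define AOA_LIMIT_DEG 15.0              /* illustrative value */

typedef struct {
    double aoa_deg;
    bool   flaps_up;
    bool   ap_engaged;
} inputs_t;

static bool mcas_activation_requested(const inputs_t *in)
{
    return in->aoa_deg > AOA_LIMIT_DEG  /* SW-REQ-4711, clause 1 */
        && in->flaps_up                 /* SW-REQ-4711, clause 2 */
        && !in->ap_engaged;             /* SW-REQ-4711, clause 3 */
}

int main(void)
{
    inputs_t in = { 20.0, true, false };
    printf("%d\n", mcas_activation_requested(&in));   /* prints 1 */
    return 0;
}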

The second problem, and the one that worries me the most, is the use of just one input. There are two sensors; use them! Relying on only one probe with very low diagnostic coverage is just bad. Safety-critical systems should be single-point-fault tolerant. But this is also a technical system requirement, and such a decision is made six months before coding. Nobody questioned this?
If this was, as claimed earlier in the thread, a commercial decision to avoid training on a simple diagnostic message, then the safety culture went down the drain, flushed away by commercial interest (since it was also claimed that the safety level was estimated high enough to require redundancy). Such a finding would put a question mark over every difference between the NG and the MAX.
The very sad thing is that this AoA sensor comparison seems to be implemented and working, but sold as an extra. Maybe Boeing just wanted to earn extra money, but on the other hand there is this strange coincidence: the feature compromises the sales argument ('no training') on one hand and rescues airplanes on the other ('if you buy the sensor comparison for peanuts, it's your fault if you need to train your pilots'). -> Before talking about 'better' US/European pilots, better check whether they simply had the option installed and were, unsurprisingly, not overwhelmed by a simple message instead of a spinning trim wheel.
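Such a cross-check is no rocket science either. A minimal sketch (the disagreement threshold and all names are my illustration, not Boeing's implementation): compare both vanes before anything downstream acts on the value, and on disagreement inhibit MCAS and give the crew a plain message.

Code:
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

#define AOA_DISAGREE_LIMIT_DEG 10.0     /* illustrative threshold */

/* Vote the two vanes; on disagreement the consumer (MCAS) inhibits
 * itself instead of trusting a single probe. */
static bool aoa_valid(double left_deg, double right_deg, double *voted_deg)
{
    if (fabs(left_deg - right_deg) > AOA_DISAGREE_LIMIT_DEG)
        return false;                   /* annunciate and inhibit */
    *voted_deg = 0.5 * (left_deg + right_deg);
    return true;
}

int main(void)
{
    double aoa;
    /* Lion Air-style failure: one vane stuck ~20 deg high. */
    if (!aoa_valid(24.5, 4.5, &aoa))
        printf("AOA DISAGREE - MCAS inhibited\n");
    return 0;
}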

Safety engineers are not very popular within companies, because this process takes time and often delays a development considerably if done as required. There is no room for agile programming, Scrum etc.; it's the V-model lifecycle at its best.
Guess what Boeing didn't have during the MAX development?

Oh, and one fun fact about European rail safety: signals are a fail-safe system. In case something goes wrong, all signals turn red and every train stops. Then there are operators who can manually override signals after making sure (by phone) that only one train is on a given track at a time. Fine...
Still, the automatic system and collision avoidance have availability requirements; they are required to work most of the time. The reason is simple: humans are error-prone, and if the signals were operated manually for more than a very limited time, the average error probability would be too high.
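"Fail-safe" is equally simple in code: the permissive state requires positive proof that everything is fine, and any doubt collapses to red. A toy sketch (all names invented):

Code:
#include <stdbool.h>
#include <stdio.h>

typedef enum { ASPECT_RED, ASPECT_GREEN } aspect_t;

/* GREEN needs positive evidence; any fault, timeout or failed
 * self-test falls back to the safe state. */
static aspect_t signal_aspect(bool block_proven_clear, bool self_test_ok)
{
    return (block_proven_clear && self_test_ok) ? ASPECT_GREEN : ASPECT_RED;
}

int main(void)
{
    /* A failed self-test can never show a permissive aspect. */
    printf("%s\n",
           signal_aspect(true, false) == ASPECT_RED ? "RED" : "GREEN");
    return 0;
}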
An autopilot is far more reliable than a pilot; pilots make more mistakes (that's why there are two of them). Still, the pilot has to be trained for every situation, including manual flying. Maybe (non-pilot assumption) there is just not enough costly simulator training? Why would you risk doing the training with passengers on board?
Oh, I forgot, simulator time is expensive...