PPRuNe Forums - U.K. NATS Systems Failure

15th Mar 2024, 11:49

#404

CBSITCB

Join Date: Mar 2016

Location: Location: Location

Posts: 59

There are many errors, obfuscations, and contradictions in this Interim Report. I hope some of the comments in this whole thread get back to the panel so they can address them in the final report, and not sweep them under the carpet. I am only concerned with the technical aspects relating to the failure.

1 – The very first substantive sentence in the report shows a lack of understanding of the technical ‘system’. “The cause of the failure of the NERL flight plan processing system (FPRSA-R)”. The flight plan processing system is the NAS FPPS – not FPRSA-R.

2 – “Critical exception errors being generated which caused each system to place itself into maintenance mode”. Is there really a documented and intentional FPRSA-R Maintenance Mode? Or is it just a euphemism for “it crashed” (ie, encountered a situation it had not been programmed for and executed some random code or a catch-all “WTF just happened?” dead stop).

Such euphemisms are not uncommon. The NAS FPPS has (or at least did have) a fancy-sounding documented state called Functional Lapse of the Operational Program, or FLOP. Of course, we operational engineers just said it had crashed. More recently there is SpaceX’s “Rapid Unscheduled Disassembly”.

3 – If there was an intentional Maintenance State, why on earth did the system allow both processors to deliberately enter that state at the same time? Even so, since the situation was foreseeable, there should have been a documented procedure to recover from it.

4 – IFPS adds supplementary waypoints; please explain why. Presumably, inter alia, it is to identify boundary crossing points. If so, why does FPRSA-R not identify the added waypoint as an exit point, if that is what it was inserted for?

5 – “Recognising this as being not credible, a critical exception error was generated, and the primary FPRSA-R system, as it is designed to do, disconnected itself from NAS and placed itself into maintenance mode”. So it recognised the problem, and was designed to react to it, yet didn’t output a message such as “Non-credible route for FPXXXX at DVL”. Pull the other one – it crashed!

6 – “Processed flight data is presented to NAS four hours in advance of the data being required” and “The repeated cycle that occurred each time a connection was re-established between the AMS-UK and FPRSA-R ended with the assistance of system supplier, Frequentis, four hours after the event.”

So the Frequentis guys fixed the problem at the exact time the FP was due to be activated – what a coincidence! Could it be that the AMS-UK system recognized that the errant FP was now history (ie stale) and purged it from its Pending Queue without human intervention?

7 – At para 2.22 the report states Frequentis fixed the problem. In the timeline it states it was Comsoft that fixed the problem.

8 – “Adherence to escalation protocols meant that the assistance of Frequentis was not sought for more than four hours after the initial failure.” But you’ve already stated that Frequentis fixed the problem at the four-hour mark??? And they would have needed time to diagnose the problem.

9 – At elapsed time 00:27: “Level 1 engineer attempts reboot FPRSA-R software”. Attempts? Presumably successfully, as the report says it continues to fail each time it receives the errant FP. But the report also says a Level 2 engineer is needed to do a restart. What is a reboot if it’s not a restart?
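To put points 3 and 5 in concrete terms, here is a toy Python sketch of the two designs: one that halts on the first exception, and one that rejects the single offending flight plan with a message and carries on. It is purely illustrative and assumes a great deal. The FlightPlan class, the repeated-waypoint check standing in for whatever FPRSA-R really tests, the function names and the made-up routes are all hypothetical, and none of it reflects the actual FPRSA-R code.

```python
from dataclasses import dataclass


@dataclass
class FlightPlan:
    callsign: str
    route: list[str]  # waypoint names in the order flown


class NonCredibleRoute(Exception):
    """Raised when a single flight plan's route cannot be converted."""


def convert(fp: FlightPlan) -> str:
    # Hypothetical stand-in for the real route check: treat a waypoint
    # name that appears twice in one route as non-credible.
    seen = set()
    for wp in fp.route:
        if wp in seen:
            raise NonCredibleRoute(f"Non-credible route for {fp.callsign} at {wp}")
        seen.add(wp)
    return fp.callsign + " " + " ".join(fp.route)


def process_halting(queue):
    # Halt-on-error: the exception escapes and nothing after the bad plan is processed.
    return [convert(fp) for fp in queue]


def process_quarantining(queue):
    # Reject the bad plan with a message, then carry on with the rest.
    converted, rejected = [], []
    for fp in queue:
        try:
            converted.append(convert(fp))
        except NonCredibleRoute as err:
            rejected.append(str(err))  # e.g. passed to an operator for manual handling
    return converted, rejected


def run_pair(process, queue):
    # Primary and standby run identical code on identical input.
    status = []
    for name in ("primary", "standby"):
        try:
            process(queue)
            status.append(f"{name}: ok")
        except NonCredibleRoute:
            status.append(f"{name}: maintenance mode")
    return status


queue = [
    FlightPlan("FPAAAA", ["LAM", "BPK"]),
    FlightPlan("FPXXXX", ["DVL", "KOK", "DVL"]),
    FlightPlan("FPBBBB", ["DTY"]),
]

print(run_pair(process_halting, queue))
# ['primary: maintenance mode', 'standby: maintenance mode']
print(process_quarantining(queue))
# (['FPAAAA LAM BPK', 'FPBBBB DTY'], ['Non-credible route for FPXXXX at DVL'])
```

Note the common-mode effect in the halting version: feed identical software identical data and the standby falls over on exactly the same message as the primary, so duplication buys nothing against this class of fault.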
