U.K. NATS Systems Failure
Join Date: Oct 2004
Location: Southern England
Posts: 486
It seems that internationally there is no requirement for waypoints to be uniquely named, if I've understood https://aviation.stackexchange.com/q...supposed-to-be correctly.
What can possibly go wrong? It's not like it is an unknown issue.
Join Date: Oct 2004
Location: Southern England
Posts: 486
This answer (aka 'excuse') has been repeated so often, ie 'testing's soooo complicated, we couldn't possibly capture every permutation' (which actually just means not processing an incorrectly formatted message). Really??
'At what point do you stop?' Er, when you've built one fallback, and preferably two, if, as you say, this is really a mission-critical system.
Thanks - but in my, admittedly simplistic, world "I've found a bad thing" should lead automatically to a rejection, no?
Ecce Homo! Loquitur...
Thread Starter
I don't see why this is such a problem in software terms.
Assuming the database holds all the waypoints, including duplicates, it would seem simple to have the software select the one closest to the next waypoint in the flight plan. If the duplicates that exist are all so far apart, I can’t see a flight plan where there won’t be multiple other waypoints between the two instances.
They must do something similar to identify which of the duplicates to use even when the identifier only appears once in a flight plan.
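The proximity rule suggested above can be sketched in Python. This is a minimal illustration, assuming each identifier maps to a list of candidate positions and that a neighbouring waypoint has already been resolved unambiguously; the identifier `DUPLX`, its coordinates and the function names are hypothetical and not taken from any real navigation database or NATS system.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def great_circle_nm(a: LatLon, b: LatLon) -> float:
    """Haversine great-circle distance between two points, in nautical miles."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))


def resolve_waypoint(ident: str, neighbour: LatLon,
                     database: Dict[str, List[LatLon]]) -> LatLon:
    """Pick the position for ident that lies closest to an already-resolved neighbour."""
    candidates = database[ident]  # KeyError if the identifier is unknown
    return min(candidates, key=lambda pos: great_circle_nm(pos, neighbour))


# Hypothetical database: "DUPLX" is a made-up identifier defined in two places.
DATABASE: Dict[str, List[LatLon]] = {"DUPLX": [(50.0, 0.0), (47.0, -100.0)]}

print(resolve_waypoint("DUPLX", (50.5, 0.5), DATABASE))  # (50.0, 0.0)
```

The weak point is the assumption baked into the `neighbour` argument: if the neighbouring point is itself ambiguous, resolution has to be done iteratively along the route, and a nearest-match rule gives no signal when it guesses wrong.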
Join Date: Aug 2003
Location: FR
Posts: 234
https://publicapps.caa.co.uk/modalap...etail&id=12321
CAP2582: NERL Major Incident Preliminary Report
https://publicapps.caa.co.uk/docs/33...y%20Report.pdf
Join Date: Oct 2004
Location: Southern England
Posts: 486
The problem is the real world isn't simplistic. FPRSA-R will encounter bad things in its processing. Some of those it will fix automatically; for others it will understand why they're bad, flag them for the attention of a human operator and carry on processing everything else. There may be many different cases of that, all of which will have to be programmed. Where the system knows something is bad but not why, it usually means the designer of the software didn't envisage that specific event happening. The problem with systems of this type is that when that happens the safest thing to do is to stop everything, so the global exception event will normally do that.
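The three cases described above (fix it, flag it and carry on, or stop everything) can be sketched as a toy processor. This assumes a flight plan is a dict with `callsign` and `route` keys; the rules, `process_plan`, `process_all` and `CriticalException` are hypothetical and not how FPRSA-R is actually written.

```python
from typing import Dict, List, Tuple


class CriticalException(Exception):
    """Hypothetical last-resort error: the whole processor stops."""


def process_plan(plan: Dict, operator_queue: List[str]) -> str:
    route = plan.get("route")
    if not route:
        # Known to be bad and known why: flag for a human, carry on with the rest.
        operator_queue.append(plan.get("callsign", "UNKNOWN"))
        return "flagged"
    if any(point != point.upper() for point in route):
        # Known to be bad and automatically fixable (toy rule: lower-case identifiers).
        plan["route"] = [point.upper() for point in route]
        return "fixed"
    return "ok"


def process_all(plans: List[Dict]) -> Tuple[List[str], List[str]]:
    operator_queue: List[str] = []
    results: List[str] = []
    for plan in plans:
        try:
            results.append(process_plan(plan, operator_queue))
        except Exception as exc:
            # Known to be bad but not why: no rule anticipated this input,
            # so the safest default is to stop everything.
            raise CriticalException(f"unanticipated input: {plan!r}") from exc
    return results, operator_queue
```

A route entry that is not a string fails the `.upper()` call, which no rule anticipated, so `process_all` escalates it to `CriticalException` and processes nothing further.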
Join Date: Oct 2004
Location: Southern England
Posts: 486
So it was not a failure that depended on a huge amount of variable/dynamic data to happen. The exception would have occurred in isolation, even in otherwise empty airspace. It seems that an extra check could be carried out before the problematic data is allowed into the "live" real-time system.
Join Date: Oct 2004
Location: Southern England
Posts: 486
Reading between the lines in Section 5, it looks as though they thought that whatever happened to FPRSA they would be able to fix it before that failure became a business issue. They were obviously wrong, but sadly exactly why isn't covered in this report and is left for later.
Join Date: Nov 2004
Location: EGCC
Age: 56
Posts: 69
Doubling-back?
"Once the ADEXP file had been received, the FPRSA-R software commenced searching for the UK airspace entry point in the waypoint information per the ADEXP flight plan, commencing at the first line of that waypoint data. FPRSA-R was able to specifically identify the character string as it appeared in the ADEXP flight plan text. Having correctly identified the entry point, the software moved on to search for the exit point from UK airspace in the waypoint data. Having completed those steps, FPRSA-R then searches the ICAO4444 section of the ADEXP file. It initially searches from the beginning of that data, to find the identified UK airspace entry point. This was successfully found. Next, it searches backwards, from the end of that section, to find the UK airspace exit point. This did not appear in that section of the flight plan so the search was unsuccessful. As there is no requirement for a flight plan to contain an exit waypoint from a Flight Information Region (FIR) or a country’s airspace, the software is designed to cope with this scenario. Therefore, where there is no UK exit point explicitly included, the software logic utilises the waypoints as detailed in the ADEXP file to search for the next nearest point beyond the UK exit point. This was also not present. The software therefore moved on to the next waypoint. This search was successful as a duplicate identifier appeared in the flight plan. Having found an entry and exit point, with the latter being the duplicate and therefore geographically incorrect, the software could not extract a valid UK portion of flight plan between these two points. This is the root cause of the incident."
Did NAS simply find the duplicate waypoint and double-back the entire FPL to the ENTRY waypoint, from which it couldn't determine an onwards route?
I worked with NAS when I was in FPRS and it was always a mither. Given that it was one FPL out of 5,000-odd, if it WAS doubling back, surely it would just go round and round in circles indefinitely... with barely any effect on the majority of unaffected sectors.
Last edited by SATCO; 6th Sep 2023 at 09:09. Reason: Addendum
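One reading of the report paragraph quoted above, sketched in Python. It assumes the ADEXP waypoint list and the ICAO 4444 route are plain lists of identifier strings, and that the fallback walks the ADEXP points after the UK exit in order; the identifiers (`ENTRY`, `EXITP`, `BEYND`, `DUPLX`), `extract_uk_portion` and `InvalidUkPortion` are hypothetical, and this is not the actual FPRSA-R logic.

```python
from typing import List, Sequence


class InvalidUkPortion(Exception):
    """Hypothetical error: no valid UK segment between entry and exit."""


def find_backwards(route: Sequence[str], ident: str) -> int:
    """Index of the last occurrence of ident in route, or -1 if absent."""
    for i in range(len(route) - 1, -1, -1):
        if route[i] == ident:
            return i
    return -1


def extract_uk_portion(adexp_points: Sequence[str], uk_entry: str,
                       uk_exit: str, icao_route: Sequence[str]) -> List[str]:
    # Entry point: search forwards from the start of the ICAO 4444 route.
    entry_idx = icao_route.index(uk_entry)  # ValueError if the entry is absent
    # Exit point: try the exit itself, then each ADEXP point beyond it,
    # searching backwards from the end of the ICAO 4444 route each time.
    exit_idx = -1
    for ident in adexp_points[adexp_points.index(uk_exit):]:
        exit_idx = find_backwards(icao_route, ident)
        if exit_idx != -1:
            break
    if exit_idx == -1:
        raise InvalidUkPortion("no exit or beyond-exit point in the route")
    if exit_idx <= entry_idx:
        # A same-named point elsewhere matched: it sits before the entry,
        # so no UK segment can be cut out between the two.
        raise InvalidUkPortion(
            f"exit match {icao_route[exit_idx]!r} is not after entry {uk_entry!r}")
    return list(icao_route[entry_idx:exit_idx + 1])


normal = ["ENTRY", "MIDPT", "EXITP", "BEYND"]
print(extract_uk_portion(normal, "ENTRY", "EXITP", normal))
# ['ENTRY', 'MIDPT', 'EXITP']

# DUPLX is a made-up identifier standing for two different, distant points.
adexp = ["DUPLX", "ENTRY", "MIDPT", "EXITP", "BEYND", "DUPLX"]
icao = ["DUPLX", "ENTRY", "MIDPT"]
try:
    extract_uk_portion(adexp, "ENTRY", "EXITP", icao)
except InvalidUkPortion as exc:
    print("critical:", exc)
```

Under this reading, the fallback that copes with a missing exit point is what lets a same-named point from elsewhere in the route be accepted as the exit. The sketch raises an exception for the one call; the report describes FPRSA-R stopping altogether at the equivalent point.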
Join Date: Oct 2004
Location: Southern England
Posts: 486
That explanation in the report isn't clear, but it seems to have found the Entry Point but not the Exit Point. If it can't find the Exit Point, it apparently then searches for Waypoints beyond the border, presumably stepping out a step each time it doesn't find any of those in the plan. It found one of those in the plan but, because the waypoint in the plan was a duplicate of one surrounding the UK FIR, it became confused, because that waypoint wasn't correct in a geographical context.
Any system that just goes round in circles would be an issue; if NAS did, nothing else would get processed, so it would affect everybody. NAS will only go around the loop a limited number of times before stopping itself. Stopping the whole system is usually preferable to being stuck in a loop, which is why most ATC systems will do that as the last resort. FPRSA doesn't seem to have been stuck in a loop but still applied that approach.
Last edited by eglnyt; 6th Sep 2023 at 10:01. Reason: respond to addendum
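The loop limit described in the post above is a common defensive pattern. A minimal sketch follows; `bounded_search`, `LoopLimitExceeded` and the default limit of 1000 are arbitrary illustrative choices, not anything taken from NAS.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class LoopLimitExceeded(Exception):
    """Hypothetical error raised instead of spinning forever."""


def bounded_search(start: T, step: Callable[[T], T],
                   is_done: Callable[[T], bool],
                   max_iterations: int = 1000) -> T:
    """Apply step until is_done holds, giving up after max_iterations steps."""
    state = start
    for _ in range(max_iterations):
        if is_done(state):
            return state
        state = step(state)
    # Limit reached: stop deliberately instead of looping indefinitely.
    raise LoopLimitExceeded(f"no result after {max_iterations} iterations")


print(bounded_search(0, lambda x: x + 1, lambda x: x == 5))  # 5

# A cycle that never reaches its goal: 0 -> 1 -> 2 -> 0 -> ...
try:
    bounded_search(0, lambda x: (x + 1) % 3, lambda x: x == 10, max_iterations=50)
except LoopLimitExceeded as exc:
    print("stopped:", exc)
```

Raising rather than returning a sentinel means the caller cannot silently ignore the failure, which matches the "stop as the last resort" behaviour described above.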
Join Date: Nov 2004
Location: EGCC
Age: 56
Posts: 69
It wasn't NAS, it's a completely different system.
That explanation in the report isn't clear, but it seems to have found the Entry Point but not the Exit Point. If it can't find the Exit Point, it apparently then searches for Waypoints beyond the border, presumably stepping out a step each time it doesn't find any of those in the plan. It found one of those in the plan but, because the waypoint in the plan was a duplicate of one surrounding the UK FIR, it became confused, because that waypoint wasn't correct in a geographical context.
And the report is indeed not terribly clear, I couldn't work out whether it was looking FORWARDS from the last known or looking BACKWARDS towards it, thus my question about 'doubling-back.' Easy to see therefore if it was reading forwards how it might find something and then realise upon finding it that it had already found it once before, in this case at the start of route. Still not sure how it would impact unaffected sectors so broadly on the basis of a single aspect in a single flight plan.
Join Date: Sep 2007
Location: UK
Posts: 78
What I still don’t understand is why, just because the FDPS couldn’t deal with 1 FPL out of the thousands in the system at that time, it decided that forcing the whole system to revert to manual mode is preferable to just displaying the potentially incorrect flight data to ATCOs for that 1 flight. Bear in mind that ATCOs safely deal with dodgy flight plans, or flights with no FPL, tactically on a daily basis anyway.
Join Date: Oct 2004
Location: Southern England
Posts: 486
However, in context, you have at least 30 minutes to fix it before the data starts to become stale and probably 4 hours before you really start to impact the dependent ATCO-facing systems, so that system design might be appropriate if you can deal with it quickly. In the event, we know that they didn't, but exactly why is also left until later.
Join Date: Aug 2023
Location: England
Posts: 7
Section 3.2 says “If the submitted flight plan is accepted by IFPS (on the continent), i.e. it is compliant with IFPS defined parameters, it will inform the airline that filed the flight plan that it has been accepted. This is sufficient for a flight to depart with local ATC approval. The flight plan will be sent from IFPS to all relevant ANSPs who need to manage the flight. For the UK this data is received at NATS”
I thought NATS had the ability to say to a plane at, say, Rome airport “I accept your flight plan, but don't take off at the requested time; take off a bit later at xxx”. In that way you can fly smoothly straight to your UK airport without, say, stacking, thus saving fuel. But this para is saying it has taken off before NATS gets to approve it. Is what I describe a future plan that is not currently implemented (SESAR?)?
Secondly, note that due to national interests, it’s common for many ANSPs to have bespoke systems not found elsewhere. ANSPs don’t like to be told “Buy Joe Bloggs's off-the-shelf system”, if only because they have some legacy systems they need to work around or like to favour their national suppliers. So NAS is bespoke to NATS (I think) for legacy reasons and so needs some special software in FPRSA to interface with it. Some here seem to be blaming NATS for needing a non-standard FPRSA, but I suspect few, if any, others have a standard one either.
... it decided that forcing the whole system to revert to manual mode is preferable to just displaying the potentially incorrect flight data to ATCOs for that 1 flight.
Likely a question for eglnyt: the FPRSA-R seems to have been designed on the assumption that it might fall over, since there is a manual input system. Can the two work alongside each other? i.e. could the FPRSA-R throw up an error message (rather than a full exception) saying "I cannot deal with Flt Plan XYZ, meanwhile I will carry on"? Then an ATCO could get Flt Plan XYZ and input it manually.
Martin Rolfe seems keen to say this has never happened before in 15 million flights. Someone inclined to believe him might understand that to mean it has flawlessly processed every Flt Plan since 2018 without a single hiccup. From this thread, it seems it has always been temperamental; it's just that the 4-hour buffer has, to date, proved adequate to solve the issue.
I couldn't work out whether it was looking FORWARDS from the last known or looking BACKWARDS towards it, thus my question about 'doubling-back.'
I am not entirely convinced by "However, since flight data is safety critical information that is passed to ATCOs the system must be sure it is correct and could not do so in this case. It therefore stopped operating, avoiding any opportunity for incorrect data being passed to a controller. The change to the software will now remove the need for a critical exception to be raised in these specific circumstances.", since the software correctly identified the dodgy Flt Plan and could simply have not passed it on, letting someone process it manually.
My guess is this now becomes the work of spin doctors who can only state "safety - worked as designed", and it will require leaks from inside NATS to show how "wonderful" FPRSA-R really was, and whether MR is speaking the truth in saying that it had never got confused or stopped since 2018?? If it was temperamental, requiring manual interventions, then how these were reported, investigated and solved would be interesting...
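The fail-soft alternative asked about in the post above, where one unprocessable plan is set aside for manual input while the rest carry on, might look like this. It assumes the anticipated, plan-specific failure is signalled with `ValueError`; `process_with_quarantine`, `toy_process` and the callsign `XYZ` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Plan = Dict[str, object]


def process_with_quarantine(
    plans: List[Plan], process: Callable[[Plan], Plan]
) -> Tuple[List[Plan], List[Tuple[Plan, str]]]:
    processed: List[Plan] = []
    manual: List[Tuple[Plan, str]] = []
    for plan in plans:
        try:
            processed.append(process(plan))
        except ValueError as exc:
            # Anticipated, plan-specific failure: hand this one plan to a human
            # and keep processing the rest.
            manual.append((plan, str(exc)))
        # Any other exception type still propagates and stops the run.
    return processed, manual


def toy_process(plan: Plan) -> Plan:
    """Hypothetical stand-in for the real per-plan processing."""
    if plan["callsign"] == "XYZ":
        raise ValueError("cannot extract UK portion")
    return plan


done, for_atco = process_with_quarantine(
    [{"callsign": "ABC"}, {"callsign": "XYZ"}, {"callsign": "DEF"}], toy_process)
print([p["callsign"] for p in done])  # ['ABC', 'DEF']
print(for_atco)  # [({'callsign': 'XYZ'}, 'cannot extract UK portion')]
```

Catching only the narrow, anticipated exception type keeps the stop-everything behaviour for genuinely unforeseen conditions, which is the trade-off being argued over in this thread.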
Join Date: Oct 2004
Location: Southern England
Posts: 486
Sorry, I meant FPRSA-R, not NAS... but I make no apology for the latter being a menace to this day!
Join Date: Oct 2004
Location: Southern England
Posts: 486
Section 3.2 says “If the submitted flight plan is accepted by IFPS (on the continent), i.e. it is compliant with IFPS defined parameters, it will inform the airline that filed the flight plan that it has been accepted. This is sufficient for a flight to depart with local ATC approval. The flight plan will be sent from IFPS to all relevant ANSPs who need to manage the flight. For the UK this data is received at NATS”
I thought NATS had the ability to say to a plane at, say, Rome airport “I accept your flight plan, but don't take off at the requested time; take off a bit later at xxx”. In that way you can fly smoothly straight to your UK airport without, say, stacking, thus saving fuel. But this para is saying it has taken off before NATS gets to approve it. Is what I describe a future plan that is not currently implemented (SESAR?)?
Secondly, note that due to national interests, it’s common for many ANSPs to have bespoke systems not found elsewhere. ANSPs don’t like to be told “Buy Joe Bloggs's off-the-shelf system”, if only because they have some legacy systems they need to work around or like to favour their national suppliers. So NAS is bespoke to NATS (I think) for legacy reasons and so needs some special software in FPRSA to interface with it. Some here seem to be blaming NATS for needing a non-standard FPRSA, but I suspect few, if any, others have a standard one either.
Where flow management is in play it isn't NATS which sorts that out. NATS tells the European system how many flights it can accept in a sector in a prescribed period, and the Network Manager sorts out which flights can go with respect to that and any other restrictions elsewhere in Europe.
These systems will have modules which are completely unchanged in every system, modules whose action is changed through adaptation data unique to each ANSP, and modules which have been changed to meet the unique requirements of the ANSP because those requirements couldn't be handled with adaptation. If asked to guess, I would expect more of the latter in the NATS system just because NATS is NATS.
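A deliberately simplified sketch of the flow-management arrangement described above: the ANSP supplies a capacity per sector per period, and a central allocator fits flights into periods in order of requested entry time. The function, the 60-minute period and the first-come, first-served rule are illustrative assumptions, not the actual slot allocation algorithm used for the European network.

```python
from typing import Dict, List, Tuple


def allocate_slots(requests: List[Tuple[str, int]], capacity_per_period: int,
                   period_min: int = 60) -> Dict[str, int]:
    """Give each flight the earliest entry time at or after its request that
    falls in a period with spare capacity (first come, first served)."""
    if capacity_per_period < 1:
        raise ValueError("capacity_per_period must be at least 1")
    used: Dict[int, int] = {}  # period index -> flights already allocated
    allocated: Dict[str, int] = {}  # callsign -> allocated entry minute
    for callsign, requested in sorted(requests, key=lambda r: r[1]):
        period = requested // period_min
        while used.get(period, 0) >= capacity_per_period:
            period += 1  # period full: push the flight into the next one
        used[period] = used.get(period, 0) + 1
        allocated[callsign] = max(requested, period * period_min)
    return allocated


# Capacity 2 per hour: flight C, wanting minute 20, is pushed to minute 60.
print(allocate_slots([("A", 0), ("B", 10), ("C", 20)], capacity_per_period=2))
# {'A': 0, 'B': 10, 'C': 60}
```

The point of the sketch is the division of labour: the ANSP only supplies `capacity_per_period`, and any delay a flight receives comes out of the central allocation, not from the ANSP approving each flight individually.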