
U.K. NATS Systems Failure

Rumours & News Reporting Points that may affect our jobs or lives as professional pilots. Also, items that may be of interest to professional pilots.


Old 6th Sep 2023, 08:22
  #261 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 485
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by ExRR
It seems that internationally there is no requirement for waypoints to be uniquely named if I've understood

https://aviation.stackexchange.com/q...supposed-to-be

correctly

What can possibly go wrong? It's not like it is an unknown issue.
Until recently it wasn't an issue as long as the duplicates were far apart. Two things have led to it becoming an issue: improvements in aircraft flight data management systems, which mean they may cover a wider area, and much longer flights which, as in this case, mean an aircraft may fly through both. ICAO have been trying to eliminate the duplicates, but States are reluctant to change theirs because it means changing maps and often the waypoint name has some local meaning, AVANT for Havant for example. Reading between the lines, possibly wrongly, it looks as though simple duplication wasn't the issue here; duplication where the duplicated waypoint is in a certain place on the route was.
eglnyt is offline  
Old 6th Sep 2023, 08:25
  #262 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 485
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Expatrick
That's the bit I don't understand.
I suspect that would be better written as: the system had no specific exception programmed for this case, so the "I've found a bad thing I don't understand" catch-all exception was followed.
eglnyt is offline  
Old 6th Sep 2023, 08:27
  #263 (permalink)  
 
Join Date: Aug 2010
Location: UK
Age: 67
Posts: 170
Received 36 Likes on 21 Posts
Originally Posted by Neo380
This answer (aka 'excuse') has been repeated so often, i.e. 'testing's soooo complicated, we couldn't possibly capture every permutation' (which actually means not process an incorrectly formatted message). Really??

'at what point do you stop?'. Er, when you've built one fall back, and preferably two - if, as you say, this is really a mission critical system.
Seems to me you're confusing software testing (which is what I've tried to describe) with business continuity testing, which I haven't described but which seems to have been sorely lacking.
golfbananajam is offline  
Old 6th Sep 2023, 08:30
  #264 (permalink)  
 
Join Date: Dec 2015
Location: Budapest
Posts: 320
Received 238 Likes on 141 Posts
Originally Posted by eglnyt
I suspect that would be better written as: the system had no specific exception programmed for this case, so the "I've found a bad thing I don't understand" catch-all exception was followed.
Thanks - but in my, admittedly simplistic, world "I've found a bad thing" should lead automatically to a rejection, no?
Expatrick is online now  
Old 6th Sep 2023, 08:36
  #265 (permalink)  
Ecce Homo! Loquitur...
Thread Starter
 
Join Date: Jul 2000
Location: Peripatetic
Posts: 17,495
Received 1,637 Likes on 749 Posts
I don't see why this is such a problem, in software terms.

Assuming the database holds all the waypoints - including duplicates - it would seem simple to have the software select the one closest to the next waypoint in the flight plan. If those that exist are all so far apart I can’t see a flight plan where there won’t be multiple other waypoints between the two instances.

They must do something similar to identify which of the duplicates to use even when the identifier only appears once in a flight plan.
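A minimal sketch of that idea, for illustration only. The waypoint names and coordinates are invented, and nothing here is taken from FPRSA-R or any real navigation database; it simply picks, among same-named waypoints, the one nearest to a neighbouring fix whose position is already known.

```python
import math

# Hypothetical waypoint table: one identifier may map to several positions.
# Names and coordinates are invented for illustration, not real nav data.
WAYPOINTS = {
    "ALPHA": [(50.0, -1.0)],
    "DUPLI": [(51.0, 0.0), (44.0, -93.0)],  # same name, two places
}


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(name, neighbour_position):
    """Pick the candidate for `name` closest to a known neighbouring fix."""
    return min(WAYPOINTS[name],
               key=lambda p: great_circle_km(p, neighbour_position))
```

The catch is that this needs a neighbour whose own position is unambiguous, and it assumes consecutive points in a plan are always close together, which may not hold on long oceanic legs.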
ORAC is offline  
Old 6th Sep 2023, 08:36
  #266 (permalink)  
 
Join Date: Aug 2003
Location: FR
Posts: 234
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by ORAC
So, it was not a failure that depended on a huge amount of variable / dynamic data to happen. The exception would have occurred in isolation even in an otherwise empty airspace. It seems that an extra check could be carried out before the problematic data is allowed into the "live" real-time system.
pax2908 is offline  
Old 6th Sep 2023, 08:42
  #267 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 485
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Expatrick
Thanks - but in my, admittedly simplistic, world "I've found a bad thing" should lead automatically to a rejection, no?
The problem is the real world isn't simplistic. FPRSA will encounter bad things in its processing. Some of those it will fix automatically; for others it will understand why the data is bad, flag it for the attention of a human operator and carry on processing everything else. There may be many different cases of that, all of which have to be programmed. Where the system knows the data is bad but not why, it usually means that the designer of the software didn't envisage that specific event happening. The problem with systems of this type is that when that happens the safest thing to do is to stop everything, so the global exception event will normally do that.
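To make the three tiers concrete, here is a rough sketch. Every class and function name is hypothetical and the "repair" is a toy; it shows the general pattern of specific handlers plus a fail-stop catch-all, not how FPRSA is actually written.

```python
class FixableError(Exception):
    """A known problem the software can repair by itself."""


class NeedsOperatorError(Exception):
    """A known problem that a human operator has to resolve."""


class SystemHalted(Exception):
    """Raised when processing stops as a last resort."""


def toy_handler(plan):
    """Stand-in for real processing; the rules are invented for the demo."""
    if plan != plan.strip().upper():
        raise FixableError("untidy formatting")
    if plan == "":
        raise NeedsOperatorError("empty plan")
    if "??" in plan:
        raise ValueError("no rule covers this")
    return f"OK {plan}"


def process(plan, handler, operator_queue):
    """Run `handler` on one plan and apply the three tiers."""
    try:
        return handler(plan)
    except FixableError:
        # Tier 1: understood and repairable, so repair and try again.
        return handler(plan.strip().upper())
    except NeedsOperatorError:
        # Tier 2: understood but not repairable; park it and carry on.
        operator_queue.append(plan)
        return None
    except Exception as exc:
        # Tier 3: nobody anticipated this case, so fail safe and stop.
        raise SystemHalted(f"unanticipated error: {exc!r}") from exc
```

Everything that reaches the last branch is, by definition, something the designer did not write a specific case for, which is why it gets the bluntest response.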
eglnyt is offline  
Old 6th Sep 2023, 08:46
  #268 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 485
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by pax2908
So, it was not a failure that depended on a huge amount of variable / dynamic data to happen. The exception would have occurred in isolation even in an otherwise empty airspace. It seems that an extra check could be carried out before the problematic data is allowed into the "live" real-time system.
Once you understand the issue, yes. Although if the system is the one you normally use to check for problematic data before it gets to the live system, it isn't necessarily easy. The report suggests that such flight plans are now being trapped outside of the UK by IFPS until FPRSA has been "fixed".
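As an illustration of what an upstream trap could look like once the trigger is known, here is a small sketch. The criterion (the same identifier appearing more than once in one route) is my assumption; the thread does not say what IFPS actually checks for.

```python
from collections import Counter


def repeated_identifiers(route):
    """Return the identifiers that occur more than once in a route, in order."""
    counts = Counter(route)
    seen = []
    for name in route:
        if counts[name] > 1 and name not in seen:
            seen.append(name)
    return seen


def should_hold_for_review(route):
    """Hypothetical pre-check: hold any plan with a repeated identifier."""
    return bool(repeated_identifiers(route))
```

A check like this is cheap precisely because it needs no traffic data, only the single plan, which fits the point that the failure did not depend on anything else in the airspace.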
eglnyt is offline  
Old 6th Sep 2023, 08:46
  #269 (permalink)  
 
Join Date: Dec 2015
Location: Budapest
Posts: 320
Received 238 Likes on 141 Posts
Originally Posted by eglnyt
The problem is the real world isn't simplistic. FPRSA will encounter bad things in its processing. Some of those it will fix automatically; for others it will understand why the data is bad, flag it for the attention of a human operator and carry on processing everything else. There may be many different cases of that, all of which have to be programmed. Where the system knows the data is bad but not why, it usually means that the designer of the software didn't envisage that specific event happening. The problem with systems of this type is that when that happens the safest thing to do is to stop everything, so the global exception event will normally do that.
Again, thank you! But "if the system knows it's bad" you are halfway there...
Expatrick is online now  
Old 6th Sep 2023, 08:54
  #270 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 485
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by golfbananajam
Seems to me you 're confusing software testing (which is what I've tried to describe) with business continuity testing, which I haven't described but which seems to have been sorely lacking.
Reading between the lines in Section 5, it looks as though they thought that whatever happened to FPRSA they would be able to fix it before that failure became a business issue. They were obviously wrong, but sadly exactly why isn't in this report and is left for later.
eglnyt is offline  
Old 6th Sep 2023, 09:03
  #271 (permalink)  
 
Join Date: Nov 2004
Location: EGCC
Age: 56
Posts: 69
Likes: 0
Received 0 Likes on 0 Posts
Doubling-back?

"Once the ADEXP file had been received, the FPRSA-R software commenced searching for the UK airspace entry point in the waypoint information per the ADEXP flight plan, commencing at the first line of that waypoint data. FPRSA-R was able to specifically identify the character string as it appeared in the ADEXP flight plan text. Having correctly identified the entry point, the software moved on to search for the exit point from UK airspace in the waypoint data. Having completed those steps, FPRSA-R then searches the ICAO4444 section of the ADEXP file. It initially searches from the beginning of that data, to find the identified UK airspace entry point. This was successfully found. Next, it searches backwards, from the end of that section, to find the UK airspace exit point. This did not appear in that section of the flight plan so the search was unsuccessful. As there is no requirement for a flight plan to contain an exit waypoint from a Flight Information Region (FIR) or a country’s airspace, the software is designed to cope with this scenario. Therefore, where there is no UK exit point explicitly included, the software logic utilises the waypoints as detailed in the ADEXP file to search for the next nearest point beyond the UK exit point. This was also not present. The software therefore moved on to the next waypoint. This search was successful as a duplicate identifier appeared in the flight plan. Having found an entry and exit point, with the latter being the duplicate and therefore geographically incorrect, the software could not extract a valid UK portion of flight plan between these two points. This is the root cause of the incident."

Did NAS simply find the duplicate waypoint and double-back the entire FPL to the ENTRY waypoint, from which it couldn't determine an onwards route?
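One way to read the quoted passage, sketched under stated assumptions. The function, the list-of-names representation and all identifiers are invented; the real ADEXP and ICAO 4444 formats are far richer, and this is not the FPRSA-R code. It only shows how a name-only match can return an "exit" that lies before the entry.

```python
def uk_segment(adexp_points, uk_entry, uk_exit, icao_route):
    """Return the slice of icao_route between the UK entry and exit points.

    adexp_points: every identifier along the route, in flight order.
    icao_route:   the identifiers explicitly filed, usually a shorter list.
    """
    start = icao_route.index(uk_entry)  # forward search for the entry point

    # Try the exit point itself, then each later ADEXP point in turn.
    candidates = adexp_points[adexp_points.index(uk_exit):]
    for name in candidates:
        if name in icao_route:
            # Backward search: last occurrence of the name in the filed route.
            end = len(icao_route) - 1 - icao_route[::-1].index(name)
            break
    else:
        raise LookupError("no exit point found in filed route")

    if end <= start:
        # A name-only match landed before the entry point: the duplicate case.
        raise ValueError(f"exit {name!r} precedes entry {uk_entry!r}")
    return icao_route[start:end + 1]
```

With a route where the name beyond the UK exit also appears near the departure end, the match lands on the early occurrence and no valid UK portion can be cut out, which is the situation the report calls the root cause.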

I worked with NAS when I was in FPRS and it was always a mither. Given that it was one FPL out of 5,000-odd, if it WAS doubling-back surely it would just go round and round in circles indefinitely... with barely any effect on the majority of unaffected sectors.

Last edited by SATCO; 6th Sep 2023 at 09:09. Reason: Addendum
SATCO is offline  
Old 6th Sep 2023, 09:23
  #272 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 485
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by SATCO
Did NAS simply find the duplicate waypoint and double-back the entire FPL to the ENTRY waypoint, from which it couldn't determine an onwards route?
It wasn't NAS, it's a completely different system.
That explanation in the report isn't clear but it seems to have found the Entry Point but not the Exit Point. If it can't find the Exit Point it apparently then searches for Waypoints beyond the border, presumably stepping out a step each time it doesn't find any of those in the plan. It found one of those in the plan but, because the waypoint in the plan was a duplicate of one surrounding the UK FIR, became confused because that waypoint wasn't correct in a geographical context.
Any system that just goes round in circles would be an issue; if NAS did, nothing else would get processed, so it would affect everybody. NAS will only go around the loop a limited number of times before stopping itself. Stopping the whole system is usually preferable to being stuck in a loop, which is why most ATC systems will do that as the last resort. FPRSA doesn't seem to have been stuck in a loop but still applied that approach.
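For anyone unfamiliar with the pattern, a loop with an iteration budget looks roughly like this. The names and the default limit are made up for the example and say nothing about how NAS implements it.

```python
class LoopLimitExceeded(RuntimeError):
    """Raised when a processing step fails to finish within its budget."""


def run_bounded(step, state, max_iterations=1000):
    """Call step(state) until it returns None, but never more than the budget.

    `step` returns the next state, or None once the work is finished.
    """
    for _ in range(max_iterations):
        next_state = step(state)
        if next_state is None:
            return state
        state = next_state
    raise LoopLimitExceeded(f"gave up after {max_iterations} iterations")
```

The trade-off is the one described above: raising here takes the whole task down, but a task that never returns takes everything queued behind it down as well, and without telling anyone.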

Last edited by eglnyt; 6th Sep 2023 at 10:01. Reason: respond to addendum
eglnyt is offline  
Old 6th Sep 2023, 10:00
  #273 (permalink)  
 
Join Date: Nov 2004
Location: EGCC
Age: 56
Posts: 69
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by eglnyt
It wasn't NAS, it's a completely different system.
That explanation in the report isn't clear but it seems to have found the Entry Point but not the Exit Point. If it can't find the Exit Point it apparently then searches for Waypoints beyond the border, presumably stepping out a step each time it doesn't find any of those in the plan. It found one of those in the plan but, because the waypoint in the plan was a duplicate of one surrounding the UK FIR, became confused because that waypoint wasn't correct in a geographical context.
Sorry, I meant FPRSA-R, not NAS... but I make no apology for the latter being a menace to this day!

And the report is indeed not terribly clear. I couldn't work out whether it was looking FORWARDS from the last known or looking BACKWARDS towards it, hence my question about 'doubling-back.' It is easy to see, therefore, how if it was reading forwards it might find something and then realise that it had already found it once before, in this case at the start of the route. Still not sure how it would impact unaffected sectors so broadly on the basis of a single aspect in a single flight plan.
SATCO is offline  
Old 6th Sep 2023, 10:13
  #274 (permalink)  
 
Join Date: Sep 2007
Location: UK
Posts: 78
Likes: 0
Received 0 Likes on 0 Posts
What I still don’t understand is why, just because the FDPS couldn’t deal with 1 FPL out of the thousands in the system at that time, it decided that forcing the whole system to revert to manual mode is preferable to just displaying the potentially incorrect flight data to ATCOs for that 1 flight, bearing in mind that ATCOs safely deal with dodgy flight plans or flights with no FPL tactically on a daily basis anyway.
callum91 is offline  
Old 6th Sep 2023, 10:31
  #275 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 485
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by callum91
What I still don’t understand is why, just because the FDPS couldn’t deal with 1 FPL out of the thousands in the system at that time, it decided that forcing the whole system to revert to manual mode is preferable to just displaying the potentially incorrect flight data to ATCOs for that 1 flight, bearing in mind that ATCOs safely deal with dodgy flight plans or flights with no FPL tactically on a daily basis anyway.
Indeed. Unfortunately the design of that part is one of the questions left till later. That is understandable, as it is a matter for the Design Authority, not NATS.
However, in context you have at least 30 minutes to fix it before the data starts to become stale and probably 4 hours before you really start to impact the dependent ATCO-facing systems, so that system design might be appropriate if you can deal with it quickly. In the event we know that they didn't, but exactly why is also left until later.
eglnyt is offline  
Old 6th Sep 2023, 10:49
  #276 (permalink)  
 
Join Date: Aug 2023
Location: England
Posts: 7
Likes: 0
Received 0 Likes on 0 Posts
Section 3.2 says “If the submitted flight plan is accepted by IFPS (on the continent), i.e. it is compliant with IFPS defined parameters, it will inform the airline that filed the flight plan that it has been accepted. This is sufficient for a flight to depart with local ATC approval. The flight plan will be sent from IFPS to all relevant ANSPs who need to manage the flight. For the UK this data is received at NATS”

I thought NATS had the ability to say to a plane at, say, Rome airport “I accept your flight plan but don't take off at the requested time but a bit later at xxx”. In that way you can fly smoothly straight to your UK airport without, say, stacking, thus saving you fuel. But this para is saying it has taken off before NATS gets to approve it. Is what I describe a future plan that is not currently implemented (SESAR?)?

Secondly, note that due to national interests, it’s common for many ANSPs to have bespoke systems not found elsewhere. ANSPs don’t like to be told “Buy Joe Bloggs’s off-the-shelf system”, if only because they have some legacy systems they need to work around or like to favour their national suppliers. So NAS is bespoke to NATS (I think) for legacy reasons and so needs some special software in FPRSA to interface with it. Some here seem to be blaming NATS for needing a non-standard FPRSA but I suspect few, if any, others have a standard one either.
Engineer39 is offline  
Old 6th Sep 2023, 11:00
  #277 (permalink)  
 
Join Date: May 2016
Location: UK
Posts: 6
Likes: 0
Received 1 Like on 1 Post
... it decided that forcing the whole system to revert to manual mode is preferable to just displaying the potentially incorrect flight data to ATCOs for that 1 flight.
Well, "it" the software did not decide (that would be AI); the designers / specifiers did.

Likely a question for eglnyt: the FPRSA-R seems to be planned on falling over, since there is a manual input system. Can the 2 work alongside each other? i.e. could the FPRSA-R throw up an error message (rather than a full exception) saying "I cannot deal with Flt Plan XYZ, meanwhile I will carry on". Then an ATCO can get Flt Plan XYZ and manually input it.

Martin Rolfe seems keen to say this has never happened before in 15 million flights. Someone inclined to believe him might understand that to mean it has flawlessly processed every Flt Plan since 2018 without a single hiccup. From this thread, it seems it has always been temperamental; it is just that the 4 hour buffer has, to date, proved adequate to solve the issue.

I couldn't work out whether it was looking FORWARDS from the last known or looking BACKWARDS towards it, thus my question about 'doubling-back.'
(p9) looks forward through flight to identify UK entry point. Then goes to end of whole route and works back to find UK exit point, which it could not find because it was not (no need for it to be) specified. It then looked for points near to UK airspace to try and work out an exit - but seems it found the duplicate name point and picked the wrong one, and now "the software could not extract a valid UK portion of flight plan between these two points" at which point it threw a wobbly.

I am not entirely convinced by "However, since flight data is safety critical information that is passed to ATCOs the system must be sure it is correct and could not do so in this case. It therefore stopped operating, avoiding any opportunity for incorrect data being passed to a controller. The change to the software will now remove the need for a critical exception to be raised in these specific circumstances." - since the software correctly identified the dodgy Flt Plan, and could just not have passed it on, letting someone manually do it.
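What I have in mind is something like the sketch below, where a plan that cannot be processed is set aside for manual input and the rest carry on. The function and variable names are hypothetical; this is the alternative design being argued for, not anything the report describes.

```python
def process_batch(plans, extract):
    """Process every plan; set aside the ones that fail instead of stopping.

    plans:   iterable of (plan_id, plan) pairs.
    extract: callable that returns the processed plan or raises on failure.
    """
    accepted, quarantined = [], []
    for plan_id, plan in plans:
        try:
            accepted.append((plan_id, extract(plan)))
        except Exception as exc:
            # Nothing derived from this plan is forwarded downstream.
            quarantined.append((plan_id, repr(exc)))
    return accepted, quarantined
```

The assumption doing all the work is that a failure is confined to the one plan and has not left any shared state in a bad condition. If that cannot be guaranteed, the argument for stopping everything gets stronger.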

My guess is this now becomes the work of spin doctors who can only state "safety - worked as designed", and it will require leaks from inside NATS to learn how "wonderful" FPRSA-R really was, and whether MR is speaking the truth in that it had never got confused or stopped since 2018. If it was temperamental, requiring manual interventions, then how these were reported, investigated and solved would be interesting...
Gupeg is offline  
Old 6th Sep 2023, 11:03
  #278 (permalink)  
 
Join Date: Sep 2007
Location: UK
Posts: 78
Likes: 0
Received 0 Likes on 0 Posts
Tail wagging the dog comes to mind when it comes to systems/automation.
callum91 is offline  
Old 6th Sep 2023, 12:03
  #279 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 485
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by SATCO
Sorry, I meant FPRSA-R, not NAS... but I make no apology for the latter being a menace to this day!

And the report is indeed not terribly clear. I couldn't work out whether it was looking FORWARDS from the last known or looking BACKWARDS towards it, hence my question about 'doubling-back.' It is easy to see, therefore, how if it was reading forwards it might find something and then realise that it had already found it once before, in this case at the start of the route. Still not sure how it would impact unaffected sectors so broadly on the basis of a single aspect in a single flight plan.
FPRSA stopped, it seems to have been designed to do so when it found something like this. That meant the UK effectively stopped receiving IFR Flight Plans, all of them regardless of the planned routing within the UK. No problem initially as the downstream systems have a buffer. As that buffer came close to depletion and with no fix in sight flow was imposed to limit the number of incoming flights to the UK to that number which could be manually created beyond FPRSA.
eglnyt is offline  
Old 6th Sep 2023, 12:27
  #280 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 485
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Engineer39
Section 3.2 says “If the submitted flight plan is accepted by IFPS (on the continent), i.e. it is compliant with IFPS defined parameters, it will inform the airline that filed the flight plan that it has been accepted. This is sufficient for a flight to depart with local ATC approval. The flight plan will be sent from IFPS to all relevant ANSPs who need to manage the flight. For the UK this data is received at NATS”

I thought NATS had the ability to say to a plane at, say, Rome airport “I accept your flight plan but don't take off at the requested time but a bit later at xxx”. In that way you can fly smoothly straight to your UK airport without, say, stacking, thus saving you fuel. But this para is saying it has taken off before NATS gets to approve it. Is what I describe a future plan that is not currently implemented (SESAR?)?

Secondly, note that due to national interests, it’s common for many ANSPs to have bespoke systems not found elsewhere. ANSPs don’t like to be told “Buy Joe Bloggs’s off-the-shelf system”, if only because they have some legacy systems they need to work around or like to favour their national suppliers. So NAS is bespoke to NATS (I think) for legacy reasons and so needs some special software in FPRSA to interface with it. Some here seem to be blaming NATS for needing a non-standard FPRSA but I suspect few, if any, others have a standard one either.
Long distance flights, as this one seems to be, fall outside flow management processes because there is no practical way to include them. NATS receives flight data from IFPS about 4 hours in advance so for lengthy flights they have been in the air for some time before NATS receives the flight plan.
Where flow management is in play it isn't NATS which sorts that out. NATS tells the European System how many flights it can accept in a sector in a prescribed period, and the Network Operator sorts out which flights can go with respect to that and any other restrictions elsewhere in Europe.
These systems will have modules which are completely unchanged in every system, modules whose action is changed through adaptation data unique to each ANSP, and modules which have been changed to meet the unique requirements of the ANSP because those requirements couldn't be handled with adaptation. If asked to guess I would expect more of the latter in the NATS system just because NATS is NATS.
eglnyt is offline  



Copyright © 2024 MH Sub I, LLC dba Internet Brands. All rights reserved. Use of this site indicates your consent to the Terms of Use.