
U.K. NATS Systems Failure


Old 6th Sep 2023, 12:40
  #281 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 479
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Gupeg
Well, "it" (the software) did not decide (that would be AI); the designers / specifiers did.

Likely a question for eglnyt: the FPRSA-R seems to have been planned around falling over, since there is a manual input system. Can the 2 work alongside each other? i.e. could the FPRSA-R throw up an error message (rather than a full exception) saying "I cannot deal with Flt Plan XYZ, meanwhile I will carry on"? Then an ATCO can get Flt Plan XYZ and manually input it.

Martin Rolfe seems keen to say this has never happened before in 15 million flights. Someone inclined to believe him might understand that to mean it has flawlessly processed every Flt Plan since 2018 without a single hiccup. From this thread, it seems it has always been temperamental; it is just that the 4 hour buffer has, to date, proved adequate to solve the issue.

(p9) looks forward through flight to identify UK entry point. Then goes to end of whole route and works back to find UK exit point, which it could not find because it was not (no need for it to be) specified. It then looked for points near to UK airspace to try and work out an exit - but seems it found the duplicate name point and picked the wrong one, and now "the software could not extract a valid UK portion of flight plan between these two points" at which point it threw a wobbly.
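
To make the sequence described above concrete, here is a minimal Python sketch of a forward entry search, a backward exit search, and a fallback that resolves a fix by name. It is purely illustrative: the function, the exception, the `by_name` switch and the waypoint names (UKIN, UKOUT, DUP) are all invented for the example, and nothing here is taken from the real FPRSA-R code, which has not been published.

```python
class NoValidUkPortion(Exception):
    """No usable UK segment could be extracted from the route."""


def find_uk_portion(route, entry_points, exit_points, near_exit_points, by_name=True):
    """Return (entry_index, exit_index) for a route given as a list of fix names."""
    # Forward pass: the first known UK entry point on the route.
    entry = next((i for i, name in enumerate(route) if name in entry_points), None)
    if entry is None:
        raise NoValidUkPortion("no UK entry point on route")

    # Backward pass: look for an explicit UK exit point from the end of the route.
    backwards = range(len(route) - 1, -1, -1)
    exit_ = next((i for i in backwards if route[i] in exit_points), None)

    if exit_ is None:
        # Fallback: take the last fix that lies near UK airspace instead.
        hit = next((i for i in backwards if route[i] in near_exit_points), None)
        if hit is None:
            raise NoValidUkPortion("no exit point could be inferred")
        # Resolving the fix by NAME returns the first match on the route,
        # which is the wrong one when two different fixes share a name.
        exit_ = route.index(route[hit]) if by_name else hit

    if exit_ <= entry:
        raise NoValidUkPortion(f"exit {route[exit_]!r} precedes entry {route[entry]!r}")
    return entry, exit_


# Hypothetical route: two different fixes share the name "DUP".
ROUTE = ["AAA", "DUP", "BBB", "UKIN", "CCC", "DUP", "ZZZ"]
```

With this made-up route, `find_uk_portion(ROUTE, {"UKIN"}, {"UKOUT"}, {"DUP"}, by_name=False)` returns `(3, 5)`, whereas the default `by_name=True` picks the earlier "DUP" and raises `NoValidUkPortion`. What the caller does with that exception (reject one plan, or stop everything) is the design question being argued about in this thread.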

I am not entirely convinced by "However, since flight data is safety critical information that is passed to ATCOs the system must be sure it is correct and could not do so in this case. It therefore stopped operating, avoiding any opportunity for incorrect data being passed to a controller. The change to the software will now remove the need for a critical exception to be raised in these specific circumstances." - since the software correctly identified the dodgy Flt Plan, and could just not have passed it on, letting someone manually do it.

My guess is this now becomes the work of spin doctors who can only state "safety - worked as designed" and will require leaks from inside NATS as to how "wonderful" FPRSA-R really was, and whether MR is speaking the truth in that it had never got confused or stopped since 2018?? If it was temperamental, requiring manual interventions, then how these were reported, investigated and solved would be interesting...
FPRSA probably could, and should, have been programmed to isolate that plan and carry on. It wasn't; why it wasn't will hopefully be covered in the fuller report, provided the supplier is happy to help. We know that plans were entered manually, so somebody in NATS was able to access the raw flight plan data and type in the details. Could you have bypassed the system and processed that flight plan data automatically? Yes, but that takes us back to the difficulty of procuring two different systems to do the same job, discussed at length earlier with two entrenched points of view.

To be fair to all involved, as far as I know FPRSA has not been temperamental in either its current or pre-2018 forms. NAS has, but not for a very long time; ironically, lots of similar issues are controlled for in NAS purely because it is so old and they have occurred previously. I would be surprised if all the "issues" associated with NAS in the past did not form the basis of the requirements and subsequent tests placed on FPRSA in 2018.
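
For anyone wondering what "isolate that plan and carry on" might look like in practice, here is a minimal Python sketch of the general pattern. It is an assumption-laden illustration only: `process_batch` and the `process_one` callback are hypothetical names, and whether catching errors this broadly is acceptable in a safety-critical system is exactly the sort of decision only the design authority can make.

```python
def process_batch(plans, process_one):
    """Process every (plan_id, plan) pair; set failures aside instead of halting."""
    accepted, quarantined = [], []
    for plan_id, plan in plans:
        try:
            accepted.append((plan_id, process_one(plan)))
        except Exception as exc:
            # Deliberately broad for the sketch: one bad plan must not stop the rest.
            # The plan id and the reason are kept so a human can enter it manually.
            quarantined.append((plan_id, repr(exc)))
    return accepted, quarantined


def demo_process_one(plan):
    """Hypothetical stand-in for the real processing step."""
    if "BAD" in plan:
        raise ValueError("no valid UK portion")
    return plan.lower()
```

Calling `process_batch([("A1", "OK ROUTE"), ("B2", "BAD ROUTE"), ("C3", "OK TOO")], demo_process_one)` processes A1 and C3 and leaves B2 in the quarantine list with its error text. The trade-off is that a broad catch can also hide genuine corruption, which may be part of why a hard stop was chosen.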
eglnyt is offline  
Old 6th Sep 2023, 13:12
  #282 (permalink)  
 
Join Date: Jun 2008
Location: Cambridge UK
Posts: 192
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by eglnyt
FPRSA stopped, it seems to have been designed to do so when it found something like this. That meant the UK effectively stopped receiving IFR Flight Plans, all of them regardless of the planned routing within the UK. No problem initially as the downstream systems have a buffer. As that buffer came close to depletion and with no fix in sight flow was imposed to limit the number of incoming flights to the UK to that number which could be manually created beyond FPRSA.
Non-aviation lurker & retired software engineer.

I'm finding it hard to understand why no attempt was made to continue with the offending flight-plan handled by the manual system. For argument's sake, say after a cold restart of the system in case there had been any corruption.

Presumably some sort of fix or all-clear was eventually issued (I assume "continue but handle the offending flight-plan by hand"). I'm having difficulty imagining what type of issue(s) would require 4 hours to find or exclude.

PS I'm in some of the outer circles of confusion shown in the latest New Scientist cartoon.
Peter H is offline  
Old 6th Sep 2023, 13:18
  #283 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 479
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Peter H
Non-aviation lurker & retired software engineer.

I'm finding it hard to understand why no attempt was made to continue with the offending flight-plan handled by the manual system. For argument's sake, say after a cold restart of the system in case there had been any corruption.

Presumably some sort of fix or all-clear was eventually issued (I assume "continue but handle the offending flight-plan by hand"). I'm having difficulty imagining what type of issue(s) would require 4 hours to find or exclude.

PS I'm in some of the outer circles of confusion shown in the latest New Scientist cartoon.
The expectation would seem to have been that they would do exactly that: identify the errant data, isolate it and move on. They seem to have struggled to identify the errant data and possibly then struggled to remove it. Exactly why isn't really covered in the prelim report; hopefully it will come later.

If you can't isolate that data then it's going to fail every time. If you take all the data out then your system is equally ineffective.
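
A toy Python sketch of that point, under heavy assumptions: the queue, `run_until_halt` and the "bad" item are all invented for illustration and say nothing about how FPRSA-R or its backup actually store pending plans. It only shows why a restart (or a backup running the same logic on the same data) stops at the same place until the offending item is taken out.

```python
from collections import deque


def run_until_halt(queue, process_one):
    """Process from the head of the queue; halt on the first failure.

    On failure the offending item is left at the head, so a restart sees it again.
    Returns (results, finished).
    """
    done = []
    while queue:
        try:
            done.append(process_one(queue[0]))
        except Exception:
            return done, False
        queue.popleft()  # only remove an item once it has been processed
    return done, True


def demo_process_one(plan):
    """Hypothetical stand-in for the real processing step."""
    if plan == "bad":
        raise ValueError("cannot extract UK portion")
    return plan.upper()
```

Starting from `deque(["ok1", "bad", "ok2"])`, the first run processes "ok1" and halts; every further run halts immediately on "bad". Only after that single item is removed does "ok2" get through, whereas emptying the whole queue would "fix" the halt while losing every plan behind it.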

eglnyt is offline  
Old 6th Sep 2023, 13:25
  #284 (permalink)  
 
Join Date: Mar 2006
Location: England
Posts: 995
Likes: 0
Received 4 Likes on 2 Posts
"I'm finding it hard to understand why no attempt was made to continue with the offending flight-plan handled by the manual system."

Possibly the errant flight plan could not be identified; if it could, then why didn't the software throw it back?

Some very odd numbers / usage in the public domain.
"1 in 15m flight plans over five years" could be 1 in 3m per year; either value is meaningless because they would be forecasts - technical guess, objectives - which were not met.

interesting views in https://snafucatchers.github.io/ #111
PEI_3721 is online now  
Old 6th Sep 2023, 13:36
  #285 (permalink)  
 
Join Date: Oct 2004
Location: KORR somewhere
Posts: 378
Received 1 Like on 1 Post
Have details of the offending duplicated waypoint name been released? I can't seem to find any information about it/them.
plans123 is offline  
Old 6th Sep 2023, 13:41
  #286 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 479
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by plans123
Have details of the offending duplicated waypoint name been released? I can't seem to find any information about it/them.
Not in the report which is annoying for those of us interested in detail but not actually necessary to understand the report.

It may be a deliberate omission to avoid the "blame the French" type headlines we saw last week.
eglnyt is offline  
Old 6th Sep 2023, 13:56
  #287 (permalink)  
 
Join Date: Jan 2008
Location: LONDON
Posts: 199
Received 21 Likes on 12 Posts
Originally Posted by eglnyt
The expectation would seem to have been that they would do exactly that: identify the errant data, isolate it and move on. They seem to have struggled to identify the errant data and possibly then struggled to remove it. Exactly why isn't really covered in the prelim report; hopefully it will come later.

If you can't isolate that data then it's going to fail every time. If you take all the data out then your system is equally ineffective.
A solution to this sort of problem was proposed as long ago as 1978

Telnet randomly lose option

...which addressed the following:

Several hosts appear to provide random lossage, such as system crashes, lost data, incorrectly functioning programs, etc., as part of their services. These services are often undocumented and are in general quite confusing to the novice user.

A general means is needed to allow the user to disable these features.
netstruggler is offline  
Old 6th Sep 2023, 16:16
  #288 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
I see that Simon Calder speculates on the identity of the flight with the flight plan that triggered the failure:

"The report does not reveal the airline or route, merely saying: “The flight was planned to depart at around 4am [BST] on 28 August, and arrive at around 3pm.”

The service that most closely matches these timings, and passed over UK airspace, is Air France flight AF85 from San Francisco to Paris CDG. It is scheduled to depart daily at 4am British time and arrive in the French capital at 2.50pm, BST. This is speculation by The Independent and has not been confirmed."

Is that flight plan accessible to determine the duplicated waypoints?
118.70 is offline  
Old 6th Sep 2023, 17:09
  #289 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
FlightAware gives https://www.flightaware.com/live/fli...310Z/KSFO/LFPG

And there appears to be the possibility of duplicated MORAG waypoints (at least according to the OpenNav database)


118.70 is offline  
Old 6th Sep 2023, 17:28
  #290 (permalink)  
 
Join Date: Jan 2008
Location: Reading, UK
Posts: 15,819
Received 201 Likes on 93 Posts
Those are simply two references to the same waypoint on the UIR boundary.



The other intersections on the boundary similarly appear under both IE and UK
DaveReidUK is offline  
Old 6th Sep 2023, 17:45
  #291 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 479
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by DaveReidUK
Those are simply two references to the same waypoint on the UIR boundary.

The other intersections on the boundary similarly appear under both IE and UK
As Dave says, MORAG isn't a duplicate, at least on that evidence. It is also a UK entry point, which doesn't match the scenario suggested in the report. If it is that flight then BIBAX or a following waypoint would be the most likely duplicate.

It may be that flight, but only if it had not filed that route recently. Dave might have info on how long it has been operating and where it normally goes.
eglnyt is offline  
Old 6th Sep 2023, 19:18
  #292 (permalink)  
 
Join Date: Jan 2008
Location: Reading, UK
Posts: 15,819
Received 201 Likes on 93 Posts
AFR085 has been operating at least as far back as 2011. Routeing varies all the way from overflying the whole of the UK from the Hebrides south, to not overflying at all, dependent obviously on the day's NAT tracks.

So doesn't really help, I'm afraid.
DaveReidUK is offline  
Old 6th Sep 2023, 21:51
  #293 (permalink)  
 
Join Date: Mar 2016
Location: Location: Location
Posts: 59
Received 0 Likes on 0 Posts
The airline sent the plan to IFPS which checked it was in the correct format (which it was) and accepted it. IFPS passed it to Swanwick at the appropriate time. There was an anomaly in the route (duplicate fixes) which by NATS' admission the FPRSA-R program logic couldn't handle. NATS says this "led to a 'critical exception' whereby both the primary system and its backup entered a fail-safe mode". Personally I find this hard to believe. If the FPRSA-R was still in control of itself surely it would say "Look chaps - this route looks rather weird. There appears to be a duplicate fix. I'll ignore it for now until you guys figure it out. In the meantime I'll carry on processing all the other flight plans". IMHO 'critical exception' and 'fail-safe mode' are spin for "it crashed".

Last edited by CBSITCB; 6th Sep 2023 at 22:40. Reason: Typo
CBSITCB is offline  
Old 6th Sep 2023, 22:28
  #294 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 479
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by CBSITCB
The airline sent the plan to IFPS which checked it was in the correct format (which it was) and accepted it. IFPS passed it to Swanwick at the appropriate time. There was an anomaly in the route (duplicate fixes) which by NATS' admission the FPRSA-R program logic couldn't handle. NATS says this "led to a ‘critical exception’ whereby both the primary system and its backup entered a fail-safe mode". Personally I find this hard to believe. If the FPRSA-R was still in control of itself surely it would say "Look chaps - this route looks rather weird. There appears to be a duplicate fix. I'll ignore it for now until you guys figure it out. In the meantime I'll carry on processing all the other flight plans". IMHO 'critical exception' and 'fail-safe mode' are spin for "it crashed".
I'm not sure it ever identified that there was a duplicate. It processed the route & got a result it didn't understand. I think the reason it got a result it didn't understand was identified later. But even if it didn't know why it got a strange result it should have been able to identify which plan gave that strange result, isolate it and move on.

Only the design authority can explain why it handled things the way it did. The nuclear option is appropriate for some errors but rather overkill in this case.

As you say, describing a crash of your system as failsafe is rather desperate spin.
eglnyt is offline  
Old 6th Sep 2023, 22:38
  #295 (permalink)  
 
Join Date: Mar 2016
Location: Location: Location
Posts: 59
Received 0 Likes on 0 Posts
Ryanair’s Michael O’Leary said [https://corporate.ryanair.com/news/ryanair-rejects-nats-whitewash-report/?market=en&fbclid=IwAR3prT8auG3wrZGt_sHH1fFvA0lQJAyJWUKm1HLpioYJBwVTP5bez-46-xU]:

“Finally, we do not accept NATS claim that it is “not within remit” to provide cost reimbursement to customers. Ryanair pays NATS almost €100m p.a. for an ATC service that is repeatedly short staffed and on the 28th Aug, collapsed altogether. The least NATS could and should do is to reimburse its airline customers for the tens of millions of pounds they have spent reimbursing passengers for their hotel, meals and transport expenses, which were entirely due to NATS system failure, and NATS backup system failure on Mon 28th Aug last.

If NATS fail to reimburse its customers for these expenses, then Secretary of Transport Mark Harper should intervene (as the largest shareholder in NATS), and instruct NATS to reimburse NATS airline customers for these right to care expenses.

This Report, which is full of false figures about flight cancellations and delays, and avoids any explanation of why NATS backup system failed so spectacularly will not solve this problem unless NATS accepts responsibility for its incompetence and reimburses airlines and passengers for the avoidable right to care expenses they suffered due to NATS failure on Mon 28th Aug last.”


I bet the contract and legal people at Frequentis are keeping their heads well down...
CBSITCB is offline  
Old 7th Sep 2023, 03:18
  #296 (permalink)  
 
Join Date: Dec 2011
Location: UK
Posts: 965
Received 1 Like on 1 Post
He has a very strong argument in my opinion, however all that will happen is the taxpayer footing the bill eventually. The Treasury paying out £10Ms to a company already making several billion in profit isn’t going to look great optically.
Dannyboy39 is offline  
Old 7th Sep 2023, 07:50
  #297 (permalink)  
 
Join Date: Apr 2018
Location: Sudbury, Suffolk
Posts: 256
Received 0 Likes on 0 Posts
Originally Posted by Dannyboy39
He has a very strong argument in my opinion, however all that will happen is the taxpayer footing the bill eventually. The Treasury paying out £10Ms to a company already making several billion in profit isn’t going to look great optically.
And that is a good point. Why SHOULD the general tax payer subsidise the commercial activities of providers of air transport?

I suspect the answer to this question lies in the revenue raised by taxation on these activities, by analogy with road fund licence and fuel duty BUT a) none of these revenue streams are hypothecated for the spend and b) the sorts of sums being claimed in compensation are beyond anything already budgeted for and so WOULD fall on the public purse.

Maninthebar is offline  
Old 7th Sep 2023, 07:58
  #298 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 479
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Dannyboy39
He has a very strong argument in my opinion, however all that will happen is the taxpayer footing the bill eventually. The Treasury paying out £10Ms to a company already making several billion in profit isn’t going to look great optically.
Does he have a strong argument?

Currently the "system" places the obligation to look after customers onto the airline, that obligation is limited in cases such as this one. Airlines know they have that obligation and have the opportunity to price that risk and factor it into the ticket price. Whether they do is entirely a business decision on their part, ATC failures are not unheard of so they can hardly claim them to be a total surprise when they happen.

In the UK at least the ANSP doesn't have that obligation and hasn't had the opportunity to price that risk and factor it into the fees.

I don't think it is right to change the rules of the game half way through even if you think the rules are wrong. If you want to change the rules you have to allow the parties to change their game plan based on the new rules. That would mean a discussion on what ATC fees should be to allow the ANSP to price in that risk.

Discuss
eglnyt is offline  
Old 7th Sep 2023, 13:15
  #299 (permalink)  
 
Join Date: Jun 2023
Location: EU
Posts: 13
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Flying Wild
12+ hour slot delays in some cases.
I had an 11-hour delay recently.
motardos is offline  
Old 7th Sep 2023, 14:00
  #300 (permalink)  
 
Join Date: Nov 2018
Location: UK
Posts: 82
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by eglnyt
Does he have a strong argument?

Currently the "system" places the obligation to look after customers onto the airline, that obligation is limited in cases such as this one. Airlines know they have that obligation and have the opportunity to price that risk and factor it into the ticket price. Whether they do is entirely a business decision on their part, ATC failures are not unheard of so they can hardly claim them to be a total surprise when they happen.

In the UK at least the ANSP doesn't have that obligation and hasn't had the opportunity to price that risk and factor it into the fees.

I don't think it is right to change the rules of the game half way through even if you think the rules are wrong. If you want to change the rules you have to allow the parties to change their game plan based on the new rules. That would mean a discussion on what ATC fees should be to allow the ANSP to price in that risk.

Discuss
Michael O'Leary hates the fact that he is forced to pay tens of millions for ATC services he believes are substandard, and that he has no choice. Ryanair has also consistently had over 50% of its flights into London delayed, while BA, a NATS shareholder, suffers almost no (ATC-related) delays. If Ryanair could, they would buy from abroad, but they can't, and the work practices elsewhere, e.g. in France, are arguably much worse. Hardly a great customer experience.
Neo380 is offline  


