U.K. NATS Systems Failure
I once had to deal with a major retail park where all the stores were reporting a similar problem at the same time. Their card transactions were not going through. I got called out at silly o'clock to go there and have a look, as the techy boys off site could not see the problem...
It was a wireless link to a 3G mast; someone had built a building nearby and blocked the signal...
The solution was a fibre cable run of about 3 miles... So we dug two different routes in case someone else decided to put a digger through one... and yes, someone did 😊
Multiple fibre routes to mission-critical buildings have been pretty much the norm for many years now; even if the building owner doesn't spec it, the comms provider is likely to insist. Sometimes there are more than two routes, but they do need to be clearly separated internally and externally.
CBSITCB gave a very eloquent and concise description of non-hardware failure modes, and software is usually a more difficult fix than hardware. However, in a really mission-critical situation a huge amount of effort has to go into software resilience (think Airbus), and for that reason it would seem close to nonsensical that mis-entered flight plans could cause such chaos.
I have worked with telecoms network management and switching systems that had very high levels of resilience. At the national or regional level of the telecoms network, which in today's world includes the internet, hardware really cannot be allowed to fail. To be fair, they do not have the functional complexity of a system like NATS has; they are mostly just switching paths around and managing digital data flows. We did have one system that used a comparator function, where two processors controlling one system constantly compared operating states. If a failure mode occurred, this would determine which system had in some way been corrupted and switch it out, leaving the unchanged processor in charge. However, we did have a short outage due to the comparator failing, so nothing's completely foolproof. Personally I think the Daily Mail and Express sabotaged it to stop people travelling to Europe or vice versa.
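For anyone wondering how a two-processor comparator can decide which side to switch out, here is a rough Python sketch. It is only an illustration under my own assumptions, not how any real telecoms or ATC system does it: the `Processor` class, the CRC-based `self_check` and the `compare_and_select` function are all hypothetical names. The idea is that a mismatch alone only tells you something is wrong; each side also needs some integrity check of its own to show which one is corrupted.

```python
from dataclasses import dataclass
from typing import Optional
import zlib


@dataclass
class Processor:
    name: str
    state: bytes    # snapshot of the operating state (hypothetical format)
    checksum: int   # CRC recorded when the state was last legitimately written

    def self_check(self) -> bool:
        # Assumed integrity test: does the state still match its recorded CRC?
        return zlib.crc32(self.state) == self.checksum


def make_processor(name: str, state: bytes) -> Processor:
    return Processor(name, state, zlib.crc32(state))


def compare_and_select(a: Processor, b: Processor) -> Optional[Processor]:
    """Return the processor to leave in charge, or None if undecidable."""
    if a.state == b.state:
        return a                    # states agree, nothing to switch out
    a_ok, b_ok = a.self_check(), b.self_check()
    if a_ok and not b_ok:
        return a                    # B is corrupted, switch it out
    if b_ok and not a_ok:
        return b                    # A is corrupted, switch it out
    return None                     # both pass or both fail: cannot decide


# Usage: corrupt B's state without updating its checksum.
a = make_processor("A", b"path-map-1")
b = make_processor("B", b"path-map-1")
b.state = b"path-map-X"
survivor = compare_and_select(a, b)
print(survivor.name if survivor else "no trustworthy processor")
```

Note that the `None` branch, and the comparator itself, remain single points of failure, which fits the short outage mentioned above.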
DRUK said he doesn't believe the reason provided by NATS. I too find it inconceivable in this instance. See ATC Watcher's post #52 for an explanation of why the NATS statement is at least highly suspect. Many of us who have significant experience in this realm (as does ATC Watcher) would call "BS" on NATS in this case.
Join Date: Oct 2004
Location: Southern England
Posts: 466
If we assume the problem was with NAS, then the explanation of an "unusual" flight plan as the initiator is not inconceivable; that system has a 40-year history of similar events. Read the lengthy contributions above from those with an understanding of the system to see why. What that doesn't explain is why the controls & processes which have controlled that risk for the last 20 years didn't do so yesterday.
Join Date: Sep 2007
Location: Surrey
Posts: 2,251
I worked on a system at a large airport that crashed every time the QNH reached something like 1037.5. A rare enough event and I am full of admiration for whoever it was who spotted the link between all the failures. I would have said it was a system that had no need to know about QNH.

Join Date: Oct 2004
Location: Southern England
Posts: 466
Correct. I have no idea yet whether it was an issue with NAS or a French flight plan. But if that was the case, it is a (probably unusual) flight plan that has been through multiple layers of validation and happens to expose an issue in the FDP system. The flight plan is the initiator and, using some methods, a cause, but not the real issue.
Government's immediate response: 'It's not a cyber attack'
NATS's somewhat later response: 'We don't know what caused it yet'
Perhaps HMG jumping the gun, i.e. nothing to do with us.
Can out-of-pocket pax sue NATS? Are members of the 'Airline Group' owning 42% of NATS liable even if the airlines are not? After all, they are a significant part of the management.
Join Date: Oct 2004
Location: Southern England
Posts: 466
This is all quite deliberate. NATS was privatised just after the Hatfield rail crash, where penalties for service standards had produced an adverse effect on safety. The Government of the day wanted to avoid any financial imperative that would have any effect on day-to-day operational decisions affecting safety. Hence yesterday the flow regulation could be imposed purely with regard to safety.
Note that the Airline Group is a separate entity from the airlines who hold shares in it; indeed, not all shareholders in the Airline Group are airlines. The Airline Group is a PLC, limiting the liability of its shareholders, and in turn NATS is a collection of limited companies, limiting the liability of its shareholders.
Join Date: Nov 2001
Location: Delta of Venus
Posts: 2,383
As I recall, Eurocontrol had then, and has now, its system totally backed up and duplicated in two different control centres. As others have said, the UK has deteriorating infrastructure (or rather a lack of it from the get-go), but who pays to improve it, and why should they pay?
Join Date: Dec 2007
Location: uk
Posts: 96
Some interesting discussion on this forum: https://forums.theregister.com/forum...hts_disrupted/
Join Date: Nov 2000
Location: Canada
Posts: 594
UK air travel disruption may last for days, says British transport minister Mark Harper
https://www.firstpost.com/world/uk-air-travel-disruption-may-last-for-days-says-british-transport-minister-mark-harper-13053662.html
Join Date: Oct 2004
Location: Southern England
Posts: 466
NATS has a lot of redundancy and duplication and is currently investing in even more but please read the great explanations by others above as to why it might not always help.
Join Date: Nov 2001
Location: Delta of Venus
Posts: 2,383
Would that be the system that failed in 2018 when they thought it had switched to the other centre but hadn't, and because the phones didn't switch over as expected, nobody could ring them to tell them?

Pegase Driver
Join Date: May 1997
Location: Europe
Age: 73
Posts: 3,555
As far as I know at this time, no-one really knows what caused the system to crash (the root cause, I mean), but likely by tomorrow or in the coming days we'll know. From my experience over the last 40 years or so, flight plan processing systems (FDPs) generally crash following a system update: one line of programming is wrong, and when an external factor comes in, it causes the issue. This could happen at any time, generally when the system is peaking. Typically system updates are done at night, tested for a few hours, then, if OK, put on line in the morning. At least that's how we do it in most centers; I've done it for years in my own center. If it crashed later in the day, we just reverted to the previous level, which is on standby on the backup computers; the whole thing takes no more than minutes, or an hour max, to be back to normal. When it takes half a day or more, then something is wrong with your processes or your system architecture. It could also simply be the result of cost-cutting measures, like not replacing backup computers, or outsourcing maintenance and code writing to faraway countries with cheaper labor, etc. I am not saying that this was the case here at NATS, but I have seen this happening in other places recently.
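To make the update-and-revert routine above concrete, here is a toy Python sketch. It is only my own illustration of the idea of keeping the previous software level on standby, not how any real FDP is deployed; `FdpCluster` and the level names are made up.

```python
from dataclasses import dataclass


@dataclass
class FdpCluster:
    active: str    # software level currently on line (hypothetical label)
    standby: str   # previous level kept ready on the backup computers

    def deploy(self, new_level: str) -> None:
        # After the overnight test, the new level goes on line and the
        # outgoing level is kept as the standby.
        self.standby, self.active = self.active, new_level

    def revert(self) -> str:
        # If the new level crashes later in the day, swap back to the
        # previous level instead of debugging live.
        self.active, self.standby = self.standby, self.active
        return self.active


cluster = FdpCluster(active="level-41", standby="level-40")
cluster.deploy("level-42")   # morning: new level put on line
cluster.revert()             # afternoon crash: back on level-41
```

The quick recovery only works if the standby level is still a known-good one, which is why neglected backup computers matter so much.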
Finally, FDP system failures are not a unique UK/NATS issue: Geneva had a major failure some time ago, Brussels a couple of years back, etc. Even Roma had one yesterday, at the same time as London, so we feared a wider cyber attack. But so far it looks like the two were not connected. If the investigation later shows they were, then we really are in the sh*t.