All London airspace closed
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
CAA announce independent enquiry:
Independent inquiry into air traffic control failure announced | CAA Newsroom | About the CAA
The UK Civil Aviation Authority (CAA) and NATS have agreed to the establishment of an independent inquiry following the disruption caused by the failure in air traffic management systems on the afternoon of Friday 12th December 2014.
The CAA will, in consultation with NATS, appoint an independent chair of the panel which will consist of NATS technical experts, a board member from the CAA and independent experts on information technology, air traffic management and operational resilience. The full terms of reference will be published following consultation with interested parties including airlines and consumer groups but it is expected that the review will cover, as a minimum:
1. The root causes of the incident on Friday
2. NATS’ handling of that incident to minimise disruption without compromising safety
3. Whether the lessons identified in the review of the disruption in December 2013 have been fully embedded and were effective in this most recent incident
4. A review of the levels of resilience and service that should be expected across the air traffic network taking into account relevant international benchmarks
5. Further measures to avoid technology or process failures in this critical national infrastructure and reduce the impact of any unavoidable disruption
For more information, please contact the CAA Press Office on 020 7453 6030, or out of hours on 07789745636.
Everyone seems to blame it on "the computer", but no really understandable technical description is being offered, so it is difficult to comment. And "it's old" is certainly not a technical description; what is needed instead is an explanation of why the system worked satisfactorily for so long before encountering an issue, and what caused the issue to manifest itself now.
But that's tech. As I understand it, there was an outage for an hour or so. Aviation is of course well used to hour-long hold-ups for a wide variety of reasons. What REALLY needs to be investigated is why it then took so long for normality to be restored: there were still significant BA cancellations the following day.
This is something which increasingly afflicts not only aviation but also other transport modes such as rail and road: the time taken to recover service after an incident keeps going up. It seems that NATS have been on a substantial staff reduction exercise; it is at moments like this that you find out those staff were actually doing something. Likewise, the inability of some airlines (not all) to recover from such situations is a matter for them, not something to be simply pinned on the ATC provider. Being able to blame everything on the "knock-on effect" is a glorious excuse for slowness and inertia, rather than trying really hard to get things straight again quickly. And that has nothing to do with computers.
Most notable of all are the calls in the press for "investment" in replacement computers. Goodness me, the IT salesmen must be smacking their lips at this early Christmas present, and a few well-placed stories from their PR teams with media contacts while this is Hot News doubtless work wonders as well. Time and again the high-level know-nothings talk themselves into spending money on new kit rather than dealing with the operational procedures and management, which are the real issue. It's just like airport security: 10 security stations provided, of which only 3 are staffed even at peak times, and a 30-minute queue. After many complaints, what's the solution? More security stations, of course.
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Doesn't "The Register" article imply that a combination of circumstances prevented the usual response to a failure of the flight data processing system (which holds the database of filed flight plans), namely getting it linked back to the central flight server (which holds the radar data) within a critically short period? It sounds as though failures of the flight data system are by no means uncommon, but NATS is well rehearsed in getting it back on the road. Unfortunately, a separate problem with the link meant the deadline was missed, so the radar system flagged that it was holding only stale data, forcing a switch to lower-capacity procedures.
I wonder which system has the "delinquent" line of code?
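The failure sequence described above behaves like a staleness watchdog: if the flight data link is not restored within a fixed window, the downstream system stops trusting its data and falls back to lower-capacity procedures. A minimal sketch, assuming a single fixed window; the class name, the 300-second value and the mode names are hypothetical and are not taken from NATS or The Register.

```python
from dataclasses import dataclass


@dataclass
class LinkWatchdog:
    """Hypothetical staleness watchdog for a data link.

    If no fresh update has arrived within `max_staleness_s` seconds,
    the data is treated as stale and a fallback mode is required.
    """
    max_staleness_s: float
    last_update_s: float = 0.0

    def record_update(self, now_s: float) -> None:
        # Called whenever the (hypothetical) flight data link delivers fresh data.
        self.last_update_s = now_s

    def mode(self, now_s: float) -> str:
        # Inside the window: carry on normally.
        if now_s - self.last_update_s <= self.max_staleness_s:
            return "normal"
        # Deadline missed: data is stale, so drop to reduced-capacity procedures.
        return "reduced_capacity"


wd = LinkWatchdog(max_staleness_s=300.0)
wd.record_update(0.0)
print(wd.mode(120.0))  # link restored in time
print(wd.mode(400.0))  # deadline missed
```

The point of the sketch is that recovery speed matters as much as recovery itself: the same fault handled at 120 seconds costs nothing, while at 400 seconds it forces the fallback.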
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
A previous failure of the link between the National Airspace System and the National Flight Data Processing System in 2008 seems to have been reported in Computer Weekly:
Failure of Swanwick comms link leads to flight delays
Join Date: Mar 2001
Location: etha
Posts: 300
In any exceptional event in the UK that affects flow control, the knock-on effect normally does take days to recover from. When Heathrow loses both runways for 15 minutes and then operates on a single runway for a further hour, the delays can be felt for up to 3 days afterwards. Exactly this occurred following the emergency return of the Oslo-bound BA flight after it lost its engine cowlings on departure. It would have made no difference if Heathrow had had more runways in operation at the time: landings and departures were stopped purely for SAFETY. That is exactly what happened at Swanwick: measures were taken to provide ultimate safety, and then, as the system recovered, traffic was gradually increased again.
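Why a short capacity loss takes so much longer to recover from can be illustrated with simple backlog arithmetic. A minimal sketch, assuming constant hourly demand and fixed hourly capacities; the function name and the numbers in the example are hypothetical and are not Heathrow or Swanwick figures.

```python
from typing import List


def backlog_profile(demand_per_hr: int, normal_cap_per_hr: int,
                    reduced_cap_per_hr: int, reduced_hours: int,
                    horizon_hours: int) -> List[int]:
    """Queue of delayed movements at the end of each hour.

    Capacity is reduced for the first `reduced_hours`, then restored.
    """
    backlog = 0
    profile: List[int] = []
    for hour in range(horizon_hours):
        cap = reduced_cap_per_hr if hour < reduced_hours else normal_cap_per_hr
        # Waiting = previous backlog + new demand - what can be handled this hour.
        backlog = max(0, backlog + demand_per_hr - cap)
        profile.append(backlog)
    return profile


print(backlog_profile(40, 44, 20, 1, 8))
```

With demand of 40 movements an hour against a normal capacity of 44, cutting capacity to 20 for a single hour leaves a backlog of 20 that is cleared at only 4 an hour, giving `[20, 16, 12, 8, 4, 0, 0, 0]`: five hours to recover from a one-hour event. The closer an airport runs to capacity, the smaller that spare margin and the longer the tail. The sketch ignores crew duty limits, aircraft rotations and night restrictions, all of which stretch recovery further in practice.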
Who is at fault? Anyone who has an open mind will see exactly what the main UK ATC union has to say on the subject here on the Prospect website
Join Date: Mar 2007
Location: In my head
Posts: 694
Originally Posted by DelPrado
Slip and Turn, you have proven time and again on ATC threads that you don't know what you're talking about.
I recall in the past you trying to argue separation standards with a Heathrow tower controller.
Originally Posted by Gonzo
s&t,
Any shortfall in the pension, and there has been such over the past few years, has been met by increased employer contributions and changes in staff T&Cs.
Not exactly the best transparency for a publicly owned entity, is it? Bits here and there and no-one really volunteering the big picture ... ?
A paragraph written 4½ years ago in that Government Actuary's Dept. document said this:
Originally Posted by (elsewhere) in a GAD document for CAA
The NATS scheme’s benefits are more generous than those provided by typical UK private sector DB schemes. Approximate calculations suggest that, if the NATS scheme’s benefits were to be more typical, the employer’s standard contribution rate could be around 25% of pay, compared to the actual rate of 37% of pay. The purpose of this calculation is solely to illustrate the broad effect of the level of the NATS scheme’s benefits on NERL’s projected contributions. We have not been asked to comment on the reasonableness of the level of the scheme’s benefits. We recognise that the NATS scheme’s benefits reflect the scheme’s public sector origins and protections put in place at privatisation.
And is the NERL pension expenditure the lion's share of NATS overall pension commitments or is there more buried elsewhere in the various books?
Last edited by slip and turn; 15th Dec 2014 at 22:40.
Join Date: Jan 2008
Location: The foot of Mt. Belzoni.
Posts: 2,001
Slip and WHBM,
Last Friday, the folks wearing headsets and their immediate operational managers gave it their best shot.
Something happened that wasn't supposed to happen, and as a result, no-one died.
That's what it's all about.
Folks, this whole stupid thread has been a trolling expedition, with a troublemaker who knows how to tweak sensibilities here.
Don't feed the troll! I don't think I have read such stupidity on this board before. It is designed to cause a hysterical reaction. Ignore.
Join Date: Feb 2006
Location: Hants
Posts: 2,295
S&T
Far be it from me to feed a troll, but:
You claim to be able to read accounts.
You claim to understand pensions.
You fail to understand that HMG's 49% holding does not mean that we receive money from the taxpayer to bolster the pension or for anything else, not even for investment in new equipment.
NATS pays for the pension contributions through employee payments and from company gross earnings.
Any investment in future equipment etc. is paid for either directly from earnings or from business loans.
Instead of 'propping up NATS', HMG saddled NATS with a large loan, which meant that HM Treasury pocketed over £600M during the PPP, while NATS has to service the debt.
As for knowledge of equipment etc.: it was the modern, new equipment that caused this failure. The oldest technology, found in TC, was completely unaffected.
I'm sure you won't understand how we could still have had to reduce flow, but I've fed you enough. Someone else might like to explain to you why we needed the restrictions even though TC was able to operate normally... I'm certain you won't know, despite your protestations about your wealth of knowledge.
Join Date: Apr 2014
Location: London
Posts: 148
Select committee to hold Deakin's feet to the fire:
Committee to question NATS and CAA over failure in air traffic management system - News from Parliament - UK Parliament
It might be on the UK "Parliament Channel", Freeview 131.
http://www.bbc.co.uk/parliament/prog...les/2014/12/19
Last edited by Downwind Lander; 16th Dec 2014 at 15:52.
NATS does not currently benefit from Govt support for its pension plan.
Most of the actual pensioners in the pre-privatisation phase were put in the CAA's part of the plan.
The real cost was in the privatisation process, when HMG stuffed the CAA and NATS pension funds full of taxpayers' money. That was because, prior to privatisation, the liabilities had been HMG's.
However, if either the NATS or the CAA's pension plans hit problems, they will rush to HMG for more money. It's one of those "Is the Pope a Catholic" sort of questions.
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
It's called error handling, and it is an absolutely critical part of any computer program. If a line of code receives unanticipated data (which may not be 'bad' per se), that unforeseen use case needs to have been foreseen by whoever put together the program spec, whoever agreed the program spec, whoever designed the logic intended to handle it flawlessly in the code, whoever checked it, whoever tested it, and whoever signed off on the project, module or upgrade. We all now know that one of them, or all six, or sixty, was mistaken. And there's the rub.
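A minimal sketch of the defensive style described above: instead of letting one unanticipated message raise an unhandled exception, the parser validates each field and returns a rejection reason. The two-field "CALLSIGN FLIGHTLEVEL" format, the length and range limits, and all names are hypothetical; they do not describe the real NAS or NFDPS message formats.

```python
from typing import NamedTuple, Optional, Tuple


class FlightUpdate(NamedTuple):
    callsign: str
    flight_level: int


def parse_update(raw: str) -> Tuple[Optional[FlightUpdate], Optional[str]]:
    """Parse a hypothetical 'CALLSIGN FLIGHTLEVEL' message.

    Returns (update, None) on success or (None, reason) on rejection,
    so a single unexpected message cannot take the whole process down.
    """
    parts = raw.split()
    if len(parts) != 2:
        return None, f"expected 2 fields, got {len(parts)}"
    callsign, level_text = parts
    # Hypothetical callsign rule: 2 to 7 alphanumeric characters.
    if not (2 <= len(callsign) <= 7 and callsign.isalnum()):
        return None, f"bad callsign {callsign!r}"
    try:
        level = int(level_text)
    except ValueError:
        return None, f"non-numeric flight level {level_text!r}"
    # Hypothetical plausibility bounds for a flight level.
    if not 0 <= level <= 999:
        return None, f"flight level {level} out of range"
    return FlightUpdate(callsign.upper(), level), None
```

For example, `parse_update("BAW123 350")` yields an update, while `parse_update("BAW123 FL35O")` is rejected with a reason that an operator can act on, rather than crashing the caller.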
So, now that we are told a single line of code stopped the machine, what actually was it in the real-time, real-life world that was unforeseen? That would be the real story.
If I were anotherthing or Gonzo or Zooker or eglnyt et al., I'd have asked that one at the office by now.
So the 9020D had 3 input/output processing IBM 360s and 3 compute-element IBM 360s, all running at an impressive 300,000 integer instructions a second.
Now, the architecture is what made the system reliable. It was a multiprocessor, multiprogramming system, and any program that was pre-empted could be picked up by another processor. The system repeatedly recorded checkpoint recovery data, from once a second out to a few minutes. So if the computer detected an error (the kind of thing that would give a BSOD on a PC), the IBM 360 involved would stop all the other processors and give them the checkpoint data, and all the processors would rerun precisely the same program and data. If only one of the processors got the error, then the error must be hardware in that processor, and it put itself offline. If all the processors got the error, then the error must be software, and the 9020 did a core dump (a large hexadecimal printout), threw away all its input messages and then restarted (startover) from a clean checkpoint, say 3 minutes earlier.
As software faults in a real-time system are normally timing/pre-emption related or caused by a broken input message, the system would normally start over successfully. Controllers would receive a message 'STARTOVER at time - please re-input any messages' (or words to that effect). If Gork put in the broken message again, it could cause the startover again. However, the data systems specialist would be looking at the last messages in, would identify Gork's message, and would somewhat testily suggest that he not re-enter it next time.
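The 9020 recovery logic described above (rerun the same work from a checkpoint on every processor, then decide from the pattern of failures whether the fault is hardware or software) can be sketched as follows. This is a minimal illustration, not the 9020's actual implementation: processors are modelled as Python callables, the function name and result strings are hypothetical, the post's "only one processor fails" rule is generalised to "some but not all", and the branch where no processor fails on the rerun is an added assumption the post does not cover.

```python
from typing import Callable, Dict, List


def classify_fault(processors: Dict[str, Callable[[dict], object]],
                   checkpoint: dict) -> str:
    """Rerun identical work from one checkpoint on every processor and classify the fault.

    - error on some but not all processors -> hardware: take those offline
    - error on every processor             -> software: dump and start over
    - no error on rerun                    -> transient (assumption): resume
    """
    failed: List[str] = []
    for name, run in processors.items():
        try:
            # Each processor gets its own copy of the same checkpoint state.
            run(dict(checkpoint))
        except Exception:
            # Any error counts; we only care which processors reproduce it.
            failed.append(name)
    if not failed:
        return "transient: resume"
    if len(failed) == len(processors):
        return "software: dump, discard queued input, restart from clean checkpoint"
    return "hardware: take offline " + ", ".join(sorted(failed))
```

If a single processor has a hardware fault, only it fails on the rerun and it is isolated; a bad input message fails identically everywhere, which is why the startover path also discards the queued input.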
OK, so now the system is rehosted as a virtual machine inside a nice shiny new machine. A lot of the automated recovery that was built in may not work quite that way (I don't know how that is now implemented), so I rather think that it may take more manual intervention if the Host software has a glitch.
Join Date: Apr 2014
Location: London
Posts: 148
Transcript of Monday's Transport Select Committee meeting with McLoughlin.
http://data.parliament.uk/writtenevi...oral/16712.pdf
Video of interviews with Deakin, Rolfe and Haines.
London air traffic control failure examined - News from Parliament - UK Parliament
Report to be out by the beginning of April.
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Draft terms of reference for the inquiry have appeared at
Terms of reference | Regulatory Policy | About the CAA
It mentions
"including the measures that had been put in place to prepare for routine changes to systems that had occurred on the 11 December 2014 date for the regulated changes to aeronautical information (the AIRAC date) and for the move of additional workstations to support the military task that was re-locating from Prestwick."
and
"- The preparation and testing of premeditated operational/engineering changes to systems and procedures planned to take place on or about regular AIRAC dates or in association with particular infrastructure changes."
which is the first time I have seen system changes cited as a potential contributory cause.
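The terms of reference tie the run-up to the incident to the AIRAC date. AIRAC changes take effect on a fixed, internationally agreed 28-day cycle, so every effective date can be computed from any known one. A minimal sketch, using 11 December 2014 from the quoted text as the reference date; the constant and function names are hypothetical.

```python
from datetime import date, timedelta

# Reference AIRAC effective date, taken from the draft terms of reference.
AIRAC_REFERENCE = date(2014, 12, 11)
# AIRAC effective dates recur every 28 days.
AIRAC_CYCLE = timedelta(days=28)


def is_airac_date(d: date) -> bool:
    """True if `d` falls exactly on a 28-day AIRAC boundary."""
    return (d - AIRAC_REFERENCE).days % AIRAC_CYCLE.days == 0


def next_airac_date(d: date) -> date:
    """First AIRAC effective date strictly after `d`."""
    # Python's % always returns a non-negative result here, so dates
    # before the reference work too.
    offset = (d - AIRAC_REFERENCE).days % AIRAC_CYCLE.days
    return d + timedelta(days=AIRAC_CYCLE.days - offset)
```

For example, `next_airac_date(date(2014, 12, 12))` gives 8 January 2015, the next scheduled change after the incident.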
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
The Flight Data Processing software uses an 'Adaptation Controlled Environment System', which describes the airspace and a huge number of other parameters used by the FDPS as it runs. It is feasible that an error in an adaptation update could lead to a program crash, but these changes are usually very carefully controlled. NATS has had a lot of practice doing them since the seventies.
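One general way to stop a bad adaptation update from crashing a running system is to check the whole dataset for internal consistency before it is activated. A minimal sketch, assuming a much-simplified hypothetical dataset of fixes, routes and sector ownership; it does not reflect the real structure of the Adaptation Controlled Environment System, and all names are illustrative.

```python
from typing import Dict, List, Set


def validate_adaptation(fixes: Set[str],
                        routes: Dict[str, List[str]],
                        sector_of_fix: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the dataset may be activated.

    Hypothetical checks: every route uses only known fixes, and every
    fix is owned by some sector.
    """
    problems: List[str] = []
    for route, points in routes.items():
        for fix in points:
            if fix not in fixes:
                problems.append(f"route {route}: unknown fix {fix}")
    for fix in sorted(fixes):
        if fix not in sector_of_fix:
            problems.append(f"fix {fix}: no owning sector")
    return problems
```

Rejecting the update offline, with a list of reasons, is far cheaper than discovering a dangling reference when live traffic first exercises it.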
Join Date: May 2002
Location: Manchester uk
Posts: 2
LATCC NAS Software
As the Software Engineer responsible for acceptance of the 9020 software by NATS in 1974, I am concerned if it is still in use. Constant modification in dead computer languages and re-hosting over 30 years are not conducive to reliability. Part of the problem was, and maybe still is, NATS management's inability to plan the next generation of systems successfully whilst implementing the previous generation.
I think this whole incident shows something of the media's overreaction to anything to do with the aviation industry.
Of course it was a serious problem, and of course there will be management failings, because that's life: management effectively means making do, not being perfect, because perfect is always unrealistically expensive.
Since the incident occurred, I have heard every single day of train services in the London area being disrupted by signal failures between x and y. Signalling is the ATC of the rail industry: it is an essential safety feature, and it is complicated. A lot of it is also quite old.
If you add up all the signal failures on the national and urban rail networks this year, I would bet that they caused more inconvenience and more delay to many more people than the other week's problem.
Is there a call for an inquiry? Are the heads of the relevant service providers summoned to Westminster? No, they were not, but they probably should have been, because they caused at least as much chaos, just over a longer time span.