
All London airspace closed

ATC Issues A place where pilots may enter the 'lions den' that is Air Traffic Control in complete safety and find out the answers to all those obscure topics which you always wanted to know the answer to but were afraid to ask.

Old 18th Dec 2014, 16:16
  #121 (permalink)  
 
Join Date: Apr 2014
Location: London
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
stanprice says: "As the Software Engineer responsible for acceptance of the 9020 software by NATS in 1974..."

Who says there isn't a God? This is what I asked for in #56, page 3.

It might be a good idea, Stanprice, if you were to approach the Transport Select Committee (#117, page 6) with a view to sending in evidence, or possibly attending a session.

What other little surprises could this software have in store?
Downwind Lander is offline  
Old 18th Dec 2014, 16:40
  #122 (permalink)  
 
Join Date: Oct 2002
Location: London UK
Posts: 7,659
Likes: 0
Received 19 Likes on 16 Posts
We're still going down the wrong route.

This failure, a unique one, lasted less than an hour before it was overcome. Someone must have understood it and overcome it in that time. A 60-minute hangup is standard stuff in commercial aviation (on BA at a Heathrow gate you might still be waiting for engineering to make their way across after this time). The real issue was the lack of resilience in infrastructure and operations, and poor information dissemination, which let the whole thing just drag on and on. As I have pointed out previously, BA was still doing gross cancellations, 14 short haul departures from Heathrow alone, the following day, after this fleet had stood all night. This just shows an operation planned with insufficient flexibility to cope with disruption from any source.
WHBM is offline  
Old 18th Dec 2014, 17:02
  #123 (permalink)  
 
Join Date: Jan 2008
Location: Reading, UK
Posts: 15,822
Received 206 Likes on 94 Posts
Originally Posted by Downwind Lander
It might be a good idea, Stanprice, if you were to approach the Transport Select Committee (#117, page 6) with a view to sending in evidence, or possibly attending a session.
In the Transport Committee hearings on Monday it was stated that the independent [sic] joint CAA/NATS inquiry would take evidence from any interested parties.

Originally Posted by WHBM
The real issue was the lack of resilience in infrastructure and operations, and poor information dissemination, which let the whole thing just drag on and on. As I have pointed out previously, BA was still doing gross cancellations, 14 short haul departures from Heathrow alone, the following day, after this fleet had stood all night. This just shows an operation planned with insufficient flexibility to cope with disruption from any source.
That's a little hard on BA, though maybe you didn't intend it to sound like that.

Any network carrier that suffers a prolonged period of greatly reduced capacity at its principal shorthaul hub is going to struggle to get its operation back on schedule by the end of the day.

Last edited by DaveReidUK; 18th Dec 2014 at 19:44. Reason: typo
DaveReidUK is offline  
Old 18th Dec 2014, 17:21
  #124 (permalink)  
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by WHBM
We're still going down the wrong route.

This failure, a unique one, lasted less than an hour before it was overcome. Someone must have understood it and overcome it in that time. A 60-minute hangup is standard stuff in commercial aviation (on BA at a Heathrow gate you might still be waiting for engineering to make their way across after this time). The real issue was the lack of resilience in infrastructure and operations, and poor information dissemination, which let the whole thing just drag on and on. As I have pointed out previously, BA was still doing gross cancellations, 14 short haul departures from Heathrow alone, the following day, after this fleet had stood all night. This just shows an operation planned with insufficient flexibility to cope with disruption from any source.
Lack of flexibility isn't the cause; it's running a system at 105%, which gives no leeway to recover from the smallest delays. System engineers would normally design overcapacity into a system so that the loading is not normally more than 30%, which allows peak loads in excess of that to be coped with and the system to recover. For various reasons ATM systems worldwide tend to be loaded to the level where the smallest event causes a cascade of disruption across the network. It is a tribute to the work of dispatchers, controllers and flow controllers that the system recovers as well as it does; theoretically it should take a lot longer.
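
As a rough illustration of why loading matters, here is a back-of-envelope sketch. It is a simple fluid model, not anything taken from NATS or any real ATM system: it assumes demand arrives at a steady fraction of capacity, nothing is handled during the outage, and the backlog is then worked off at full capacity. The function name and parameters are made up for the example.

```python
import math


def recovery_minutes(outage_minutes: float, utilisation: float) -> float:
    """Minutes needed to clear the backlog built up during an outage.

    Hypothetical fluid model: demand arrives at `utilisation` times
    capacity, the outage serves nothing, and afterwards the system runs
    at full capacity while normal demand keeps arriving.
    """
    if outage_minutes < 0 or utilisation < 0:
        raise ValueError("outage_minutes and utilisation must be non-negative")
    if utilisation >= 1.0:
        # Demand meets or exceeds capacity, so the queue never drains.
        return math.inf
    # Backlog = utilisation * outage; spare capacity = 1 - utilisation.
    return outage_minutes * utilisation / (1.0 - utilisation)


if __name__ == "__main__":
    for load in (0.30, 0.80, 0.95, 1.05):
        print(f"load {load:.0%}: {recovery_minutes(45, load):.0f} min to recover")
```

Under these assumptions a 45-minute outage at 30% loading clears in under 20 minutes, at 95% it takes over 14 hours, and at 105% the backlog never clears unless demand is cut.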
Ian W is offline  
Old 18th Dec 2014, 18:45
  #125 (permalink)  
 
Join Date: Dec 1999
Location: LHR/EGLL
Age: 45
Posts: 4,392
Likes: 0
Received 0 Likes on 0 Posts
I stand to be corrected but there are many on this thread who appear to know what the cause of the problem was, and/or the piece of software involved.

Not sure that's been released yet, has it?
Gonzo is offline  
Old 18th Dec 2014, 19:28
  #126 (permalink)  
 
Join Date: Mar 2001
Location: etha
Posts: 300
Likes: 0
Received 0 Likes on 0 Posts
I think it is very obvious, Gonzo, that those wishing to stir up trouble regarding the failure have very little knowledge of what they are talking about. If any of them spent the time reading or watching factual evidence then they would come across better informed.

For those that don't feel it necessary to watch the hour-long parliament meeting linked above, it was stated in that meeting that the software that caused the issue was developed in the 1990s. I'm sorry if that fact gets in the way of your scaremongering.
zonoma is offline  
Old 18th Dec 2014, 19:43
  #127 (permalink)  
 
Join Date: Jan 2008
Location: Reading, UK
Posts: 15,822
Received 206 Likes on 94 Posts
Not sure that's been released yet, has it?
The investigation report will be out "by March", according to Deakin.
DaveReidUK is offline  
Old 18th Dec 2014, 21:06
  #128 (permalink)  
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
There is a term that used to be in vogue called 'software corrosion'. Loosely, this is the effect of multiple patches on software leading to unexpected errors, which of course lead to more patches. In the late 1980s there was concern that the NAS Host software (the Flight Data Processing software) could suffer from this. A consultancy company was called in, which did a thorough audit of the FDPS and NAS Host software. It was found that the approach taken to software maintenance by the Support and Development Organization (as it was then known) had actually improved the software design and there was no evidence of overlapping patches causing issues. Indeed, from the statistics it is normally interfacing systems that cause issues, not the Host software itself.
Jovial may be a rather antediluvian language but it is extremely powerful, in many ways better than C or C++, and for the processing done in NAS Host it is probably unmatched in suitability. The Host software has now bedded in over forty years and is an extremely reliable set of programs. Any software house that thinks it will be simple to replace should not apply for the job, as they do not understand it.
Ian W is offline  
Old 18th Dec 2014, 21:16
  #129 (permalink)  
 
Join Date: Feb 2006
Location: southampton
Posts: 228
Likes: 0
Received 0 Likes on 0 Posts
NAS didn't fail. If it had, then Scottish and TC would also have broken. It was part of the NERC kit in LAC that failed, which is part of the 1990s software.
1985 is offline  
Old 18th Dec 2014, 23:43
  #130 (permalink)  
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by 1985
NAS didn't fail. If it had, then Scottish and TC would also have broken. It was part of the NERC kit in LAC that failed, which is part of the 1990s software.
I would guess it was the interface to NAS Host from that NERC (New En-Route Centre) kit that was the problem, sending some unexpected broken message, one of a set that cannot be rejected or referred rejected.

But let's wait and see. Although it would be interesting to be there working it out, I would expect that all the system engineers already know precisely what happened and who wrote the code responsible.
Ian W is offline  
Old 19th Dec 2014, 08:05
  #131 (permalink)  
 
Join Date: May 2002
Location: uk
Posts: 314
Likes: 0
Received 0 Likes on 0 Posts
I always enjoy reading PPRuNe, and though I am a mere PPL I always get a chuckle out of some of the posts by people who obviously have no understanding of how things work but feel the need to post anyway. And now we have a thread about my great love, aviation, and my lifetime career subject of software development - boy have I had some laughs.

I don't know anything about NATS software or the specific incident but I have worked with big computer systems for 40 years. Of course they'll go wrong - it is absolutely impossible to test for all scenarios and have every single combination of events covered so there is never an outage. A well written system will handle these occurrences with the minimum impact to anyone or anything around them.

It appears NATS had a 45 minute outage, during which time no plane plummeted onto a school/hospital (delete as appropriate). I would say this is a success. The fact that so many people were late for their holidays is not related to this in any real sense - it is, as many people have posted, related to the fact that there is no capacity to absorb the inevitable delays as a result of this outage. Could be a software error, could be a MAYDAY, could be an ash cloud.

I don't know anything about the head of NATS either, but hauling him in front of MPs to explain a 45 minute software outage is one of the most ridiculous things I've heard for a long time. If London had been littered with the carcasses of 747s then maybe he would have something to explain.

Too many people these days install Windows on a laptop and think they know as much about software development as professional programmers - they don't.
vancouv is offline  
Old 19th Dec 2014, 08:28
  #132 (permalink)  
 
Join Date: Jan 2009
Location: HANTS
Posts: 193
Likes: 0
Received 0 Likes on 0 Posts
Well said. The system fell down, the system kept things safe.
GAPSTER is offline  
Old 19th Dec 2014, 08:43
  #133 (permalink)  
 
Join Date: Feb 2006
Location: Hants
Posts: 2,295
Likes: 0
Received 0 Likes on 0 Posts
Ian W, your guess is wrong...

WHBM - lack of information? The airlines have applauded NATS for the flow of information... maybe that didn't filter down to the shop floor.

Friday was a slow news day, which is why the media had a field day. There was more disruption yesterday due to bad weather, but that doesn't get a mention....
anotherthing is offline  
Old 19th Dec 2014, 09:52
  #134 (permalink)  
 
Join Date: Mar 1999
Location: big green wheely bin
Posts: 905
Likes: 0
Received 18 Likes on 1 Post
There could be a job for you, Stan, want to move to Swanwick?
Sounds like they could do with the help!
Jonty is offline  
Old 19th Dec 2014, 13:55
  #135 (permalink)  
 
Join Date: Apr 2010
Location: London
Posts: 7,072
Likes: 0
Received 0 Likes on 0 Posts
National Rail have had 5 major signalling issues out of Paddington in 3 months.................. and it doesn't get a thousandth of the coverage
Heathrow Harry is offline  
Old 19th Dec 2014, 14:02
  #136 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
Viewing the Transport Select C'tee meeting, I was amazed at Graham Stringer's longevity in active membership and his ability to recall the "Computer Weekly" criticism of the Swanwick development and their call for an independent inquiry.

Links to that 1997 evidence session at

House of Commons - Environment, Transport and Regional Affairs - Minutes of Evidence

The Computer Weekly submission on the early-warning tests for project failure was good:

Also, having given the matter further thought I believe the following early warning signs of a disaster are particularly pertinent in this case:

— A failure to meet revised deadlines shortly after assurances that all is going well.

— Late changes to the system to accommodate end-users who were not consulted adequately at the beginning.

— A strong resistance to an independent audit.

— A desire to push ahead or even rush to complete the acceptance tests before all the bugs are removed.

— Lack of goodwill among end-users (air traffic controllers are alleged to have walked out recently during a trial of the training and development unit system).

— The original requirements and the technology being superseded because of the repeated delays. The Swanwick centre's requirements are already nearly seven years old.
House of Commons - Environment, Transport and Regional Affairs - Minutes of Evidence
118.70 is offline  
Old 19th Dec 2014, 14:49
  #137 (permalink)  
 
Join Date: Aug 2012
Location: Wales
Posts: 532
Likes: 0
Received 0 Likes on 0 Posts
Would the knock on effects have lasted so long if LHR were allowed to have 'Night-Time Ops' in emergency situations?

Perhaps the other infrastructure of London would not be available... Taxis, Rail Transport, Hotels etc.
phiggsbroadband is offline  
Old 19th Dec 2014, 15:24
  #138 (permalink)  
 
Join Date: Jan 2008
Location: Reading, UK
Posts: 15,822
Received 206 Likes on 94 Posts
Would the knock on effects have lasted so long if LHR were allowed to have 'Night-Time Ops' in emergency situations?
LHR is allowed to handle emergency situations at night. This wasn't one.
DaveReidUK is offline  
Old 19th Dec 2014, 15:24
  #139 (permalink)  
 
Join Date: Jul 2001
Location: London
Posts: 90
Likes: 0
Received 0 Likes on 0 Posts
Would the knock on effects have lasted so long if LHR were allowed to have 'Night-Time Ops' in emergency situations?


I believe the night time curfew was removed on this occasion, I am not sure how many flights operated during the curfew.


I totally agree it was a slow news day and the politicians felt compelled to react for fear of being seen to be doing nothing. A total overreaction. The only thing I think NATS did wrong was to announce that it would be a huge outage before it was really clear how long it was actually going to be.
STN Ramp Rat is offline  
Old 19th Dec 2014, 16:25
  #140 (permalink)  
 
Join Date: May 2002
Location: Manchester uk
Posts: 2
Likes: 0
Received 0 Likes on 0 Posts
Swanwick Outage

Three major points.
1) ATC systems failure rates are specified in terms of one failure every hundreds if not thousands of years. Obviously unrealistic but this does not mean the recent outage can be trivialised as some posters seem to suggest.

2) Ian W states "Jovial may be a rather antediluvian language but it is extremely powerful, in many ways better than C or C++, and for the processing done in NAS Host it is probably unmatched in suitability. The Host software has now bedded in over forty years and is an extremely reliable set of programs."
Whether C or C++ is better than JOVIAL is debatable. Incidentally, the JOVIAL that NAS is written in was not a mainstream version. What makes a software language antediluvian (strange word) depends on a number of factors, not just its technical features, including the availability of a pool of programmers competent in it. As for the reliability of NAS, I repeat: 30 (actually 40) years of modification and rehosting is not conducive to reliability. Having been instrumental in identifying the need for a UK NAS Support and Development Organisation, and then playing a major role in setting it up and in its work in the early years, I believe I have the knowledge to express concern about NAS's life expectancy.

3) I do not know if the recent Swanwick outage was caused by NAS software, some of which predates by over a decade its introduction to the UK. What I do know is that if Swanwick is still dependent on it, this represents a failure of several generations of NATS management to engineer its total replacement.
I will of course, if appropriate, be putting these views with supporting documentation to the relevant bodies.
stanprice is offline  

