
All London airspace closed

Old 13th Dec 2014, 18:15
  #61 (permalink)  
 
Join Date: Nov 2002
Location: Tapping the Decca, wondering why it's not working.
Age: 75
Posts: 166
Received 0 Likes on 0 Posts
What is needed is a serious level IT expert to be ready to comment on technical explanations as they come through. Any offers?

This article in The Register seems to have expert comments from within NATS. Anonymously of course.

REVEALED: Titsup flight plan mainframe borks UK air traffic control • The Register
aerobelly is offline  
Old 13th Dec 2014, 18:20
  #62 (permalink)  
 
Join Date: Jan 2008
Location: Reading, UK
Posts: 15,822
Received 206 Likes on 94 Posts
a responsible delinquent line of code has been discovered in amongst 4 million lines of code
So we can rest assured that the next time the Swanwick system goes t*ts-up, it will have been down to one of the remaining 3,999,999 lines ...
DaveReidUK is offline  
Old 13th Dec 2014, 19:44
  #63 (permalink)  
 
Join Date: Jan 2011
Location: Seattle
Posts: 715
Likes: 0
Received 3 Likes on 2 Posts
From The Register:

"Invariably someone puts a flight plan wrong and it borks* the system," one source told El Reg on condition of anonymity.
That should never happen. And (hopefully) the bits that fly the planes are built to a higher standard. One would expect a well written application to skip over the bad data, raise an alarm, log it, but keep going. Something like dividing by zero (Sunk by Windows NT, one of my favorite SNAFU examples of poor system implementation) should never lock up an operating system. In a well written app, it shouldn't even slow down processing of the remaining good data.
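A minimal sketch of the "skip, log, keep going" idea, in Python. Everything here is hypothetical: the record fields (`distance_nm`, `hours`), the function names and the sample data are made up for illustration and say nothing about how the real flight plan system is written.

```python
import logging

logger = logging.getLogger("flight_plan_feed")


def process_plan(plan):
    """Hypothetical per-record step: average speed from distance and time."""
    return plan["distance_nm"] / plan["hours"]


def process_batch(plans):
    """Process every plan; set aside the ones that fail instead of stopping."""
    results, rejected = [], []
    for plan in plans:
        try:
            results.append(process_plan(plan))
        except (KeyError, TypeError, ZeroDivisionError) as exc:
            # Log the bad record, keep it for a human, carry on with the rest.
            logger.error("rejected plan %r: %s", plan, exc)
            rejected.append(plan)
    return results, rejected


plans = [
    {"distance_nm": 300, "hours": 1.5},
    {"distance_nm": 120, "hours": 0},       # would divide by zero
    {"callsign": "no distance or time"},    # missing fields
    {"distance_nm": 450, "hours": 2.0},
]
speeds, bad = process_batch(plans)
print(speeds)    # results for the two good plans
print(len(bad))  # number of plans held back for review
```

The point of the design is that one failure is contained to one record. Whether carrying on is actually safe depends on what the record represents, which the sketch does not address.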

Part of the problem is cultural. The people who do mainframes have historically guarded their domain from systems engineers and real time software experts. For people who did things like payroll systems, it was acceptable to print the crash report, go through the data and re-punch the defective card. And then run it again. That's just not going to cut it in real time.

*And quit picking on poor Judge Bork.
EEngr is offline  
Old 13th Dec 2014, 19:53
  #64 (permalink)  
 
Join Date: Feb 2003
Location: BHX LXR ASW
Posts: 2,272
Received 5 Likes on 3 Posts
So why on earth should airlines have to fork out compensation claims when it was nothing to do with them? Perhaps they should sue NATS for all the extra night stops crews will 'enjoy'

Passengers will be able to sue airlines over three-hour delays | Money | The Guardian
crewmeal is offline  
Old 13th Dec 2014, 20:00
  #65 (permalink)  
 
Join Date: Mar 2007
Location: In my head
Posts: 694
Likes: 0
Received 0 Likes on 0 Posts
Richard Deakin, CEO, NATS, ... muttered stuff about upgrading resilience.
Upgrade ? Yes my first thought.
Resilience? Yes my second thought. Resilience to previously unanticipated airspace incursions perhaps?

Lots of those making the headlines recently.

Flight planning server fell over, did it? What sort of significant traffic provides no flight plans?

Flight_Plan=Null would need some error handling in the code, throughout the entire 4 million lines, not just the bit some bright spark upgraded in isolation yesterday

Or maybe they hadn't been upgrading resilience, but it's done now

Just a couple of thoughts ...
slip and turn is offline  
Old 13th Dec 2014, 20:05
  #66 (permalink)  
 
Join Date: Oct 2004
Location: Southern England
Posts: 483
Likes: 0
Received 0 Likes on 0 Posts
That should never happen. And (hopefully) the bits that fly the planes are built to a higher standard.
They are actually built to the same standard but it's a standard that allows different degrees of rigour depending upon the criticality of the component being produced.

One would expect a well written application to skip over the bad data, raise an alarm, log it, but keep going.
Exactly how does that work in a near real time system when the data actually represents a real life event? "That bit of data is bad so I'll just skip over it, send an e-mail to someone and keep going." Do you tell the controller? If so, how long do you think it will be before he decides "I'm not sure I can trust this, I'd better put some traffic restrictions in"? If not, who becomes accountable when something bad happens because the information the controller is seeing doesn't actually reflect real life?
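One possible answer to the "do you tell the controller?" question, sketched in Python under loud assumptions: the record is kept and shown, but marked as suspect rather than silently dropped. The `Track` type, the 0 to 600 level range, the callsigns and the display format are all invented for illustration; none of it describes the real Swanwick software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    VALID = "valid"
    SUSPECT = "suspect"  # still shown, but flagged to the controller


@dataclass
class Track:
    callsign: str
    level: Optional[int]
    status: Status = Status.VALID


def ingest(callsign, raw_level):
    """Hypothetical validation: level must parse as an integer in 0..600."""
    try:
        level = int(raw_level)
        if not 0 <= level <= 600:
            raise ValueError(f"level {level} out of range")
        return Track(callsign, level)
    except (TypeError, ValueError):
        # Keep the aircraft visible; never pretend the bad value is real.
        return Track(callsign, None, Status.SUSPECT)


def display_line(track):
    if track.status is Status.SUSPECT:
        return f"{track.callsign} FL??? *CHECK*"
    return f"{track.callsign} FL{track.level:03d}"


print(display_line(ingest("BAW123", "350")))
print(display_line(ingest("EZY456", "3S0")))  # garbled level field
```

This only moves the problem, of course: someone still has to decide how many flagged entries a controller can tolerate before restrictions go on.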
eglnyt is offline  
Old 13th Dec 2014, 20:13
  #67 (permalink)  
 
Join Date: Mar 2007
Location: In my head
Posts: 694
Likes: 0
Received 0 Likes on 0 Posts
Exactly how does that work in a near real time system when the data actually represents a real life event? "That bit of data is bad so I'll just skip over it, send an e-mail to someone and keep going." Do you tell the controller? If so, how long do you think it will be before he decides "I'm not sure I can trust this, I'd better put some traffic restrictions in"? If not, who becomes accountable when something bad happens because the information the controller is seeing doesn't actually reflect real life?
It's called error handling and it is an absolutely critical part of any computer program. If a line of code receives unanticipated data (which may not be 'bad' per se), that unforeseen use case needs to have been foreseen by whoever put together the program spec, whoever agreed the program spec, whoever designed the logic that was intended to handle it flawlessly in the code, whoever checked it, whoever tested it, and whoever signed off on the project or module or upgrade, but one or all six or sixty of whom we all now know was mistaken. And there's the rub.

So now we are told that a single line of code stopped the machine, what actually was it in the real time real life world that was unforeseen? That would be the real story.

If I was anotherthing or Gonzo or Zooker or eglnyt et al, I'd have asked that one at the office by now

Last edited by slip and turn; 13th Dec 2014 at 20:35.
slip and turn is offline  
Old 13th Dec 2014, 20:37
  #68 (permalink)  
 
Join Date: Jul 2007
Location: Auckland, NZ
Age: 79
Posts: 722
Likes: 0
Received 0 Likes on 0 Posts
In the report on the BBC website on this event, it was said that the ATC systems were operating at 98%, or 99%, of capacity. That is surely most of the problem. If stuff happens--and stuff happens, it's an axiom--it's much easier to cope with if you have spare capacity.

And why don't we have spare capacity, in all sorts of systems? Cost reduction, of course. So if, and only if, the head of NATS is responsible for running the system at maximum capacity, he should resign. But I expect it's the paymasters who really are responsible, and the fact that it's a public-private partnership, one of the advantages of which is private sector financial discipline. That is, running everything at maximum capacity all the time. Why have such big engines? Run them at take off power all the time. Why plan for engine failure? Plan for success.
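A rough illustration of why the last few percent of capacity matter so much, using the textbook single-server queue with random arrivals (M/M/1). That model is an assumption chosen for simplicity, not a description of ATC; it just shows the shape of the curve. The mean number in the system is rho / (1 - rho), where rho is the utilisation.

```python
def mean_in_system(utilisation):
    """Mean number of jobs in an M/M/1 queue: rho / (1 - rho)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return utilisation / (1 - utilisation)


for rho in (0.50, 0.80, 0.90, 0.98, 0.99):
    queue = mean_in_system(rho)
    print(f"{rho:.0%} busy -> about {queue:.0f} in the system on average")
```

On this model the average at 98% utilisation is about 49, against about 1 at 50%, and going from 98% to 99% roughly doubles it again.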
FlightlessParrot is offline  
Old 13th Dec 2014, 20:45
  #69 (permalink)  
 
Join Date: Jan 2010
Location: UK
Posts: 182
Received 0 Likes on 0 Posts
Originally Posted by Downwind Lander
BBC News Channel. Saturday. 1700 GMT.

Richard Deakin, CEO, NATS, says that a responsible delinquent line of code has been discovered in amongst 4 million lines of code.
IIRC Richard Deakin also said that the code had been corrected. Given the short space of time since the original fault I have my doubts about how well the "correction" has been tested. I just hope that there won't be a further problem down the line in a few months as a result of the "correction".
SamYeager is offline  
Old 13th Dec 2014, 21:12
  #70 (permalink)  
 
Join Date: Jan 2010
Location: France
Posts: 527
Received 13 Likes on 7 Posts
Is there a simulation exercise to explore 'What if there was a complete situation failure' and what actions could be taken to keep the situation stable rather than immediately go into lockdown?

And if there isn't one, or a regular rethinking of this situation, surely there should be ... or are we just so confident (or that blinkered) that such a situation cannot be envisaged as a potential reality?
Alsacienne is offline  
Old 13th Dec 2014, 22:36
  #71 (permalink)  
 
Join Date: Oct 2007
Location: 30 Miles from the A1
Posts: 488
Likes: 0
Received 10 Likes on 5 Posts
The NATS system was down for 36 minutes or so - for the rest of the minutes in the year everything was fine. The reason the recovery, which led to the misery, expense and frustration, took so long is largely due to the whole aviation sector in the London area running at 98% (I'm not sure the BBC post about NATS running at 98% is strictly accurate), so it cannot cope with any interference. That interference may be a runway closure due to a BA returning with an engine cowl, a security scare in a terminal or a drop in temperature of a degree that wasn't forecast leading to snow instead of sleet or thick fog resulting in LVPs. It is unacceptable that NATS had a failure with such severe consequences, but as previous posters have pointed out nothing is infallible and the UK's continued dallying over airport expansion is one of the real culprits in this incident. The Government and the opposition have all been bumping their gums over this issue whilst they collectively dance round their handbags over aviation infrastructure improvements in the SE. And so it will continue........
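The link between a short outage and a long recovery can be put into a back-of-envelope sum. The sketch below assumes a deliberately crude model: demand arrives at a constant fraction of capacity, nothing is cancelled, and the backlog is worked off using only the spare capacity. Real operations differ (flights are cancelled, demand drops overnight), so the numbers are illustrative only.

```python
def minutes_to_clear(outage_min, utilisation):
    """Time to work off the backlog built up during an outage.

    Backlog = demand during the outage = utilisation * outage_min.
    After restart it drains at the spare rate, 1 - utilisation.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    backlog = utilisation * outage_min
    spare = 1 - utilisation
    return backlog / spare


for rho in (0.80, 0.90, 0.98):
    t = minutes_to_clear(36, rho)
    print(f"{rho:.0%} loaded: about {t:.0f} min ({t / 60:.1f} h) to recover")
```

On these assumptions a 36-minute stop takes about 144 minutes to clear at 80% loading, but about 1,764 minutes (over 29 hours) at 98%.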
2Planks is offline  
Old 14th Dec 2014, 00:16
  #72 (permalink)  
 
Join Date: Sep 1999
Location: Lincolnshire
Age: 73
Posts: 18
Likes: 0
Received 0 Likes on 0 Posts
All London airspace closed

http://www.telegraph.co.uk/news/worldnews/europe/sweden/11292095/Foreign-military-aircraft-nearly-collides-with-passenger-plane-over-Sweden.html

Hmm same day... What a coincidence.
The Privateer is offline  
Old 14th Dec 2014, 07:59
  #73 (permalink)  
 
Join Date: Jul 2013
Location: currently unsure
Posts: 13
Likes: 0
Received 0 Likes on 0 Posts
It looks to me like the training in unusual circumstances and emergencies (TRUCE) paid off and this major outage was handled calmly, smoothly and professionally.

Quote:
That should never happen. And (hopefully) the bits that fly the planes are built to a higher standard.
They are actually built to the same standard but it's a standard that allows different degrees of rigour depending upon the criticality of the component being produced.
This is an interesting point. Although the software standards for avionics and air traffic control have the same origin there are some subtle but important differences.

The most significant in my mind is the prolific use of so called Commercial off the Shelf - COTS - software that includes things like general purpose operating systems (Windows, Linux etc.) and device drivers. We can also refer to much of this as Software of an Unknown Pedigree (SOUP).

Historically not much COTS/SOUP is used in aircraft avionic systems but it is widely used in air traffic control systems. This is because a COTS/SOUP licence is usually less than 10% of the cost of a special OS licence and ATC systems are generally thought to be less critical.

I don't think we have got to a point where we really understand the risks posed by SOUP. The analysis is done blindfolded so it's a bit like playing the lottery (or, if I want to be dramatic, Russian roulette). The worry is that more SOUP is finding its way into aircraft systems and AFAIK the analysis techniques are the same as for the ground ones.

Last edited by wasthatit; 5th Feb 2015 at 22:38.
wasthatit is offline  
Old 14th Dec 2014, 08:38
  #74 (permalink)  
 
Join Date: Aug 2009
Location: wimborne
Posts: 30
Likes: 0
Received 0 Likes on 0 Posts
2Planks has hit the nail on the head. How many flights will airlines cancel on an unexpectedly foggy morning, or if a runway is blocked, or if a foreign ATC unit strikes? Let's get this into some perspective, shall we? Whilst unfortunate and deeply embarrassing for NATS, it is just a minor blip in a system which operates well for 99.999% of the time. The problem with this site now is it is populated by the Facebook / Twitter generation who seem to have little ability to think critically or look beyond a two line headline! And our politicians pander to this by issuing ridiculous statements to the press. And I do wonder why NATS is not a little more robust in its defence of the situation.
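For scale, a quick sum on the figures quoted in this thread. It assumes the 36 minutes mentioned in an earlier post was the only downtime in a 365-day year, which is an assumption rather than a published figure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def availability(downtime_min):
    """Fraction of the year the system was up."""
    return 1 - downtime_min / MINUTES_PER_YEAR


def allowed_downtime(availability_fraction):
    """Minutes of downtime per year that a given availability permits."""
    return (1 - availability_fraction) * MINUTES_PER_YEAR


print(f"36 min down -> {availability(36):.4%} available")
print(f"99.999% allows {allowed_downtime(0.99999):.2f} min per year")
```

So 36 minutes in a year works out at roughly 99.993% availability, while a strict 99.999% would allow only about five and a quarter minutes. Either way the outage is a very small fraction of the year.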
blueskythinking is offline  
Old 14th Dec 2014, 08:42
  #75 (permalink)  
Guest
 
Join Date: May 2008
Location: Somewhere between E17487 and F75775
Age: 80
Posts: 725
Likes: 0
Received 0 Likes on 0 Posts
Is there a simulation exercise to explore 'What if there was a complete situation failure'

As one who has sat through more spacecraft launch/early orbit phase simulations than I can remember, may I point out that the anomalous scenario which actually happens is never the one you spent hours and hours rehearsing for.

Furthermore, having a back-up or simulation facility means synchronising it with the in-use facility so that both are running on identical hardware and software. Given human nature, this is extremely difficult, more so with the software than the hardware. Providing systems so that the backup facility is continuously updated from the in-use facility can trigger yet more problems.
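The synchronisation problem is easy to state in code even if it is hard to solve in practice. A minimal sketch, assuming each facility can report a simple table of component names and versions; the component names and version strings below are invented for illustration.

```python
def config_drift(primary, standby):
    """Return {component: (primary_version, standby_version)} for mismatches.

    A component missing from one side shows up with None on that side.
    """
    drift = {}
    for name in sorted(set(primary) | set(standby)):
        p, s = primary.get(name), standby.get(name)
        if p != s:
            drift[name] = (p, s)
    return drift


# Hypothetical manifests for the in-use and backup facilities.
primary = {"os": "5.2", "fdp": "12.4.1", "display": "3.0"}
standby = {"os": "5.2", "fdp": "12.4.0", "display": "3.0", "patch_tool": "1.1"}

for component, (p, s) in config_drift(primary, standby).items():
    print(f"{component}: in-use={p} backup={s}")
```

Detecting drift like this is the simple part. Deciding which side is right, and updating the other without causing a new fault, is where the human-nature problem comes in.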

Personally, given the Government dithering mentioned here and the lack of resources, I think NATS do an excellent job most of the time.
OFSO is offline  
Old 14th Dec 2014, 09:30
  #76 (permalink)  
 
Join Date: Nov 2008
Location: Wales
Age: 44
Posts: 34
Likes: 0
Received 0 Likes on 0 Posts
I heard a radio presenter, Petrie Hoskins on LBC this morning, stating her incredulity that NATS software is from the 1960s and that therefore her mobile phone has more computer power than the system that runs our air traffic service.

The radio is now in the garden and I have a large hole in the window.
Norman.D.Landing is offline  
Old 14th Dec 2014, 09:38
  #77 (permalink)  
 
Join Date: Jul 2000
Location: London
Posts: 1,256
Likes: 0
Received 0 Likes on 0 Posts
I say again and again the extra runway will overload ATC even more than it is already.
4Greens is offline  
Old 14th Dec 2014, 09:51
  #78 (permalink)  
 
Join Date: Mar 2007
Location: In my head
Posts: 694
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by OFSO
Personally, given the Government dithering mentioned here and the lack of resources, I think NATS do an excellent job most of the time.
Lack of resources? At NATS ? That's an interesting one.

What was it someone said earlier that the "new" centre at Swanwick cost?

A couple of billion ?

And how much is salted away in the taxpayer funded NATS pension fund ?

And who bought a 21% stake of NATS in September 2013 from a bundle of airlines who bailed out? Aviation people? Nope, pensions people - specialists in pensions funded by another great UK gravy train/feed trough, the great British higher education system (sic)! And now they (USS Sherwood - with the starship sounding name that you can use to deflect your own curiosity at this point and look no further if you wish) seem to own half of the Airlines Group 42% share of NATS!

I think I read somewhere before Swanwick was operational that the NATS pension scheme held funds of over £3BN. It's probably nearer double that now.

Anyone who knows here who might tell us?

Control of a heap of money like that sitting sweetly doing nothing except endlessly attracting more and more taxpayer contributions makes the everyday business of finding resources to push tin look a bit dulled now in comparison.
slip and turn is offline  
Old 14th Dec 2014, 10:06
  #79 (permalink)  
 
Join Date: Mar 2008
Location: Orpington
Posts: 47
Likes: 0
Received 0 Likes on 0 Posts
Back in the early 70s, the telegram service was automated. To tell the computer that the end of message had been reached, four N's (NNNN) were sent.

This was OK until someone sent lower case N four times (,,,,) in part of the telegram which was the same code as N. The difference being the lower shift code was sent prior to the ,,,,.
To overcome the problem, all operators connected to the system were instructed to put the lower case code in between the ,,,, which would not show up on the telegram and stopped the message ending prematurely.
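The failure described above can be reproduced in a few lines. The sketch assumes a teleprinter-style code in which one key code means N after a letters-shift code and a comma after a figures-shift code (ITA2 works this way, and the post is assumed to describe the same mechanism). The symbolic code names and both detector functions are illustrative, not the real telegram software.

```python
# Symbolic stand-ins for teleprinter codes; names are illustrative.
LTRS, FIGS = "LTRS", "FIGS"      # letters shift, figures shift
KEY_N, KEY_A = "N/,", "A/-"      # one code, two meanings by shift


def naive_end_index(stream):
    """Index just past the first run of four N-key codes, ignoring shift."""
    run = 0
    for i, code in enumerate(stream):
        run = run + 1 if code == KEY_N else 0
        if run == 4:
            return i + 1
    return None


def shift_aware_end_index(stream):
    """Same search, but the key only counts as N while in letters shift."""
    run, letters = 0, True
    for i, code in enumerate(stream):
        if code in (LTRS, FIGS):
            letters = code == LTRS
            run = 0  # a shift code breaks any run in progress
            continue
        run = run + 1 if (code == KEY_N and letters) else 0
        if run == 4:
            return i + 1
    return None


end = [KEY_N] * 4
# Body containing four commas in a row, then the real NNNN.
msg = [KEY_A, FIGS, KEY_N, KEY_N, KEY_N, KEY_N, LTRS, KEY_A] + end
# Operators' workaround: repeat the shift code between the commas.
fixed = [KEY_A, FIGS, KEY_N, FIGS, KEY_N, FIGS, KEY_N, FIGS, KEY_N,
         LTRS, KEY_A] + end

print(naive_end_index(msg), len(msg))        # naive detector stops early
print(shift_aware_end_index(msg), len(msg))  # shift-aware reads to the end
print(naive_end_index(fixed), len(fixed))    # workaround rescues the naive one
```

The workaround changes the input rather than the program: the extra shift codes print nothing, but they stop four identical codes ever appearing back to back inside the message body.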

It is often the simple things that catch you out.
SLF-Flyer is offline  
Old 14th Dec 2014, 10:10
  #80 (permalink)  
 
Join Date: Jan 2004
Location: London
Posts: 654
Received 9 Likes on 5 Posts
Airlines recognise the need for asset replacement in RP2 but expect NERL to sweat the assets harder given the present financial constraints
- Taken from the RP2 consultation working group.

Costs are falling year on year. Are the delays still below the agreed targets?


PS. 2Planks, spot on.
Del Prado is offline  

