PPRuNe Forums

All London airspace closed (https://www.pprune.org/atc-issues/552783-all-london-airspace-closed.html)

aerobelly 13th Dec 2014 18:15

What is needed is a serious level IT expert to be ready to comment on technical explanations as they come through. Any offers?

This article in The Register seems to have expert comments from within NATS. Anonymously of course.

REVEALED: Titsup flight plan mainframe borks UK air traffic control • The Register

DaveReidUK 13th Dec 2014 18:20


a responsible delinquent line of code has been discovered in amongst 4 million lines of code
So we can rest assured that the next time the Swanwick system goes t*ts-up, it will have been down to one of the remaining 3,999,999 lines ...

EEngr 13th Dec 2014 19:44

From The Register:


"Invariably someone puts a flight plan wrong and it borks* the system," one source told El Reg on condition of anonymity.
That should never happen. And (hopefully) the bits that fly the planes are built to a higher standard. One would expect a well written application to skip over the bad data, raise an alarm, log it, but keep going. Something like dividing by zero ("Sunk by Windows NT", one of my favorite SNAFU examples of poor system implementation) should never lock up an operating system. In a well written app, it shouldn't even slow down processing of the remaining good data.
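A minimal sketch of the "skip, alarm, log, keep going" pattern described above. Everything here is an assumption made for illustration: the `FlightPlan` fields, the `validate` rules and the flight level limit are hypothetical and are not taken from the NATS system.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional, Tuple

log = logging.getLogger("flight_plan_sketch")  # hypothetical logger name


@dataclass
class FlightPlan:
    # Hypothetical, heavily simplified record.
    callsign: str
    route: List[str]
    cruise_level: Optional[int]  # flight level, e.g. 350


def validate(plan: FlightPlan) -> List[str]:
    """Return a list of problems; an empty list means the plan looks sane."""
    problems = []
    if not plan.callsign:
        problems.append("missing callsign")
    if not plan.route:
        problems.append("empty route")
    # Illustrative limit only; raises TypeError if cruise_level is None.
    if not 0 < plan.cruise_level <= 660:
        problems.append("implausible cruise level")
    return problems


def process_all(plans: List[FlightPlan]) -> Tuple[list, list]:
    """Accept good plans, quarantine bad ones, never stop the batch."""
    accepted, quarantined = [], []
    for plan in plans:
        try:
            problems = validate(plan)
        except Exception as exc:
            # A defect in the validator itself must not take the batch down.
            problems = [f"validator crashed: {exc!r}"]
        if problems:
            log.warning("quarantined %r: %s", plan.callsign, problems)
            quarantined.append((plan, problems))
            continue
        accepted.append(plan)
    return accepted, quarantined
```

Whether quarantining a plan is operationally acceptable is a separate question from whether the program keeps running; the sketch only covers the second.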

Part of the problem is cultural. The people who do mainframes have historically guarded their domain from systems engineers and real time software experts. For people who did things like payroll systems, it was acceptable to print the crash report, go through the data and re-punch the defective card. And then run it again. That's just not going to cut it in real time.

*And quit picking on poor Judge Bork.

crewmeal 13th Dec 2014 19:53

So why on earth should airlines have to fork out compensation claims when it was nothing to do with them? Perhaps they should sue NATS for all the extra night stops crews will 'enjoy'.

Passengers will be able to sue airlines over three-hour delays | Money | The Guardian

slip and turn 13th Dec 2014 20:00


Richard Deakin, CEO, NATS, ... muttered stuff about upgrading resilience.
Upgrade ? Yes my first thought.
Resilience? Yes my second thought. Resilience to previously unanticipated airspace incursions perhaps?

Lots of those making the headlines recently.

Flight planning server fell over, did it? What sort of significant traffic provides no flight plans?

Flight_Plan=Null would need some error handling in the code, throughout the entire 4 million lines, not just the bit some bright spark upgraded in isolation yesterday :hmm:

Or maybe they hadn't been upgrading resilience, but it's done now :ok:

Just a couple of thoughts ...

eglnyt 13th Dec 2014 20:05


That should never happen. And (hopefully) the bits that fly the planes are built to a higher standard.
They are actually built to the same standard but it's a standard that allows different degrees of rigour depending upon the criticality of the component being produced.


One would expect a well written application to skip over the bad data, raise an alarm, log it, but keep going.
Exactly how does that work in a near real time system when the data actually represents a real life event? That bit of data is bad so I'll just skip over it, send an e-mail to someone and keep going. Do you tell the controller? If so how long do you think it will be before he decides I'm not sure I can trust this I'd better put some traffic restrictions in? If not who becomes accountable when something bad happens because the information the controller is seeing doesn't actually reflect real life?
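One way to picture the trade-off raised here is a record that stays on the display but is visibly marked as untrusted, with an explicit rule for when restrictions kick in. This is a sketch only: `StripEntry`, `PlanStatus` and the 5% threshold are invented for illustration and do not describe how any real ATC system behaves.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PlanStatus(Enum):
    VERIFIED = "verified"
    UNVERIFIED = "unverified"  # shown to the controller, not silently dropped


@dataclass
class StripEntry:
    # Hypothetical display record for one flight.
    callsign: str
    status: PlanStatus
    note: str = ""


def build_strip(callsign: str, problems: List[str]) -> StripEntry:
    """Keep the flight visible, but flag it if its plan data failed checks."""
    if problems:
        return StripEntry(callsign, PlanStatus.UNVERIFIED, "; ".join(problems))
    return StripEntry(callsign, PlanStatus.VERIFIED)


def unverified_fraction(strips: List[StripEntry]) -> float:
    """Share of displayed flights whose plan data is flagged as untrusted."""
    if not strips:
        return 0.0
    flagged = sum(s.status is PlanStatus.UNVERIFIED for s in strips)
    return flagged / len(strips)


def should_restrict_traffic(strips: List[StripEntry], threshold: float = 0.05) -> bool:
    """Illustrative rule: restrict flow once too much data is untrusted."""
    return unverified_fraction(strips) > threshold
```

The point of the threshold is that the decision to restrict traffic becomes an agreed rule rather than something each controller has to judge alone; the accountability question remains.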

slip and turn 13th Dec 2014 20:13


Exactly how does that work in a near real time system when the data actually represents a real life event? That bit of data is bad so I'll just skip over it, send an e-mail to someone and keep going. Do you tell the controller? If so how long do you think it will be before he decides I'm not sure I can trust this I'd better put some traffic restrictions in? If not who becomes accountable when something bad happens because the information the controller is seeing doesn't actually reflect real life?
It's called error handling and it is an absolutely critical part of any computer program. If a line of code receives unanticipated data (which may not be 'bad' per se), that unforeseen use case needs to have been foreseen by whoever put together the program spec, whoever agreed the program spec, whoever designed the logic that was intended to handle it flawlessly in the code, whoever checked it, whoever tested it, and whoever signed off on the project or module or upgrade, but one or all six or sixty of whom we all now know was mistaken. And there's the rub :ooh:

So now we are told that a single line of code stopped the machine, what actually was it in the real time real life world that was unforeseen? That would be the real story.

If I was anotherthing or Gonzo or Zooker or eglnyt et al, I'd have asked that one at the office by now :}

FlightlessParrot 13th Dec 2014 20:37

In the report on the BBC website on this event, it was said that the ATC systems were operating at 98%, or 99%, of capacity. That is surely most of the problem. If stuff happens--and stuff happens, it's an axiom--it's much easier to cope with if you have spare capacity.

And why don't we have spare capacity, in all sorts of systems? Cost reduction, of course. So if, and only if, the head of NATS is responsible for running the system at maximum capacity, he should resign. But I expect it's the paymasters who really are responsible, and the fact that it's a public-private partnership, one of the advantages of which is private sector financial discipline. That is, running everything at maximum capacity all the time. Why have such big engines? Run them at take off power all the time. Why plan for engine failure? Plan for success.

SamYeager 13th Dec 2014 20:45


Originally Posted by Downwind Lander (Post 8781436)
BBC News Channel. Saturday. 1700 GMT.

Richard Deakin, CEO, NATS, says that a responsible delinquent line of code has been discovered in amongst 4 million lines of code.

IIRC Richard Deakin also said that the code had been corrected. Given the short space of time since the original fault I have my doubts about how well the "correction" has been tested. I just hope that there won't be a further problem down the line in a few months as a result of the "correction".

Alsacienne 13th Dec 2014 21:12

Is there a simulation exercise to explore 'What if there was a complete situation failure' and what actions could be taken to keep the situation stable rather than immediately go into lockdown?

And if there isn't one, or a regular rethinking of this situation, surely there should be ... or are we just so confident (or that blinkered) that such a situation cannot be envisaged as a potential reality?

2Planks 13th Dec 2014 22:36

The NATS system was down for 36 minutes or so - for the rest of the minutes in the year everything was fine. The reason the recovery, which led to the misery, expense and frustration, took so long is largely due to the whole aviation sector in the London area running at 98% (I'm not sure the BBC post about NATS running at 98% is strictly accurate) and it cannot cope with any interference. That interference may be a runway closure due to a BA returning with an engine cowl, a security scare in a terminal or a drop in temperature of a degree that wasn't forecast leading to snow instead of sleet or thick fog resulting in LVPs. It is unacceptable that NATS had a failure with such severe consequences, but as previous posters have pointed out nothing is infallible and the UK's continued dallying over airport expansion is one of the real culprits in this incident. The Government and the opposition have all been bumping their gums over this issue whilst they collectively dance round their handbags over aviation infrastructure improvements in the SE. And so it will continue........
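A rough worked example of why a short outage takes so long to clear at high utilisation. It assumes a deliberately crude fluid model (steady demand, fixed capacity, nothing cancelled), which real operations do not follow, so the numbers are only indicative. During an outage of T minutes the backlog grows at the demand rate; afterwards it drains only at the spare capacity, giving a recovery time of T × u / (1 − u) for utilisation u.

```python
def recovery_minutes(outage_minutes: float, utilisation: float) -> float:
    """Time to clear the backlog after an outage, under a simple fluid model.

    utilisation is demand divided by capacity, and must be below 1 or the
    backlog never clears.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    # Backlog built up = demand * outage; drain rate = capacity - demand.
    return outage_minutes * utilisation / (1 - utilisation)


# The 36 minutes and 98% figures come from the thread; 80% is for comparison.
at_98 = recovery_minutes(36, 0.98)  # about 1764 minutes, roughly 29 hours
at_80 = recovery_minutes(36, 0.80)  # about 144 minutes
```

On these assumptions the same 36 minute outage costs around twelve times more recovery time at 98% than at 80%, which is the spare-capacity argument in numbers.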

The Privateer 14th Dec 2014 00:16

http://www.telegraph.co.uk/news/worldnews/europe/sweden/11292095/Foreign-military-aircraft-nearly-collides-with-passenger-plane-over-Sweden.html

Hmm same day... What a coincidence.

wasthatit 14th Dec 2014 07:59

It looks to me like the training in unusual circumstances and emergencies (TRUCE) paid off and this major outage was handled calmly, smoothly and professionally :D.


Quote:
That should never happen. And (hopefully) the bits that fly the planes are built to a higher standard.
They are actually built to the same standard but it's a standard that allows different degrees of rigour depending upon the criticality of the component being produced.
This is an interesting point. Although the software standards for avionics and air traffic control have the same origin there are some subtle but important differences.

The most significant in my mind is the prolific use of so called Commercial off the Shelf - COTS - software that includes things like general purpose operating systems (Windows, Linux etc.) and device drivers. We can also refer to much of this as Software of an Unknown Pedigree (SOUP).

Historically not much COTS/SOUP is used in aircraft avionic systems but it is widely used in air traffic control systems. This is because a COTS/SOUP licence is usually less than 10% of the cost of a special OS licence and ATC systems are generally thought to be less critical.

I don't think we have got to a point where we really understand the risks posed by SOUP. The analysis is done blindfolded so it's a bit like playing the lottery (or, if I want to be dramatic, Russian roulette). The worry is that more SOUP is finding its way into aircraft systems and AFAIK the analysis techniques are the same as for the ground ones.

blueskythinking 14th Dec 2014 08:38

2Planks has hit the nail on the head. How many flights will airlines cancel on an unexpectedly foggy morning, or if a runway is blocked, or if a foreign ATC unit strikes? Let's get this into some perspective, shall we? Whilst unfortunate and deeply embarrassing for NATS, it is just a minor blip in a system which operates well for 99.999% of the time. The problem with this site now is it is populated by the Facebook / Twitter generation who seem to have little ability to think critically or look beyond a two line headline! And our politicians pander to this by issuing ridiculous statements to the press. And I do wonder why NATS is not a little more robust in its defence of the situation.

OFSO 14th Dec 2014 08:42

Is there a simulation exercise to explore 'What if there was a complete situation failure'

As one who has sat through more spacecraft launch/early orbit phase simulations than I can remember, may I point out that the anomalous scenario which actually happens is never the one you spent hours and hours rehearsing for.

Furthermore, having a back-up or simulation facility means synchronising it with the in-use facility so that both are running on identical hardware and software. Given human nature, this is extremely difficult, more so with the software than the hardware. Providing systems so that the backup facility is continuously updated from the in-use facility can trigger yet more problems.

Personally, given the Government dithering mentioned here and the lack of resources, I think NATS do an excellent job most of the time.

Norman.D.Landing 14th Dec 2014 09:30

I heard a radio presenter, Petrie Hoskins on LBC this morning, stating her incredulity that NATS software is from the 1960s and that her mobile phone therefore has more computing power than the system that runs our air traffic service. :mad:

The radio is now in the garden and I have a large hole in the window. :}

4Greens 14th Dec 2014 09:38

I say again and again the extra runway will overload ATC even more than it is already.

slip and turn 14th Dec 2014 09:51


Originally Posted by OFSO
Personally, given the Government dithering mentioned here and the lack of resources, I think NATS do an excellent job most of the time.

Lack of resources? At NATS ? That's an interesting one.

What was it someone said earlier that the "new" centre at Swanwick cost?

A couple of billion ?

And how much is salted away in the taxpayer funded NATS pension fund ?

And who bought a 21% stake of NATS in September 2013 from a bundle of airlines who bailed out? Aviation people? Nope, pensions people - specialists in pensions funded by another great UK gravy train/feed trough, the great British higher education system (sic)! And now they (USS Sherwood - with the starship sounding name that you can use to deflect your own curiosity at this point and look no further if you wish) seem to own half of the Airlines Group's 42% share of NATS!

I think I read somewhere before Swanwick was operational that the NATS pension scheme held funds of over £3BN. It's probably nearer double that now.

Anyone here who knows and might tell us?

Control of a heap of money like that sitting sweetly doing nothing except endlessly attracting more and more taxpayer contributions makes the everyday business of finding resources to push tin look a bit dulled now in comparison.

SLF-Flyer 14th Dec 2014 10:06

Back in the early 70s, the telegram service was automated. To tell the computer that the end of message had been reached, four N's (NNNN) were sent.

This was OK until someone sent lower case N four times (,,,,) in part of the telegram, which was the same code as N. The only difference was that the lower shift code was sent prior to the ,,,,.
To overcome the problem, all operators connected to the system were instructed to put the lower case code in between the ,,,, which would not show up on the telegram and stopped the message ending prematurely.

It is often the simple things that catch you out.
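The telegram bug above can be sketched as a detector that looks only at raw 5-bit codes and ignores which shift the teleprinter is in. The code values below follow the usual ITA2 listing but should be treated as illustrative, and both functions are hypothetical reconstructions rather than the real switching software.

```python
from typing import List, Optional

# Illustrative ITA2-style values: N and "," share one code, told apart by shift.
LTRS = 0x1F    # switch to letters shift
FIGS = 0x1B    # switch to figures shift
N_CODE = 0x0C  # "N" in letters shift, "," in figures shift


def naive_end_index(codes: List[int]) -> Optional[int]:
    """Index just past the first run of four N codes, ignoring shift state."""
    run = 0
    for i, code in enumerate(codes):
        run = run + 1 if code == N_CODE else 0
        if run == 4:
            return i + 1
    return None


def shift_aware_end_index(codes: List[int]) -> Optional[int]:
    """Same search, but only counts N codes received in letters shift."""
    in_letters = True
    run = 0
    for i, code in enumerate(codes):
        if code in (LTRS, FIGS):
            in_letters = code == LTRS
            run = 0
            continue
        run = run + 1 if (code == N_CODE and in_letters) else 0
        if run == 4:
            return i + 1
    return None


# ",,,," in the body, then a letter, then the real NNNN terminator.
message = [FIGS, N_CODE, N_CODE, N_CODE, N_CODE, LTRS, 0x01,
           N_CODE, N_CODE, N_CODE, N_CODE]
# The operators' workaround: repeat the shift code in the middle of ",,,,".
workaround = [FIGS, N_CODE, N_CODE, FIGS, N_CODE, N_CODE]
```

With these inputs the naive detector cuts `message` off after the commas, while the shift-aware one runs to the real terminator. The workaround defeats the naive detector because the extra shift code breaks the run of four identical codes.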

Del Prado 14th Dec 2014 10:10


Airlines recognise the need for asset replacement in RP2 but expect NERL to sweat the assets harder given the present financial constraints
- Taken from the RP2 consultation working group.

Costs are falling year on year. Are the delays still below the agreed targets?


Ps. 2planks, spot on.


All times are GMT.

