PPRuNe Forums


ground_star 18th Jul 2008 18:58

DUB ATC Outage down to IT Cock Up
 
So it turns out all the problems at EIDW are down to an IT cockup at Thales - the faulty component was no more than a circa £50 network card!

The following from a popular IT news website:-

An air traffic control fault that brought Dublin airport to its knees last week has been traced to an intermittently flakey network card.

Thales ATM, the makers of Dublin’s ATC system, conducted a review of the system, and after crawling around the airport with their little torches, “confirmed the root cause of the hardware system malfunction as an intermittent malfunctioning network card which consequently overcame the built-in system redundancy”. The flakey card had been responsible for previous problems since June 2.

So, problem solved? Er, sadly not. The IAA has slapped in further monitor tools, and plans “an enhancement” to the failure recover system. But whatever happens, the system will need to be revalidated, which could take weeks. In the meantime, it will “slowly add capacity”, but for safety reasons “will not operate the system to its limit until the system has been re-validated”.

Which is just what Irish travellers will want to hear as the country builds up to its August bank holiday travel season.

Just to completely cover its backside, it added: "Factors outside the direct control of the Irish Aviation Authority, however, such as weather or congestion in European airspace, may also contribute to flight delays."

So, be warned, your stag party may still be thrown off schedule if Dublin is caught unawares by a deluge of rain, a surprise papal visit or Father Fay seeing his reflection in the cockpit window.


Hey ho! At least the flippin' thing doesn't rely on Windows...

jumparound 18th Jul 2008 23:04

There are 50 network cards in use in Dublin alone.:eek:

Thales advise that the patch to allow the system to recognise a conflict and prevent the system going into Emergency Mode will not be available for install till the END OF AUGUST, so we could see another 5 weeks of delays.:ok:

But I'm sure The IAA will rush us back to 100% just because it will look better in the papers.:mad:

Anyone notice their news briefings are the same day after day, just with the date changed? Not very good PR work.:ugh:

SM4 Pirate 19th Jul 2008 01:20

Is it just me, or has this problem shown up before, especially in Australia? If Thales are still selling suspect equipment without proper redundancy then they should be held accountable.

ock1f 19th Jul 2008 09:16

It also states in the press release that Thales told the IAA it has not happened anywhere else with this system. Well, all they have to do is look within their own company, 'cos it happened in Shannon a few months back (Feb?). One network card was being replaced in a non-active suite and the new card was faulty, so it shut the ENTIRE system down for 40 mins and did it again a few hours later.
A validated system with all safety checks... I don't think so.

I wonder who signed their name to all the official bits of paper confirming its infallibility?

Ock1f

ock1f 19th Jul 2008 09:22

It's a big pity this is all happening, 'cos it makes us and the IAA look like muppets to the general public, when in fact all we want is to be able and allowed to give the best, most efficient, quality service that we can, and do when staffing, equipment etc. allow.

Controllers by their very nature want to expedite and move the max amount of traffic in the most efficient manner, 'cos that's why we are here, and you also get the traffic out of your hair quicker!

Pity we are being constrained in that and having to deal with major system, safety, staff and procedure issues.

Just my 0.02

Ock

Token Sane Person 28th Jul 2008 18:33

Engineer's perspective
 
Reading between the lines of the press reports, it sounds like the problem was an *intermittent* failure in the network card. A bad card can bring down an entire LAN, and exactly that kind of incident was responsible for the LAX customs network failure last year. However, that is exactly the kind of issue that dual redundant LANs are supposed to protect against. :ugh:

In this case however it seems that the primary network kept going down and coming back up (technically known as "yoyo mode"). I guess that the system kept switching back to the primary, and then having to switch to backup again when the primary went back down.

Of course this is not acceptable behaviour; the Right Thing is to stay on the backup until someone has diagnosed and corrected the problem with the primary, or at least left it going long enough that the fault seems unlikely to recur quickly. But that is easy to say in hindsight. And actually testing for this kind of intermittent fault is a non-trivial exercise itself. You can't just nip down to the high street and buy a NIC with this precise fault. So I'm inclined to cut my Irish colleagues a bit of slack, and ask if they could please publish a post-mortem once the facts are known.
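
The "stay on the backup" behaviour described above can be sketched as a non-revertive failover with a hold-down timer. This is only an illustration under stated assumptions: the class name `FailoverSelector`, the `hold_down_s` parameter and its 300-second value are hypothetical, and nothing here is claimed to reflect how the Thales system actually works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverSelector:
    """Chooses between a primary and a backup LAN (hypothetical sketch).

    Once the primary fails we switch to backup and refuse to switch back
    until the primary has stayed up continuously for hold_down_s seconds.
    """
    hold_down_s: float = 300.0   # assumed value, not taken from any real system
    active: str = "primary"
    switches: int = 0
    _primary_up_since: Optional[float] = None

    def observe(self, now: float, primary_up: bool) -> str:
        """Feed one health sample for the primary; return the LAN to use."""
        if not primary_up:
            self._primary_up_since = None      # any drop resets the stability clock
            if self.active == "primary":
                self.active = "backup"
                self.switches += 1
        else:
            if self._primary_up_since is None:
                self._primary_up_since = now
            stable_for = now - self._primary_up_since
            if self.active == "backup" and stable_for >= self.hold_down_s:
                self.active = "primary"
                self.switches += 1
        return self.active


def simulate(hold_down_s: float, samples: list[tuple[float, bool]]) -> int:
    """Return how many times the active LAN changed over the samples."""
    sel = FailoverSelector(hold_down_s=hold_down_s)
    for now, up in samples:
        sel.observe(now, up)
    return sel.switches


# A primary that drops for 1 s out of every 10 s ("yoyo mode"), for 2 minutes.
flapping = [(float(t), t % 10 != 0) for t in range(1, 121)]

eager = simulate(0.0, flapping)      # switch back as soon as the primary looks up
patient = simulate(300.0, flapping)  # wait for 5 minutes of clean running first
```

With the made-up flapping trace, the eager policy changes LAN on every glitch, while the hold-down policy fails over once and stays on the backup. Generating a realistic intermittent fault to test against is the hard part, as noted above; a synthetic trace like `flapping` is only a stand-in.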

Mr Ron 23rd Aug 2008 03:58

Good summary TSP
 
It's good to hear some rational thought process being applied to a situation that never should have occurred. Dual redundancy issues have been with us for many years, and it seems that whilst the Reliability Block Diagrams for the system show a good mathematical model, changes of technology always manage to throw in one or two technical spanners. This will always be the case; it's the recovery process (as TSP said) that should be the centre of development.
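
To put rough numbers on why a Reliability Block Diagram can look good on paper and still be beaten by one card, here is a small sketch using the textbook beta-factor approximation for a redundant pair. The figures are purely illustrative assumptions (a channel that is down 0.1% of the time, and 5% of failures affecting both channels); none of them come from the IAA or Thales.

```python
def parallel_unavailability(q: float, beta: float = 0.0) -> float:
    """Approximate unavailability of a 1-out-of-2 redundant pair.

    q    : unavailability of a single channel (probability it is down)
    beta : assumed fraction of failures that hit both channels at once
           (beta-factor model; beta = 0 means fully independent channels)
    """
    independent = ((1.0 - beta) * q) ** 2   # both channels fail separately
    common = beta * q                       # one cause takes out both
    return independent + common


q_single = 1e-3                                   # illustrative assumption
ideal = parallel_unavailability(q_single)         # what the block diagram assumes
with_common_cause = parallel_unavailability(q_single, beta=0.05)
ratio = with_common_cause / ideal                 # how much worse the pair gets
```

Under these assumed numbers the independent model gives about one in a million, while even a small common-cause fraction makes the pair roughly fifty times worse. A fault that "overcame the built-in system redundancy" is exactly the kind of event the `beta` term stands for.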

Thales, like everyone else, will try to use components that are as simple as possible, on the principle that 'complexity' breeds further 'complexity' and simplicity usually breeds good understanding. The downside is that simplicity usually goes hand in hand with mass production, which introduces the probability that there will be one bad egg out there just waiting to rear its ugly head.

As a point of view from working extensively with Thales over the years: my criticism of their system is that it's designed by engineers and is really best suited to be operated by engineers, which is of course completely wrong. Even Thales admit that 'user' input into the ergonomics and friendliness of usage was extremely low, especially with the EuroCat 2000E systems and their derivatives.

Raytheon certainly beat Thales in this respect, but there are issues here too. However, that is best placed in another thread. As far as the Thales IAA system is concerned, there were many flaws with the first delivery, and without the European backing for funds this system would never have been introduced. But it is what it is and they have to make the most of it. The question is, will they have learned from the experience in 10 years' time?


All times are GMT.

