PPRuNe Forums - View Single Post - DUB ATC Outage down to IT Cock Up
Old 28th Jul 2008, 18:33
  #6 (permalink)  
Token Sane Person
 
Join Date: Apr 2006
Location: UK
Posts: 15
Engineer's perspective

Reading between the lines of the press reports, it sounds like the problem was an *intermittent* failure in a network card. A bad card can bring down an entire LAN, and exactly that kind of incident was responsible for the LAX customs network failure last year. However, that is exactly the kind of issue that dual redundant LANs are supposed to protect against.

In this case, however, it seems that the primary network kept going down and coming back up (technically known as "yoyo mode"). I guess that the system kept switching back to the primary as soon as it reappeared, and then had to switch to the backup again when the primary went back down.

Of course this is not acceptable behaviour; the Right Thing is to stay on the backup until someone has diagnosed and corrected the problem with the primary, or at least until the primary has stayed up long enough that the fault seems unlikely to recur quickly. But that is easy to say in hindsight, and actually testing for this kind of intermittent fault is itself a non-trivial exercise. You can't just nip down to the high street and buy a NIC with this precise fault. So I'm inclined to cut my Irish colleagues a bit of slack, and ask if they could please publish a post-mortem once the facts are known.
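
To make the "stay on the backup" idea concrete, here is a minimal sketch of a fail-back hold-down timer. It is purely illustrative and says nothing about how the Dublin system actually works: the class `FailoverController`, the `hold_down` parameter, and the one-second flapping pattern are all my own assumptions. With `hold_down` set to zero you get the yoyo behaviour described above; with a non-zero value the controller stays on the backup until the primary has been continuously healthy for that long.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Tuple


@dataclass
class FailoverController:
    """Hypothetical controller choosing between a 'primary' and a 'backup' LAN."""

    hold_down: float  # seconds the primary must stay healthy before fail-back
    active: str = "primary"
    switches: int = 0
    # Time at which the primary last became healthy, or None while it is down.
    _healthy_since: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, now: float, primary_up: bool) -> str:
        """Feed one health sample and return the LAN that should carry traffic."""
        if not primary_up:
            # Any failure resets the stability clock and forces us onto the backup.
            self._healthy_since = None
            if self.active == "primary":
                self._switch("backup")
        else:
            if self._healthy_since is None:
                self._healthy_since = now
            stable_for = now - self._healthy_since
            # Fail back only once the primary has been up for the whole hold-down.
            if self.active == "backup" and stable_for >= self.hold_down:
                self._switch("primary")
        return self.active

    def _switch(self, target: str) -> None:
        self.active = target
        self.switches += 1


def count_switches(hold_down: float, samples: Iterable[Tuple[float, bool]]) -> int:
    """Run a sequence of (time, primary_up) samples and count LAN switchovers."""
    controller = FailoverController(hold_down=hold_down)
    for now, primary_up in samples:
        controller.update(now, primary_up)
    return controller.switches


# Made-up fault pattern: the primary alternates down/up once per second.
flapping = [(float(t), t % 2 == 1) for t in range(20)]

# Made-up clean recovery: one failure at t=0, then healthy from t=1 to t=40.
recovered = [(0.0, False)] + [(float(t), True) for t in range(1, 41)]

naive_switches = count_switches(0.0, flapping)     # fails back immediately
damped_switches = count_switches(30.0, flapping)   # waits for 30 s of stability
recovery_switches = count_switches(30.0, recovered)
```

Against the flapping pattern the zero hold-down controller switches on every sample, while the 30 second version switches once and then sits on the backup. It still fails back on its own after a genuine recovery, which is the "left it going long enough" option; requiring a human to clear the fault instead would simply mean never failing back automatically.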