PPRuNe Forums

Rumours & News: 'System outage' grounds Delta flights (https://www.pprune.org/rumours-news/582700-system-outage-grounds-delta-flights.html)

Dimitri Cherchenko 8th Aug 2016 09:17

'System outage' grounds Delta flights
 
Delta Air Lines says all flights are suspended "due to system outage nationwide"

'System outage' grounds Delta flights - BBC News

Twitter

Porky Speedpig 8th Aug 2016 12:12

Must be a nightmare for all concerned and affected - any word as to the root cause?

bafanguy 8th Aug 2016 12:15

This:

http://finance.yahoo.com/news/delta-...111236208.html

log0008 8th Aug 2016 12:20

8am and the world's busiest airport is very quiet! Not a good week for those travelling through major airports.

Porky Speedpig 8th Aug 2016 12:24

Wow, that's one heck of a power outage. No doubt there are two or three backup power systems too, so it will be interesting to see why none of them kicked in.

crablab 8th Aug 2016 12:58

Surely in the age of the cloud they should have multiple data centres?!

Alanwsg 8th Aug 2016 13:12

Power cut crashes Delta's worldwide flight update systems - The Register

OldLurker 8th Aug 2016 16:12

From experience elsewhere (not being able to see behind the curtain at Delta) I'd hazard a guess that their IT guys have been beating on the management for years to get proper hot fall-back for what is nowadays a mission-critical system, but management (supported by bean-counters) have been stalling on the necessary investment.

OTOH, they may have all the fall-back in the world but they've never actually exercised it properly, so when it's called on for real, it doesn't work ...

Lonewolf_50 8th Aug 2016 16:21


Originally Posted by OldLurker (Post 9467140)
OTOH, they may have all the fall-back in the world but they've never actually exercised it properly, so when it's called on for real, it doesn't work ...

I wonder. A whole lot of folks ran "business continuity plans" and tests before the dreaded "Y2K" event (which wasn't all that it threatened to be), so perhaps management consider that event a case of the IT guys "crying wolf."

So now they lost a bit of this week's flock/wool ...

Porky Speedpig 8th Aug 2016 17:01

A veritable squadron of off-schedule Delta birds on the Atlantic now, most heading for JFK and ATL and arriving in quick succession. Likely to be a nightmare at CBP.

Derfred 8th Aug 2016 17:17


the dreaded "Y2K" event (which wasn't all that it threatened to be)
Y2K wasn't all that it threatened to be because they identified it beforehand and fixed it (at huge cost in some areas). You obviously missed that bit.

PAXboy 8th Aug 2016 17:24

Correct, Derfred. Y2K was fixed in time, but it suits everyone to say we cried wolf. That makes it easier not to give credit where it's due.

Correct, OldLurker. I was in IT for 27 years, and because it works 99.9% of the time, they think it will be acceptable when it fails. As they say in the airline industry, "If you think preventative maintenance is expensive, try a crash for size." For years I saw IT starved of investment and then, when it did go wrong, they gave us the money we'd been asking for.

esa-aardvark 8th Aug 2016 18:00

Y2K, I made a lot of money out of this. I invested a smallish amount in a company working on Y2K solutions, then forgot about it for a couple of years, by which time the investment had increased in value about eightfold. On the subject of power supplies, the last data centre I managed had two ships' diesels, each capable of carrying the mainframe and ancillary equipment load. I wonder whether, in Delta's case, it was some equipment other than the computers that failed. I remember in NZ a few years ago the highly robust point-of-sale network went down when someone found and cut the only "single point of failure" cable.

c52 8th Aug 2016 18:06

My near-invariable experience in IT was that there would be a total failure when the annual test of the uninterruptible power supply took place.

Ian W 8th Aug 2016 18:28

I wonder how long it will be before Delta has a redundant system set up, say at MSP? If they had done that, the failure could have been transparent to the airline, apart from the people directly involved at ATL. It seems that all the airline beancounters would prefer to upset their customers and give dispatch a really hard problem to solve at vast expense to the airline (just think of the EU-mandated payments!) rather than have an efficient system that is fault tolerant. Perhaps, if IT had asked for the backup and been refused, the costs of the global ground stop and recovery could be charged to the accountancy headcount budget? That might concentrate their minds.

Smott999 8th Aug 2016 18:35

What's interesting is Georgia Power is denying that there was an outage at all. None of their customers had a loss and all their equipment was running.
Delta called them to have them look at a master switch of some sort which had failed. Hmm.

What is that called.....single point of, something.....what is it again?

If it is something that silly and bad, heads will roll.

vector4fun 8th Aug 2016 18:40

We had a lightning induced power failure years ago. Also had a bank of batteries and diesel generators to take over that failed to work. Seems the folks that maintained the UPS system monthly failed to disconnect the dummy load used for testing.....

Smott999 8th Aug 2016 18:52

I used to work at an international bank, and every 6-9 months we had to do a full disaster recovery drill. We took it very seriously. If these guys had their whole global network sitting on one switch... yikes.

Lonewolf_50 8th Aug 2016 18:55


Originally Posted by Derfred (Post 9467214)
Y2K wasn't all that it threatened to be because they identified it beforehand and fixed it (at huge cost in some areas). You obviously missed that bit.

No, I didn't miss anything, as I was involved in three different BCPs for Y2K. Thanks for playing. I just didn't add all of the bloody detail to that post, so perhaps I overdid the brevity.


My point is that some in current management, who probably weren't in management then, may perceive through MBA eyes that Y2K was "crying wolf" since they didn't know what it took to mitigate it. As you are probably aware, management "up and out" and medium to high turnover is common.

Joe_K 8th Aug 2016 19:51

Ars Technica has this:

"According to the flight captain of JFK-SLC this morning, a routine scheduled switch to the backup generator this morning at 2:30am caused a fire that destroyed both the backup and the primary. "

If true: oops.

Data center disaster disrupts Delta Air Lines | Ars Technica

Smott999 8th Aug 2016 20:19

So no separate data center? Or backup system on a different power circuit?

ex-EGLL 8th Aug 2016 20:29


We had a lightning induced power failure years ago. Also had a bank of batteries and diesel generators to take over that failed to work. Seems the folks that maintained the UPS system monthly failed to disconnect the dummy load used for testing.....
Or my personal favorite from ATC. We religiously switched to the standby generator on the first Sunday of the month to make sure all was good, it always was. One day we lost power, generator fired up and all was good...... for a couple of minutes, then it went very dark and very quiet.

Seems the monthly startup / shutdown checklists made no mention of fuel quantity !!!

FakePilot 8th Aug 2016 20:33

When everything is fully redundant, Murphy refocuses his effort on the part that controls both.

G-CPTN 8th Aug 2016 20:37


Originally Posted by ex-EGLL (Post 9467393)
Or my personal favorite from ATC. We religiously switched to the standby generator on the first Sunday of the month to make sure all was good, it always was. One day we lost power, generator fired up and all was good...... for a couple of minutes, then it went very dark and very quiet.

Seems the monthly startup / shutdown checklists made no mention of fuel quantity !!!

I think that was what 'sunk' New Orleans during Katrina - the auxiliary generators that were backup for the pumps 'ran out of fuel'.

Ian W 8th Aug 2016 20:59


Originally Posted by Smott999 (Post 9467383)
So no separate data center? Or backup system on a different power circuit?

This is the part that makes no sense. If you are running a global system that must be up 24/7, you must not have all your systems in one place. Not only should there be a separate backup system, it must be a geographically separate backup system, ideally in another state; Delta could have theirs at MSP. Both the ATL and MSP systems should run in parallel, each with its own local backup for fault tolerance. Either should be able to support all operations 24/7 on its own, but when both are up they can share the load and keep each other in sync. Banks do this all the time. If ATL had gone down under this scheme, only the people in the ATL center would have noticed; all other users would have had the same service from the remaining site.

As said before this sounds like a beancounter enforced lack of fault tolerance.
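The active-active arrangement Ian W describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Delta or any bank actually does it: the `Site` and `ActiveActiveRouter` names are hypothetical, a boolean `healthy` flag stands in for real health checks, and "keeping each other in sync" is reduced to copying every write to every live site.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Site:
    """Hypothetical data centre; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True
    bookings: dict = field(default_factory=dict)


class ActiveActiveRouter:
    """Shares load across all healthy sites; any one site can carry it all."""

    def __init__(self, sites):
        self.sites = list(sites)
        self._rr = cycle(self.sites)  # round-robin order for reads

    def healthy_sites(self):
        return [s for s in self.sites if s.healthy]

    def write(self, key, value):
        # Copy each write to every live site so either can take over alone.
        live = self.healthy_sites()
        if not live:
            raise RuntimeError("no healthy site: global ground stop")
        for site in live:
            site.bookings[key] = value
        return [s.name for s in live]

    def read(self, key):
        # Try each site at most once, skipping any that are down.
        for _ in range(len(self.sites)):
            site = next(self._rr)
            if site.healthy:
                return site.name, site.bookings.get(key)
        raise RuntimeError("no healthy site: global ground stop")


if __name__ == "__main__":
    atl, msp = Site("ATL"), Site("MSP")
    router = ActiveActiveRouter([atl, msp])
    router.write("DL123", "confirmed")
    atl.healthy = False           # simulate the power failure at ATL
    print(router.read("DL123"))   # → ('MSP', 'confirmed')
```

A real deployment also has to resynchronise a site that was down and missed writes, and decide what happens if the link between the two sites fails while both are still up; the sketch deliberately leaves both out.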

Peter47 8th Aug 2016 21:12

Here are some questions for IT experts - and I am sure there are plenty of issues I haven't even thought about.

If DL's computers are inop could other airlines help out, for example KLM produce flight plans for DL flights departing AMS or would there be problems owing to different regulatory regimes, etc? I know that there are specific rules relating to despatchers who have to be licenced in the US.

Where you have codeshare partners (let's say you are a DL-ticketed passenger travelling on a KL flight), would your reservation appear in both systems, which would effectively provide a backup? I have to say that I have visions of pax arriving at airports having to prove that they have a reservation; it would be useful to have printed off a confirmation.

Presumably VS still has its own computer system as its ops appear to be unaffected.

crablab 8th Aug 2016 21:15

#1 Nah, all the Delta databases will have been in that datacenter (from the sounds of it) - you'd need all that information to produce anything useful and it would be a right pain to modify the KLM systems to handle the data etc. You might as well do it by hand.

#2 I believe all passenger reservations are stored on a central system to aid access of the TSA etc. Like APIS. So I guess you might be able to get data off of that?

Smott999 8th Aug 2016 21:32

Indeed, Ian, it makes no sense if they were without a physically remote hot backup data center with fully redundant data, available to switch on promptly should the main site be lost.
I wonder if there are, or will be, regs about that kind of thing.

Speaking of regs, what about EC 261, the EU's automatic compensation for delayed/cancelled flights? I've used it myself a few hrs back.

Lot of folks stranded in LHR or Amsterdam might want to make use of it! I wonder how many Yanks know about it though...

Logohu 8th Aug 2016 21:41


I wonder how many Yanks know about it though...
Probably not many, but you can be sure their attorneys will ;)

archae86 8th Aug 2016 21:51

Need to use your backups, not just test them
 
One lesson I think I learned in another industry: if you want your backups and protections to work when they are needed, you have to actually integrate usage of them into your standard operations. No amount of "really really careful" testing is an adequate substitute.

True story: the factory site at which I worked, which at times was probably on the worldwide top ten list for dollar value added across all industries, was so concerned about the single point of failure of losing utility power (yes, we had lots of stuff on UPS, but some big stuff of interest was not) that they paid to have several miles of high-voltage connection run to a second point within the utility network. There was a nifty switch on our premises which, when needed, would transfer our load from one string of power towers to the other.

Came the day we needed the backup connection to work--not because of a failure of the utility, but because a forklift operator on our own premises accidentally damaged a very late connection line by swinging a load up into it.

The post-mortem established that the nifty transfer switch had a battery which needed to be alive for the transfer to happen. And there was no maintenance plan for looking after the battery, which had probably been dead for some time by the day of our need. That one cost a very, very, large amount of money.

Yes, a suitable test would have caught that one, but I'll still hold out for the higher standard of usage. That way people take it seriously, and people notice and fix the troubles.
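archae86's "use it, don't just test it" idea, combined with the failures described in this thread (a dead transfer-switch battery, generators with no fuel), might be sketched like this. It is a hypothetical illustration only: the feed names, the weekly alternation, and the check list are assumptions, not any site's actual procedure. The live feed alternates every Monday-to-Sunday week, so the "backup" carries real load half the time, and the switch is refused (and the failures reported) if any pre-switch check fails.

```python
from datetime import date


def scheduled_feed(feeds, day):
    """Feed that should be live on `day`; alternates every Monday-to-Sunday week.

    Assumption for illustration: feeds rotate weekly in list order.
    """
    week_index = (day.toordinal() - 1) // 7  # date(1, 1, 1) was a Monday
    return feeds[week_index % len(feeds)]


def failed_checks(checks):
    """Names of hypothetical pre-switch checks that did not pass."""
    return [name for name, ok in checks.items() if not ok]


def plan_switch(feeds, current, day, checks):
    """Return (feed to run on, failed checks); stay put if a switch is unsafe."""
    target = scheduled_feed(feeds, day)
    problems = failed_checks(checks)
    if target != current and problems:
        return current, problems  # refuse to switch, but surface the problems
    return target, problems


if __name__ == "__main__":
    feeds = ["feed A", "feed B"]
    checks = {
        "transfer switch battery": False,  # the failure in the story above
        "generator fuel above minimum": True,
    }
    print(plan_switch(feeds, "feed A", date(2016, 8, 8), checks))
```

The point of the design is that a dead battery or an empty tank shows up as a refused weekly switch on an ordinary day, rather than as a blackout on the day the utility power actually fails.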

lincman 8th Aug 2016 23:39

Reservation System Computer Back-up
 
I understand BA have a fully duplicated back-up in a secret location where it can't be sabotaged. Maybe DL ought to visit BA and learn something?

etudiant 8th Aug 2016 23:45


Originally Posted by lincman (Post 9467559)
I understand BA have a fully duplicated back-up in a secret location where it can't be sabotaged. Maybe DL ought to visit BA and learn something?

Hope the transfer switch battery has been checked regularly. ;)

Water pilot 9th Aug 2016 01:03

If it makes anyone feel better, years ago I worked for Microsoft in Redmond, WA. Cost for our data centers was no object, and we had the best backup power systems that money could buy at the time. Along comes a big storm in Redmond that took out the power and, you guessed it, the lights go out, my screen goes dark, and the fire doors slam shut.

I think lightning hit the backup generators. The good news is that my house was near "campus" and I got power restored days before anybody else.

underfire 9th Aug 2016 01:51

From my experience, including working in Bldg #2 at Redmond, these 'experts' in backup and redundancy are akin to the 'experts' in aviation, always 'formerly' employed in the business.
Everything has relied on boilerplate checklists, with single point failures at virtually every point, so on paper it looks fine, but in operation, it falls apart.
In Seattle, the Police Department had backup generators for the systems. The weekly tests of the systems went fine. When an earthquake happened, the backup generator systems went down 2 hours after startup. The weekly tests had been using up the fuel storage, and there was never a contract in place to keep the tanks topped off.

RobertS975 9th Aug 2016 02:59

Most "yanks" don't consult attorneys regarding canceled flights. DL has offered a $200 travel voucher to anyone who was canceled or delayed greater than 3 hours.

underfire 9th Aug 2016 03:06

go for the 15K FF miles!

ExXB 9th Aug 2016 06:10


Originally Posted by RobertS975 (Post 9467657)
Most "yanks" don't consult attorneys regarding canceled flights. DL has offered a $200 travel voucher to anyone who was canceled or delayed greater than 3 hours.

Yet 261 provides for €400/600 in CASH, not in vouchers. It applies to all DL flights departing from an EU airport.

As passengers can't waive their rights they would be entitled to this in addition to their voucher.

ajamieson 9th Aug 2016 08:57

Although passengers can't waive their rights, airlines do not have to offer compensation where a passenger has agreed an alternative. Accepting the voucher could be argued as agreement to an alternative offer.

But yes, it's definitely worth lodging an EC261 claim, as €600 is far more use than an MCO/voucher that probably comes with a string of restrictions.

Smott999 9th Aug 2016 11:15

I once had to go EC261 on United as they stranded me in London.
It literally took 18 months. They denied everything, saying their cancellation was "force majeure" and I was entitled to nothing, right up until the courts ruled and it was time to pay up; then they tried to bribe me to drop the case. I told them they were in violation of the law for contacting me directly instead of my attorney, and hung up.
There are actually firms in EU that do nothing but prosecute EC261. I think they took about 15% and I had to do nothing except email them my boarding pass and other info. Not bad, but United was just appalling about it.

sabbasolo 9th Aug 2016 11:18

Any idea why DL flights are still being cancelled today (Tuesday)? Positioning? Some loss of data?


All times are GMT.


Copyright © 2024 MH Sub I, LLC dba Internet Brands. All rights reserved.