'System outage' grounds Delta flights
It is seldom the backup generators that fail if they are rigorously tested. It is normally some switch somewhere, which no one seems to own. IT folk think they do good project management. Maybe they do for software implementations. For real Engineering, hire a real Engineer.
If you want your backups and protections to work when they are needed, you have to actually integrate usage of them into your standard operations. No amount of "really really careful" testing is an adequate substitute.
The factory site where I worked, which at times was probably on the worldwide top-ten list for dollar value added across all industries, was so concerned about the single point of failure of losing utility power ... that they paid to have several miles of high-voltage connection made to a second point within the utility network.
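One way to picture "integrating usage into standard operations" is a rotation in which every site regularly takes its turn carrying live traffic, so no site is ever an untested spare. The following is only a sketch under assumptions: the site names and the weekly rotation interval are hypothetical and are not taken from anything Delta or the posters described.

```python
from datetime import date

# Hypothetical site names; the thread does not name any real facilities.
SITES = ["site_a", "site_b"]


def active_site(day: date, sites=SITES) -> str:
    """Return the site that carries live traffic for the ISO week containing `day`.

    Rotating on the ISO week number means each site is exercised under real
    load on a regular schedule. Across a year boundary the same site may get
    two weeks in a row, which is acceptable for a sketch.
    """
    iso_week = day.isocalendar()[1]  # index 1 is the ISO week number
    return sites[iso_week % len(sites)]


if __name__ == "__main__":
    for d in (date(2016, 8, 8), date(2016, 8, 15)):
        print(d, active_site(d))
```

With two sites, consecutive weeks within a year alternate, so a dead switch or stale configuration at either site would show up within a week rather than on the day it is needed.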
Join Date: Jul 2005
Location: Canadian Shield
Posts: 538
Well, it's certainly a major wake-up call.
The proliferation of IT-based "solutions" in passenger air-transport recently has been remarkable: fully web-based reservations; on-line check-in; boarding cards via hand-held devices etc etc etc.
These days all one sees before Security at airports is a spotty 16-year-old handling checked baggage - typically someone who wouldn't recognize a Manual System if it swam up and bit him.
When all is said and done, Delta's back-up and disaster recovery procedures clearly fell far short. No real excuse for that.
[Presumably anyone dying in SE States that day got an extra day on earth. "It doesn't matter if you go to Heaven or Hell, you still have to go via Atlanta!"]
Join Date: Mar 2004
Location: Baltimore, MD
Posts: 272
It is seldom the backup generators that fail if they are rigorously tested. It is normally some switch somewhere, which no one seems to own. IT folk think they do good project management. Maybe they do for software implementations. For real Engineering, hire a real Engineer.
Join Date: Jun 2000
Location: last time I looked I was still here.
Posts: 4,507
Various observations come to mind:
1. Someone somewhere, perhaps, signed off on NOT installing a correct, suitable, worst-case, thoroughly (and often) tested backup system. Heads might roll, but don't hold your breath. They cannot pin 'pilot error' on this one.
2. Someone somewhere did not do a thorough threat/risk assessment of 'what happens if...'
3. Someone somewhere was being complacent. "It has never been a problem before, therefore it's OK."
4. When volcanic ash shuts down airspace, and puts a/c & crews where you did not plan them to be, you used your computer systems to sort out the consequential poo-pile. Oops, the poo-pile is caused by your own computer system. Now where is that pencil & rubber, slide-rule and abacus? What do you mean there's no paper back up? Oops.
This saga could go on long enough for Hollywood to make an epic drama out of it, at least a TV box set. Then you could throw in some foreign espionage conspiracy and ruin the whole truth. Ground Crash Investigation could have a field day with this one. Human error puts a company on the edge.
What will be interesting will be the investigation as to root cause. I wonder if that will ever see the light of day to the public. Check out the dole queue for a clue.
Join Date: Dec 2001
Location: Richmond Texas
Posts: 305
Never worked in airline reservations but did work many years in the broadcast industry. I was surprised at the number of redundant systems I found that assured system failure if either of the duplicate systems failed! Also worked as a millennium auditor in the same industry. We found several epoch-related risks that had nothing to do with Y2K.
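The arithmetic behind that observation can be sketched as follows. This is an illustration only, assuming the two units fail independently and each has a known availability (the fraction of time it is working); the function names are made up for the example.

```python
def series_availability(a: float, b: float) -> float:
    """Both units must work: the 'redundant' pair that fails if either fails."""
    return a * b


def parallel_availability(a: float, b: float) -> float:
    """Either unit is enough: the pair only fails when both are down at once."""
    return 1 - (1 - a) * (1 - b)


if __name__ == "__main__":
    unit = 0.99  # assumed availability of each unit, for illustration
    print(f"both required:  {series_availability(unit, unit):.4f}")
    print(f"either suffices: {parallel_availability(unit, unit):.4f}")
```

With two units at 99% each, wiring them so that both are required gives 98.01%, which is worse than a single unit, while true redundancy gives 99.99%. Correlated failures (a shared switch, a shared power feed) break the independence assumption and pull the second figure back down.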
After an excellent landing etc...
Take away all tools from the Engineer except one (i.e. hammer). Now you have a programmer.
This is how they treated their core competency: aircraft. The 'systems engineering' function (a top-down view of overall function) was mostly contract management and very little actual engineering. You can imagine what lack of attention was given to non-core functions (data centers, facilities, etc.)
Not that it matters terribly, but here's latest info:
http://finance.yahoo.com/news/delta-...194608131.html
A wake up call? Delta already had the wake up call, ten years ago:
Comair's Christmas Disaster: Bound To Fail | CIO
Of course, Comair was only a subsidiary of Delta, not part of the main team - they were too smart to let such a thing happen!
The CEO of Comair walked the plank! Wonder if that'll happen this time around!
Last edited by twochai; 9th Aug 2016 at 21:50.
Join Date: Aug 2007
Location: West London, UK
Posts: 12
BA, mentioned in another post, is also hosted by a third party with a very resilient system indeed. I'm surprised that DL didn't move to third party hosting when they finally dropped the legacy in-house DELTAMATIC system.
Join Date: Feb 2003
Location: PBI
Posts: 215
Amazon computer services offered a much more redundant system and they (Delta) didn't want to pay the money.
You can assume Amazon are pretty much switched on with systems.
So Delta are running a 20-25 year old system that if one hub goes down so does the rest. All the senior IT execs are former IBM. That sums it up of course.
On another positive note all the competition are doing really well from this total Fook up!
Last edited by OldCessna; 9th Aug 2016 at 23:23. Reason: Typo
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
From the Yahoo Link above:
"Monday morning a critical power control module at our Technology Command Center malfunctioned, causing a surge to the transformer and a loss of power," Delta COO Gil West said in a statement on Tuesday. "The universal power was stabilized and power was restored quickly."
However, the trouble obviously didn't end there. A Delta spokesperson confirmed to Business Insider earlier today that the airline's backup systems failed to kick in."
And here we have the fundamental fault in the design. The 'backup' system should be operating all the time as a part of the live system. To all intents and purposes you have a widely distributed system that usually operates very efficiently. When part of the system fails, all that happens is that the remaining part of the system carries on operating slightly less efficiently. There is no impact at all on operations and no failover to worry about.
I have no doubt that the IT people would want to have a fault-tolerant system, but the beancounters will have said: how often do things fail? What is the cost of 2 computer centers? We are not paying that; they can stay in the same building.... And there will be a Delta beancounter with his abacus out saying, even now, that they still won on the deal.
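The design argued for above, where the 'backup' is just another live part of the system, can be sketched roughly like this. It is a toy illustration and not a description of any airline's architecture: the class name, the site names, and the round-robin policy are all assumptions made for the example.

```python
from itertools import cycle


class ActiveActivePool:
    """Round-robin work across every site that is currently healthy.

    There is no separate standby: all sites take traffic all the time,
    and a failed site is simply skipped.
    """

    def __init__(self, sites):
        self.healthy = {name: True for name in sites}
        self._order = cycle(sites)
        self._count = len(sites)

    def mark_down(self, name):
        self.healthy[name] = False

    def mark_up(self, name):
        self.healthy[name] = True

    def route(self):
        """Return the next healthy site, or raise if every site is down."""
        for _ in range(self._count):
            site = next(self._order)
            if self.healthy[site]:
                return site
        raise RuntimeError("no healthy site available")


if __name__ == "__main__":
    pool = ActiveActivePool(["east", "west", "central"])  # hypothetical sites
    print([pool.route() for _ in range(3)])
    pool.mark_down("west")
    print([pool.route() for _ in range(4)])  # west is skipped, others carry on
```

Losing a site reduces capacity but involves no switchover step, which is the point made in the post: there is nothing that has to "kick in".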
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
A wake up call? Delta already had the wake up call, ten years ago:
Comair's Christmas Disaster: Bound To Fail | CIO
Of course, Comair was only a subsidiary of Delta, not part of the main team - they were too smart to let such a thing happen!
The CEO of Comair walked the plank! Wonder if that'll happen this time around!
I have been told that, in order to reduce stock holding at the airport, Cincinnati had only a small supply of deicer and, when snow/ice/freezing weather was forecast, would call for supplies sufficient for the expected weather. In this case the tankers of deicer were on their way to the airport but were pulled over by law enforcement and told it was too dangerous for them to carry on driving due to the snow. So the airport was unable to deice aircraft and operations were halted. Not only did the aircraft tires freeze to the ground, but also the jetways froze in position.
Lots of holes in the cheese lined up. A really good learning exercise for the MBAs who run airports these days.
Join Date: Dec 2015
Location: Southampton
Posts: 120
The last company I worked for had mirrored data centres in Europe, Asia and America.
You could lose any 2 and everything would still work correctly at a "user level".
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
There is only one explanation really: Delta beancounters felt the cost of a fault-tolerant system made it worth taking the risk of a total system failure. Yet the cost of the backup system running as a 'hot spare' in a separate building would be peanuts compared to their cash and status losses now. There are still flights being cancelled today and their computer systems are still not recovered, with lots of broken links and applications not back in sync. All those people with their 'e-boarding passes' on their phones could be in trouble. This may run on for months, with people with bookings out months suddenly finding that the roll-back/roll-forward broke their bookings.
They should take the $200 a pax good will payments out of their beancounters' head count budgets. Only then with skin in the game would they appreciate the risk analyses.
Join Date: Jan 2008
Location: Netherlands
Age: 46
Posts: 308
Fact of the matter is that every backup system will introduce new failure modes.
It happens that everything stops because of inconsistency between primary and secondary systems. Systems can become unavailable as they need to re-synchronize (a common one is where a drive in a RAID array fails and the server starts filling the hot-spare). The best one I ever experienced is a UPS that failed: Everything had power, except the systems behind the UPS...
Join Date: Nov 2007
Location: Texas
Posts: 1,908
Any idea why DL flights are still being cancelled today (Tuesday)? Positioning? Some loss of data?
Join Date: Feb 2015
Location: New Hampshire
Posts: 152
In this case, lots of schemes would have worked. As others have said, they just needed to actually exercise the one they picked. When it comes to software and information systems, if it hasn't been tested, it doesn't work.
So we have the first round of testing: Not bad, up in only a few hours. Too bad it wasn't a test.
Actually, a major cost in these systems is the testing. As each revision is made to the system, you need ways to routinely simulate and check daily activity without actually relying on that system.
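One common shape for "simulate and check daily activity without actually relying on that system" is to replay recorded requests against the standby and compare its answers with what the primary gave. The sketch below assumes that requests can be recorded and replayed safely and that both systems can be called as plain functions; the names are hypothetical and the handlers in the demo are stand-ins.

```python
def replay_and_compare(requests, primary, standby):
    """Feed the same recorded requests to both handlers and collect mismatches.

    Only the primary's answer would be used in production; the standby's
    answer is checked and then discarded. A standby that raises is recorded
    as a mismatch rather than stopping the run.
    """
    mismatches = []
    for req in requests:
        expected = primary(req)
        try:
            actual = standby(req)
        except Exception as exc:
            actual = f"error: {exc!r}"
        if actual != expected:
            mismatches.append((req, expected, actual))
    return mismatches


if __name__ == "__main__":
    recorded = ["dl100", "dl123", "dl200"]  # made-up request keys

    def primary(req):
        return req.upper()

    def stale_standby(req):
        # Pretend the standby missed an update for one record.
        return "stale" if req == "dl123" else req.upper()

    print(replay_and_compare(recorded, primary, stale_standby))
```

Run after each revision, an empty result is evidence the standby would have behaved the same way today; it is not proof that a real switchover would succeed, since power, network, and switching hardware are outside what this checks.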
Join Date: Sep 2007
Location: New York
Posts: 222
Coal Face
For the record, everyone I worked with or watched the last few days met the challenges at hand with grace and patience. It was impressive to watch people pull together to keep the show running. I would take my hat off to them; but I'm not supposed to...
Quote:
Any idea why DL flights are still being cancelled today (Tuesday)? Positioning? Some loss of data?