It is seldom the backup generators that fail if they are rigorously tested. It is normally some switch somewhere that no one seems to own. IT folk think they do good project management. Maybe they do for software implementations. For real engineering, hire a real engineer.
|
If you want your backups and protections to work when they are needed, you have to actually integrate their use into your standard operations. No amount of "really, really careful" testing is an adequate substitute. The factory site at which I worked, which at times was probably on the worldwide top-ten list for dollar value added across all industries, was so concerned about the single-point failure of losing utility power ... that they paid to have several miles of high-voltage connection made to a second point within the utility network. |
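To put the "use your backups as part of normal operations" point into software terms, here is a minimal sketch; the site names and the weekly rotation are purely assumptions of mine, not anything from the post above. The idea is simply that the live load alternates between two sites on a schedule, so the standby path is exercised routinely rather than only in a crisis.

```python
# Minimal sketch (site names and the weekly rotation are assumptions): alternate
# the live load between two sites so the standby is exercised in normal
# operations, not just during a crisis.
import datetime
import logging

logging.basicConfig(level=logging.INFO)

SITES = ["site-a", "site-b"]  # hypothetical primary/standby pair

def active_site(today: datetime.date) -> str:
    """Even ISO weeks run on site-a, odd weeks on site-b."""
    week = today.isocalendar()[1]
    return SITES[week % 2]

def route_traffic(today: datetime.date) -> str:
    """Log which site carries the live load this week and return it."""
    active = active_site(today)
    standby = next(s for s in SITES if s != active)
    logging.info("Live load on %s; %s is the warm standby this week", active, standby)
    return active

if __name__ == "__main__":
    route_traffic(datetime.date.today())
```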
Well, it's certainly a major wake-up call.
The proliferation of IT-based "solutions" in passenger air transport recently has been remarkable: fully web-based reservations, online check-in, boarding cards via hand-held devices, etc. These days all one sees before Security at airports is a spotty 16-year-old handling checked baggage - typically someone who wouldn't recognize a manual system if it swam up and bit him. When all is said and done, Delta's backup and disaster recovery procedures clearly fell far short. No real excuse for that. [Presumably anyone dying in the SE States that day got an extra day on earth. "It doesn't matter if you go to Heaven or Hell, you still have to go via Atlanta!"] |
Various observations come to mind:
1. Someone somewhere, perhaps, signed off on NOT installing a correct, suitable worst-case, thoroughly tested (often) backup system. Heads might roll, but don't hold your breath. They cannot pin "pilot error" on this one.
2. Someone somewhere did not do a thorough threat/risk assessment of "what happens if...?"
3. Someone somewhere was being overcomplacent: "It has never been a problem before, therefore it's OK."
4. When volcanic ash shuts down airspace and puts a/c and crews where you did not plan them to be, you use your computer systems to sort out the consequential poo-pile. Oops, the poo-pile is caused by your own computer system. Now where are that pencil and rubber, slide-rule and abacus? What do you mean there's no paper backup? Oops.

This saga could go on long enough for Hollywood to make an epic drama out of it, at least a TV box set. Then you could throw in some foreign espionage conspiracy and ruin the whole truth. Ground Crash Investigation could have a field day with this one. Human error puts a company on the edge. What will be interesting will be the investigation into root cause. I wonder if that will ever see the light of day to the public. Check out the dole queue for a clue. |
I never worked in airline reservations, but I did work many years in the broadcast industry. I was surprised at the number of redundant systems I found that assured system failure if either of the duplicate systems failed! I also worked as a millennium auditor in the same industry. We found several epoch-related risks that had nothing to do with Y2K.
After an excellent landing etc... |
Take away all tools from the Engineer except one (i.e. hammer). Now you have a programmer. This is how they treated their core competency: aircraft. The 'systems engineering' function (a top-down view of overall function) was mostly contract management and very little actual engineering. You can imagine what lack of attention was given to non-core functions (data centers, facilities, etc.) |
Not that it matters terribly, but here's the latest info:
http://finance.yahoo.com/news/delta-...194608131.html |
A wake-up call? Delta already had the wake-up call, ten years ago:
Comair's Christmas Disaster: Bound To Fail | CIO

Of course, Comair was only a subsidiary of Delta, not part of the main team - they were too smart to let such a thing happen! The CEO of Comair walked the plank! Wonder if that'll happen this time around! |
Originally Posted by Peter47
(Post 9467438)
Presumably VS still has its own computer system as its ops appear to be unaffected.
BA, mentioned in another post, is also hosted by a third party with a very resilient system indeed. I'm surprised that DL didn't move to third party hosting when they finally dropped the legacy in-house DELTAMATIC system. |
Amazon's computer services offered a much more redundant system, and they (Delta) didn't want to pay the money.
You can assume Amazon are pretty much switched on with systems. So Delta are running a 20-25-year-old system in which, if one hub goes down, so does the rest. All the senior IT execs are former IBM. That sums it up, of course. On another positive note, all the competition are doing really well out of this total Fook-up! |
From the Yahoo Link above:
"Monday morning a critical power control module at our Technology Command Center malfunctioned, causing a surge to the transformer and a loss of power," Delta COO Gil West said in a statement on Tuesday. "The universal power was stabilized and power was restored quickly." However, the trouble obviously didn't end there. A Delta spokesperson confirmed to Business Insider earlier today that the airline's backup systems failed to kick in." I have no doubt that the IT people would want to have a fault tolerant system but the beancounters will have said how often do things fail? What is the cost of 2 computer centers? We are not paying that they can stay in the same building.... and there will be a Delta beancounter with his abacus out saying now that they still won on the deal. |
Originally Posted by twochai
(Post 9468553)
A wake-up call? Delta already had the wake-up call, ten years ago:
Comair's Christmas Disaster: Bound To Fail | CIO

Of course, Comair was only a subsidiary of Delta, not part of the main team - they were too smart to let such a thing happen! The CEO of Comair walked the plank! Wonder if that'll happen this time around!

I have been told that, in order to reduce stock holding at the airport, Cincinnati had only a small supply of deicer and, when snow/ice/freezing weather was forecast, would call for supplies sufficient for the expected weather. In this case the tankers of deicer were on their way to the airport but were pulled over by law enforcement and told it was too dangerous for them to carry on driving due to the snow. So the airport was unable to deice aircraft, and operations were halted. Not only did the aircraft tires freeze to the ground, but the jetways also froze in position. Lots of holes in the cheese lined up. A really good learning exercise for the MBAs who run airports these days. |
The last company I worked for had mirrored data centres in Europe, Asia and America.
You could lose any two and everything would still work correctly at a "user level". |
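For what the "lose any two and still work" idea looks like in practice, here is a rough sketch; the region names and health-check URLs are invented for illustration and are not anything the poster's company actually ran. Requests are simply served from the first mirrored data centre that passes a health check, so any single surviving mirror keeps users working.

```python
# Rough sketch only: serve from the first mirrored data centre that answers a
# health check. All region names and URLs below are invented for illustration.
from urllib.request import urlopen
from urllib.error import URLError

REGIONS = {
    "europe":  "https://eu.example.invalid/health",
    "asia":    "https://ap.example.invalid/health",
    "america": "https://us.example.invalid/health",
}

def healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the region's health endpoint responds with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

def pick_region() -> str:
    """Pick the first healthy mirror; any one survivor keeps users working."""
    for name, url in REGIONS.items():
        if healthy(url):
            return name
    raise RuntimeError("All mirrored data centres are unreachable")
```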
Originally Posted by Tech Guy
(Post 9469102)
The last company I worked for had mirrored data centres in Europe, Asia and America.
You could lose any two and everything would still work correctly at a "user level".

There is only one explanation, really: Delta beancounters felt the cost of a fault-tolerant system made it worth taking the risk of a total system failure. Yet the cost of the backup system running as a "hot spare" in a separate building would be peanuts compared to their cash and status losses now. There are still flights being cancelled today, and their computer systems are still not recovered, with lots of broken links and applications not back in sync. All those people with their "e-boarding passes" on their phones could be in trouble. This may run on for months, with people with bookings out months suddenly finding that the roll-back/roll-forward broke their bookings. They should take the $200-a-pax goodwill payments out of their beancounters' headcount budgets. Only then, with skin in the game, would they appreciate the risk analyses. |
Fact of the matter is that every backup system will introduce new failure modes.
It happens that everything stops because of inconsistency between the primary and secondary systems. Systems can become unavailable as they need to re-synchronize (a common one is where a drive in a RAID array fails and the server starts filling the hot spare). The best one I ever experienced was a UPS that failed: everything had power except the systems behind the UPS... |
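A toy illustration of how the safeguard itself becomes the stoppage; the node model and the logic are entirely assumed, not any real product. A failover controller refuses to promote a secondary that has fallen behind the primary, so an inconsistency between the two halts everything rather than risking stale or conflicting data.

```python
# Toy illustration (assumed logic, not any real product) of how the safeguard
# itself becomes the outage: the controller refuses to promote a secondary
# that has fallen behind the primary's replication position.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    alive: bool
    replication_position: int  # e.g. last applied transaction id

def choose_active(primary: Node, secondary: Node) -> Node:
    if primary.alive:
        return primary
    # Primary is down: only promote the secondary if it is fully caught up.
    if secondary.alive and secondary.replication_position >= primary.replication_position:
        return secondary
    # Inconsistency between primary and secondary: the "protection" now halts
    # everything rather than risk serving stale or conflicting data.
    raise RuntimeError("Failover blocked: secondary is out of sync with the primary")

if __name__ == "__main__":
    p = Node("primary", alive=False, replication_position=1042)
    s = Node("secondary", alive=True, replication_position=997)
    try:
        choose_active(p, s)
    except RuntimeError as exc:
        print(exc)  # exactly the failure mode described in the post above
```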
Any idea why DL flights are still being cancelled today (Tuesday)? Positioning? Some loss of data? |
Originally Posted by FakePilot
(Post 9468181)
Take away all tools from the Engineer except one (i.e. hammer). Now you have a programmer. Most engineering work I've observed is about "will the whole thing work?", versus software people's "when can I use my favorite tool?"
In this case, lots of schemes would have worked. As others have said, they just needed to actually exercise the one they picked. When it comes to software and information systems, if it hasn't been tested, it doesn't work. So we have the first round of testing: Not bad, up in only a few hours. Too bad it wasn't a test. Actually, a major cost in these systems is the testing. As each revision is made to the system, you need ways to routinely simulate and check daily activity without actually relying on that system. |
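As a hedged sketch of what "routinely simulate and check daily activity" might look like, here is one shape it could take; the endpoint, the sample bookings, and the expected results are all invented for illustration. A nightly drill replays a few synthetic lookups against the standby system and flags any mismatch, so the backup is checked every day without the live operation ever depending on it.

```python
# Hedged sketch: replay a few synthetic lookups against the standby system and
# compare with expected answers. Endpoint, record refs and results are invented.
import json
from urllib.request import Request, urlopen
from urllib.error import URLError

STANDBY = "https://standby.example.invalid/api/lookup"  # hypothetical endpoint

SAMPLE_BOOKINGS = ["ABC123", "XYZ789"]  # known synthetic test records

def replay_lookup(record: str) -> dict:
    """Send one synthetic lookup to the standby system and decode the reply."""
    req = Request(f"{STANDBY}?ref={record}", headers={"Accept": "application/json"})
    with urlopen(req, timeout=5) as resp:
        return json.load(resp)

def nightly_drill(expected: dict) -> bool:
    """Return True only if every synthetic booking comes back as expected."""
    ok = True
    for ref in SAMPLE_BOOKINGS:
        try:
            if replay_lookup(ref) != expected[ref]:
                print(f"{ref}: standby returned unexpected data")
                ok = False
        except (URLError, OSError) as exc:
            print(f"{ref}: standby unreachable ({exc})")
            ok = False
    return ok
```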
Coal Face
For the record, everyone I worked with or watched over the last few days met the challenges at hand with grace and patience. It was impressive to watch people pull together to keep the show running. I would take my hat off to them; but I'm not supposed to...
|
Quote: Any idea why DL flights are still being cancelled today (Tuesday)? Positioning? Some loss of data? |