'System outage' grounds Delta flights
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
Fact of the matter is that every backup system will introduce new failure modes.
It happens that everything stops because of inconsistency between primary and secondary systems. Systems can become unavailable as they need to re-synchronize (a common one is where a drive in a RAID array fails and the server starts filling the hot-spare). The best one I ever experienced is a UPS that failed: everything had power, except the systems behind the UPS...
And Neilki, I was talking with your compatriots at around 4am this morning - they are doing a good job. I can only imagine the workload in ops and dispatch over the last few days.

Resident insomniac
Join Date: Aug 2005
Location: N54 58 34 W02 01 21
Age: 79
Posts: 1,873
Likes: 0
Received 1 Like on 1 Post
It took the best part of two hours to process the passengers and issue boarding cards (I don't remember whether they were hand-written; it was about 30 years ago).

Join Date: Jan 2011
Location: San Jose, CA
Age: 47
Posts: 0
Likes: 0
Received 0 Likes on 0 Posts
From an I.T. management point of view, Delta are stupid.
Let's assume for a second that the whole outage was indeed caused by power circuits going down and backup power not kicking in. Fine, shit happens. Stuff fails.
What SHOULD have happened is that their entire system failed over to a backup SITE within 30 seconds. That is not impossible to do, and I just delivered a system (three months ago) that does just that. Ask yourself why Google, Facebook etc never go down. Redundancy, redundancy, redundancy and redundancy. And trust me, as an I.T. professional (MSc + JNCIE) I can guarantee you that this is not rocket science.
This time it's a power failure. Next time it's a criminal or terrorist act that takes out the entire DC.
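
To make the "fail over to a backup SITE within 30 seconds" idea concrete, here is a minimal Python sketch. It is not how Delta or anyone else actually does it; `Site`, `is_healthy` and `choose_site` are hypothetical names invented for illustration, and the probe count and interval are assumed values chosen so that detection stays well inside a 30-second budget.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Site:
    """A data-centre site with a health probe (both hypothetical)."""
    name: str
    is_healthy: Callable[[], bool]


def choose_site(primary: Site, backup: Site,
                max_failed_probes: int = 3,
                probe_interval_s: float = 5.0,
                sleep: Callable[[float], None] = time.sleep) -> Site:
    """Return the primary if any probe succeeds, otherwise the backup.

    With the assumed defaults, the primary is declared dead after three
    failed probes spaced 5 s apart, i.e. about 10 s of waiting.
    """
    for attempt in range(max_failed_probes):
        if primary.is_healthy():
            return primary
        # Wait between probes, but not after the last failed one.
        if attempt < max_failed_probes - 1:
            sleep(probe_interval_s)
    if backup.is_healthy():
        return backup
    raise RuntimeError("no healthy site available")
```

The hard part in real life is not this loop but keeping the backup site's data consistent enough that switching to it is safe, which is the failure mode mentioned earlier in the thread.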
Join Date: Feb 2008
Location: Earth
Posts: 101
Likes: 0
Received 0 Likes on 0 Posts
Clearly these airlines have done little investment in IT systems and their own staff. Probably preferring to outsource everything and pay expensive consultants to come in once in a while. Splitting up your servers at various sites scattered around the country, or world, is expensive, but necessary when you rely on systems to function as a company. Will airlines invest in their infrastructure rather than worrying about quarterly results and executive parachutes?
In fairness remember that all the major US carriers have only recently experienced real financial difficulty, with bankruptcy rife. That is a difficult environment in which to fund a major DP upgrade that does not provide immediate economic advantage and so these carriers have let their legacy systems soldier on too long.
This incident will remind their managements to reassess that decision.
Join Date: Apr 2008
Location: Paris
Age: 73
Posts: 275
Likes: 0
Received 0 Likes on 0 Posts
Shit happens. Even Google and Apple occasionally have their sites go dark. Let's hope Boeing and Airbus have fewer critical failures.
This is probably a nightmare for the insurers.
Paxing All Over The World

I spent 27 years in IT, and much of it was putting the case to management as to why they had to spend the money if they wanted to achieve the aims that they said they did.
Dear Main Board Directors of DL:
You know that, when you buy a twin engine aircraft, the donkeys are BIG? Each has to have enough reserve power to be able to continue safely when a donkey conks at V1.
It's exactly like that. We need two mighty big IT Donkeys to get our passengers safely to their destination so that you can stay on the golf course.

Sounds like a case for the Chaos Monkey, and its friends in the Simian Army.
(Netflix developed an application, called Chaos Monkey, that causes randomly selected servers and/or network links to fail deliberately, in order to test their failover arrangements, collect data on the recovery process' performance, and most importantly, to demonstrate that they work. And they're just selling TV.)
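
For anyone who has not seen the idea in action, a toy version fits in a few lines of Python. This is only a sketch of the principle, not Netflix's tool: `Cluster` and `chaos_round` are made-up names, and the "service" is assumed to be up as long as at least one replica is alive.

```python
import random


class Cluster:
    """Toy service that answers as long as at least one replica is up."""

    def __init__(self, replica_names):
        self.up = {name: True for name in replica_names}

    def kill(self, name):
        self.up[name] = False

    def restore(self, name):
        self.up[name] = True

    def handle_request(self):
        return any(self.up.values())


def chaos_round(cluster, rng):
    """Kill one randomly chosen replica and report whether service survived."""
    victim = rng.choice(sorted(cluster.up))
    cluster.kill(victim)
    survived = cluster.handle_request()
    cluster.restore(victim)  # put things back for the next round
    return victim, survived
```

Run `chaos_round(Cluster(["a", "b"]), random.Random())` repeatedly and the service should survive every round; run it against a single-replica cluster and it fails every time, which is exactly the kind of thing you want to find out on a quiet Tuesday rather than during a power cut.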
Join Date: Oct 2002
Location: Silicon Hills
Posts: 234
Likes: 0
Received 0 Likes on 0 Posts
Someone somewhere, perhaps, signed off on NOT installing a correct, suitable, worst-case backup system that is tested thoroughly and often. Heads might roll, but don't hold your breath. They cannot pin 'pilot error' on this one.
Someone somewhere did not do a thorough threat/risk assessment of 'what happens if...'
But if you work in an office or cubicle, you get hammered for always asking "Yeah, but what if...." Then you're crapping on someone's plan and budget, and delaying the project. Those people do not think like we do, never will.
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
From an I.T. management point of view, Delta are stupid.
Let's assume for a second that the whole outage was indeed caused by power circuits going down and backup power not kicking in. Fine, shit happens. Stuff fails.
What SHOULD have happened is that their entire system failed over to a backup SITE within 30 seconds. That is not impossible to do, and I just delivered a system (three months ago) that does just that. Ask yourself why Google, Facebook etc never go down. Redundancy, redundancy, redundancy and redundancy. And trust me, as an I.T. professional (MSc + JNCIE) I can guarantee you that this is not rocket science.
This time it's a power failure. Next time it's a criminal or terrorist act that takes out the entire DC.
Join Date: Jul 2001
Location: London
Posts: 90
Likes: 0
Received 0 Likes on 0 Posts
Clearly these airlines have done little investment in IT systems and their own staff. Probably preferring to outsource everything and pay expensive consultants to come in once in a while. Splitting up your servers at various sites scattered around the country, or world, is expensive, but necessary when you rely on systems to function as a company. Will airlines invest in their infrastructure rather than worrying about quarterly results and executive parachutes?
Paxing All Over The World
The key difference is that if the item (IT in this case) is 'in sourced' when it goes wrong, the CEO can reach out and grab the person (IT Director) and grasp them warmly by the throat. They can even be out of their job by the end of the day.
With outsourcing you cannot do that. You can terminate the contract, but that will take months of negotiation and cost more money to change it to another bunch.
When an employee knows that they cannot be made to carry the can - their attitude is different. It is what makes pilots different. As we know, CEOs often pass the can to others too. I have never liked outsourcing because it costs a lot of things that do not show up on the balance sheet.