
'System outage' grounds Delta flights

Rumours & News Reporting Points that may affect our jobs or lives as professional pilots. Also, items that may be of interest to professional pilots.

10th Aug 2016, 19:27
  #61
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by procede
Fact of the matter is that every backup system will introduce new failure modes.
Sometimes everything stops because of inconsistency between the primary and secondary systems. Systems can become unavailable while they re-synchronize (a common case is when a drive in a RAID array fails and the server starts rebuilding onto the hot spare). The best one I ever experienced was a UPS that failed: everything had power except the systems behind the UPS...
That is why you do not have 'backup systems'; you have a widely distributed, fault-tolerant system. Yes, they are a real pain to test, as Scott says above, especially the regression testing after every change and fix. But Delta is probably wishing it had spent the money on a distributed system.

And Neilki, I was talking with your compatriots at around 4am this morning - they are doing a good job. I can only imagine the workload in ops and dispatch over the last few days.
Ian W is offline  
10th Aug 2016, 19:48
  #62
Resident insomniac
 
Join Date: Aug 2005
Location: N54 58 34 W02 01 21
Age: 79
Posts: 1,873
Likes: 0
Received 1 Like on 1 Post
Originally Posted by Ian W
I can only imagine the workload in ops and dispatch over the last few days.
Pah! I was about to board a B747 from HKG to TPE when the check-in system went down.

It took best part of two hours to process the passengers and issue boarding cards (I don't remember whether they were hand-written - it was about 30 years ago).
G-CPTN is offline  
11th Aug 2016, 00:05
  #63
 
Join Date: Jan 2011
Location: San Jose, CA
Age: 48
Posts: 0
Likes: 0
Received 0 Likes on 0 Posts
From an I.T. management point of view, Delta are stupid.

Let's assume for a second that the whole outage was indeed caused by power circuits going down and backup power not kicking in. Fine, **** happens. Stuff fails.

What SHOULD have happened is that their entire system failed over to a backup SITE within 30 seconds. That is not impossible to do; I delivered a system just three months ago that does exactly that. Ask yourself why Google, Facebook, etc. never go down: redundancy, redundancy, redundancy and redundancy. And trust me, as an I.T. professional (MSc + JNCIE), I can guarantee you that this is not rocket science.
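A minimal sketch of heartbeat-driven site failover, assuming a monitor that promotes a standby site when the primary has been silent for longer than a timeout (30 seconds here, matching the figure in the post). The class and method names are hypothetical, and timestamps are passed in explicitly so the logic can be checked without waiting.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    """Hypothetical monitor: switches to the standby site if the primary
    has not sent a heartbeat within `timeout_s` seconds."""
    timeout_s: float = 30.0
    active: str = "primary"
    last_heartbeat: float = field(default_factory=time.monotonic)

    def heartbeat(self, now: float) -> None:
        # Called whenever the primary site reports that it is alive.
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        # If the primary has been silent too long, route traffic to the standby.
        if self.active == "primary" and now - self.last_heartbeat > self.timeout_s:
            self.active = "standby"
        return self.active


monitor = FailoverMonitor(timeout_s=30.0, last_heartbeat=0.0)
monitor.heartbeat(20.0)
print(monitor.check(45.0))  # primary (25 s since last heartbeat)
print(monitor.check(51.0))  # standby (31 s since last heartbeat)
```

The timeout is the real design choice: too short and a brief network hiccup triggers a needless failover (one of the new failure modes procede warns about), too long and the outage drags on. A real deployment must also stop the old primary from accepting work once the standby takes over (usually called fencing), which this sketch ignores.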

This time it's a power failure. Next time it's a criminal or terrorist act that takes out the entire DC.
ph-sbe is offline  
11th Aug 2016, 00:40
  #64
 
Join Date: Feb 2008
Location: Earth
Posts: 101
Likes: 0
Received 0 Likes on 0 Posts
Clearly these airlines have invested little in IT systems and their own staff, probably preferring to outsource everything and pay expensive consultants to come in once in a while. Splitting up your servers at various sites scattered around the country, or world, is expensive, but necessary when you rely on systems to function as a company. Will airlines invest in their infrastructure rather than worrying about quarterly results and executive parachutes?
413X3 is offline  
11th Aug 2016, 02:44
  #65
 
Join Date: May 2011
Location: NEW YORK
Posts: 1,352
Likes: 0
Received 1 Like on 1 Post
Originally Posted by 413X3
Clearly these airlines have invested little in IT systems and their own staff, probably preferring to outsource everything and pay expensive consultants to come in once in a while. Splitting up your servers at various sites scattered around the country, or world, is expensive, but necessary when you rely on systems to function as a company. Will airlines invest in their infrastructure rather than worrying about quarterly results and executive parachutes?

In fairness, remember that all the major US carriers have only recently come through real financial difficulty, with bankruptcy rife. That is a difficult environment in which to fund a major DP upgrade that offers no immediate economic advantage, so these carriers have let their legacy systems soldier on too long.
This incident will remind their managements to reassess that decision.
etudiant is offline  
11th Aug 2016, 04:29
  #66
 
Join Date: Apr 2008
Location: Paris
Age: 74
Posts: 275
Likes: 0
Received 0 Likes on 0 Posts
**** happens. Even Google and Apple occasionally have their sites go dark. Let's hope Boeing and Airbus have fewer critical failures.

This is probably a nightmare for the insurers.
edmundronald is offline  
11th Aug 2016, 23:59
  #67
Paxing All Over The World
 
Join Date: May 2001
Location: Hertfordshire, UK.
Age: 67
Posts: 10,214
Received 72 Likes on 58 Posts

I spent 27 years in IT, and much of that was spent putting the case to management as to why they had to spend the money if they wanted to achieve the aims they said they did.

Dear Main Board Directors of DL:

You know that when you buy a twin-engine aircraft, the donkeys are BIG? Each has to have enough reserve power to continue safely when a donkey conks at V1.

It's exactly like that. We need two mighty big IT Donkeys to get our passengers safely to their destination so that you can stay on the golf course.
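The twin-engine rule generalises to any pool of N identical units: if the pool must survive f failures, each unit has to be sized for the full load divided by the N - f survivors. A small sketch of that arithmetic (the function name is hypothetical, and load is in arbitrary units):

```python
def per_unit_capacity(total_load: float, units: int, failures_tolerated: int = 1) -> float:
    """Minimum capacity each unit needs so that the survivors can still
    carry the full load after `failures_tolerated` units have failed."""
    survivors = units - failures_tolerated
    if survivors < 1:
        raise ValueError("cannot tolerate that many failures with so few units")
    return total_load / survivors


print(per_unit_capacity(100, 2))  # 100.0: a twin, each donkey sized for the whole job
print(per_unit_capacity(90, 4))   # 30.0: four units, each sized for a third
```

With two units and one failure tolerated, each unit needs the full load, just like the BIG donkeys; spreading the load over four units cuts each one to a third of the total, at the price of about 133% installed capacity.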


PAXboy is offline  
12th Aug 2016, 02:48
  #68
 
Join Date: Dec 2010
Location: SCAL
Posts: 116
Received 21 Likes on 8 Posts
In my 37 years of IT (starting when it was DP) both sides of the pond I have never worked anywhere there was a shortage of donkeys.
sherburn2LA is offline  
12th Aug 2016, 15:22
  #69
 
Join Date: Jan 2008
Location: On the lake
Age: 82
Posts: 671
Received 0 Likes on 0 Posts
I have never worked anywhere there was a shortage of donkeys.
One can only say "Amen" to that!
twochai is online now  
12th Aug 2016, 15:59
  #70
 
Join Date: Mar 2002
Location: Surrey, UK
Posts: 901
Received 12 Likes on 7 Posts
Sounds like a case for the Chaos Monkey, and its friends in the Simian Army.

(Netflix developed an application, called Chaos Monkey, that causes randomly selected servers and/or network links to fail deliberately, in order to test their failover arrangements, collect data on the recovery process's performance and, most importantly, demonstrate that they work. And they're just selling TV.)
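A toy version of the idea, assuming a cluster where the service stays up as long as any node is alive. This is not Netflix's code or API; the class and function names are hypothetical. Each round kills one randomly chosen live node and then checks whether a request still gets an answer.

```python
import random


class Cluster:
    """Toy cluster: the service is up as long as at least one node is alive."""

    def __init__(self, nodes):
        self.alive = set(nodes)

    def kill(self, node):
        self.alive.discard(node)

    def serve(self, request):
        if not self.alive:
            raise RuntimeError("outage: no nodes alive")
        # Route to any surviving node (lowest name, for determinism).
        node = min(self.alive)
        return f"{node} handled {request}"


def chaos_round(cluster, rng):
    """Kill one randomly chosen live node, then check the service still answers."""
    victim = rng.choice(sorted(cluster.alive))
    cluster.kill(victim)
    try:
        cluster.serve("health-check")
        return victim, True
    except RuntimeError:
        return victim, False


rng = random.Random(7)
cluster = Cluster(["a", "b", "c"])
print([chaos_round(cluster, rng)[1] for _ in range(3)])  # [True, True, False]
```

The useful output is not the random choice but the count: this three-node cluster survives two failures and goes dark on the third, and you find that out on a quiet Tuesday rather than during the morning bank.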
steamchicken is offline  
12th Aug 2016, 20:49
  #71
 
Join Date: Oct 2002
Location: Silicon Hills
Posts: 234
Likes: 0
Received 0 Likes on 0 Posts
Someone, somewhere, perhaps signed off on NOT installing a suitable, worst-case, thoroughly (and frequently) tested backup system. Heads might roll, but don't hold your breath. They cannot pin 'pilot error' on this one.

Someone, somewhere did not do a thorough threat/risk assessment of 'What happens if...?'
Pilots, controllers, firemen, etc. are taught to think "What happens if..." They are safety personnel, and always need a backup plan, or even two.

But if you work in an office or cubicle, you get hammered for always asking "Yeah, but what if...." Then you're crapping on someone's plan and budget, and delaying the project. Those people do not think like we do, and never will.
vector4fun is offline  
12th Aug 2016, 22:00
  #72
Resident insomniac
 
Join Date: Aug 2005
Location: N54 58 34 W02 01 21
Age: 79
Posts: 1,873
Likes: 0
Received 1 Like on 1 Post
We used to call it 'contingency'.
G-CPTN is offline  
12th Aug 2016, 22:10
  #73
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by ph-sbe
From an I.T. management point of view, Delta are stupid.

Let's assume for a second that the whole outage was indeed caused by power circuits going down and backup power not kicking in. Fine, **** happens. Stuff fails.

What SHOULD have happened is that their entire system failed over to a backup SITE within 30 seconds. That is not impossible to do; I delivered a system just three months ago that does exactly that. Ask yourself why Google, Facebook, etc. never go down: redundancy, redundancy, redundancy and redundancy. And trust me, as an I.T. professional (MSc + JNCIE), I can guarantee you that this is not rocket science.

This time it's a power failure. Next time it's a criminal or terrorist act that takes out the entire DC.
Almost right, except I would make it a widely distributed redundant system, so there is no backup, just a system with two (or more) identical parts sharing data and transactions, with redundant copies of all the data. It is very overpowered for what it is doing, but any site can fail and the users don't notice, not even a 30-second switchover. A distributed system is as fault tolerant as (or more so than) a main system with a standby, but there is no expensive system sitting idle, and failover is instant and transparent to the users.
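A minimal sketch of the active-active idea, assuming three sites that each hold a full copy of the data and a simple majority-quorum rule for writes. The class name and site labels are hypothetical, and the sketch leaves out recovery of a failed site and concurrent writers, which are the hard parts in practice.

```python
class ReplicatedStore:
    """Hypothetical active-active store: every write goes to all live sites
    and is accepted only if a majority of all sites are up to take it."""

    def __init__(self, sites):
        self.copies = {s: {} for s in sites}  # per-site copy: key -> (version, value)
        self.up = set(sites)
        self.version = 0

    def fail(self, site):
        self.up.discard(site)

    def write(self, key, value):
        quorum = len(self.copies) // 2 + 1
        if len(self.up) < quorum:
            return False  # refuse rather than let surviving copies diverge
        self.version += 1
        for s in self.up:
            self.copies[s][key] = (self.version, value)
        return True

    def read(self, key):
        # Any live site can answer; take the newest version among them.
        found = [self.copies[s][key] for s in self.up if key in self.copies[s]]
        if not found:
            return None
        return max(found, key=lambda entry: entry[0])[1]


store = ReplicatedStore(["A", "B", "C"])
store.write("DL1", "boarding")
store.fail("A")
print(store.read("DL1"))                 # boarding, no switchover needed
print(store.write("DL1", "departed"))    # True: 2 of 3 sites still form a majority
store.fail("B")
print(store.write("DL1", "cancelled"))   # False: quorum lost
```

Losing one site of three changes nothing the users can see: reads and writes carry on with no switchover. Losing a second site stops writes, a deliberate trade so that the surviving copies cannot drift apart.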
Ian W is offline  
12th Aug 2016, 22:26
  #74
 
Join Date: Jul 2001
Location: London
Posts: 90
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by 413X3
Clearly these airlines have invested little in IT systems and their own staff, probably preferring to outsource everything and pay expensive consultants to come in once in a while. Splitting up your servers at various sites scattered around the country, or world, is expensive, but necessary when you rely on systems to function as a company. Will airlines invest in their infrastructure rather than worrying about quarterly results and executive parachutes?
Quite the opposite, I believe. I understand Delta have not outsourced their IT, preferring to keep it all in-house. Airlines are airlines; their core competency is the airline game, not the IT game. Sometimes it's best to outsource a key function to a company whose core competency it is. Don't assume outsourcing is bad; after all, as SLF we outsource our travel arrangements every time we get on board an aircraft, and I can assure you that's a lot safer than me trying to fly myself everywhere.
STN Ramp Rat is offline  
13th Aug 2016, 21:28
  #75
 
Join Date: Jan 2008
Location: On the lake
Age: 82
Posts: 671
Received 0 Likes on 0 Posts
Don't forget, Delta even 'in-sources' their jet fuel - they own the refinery!! I trust they do a better job refining crude oil than refining crude IT!
twochai is online now  
14th Aug 2016, 21:13
  #76
Paxing All Over The World
 
Join Date: May 2001
Location: Hertfordshire, UK.
Age: 67
Posts: 10,214
Received 72 Likes on 58 Posts
The key difference is that if the item (IT in this case) is 'in sourced' when it goes wrong, the CEO can reach out and grab the person (IT Director) and grasp them warmly by the throat. They can even be out of their job by the end of the day.

With outsourcing you cannot do that. You can terminate the contract, but that will take months of negotiation and cost more money to move it to another bunch.

When employees know they cannot be made to carry the can, their attitude is different. It is what makes pilots different. As we know, CEOs often pass the can to others too. I have never liked outsourcing, because it costs a lot of things that do not show up on the balance sheet.
PAXboy is offline  

Copyright © 2024 MH Sub I, LLC dba Internet Brands. All rights reserved. Use of this site indicates your consent to the Terms of Use.