BA delays at LHR - Computer issue

Old 6th Jun 2017, 12:21
  #561 (permalink)  
Paxing All Over The World
 
Join Date: May 2001
Location: Hertfordshire, UK.
Age: 67
Posts: 10,150
Received 62 Likes on 50 Posts
Restarting a data centre is just like starting an aircraft: there is a sequence that has been tested and proved correct. Any component/generator/system that depends on another item being running will be set to start after it. There is testing of links to other systems - just like checking 'full and free'.

It used to be that you started your car by setting manual controls and then going to the front of the car to swing a handle. Once it had 'caught', you jumped into the seat to adjust choke and mixture etc. Now the car does it all for you when you turn the key/push the button and it sequences everything in the right order.

Get anything in the wrong order and the flight crew have to go back to the top of the checklist before calling for push, moving under their own power or turning onto the active. So the question is: what state was BA's start-up list in, and when was it last read?
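
To make the sequencing point concrete, here is a minimal sketch of a dependency-ordered start-up - the service names and dependencies are invented for illustration, not BA's actual systems:

Code:
# Toy bring-up sequence: each service declares what must already be running,
# and a topological sort gives the tested start-up order.
# Service names and dependencies are illustrative assumptions only.
from graphlib import TopologicalSorter  # Python 3.9+

depends_on = {
    "core_network":   [],
    "storage_arrays": ["core_network"],
    "databases":      ["storage_arrays"],
    "booking_app":    ["databases", "core_network"],
    "check_in_desks": ["booking_app"],
}

for service in TopologicalSorter(depends_on).static_order():
    print(f"start {service}, then verify its links before continuing")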
PAXboy is offline  
Old 6th Jun 2017, 12:32
  #562 (permalink)  
 
Join Date: Jan 2006
Location: Gatwick
Posts: 117
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by The_Steed
Then it's just a case of restoring from backup - which is easy because you test that process on a regular basis
Assuming they have the latest and greatest tape drives (which is unlikely), you still need to get the tapes back from the off-site location (assuming the local copy is corrupt or unavailable), which can take a couple of hours, and only then can you start the restore.
DR restores are normally done in a specific sequence depending on the priority of the system to the business.
The latest tape drives can transfer up to 1 TB per hour. They probably have several in a library or two, meaning you could theoretically pull 4-8 TB per hour off the tapes, assuming the rest of the infrastructure can take it.

Most DBAs and system admins are not good at archiving data, though, so the main databases will be massive and will need restoring in full before you can bring them back online. Until that happens, the other systems may as well be in another universe, as they cannot do anything useful.

The customer I currently work with has its systems split into several priority tiers. The highest priority should be up and running within 15 minutes, but that can only happen in an ideal world where you don't have corruption.
The 2nd tier is 4 hours, the 3rd 24 hours and the 4th 7 days. Many of the systems in the 2nd and 3rd tiers won't work fully without the 4th-tier systems running, so whilst they can be used for reference, the business cannot operate fully until everything is back online. Blame the developers for that one, as well as the business managers who wanted to cut costs.
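
As a rough illustration of why the database restore dominates the recovery clock, here is a back-of-the-envelope sketch - the sizes, drive count and courier time are assumptions, not BA figures:

Code:
# Rough restore-time estimate for a tape-based DR recovery.
# All figures are illustrative assumptions.

TAPE_RATE_TB_PER_HOUR = 1.0   # modern drive, approximate best case
DRIVES_IN_LIBRARY = 4         # drives restoring in parallel
TAPE_RETRIEVAL_HOURS = 2.0    # courier time from the off-site vault

def restore_hours(database_tb):
    """Hours until the data is back, ignoring verification and log replay."""
    transfer = database_tb / (TAPE_RATE_TB_PER_HOUR * DRIVES_IN_LIBRARY)
    return TAPE_RETRIEVAL_HOURS + transfer

for size_tb in (4, 20, 80):
    print(f"{size_tb:>3} TB database -> ~{restore_hours(size_tb):.1f} hours before it is even online")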
bbrown1664 is offline  
Old 6th Jun 2017, 13:20
  #563 (permalink)  
 
Join Date: Jan 2010
Location: Edinburgh
Age: 85
Posts: 74
Likes: 0
Received 16 Likes on 9 Posts
Dumb suggestion:-
Maybe data centre A had a failure, and operations had been passed seamlessly to DC B, when someone did a no-no at DC B.
A bit like being AOK to survive with one engine out, when the second engine shuts down!
Although I would have expected BA to blame "exceptional circumstances" had that actually happened.
DType is offline  
Old 6th Jun 2017, 15:38
  #564 (permalink)  
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by PAXboy
Restarting a data centre is just like starting an aircraft: there is a sequence that has been tested and proved correct. Any component/generator/system that depends on another item being running will be set to start after it. There is testing of links to other systems - just like checking 'full and free'.

It used to be that you started your car by setting manual controls and then going to the front of the car to swing a handle. Once it had 'caught', you jumped into the seat to adjust choke and mixture etc. Now the car does it all for you when you turn the key/push the button and it sequences everything in the right order.

Get anything in the wrong order and the flight crew have to go back to the top of the checklist before calling for push, moving under their own power or turning onto the active. So the question is: what state was BA's start-up list in, and when was it last read?
There are supposed to be geographically separate data centres backing each other up - or so we are told - but this is obviously not the case. It would appear that what they have really been operating is a closely coupled distributed system which provides (provided) no redundancy or fault tolerance: someone has implemented something that turned an otherwise redundant system into a single monolith, where failing any part of the monolith brings down the whole thing. That is indeed what was reported, with the phone and display board failures.

This was not a power supply fault, although that exposed it - it was a gross system architecture design failure. I can't imagine that it was originally set up like that; it is more likely that someone removed the redundancy from the system in some way, possibly through ignorance of how the fault tolerance operated.
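
A toy sketch of that architectural point - topology and names are invented, not BA's real systems - showing that a "redundant" pair only helps if each site can serve on its own:

Code:
# Redundancy only counts if the surviving site has no hard dependency on the failed one.

def airline_available(up, deps):
    """A service is usable only if it and all of its hard dependencies are up."""
    def usable(name, seen=frozenset()):
        if name in seen or not up.get(name, False):
            return False
        return all(usable(d, seen | {name}) for d in deps.get(name, []))
    return usable("checkin_dc_a") or usable("checkin_dc_b")

# Genuinely redundant: each site relies only on its own local database.
redundant = {"checkin_dc_a": ["bookings_dc_a"], "checkin_dc_b": ["bookings_dc_b"]}
# Closely coupled: both sites synchronously depend on one shared core.
coupled   = {"checkin_dc_a": ["shared_core"],   "checkin_dc_b": ["shared_core"]}

# DC A loses power, taking the shared core (which lived there) with it.
after_dc_a_failure = {"checkin_dc_a": False, "bookings_dc_a": False,
                      "checkin_dc_b": True,  "bookings_dc_b": True,
                      "shared_core": False}

print(airline_available(after_dc_a_failure, redundant))  # True  - DC B carries on
print(airline_available(after_dc_a_failure, coupled))    # False - total system crash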
Ian W is offline  
Old 6th Jun 2017, 16:04
  #565 (permalink)  
 
Join Date: Jan 2006
Location: Gatwick
Posts: 117
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Ian W
I can't imagine that it was originally set up like that; it is more likely that someone removed the redundancy from the system in some way, possibly through ignorance of how the fault tolerance operated.
Unfortunately I can. In my experience, the original developers didn't design any resilience in. Then an infrastructure architect got hold of it and designed in all of the resilience and failover architecture. Then the bill payer saw the price and ordered it be redesigned to meet a budget 10-50% of the infrastructure architect's design. Finally, a solution goes in that doesn't meet the original requirements but meets a price, on the basis that the "what-if" scenarios are so rare they can be discounted until they actually happen...

Been there, seen it, done it and got the t-shirt.

In addition, the one thing you cannot protect against fully in this situation is the master system writing corruption to disk. The storage devices then replicate it to the DR location, as they believe the master knew what it was doing, and you end up with corruption on both sites.
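
A minimal sketch of that failure mode, assuming storage-level synchronous replication (block contents and names are invented): the mirror faithfully copies whatever the master writes, so only an older point-in-time copy escapes the corruption:

Code:
# Storage replication copies garbage as faithfully as good data.
primary, dr_mirror = {}, {}
snapshots = []                      # periodic point-in-time copies kept at the DR site

def write(block, data):
    primary[block] = data
    dr_mirror[block] = data         # synchronous replication: DR trusts the master

def take_snapshot():
    snapshots.append(dict(dr_mirror))

write(1, "booking BA123 OK")
take_snapshot()                     # last known good copy
write(1, "\x00\x00GARBAGE")         # master writes corruption...

print(dr_mirror[1])                 # ...and the DR mirror now holds the same garbage
print(snapshots[-1][1])             # only the snapshot still has the good data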
bbrown1664 is offline  
Old 6th Jun 2017, 17:29
  #566 (permalink)  
 
Join Date: Aug 2001
Location: se england
Posts: 1,580
Likes: 0
Received 48 Likes on 21 Posts
Kill switches are not a very good idea unless they are designed to shed electrical loads gradually, which is quite possible, in the same way airliners have different busses. A straightforward off switch is liable to do more damage than the reasons for using it: if there is a small fire, why kill the smoke extraction system; if there's a big fire, it's too late to turn off the power anyway.

If someone feels the necessity for such a device, it has to be behind a guard to prevent accidental use, or behind glass in a break-glass emergency case. In any event, it is questionable why one person is allowed to work alone on something of this scale in today's world, especially if they are a contractor - oh, there is one reason: it's cheaper. Wonder which policy BA adopted.
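
For what it's worth, a minimal sketch of the "shed loads gradually" idea, in the spirit of separate airliner busses - the bus names and priorities are invented for illustration:

Code:
# Staged shutdown: drop the least important busses first, never the safety ones.
BUSSES = [
    ("non_essential_office",          1),
    ("batch_and_reporting",           2),
    ("core_databases",                3),
    ("smoke_extraction_and_lighting", 4),   # never shed automatically
]

def staged_shutdown(max_priority):
    """Shed busses up to max_priority, lowest priority first."""
    for name, priority in sorted(BUSSES, key=lambda b: b[1]):
        if priority <= max_priority:
            print(f"shedding {name}")       # allow equipment to stop cleanly

staged_shutdown(max_priority=3)             # keeps smoke extraction and lighting alive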
pax britanica is offline  
Old 6th Jun 2017, 18:37
  #567 (permalink)  
 
Join Date: Jan 2006
Location: Gatwick
Posts: 117
Likes: 0
Received 0 Likes on 0 Posts
Sensible idea and the reason you have the kill switches in the data rooms. You don't want to be holding a wet hose that starts spraying the 3-phase supply.

As for the smoke extraction systems, they are on a different circuit normally and the fans for those are well away from the data halls so smoke can still be extracted from a de-energised room.
bbrown1664 is offline  
Old 6th Jun 2017, 20:50
  #568 (permalink)  
 
Join Date: Aug 2007
Location: Ireland
Posts: 216
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Heidhurtin
Whereas I have spent the last 2 years disconnecting these "big red switches" whenever I find one in any of my DC's (usually in the smaller installations). There are gas suppression systems to take care of any fire situation and automatic disconnection of electrical supplies to cater for any electrical fault (down to individual circuit level).
There is simply no need for a master cut off switch to kill the whole hall. It's not as if there are big mechanical nasties whirling around and looking to cause injury to the unwary (think workshop or factory) which certainly need some form of emergency manual intervention capability.
Gas suppression may take care of a fire, but what if somebody is electrocuted in the DC? The police will come and demand all power be turned off, then stick police tape around the place and investigate it for a couple of days. It doesn't help that many older raised floors are made of metal. Were you planning to drag the poor soul outside before you called the emergency services?
And what about fires above or below, where the building is not used only to house computers? Certain fires demand water, and the fire brigade won't touch the place if power is still on. A whole warehouse in Norway recently burned for days, partly because the fire brigade wouldn't set foot on the roof: it was covered in solar panels and they couldn't be sure there were no live circuits.
vikingivesterled is offline  
Old 6th Jun 2017, 21:40
  #569 (permalink)  
 
Join Date: Jul 2011
Location: Planet Earth
Posts: 29
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by The_Steed
I find it unbelievable that BA would not have automated failover to a backup Data Centre since they are running 24x7x365 safety critical systems.
No they are not. Well not in the datacentres anyway. On the planes...maybe
Banana4321 is offline  
Old 6th Jun 2017, 23:39
  #570 (permalink)  
 
Join Date: May 2010
Location: Boston
Age: 73
Posts: 443
Likes: 0
Received 0 Likes on 0 Posts
I smell a bit of a rat around the "physical damage" from uncontrolled power on.

While it is painfully true that individual pieces of equipment may fail on power-up if they were marginal, the idea that switching off, counting to 20 and switching back on would cause widespread physical damage just does not pass the sniff test.

If it did, that would indicate very poorly designed, marginal and fragile internal power distribution.
Worst case should just be some tripped breakers if the start-up surge was greater than capacity.

One sees that occasionally in utility service outages, where a line transformer will pop its fuse when the service is restored and the whole block's refrigerators try to start at once.

This does not cause a damaging surge and is typically resolved with a new fuse.
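
To put numbers on that, a back-of-the-envelope sketch - the server count, per-box current and breaker rating are generic assumptions, not BA's kit:

Code:
# Inrush on simultaneous power-on versus a feed breaker.
SERVERS           = 200
STEADY_AMPS_EACH  = 1.5    # per server at 230 V, roughly 350 W
INRUSH_MULTIPLE   = 8      # cold power-supply inrush, lasts milliseconds
FEED_BREAKER_AMPS = 400

steady = SERVERS * STEADY_AMPS_EACH          # 300 A
inrush = steady * INRUSH_MULTIPLE            # 2400 A for a few milliseconds

print(f"steady state: {steady:.0f} A, cold inrush: {inrush:.0f} A")
print("breaker trips on simultaneous start" if inrush > FEED_BREAKER_AMPS
      else "breaker holds")
# The cure is sequenced power-on (or resetting the breaker), not new hardware -
# which is why claims of widespread physical damage smell odd.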
MurphyWasRight is offline  
Old 7th Jun 2017, 05:43
  #571 (permalink)  
 
Join Date: Jan 2008
Location: Somewhere colder than my clothes.
Age: 61
Posts: 59
Likes: 0
Received 5 Likes on 3 Posts
Originally Posted by vikingivesterled
Gas suppression may take care of a fire, but what if somebody is electrocuted in the DC? The police will come and demand all power be turned off, then stick police tape around the place and investigate it for a couple of days. It doesn't help that many older raised floors are made of metal. Were you planning to drag the poor soul outside before you called the emergency services?
And what about fires above or below, where the building is not used only to house computers? Certain fires demand water, and the fire brigade won't touch the place if power is still on. A whole warehouse in Norway recently burned for days, partly because the fire brigade wouldn't set foot on the roof: it was covered in solar panels and they couldn't be sure there were no live circuits.
Posting from a mobile so forgive the poor formatting. Regarding the issue of electrocution: in a properly designed area the supply to the component causing the shock would be disconnected before anything fatal happened, assuming everything is correctly bonded etc. In any case, the "big red button" cannot be used to protect against electrical fault or shock for obvious reasons - by the time you get to it you're already too late.

If the fire brigade, or police or any other emergency service, required the installation to be powered down in an emergency, this could be done directly from the UPS and other switchgear (accepting the need to invoke disaster recovery and role-swap to the backup) within 2-3 minutes - less than the likely response time of any fire service. I'll grant there could be a problem with solar panels, which can't be switched off, but we don't have those in a data hall. How would the fire service deal with UPS battery strings, which can reach several hundred volts?
I restate the point - there's no need for a "big red button".
Heidhurtin is offline  
Old 7th Jun 2017, 05:49
  #572 (permalink)  
 
Join Date: Jan 2008
Location: Australia
Posts: 277
Received 225 Likes on 119 Posts
Airlines spend half the norm on IT? - The Economist

"The first lesson from such painful experiences is to refrain from pruning investment in IT too far, as some airlines may have in their desperate efforts to fend off budget competitors. “Legacy carriers like BA saw spending on this as an overhead,” says Henry Harteveldt of Atmosphere Research, a consultancy. “But it should be seen as a cost of doing business.” In 2015 airlines spent 2.7% of their revenues on IT, half the norm across all industries and a lower share even than hotels."

Scary if true.
http://www.economist.com/news/busine...ys-botches-its
artee is online now  
Old 7th Jun 2017, 08:23
  #573 (permalink)  
 
Join Date: Jul 2014
Location: England
Posts: 401
Received 1 Like on 1 Post
Originally Posted by artee
“Legacy carriers like BA saw spending on [IT] as an overhead,” says Henry Harteveldt of Atmosphere Research, a consultancy.
To be fair, that was the attitude in many companies, not only airlines, that were run by an older generation of top managers who didn't really understand the pervasive necessity of IT in the modern world. Most of those companies have learned their lesson now.
“But it should be seen as a cost of doing business.”
That's what an overhead is: a cost of doing business. The difficulty often was to get the dinosaurs in management to see IT as producing benefit, not cost only. That wasn't helped by turf wars within certain companies.

Example from a large company (not air industry), anonymised to protect the guilty: two relatively small units, because of what they did, each legitimately spent as much on IT as the whole of the rest of the company. Both fought tooth and nail against integrating their IT with the rest of the company, against counting their IT spend in the company's total IT spend and, crucially, against counting even small parts of their profits or successes as 'benefits of IT'.
"In 2015 airlines spent 2.7% of their revenues on IT, half the norm across all industries and a lower share even than hotels."
Crude percentages like that are meaningless. There's no real 'norm' across all industries – I think he must mean the average across all industries, which is meaningless too. How much does Google spend on IT? I don't know, but obviously the percentage must be high because that's most of what Google does. OTOH, as some people here may have noticed, airlines have to spend a lot of money on certain highly complex and very, very expensive equipment (clue: it's not IT*). The fact that their spend on IT is less, as a percentage of their total spend, than in other industries is hardly a surprise.

* Although, of course, modern aircraft contain a lot of embedded IT. Is the purchase and maintenance cost of that kit counted as 'spending on IT'? Turf wars over again?

Last edited by OldLurker; 7th Jun 2017 at 08:24. Reason: correction
OldLurker is offline  
Old 7th Jun 2017, 11:11
  #574 (permalink)  
 
Join Date: Aug 2001
Location: se england
Posts: 1,580
Likes: 0
Received 48 Likes on 21 Posts
Artee
While the Economist is not a reliable source about many things, let's assume for once it is right about IT spending. If that is a low percentage, it is probably because airlines have such massive big-budget items, like airframe depreciation and fuel, that the IT share is pushed down compared to, say, a bank, which spends relatively more in percentage terms on IT because it doesn't have to buy things like A380s or spend a fortune on fuel.

The big problem, though, is where IT is regarded simply as a cost centre and not a contributor - a vital one - to revenues and market development. Most established big companies have got into the rather bizarre and dangerous mindset that revenue will keep coming in because it has done within living memory, and therefore what needs to be done is to control - i.e. cut - costs. BA seem to have taken this disease to heart, since so many contributors on here make it clear that IT is critical not just for airline ops (important enough in itself if efficiency there leads to, say, fuel savings) but for managing the pricing and marketing campaigns that directly influence revenues and margins.
pax britanica is offline  
Old 7th Jun 2017, 11:16
  #575 (permalink)  
Paxing All Over The World
 
Join Date: May 2001
Location: Hertfordshire, UK.
Age: 67
Posts: 10,150
Received 62 Likes on 50 Posts
Turf wars? Oh yes! When I worked with a multinational (stock-market quoted, you would know their name) there was an old manager in the Property department who had always managed 'the telephones'. When they became too big, complex and multi-sited for him to control, the IT department took them over. But he had to be in the meetings and, because he controlled access to all the (14 or something) UK locations, he had to be appeased. Waste of time and effort.

Oh and the old 'DP' manager who was given the new comms responsibility didn't understand the new systems either. And he had an addiction problem, as well as being mean spirited and narrow of vision. But then, that's not so remarkable in British [so called] management.
PAXboy is offline  
Old 7th Jun 2017, 12:29
  #576 (permalink)  
 
Join Date: Jan 2008
Location: Australia
Posts: 277
Received 225 Likes on 119 Posts
Originally Posted by pax britanica
Artee
While the Economist is not a reliable source about many things lets assume for once it is right about IT spending...
You're certainly right about the relative spend in capex (and opex) intensive industries. I think the critical point is the one you make about IT being a cost centre - at one time enlightened companies saw their IT as being a competitive differentiator.
That certainly doesn't seem to be the case at BA (sadly).
artee is online now  
Old 7th Jun 2017, 12:45
  #577 (permalink)  
 
Join Date: Nov 2014
Location: Arundel
Posts: 1
Likes: 0
Received 0 Likes on 0 Posts
Coming clean

Surely there must be some definite inside information by now from the many staff who have had the heave-ho.
Henty1 is offline  
Old 7th Jun 2017, 13:11
  #578 (permalink)  
 
Join Date: Jan 2010
Location: London
Posts: 379
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by OldLurker
Crude percentages like that are meaningless. There's no real 'norm' across all industries – I think he must mean the average across all industries, which is meaningless too. How much does Google spend on IT? I don't know, but obviously the percentage must be high because that's most of what Google does. OTOH, as some people here may have noticed, airlines have to spend a lot of money on certain highly complex and very, very expensive equipment (clue: it's not IT*). The fact that their spend on IT is less, as a percentage of their total spend, than in other industries is hardly a surprise.

* Although, of course, modern aircraft contain a lot of embedded IT. Is the purchase and maintenance cost of that kit counted as 'spending on IT'? Turf wars over again?
Read the quote: "In 2015 airlines spent 2.7% of their revenues on IT, half the norm across all industries and a lower share even than hotels."

So, taking an average across all industries smooths out all manner of variations and gives a useful indication of the amount of IT spend. And what is striking is that the airline industry, which relies on IT as much as most other industries, treats IT as a burden on the company (a cost centre) instead of being a key business enabler. You really do get what you pay for.
Caribbean Boy is offline  
Old 7th Jun 2017, 15:07
  #579 (permalink)  
 
Join Date: May 2011
Location: NEW YORK
Posts: 1,352
Likes: 0
Received 1 Like on 1 Post
If memory serves, the airline IT systems were cooperatively developed/shared among carriers, which might help explain the below normal percentage of revenue invested.
Also, there are only 2-3 major airframe, engine and systems suppliers, so again the IT burden is less.
Still surprised that the failure did not cascade into the other carriers in the BA family.
Seems full integration is not yet achieved.
etudiant is offline  
Old 7th Jun 2017, 18:24
  #580 (permalink)  
 
Join Date: Oct 2002
Location: London UK
Posts: 7,657
Likes: 0
Received 18 Likes on 15 Posts
Originally Posted by Henty1
Surely there must be some definite inside information by now from the many staff who have had the heave-ho.
Any enhanced payoff (i.e. beyond the statutory minimum) is normally accompanied by a clause that they will not offer any comment about their previous employer.
WHBM is offline  

