PPRuNe Forums > Misc. Forums > Passengers & SLF (Self Loading Freight)

BA delays at LHR - Computer issue

Old 29th May 2017, 16:04
  #281 (permalink)  
None but a blockhead
 
Join Date: Nov 1999
Location: London, UK
Posts: 535
I think that rumour is credible too - absent a knowledge of how modern the BA data centre's infrastructure actually is. Many cascading faults in corporate infrastructure wouldn't happen if said infrastructure had no legacy systems; but much corporate infrastructure is heavily dependent on legacy systems. And say it didn't happen that way because it couldn't - OK, so how could a modern, well-engineered, inherently reliable system fail so badly? Because it did.
Self Loading Freight is offline  
Old 29th May 2017, 16:06
  #282 (permalink)  
 
Join Date: May 2008
Location: Paris
Age: 57
Posts: 101
Originally Posted by MG23 View Post
If BA had high-availability IT, we wouldn't be having this discussion.

That rumour seems dubious to me, but having worked with Indian outsourcing in a previous life, it's not that dubious.
I've refused contracts where one particular outsourcer was involved. No names, obviously. The practice in the very large systems I've worked on has been to keep the crown jewels at home. Functions such as security, systems design and management remain in the home territory while some application development and support can more easily be taken on by an appropriate outsourcer with the right skills. The problem is that the skills required for very high-end computing are relatively new to India. It's also a mindset. You've found or, worse, caused a problem on one of these machines? First step, push away from the keyboard; second step, begin ringing teammates and the boss. Deliberate. Running at the problem almost always makes it worse. Declare a disaster if needed. The latter is the problem. Everyone is afraid of that word. In decades I've seen that anything short of a 737 crashing into a data center will not be treated as a disaster. Surely it can be fixed?

It isn't like that. That single message about a pointer error can proliferate rapidly and be compounded by errant efforts to airbrush it away.

Sorry for using a flight forum for going on about this, but I so enjoy reading what the flight jocks have to say that I can't help contributing from inside the climate rooms I haunt.
Nialler is offline  
Old 29th May 2017, 16:13
  #283 (permalink)  
 
Join Date: May 2011
Location: Hampshire
Age: 73
Posts: 820
While Senor Cruz is still going on about a power failure being the culprit, it struck me that everyone, including me, has been assuming this refers to the big electric, coming through the wall (230V, 440V, 3kV or whatever). If any of these fails, one would expect UPS to immediately take over (a "no break" supply).
What about the internal power supplies driving the servers etc, i.e. the bits that turn the incoming electricity into 5V, 12V, 24V etc? If you let the smoke out of these, no amount of UPS backup is going to help.
Just a thought.
KelvinD is offline  
Old 29th May 2017, 16:17
  #284 (permalink)  
 
Join Date: May 2008
Location: Paris
Age: 57
Posts: 101
Originally Posted by Self Loading Freight View Post
I think that rumour is credible too - absent a knowledge of how modern the BA data centre's infrastructure actually is. Many cascading faults in corporate infrastructure wouldn't happen if said infrastructure had no legacy systems; but much corporate infrastructure is heavily dependent on legacy systems. And say it didn't happen that way because it couldn't - OK, so how could a modern, well-engineered, inherently reliable system fail so badly? Because it did.
My experience based on decades as a consultant operating directly in the fields of resilience, availability, disaster recovery and business continuity planning is that the legacy systems are always the most robust.
Nialler is offline  
Old 29th May 2017, 16:21
  #285 (permalink)  
 
Join Date: Jan 2015
Location: Centre of Universe
Posts: 315
I was at LHR on Saturday trying to use Staff Travel. Clearly with this uppermost in my mind, my experiences:
1. The BA Staff at check in were superb. When the back up "system" came on line they had to share PCs / work in 2-3s to get individual bookings ticketed and tagged. Accessing their systems was slow and time consuming and familiarity an issue - Overall 10 out of 10.

2. Early on they cancelled all Staff Travel to concentrate on fare paying punters - can't argue with that. It didn't materialise much - the order came from on high - we just hung around as it was ever changing.

3. Their Manager came round giving directions and updates, he missed a couple of occasions to say thanks and I could see the opportunity was missed. This was down to pressure I'm sure but the relationship with the Manager is the number 1 on staff engagement.

4. Having checked in at STD -30 mins (having been in queue for 2hrs 30mins) finally got boarding passes and legged it to drop bags off (bag belts u/s also - is that BA or BAA?), got airside - no gate on FFS, just a security man helping.

5. Got to gate 10B no BA staff for some time. They came and gave a few updates. After some time the BBC news TV on the gate said all flights up to 1800 cancelled. The gate staff hadn't heard this though (as comms was all down to phones). The one particular BA lady kept us updated every 30 mins or so, great communications with as much as she could give - again 10/10. When we gave up she was being hassled by about 20 pax so we just shouted well done and she thanked us.

6. Getting out was another issue. Everyone exited via gate 12 (hundreds) and only two small exit lanes. Could have gone wrong big time but old bill helped out. Quick dash through immigration and into bag hall - BA staff said don't bother so we just went out.

Apart from the disappointment / long day the only negative to me was some other BA Staff Traveller trying to "jump" the queue as his flight was going soon (just like rest).

There is always one, trying to go away on school hols with kids - well when it goes wrong you'll only do that once

And finally at my work we use mainly Indians and they are better by a country mile
Twiglet1 is offline  
Old 29th May 2017, 16:23
  #286 (permalink)  
 
Join Date: May 2008
Location: Paris
Age: 57
Posts: 101
Originally Posted by KelvinD View Post
While Senor Cruz is still going on about a power failure being the culprit, it struck me that everyone, including me, has been assuming this refers to the big electric, coming through the wall (230V, 440V, 3kV or whatever). If any of these fails, one would expect UPS to immediately take over (a "no break" supply).
What about the internal power supplies driving the servers etc, i.e. the bits that turn the incoming electricity into 5V, 12V, 24V etc? If you let the smoke out of these, no amount of UPS backup is going to help.
Just a thought.
No. The UPS is more than a bank of batteries. It's an expensive piece of kit which smooths the supply during brownouts, during spikes and during the absence of any power at all. The electrical input to a properly specced enterprise server should never fluctuate.
Nialler is offline  
Old 29th May 2017, 16:24
  #287 (permalink)  
 
Join Date: Aug 2002
Location: Europe
Posts: 3
Cost cutting...indefinitely

From BBC News:
"Earlier this year, Mr Cruz told Skift magazine: 'We're always going to be reducing costs... It's now injected into the DNA. If one particular day we don't come up with an idea to reduce our costs, then we're not doing our job.'"

This IT global mess is the result of the above corporate philosophy.
Lives have been ruined. Millions of pounds wasted.
Constant and indefinite cost cutting is a corporate suicide. There is a limit. BA is showing the first signs of this suicide mission. It must be stopped.
ILS27LEFT is offline  
Old 29th May 2017, 16:52
  #288 (permalink)  
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,349
Originally Posted by Nialler View Post
I've refused contracts where one particular outsourcer was involved. No names, obviously. The practice in the very large systems I've worked on has been to keep the crown jewels at home. Functions such as security, systems design and management remain in the home territory while some application development and support can more easily be taken on by an appropriate outsourcer with the right skills. The problem is that the skills required for very high-end computing are relatively new to India. It's also a mindset. You've found or, worse, caused a problem on one of these machines? First step, push away from the keyboard; second step, begin ringing teammates and the boss. Deliberate. Running at the problem almost always makes it worse. Declare a disaster if needed. The latter is the problem. Everyone is afraid of that word. In decades I've seen that anything short of a 737 crashing into a data center will not be treated as a disaster. Surely it can be fixed?

It isn't like that. That single message about a pointer error can proliferate rapidly and be compounded by errant efforts to airbrush it away.

Sorry for using a flight forum for going on about this, but I so enjoy reading what the flight jocks have to say that I can't help contributing from inside the climate rooms I haunt.
This kind of crash after outsourcing and losing experienced staff has happened before. You would think that CEOs would learn from others' disasters but apparently not.


It was precisely the reason that the patched and kludged together RBS/Nat West banking system fell over...and the 'inexperienced' operatives in Hyderabad were the likely culprits in screwing up an upgrade backout.


RBS computer failure 'caused by inexperienced operative in India' - Telegraph
https://www.theregister.co.uk/2012/0...at_went_wrong/

This kind of thing should never ever happen, but if you are unaware of the particular foibles of what is otherwise a fully fault tolerant system it can be surprisingly easy to break the system when you have full SysAdmin privileges and have finger trouble trying to stop the system going down.
Ian W is offline  
Old 29th May 2017, 16:55
  #289 (permalink)  
Thread Starter
 
Join Date: Apr 2010
Location: London
Posts: 7,072
"Constant and indefinite cost cutting is a corporate suicide. There is a limit."

Not so - every organisation should always be looking to cut costs - working practices change, technology changes

BUT you have to do it while maintaining or improving the product - cost cutting as a sole driver is very very bad business practice
Heathrow Harry is offline  
Old 29th May 2017, 16:56
  #290 (permalink)  
 
Join Date: Mar 2008
Location: South London
Posts: 35
Originally Posted by aox View Post
"a friend told me".
I really dislike non-attributable stories. Sure, cock-ups occur in business all the time and I've seen plenty by Accountants and Non-Accountants alike.
Tight Accountant is offline  
Old 29th May 2017, 16:58
  #291 (permalink)  
 
Join Date: Aug 2007
Location: Tullamore
Posts: 27
Originally Posted by Nialler View Post
I would dispense with that rumour as the work of someone who has little clue about large scale high-availability IT (such skills not being a prerequisite for life). Systems are not patched/tested on production environments (for the record a failback mirror site is most certainly a production system). There will be a chain of systems for testing, from a sandpit environment, through test, development, pre-production, user acceptance testing then production itself. These types of fixes are usually completely dynamic, but those that require restarts require only that the operating system be restarted - not the actual hardware. There should be no power issues and certainly none where remotely distinct sites are involved.


Finally, given that they're still on a background of TPF, the machines running TPF are typically z-Series enterprise servers from IBM, i.e. designed with internal redundancy and with continuous uptime as one of the core aspects of their architecture. Their power requirements have shrunk from a time when, yes, the airport lights might flicker as the beast was woken up, through to today's models, which are CMOS based and run off little more than a kettle connection. The mean time between failures on these machines is measured in years. They do not fail in the type of circumstances described.


Thanks for posting it, though.
Hi Nialler. The rumour is not suggesting that the patch itself was faulty, just that the restart procedure was inadequately careful.

BA have no TPF, either in their own site or, if Amadeus are to be believed, in the underlying Amadeus architecture. This, I'm afraid, is all 'modern' stuff with hundreds of boxen performing trivial proportions of the overall workload.

If the fix is the SMB fix for WannaCry to the server, it could require an OS restart, not just an application restart (depending on the OS). Even if it didn't, hundreds of applications starting will draw more power as they reload, rebuild caches etc.

Anyway, the whole thing is so unclear, and this from a man who claims to be digital to the core, that you have to think it was something enormously f'd up.
yoganmahew is offline  
Old 29th May 2017, 16:59
  #292 (permalink)  
 
Join Date: Mar 2014
Location: UK
Age: 71
Posts: 54
Originally Posted by ILS27LEFT View Post
Constant and indefinite cost cutting is a corporate suicide. There is a limit. BA is showing the first signs of this suicide mission. It must be stopped.
Yes, our whole species is undergoing this new philosophy. And is failing, but nobody notices ... what people notice are the promises of extreme savings and extreme profits.
At the same time the lack of challenge in our society is creating a new level of incompetence. Not only is there incompetence, but nobody really cares. Conscience is a long way away.
Does anyone have a goal except mortgage payments and Facebook?
Without a goal there is no reason to do any more than this.
Why did Victorian engineers build so well?
What was inside them that is not inside this generation?
Quite a lot, methinks.
rideforever is offline  
Old 29th May 2017, 16:59
  #293 (permalink)  
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,349
Originally Posted by Nialler View Post
No. The UPS is more than a bank of batteries. It's an expensive piece of kit which smooths the supply during brownouts, during spikes and during the absence of any power at all. The electrical input to a properly specced enterprise server should never fluctuate.
Many of the major systems I have dealt with run on batteries all the time. The choice is which system: grid, standby grid, standby generators to trickle charge the batteries.
Ian W is offline  
Old 29th May 2017, 16:59
  #294 (permalink)  
 
Join Date: Mar 2008
Location: South London
Posts: 35
Originally Posted by Ian W View Post
It was precisely the reason that the patched and kludged together RBS/Nat West banking system fell over...and the 'inexperienced' operatives in Hyderabad were the likely culprits in screwing up an upgrade backout.

RBS computer failure 'caused by inexperienced operative in India' - Telegraph
https://www.theregister.co.uk/2012/0...at_went_wrong/
Ian - I don't know your background but I understand that RBS has a whole host of legacy systems which need considerable TLC. It will be interesting to understand whether legacy systems fell over at BA; I suspect not.

Last edited by Tight Accountant; 29th May 2017 at 17:01. Reason: Grammar and improved clarity.
Tight Accountant is offline  
Old 29th May 2017, 17:01
  #295 (permalink)  
 
Join Date: May 2008
Location: Paris
Age: 57
Posts: 101
Originally Posted by Ian W View Post
This kind of crash after outsourcing and losing experienced staff has happened before. You would think that CEOs would learn from others' disasters but apparently not.


It was precisely the reason that the patched and kludged together RBS/Nat West banking system fell over...and the 'inexperienced' operatives in Hyderabad were the likely culprits in screwing up an upgrade backout.


RBS computer failure 'caused by inexperienced operative in India' - Telegraph
https://www.theregister.co.uk/2012/0...at_went_wrong/

This kind of thing should never ever happen, but if you are unaware of the particular foibles of what is otherwise a fully fault tolerant system it can be surprisingly easy to break the system when you have full SysAdmin privileges and have finger trouble trying to stop the system going down.
I've seen the complete history which led to the problem. The problem was most certainly not one caused by outsourcing. The ad referenced in the article is for an experienced admin. That is way below the level where this problem occurred. I've spoken with the principals involved.
Nialler is offline  
Old 29th May 2017, 17:08
  #296 (permalink)  
 
Join Date: May 2008
Location: Paris
Age: 57
Posts: 101
Originally Posted by Ian W View Post
Many of the major systems I have dealt with run on batteries all the time. The choice is which system: grid, standby grid, standby generators to trickle charge the batteries.
Exactly. They're not just there for hard outages. In the past, with older mainframes, I have wanted to limit the UPS to twenty minutes. If air-handling is gone too, a system will cook itself.
Nialler is offline  
Old 29th May 2017, 17:08
  #297 (permalink)  
 
Join Date: Aug 2007
Location: Tullamore
Posts: 27
Originally Posted by Tight Accountant View Post
Ian - I don't know your background but I understand that RBS has a whole host of legacy systems which need considerable TLC. It will be interesting to understand whether legacy systems fell over at BA; I suspect not.
Once it has been in long enough for the original staff to have moved on to other projects, it is legacy. If it's 5 years old, it's probably legacy. The fact that legacy can extend back many years beyond that really just increases the difficulty of finding someone who really understands it!
yoganmahew is offline  
Old 29th May 2017, 18:36
  #298 (permalink)  
 
Join Date: Jul 2007
Location: Hadlow
Age: 57
Posts: 594
Pilot drove cancer sufferer home.

British Airways boss refuses to resign as Heathrow endures third day of disruption

Last edited by Super VC-10; 29th May 2017 at 18:37. Reason: reason for link
Super VC-10 is offline  
Old 29th May 2017, 18:43
  #299 (permalink)  
 
Join Date: Oct 2002
Location: London UK
Posts: 6,625
Originally Posted by Tight Accountant View Post
Is the RBS event (which everyone is referring to), the one where they were fined 42m (thereabouts) by the Regulator?
Speaking of the Regulator, I've yet to hear any comment by the Secretary of State for Transport about this major failure. Although individual MPs are currently not in Parliament while the election goes on, the Ministers continue until the next government, and continue to draw their hefty salaries for the responsibility.

So where's the Rt Hon Mr Grayling then?
WHBM is offline  
Old 29th May 2017, 18:48
  #300 (permalink)  
 
Join Date: Jan 2006
Location: England
Posts: 145
And where's Willie?
Stoic is offline  
