
BA delays at LHR - Computer issue

Old 5th Jun 2017, 18:15
  #541 (permalink)  
 
Join Date: Aug 2007
Location: inv
Posts: 348
Likes: 0
Received 2 Likes on 1 Post
British Airways IT chaos was caused by human error - BBC News

Now saying someone pulled a plug out.
scr1 is offline  
Old 5th Jun 2017, 18:56
  #542 (permalink)  
 
Join Date: Aug 2001
Location: se england
Posts: 1,578
Likes: 0
Received 48 Likes on 21 Posts
That could be true, but it would be ludicrous; there isn't a big red switch or a 3-pin plug that you can turn off or pull out on these things.

If something like that happened, what does it say for BA security and risk management that a low-level contractor can shut down the entire airline with one mistake, or maybe even a deliberate act? Farcical. Any mission-critical facility of that scale and class should never allow one individual to work alone, especially where complex or HV power is involved, for safety reasons alone, let alone business protection. How does the business know that one guy isn't drunk, drugged, dumped by his wife or girlfriend, facing a disciplinary, bribed, malicious, etc.? A complete joke.
pax britanica is offline  
Old 5th Jun 2017, 19:10
  #543 (permalink)  
 
Join Date: Aug 2013
Location: Devon
Posts: 6
Likes: 0
Received 0 Likes on 0 Posts
There is only 1 UPS?

[sarc]
dmsims is offline  
Old 5th Jun 2017, 19:40
  #544 (permalink)  
 
Join Date: Jan 2008
Location: UK
Posts: 8
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by pax britanica
That could be true, but it would be ludicrous; there isn't a big red switch
There will be a big red switch somewhere to cut all power in the event of (for instance) a fire. This will bring all power down in an uncontrolled manner leading to all sorts of bad stuff. Activating it when the building isn't actually on fire would be a Very Bad Thing to do.

Similarly, just turning everything back on again and assuming that it will all just start up and resume would be a Very Bad Thing.
SteppenHerring is offline  
Old 5th Jun 2017, 20:45
  #545 (permalink)  
 
Join Date: Jun 2000
Location: last time I looked I was still here.
Posts: 4,507
Likes: 0
Received 0 Likes on 0 Posts
There will be a big red switch somewhere to cut all power in the event of (for instance) a fire. This will bring all power down in an uncontrolled manner leading to all sorts of bad stuff.

I'm not an IT whizz-kid, more a dinosaur, so please excuse me. But as an engineer/pilot of sorts, I would like to think that if you had to evacuate and switch off a working system with a 'kill switch', the shutdown would not be harmful and uncontrolled. The expectation would be that the system would be used again after the emergency. Why do you say it would not be protected and would "lead to all sorts of bad stuff"? Back to the topic, before this becomes an IT classroom: why would BA have a system that was so vulnerable to an emergency shutdown?
RAT 5 is offline  
Old 5th Jun 2017, 20:47
  #546 (permalink)  
 
Join Date: Mar 2010
Location: The Home of the Gnomes
Posts: 412
Likes: 0
Received 3 Likes on 2 Posts
A kill switch is a great idea. Especially when there is an entirely separate backup system in a different location.

Hang on...
Tay Cough is offline  
Old 5th Jun 2017, 21:31
  #547 (permalink)  
 
Join Date: Jan 2008
Location: UK
Posts: 8
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by RAT 5
I'm not an IT wizzkid, more a dinosaur. Please excuse: but as an engineer/pilot of sorts I would like to think that in the event of having to evacuate and switch off a working system, by a 'kill switch' it would not be harmful and uncontrolled.
That's why the Big Red Button is never to be used. It's there to cut all power immediately to protect (for instance) the fire brigade. In aircraft terms, it's like hitting the engine fire extinguisher.

The difference is that the aircraft engine isn't throwing tons of information at several hundred other engines at the time.
SteppenHerring is offline  
Old 5th Jun 2017, 21:56
  #548 (permalink)  
 
Join Date: Jul 2011
Location: Planet Earth
Posts: 29
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by SteppenHerring
That's why the Big Red Button is never to be used. It's there to cut all power immediately to protect (for instance) the fire brigade. In aircraft terms, it's like hitting the engine fire extinguisher.

The difference is that the aircraft engine isn't throwing tons of information at several hundred other engines at the time.
Agreed.

It's a little bit more complicated than that.
Banana4321 is offline  
Old 6th Jun 2017, 01:24
  #549 (permalink)  
Paxing All Over The World
 
Join Date: May 2001
Location: Hertfordshire, UK.
Age: 67
Posts: 10,143
Received 62 Likes on 50 Posts
This from BBC:
Willie Walsh, chief executive of IAG, said an engineer disconnected a power supply, with the major damage caused by a surge when it was reconnected.
He said there would now be an independent investigation "to learn from the experience". However, some experts say that blaming a power surge is too simplistic.

Mr Walsh, appearing at an annual airline industry conference in Mexico on Monday, said: "It's very clear to me that you can make a mistake in disconnecting the power. It's difficult for me to understand how to make a mistake in reconnecting the power," he said.

He told reporters that the engineer was authorised to be in the data centre, but was not authorised "to do what he did".
Also on Monday, Mr Walsh apologised again for the incident, saying: "When you see customers who suffered, you wouldn't want it to happen to any airline or any business."

He added: "I wouldn't suggest for one minute we got communications right at BA, we didn't."
I wonder if they have an 'emergency planning department' or ever waste their time on 'role play' and 'testing' ...
PAXboy is offline  
Old 6th Jun 2017, 01:35
  #550 (permalink)  
aox
 
Join Date: Mar 2015
Location: UK
Posts: 227
Received 0 Likes on 0 Posts
"very clear"

Why is Mr Walsh using one of Theresa May's catchphrases?
aox is offline  
Old 6th Jun 2017, 01:35
  #551 (permalink)  
 
Join Date: Jan 2010
Location: London
Posts: 379
Likes: 0
Received 0 Likes on 0 Posts
Perhaps Willie Walsh would like to revise his full backing for Alex Cruz who in the first two days of the crisis was nowhere to be seen except in videos wearing a high-vis jacket in a room full of computers.
Caribbean Boy is offline  
Old 6th Jun 2017, 04:59
  #552 (permalink)  
 
Join Date: Oct 2002
Location: London UK
Posts: 7,648
Likes: 0
Received 18 Likes on 15 Posts
Originally Posted by Caribbean Boy
Perhaps Willie Walsh would like to revise his full backing for Alex Cruz who in the first two days of the crisis was nowhere to be seen except in videos wearing a high-vis jacket in a room full of computers.
Hasn't Willie done that already ?

He added: "I wouldn't suggest for one minute we got communications right at BA, we didn't."
I presume as Chief Exec Alex Cruz is as responsible for communications as for other aspects.
WHBM is offline  
Old 6th Jun 2017, 09:28
  #553 (permalink)  
 
Join Date: Jan 2006
Location: Gatwick
Posts: 117
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by pax britanica
That could be true, but it would be ludicrous; there isn't a big red switch or a 3-pin plug that you can turn off or pull out on these things.

If something like that happened, what does it say for BA security and risk management that a low-level contractor can shut down the entire airline with one mistake, or maybe even a deliberate act? Farcical. Any mission-critical facility of that scale and class should never allow one individual to work alone, especially where complex or HV power is involved, for safety reasons alone, let alone business protection. How does the business know that one guy isn't drunk, drugged, dumped by his wife or girlfriend, facing a disciplinary, bribed, malicious, etc.? A complete joke.
For safety reasons, EVERY data room in EVERY data centre will have big red push stops around the walls to kill the power in an emergency. This kills everything instantly and does not shut anything down gracefully. As a result, bringing things back up after an EMERGENCY STOP button has been activated can take a lot longer than if you had shut things down cleanly.

I personally have hit the big red button accidentally in the past, because it was unguarded and placed right at waist height next to a rack I had to move. IT happens in IT. There is not much you can do as the engineer in the room when it all goes quiet, except apologise and help start bringing things back up again.

Last edited by bbrown1664; 6th Jun 2017 at 10:20. Reason: typo
bbrown1664 is offline  
Old 6th Jun 2017, 09:42
  #554 (permalink)  
 
Join Date: Dec 2011
Location: London
Posts: 7
Likes: 0
Received 0 Likes on 0 Posts
Fred, can you check your email again...ta.

Does anyone out there have first-hand experience of how you switch these systems on and off? Drop me a line [email protected] (Transport Correspondent)
RichardBeeb is offline  
Old 6th Jun 2017, 09:58
  #555 (permalink)  
Thread Starter
 
Join Date: Apr 2010
Location: London
Posts: 7,072
Likes: 0
Received 0 Likes on 0 Posts
"bringing things back up after an EMERGENCY STOP button has been activated can take a lot longer than if you had shut things down cleanly."

See my post on the Forties Oil Platform: 2 minutes to close down, 3 days to bring back up. As bbrown says, the clue is in the word "EMERGENCY".
Heathrow Harry is offline  
Old 6th Jun 2017, 10:44
  #556 (permalink)  
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by scr1
From that article:
However, an email leaked to the media last week suggested that a contractor doing maintenance work inadvertently switched off the power supply. The email said: "This resulted in the total immediate loss of power to the facility, bypassing the backup generators and batteries... After a few minutes of this shutdown, it was turned back on in an unplanned and uncontrolled fashion, which created physical damage to the systems and significantly exacerbated the problem."
This is an admission that the BA system was not designed as a fault-tolerant system. It should not be possible to fail a distributed fault-tolerant system by failing one data center, however untidily. Similarly, by definition, an untidy restart that caused various failures in an already 'failed' data center should be completely transparent to users and should only extend the time it takes to bring that data center back up.

I can remember walking around doing acceptance testing on a system that _was_ fault tolerant, randomly failing servers, disk drives, boards within servers, power supplies, etc., and the system just kept going as it was designed to. The BA system was obviously not designed to be fault tolerant. Or the system had been put into a state where it was not fault tolerant by people not knowing what they were doing.
Ian W is offline  
Old 6th Jun 2017, 10:45
  #557 (permalink)  
 
Join Date: Jan 2008
Location: Somewhere colder than my clothes.
Age: 61
Posts: 59
Likes: 0
Received 5 Likes on 3 Posts
Originally Posted by bbrown1664
For safety reasons, EVERY data room in EVERY data centre will have big red push stops around the walls to kill the power in an emergency. This kills everything instantly and does not shut anything down gracefully. As a result, bringing things back up after an EMERGENCY STOP button has been activated can take a lot longer than if you had shut things down cleanly.

I personally have hit the big red button accidentally in the past, because it was unguarded and placed right at waist height next to a rack I had to move. IT happens in IT. There is not much you can do as the engineer in the room when it all goes quiet, except apologise and help start bringing things back up again.
Whereas I have spent the last 2 years disconnecting these "big red switches" whenever I find one in any of my DCs (usually in the smaller installations). There are gas suppression systems to take care of any fire situation, and automatic disconnection of electrical supplies to cater for any electrical fault (down to individual circuit level).
There is simply no need for a master cut-off switch to kill the whole hall. It's not as if there are big mechanical nasties whirling around and looking to cause injury to the unwary (think workshop or factory), which certainly need some form of emergency manual intervention capability.

Without repeating other posts, I agree about an uncontrolled restart causing damage though; I have seen that one personally.
Heidhurtin is offline  
Old 6th Jun 2017, 11:09
  #558 (permalink)  
 
Join Date: Dec 2011
Location: London
Posts: 7
Likes: 0
Received 0 Likes on 0 Posts
Uncontrolled restarts

How does a restart work though? How easy is it to get it wrong? Wrong sequence?
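
On the "wrong sequence" question, one common way to reason about it is as a dependency-ordering problem: nothing should start until everything it relies on is already up. The sketch below is only an illustration of that idea, not a description of BA's systems; the component names and the dependency map are made up, and a real restart also involves health checks, staged power-up and data consistency checks that are not shown.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map (an assumption for illustration only):
# each key lists the components that must be running BEFORE it starts.
DEPENDS_ON = {
    "power_distribution": [],
    "network_switches": ["power_distribution"],
    "storage_arrays": ["power_distribution", "network_switches"],
    "databases": ["storage_arrays"],
    "message_bus": ["network_switches", "databases"],
    "check_in_app": ["databases", "message_bus"],
}


def startup_order(depends_on):
    """Return a start order in which every dependency precedes its dependants."""
    return list(TopologicalSorter(depends_on).static_order())


def violations(order, depends_on):
    """List (component, dependency) pairs where the dependency starts too late."""
    position = {name: i for i, name in enumerate(order)}
    return [
        (component, dep)
        for component, deps in depends_on.items()
        for dep in deps
        if position[dep] > position[component]
    ]


if __name__ == "__main__":
    order = startup_order(DEPENDS_ON)
    print("controlled restart:", order)
    # Switching everything on "at once" is modelled here as the reverse order.
    print("problems if reversed:", violations(list(reversed(order)), DEPENDS_ON))
```

An uncontrolled restart is roughly the case where `violations` is non-empty: applications come up before the databases or storage they expect, and then fail or write bad data.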
RichardBeeb is offline  
Old 6th Jun 2017, 11:26
  #559 (permalink)  
 
Join Date: Apr 2007
Location: Aberdeen
Posts: 56
Likes: 0
Received 0 Likes on 0 Posts
Even if you take the excuse at face value (that someone hit the big red button), it still doesn't explain why their systems didn't fail over to the backup Data Centre.

I find it unbelievable that BA would not have automated failover to a backup Data Centre, since they are running 24x7x365 safety-critical systems. The pertinent data should be replicated in real time, so apart from a brief interruption, there shouldn't have been any indication to the Users that anything had happened.

Putting that to one side, they should still have been able to get things back up and running again pretty quickly. I would have expected BA to have pretty modern kit, so I would have thought the biggest risk would be from data corruption rather than hardware issues. Then it's just a case of restoring from backup, which is easy because you test that process on a regular basis.
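
For anyone wondering what "automated failover" means in practice, a minimal sketch follows. It assumes a simple heartbeat scheme: each site reports in regularly, and if the active site goes silent for longer than a timeout while the standby is still reporting, the standby is promoted. The site names, the 15-second timeout and the `FailoverMonitor` class are all hypothetical; real deployments also have to handle data replication lag and split-brain, which are left out here.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    last_heartbeat: float  # seconds, on whatever clock the monitor uses


class FailoverMonitor:
    """Promote the standby site when the active one stops sending heartbeats."""

    def __init__(self, primary, standby, timeout_s=15.0):
        self.active = primary
        self.standby = standby
        self.timeout_s = timeout_s  # assumed threshold, not a real BA figure

    def heartbeat(self, site_name, now):
        """Record that `site_name` reported in at time `now`."""
        for site in (self.active, self.standby):
            if site.name == site_name:
                site.last_heartbeat = now

    def check(self, now):
        """Return the name of the site that should be serving traffic at `now`."""
        active_silent = now - self.active.last_heartbeat > self.timeout_s
        standby_alive = now - self.standby.last_heartbeat <= self.timeout_s
        if active_silent and standby_alive:
            # Swap roles: the standby takes over, the old active becomes standby.
            self.active, self.standby = self.standby, self.active
        return self.active.name


if __name__ == "__main__":
    monitor = FailoverMonitor(Site("dc_a", 0.0), Site("dc_b", 0.0))
    monitor.heartbeat("dc_a", 10.0)
    monitor.heartbeat("dc_b", 10.0)
    print(monitor.check(12.0))   # both sites healthy
    monitor.heartbeat("dc_b", 30.0)
    print(monitor.check(31.0))   # dc_a has been silent for 21 s
```

Note that the monitor only swaps when the standby is itself alive; if both sites are silent it leaves things as they are rather than flapping between two dead sites.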
The_Steed is offline  
Old 6th Jun 2017, 11:28
  #560 (permalink)  
 
Join Date: Sep 2015
Location: UK
Posts: 110
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Ian W
From that article:
The BA system was obviously not designed to be fault tolerant. Or the system had been put into a state where it was not fault tolerant by people not knowing what they were doing.
Here's a quote from a Register article, which leads to some obvious conclusions:
However, within the comments of the BA chief executive there is one telling statement:
Tens of millions of messages every day that are shared across 200 systems across the BA network and it actually affected all of those systems across the network.
Sorry for the text speak, but WTF? How does it require 200 systems to issue a boarding pass, check someone in and pass their security details on to the US – even if they aren't going there? Buried deep in The Register comments on the article is an allegedly former BA employee claiming that this is in fact the case, that all of these systems are required for BA to function. How did BA get to the point that there are 200 systems in the critical path?
Source: https://www.theregister.co.uk/2017/0...path_analysis/
Joe_K is offline  

