PPRuNe Forums - View Single Post - BA delays at LHR - Computer issue
2nd Jun 2017, 19:16
  #480
fchan
 
Join Date: Apr 2007
Location: UK
Posts: 38
In the Kegworth accident, engine X lost power due to a mechanical failure. The crew looked at the poorly designed instruments, decided engine Y was the culprit and shut it down. Later, realising the error, they tried to relight engine Y, but it was too late to avoid hitting the ground short of the diversion airfield.

To translate that to the current IT incident, what about this possible scenario?

A maintainer sees that UPS X has fully or partially failed, as they occasionally do. He doesn't have to do anything, as UPS Y keeps things running smoothly. But it's good practice to investigate and fix it (but not on a Bank Holiday), as losing a second unit could be serious. So he goes to shut UPS X down while he works on it. Having misread the diagnostic screen or some of the switch labels, he accidentally shuts down UPS Y instead. Or the maintenance screen gives the wrong information (note that it was all upgraded quite recently, so an error in the logic or labelling may have been introduced).

Now, with two units down, warning alarms and messages start sounding in earnest, and perhaps a third UPS is rapidly draining its batteries while carrying the entire load. So he panics and operates the wrong switches trying to get UPS Y back online quickly. Or the reconnection sequence is partly or fully computer-controlled and gets it wrong. Damage ensues for the reasons stated elsewhere in this thread, although I don't really see how a well-designed system would cause this.
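To make the redundancy arithmetic in that scenario concrete, here is a minimal Python sketch. It is purely illustrative: the three-unit bank, the 300 kW unit rating, the 450 kW load, the 50 kWh battery figure and all the names (`Ups`, `assess`, `battery_minutes`) are hypothetical assumptions, not details of BA's actual installation. It also assumes the online units share the load equally. It shows how an N+1 bank rides through the loss of X, but is overloaded once Y is isolated by mistake and the last unit is left carrying everything.

```python
import math
from dataclasses import dataclass


@dataclass
class Ups:
    name: str
    rating_kw: float      # hypothetical rated output per unit
    battery_kwh: float    # hypothetical usable battery energy
    online: bool = True


def units_needed(load_kw: float, rating_kw: float) -> int:
    """Minimum number of units able to carry the load (the 'N' in N+1)."""
    return math.ceil(load_kw / rating_kw)


def assess(bank: list[Ups], load_kw: float) -> str:
    """Report the state of a parallel UPS bank, assuming equal load sharing."""
    online = [u for u in bank if u.online]
    if not online:
        return "load dropped: no UPS online"
    share_kw = load_kw / len(online)
    needed = units_needed(load_kw, bank[0].rating_kw)
    spare = len(online) - needed
    if spare < 0:
        return f"OVERLOAD: {len(online)} online, {needed} needed, {share_kw:.0f} kW per unit"
    if spare == 0:
        return f"no redundancy left: {share_kw:.0f} kW per unit"
    return f"OK with {spare} spare unit(s): {share_kw:.0f} kW per unit"


def battery_minutes(ups: Ups, load_kw: float) -> float:
    """Idealised runtime if one unit carries load_kw from its battery alone.

    Ignores discharge-rate effects and inverter losses, so real runtime is shorter.
    """
    return 60.0 * ups.battery_kwh / load_kw


if __name__ == "__main__":
    LOAD_KW = 450  # hypothetical critical load
    bank = [Ups("X", 300, 50), Ups("Y", 300, 50), Ups("Z", 300, 50)]
    print(assess(bank, LOAD_KW))   # OK with 1 spare unit(s): 150 kW per unit
    bank[0].online = False         # UPS X fails
    print(assess(bank, LOAD_KW))   # no redundancy left: 225 kW per unit
    bank[1].online = False         # UPS Y shut down by mistake instead of X
    print(assess(bank, LOAD_KW))   # OVERLOAD: 1 online, 2 needed, 450 kW per unit
    print(f"{battery_minutes(bank[2], LOAD_KW):.1f} min")  # 6.7 min
```

The point of the sketch is only that the first failure consumes the whole redundancy margin, so a single wrong switch afterwards goes straight from "no spare" to "overload".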

This doesn't explain why data centre 2 didn't take over. IT isn't my thing, but UPSs and reliability modelling are.

Last edited by fchan; 5th Jun 2017 at 10:31.