PPRuNe Forums - View Single Post - BA delays at LHR - Computer issue
Old 31st May 2017, 12:42
#406
bbrown1664
 
Join Date: Jan 2006
Location: Gatwick
Posts: 117
I have worked in IT for about 30 years and have some experience with regard to DR planning and requirements.

Datacentres are generally, as has been said already, fed by 2 or more diverse feeds from the grid. These then go via a UPS which is either in parallel or in series with the power feed. The major difference is that a UPS in series can also be used to remove power spikes etc, and it reduces the need for another switch that has to kick in should the power fail.

If the input power does fail, the UPS will provide power to the building for anything from 5 minutes to several hours, depending on the load from the servers etc being powered. Backup generators should kick in as soon as power fails. The UPS gives them 5 minutes or more to warm up and settle down before its batteries run out of power.
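To put rough numbers on that, here is a back-of-envelope sketch in Python. It assumes a very simple model (usable battery energy divided by load, ignoring inverter losses and battery ageing), and the function names and figures are made up for illustration, not taken from any real site.

```python
def ups_runtime_minutes(battery_kwh: float, load_kw: float) -> float:
    """Rough UPS runtime: usable battery energy divided by the load.

    Simplified assumption: ignores inverter efficiency, battery age
    and the non-linear discharge behaviour of real batteries.
    """
    if load_kw <= 0:
        raise ValueError("load_kw must be positive")
    return battery_kwh * 60 / load_kw


def bridges_generator_start(battery_kwh: float, load_kw: float,
                            generator_start_minutes: float) -> bool:
    """True if the UPS can carry the load until the generators are stable."""
    return ups_runtime_minutes(battery_kwh, load_kw) >= generator_start_minutes


# Illustrative figures only: a 50 kWh battery string at two different loads.
light_load = ups_runtime_minutes(50, 100)   # 30.0 minutes
heavy_load = ups_runtime_minutes(50, 600)   # 5.0 minutes
```

The same batteries that last half an hour on a lightly loaded hall only just cover a 5 minute generator start once the hall is full, which is why the runtime figure has to be rechecked as kit is added.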

Assuming all of the above works, power then continues to be provided for the rest of time as long as you have a plan in place to refill the diesel tanks of the generators before they run dry. In reality, mains power comes back on line, the generators turn themselves off and normality is restored.

What tends to happen though is that the backup generator fails to start, the UPS batteries have not been replaced in the last 10 years and when the mains power goes down, you have all systems falling over in a heap. This is the time to switch to the DR data centre.

This is where the real problems begin. All too often this has not been tested properly. Systems have different RTOs (recovery time objectives) which can vary from instant to weeks, and many people don't realise that system#1, which has an instant RTO, is up but cannot run as it is dependent on system#73, which is on a 3 day RTO. Not only that, when they tested it, the production system didn't fail, it was cleanly shut down. In a real failure there is no clean shutdown, meaning that you now have corrupted data in system#1 and that brings a whole new headache for people to sort out.
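The RTO dependency trap is easy to show in a few lines of Python. This is only a sketch with a made-up dependency map using the two systems from the example above. The point is that a system's real recovery time is the worst RTO anywhere in its dependency chain, not the figure written against its own name.

```python
def effective_rto(system, rto_hours, depends_on, _seen=frozenset()):
    """Worst-case RTO in hours for `system`, including everything it needs.

    rto_hours:  dict of system name -> its own stated RTO in hours
    depends_on: dict of system name -> list of systems it cannot run without
    """
    if system in _seen:
        raise ValueError(f"dependency cycle involving {system}")
    seen = _seen | {system}
    worst = rto_hours[system]
    for dep in depends_on.get(system, []):
        # A system cannot be back before the things it depends on.
        worst = max(worst, effective_rto(dep, rto_hours, depends_on, seen))
    return worst


# Hypothetical figures matching the example: instant RTO vs a 3 day RTO.
rto_hours = {"system#1": 0, "system#73": 72}
depends_on = {"system#1": ["system#73"]}

real_rto = effective_rto("system#1", rto_hours, depends_on)  # 72, not 0
```

Running something like this over the whole estate before the DR test, rather than after the outage, is where the nasty surprises should be found.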

There is a common phrase I use which I don't think the forum software will allow but here it is in another format.

In IT, Sierra-Hotel-India-Tango happens.