PPRuNe Forums - View Single Post - BA delays at LHR - Computer issue
Old 27th May 2017, 21:11
  #85 (permalink)  
Ian W
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Originally Posted by xyzzy
I did operational IT for a living, rising to CIO, before I went into academia. I got bored with being told that filesystems and databases wouldn't stand sudden stops (ACID properties, right?). I was expected to buy exotic database products from Larry Ellison, tended by smug contractors who had a million and one reasons Postgres just wouldn't do. So I made it a point of acceptance testing from development into production that the systems I was expected to run had to survive a sudden stop. Salesmen from Oracle and NetApp talk about journalling, so let's see it: we're going to flip the power at the time of our choice during your testing, and your product will survive it, and we'll do it again a few times for fun, or you can all go back to your offices and fix it. It's not the 1990s, and fsck isn't a thing any more. They'd whinge and whine that I should be doing an orderly shutdown, but I genuinely meant "I will go into the development lab and flip switches at random".

I flushed out any number of problems with this approach.
I can remember a test cycle that ran 24 hours a day for 7 days. We ran the system under load, then went in and deleted the primary and then the backup processes, confirming each time that the system kept running. Then we crashed hardware and confirmed that the system kept running, that the error messages led the support engineer to the right fault, and that the documented recovery worked.
Like you, I had to insist on that level of testing, including overload testing.
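
For anyone who has not seen this kind of test, here is a minimal sketch of the idea in Python. It is only an illustration, not the harness we used: the Replica and Service classes, the replica names and the kill schedule are all made up for the example, and a real test kills real processes and hardware rather than flipping a flag.

```python
class Replica:
    """Hypothetical stand-in for one process (primary, backup, ...)."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class Service:
    """Routes each request to the first replica that is still alive."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # error messages a support engineer would see

    def handle(self, request):
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError as err:
                self.log.append(str(err))
        raise RuntimeError("total outage: no replica available")


def fault_injection_run(service, requests, kill_schedule):
    """Serve requests under load, killing one replica at each scheduled step.

    kill_schedule is a list of (step, replica_name) pairs. Any request that
    cannot be served raises, which fails the run.
    """
    kills = dict(kill_schedule)
    by_name = {r.name: r for r in service.replicas}
    served = 0
    for step, request in enumerate(requests):
        if step in kills:
            by_name[kills[step]].alive = False  # the "flip the switch" moment
        service.handle(request)
        served += 1
    return served


service = Service([Replica("primary"), Replica("backup"), Replica("standby")])
served = fault_injection_run(
    service, range(10), [(3, "primary"), (6, "backup")]
)
```

The two things to check afterwards are the same two we checked: every request was served despite the kills, and the log names the component that actually failed.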

Originally Posted by Anson Harris
Perhaps the management should read this thread for some expert advice on how to run their IT systems - it seems that the world's supply of experts' opinions is here for the taking.
I don't know that it is the world's supply of opinions. There was a time when this level of understanding was common knowledge. Unfortunately, just as MBA management does not value manual flying skills, it does not value the architecture and design of computer systems that (as has been shown) are essential to the company's operations.

There is NO excuse for a company the size of BA / IAG not to have mirrored redundant systems, ideally three, in widely separated locations. Each system should be sized to support the entire operation, so that any one of them can take over 'standalone' if necessary. Under normal operations they load share, providing excellent transaction times.
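
To put numbers on that sizing rule, here is a rough Python sketch. The site names, the capacities and the load figure are invented for illustration and have nothing to do with BA's real systems; the point is only that each site must carry the whole load by itself for the 'standalone' case to work.

```python
from itertools import combinations


def share_load(capacity, load, failed=()):
    """Split the load evenly across the sites that are still up."""
    live = [site for site in capacity if site not in failed]
    if not live:
        raise RuntimeError("no site left to carry the load")
    return {site: load / len(live) for site in live}


def survives(capacity, load, failures):
    """True if the load can still be carried whichever `failures` sites go down."""
    for failed in combinations(capacity, failures):
        live = [site for site in capacity if site not in failed]
        if sum(capacity[site] for site in live) < load:
            return False
    return True


# Hypothetical figures: three sites, each sized for the full peak load.
sites = {"A": 1000, "B": 1000, "C": 1000}
peak = 900

normal = share_load(sites, peak)                        # all three share
standalone = share_load(sites, peak, failed={"A", "B"})  # one site alone

# Three sites that are each sized for only a third of the load look
# cheaper, but they cannot lose even one site.
undersized = {"A": 300, "B": 300, "C": 300}
```

With full-size sites, survives(sites, peak, failures=2) holds; with the undersized ones, survives(undersized, peak, failures=1) does not.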

It beggars belief that any modern company would put all its IT eggs in one basket dependent on one power supply or switchover gear. Yet now we have had both BA and Delta exhibit the same lack of foresight. With the EU rules on compensation this will be a huge cost to BA. They could have had a reliable computer system for a lot less than it will cost them.