PPRuNe Forums - View Single Post - BA delays at LHR - Computer issue
Old 27th May 2017, 16:51
  #31 (permalink)  
Ian W
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Originally Posted by Robin757
This raises the fundamental issue that businesses are becoming increasingly reliant on large-scale IT infrastructure to run their businesses (Network Rail is going that way for running trains, for example). When these systems go wrong you are totally helpless, and the damage done to the businesses and the inhuman nightmare caused to people are incalculable. Some fundamental questions about this need to be addressed, including whether it is at all possible to "go manual" for some basic functions, for example to at least let some planes fly. Otherwise, fingers crossed.
What this, and other recent failures at other airlines, shows is a lack of professionalism in their IT departments. It is completely possible to build redundant, reliable, fully fault-tolerant systems. This was being done in the 1970s: because the hardware and software were far less reliable then, systems had to be built on the assumption that they _would_ fail. So they were also built to first degrade gracefully and then to recover rapidly. There was an entire body of knowledge on failsafe and fault-tolerant 'non-stop' systems and systems design, covering both hardware and software fault tolerance. All this has now been thrown away, and reliance is placed instead on the greater reliability of modern computer systems.
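
As a rough illustration of what "degrade gracefully" can mean in software, here is a minimal Python sketch. Everything in it is hypothetical: the function names (`call_with_failover`, `primary_checkin`, `standby_checkin`, `manual_checkin`) and the check-in scenario are invented for the example and do not describe any airline's actual systems. The idea is simply to try redundant replicas in order and, if every one fails, drop to a reduced "manual" mode instead of stopping altogether.

```python
from typing import Callable, Sequence, Tuple

Handler = Callable[[str], str]


def call_with_failover(
    replicas: Sequence[Handler], request: str, degraded: Handler
) -> Tuple[str, str]:
    """Try each replica in order; if all fail, use the degraded handler.

    Returns (result, source) so the caller can see which path served it.
    """
    for index, replica in enumerate(replicas):
        try:
            return replica(request), f"replica-{index}"
        except Exception:
            # Assumption for this sketch: any exception means the replica
            # is unusable, so move on to the next one.
            continue
    # Every replica failed: degrade instead of giving up completely.
    return degraded(request), "degraded"


# Hypothetical handlers, for illustration only.
def primary_checkin(passenger: str) -> str:
    raise ConnectionError("primary data centre unreachable")


def standby_checkin(passenger: str) -> str:
    return f"boarding pass for {passenger}"


def manual_checkin(passenger: str) -> str:
    return f"handwritten pass for {passenger}"
```

With the primary down, `call_with_failover([primary_checkin, standby_checkin], "PAX1", manual_checkin)` is served by the standby. If the standby is also down, the call still returns something usable from `manual_checkin`, which is the "go manual" option asked about in the quoted post. A real system would also need health checks, state replication and a recovery path back to the primary, none of which is shown here.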

This is a direct parallel to flight crew who now rely on the system and are unable to take over manually if the 'systems' fail. There are now IT professionals who have no idea how to write fault-tolerant software and are so far removed from the hardware that they often have no idea how to match hardware and software fault recovery. So the systems now tend to stay up longer, but when a failure does come you have the computing equivalent of AFR447 designed into the system. It is not only aviation: we have seen major cloud computing providers have their 'clouds' fail in the last year.

There will be more of these complete machine failures unless the IT world relearns how to provide fault tolerance and non-stop systems.