
BA delays at LHR - Computer issue

Passengers & SLF (Self Loading Freight) If you are regularly a passenger on any airline then why not post your questions here?

Old 27th May 2017, 15:38
  #21 (permalink)  
Thread Starter
 
Join Date: Apr 2010
Location: London
Posts: 7,072
Likes: 0
Received 0 Likes on 0 Posts
British Airways has cancelled all flights from Heathrow and Gatwick because of computer problems.

A "major IT system failure is causing very severe disruption to our flight operations worldwide", the airline said.

It apologised for the "global system outage" and said it was "working to resolve the problem".

Heathrow Airport said it was "working closely" with BA to solve the issue.

There is no evidence at this stage to suggest the system failure was caused by a cyber attack, BA told BBC business correspondent Joe Lynam.

All passengers affected by the failure - which coincides with the first weekend of the half-term holiday for many in the UK - will be offered the option of rescheduling or a refund.

The airline, which had previously said flights would be cancelled until 18:00 BST, has now cancelled all flights for Saturday and asked passengers not to come to Gatwick or Heathrow airports. Other airlines flying in and out of Heathrow and Gatwick are unaffected.


Piles of checked luggage could be seen on the floor at Heathrow. Some passengers have reported having to leave Heathrow without their luggage.

The problems mean parts of BA's website are unavailable and some travellers claimed they could not check in on the mobile app.

A senior aviation figure - who wished to remain anonymous - said a failure of this magnitude was "extraordinary" and "rarely seen in the aviation industry".

BA aircraft landing at Heathrow are unable to park up as outbound aircraft cannot vacate the gates, which has resulted in passengers being stuck on aircraft.

Journalist Martyn Kent said he had been sitting on a plane at Heathrow for 90 minutes. He said the captain told passengers the IT problems were "catastrophic".

BA staff in Heathrow's Terminal 5 were resorting to using white boards, according to passenger Gareth Wharton.
Delays have been reported in Rome, Prague, Milan, Stockholm and Malaga due to the system failure.

Philip Bloom said he had been waiting on board a Heathrow-bound flight at Belfast for two hours. He added: "We haven't been told very much just that there is a worldwide computer system failure. "We were told that we couldn't even get on other flights because they are unable to see what flights we can be moved to."


With a lack of technology, staff were using whiteboards in Heathrow.

As ever, it's the lack of information that's really making passengers angry. The GMB union says the problem could have been avoided if BA hadn't made hundreds of IT staff redundant and outsourced their jobs to India at the end of last year. Yes, the union has a big axe to grind, but still, people will want to know if BA made its IT systems more vulnerable by scaling back computer support to save money.

If planes can't take off, they can't leave gaps at the gate for others to land. If flights are delayed by more than about five hours, the airline must swap crews because shift lengths are strictly limited for safety reasons. Telling customers to stay away is a drastic measure, but it's the only chance BA has of clearing the backlog of flights.

The BBC's Phillip Norton was at Rome International airport, waiting to fly to London. He said BA staff were unable to say how long delays would be, telling him "all flights are grounded around the world". Alma Saffari was in Marseille waiting to get her flight back to Heathrow. She said: "When we finally boarded the captain came out and told us their computer systems were down worldwide. "Eventually after sitting on the tarmac for one and a half hours we disembarked the plane. Now we are sitting in the departure area outside the gate."

Ms Saffari, who is with her 13-month-old baby, said she had been given a voucher for food and drink.
Heathrow Harry is offline  
Old 27th May 2017, 15:51
  #22 (permalink)  
 
Join Date: May 2008
Location: Paris
Age: 60
Posts: 101
Likes: 0
Received 0 Likes on 0 Posts
IT failures

A word of caution: I wouldn't automatically hit the "outsourcing" button here. Lots of core systems remain domestically controlled. RBS was a classic example of this. As with any aviation incident I'd counsel against jumping to conclusions.

That said, there is far less answerability in the banking industry than in aviation. Time, maybe, to treat an IT outage as a transport safety issue with a full published enquiry? I think so, and would urge the same scrutiny on other industries.

I speak from the perspective of one with over thirty years' experience on very large-scale critical systems. I may even be part of the problem.
Nialler is offline  
Old 27th May 2017, 15:56
  #23 (permalink)  
 
Join Date: Oct 2005
Location: La Napoule
Posts: 149
Likes: 0
Received 0 Likes on 0 Posts
Very poor that no one from the Management Board has 'fronted up' to this mess.

Must be awful for the staff and more so for the pax.

Beeb have just wheeled out their resident expert (Mr Bray) to confuse the issue even more.

Not good for the Brand. Hashtag 'damagelimitation'
Binder is offline  
Old 27th May 2017, 16:05
  #24 (permalink)  
 
Join Date: Jan 2006
Location: Cyprus
Age: 76
Posts: 270
Likes: 0
Received 0 Likes on 0 Posts
I am told that the systems are so interlinked that they can't even cancel individual flights hence the mass cancellations. I just hope they have got a good backup program because the system holds pax bookings for 364 days and all vital engineering records are electronic
Walnut is offline  
Old 27th May 2017, 16:07
  #25 (permalink)  
 
Join Date: Mar 2010
Location: Yorkshire Dales
Posts: 28
Likes: 0
Received 0 Likes on 0 Posts
This raises the fundamental issue that businesses are becoming increasingly reliant on large-scale IT infrastructure to run their businesses (Network Rail is going that way for running trains, for example). When they go wrong you are totally helpless and the damage it does to the businesses and the inhuman nightmare it causes to people are incalculable. Some fundamental questions about this need to be addressed, including whether it is at all possible to "go manual" for some basic functions, for example to at least let some planes fly. Otherwise, fingers crossed.
Robin757 is offline  
Old 27th May 2017, 16:20
  #26 (permalink)  
 
Join Date: Jun 1999
Location: world
Posts: 3,424
Likes: 0
Received 0 Likes on 0 Posts
The "cost cutting" bug is finally biting!
Hotel Tango is offline  
Old 27th May 2017, 16:22
  #27 (permalink)  
 
Join Date: Jun 2001
Location: Rockytop, Tennessee, USA
Posts: 5,898
Likes: 0
Received 1 Like on 1 Post
Originally Posted by GPU
The 404 error that was coming up suggests significant tomfoolery.
Don't know the cause but over here in the U.S. the ba.com site comes up OK but the flight status page yields this enlightening message:

Error 404--Not Found

From RFC 2068 Hypertext Transfer Protocol -- HTTP/1.1:

10.4.5 404 Not Found

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent.

If the server does not wish to make this information available to the client, the status code 403 (Forbidden) can be used instead. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address.
This message is like the man in a balloon joke, technically correct but of little value to the end user.
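For what it's worth, here is a minimal sketch of how a site could translate the raw status codes quoted above (403, 404, 410) into something a passenger can act on. The wording of the messages and the function name are my own assumptions for illustration; nothing here is taken from ba.com.

```python
from http import HTTPStatus

# Hypothetical plain-language wording for the three codes named in the RFC text.
FRIENDLY = {
    HTTPStatus.FORBIDDEN: "This page is not available to you.",
    HTTPStatus.NOT_FOUND: "We can't find that page right now. Please try again later.",
    HTTPStatus.GONE: "This page has been permanently removed.",
}


def friendly_message(status_code: int) -> str:
    """Return a plain-language message for an HTTP status code."""
    try:
        status = HTTPStatus(status_code)
    except ValueError:
        # Not a status code the standard library knows about.
        return "Something went wrong. Please try again later."
    # Fall back to the standard reason phrase for codes we have no wording for.
    return FRIENDLY.get(status, f"Request failed: {status.value} {status.phrase}.")
```

The point is only that the technically correct code and the message shown to the end user do not have to be the same thing.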

Looking at FR24, several BA international inbounds are parked with the power on, apparently still waiting to offload pax and several more (including BOM, DME, LCA and BKK) are airborne inbound.

A few narrowbodies appear to still be inbound to LGW including FNC, FAO and VCE.

A wave of BA arrivals will soon land in the U.S., in time for local reporters to cover the scene at baggage claim and customs if the onward connections were hit by the computer outage. If the luggage tags were printed while the computers were working would everything still be routed OK on arrival?
Airbubba is offline  
Old 27th May 2017, 16:36
  #28 (permalink)  
 
Join Date: Aug 2001
Location: se england
Posts: 1,570
Likes: 0
Received 44 Likes on 20 Posts
Of course in a crisis no senior management will be around to talk to the media, much less the poor pax. Sounds a complete nightmare. All sensible UK companies gave up outsourcing to India over the last few years because you lose control in these situations, so it sounds like the BA CEO, ex Sr Vueling, is responsible, as he is a rabid cost cutter and frankly should be sacked, since this will be a decision at his level, not the head of IT, who would have put the proposal forward under instruction.

As a long-time global telecoms guy I can tell you it is the same maze of copper wires, optical fibres, telco buildings and data centres that forms the physical infrastructure layer that knits the internet and mega systems like airline ops and reservations together. There is no magic 'cloud'; it is the same old stuff that has always been used for shipping messages around, and it is not quite fit for purpose for the internet world yet. It runs faster and on a much bigger scale, but it doesn't have sufficient resilience or redundancy to provide 100% assurance. In the language of this forum, the airlines are operating at 180 or 240 mins ETOPS with planes that should only be certified for 60 mins. It will be a few more years and a lot more investment before we start seeing these global glitches for big corporations diminish.
pax britanica is offline  
Old 27th May 2017, 16:42
  #29 (permalink)  
 
Join Date: Apr 2002
Location: UK
Posts: 1,221
Received 9 Likes on 7 Posts
Many years ago I was waiting for a Shuttle (a Trident!) flight at T1 and the staff were muttering about a computer failure. Then one of them said "we'll have to do it manually" and they started digging around in drawers, only to be stopped in their tracks by someone calling "it's back". Huge sighs of relief and we went but a couple of minutes late.

I'm a little surprised there is no longer a manual backup capability.

On second thoughts maybe I'm not. These days, with electronic tickets if you can't access that ticket database how do you prove the passenger is booked and ticketed?
Hartington is offline  
Old 27th May 2017, 16:43
  #30 (permalink)  
 
Join Date: May 2008
Location: Paris
Age: 60
Posts: 101
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Walnut
I am told that the systems are so interlinked that they can't even cancel individual flights hence the mass cancellations. I just hope they have got a good backup program because the system holds pax bookings for 364 days and all vital engineering records are electronic
This is classic stuff. The public demands ever-increasing functionality which ends up as bolt-ons to legacy systems. The multiple layers between the core system and the application and presentation layers introduce multiple single failure opportunities, undermining the redundancy built into the core.
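A rough back-of-envelope sketch of why the layers undermine the core's redundancy. All the figures are invented for illustration, and the sums assume failures are independent, which real systems rarely manage.

```python
from math import prod


def series_availability(layers):
    """Availability of a chain in which every layer must be up."""
    return prod(layers)


def redundant_availability(single, copies):
    """Availability of `copies` independent replicas where any one suffices."""
    return 1 - (1 - single) ** copies


# Assumed numbers: a triple-redundant core built from 99% components...
core = redundant_availability(0.99, 3)
# ...sitting behind three non-redundant bolt-on layers at 99% each.
chain = series_availability([core, 0.99, 0.99, 0.99])

print(f"core alone: {core:.6f}")
print(f"core plus bolt-ons: {chain:.6f}")
```

Under those assumptions the core on its own is better than 99.99% available, but the whole chain drops to about 97%, worse than any single layer in it.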

All the while, the systems grow so tediously complex that a BCP (business continuity plan) becomes unmaintainable. Reversion to manual systems was possible up until fairly recently. Boarding cards could be hand-written and cross-checked against a common locally hosted list. I actually had this happen to me about twenty years ago in Dublin when I was on my way to a (irony of ironies) BCP conference. Disaster Recovery in the IT sense is about far more than restore of a system. It is also about having the ability to maintain your core business activities with minimal or no disruption.
Nialler is offline  
Old 27th May 2017, 16:51
  #31 (permalink)  
 
Join Date: Dec 2006
Location: Florida and wherever my laptop is
Posts: 1,350
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Robin757
This raises the fundamental issue that businesses are becoming increasingly reliant on large-scale IT infrastructure to run their businesses (Network Rail is going that way for running trains, for example). When they go wrong you are totally helpless and the damage it does to the businesses and the inhuman nightmare it causes to people are incalculable. Some fundamental questions about this need to be addressed, including whether it is at all possible to "go manual" for some basic functions, for example to at least let some planes fly. Otherwise, fingers crossed.
What this, and other recent failures at other airlines, shows is a lack of professionalism in their IT departments. It is completely possible to build redundant, reliable, completely fault-tolerant systems. This was being done in the 1970s: as the hardware and software were far less reliable, systems had to be built accepting that they _would_ fail. So they were also built to first degrade gracefully and then to recover rapidly. There was an entire body of knowledge on failsafe and fault-tolerant 'non-stop' systems and systems design with both hardware and software fault tolerance. All this has now been thrown away and reliance placed instead on the greater reliability of computer systems.

This is a direct parallel to flight crew who now rely on the system and are unable to take over manually if the 'systems' fail. There are now IT professionals who have no idea how to write fault-tolerant software and are so far away from the hardware that they often have no idea how to match hardware and software fault recovery. So now the systems tend to stay up longer, but when you have a failure you have the computing equivalent of AFR447 designed into the system. It is not only aviation; we have seen major Cloud Computing providers have their 'clouds' fail in the last year.

There will be more of these complete machine failures unless the IT world relearns how to provide fault tolerance and non-stop systems.
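To make "gracefully degrade" concrete, here is a toy sketch: if the central system is unreachable, fall back to a locally held passenger list rather than stopping altogether. Every name in it (the functions, the booking reference, the manifest) is hypothetical, and a real design would obviously involve far more than a dictionary.

```python
class PrimaryUnavailable(Exception):
    """Raised when the central system cannot be reached."""


def lookup_central(booking_ref):
    # Stand-in for a call to a central reservations system (hypothetical).
    # It always fails here, to simulate the outage.
    raise PrimaryUnavailable("central system down")


# Hypothetical locally held copy of the passenger list for one flight.
LOCAL_MANIFEST = {"ABC123": "SMITH/J"}


def lookup_passenger(booking_ref, primary=lookup_central, local=LOCAL_MANIFEST):
    """Try the primary system; degrade to the local manifest instead of failing."""
    try:
        return primary(booking_ref), "central"
    except PrimaryUnavailable:
        name = local.get(booking_ref)
        if name is None:
            # Last resort: hand the case to a human.
            return None, "manual-check"
        return name, "local-manifest"
```

The second element of the result says which level of service answered, so staff know when they are working in degraded mode.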
Ian W is offline  
Old 27th May 2017, 16:58
  #32 (permalink)  
 
Join Date: Jan 2017
Location: Glasgow
Posts: 3
Likes: 0
Received 0 Likes on 0 Posts
Yes, RBS was domestically controlled, and it wasn't *technically* outsourced as the people worked for a wholly owned subsidiary, but the guys pressing the buttons that day were in India, considerably cheaper, and less experienced than the team that used to run Batch from Edinburgh.
v8gaz is offline  
Old 27th May 2017, 17:00
  #33 (permalink)  
 
Join Date: Jan 2008
Location: Esher, Surrey
Posts: 466
Likes: 0
Received 0 Likes on 0 Posts
BA very senior management / board members were advised decades ago that there was a lot of attention required to reduce the chances of such "down times".
It was pointed out that many large companies had had a director assigned to the problem, not just another hat of many for a board member to wear.
Even trying to ditch the "Computer Security" label and replacing it with "Business Continuity" failed. Pointing out that in the worst situation the whole business could fail also fell on deaf ears.
As others have pointed out, the whole situation is now much more complex and has many more layers. Well done the bean counters.
There is a much smaller pool of loyal skilled folks to dig them out of their hole.

Any Marks & Spencer sandwiches left?
beamender99 is offline  
Old 27th May 2017, 17:03
  #34 (permalink)  
 
Join Date: May 2008
Location: Paris
Age: 60
Posts: 101
Likes: 0
Received 0 Likes on 0 Posts
There are dinosaurs such as I am who still operate in that environment you describe. I remember the Gulliver system, where unknown to those operating it, they were issuing native IMS commands without benefit of any meaningful interface. The problem has been one of market differentiation by means of ever more sophisticated mobile accessibility. This has created a competitive demand which rides roughshod over the traditional demands of zero downtime disciplines. As I say, I'm a dinosaur. Some of the macros called in my assembler are older than I am. The file open macros are sixty years old. They've survived for the simple reason that they have worked all that time.
Nialler is offline  
Old 27th May 2017, 17:06
  #35 (permalink)  
 
Join Date: Jun 2009
Location: Canada
Posts: 464
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by Robin757
When they go wrong you are totally helpless and the damage it does to the businesses and the inhuman nightmare it causes to people are incalculable.
That's only a big issue when you outsource to the cheapest bidder who saves money by not testing properly and has high turnover due to low wages, so the people who wrote the software have long gone by the time the bugs hit, and no-one knows how the code works any more when they need to fix it.

Plenty of companies have huge IT infrastructure without these kinds of problems. Netflix, for example, has a policy of constant testing by randomly making its servers crash and ensuring that nothing bad happens when they do. As I understand it, the only thing they're 100% reliant on is Amazon staying up in at least one region of the world.
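A toy simulation of that idea, in case it helps: knock out one server at random, check that requests are still served, then bring it back. This is only a sketch of the principle with made-up names; it is not Netflix's actual tooling, and "served" here just means at least one server is still up.

```python
import random


def serve(servers):
    """A request succeeds if at least one server is still up."""
    return any(servers.values())


def chaos_round(servers, rng):
    """Kill one random running server, check the service, then restore it."""
    up = [name for name, alive in servers.items() if alive]
    victim = rng.choice(up)
    servers[victim] = False
    ok = serve(servers)
    servers[victim] = True
    return victim, ok


rng = random.Random(0)  # seeded so the run is repeatable
fleet = {"a": True, "b": True, "c": True}  # hypothetical three-server fleet
results = [chaos_round(fleet, rng) for _ in range(10)]
print(all(ok for _, ok in results))
```

With three servers every round passes. Run the same check against a single server and it fails at once, which is exactly the kind of weakness you want to find on a quiet Tuesday rather than on a bank holiday weekend.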
MG23 is offline  
Old 27th May 2017, 17:08
  #36 (permalink)  
 
Join Date: Feb 2003
Location: BHX LXR ASW
Posts: 2,269
Received 5 Likes on 3 Posts
So does this mess affect City Flyer's operations in the regions, or is this just London Airways?
crewmeal is offline  
Old 27th May 2017, 17:14
  #37 (permalink)  
Paxing All Over The World
 
Join Date: May 2001
Location: Hertfordshire, UK.
Age: 67
Posts: 10,126
Received 58 Likes on 48 Posts
Ian W
There will be more of these complete machine failures unless the IT world relearns how to provide fault tolerance and non-stop systems.
There will be more of these complete machine failures when the Directors of the Board pay the money!

I was in telecommunications and IT for 27 years, including mission-critical stuff for banking. After the recession of 90/91, it was all about saving money. One small example: a friend of mine who works for a small software company still gets faced by the Boss telling the staff to put their development plans on hold as he has just sold some new feature to a customer. A feature that had been dropped from the development plan for a good reason. That is within the last month.

On this scale of events: as has been said above, there are too many systems, many of them legacy, that do not dovetail well together. Irrespective of the outsourcing problem (and it is a problem), this level of complexity, to provide ever more features and services, will fail.
PAXboy is offline  
Old 27th May 2017, 17:18
  #38 (permalink)  
 
Join Date: Jan 2001
Location: €
Posts: 74
Likes: 0
Received 0 Likes on 0 Posts
What this, and other recent failures at other airlines, shows is a lack of professionalism in their IT departments.
More a lack of professionalism in the budgeting department.

Looks to me like their mainframe went down.
All tracking of Planes, Passengers, Baggage, Freight, Meals, Maintenance, Flight Planning and on and on and on will be down.

CIA, NSA, MIA, KGB, bla bla bla take a dim view of Flights leaving without prior notification of browsing history, credit card details and so on of each person on board.

Mainframes generally take many hours to get up and running again once the problem is identified and resolved. Many hundreds of subsystems need to be individually started in the right sequence and verified for proper operation.
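As an illustration of "started in the right sequence", here is a small sketch that works out a valid start order from a dependency list. The subsystem names and their dependencies are invented; a real estate would have hundreds of entries plus a verification step after each start.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical subsystems: each key lists what must be running before it starts.
DEPENDS_ON = {
    "storage": [],
    "database": ["storage"],
    "messaging": ["storage"],
    "reservations": ["database", "messaging"],
    "check_in": ["reservations"],
}


def start_order(depends_on):
    """Return a start order in which every subsystem follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = start_order(DEPENDS_ON)
print(" -> ".join(order))
```

If two subsystems depend on each other, the sorter raises an error instead of returning an order, which is a useful thing to discover before the restart rather than during it.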

You being unable to purchase a ticket is the last of anybody's worries.

Last outage I saw (other BIG player in Europe) cost more than €10 million, previous outage more than 14 years ago. Cost of parallel backup system: €40 million just to set up.

You do the math ....
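For anyone who wants the sum spelled out, a crude sketch using only the figures above. It assumes one €10 million outage every 14 years, that the backup would prevent every outage, and that it has no running costs; it also ignores reputational damage, so treat the result as the accountant's view rather than the whole picture.

```python
def expected_annual_outage_cost(cost_per_outage, years_between_outages):
    """Average outage cost per year, assuming outages recur at a steady rate."""
    return cost_per_outage / years_between_outages


def years_to_break_even(backup_setup_cost, annual_outage_cost):
    """Years of avoided outages needed to repay the backup's setup cost."""
    return backup_setup_cost / annual_outage_cost


annual = expected_annual_outage_cost(10_000_000, 14)
payback = years_to_break_even(40_000_000, annual)
print(f"about EUR {annual:,.0f} a year, payback in about {payback:.0f} years")
```

On those assumptions the backup takes roughly 56 years to pay for itself, which is presumably the arithmetic the budget holders did.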
lamer is offline  
Old 27th May 2017, 17:19
  #39 (permalink)  
 
Join Date: Mar 2009
Location: Malaysia
Posts: 112
Likes: 0
Received 0 Likes on 0 Posts
Obviously BA has placed far too much faith and reliance in its computer system(s).
What remains to be seen is, was this a hack of their system(s) and if so will the public be told?
Awaiting any feedback with interest.
Carjockey is offline  
Old 27th May 2017, 17:23
  #40 (permalink)  
Paxing All Over The World
 
Join Date: May 2001
Location: Hertfordshire, UK.
Age: 67
Posts: 10,126
Received 58 Likes on 48 Posts
Whilst we will never know what happened, you can be sure that responsible staff in the IT department will have been warning about this for years. Whenever there is a massive corporate stuff up - the warnings were ignored. That's because
  • The Board don't bother listening to staff
  • The senior managers and Board have, mostly, come directly in from Uni or another company.
  • The days of someone at the top actually KNOWING and UNDERSTANDING what is happening at the bottom are long gone.
  • The outsourcing is a symptom of the problem - not the cause.
  • Now that Board members can have a wireless router sent in the post and it's plug-n-play at home, they think that IT is easy. I wish I was joking but I saw that attitude over 20 years ago.
PAXboy is offline  
