PPRuNe Forums

BA delays at LHR - Computer issue (https://www.pprune.org/passengers-slf-self-loading-freight/595169-ba-delays-lhr-computer-issue.html)

aox 31st May 2017 11:23

Maybe what BA needs is an inspirational PR consultant.

I hear Lynton Crosby might be available fairly soon.

Ian W 31st May 2017 11:26


Originally Posted by Mac the Knife (Post 9788197)
I have some, relatively small scale, experience in such matters and no "Power Surge" short of a massive nuclear EMP, is going to take out my systems.

All this hand-waving about "power surges", as if it were some kind of excuse, is a straight out lie (even if it were true, which we know was another lie). A big essential system like BA's should properly be architected in such a way that NOTHING will take it right down for days on end.

Degraded for a few hours has to be the worst case scenario.

There is simply no excuse at all that would cut it with any competent systems manager - ergo, they had no competent systems manager either in house or out.

And for any big company to outsource their IT management is very very foolish. One largish company that I am familiar with has someone reasonably senior in their server room 24/365 and at least two other branches with spare capacity to take over within minutes.

Incompetence, lies, injudicious outsourcing and penny-pinching on one of their core resources.

Plain stupid.

The 'power surge' excuse worked - it convinced the engineering illiterate CEO and he passed it onto the media who have even less engineering knowledge.
We will find out eventually what caused the problem and it will almost certainly have only the most tenuous link to power supply.

fchan 31st May 2017 11:38


Originally Posted by Ian W (Post 9787673)
In a machine room from my past, that no longer exists, a security guard groped for the light switch and powered down the research data center using the emergency switch despite its switch cover - rather obviously the lights did not come on although several smaller ones went off.

Exactly same thing happened in my old company. Fortunately the room was only a small part of overall ops and was backed up.

Another time a maintainer tried to shut down one poorly performing widget using the maintenance screen; his mouse pointer slipped down one line in the menu from “shut down this widget” to “shut down all widgets” and he did not spot it, with serious results for the many operators using the widgets. In aviation's no-blame culture it’s not the fault of the maintainer or security guard, it’s poor design or procedures. Proper analysis and testing in the design and build phase should find this.

I am surprised to hear here that servers can be so sensitive to mains problems. I guess the problem is that their designers assume they are always fed from a nice clean UPS supply and, if the UPS fails to filter the surges, brownouts and spikes, the server has no designed-in protection against them. The UPS manufacturers will probably claim it can never happen.

c52 31st May 2017 11:41

Is it a fair guess that all other computers, maybe for miles around, on the same power network had problems at the same time on Saturday morning?

Thought not.

My experience of working with a UPS goes back to the 1980s when the one thing that could be guaranteed to cause an outage was a demo of the UPS.

fergusd 31st May 2017 12:05


Originally Posted by fchan (Post 9788254)
I am surprised to hear here that servers can be so sensitive to mains problems. I guess the problem is that their designers assume they are always fed from a nice clean UPS supply and, if the UPS fails to filter the surges, brownouts and spikes, the server has no designed-in protection against them. The UPS manufacturers will probably claim it can never happen.

They are not. Anybody with the barest minimum of knowledge would deploy only server chassis fitted with two power supplies, each fed from diverse electrical feeds in the data center, backed up using different UPSs, each connected to different diesel backup generators. Professional kit deployed by professionals can always be deployed with multiple power supplies. Short of a catastrophic event at the data center (bomb/flood/plane crash) you don't lose power on all power feeds.
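
A minimal sketch of that "diverse feeds" check, assuming each power supply's upstream equipment can be written down as a simple list. The component names (feed_A, ups_A and so on) are made up for illustration and say nothing about how BA's data centres are actually wired. Anything that appears in every supply's list is a single point of failure.

```python
def shared_components(psu_paths):
    """Return the components that every power supply relies on.

    psu_paths: one list of upstream component names per power supply.
    An empty result means no single component can take out all supplies.
    """
    paths = [set(path) for path in psu_paths]
    return set.intersection(*paths) if paths else set()


# Hypothetical layout 1: two supplies, fully diverse upstream kit.
diverse = [
    ["feed_A", "ups_A", "generator_A"],
    ["feed_B", "ups_B", "generator_B"],
]

# Hypothetical layout 2: two feeds, but both supplies hang off one UPS.
shared_ups = [
    ["feed_A", "ups_A", "generator_A"],
    ["feed_B", "ups_A", "generator_B"],
]

print(shared_components(diverse))     # set()
print(shared_components(shared_ups))  # {'ups_A'}
```

This is a deliberate simplification: it only looks for named equipment common to all paths, and ignores things like shared switchgear, shared cooling or a shared emergency-off button unless you list them.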

Heidhurtin 31st May 2017 12:26

At last, after an age of lurking, a subject where I do feel qualified to comment. In my organisation the operation of data rooms is split between IT and Real Estate - I am an engineer working for the latter and am responsible for all M+E services supporting the equipment. (Anything inside the racks belongs to IT).

For highly critical loads there is always more than a single UPS supplying critical power - at least 2 and more usually 4 or more. Cooling is also on a redundant backup system, details vary depending on the method used, ditto generators etc and we even consider water buffer tanks. Technically these are 2N services with distributed supplies and need to be fault tolerant AS A SYSTEM. There is also a similarly resilient backup data room in a distant location.

There's always scope for idiocy, but any work is controlled by a comprehensive change control system with at least 3 individual approvers before it's authorised. I can't comment with the same authority on the software side although an element of resilience is always included, and failover procedures are regularly practiced by IT. At least once per year I fail various elements of the support systems to test the backups and stress test the whole installation.

There is absolutely no way a well designed and maintained IT infrastructure could fail so catastrophically from a power irregularity - this is bread + butter stuff. An installation that is not well designed or maintained however.....

XBA1709 31st May 2017 12:27

For those of you who didn't manage to read the 67 page Thomson Reuters transcript of the Nov 2015 IAG Capital Markets Day presentation here's a direct quote from page 46

'Willie Walsh - International Airlines Group - Group CEO
So I come back to what this is all about. Somebody asked me during the break there you know did I really give a speech around show me the money in a tapas bar in Madrid. And I have to say I was insulted because those of you that know me if you can picture me in the tapas bar with a glass of wine in my hand, I didn't say show me the money. I said show me the (expletive) money. Silvia stand up. Shout it out. Shout it out with pride. Come on.

Silvia Cairo Jordan - International Airlines Group - Head of Group Commercial Planning and Policy
Show me the (expletive) money.

Willie Walsh - International Airlines Group - Group CEO
All right. I'll bring you back to show me the (expletive) money. And we make no apologies for it. You know because we're in a new industry. And you've heard me talk about this. You've heard me talk about the airline industry being different. I honestly believe it is. Because it's no longer an industry that's driven by those capacity statistics, ASKs, people going out there talking about the way they're growing. It's an industry that's now driven by capacity balanced with unit revenue. Capacity balance by unit costs. Capacity balanced by return on investment. There are more and more airline leaders talk about return investment. So we're here to show you the (expletive) money today. Because look at what we're doing. We're increasing all of our targets. And we're increasing all of our targets because we are determined to deliver on them. We know we can deliver on them. We've got a proven track record of delivery. '

Although the quote is from the film Jerry Maguire I get the impression he thinks he is the new Gordon Gekko. He certainly demonstrates what he is concentrating on and it is certainly not service or the customer.

bbrown1664 31st May 2017 12:42

I have worked in IT for about 30 years and have some experience with regard DR planning and requirements.

Datacentres are generally, as has been said already, fed by 2 or more diverse feeds from the grid. These then go via a UPS which is either in parallel or in series with the power feed. The major difference here is that the UPS can be used for removal of power spikes etc if it is in series and reduces the need for another switch that will flick in should the power fail.

If the input power does fail, the UPS will provide power to the building for anything from 5 minutes to several hours. This is dependent on the servers etc being powered. Backup generators should kick in as soon as power fails. This gives them 5 minutes or more to warm up and settle down before the UPS batteries run out of power.

Assuming all of the above works, power then continues to be provided for the rest of time as long as you have a plan in place to refill the diesel tanks of the generators before they run dry. In reality, mains power comes back on line, the generators turn themselves off and normality is restored.

What tends to happen though is that the backup generator fails to start, the UPS batteries have not been replaced in the last 10 years and when the mains power goes down, you have all systems falling over in a heap. This is the time to switch to the DR data centre.
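
A rough sketch of that race between the batteries and the generators. All the figures below are invented for illustration, and the model is a crude assumption: a constant load and a linear battery discharge, with no allowance for inverter losses.

```python
def ups_bridge_margin_min(battery_wh, load_w, generator_ready_min,
                          battery_health=1.0):
    """Minutes of spare UPS runtime once the generators are ready.

    battery_wh:          rated battery energy in watt-hours (assumed figure)
    load_w:              constant load on the UPS in watts
    generator_ready_min: minutes until the generators can take the load
    battery_health:      fraction of rated capacity still available (0..1)

    A negative result means the batteries run out before the generators
    are ready, i.e. everything falls over in a heap.
    """
    runtime_min = battery_wh * battery_health / load_w * 60
    return runtime_min - generator_ready_min


# Healthy batteries: about 15 minutes of runtime against a 5 minute warm-up.
print(ups_bridge_margin_min(50_000, 200_000, generator_ready_min=5))

# Ten-year-old batteries at an assumed 20% of rated capacity: negative margin.
print(ups_bridge_margin_min(50_000, 200_000, generator_ready_min=5,
                            battery_health=0.2))
```

If the generator never starts at all, no amount of battery margin helps, which is the poster's point about switching to the DR data centre.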

This is where the real problems begin. All too often this has not been tested properly. Systems have different RTOs (recovery time objectives), which can vary from instant to weeks, and many people don't realise that system#1, which has an instant RTO, cannot run as it is dependent on system#73, which is on a 3 day RTO. Not only that, when they tested it, the production system didn't fail, it was cleanly shut down. In a real failure it is not, meaning that you now have corrupted data in system#1, and that brings a whole new headache for people to sort out.
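
The RTO trap can be sketched in a few lines, assuming you have a table of each system's stated RTO and a map of what depends on what. The system names follow the example above; the data structures are hypothetical. A system's effective RTO is the worst RTO anywhere in its dependency chain.

```python
def effective_rto(system, rto_hours, depends_on, _seen=None):
    """Return the real recovery time for a system, in hours.

    rto_hours:  stated RTO per system, in hours
    depends_on: map of system -> list of systems it needs in order to run

    The effective RTO is the maximum of the system's own RTO and the
    effective RTO of everything it depends on, directly or indirectly.
    """
    seen = _seen or set()
    if system in seen:
        raise ValueError(f"circular dependency involving {system}")
    upstream = [
        effective_rto(dep, rto_hours, depends_on, seen | {system})
        for dep in depends_on.get(system, [])
    ]
    return max([rto_hours[system]] + upstream)


rto_hours = {"system#1": 0, "system#73": 72}   # instant vs 3 days
depends_on = {"system#1": ["system#73"]}

print(effective_rto("system#1", rto_hours, depends_on))  # 72
```

So the "instant" system is really a 3 day system until somebody either removes the dependency or tightens the RTO on system#73.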

There is a common phrase I use which I don't think the forum software will allow but here it is in another format.

In IT, Sierra-Hotel-India-Tango happens.

Ian W 31st May 2017 13:12

The problem is often not hardware or software but 'wetware' as the human element is sometimes called. Or to put it another way - "Nothing is fool proof as fools are too ingenious".

Somewhere I worked in the past had an uninterruptible power supply providing smoothed power for the sensitive electronics being used - and the system would crash many evenings for no apparent reason. It was then found that a cleaner was using one of the smoothed power supply sockets in the machine room for a vacuum cleaner, putting noise and spikes into the power, not realizing the difference between the black sockets with smoothed power and the white sockets for 'domestic' power.

Bengt 31st May 2017 15:15

Tieto, one of the larger outsourcing companies in the Nordic countries, had a major datacenter crash about two years ago. This was caused by a storage area network (SAN) turning corrupt during an upgrade. For redundancy they had a second storage set but that did not help, as it was mirrored in real time and so went corrupt too.
When they changed recovery strategy to restoring from backup, it was found that the backup was not restorable due to an upgraded OS (or something like that).

Service for several large organisations was down for almost a week.

The problems for BA at Heathrow remind me of this, as both the wide outage and the long time before being operational again are similar.

ILS27LEFT 31st May 2017 15:42

Totally agree
 

Originally Posted by Freehills (Post 9787905)
Pretty much by definition anyone who ends up running a large company is an ambitious sociopath, because if you are not, you will not get there. Two of the largest firms in the world (Amazon and Apple) were built by deeply unpleasant people (Bezos & Jobs). Society at large lets it persist because - "they get things done" and, why would anyone normal want the role anyway?

Yes I agree, the above is a fact. If somebody stating the above concept clearly goes on YouTube + TV then he/she could easily stand in the next general election and would certainly get lots of votes. The web is changing the power game: it is more difficult to lie to the public nowadays than it was in the past.
We all know you are 100% right. We need nice, kind human beings running our societies. Happiness levels are too low in high GDP countries, this is the direct result of uncontrolled corporate greed where sociopaths are significantly rewarded and those who want to improve happiness for most are excluded by the system.
Often the same sociopaths are heavily involved in charitable missions, such a paradox and so fake :mad:

Twiglet1 31st May 2017 17:25

Was at LHR on Saturday. Bags pitched up on yesterday's flight so 2 days late.

(only problem is two of en famille are stuck in UK but that's a minor point)

From my perspective as one of xxxxx thousand punters - can't grumble too much, all forgotten and we'll survive. Thumbs up to BA front line staff Saturday.

A320ECAM 31st May 2017 19:59


Originally Posted by aox (Post 9788240)
Maybe what BA needs is an inspirational PR consultant.

I hear Lynton Crosby might be available fairly soon.

Or Max Clifford.

Where is Willie Walsh and why did that hi-viz wearing Alex Cruz not help out in the terminals instead of hiding away sending emails/tweets?

davidjohnson6 31st May 2017 21:42

Looks like senior figures at BA/IAG want an independent investigation - I imagine it'll end up going to one of the big consulting companies (e.g. PwC). Quite how the incentives will play out when the consulting company has its own advisory services to sell is something that leaves me rather puzzled
BA board members are expected to request an inquiry into the IT fiasco - BBC News

PAXboy 31st May 2017 22:29

Unfortunately, sending a 'heavy mob' in to ask questions will not give you the answers. Everyone will now be out to protect their own @rse and point the blame. They will have only been there a short time, they were told to do this, they were in the process of checking procedures when X did Y and then Z happened, etcetera.

If the company had a strong line of communication, where the senior IT mgmt were KNOWN and VISIBLE? Then you can get the information you need.

Since it is clear that it was an utter failure, at every level, you might be better off starting from scratch. Trying to work out what went wrong and then patching it? Again? But, as davidjohnson6 says, the 'heavy mob' will be only too glad to specify a new system. That is, one that no one at the top understands. Oh, that looks like what they already have.

Ian W 31st May 2017 22:34

This is one of the problems.

PR says give the punters an excuse that sounds good and won't affect pax or share prices.

Sensible organization management says: find out the truth so we don't go down the same hole again.

These are often conflicting aims.

Heathrow Harry 31st May 2017 23:03

I have to say that BA are losing the plot with every passing minute - I've spent the last few days traveling and the problems have come up in many conversations

Not one single person believes a word they issue

"the story changes once or twice a day "..

" where was the CEO?"

"all those poor people and not a single manager to be seen"

"Did no-one ever TEST these systems?"

This is heading towards being a "Ratner" moment if they aren't careful

syseng68k 31st May 2017 23:59

I've been wading through it as well. All the breathless gushing may be necessary for investors, but are they actually saying anything, other than endless efforts to reduce the cost base? From what I can see, they are upgrading all their IT to a common standard across the group and that work is still in progress, so plenty of room for trouble as the old kit is replaced by new. The software complexity must be horrendous, not to mention all the hardware and networking kit associated with it. Admirable aim anyway.

Couple of things sprang out:

“It's using the Cloud, and we're really positive that we've now got both the BA and IB selling teams, who are able to do dealing and, also, workflow within a single system in absolutely future technologies”…

and:

“But, equally, you're going to see a much more Windows environment”....

So, up to date with the latest trendy cloud tech, but is it organised in a fully redundant manner and with guaranteed real-time performance and failover?

Since when was anything from uSoft ever considered robust enterprise class? Fine for desktops and order processing, but wouldn’t be using it for any critical infrastructure, anywhere.

If you want stuff that works from top to bottom, go to IBM or Oracle. Very expensive, but good reason for it…

BigFrank 1st Jun 2017 00:23

IAG CEO " soon to be spotted "
 
So far he has been conspicuous by his absence. Very.

But the BBC link above states that he will be on parade this morning at BCN launching Level; IAG's new trans continental LCC.

¿ Will he take questions from journos or will he emulate Spanish PM Señor Rajoy and deliver his address from behind a high tech (sic) one way mirror ?

( As a shareholder, one hopes at very least that Willie's pathetic barrack-room language, quoted above by a business news agency no less, will be kept under strict control. Or better still only deployed, strictly in private, on those responsible for the current fiasco.)

c52 1st Jun 2017 03:47


“It's using the Cloud, and we're really positive that we've now got both the BA and IB selling teams, who are able to do dealing and, also, workflow within a single system in absolutely future technologies”…

and:

“But, equally, you're going to see a much more Windows environment”
How can anyone who's not got some kind of a grip on the rules of language hope to give clear leadership or design anything properly?


All times are GMT.

