PPRuNe Forums

PPRuNe Forums (https://www.pprune.org/)
-   Passengers & SLF (Self Loading Freight) (https://www.pprune.org/passengers-slf-self-loading-freight-61/)
-   -   BA delays at LHR - Computer issue (https://www.pprune.org/passengers-slf-self-loading-freight/595169-ba-delays-lhr-computer-issue.html)

sherburn2LA 28th May 2017 05:25


Originally Posted by PAXboy (Post 9784878)
The six phases of a big project is a cynical take on the outcome of large projects:

In my (38 years) experience most IT mega projects go in just two phases

1) Fire all the people who say it won't work

later

2) Fire all the people who said it would work

LTNman 28th May 2017 05:36

I used to install UPSs (uninterruptible power supplies) as part of my job; they guaranteed power for a set time. Years later we would go back when they failed to operate: the batteries, which were on a constant trickle charge, had a limited life and no one had thought about changing them.

Rwy in Sight 28th May 2017 05:36


Nialler

prevents against large physical attacks. Logical ones are not prevented against.
I thought it was possible to avoid logical attacks by having the code/software written by different people, so that mistakes and logic bombs happen in only one system. I hope that makes sense.

Nialler 28th May 2017 05:43


Originally Posted by Rwy in Sight (Post 9784942)
I thought it was possible to avoid logical attacks by having the code/software written by different people, so that mistakes and logic bombs happen in only one system. I hope that makes sense.

For something such as a database subsystem and its associated application layer there will be many hundreds, and potentially thousands, of coders involved.
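
What Rwy in Sight is describing is essentially N-version programming: have independent teams write separate implementations of the same calculation and vote on the answer, so a mistake or logic bomb in one version gets outvoted. It is expensive for exactly the reason given above, because it multiplies those hundreds of coders. A minimal sketch of the voting idea (the fare functions are invented for illustration, nothing to do with BA's code):

Code:
# N-version voting: three independently written implementations of the
# same business calculation run side by side and the majority answer wins.
# The fare functions below are hypothetical illustrations only.
from collections import Counter

def fare_version_a(distance_km: float) -> float:
    return round(50 + 0.10 * distance_km, 2)

def fare_version_b(distance_km: float) -> float:
    return round(50 + distance_km / 10, 2)

def fare_version_c(distance_km: float) -> float:
    # Deliberate "logic bomb" / mistake in one version.
    return round(50 + 0.10 * distance_km + 999, 2)

def voted_fare(distance_km: float) -> float:
    results = [f(distance_km) for f in (fare_version_a, fare_version_b, fare_version_c)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError(f"No majority agreement: {results}")
    return value

print(voted_fare(1200))  # the faulty version C is outvoted -> 170.0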

Sunfish 28th May 2017 06:06

freehills:


Originally Posted by Sunfish
As a general rule, if IT is critical to the business, then you don't outsource it.

Major airlines that don't outsource IT:
Delta (and they messed up)
Air New Zealand

And, frankly, it is a silly rule. I don't see many companies building their own PC operating systems, despite how critical those are. For a parallel, engines are also critical to the airline business, but since the break-up of United in the 1930s, no airline has designed and built its own engines.
You misunderstand. There are multiple reputable and competitive sources for PCs, operating systems, engines (new and overhauled), tyres, etc. Of course you can outsource all that without losing control, and generate efficiencies in the process.

However, there is only one BA, and one set of BA IT systems that are intricately and inextricably bound up with BA's business strategy. The two are inseparable and complementary; you cannot change business strategy without modifying the business rules that are implemented in the software, period.

If you lose detailed control of the strategic IT in your business, you risk exactly what BA are now experiencing, because your contractor cannot know your strategic priorities in as much detail as you do, and those priorities change monthly.

For example, consider what software changes might be required to implement a marketing idea to increase baggage allowances.
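
To make the baggage example concrete, here is a minimal sketch (names and numbers are invented, not BA's actual rules) of how a marketing idea like "one more free bag for top-tier members" is really a change to a business rule buried in code, which is why the people who maintain that code need to understand the strategy behind it:

Code:
# Hypothetical illustration: a "simple" marketing change to baggage allowance
# is a change to a business rule encoded in software. Names and figures invented.
from dataclasses import dataclass

@dataclass
class Passenger:
    cabin: str   # "economy" or "business"
    tier: str    # "blue", "silver", "gold"

def checked_bag_allowance(p: Passenger) -> int:
    """Return the number of free checked bags."""
    allowance = 1 if p.cabin == "economy" else 2
    # Marketing's new idea: one extra bag for gold-tier members.
    # Changing this rule ripples into check-in, load planning and pricing.
    if p.tier == "gold":
        allowance += 1
    return allowance

print(checked_bag_allowance(Passenger(cabin="economy", tier="gold")))   # 2
print(checked_bag_allowance(Passenger(cabin="business", tier="blue")))  # 2

The rule itself is a few lines; knowing which of the many systems carries copies of it, and why, is the part a distant contractor struggles with.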

ImageGear 28th May 2017 06:12


To IPL a mainframe shouldn't take more than five to ten minutes.
And this is totally unacceptable, as BA have found to their severe cost. Whoever took the decision to implement a system where recovery means an IPL, blowing the reservations system out of the water with no failover, needs to take the long walk.

To call the latest crop of "big servers", mainframes, is blatant misrepresentation. True fault tolerant mainframes no longer really exist.

30 years ago, I ran a demonstration in Paris for airline CIOs and IT directors of multi-host file sharing across duplicate mainframes, where it was proved that, on catastrophically failing one mainframe, not one transaction was lost anywhere between the reservations terminal and the failing computer's memory module. The impact on the reservations clerk was a "wait" screen that lasted around 40 seconds. The solution sold, but not to BA.

There is absolutely no excuse, 30 years later, for any airline reservations department to accept anything less.

Some very big heads must roll. :=

Imagegear.
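
For what it's worth, the "not one transaction lost, just a 40-second wait" behaviour described above is typically achieved by tagging each booking request with an ID and replaying it against whichever host is still alive until one acknowledges it. A purely hypothetical sketch of that retry pattern from the terminal's side (not the actual product demonstrated):

Code:
# Sketch of "no transaction lost" failover from the terminal's point of view:
# each request carries a unique ID and is retried against the surviving host
# until one acknowledges it. Hosts, behaviour and timings are hypothetical.
import time
import uuid

class HostDown(Exception):
    pass

def submit(host: str, txn_id: str, booking: dict) -> str:
    # Stand-in for the real network call to a reservations host.
    if host == "host_a":
        raise HostDown(host)          # simulate the catastrophic failure
    return f"{host} committed {txn_id}"

def book(booking: dict, hosts=("host_a", "host_b"), wait_s: float = 0.5) -> str:
    txn_id = str(uuid.uuid4())        # same ID on every attempt -> idempotent
    while True:
        for host in hosts:
            try:
                return submit(host, txn_id, booking)
            except HostDown:
                continue              # try the next host
        time.sleep(wait_s)            # the clerk just sees a "wait" screen

print(book({"pax": "SMITH/J", "flight": "BA123"}))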

Nialler 28th May 2017 06:33


Originally Posted by ImageGear (Post 9784959)
And this is totally unacceptable, as BA have found to their severe cost. Whoever took the decision to implement a system where recovery means an IPL, blowing the reservations system out of the water with no failover, needs to take the long walk.

To call the latest crop of "big servers", mainframes, is blatant misrepresentation. True fault tolerant mainframes no longer really exist.

30 years ago, I ran a demonstration in Paris of multi-host file sharing across duplicate mainframes, where it was proved that, on catastrophically failing one mainframe, not one transaction was lost anywhere between the reservations terminal and the failing computer's memory module. The impact on the reservations clerk was a "wait" screen that lasted around 40 seconds. The solution sold, but not to BA.

There is absolutely no excuse, 30 years later, for any airline reservations department to accept anything less.

Some very big heads must roll. :=

Imagegear.

You misunderstood me. My entire career has been spent working on mainframes. The type of failover capability you describe is not quite thirty years old, but you are correct that not simply sysplex, but geographically dispersed sysplex is possible. Not just possible but standard. My clients all have datacentres at sites which are remote from each other and are cycled on a scheduled basis. OK, I'm an old IT head, but I'm sure that pilots will nod at the concept of double or triple redundancy. Their lives depend on it. In my case it is just my job.

My constant fear is that a failure might not be restricted or contained when it is a logical or intrinsic programming error, because with the cold logic of object code it propagates through the redundant systems as well. The problem in your primary hydraulic system is not actually isolated, because the same fault that led to its failure also exists on the fallback.
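
A toy illustration of that fear: if the primary and the fallback run identical object code, a "poison" input takes them both down, so hardware redundancy buys nothing against that class of fault (the record format and parser here are invented for the example):

Code:
# Identical code on primary and fallback means an input that crashes one
# crashes the other too; redundancy only protects against hardware faults.
# The record format and parser are invented for illustration.
def parse_record(record: str) -> dict:
    name, bags = record.split(",")
    return {"name": name, "bags": int(bags)}   # blows up on a malformed field

def run_site(site: str, record: str) -> dict:
    print(f"{site}: processing {record!r}")
    return parse_record(record)                # same object code at both sites

poison = "SMITH/J,two"                         # malformed input, not a power cut
for site in ("primary", "fallback"):
    try:
        run_site(site, poison)
    except ValueError as err:
        print(f"{site} failed identically: {err}")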

BetterByBoat 28th May 2017 06:57

Why is everyone so quick to accept the official line? A power supply failure taking down multiple systems across multiple data centres? Possible, but it wouldn't be at the top of my list of likely explanations.

Heathrow Harry 28th May 2017 07:19

Exactly - and it should also be (relatively) easy to fix...

Some sort of botched software upgrade is my guess... but blaming the hardware is probably a first step towards trying to save your job.

Tay Cough 28th May 2017 07:27


Some very big heads must roll.
This is BA. Of course they won't.

FlightCosting 28th May 2017 07:38


Originally Posted by Heathrow Harry (Post 9784992)
Exactly - and it should also be (relatively) easy to fix...

Some sort of botched software upgrade is my guess... but blaming the hardware is probably a first step towards trying to save your job.

Time to go back to pen and paper. Back in the day (1970), in the brand new high-tech Terminal 1, the only computer we had was the Solaris information board, and that broke down often. Paper pax manifests and hand-written load sheets.

xs-baggage 28th May 2017 07:45

I seem to remember that two or three years ago BA were putting Navitaire into LHR, LGW, and LCY as part of what I was told was a "backup strategy" (I presume they meant a fallback strategy, but there you go). My involvement with UK airline IT ended shortly afterwards - does anyone know if they ever did it?

RomeoTangoFoxtrotMike 28th May 2017 08:22


Originally Posted by sherburn2LA (Post 9784935)
In my (38 years) experience most IT mega projects go in just two phases

1) Fire all the people who say it won't work

later

2) Fire all the people who said it would work

Typo. You meant:-

2) *Promote* all the people who said it would work...

["*We* know it will work, it's the silly IT nerds who couldn't do it. Sack them, promote us for having the brilliant idea in the first place"].

fchan 28th May 2017 08:28

I worked 40 years in safety and reliability in transportation industries, latterly with ATC. I could tell many stories of power supply disasters in so-called very resilient systems. In the Far East I was doing a project in a road control centre. I asked them about power supply resilience and was assured they had all the batteries and generators to make it bulletproof. THE VERY NEXT DAY, on entering the centre, there were worried-looking managers and a long extension lead across the floor leading to the back of a server rack. “We had a power failure and had to do that to get it working again.”

In the same country, the main and backup power supplies went through the same circuit breaker, which failed, taking down a complete train line.

In UK ATC the last serious power outage was 15-20 years ago. Since then NATS has vastly improved the power systems, so, whilst nothing is impossible, I’d be extremely surprised if power was the cause. If it did happen, the most likely cause would be maintainer error rather than a straight hardware or software failure in the power supplies. Many of NATS's recent, very occasional issues have been due to software upgrades not going to plan, but they are nearly always recoverable to the backup system or old software in minutes. The up-to-hours interruption to some traffic flows is only due to the time taken to rebuild the traffic flow to one that works safely; at an airport like Heathrow, working at >95% capacity, a small blip can’t be recovered in minutes. NATS does its software upgrades only in quiet periods and would never choose a May Bank Holiday weekend to do it. Unlike an airline, whose planes are on the ground (and therefore safe) in incidents like the current one, ATC has many planes in the sky, so it has to be much more careful.
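
The circuit-breaker story is a textbook common-cause failure: two "independent" feeds that shared one component. It is the sort of thing you can catch on paper, or in a few lines of code, by listing every component each feed passes through and checking for overlap; the topology below is invented purely to mirror that story:

Code:
# Check two "independent" power feeds for shared components (common-cause
# failures). The topology below is invented to mirror the circuit-breaker story.
feed_a = ["grid", "breaker_7", "ups_a", "rack_pdu_1"]
feed_b = ["generator", "breaker_7", "ups_b", "rack_pdu_2"]   # oops: same breaker

def shared_components(*feeds: list) -> set:
    """Return components that appear in every feed, i.e. single points of failure."""
    common = set(feeds[0])
    for feed in feeds[1:]:
        common &= set(feed)
    return common

spof = shared_components(feed_a, feed_b)
print(f"Single points of failure: {spof or 'none'}")   # -> {'breaker_7'}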

RomeoTangoFoxtrotMike 28th May 2017 08:33


Originally Posted by BetterByBoat (Post 9784978)
Why is everyone so quick to accept the official line? A power supply failure taking down multiple systems across multiple data centres? Possible, but it wouldn't be at the top of my list of likely explanations.

I can see how a "power failure" in a specific subsystem, the consequences of which were not fully thought through (or were ignored) and the response to which was mishandled, could rapidly spiral out of control. It's probably at this point that having outsourced IT half a world away really began to bite...

Gertrude the Wombat 28th May 2017 08:39


Originally Posted by LTNman (Post 9784941)
I used to install UPSs (uninterruptible power supplies) as part of my job; they guaranteed power for a set time. Years later we would go back when they failed to operate: the batteries, which were on a constant trickle charge, had a limited life and no one had thought about changing them.

I don't see what use a UPS is if it doesn't monitor itself; surely it would have reported that it needed a new battery even if the bureaucratic systems for maintaining it failed?
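
Most modern UPSs will indeed flag a failed self-test or a tired battery, but only if something is actually polling them. A rough sketch of what that polling can look like with Network UPS Tools, assuming a NUT server is running and the UPS is configured under the name "ups1" (both assumptions, and the status flags should be checked against your own unit's output):

Code:
# Minimal battery-health check via Network UPS Tools' `upsc` client.
# Assumes a NUT server is running and the UPS is configured as "ups1".
import subprocess

def ups_vars(ups: str = "ups1") -> dict:
    out = subprocess.run(["upsc", ups], capture_output=True, text=True, check=True)
    return dict(line.split(": ", 1) for line in out.stdout.splitlines() if ": " in line)

def battery_needs_attention(info: dict) -> bool:
    status = info.get("ups.status", "")
    charge = float(info.get("battery.charge", "100"))
    # In NUT status flags, "RB" = replace battery and "LB" = low battery.
    return "RB" in status or "LB" in status or charge < 50

info = ups_vars()
print("Replace the battery!" if battery_needs_attention(info) else "Battery looks OK.")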

ExSp33db1rd 28th May 2017 08:43

A million years ago - circa the 1970s - BOAC sent us home from New York as passengers, unscheduled - pick up your tickets at the airport. On arrival at JFK the check-in girl advised us that we couldn't travel as "the system was down" and she couldn't print our tickets. The Flt. Eng. handed her his pen. We flew home.

It's called progress.

coalencanth 28th May 2017 08:49

There's quite a gem, allegedly from the great Alex himself, that's been leaked on FlyerTalk, apparently from their internal network - what do they call it, Yammer?

I could have joined BA a few years back; thank God I didn't. Great company going down the pan.

oldart 28th May 2017 08:55


Originally Posted by FlightCosting (Post 9785003)
Time to go back to pen and paper. Back in the day (1970), in the brand new high-tech Terminal 1, the only computer we had was the Solaris information board, and that broke down often. Paper pax manifests and hand-written load sheets.

Pen and paper with some Blu Tack would have given passengers some kind of information in the terminals; however, that would have needed someone with common sense rather than a keyboard stuck to their hands.

Rwy in Sight 28th May 2017 09:05


Great company going down the pan.
It seems they get the best in Europe to recover from severe schedule disruption.

