Air Traffic System Failure


groundstation
21st Jan 2009, 03:21
ATIS ATIS YBBN O 210409
RWY: 01
+ OPR INFO: NO DEPARTURES DUE SYSTEM FAILURE TILL TIME 0445.
START CLEARANCE REQUIRED
WND: 360/20
VIS: GT 10 KM
CLD: FEW030 FEW040
TMP: 29
QNH: 1014


Does anyone have some information on the closure of Brisbane Airport to all departures for the next 45 minutes?

OmniRadial
21st Jan 2009, 03:53
Hi there, first post.

Looks like it's all of YBBB having dramas.....

Mackay tower explained that the flight data processor in Brissy failed in parts and the redundancy didn't come on, so they shut it down. Upon restart the data appeared to be lost - though it's not necessarily lost, it just hasn't been re-incorporated into the system.

Bet they're having fun in the BNE ops room right now.....

Curved Approach
21st Jan 2009, 03:56
With the Brisbane Flight Data Processor down, each controller must manually enter flight plan details on their console from faxed copies....

Plazbot
21st Jan 2009, 04:06
Thank Buddha I am on days off. That is for real a bad thing to happen:uhoh:

malroy
21st Jan 2009, 04:21
Not only do controllers need to enter flight plan details manually for each sector a flight will transit, the coordination requirements between controllers change. Each console becomes, in effect, a mini TAAATS system that does not talk to any other TAAATS system, so controllers need to give full coordination for each flight, and full radar handovers, as the normal system tools do not work. Some very rough numbers on that workload are sketched below.

Bear in mind also that there has been no simulator refresher training program for over 2 years!
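To put rough numbers on that workload point (purely illustrative figures, nothing official): with the FDP down, every flight's plan has to be typed in at every console it will transit, and every sector boundary needs a full voice coordination, so the work scales with flights times sectors instead of being absorbed by the system. A back-of-the-envelope sketch in Python, with made-up figures:

# Back-of-the-envelope only - the figures are invented for illustration.
def manual_workload(flights: int, sectors_per_flight: int) -> tuple[int, int]:
    """Return (manual flight plan entries, full coordination calls) with the FDP down."""
    entries = flights * sectors_per_flight              # typed at every console transited
    coordinations = flights * (sectors_per_flight - 1)  # one call per sector boundary
    return entries, coordinations

# e.g. 60 flights each crossing 4 sectors:
print(manual_workload(60, 4))   # (240, 180) - work TAAATS normally does silently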

groundstation
21st Jan 2009, 04:42
Bear in mind also that there has been no simulator refresher training program for over 2 years!

So how could you ATC's possibly remember how to safely operate this computer system when you don't receive refresher training? Don't you have to pass yearly checks on these systems much as a pilot would in a simulator?

boree3
21st Jan 2009, 05:00
It's my fault! I turned off the machine that goes 'ping' as I walked out the door today. I thought we had a plan if something fell over. Oh, now I remember. Everybody STOP.......

Crikey, I hope the air conditioning is still working 'cos there will be a few sweaty palms atm.

malroy
21st Jan 2009, 05:01
So how could you ATC's possibly remember how to safely operate this computer system when you don't receive refresher training?

Because according to a manager in the training area...
"Show me one incident in the room that has been caused by not having simulator refresher training"


But seriously, this type of failure is similar to what occurs in system data upgrades. The data upgrades are planned shutdowns that are coordinated and happen at times of low/no traffic (night).
So, if your ATC has been rostered for a doggo on which there was an upgrade recently...

nafai
21st Jan 2009, 06:21
Article from: The Courier-Mail

FLIGHTS out of Brisbane Airport are running up to an hour and a half behind schedule after an air traffic control computer glitch grounded aircraft.

Flights finally began to be cleared for takeoff about 3.50pm (AEST) after the delay.

A spokeswoman from Qantas said there were "some delays" of up to 90 minutes as a result of air traffic control technical issues in Brisbane.

"We have experienced some delays of 60 to 90 minutes," the spokeswoman said.

"We don't have a definitive number as to how many were delayed, we're still in the process as it's been back up and running as of 3.50pm."



Computer glitch delays flights | The Australian (http://www.theaustralian.news.com.au/story/0,25197,24943528-12377,00.html)

Plazbot
21st Jan 2009, 06:57
But seriously, this type of failure is similar to what occurs in system data upgrades. The data upgrades are planned shutdowns that are coordinated and happen at times of low/no traffic (night).
So, if your ATC has been rostered for a doggo on which there was an upgrade recently...
I myself have not worked an upgrade in over 4 years. At the time this fell over, the area I work was right in full swing. This is not good.

I am sure the tossers will find some way of blaming the controllers. If no one smashes into someone else during this, that fu ck wit Greg should at long last realise his workforce is extremely professional and should be ashamed of how he has treated them and described their motives.

stupid

TID EDIT. Offensive, whichever way you look at it.

Going Boeing
21st Jan 2009, 07:08
Could an ATCO give us a brief rundown on how the backup systems work? I was under the impression that the two centres' computers "ghost" each other, so that if one centre has a major failure (eg burnt down), the other centre's computer would have all the flight data and would be able to take over controlling the other FIR - assuming that there are sufficient ATCOs.

Plazbot
21st Jan 2009, 07:30
No, the centres are entirely stand-alone. The flight data processor is the brain of the whole thing. Back in the day we had paper strips with the tracking and level information written on them, and when it changed, we wrote the new details on them. Every controller had a strip (or multiple) that showed where the aircraft were going. What were called ADSOs (airways data systems officers), and before that Flight Data officers, used to prepare the strips from the hard copy flight plans that the pilots would submit.

Today, the flight data processor gets all that info, sends it to every console and updates the info as it changes. The FDP links all the consoles together, so much of the info does not need to be physically coordinated. This made it extremely efficient in areas where the separation is based solely on the numbers. Where I work, we used to be 4 Flight Service consoles and 2 ATC consoles. Now we do it ALL with 1 person. When the FDP falls over, those efficiencies are lost.

If it was a full FDP failure, it is a legitimately big deal.
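For anyone curious what that central publish-and-update role looks like in software terms, here is a rough sketch in Python. It is purely illustrative and has nothing to do with the real TAAATS code - every class and field name (FlightDataProcessor, Console, FlightPlan) is invented - but it shows the idea: one processor holds the plans and pushes every change to all consoles, and when it's gone, each console has to be fed by hand.

# Illustrative sketch only - not the real TAAATS/FDP design.
# All names (FlightDataProcessor, Console, FlightPlan) are invented.
from dataclasses import dataclass


@dataclass
class FlightPlan:
    callsign: str
    route: str
    level: str  # e.g. "FL370"


class Console:
    """A controller workstation that receives flight data pushed by the FDP."""

    def __init__(self, name: str):
        self.name = name
        self.strips: dict[str, FlightPlan] = {}  # local picture, keyed by callsign

    def receive(self, plan: FlightPlan) -> None:
        # The FDP pushes updates; the console never has to ring another console.
        self.strips[plan.callsign] = plan


class FlightDataProcessor:
    """Central processor: holds flight plans and fans every change out to all consoles."""

    def __init__(self):
        self.consoles: list[Console] = []
        self.plans: dict[str, FlightPlan] = {}

    def attach(self, console: Console) -> None:
        self.consoles.append(console)

    def update(self, plan: FlightPlan) -> None:
        self.plans[plan.callsign] = plan
        for console in self.consoles:   # the "coordination" happens automatically
            console.receive(plan)


# With the FDP running, one update reaches every console:
fdp = FlightDataProcessor()
sector_a, sector_b = Console("Sector A"), Console("Sector B")
fdp.attach(sector_a)
fdp.attach(sector_b)
fdp.update(FlightPlan("QFA123", "YBBN..YSSY", "FL370"))
assert "QFA123" in sector_a.strips and "QFA123" in sector_b.strips

# With the FDP down, each console has to be fed by hand (faxed plans, voice coordination):
sector_a.receive(FlightPlan("VOZ456", "YBBN..YMML", "FL380"))  # manual entry, per console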

Chapi
21st Jan 2009, 07:31
In disaster recovery, it is possible to re-configure a centre (ML or BN) to operate the other centre's airspace ... except that everyone would be in a "degraded" mode due to the reduced number of workstations available to manage the airspace.

But this is disaster recovery ... not degraded modes ...

If a major processor (eg flight data processor) fails, then its secondary unit is supposed to take over.

If both processors fail, then the centre is in degraded mode, and needs to reduce traffic to low levels so that the controllers can manage the significant loss of functionality.

Degraded modes ops is not the same as disaster recovery (which may take a couple of days to implement).
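That description is essentially a three-state arrangement: primary processor running, secondary taking over, or degraded mode with the traffic wound right back. A loose sketch of that logic in Python - purely illustrative, the state names and the traffic-reduction figure are made up, not taken from any real system:

# Loose illustration of the primary/secondary/degraded logic described above.
# State names and the traffic-reduction figure are invented for the example.
from enum import Enum, auto


class FdpState(Enum):
    PRIMARY = auto()     # normal operations
    SECONDARY = auto()   # standby processor has taken over
    DEGRADED = auto()    # both processors down: manual procedures, reduced traffic


def next_state(primary_ok: bool, secondary_ok: bool) -> FdpState:
    """Decide the operating mode from processor health."""
    if primary_ok:
        return FdpState.PRIMARY
    if secondary_ok:
        return FdpState.SECONDARY    # the handover that is supposed to be automatic
    return FdpState.DEGRADED         # both down: slow everything right down


def traffic_cap(state: FdpState, normal_rate: int) -> int:
    """Degraded mode means winding traffic back so controllers can cope without the tools."""
    if state is FdpState.DEGRADED:
        return max(1, normal_rate // 4)  # arbitrary illustrative reduction
    return normal_rate


# Example: both processors down -> degraded mode, traffic wound back.
state = next_state(primary_ok=False, secondary_ok=False)
print(state, traffic_cap(state, normal_rate=40))  # FdpState.DEGRADED 10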

Hempy
21st Jan 2009, 07:36
So, is that three or four "1 in 10,000 year" events we've had so far?

No Further Requirements
21st Jan 2009, 07:54
Yeah, I was there for the one in Melbourne about 2 years ago. Not much fun. I hope it all gets sorted asap.

Cheers,

NFR.

ER_BN
21st Jan 2009, 08:01
Chapi,

I'm shocked...."Disaster Recovery".

Dear me no no no....

AsA spin says thou shalt not use the word "disaster".

Business Continuity please!

Robbovic
21st Jan 2009, 08:19
Chapi et al,
Disaster recovery is one of those things that sounded great when Thomson/Thales were selling us TAAATS.
It is, in effect, a great big fairy tale.
It has never been realistically tested (apart from showing that we can, indeed, reconfigure the sim to look something like Brisbane or Melbourne).
The logistics involved in firing up either centre are huge (how do you get the required controllers from one centre to the other?)
One of the biggest stumbling blocks is comms - I recently saw the TAS disaster recovery plan for VHF comms - it is so out of date it is not funny - I doubt anyone would get anywhere near their correct frequencies.
All the people involved in putting together the business continuity plans (disaster recovery) have long been hounded out of the organisation, so none of the planning is up to date with recent (and not so recent) changes.

Quite frankly, the safest way is for the affected centre to sit tight and wait for the cavalry.

Chapi
21st Jan 2009, 08:27
Ooopps,


ML & BN are two independent systems;

When the FDP fails its a real problem for the centre;

Degraded modes ops means slooowwing everything down - 'business continuity'

In a catastrophic event (eg BN centre burns down), we are supposed to be able to recover by moving to the other centre. I think the "correct" term is 'business resumption'.

It's a plan - I just hope I'm not on when we have to try it for real.

Wanderin_dave
21st Jan 2009, 09:30
Ooooooh, THAT'S what that switch does....... :bored:

man on the ground
21st Jan 2009, 09:43
GB - as mentioned above, the 'one centre taking over from the other' is only theoretical; never been tried or tested; and even according to the script, the first response would be about 3 days later!

'Max' capacity, which would still be way short of normal, would take about 1 - 2 weeks to achieve.

And that's after all the spare controllers get moved to the recovery centre. Oh, hang on - we don't have any spare controllers.....

C-change
21st Jan 2009, 09:51
I heard that the failure was actually due to non-payment of the electricity bill to Energex. :eek:

Not enough money left after AsA paid their dividend to the Gov. recently !!!

Roger Standby
21st Jan 2009, 13:09
Oh, wait a minute - some idiot in Canberra is actually working on that idea now. Apparently they are convinced it will save a fortune.

The guys in Brissie should be rubbing their hands together. If that plan goes ahead, it's gonna mean VR galore.

Hempy
21st Jan 2009, 13:14
Oh, wait a minute - some idiot in Canberra is actually working on that idea now. Apparently they are convinced it will save a fortune.
The guys in Brissie should be rubbing their hands together. If that plan goes ahead, it's gonna mean VR galore.

lol I see what you did there....:ok:

Ex FSO GRIFFO
21st Jan 2009, 14:13
Thinks....

Are we saying here that, in the 'unlikely' event of a fire in one of the Centres, then the 'other' Centre, or its Training Facility, has the capacity to 'Take over' the other's airspace??

At least that's what we were told in 'those days'....

Question - How do you 'suddenly' get suitably RATED controllers in ML for BN airspace, or vice versa??

(Ah Silly, we put them in a BIG aeroplane thingy and fly them down..
How does THAT aeroplane get there without a Clearance from Rated controllers?
Ah, the one in seat 1A looks out and tells the pilot it's OK....)

:yuk:
.

It's about as daft.......:}

Hempy
21st Jan 2009, 15:46
The legendary Won King once told me that when BN CN gets cleaned up by a cyclone/tsunami/bushfire etc, the surviving controllers will be shipped to ML....by bus.

Blockla
21st Jan 2009, 17:57
I'm pretty sure that the bus option is in the plans, though maybe it'll be drive-your-own-car in reality... The techs will take at least 2 days to convert the gear anyway, so what's the rush...

Business continuity is simply for when we've lost all hope of recovering one centre; only very limited services would be provided at the other end, certainly nothing like back to normal... You might find that entire groups get allocated a single console, or none at all, or perhaps one to share.

blind freddy
21st Jan 2009, 19:09
Boys from the north,

You need to look at this seriously. Who wants to move to the cold miserable southern centre.....?:=

Let's burn down ML Centre now, before they get us!:E:ok:
Any volunteers????

(Edit: For all you Management types out there, this is called tongue in cheek. Humour; you should try it some time. Oh hang on, you have. It is called your current offer!)

zube
22nd Jan 2009, 01:27
There's no need for any terrorist organisations to target Australian infrastructure. We do a good job ourselves of stuffing it up.

Ivasrus
22nd Jan 2009, 02:51
During the extended degraded mode, did FP2's depart before all affected degraded workstations had entered flight plans? This is what happened when all the YMMM remote TCUs were disconnected for several hours a few months back... chaos.

scran
22nd Jan 2009, 03:59
Ouch - a failure of the FD processor would not be fun.

I'm told that when Raytheon were installing a system in Norway, they had it up and running in test mode and the Norwegians asked "How long does it take to restore the system after a total power failure (considering with Gens, UPS etc this would supposedly never happen)?" The Raytheon reps answered something like "Well - almost impossible to happen, but about 24 Hours".

The Norwegians then immediately caused an absolute total power failure (the system died on cue) and said "Your time starts now....."


Never been able to verify the story - but I bet it was fun if it really happened!!

undervaluedATC
22nd Jan 2009, 05:28
I was reminded today that the people who helped everybody (ie ATC and managers) got little or no acknowledgment after the event - so thank you, FDCs, from me. I know you had everybody calling you for flight plans, each thinking (at first) that they were the only one affected and clearly should be afforded priority. Excellent job by you fine folk.

Well done to all my fellow ATC'ers too - I notice that the papers only reported the delay, and not the fact that a (reduced) service was still offered, instead of no service.

Louis Cypher
23rd Jan 2009, 08:32
Any truth to the rumour that TFN was trapped in the back (way up the back!) of one of the departures stuck on the ground at B'vegas for over an hour when the FDP failed?? :p:E:D

Almost makes cold start seem worthwhile

max1
23rd Jan 2009, 09:49
undervalued spot on about the FDCs.
Great people who get bugger all appreciation.


I was on for the failure. It highlighted the difference between those who have had a lot of exposure to the system, have pre-TAAATS experience and have had refresher training (even though it may have been 3+ years ago), and those who have been there for a while but have had no refresher training.
To me the actions of some of the ALMs were less than optimal: you don't allow departures until you check with the controllers, and you don't release airspace to the Military until you check with the controllers.
On the whole it showed why you pay for controllers, not 'airspace monitors'; when the poo hits the fan, give me experience any day. Sorry for the delays to industry, but safety carries the day. ASA will probably 'tick off' the refresher training now.

TrafficTraffic
23rd Jan 2009, 10:28
No surprise to see another thread that was showing promise of pointing out what a good job everybody did on the day - being hijacked....

:=

TT

malroy
23rd Jan 2009, 11:19
Well done also to those staff who offered to extend their shifts to assist, even if they were subsequently told by management that their services were not required and they could go home.

I am sure your colleagues appreciated the thought. Pity that the extra safety was not affordable.

Frantic Finals
24th Jan 2009, 01:16
scran

Your story happened pretty well as described - there was a bit of discussion before starting. It was a case of: if they didn't do it then, they would never get to do the test once the system went live!

Everything came back in much less than 24hrs, most functionality was restored in 10-15 mins!

The Norwegians (and Raytheon) were happy. :)

FF

C-change
25th Jan 2009, 21:54
Sounds like, once again, you all did an excellent job of keeping things running, as there were no media beat-ups about planes having near misses (even though you had a standard) - but I also notice zero praise or thanks from management. Surprise, surprise.

Did anyone get set up or hammered by management when things failed?

Long live paper strips !!!

topdrop
26th Jan 2009, 02:05
I also notice zero praise or thanks from management, surprise, surprise
Well, surprise surprise, I have seen emails from two different 3rd level managers praising and thanking their staff for a job well done.

max1
26th Jan 2009, 06:17
I also saw praise from a 3rd level manager. On the whole, experience carried the day.
My concern is the 'dumbing down' of controllers that the new training regime envisions. Controllers will get training only in those areas that immediately concern them.
With the system collapse we had on Wednesday, it was beneficial that we still had a base of knowledge that enabled the more experienced controllers to assist the less experienced ones, because the actions being taken affected others. When the system fails, it is still the controller's responsibility.
To me it highlighted those out-of-touch managers who talk of controllers 'monitoring' rather than controlling airspace. We are responsible for anything that occurs in our airspace; we do not 'monitor'.

C-change
26th Jan 2009, 11:35
I have seen emails from two different 3rd level managers praising and thanking their staff for a job well done.


How high is a 3rd level manager? Is that someone who works in your group/building, or in CB?

It's excellent that someone said well done, but what I was getting at was that no one (ie centre manager/HR) made the public aware or made any other announcement. I guess that would be admitting that the system is not perfect.

Praise for a job well done these days is very rare; it's mostly when people F%#@ up that you hear about it.

Ex FSO GRIFFO
26th Jan 2009, 14:10
Don't tell me....didn't ANY of your ALM's

- or whatever they call themselves -

OFFER to COME IN and RELIEVE the PRESSURE by ASSISTING 'on the floor'

by writing strips or 'getting details' or whatever youse guys/gals do these days,

when 'IT' all falls over....???

IF NOT...then SHAME SHAME SHAME.......(Apols to Derrin..)

:yuk:

Or, did the ' failure' occur outside their hours..?

undervaluedATC
26th Jan 2009, 22:19
When the system broke, it was all hands on deck (and it was illuminating to see how crowded the aisle could be when everybody was called into the room from out the back - not sure what would have happened had the failure occurred in the late evening/middle of the night.....)

Some ALMs were more help than others - much like the more experienced ATCs were more help to their colleagues than the new guys (generally speaking).

An off-the-record conversation confirmed what I suspected: it has been some time since the one-centre-offline-contingency plan has been updated.:rolleyes: And some people still reckon there are good reasons (mostly saving money) for just having one single ATC centre in Australia. Unbelievable.

C-change
27th Jan 2009, 08:03
Was there any sign of an e-mail from our esteemed leader of leaders TFN?


That's who I was hinting at in an earlier post. Funny how senior management are quick to lay blame on the so-called renegade controllers who are apparently causing all the problems, but when those same controllers save his organisation's arse and keep everyone safe, they duck for cover and say nothing.

The silence shows an appalling lack of leadership from a managerial team who clearly do not give a **** about their staff. Imagine if there had been a problem; you can bet it would have been "due to the actions of a renegade controller engaged in industrial action".

To any senior AsA management who might read this, hang your heads in shame you spineless, piss weak bastards.