Problems at Swanwick
Join Date: Feb 2001
Location: Fareham U.K.
Posts: 30
Bexil 160, your suppositions are totally correct! It is also almost certain that the CFMU TACT computer went down because of the rapidity and complexity of the flow restrictions placed by LACC, owing to their inability to split sectors.
I SAY NO MORE AS MY COVER IS EXPOSED FOR MANY TO SEE!!
Join Date: Jul 2001
Location: Fort Worth ARTCC ZFW
Posts: 1,155
BEX;
Sorry to hear of another BAD day at NERC... You are making me just want to go to work and hug DSR and our old HOST <G>. Next time you come to visit, I should be able to get you in the building...
Take care
Join Date: Jul 2001
Location: Fort Worth ARTCC ZFW
Posts: 1,155
Take3call5;
I think I would have to take exception to your comment about it not being the suits' fault...
If the requirements statement was correctly written, and the contractor was then held to a correctly written contract, you wouldn't have these issues.
We had some serious problems with our iteration of the "new" and "improved" NAS, or ISSS as it was known. We finally were able to convince the suits that this was NOT going to work in a manner that was better than, or even as good as, the old system. It was scrapped, and then we took what elements did work and, to save money, put them together into something that would at least work and replace an old and failing system. Now we are working at slowly replacing all the other items that need to be updated. We know that the NAS software is VERY COMPLEX and must work all the time, in real time. We are not in a hurry, for the sake of safety and the customers...
regards
Problems at Swanwick
Having just read the posts re the European Cup in Scotland, could there be any connection between the extra traffic this generated and the subsequent problems at Swanwick? :o
Beady Eye
Join Date: Feb 2001
Location: UK
Posts: 1,495
Dan Ryan:
No. Traffic levels were not the issue. I won't find out until I go into work on Monday what really happened, but it appears from comments in this thread that there was a problem with one workstation. This caused the watch management to delay 'splitting out' the Ops room to accommodate daytime traffic levels until they could be sure that the problem wasn't going to be replicated on other workstations. At least that's my surmise.
Scott:
I fully agree with your comment about requirements statements, etc. However, you underestimate the complexity of the system at LACC. It is not at all unusual for a 'fix' on one part of the system to introduce regression or an unwanted 'feature' in another part. It can be impossible to know this until a 'problem' is found and the engineers can examine the data (personally, I would have it rewritten in an object-oriented language so that regression cannot be introduced in other areas).
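Purely as an illustration of that point (the names and structure here are hypothetical, nothing like the actual LACC code), a minimal sketch of how a stable interface limits how far a 'fix' can reach:

from abc import ABC, abstractmethod

class FlightDataSource(ABC):
    # Stable contract: callers depend only on this interface.
    @abstractmethod
    def next_strip(self) -> dict:
        ...

class LegacyFeed(FlightDataSource):
    def next_strip(self) -> dict:
        # A bug fix inside this class can only change behaviour seen
        # through next_strip(); it cannot silently alter other modules.
        return {"callsign": "BAW123", "level": 350}

def display(source: FlightDataSource) -> None:
    strip = source.next_strip()
    print(strip["callsign"], strip["level"])

display(LegacyFeed())

Whether that fully prevents regression is debatable, but it does narrow down where a change can bite.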
Sounds like the upgrade that was put on just didn't 'take' properly on one workstation (although they all appeared to work, albeit without flight data, when checked by me and my team), so that, for safety, they fully checked out all the rest before committing to daytime traffic levels. IMO the only decision they could make under the circumstances.
I am sure that EVERYONE is aware of the problems caused to other units, to the airlines and to all the other services connected with flying when there's a problem like this. I just wish there was an easy solution to 'fixing' things, other than relying on everyone else's professionalism to 'carry on regardless'.
Beady Eye
Join Date: Feb 2001
Location: UK
Posts: 1,495
Check 6:
London Mil are still at West Drayton, and hopefully will move with LTCC (circa 2005??). These problems will not have affected them, except where they had traffic joining controlled airspace. Quite probably they handled some commercial flights that were willing to fly outside airways.
N.B. There are still military controllers (LJAO) working with (and at) the LACC at Swanwick, just as there were at LATCC.
Join Date: Jun 2001
Location: London
Posts: 52
EGLL must have been interesting yesterday morning: where did all the inbounds (particularly BA T4 long haul) park if very few of the outbounds moved?
GMP and GMC must have been busy positions to be operating yesterday. Were inbound holds lengthy as well, or were restrictions put on the number of flights allowed into Heathrow?
I read that there will be backlogs right across the weekend as airlines try to get the aircraft back in the right place.
Join Date: Jul 2001
Location: Fort Worth ARTCC ZFW
Posts: 1,155
Hi Take3Call5;
Actually, I probably know the system that you have fairly well, since all that you have had, and now have, were offshoots of what we have, or decided not to do...
I completely understand the issue of doing something to the software and thereby affecting something else in the system. That is why we do a LOT of testing on all of our patches, and then test them at all 20 facilities when we install them here. Guess what: even with doing that, it doesn't always work. We had a failure just last month on a new patch due to those issues, but we do the install on the midnight shift and bring it back up before the traffic starts getting busy, so if the system flops right away there isn't a lot of impact when we reload the old system and bring it back online...
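For illustration only (the paths, commands and health check here are hypothetical, not the actual FAA procedure), that install-then-fall-back pattern looks roughly like this:

import shutil
import subprocess

ACTIVE = "/opt/nas/active"          # hypothetical paths
CANDIDATE = "/opt/nas/candidate"
BACKUP = "/opt/nas/previous"

def smoke_test() -> bool:
    # Quick health check run against the freshly loaded build.
    result = subprocess.run([f"{ACTIVE}/healthcheck"], timeout=60)
    return result.returncode == 0

def midnight_install() -> None:
    shutil.copytree(ACTIVE, BACKUP, dirs_exist_ok=True)     # keep the old load
    shutil.copytree(CANDIDATE, ACTIVE, dirs_exist_ok=True)  # bring up the patch
    if not smoke_test():
        # The system flopped right away: reload the old build
        # before the morning traffic starts.
        shutil.copytree(BACKUP, ACTIVE, dirs_exist_ok=True)

The point is that the quiet midnight window is what makes the fallback cheap; the same failure at midday would be a very different story.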
As to the complexities of any sort of NAS system replacement, we completely understand that too, and that is why we are now going with the idea of replacing small parts of the NAS one at a time, and then turning them off one at a time. Do this until we get to the radar and data processing, and then replace those. Don't try to do a big bang; there is too much at risk to do it that way, as well as a training nightmare for the workforce. We don't let pilots get into a new aircraft with just a few days of training spread over a couple of months. They go through a LOT of training and are taken off the line, as it were, to immerse themselves in it. Obviously, with our staffing in most of the busy parts of the world, we just can't do this. So do it the smart way: go in baby steps and get the whole thing done over a course of years, so that you have minor training issues that are easy to deal with and there is very little, if any, disruption to the users.
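That 'baby steps' approach is what software people would call an incremental (or 'strangler') migration: route one function at a time to the replacement while everything else stays on the old system. A minimal sketch, with entirely hypothetical component names:

# Which subsystems have been cut over to the replacement so far.
MIGRATED = {"flight_plan_print"}        # hypothetical component names

def old_system(request: str) -> str:
    return f"legacy handled {request}"

def new_system(request: str) -> str:
    return f"replacement handled {request}"

def dispatch(component: str, request: str) -> str:
    # Send each request to old or new code, one component at a time.
    handler = new_system if component in MIGRATED else old_system
    return handler(request)

print(dispatch("flight_plan_print", "BAW123"))   # replacement
print(dispatch("radar_processing", "BAW123"))    # still legacy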
regards
Join Date: Jul 2001
Location: Middle of Nowhere
Posts: 33
Regarding the problems of the last few days, I think today's 'Matt' cartoon on the front of the Telegraph was quite amusing (fingers crossed this works):
http://www.telegraph.co.uk/core/Matt...Matt.telegraph
Wahey it worked!!!!!!!! I can do modern technology. Now where's PAR 2000............
Join Date: Sep 2000
Location: Danger - Deep Excavation
Posts: 341
I work on mainframe airline Res and DCS systems, most recently for a certain carrier which had a large cross on its tail, so I can imagine, with knowing dread, the kind of situation that happened last week.
I've written and tested stuff as well as it can be, loaded it and it's gone wrong. OK, we follow the fallback plan, clean up any mess, re-test and try again. It happens to everyone at some point.
The systems are damn complex but we work equally damn hard to make sure we've thought of everything before going live and we do take it personally when others say things like "outsource IT!", or "don't these programmers/engineers know what they're doing?".
We want to deliver quality all the time, because we know the business and the terrible effects of even the smallest cock-up, but sometimes it's like trying to add another storey to a building between two existing floors. It ain't easy, but that's the existing architecture we're working with!
Back to Friday's snagettes:
The worst kind of problem is when a software change has been loaded and it doesn't go wrong until some hours later. At that stage, the fallback option might not be on the cards. It's fall-forward, but the morning shift may not know exactly what happened the night before, the logs have crashed, or whatever.
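To make that fall-back/fall-forward trade-off concrete (the four-hour window and the probe are invented for the example, not anyone's real procedure), the decision essentially hinges on how long the change survived before failing:

import time

ROLLBACK_WINDOW = 4 * 3600        # hypothetical: fallback viable for 4 hours

def decide(fault_time: float, loaded_at: float) -> str:
    # Pick a recovery strategy based on how long the change survived.
    if fault_time - loaded_at <= ROLLBACK_WINDOW:
        return "fall back to the old load"
    return "too late to fall back: fall forward and fix live"

loaded_at = time.time()
print(decide(loaded_at + 600, loaded_at))        # fails after 10 minutes
print(decide(loaded_at + 6 * 3600, loaded_at))   # fails after 6 hours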
To try and prevent these situations, you need:
1. Decent test systems with real live system data.
2. Investment in automated volume-testing tools (programmers dislike repetitive testing, and anything that automates it is a great benefit; see the sketch after this list).
3. For big changes, get the right people in on the night.
4. Check out as much as you can during the quiet hours at night.
5. Pay them decent compensation. They should stay behind until the morning shift comes in and handover is complete.
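As a rough illustration of point 2 (the harness and message format here are invented, not any real ATC tool), an automated volume test just replays a large recorded sample and checks that nothing fails and the system keeps up:

import time

def process(message: dict) -> bool:
    # Stand-in for the system under test; returns True on success.
    return message.get("callsign") is not None

def volume_test(recorded_messages: list, max_seconds: float) -> bool:
    # Replay a recorded traffic sample; flag failures or slowness.
    start = time.monotonic()
    failures = sum(1 for m in recorded_messages if not process(m))
    elapsed = time.monotonic() - start
    return failures == 0 and elapsed <= max_seconds

sample = [{"callsign": f"TEST{i}"} for i in range(10_000)]
print("PASS" if volume_test(sample, max_seconds=5.0) else "FAIL")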
I don't know the set-up at ATC other than through second-hand sources, so flame me if I'm jumping to conclusions, but it seems like not all of these points were actioned for the change which went wrong on Friday.
I also fear that point 5, paying overtime, was something the management wanted to avoid. Or am I speaking out of turn there?
Join Date: May 1999
Location: Vancouver, BC.
Posts: 748
Would anyone like to hazard a guess at how many more of these 'glitches' we are going to have to cope with?
If we are running a risk that this will occur again in some form, then tell us (the airlines); we'll do what we can to assist, as we did with the changeover. But this failure cost my outfit in excess of £400K, probably more, and resulted in the cancellation of 44 flights, with all the misery that goes with that.
Really folks, we need to do better.
Join Date: Jan 2001
Location: I sell sea shells by the sea shore
Posts: 856
Three things will affect UK ATC for the foreseeable future:
1) Serious lack of validated Controllers and Assistants at Swanwick
2) NAS (at West Drayton) could easily FLOP again, or the link to it could be lost (not usually too serious at LATCC, but even startovers can ruin Swanwick's whole day)
3) Unknown (and known) faults within the highly complex Swanwick software
NONE of the above is likely to be fixed in the short term, and the staffing situation is a LONG-TERM issue. It takes YEARS to train and validate ATCOs; meanwhile, more are retiring, leaving, or on long/short-term sick than are being replaced.
Once again, I am very sorry to be the bearer of bad news, but I'd rather be "open and honest" with you than feed you the claptrap that comes out of One Kemble Street. This ain't spin, it's the truth.
Sadly, yours
BEX
Join Date: May 1999
Location: Vancouver, BC.
Posts: 748
BEXIL
Thank you as ever. I suppose what's hard to swallow is the thought that after three events, albeit apparently unrelated ones, we are likely to face the mayhem of Friday again. I know it's a complex system; however, the fact that we have had three failures really does make one question the integrity of the software, the system, and the management of same.
Join Date: Apr 2001
Location: Near Stalyvegas
Age: 78
Posts: 2,022
Three quotes spring to mind:
1 "Our Skies Are NOT For Sale"
2 "The Buck Stops Here"
3 "Action This Day"
Mr Blur and our "esteemed" CE, Mr Eveready have [obviously] not studied Modern History, or read the 'papers, or listened to the troops, but then again, what else is Chuffin' new?
we aim to please, it keeps the cleaners happy
1 "Our Skies Are NOT For Sale"
2 "The Buck Stops Here"
3 "Action This Day"
Mr Blur and our "esteemed" CE, Mr Eveready have [obviously] not studied Modern History, or read the 'papers, or listened to the troops, but then again, what else is Chuffin' new?
we aim to please,it keeps the cleaners happy
Last edited by chiglet; 19th May 2002 at 20:49.
Join Date: Sep 1998
Location: UK
Posts: 272
Went to the pub on Friday night. A friend of mine who is a secretary was concerned: was it my computer that failed and caused all those delays? Well, sort of... I said. After consoling me with a pint, she told me they had been discussing it in the office when it came on the news.
Why don't you do what we do when the damned machine stops? What's that? I naively ask. Just type CTRL + ALT + DELETE, it works every time!!!
DOH! We pay £millions for sophisticated software companies to design this complex beast, and my mate tells me the answer down the pub.
Where's that CTRL button and I'll tell Cheese and Ham .......
Join Date: Jan 2002
Location: USA
Posts: 394
Hope your Swanwick is not the same as the Voice Switching and Control System in the States (the primary ARTCC voice comm switch): when they control-alt-delete that, it takes a couple of hours to do a complete cold reboot. Cans and string in the meantime, and a couple of BIG megaphones.