
All London airspace closed

ATC Issues A place where pilots may enter the 'lions den' that is Air Traffic Control in complete safety and find out the answers to all those obscure topics which you always wanted to know the answer to but were afraid to ask.

Old 28th Dec 2014, 19:47
  #161 (permalink)  
 
Join Date: Jan 2008
Location: Bracknell, Berks, UK
Age: 52
Posts: 1,133
Likes: 0
Received 0 Likes on 0 Posts
Thank you, IanW. I suspected that was the case. Now, if the plonkers above you could stop wittering about Network Attached Storage (and getting most of it wrong; this means you, EEngr), then you can get back on topic.
Mike-Bracknell is offline  
Old 29th Dec 2014, 16:46
  #162 (permalink)  
 
Join Date: Jan 2011
Location: Seattle
Posts: 716
Likes: 0
Received 3 Likes on 2 Posts
NAS Host. That is the National Airspace System - Host Computer.
Sorry about the acronym name space collision. Henceforth, I shall stick with the term "Network Storage". However, I stand by my claim that there appears to be some confusion in the NATS report appendix (almost another collision there).

There are two mentions of RAID. One states that RAID was not used (mirrored disks were used instead), with no further explanation. The other states that
"File corruption occurred on the primary server which then transferred to the hot standby as they were linked via RAID*."
But file sharing between servers is NOT a feature of RAID. It is a feature of Network Storage. This leads me to believe that the information given in the NATS incident report may not be consistent. Sorry to be pedantic, but it is this level of detail that will indicate whether the NATS systems are properly designed or not.
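To illustrate the distinction, here is a toy sketch of what a disk mirror does. It is not a model of the NATS servers; the class and method names are made up for the example. It only shows that a mirror copies every write to both disks, good or bad, so it protects against a disk dying but does nothing about corrupt data being written in the first place.

```python
class MirroredVolume:
    """Toy block-level mirror (hypothetical): every write goes to both disks."""

    def __init__(self):
        self.disk_a = {}
        self.disk_b = {}

    def write(self, block, data):
        # The mirror has no idea whether the data is valid; it copies it anyway.
        self.disk_a[block] = data
        self.disk_b[block] = data

    def fail_disk_a(self):
        # Simulate a hardware failure of the first disk.
        self.disk_a = None

    def read(self, block):
        # Fall back to the surviving disk if the first one has failed.
        disk = self.disk_a if self.disk_a is not None else self.disk_b
        return disk[block]


vol = MirroredVolume()
vol.write(0, "good data")
vol.write(1, "CORRUPT")   # a bad write is mirrored just as faithfully
vol.fail_disk_a()
print(vol.read(0))        # the hardware failure is survived
print(vol.read(1))        # the corruption is survived too, unfortunately
```

Note that everything here happens inside one volume. Nothing in it shares files between two separate servers, which is the point being made above.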
EEngr is offline  
Old 29th Dec 2014, 21:05
  #163 (permalink)  
 
Join Date: Jan 2008
Location: Bracknell, Berks, UK
Age: 52
Posts: 1,133
Likes: 0
Received 0 Likes on 0 Posts
It is therefore a badly-written report, and you therefore will not empirically glean any information as to the inner workings of NATS as a result.

FYI, anyone referring to 'NAS' in any form related to storage would not be let near the quoting process for such a mission-critical service. 'NAS' stops being useful beyond the home-user/small-business arena, hence my points above. A SAN, on the other hand, is more likely, but even so, given the vintage, a distributed system based on shared fabric is not very likely.
Mike-Bracknell is offline  
Old 12th Jan 2015, 20:01
  #164 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
the top brass won't be back until around 5th January.............
Have they been back long enough yet to find an independent Inquiry Chairman we can have confidence in ?
118.70 is offline  
Old 13th Jan 2015, 14:54
  #165 (permalink)  
ImageGear
Guest
 
Posts: n/a

I am a late arrival to this discussion, but a little clarity should be brought to the table.

NAS is virtually irrelevant at the core of mission critical systems. What we probably should be discussing is real time, Multi-Host File Sharing, mass storage replication, SAN Disks, and high speed cache, to achieve a resilient transaction based platform.

A true multi-host live environment, with a directly connected hot-standby system "in the same room", shared files, full on/off-site backup, and release-managed test and development environments, would go some way to delivering the level of resiliency needed for this Application.

Current thinking across many systems integrators and suppliers suggests that clients are more concerned with the cost of everything and the value of nothing, and this means that comprehensive hardware and software fault recovery, error checking, correction and reporting, and resiliency slip quietly out the window.

It is therefore completely unacceptable that start-up files are exercised for the first time on a live platform. These are usually scripts which are run to pre-configure systems prior to release of a new version from dev to test, or test to live. In my opinion this is a "find an alternative career" error.

Perhaps a little of our Oriental friends' technology ethos could do with being imported, for example: "No single hardware or software system failure is acceptable WITHIN THE SERVICE LIFE OF THE PRODUCT". (What manufacturer or integrator can or would offer that to NATS today, and would they pay the price?) How many full system down events are contractually acceptable to NATS?

I don't know and very few do.
 
Old 16th Jan 2015, 18:23
  #166 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
Independent Enquiry Terms of Reference have been published at

Enquiry Terms of Reference | Regulatory Policy | About the CAA

....Overall the Enquiry will address:
  1. The root causes of the incident on 12 December 2014 affecting the Area Control Operations Room, including the measures that had been put in place to prepare for routine changes to systems that occurred on the 11 December 2014 and for support to the military task that was re-locating onto the AC system.
  2. NATS’ handling of the incident to minimise disruption without compromising safety, including the measures to suppress and re-generate traffic and associated communications with airlines, airports and other stakeholders.
  3. Whether the lessons identified in the review of the disruption in December 2013 have been fully embedded and were effective during this incident.
  4. Levels of future resilience and service delivery that should be expected across the en route air traffic network taking into account relevant aviation benchmarks and costs.
  5. Further measures to avoid or reduce the impact of technology or process failures in the future (either by NATS or within the wider industry).
  6. Recommendations on how NATS can improve its response to any future service disruption caused by a system failure.
Scope

In order to fulfil its objectives the scope of the Enquiry will focus on:
  1. NATS’ ability to maintain a safe operation during periods of operational contingency caused by failures of its systems and how this is balanced against the disruption to normal operations.
  2. The functioning of the NERL operation and the interdependencies of the systems that support it including communication, surveillance and flight data and their failure modes, contingencies and operational workarounds.
  3. The preparation and testing of planned changes to systems and procedures linked to regular Aeronautical Information Publication updates or in association with other infrastructure changes.
  4. The effectiveness of NATS’ incident communications process triggered during the event both in terms of NATS’ customers (principally airlines and airports), other ATM agencies including the ATM Network Manager, the regulator, and the government.
  5. The linkage to previous operational failures, their handling and the lessons that have been learned from them.
  6. How NATS’ investment and efficiency plans have previously, and will in future, contribute to operational resilience and the speed of restoring normal working. In particular would an earlier than currently planned introduction of new technology improve resilience and be operationally feasible.
  7. The effectiveness of the CAA oversight arrangements that are in place and under consideration for normal operations, changes to operations and incident/contingency arrangements.
Accountability

The Enquiry is jointly sponsored by and will report to the two chairs of CAA and NATS.
Enquiry Panel Members

The Enquiry panel will consist of the following members:
  • Sir Robert Walmsley KCB (Chair)
  • Sir Timothy Anderson KCB DSO
  • Clayton Brendish CBE
  • Prof. John McDermid OBE
  • Mike Toms
  • Joe Sultana (Director Network Management, Eurocontrol)
  • Mark Swan (Group Director Safety and Airspace Regulation, CAA)
  • Martin Rolfe (Managing Director Operations, NATS).
John McDermid seems useful, coming from the High Integrity Systems Engineering Group at York University

Professor John A. McDermid, HISE Research Group, Department of Computer Science, The University of York

Is Mike Toms the ex Planning Manager from BAA ?
118.70 is offline  
Old 1st Feb 2015, 08:51
  #167 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
The interim report should have been completed by now :

Enquiry Process

The Enquiry will be conducted on the following basis:

The Enquiry will produce a written report that will be made public.
The Enquiry will start on 13th January 2015 and is expected to deliver its report no later than 14th May 2015.
The Enquiry will provide an interim report by 31st January 2015 focused on the NATS internal investigation of the 12th December 2014 incident
118.70 is offline  
Old 6th Feb 2015, 06:17
  #168 (permalink)  
 
Join Date: Jan 2009
Location: HANTS
Posts: 193
Likes: 0
Received 0 Likes on 0 Posts
It's out. Published on the NATS Intranet. Not aware if it's more widely available yet.
GAPSTER is offline  
Old 6th Feb 2015, 11:50
  #169 (permalink)  
 
Join Date: May 2000
Location: On top of the world
Age: 73
Posts: 116
Likes: 0
Received 0 Likes on 0 Posts
It's here on the CAA website :

http://www.caa.co.uk/docs/2942/v3%20...ber%202014.pdf
off watch is offline  
Old 6th Feb 2015, 22:00
  #170 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
I wonder where the erroneous number 151 came from !

Or was that an original valid maximum system capacity that was not changed when other system modifications to expand were carried out ?

And living with the high frequency of pressing the wrong button seems peculiar.
118.70 is offline  
Old 9th Feb 2015, 16:15
  #171 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
The Register has

UK air traffic mega cockup: BOTH server channels failed - report • The Register

UK air traffic mega cockup: BOTH server channels failed - report
'First time ever in server's history' says independent panel

The IT cockup at the National Air Traffic Services (NATS) that grounded hundreds of flights in December occurred because both of its System Flight Server (SFS) channels went down, an independent report has revealed.

"The disruption on 12 December 2014 arose because – for the first time in the history of the SFS – both channels failed at the same time," said the NATS System Failure 12 December 2014 – Interim Report.

The cockup resulted in 120 flights being cancelled and 500 flights being delayed for 45 minutes, and affected 10,000 passengers in total.......
118.70 is offline  
Old 13th Feb 2015, 08:20
  #172 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
And in "Computing" :

Twenty-year-old 'latent defect' in software caused December air-traffic control shutdown - 12 Feb 2015 - Computing News
118.70 is offline  
Old 19th Feb 2015, 17:15
  #173 (permalink)  

More than just an ATCO
 
Join Date: Jul 1999
Location: Up someone's nose
Age: 75
Posts: 1,768
Likes: 0
Received 0 Likes on 0 Posts
Although not relevant to this failure, I would have thought a triplicated, not duplicated, system would have been in place.
Lon More is offline  
Old 20th Feb 2015, 23:05
  #174 (permalink)  
 
Join Date: Dec 2014
Location: Up a tree
Posts: 7
Likes: 0
Received 0 Likes on 0 Posts
I think a quadrupled system would be better. Can't be too safe....
I think every system should have 10 back ups.
They all would have failed but at least they were there.
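A rough sketch of why that is, assuming (purely for illustration) that every channel runs the same software with the same hard-coded limit. The figure 151 is borrowed from post #170; what it actually counted is not stated in this thread, and the class and function names are made up.

```python
HYPOTHETICAL_LIMIT = 151  # number borrowed from post #170; its real meaning is an assumption


class Channel:
    """One copy of the server software. Every copy carries the same latent defect."""

    def __init__(self, name):
        self.name = name
        self.up = True

    def handle(self, active_items):
        # Identical code in every channel, so the identical input trips every one.
        if active_items > HYPOTHETICAL_LIMIT:
            self.up = False
        return self.up


def service_survives(n_channels, active_items):
    """Feed the same input to all channels; the service survives if any stays up."""
    channels = [Channel(f"ch{i}") for i in range(n_channels)]
    results = [ch.handle(active_items) for ch in channels]
    return any(results)


print(service_survives(2, 150))   # within the limit: True
print(service_survives(2, 152))   # over the limit: False, both channels go
print(service_survives(10, 152))  # ten back-ups, same defect: still False
```

Adding identical copies helps against independent hardware faults. It does not help when the fault is in the logic that all the copies share.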
Wham Bam is offline  
Old 11th May 2015, 07:19
  #175 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
The Enquiry will start on 13th January 2015 and is expected to deliver its report no later than 14th May 2015.
The report should be with us soon !
118.70 is offline  
Old 19th May 2015, 20:30
  #176 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
I see that the Telegraph reports that the departure of Richard Deakin from NATS is completely unrelated to the Walmsley Inquiry report :

Air traffic control boss stands down - Telegraph

Mr Deakin came under pressure to resign before Christmas when a computer malfunction on a single day led to the cancellation and delay of hundreds of flights.

The findings of an independent inquiry into the debacle were given to the chairmen of both NATS and the Civil Aviation Authority last Wednesday.

A spokesman for the air traffic service insisted that the resignation of Mr Deakin, whose £1m pay package last year also drew ire, was not linked to the inquiry.
The final report has not yet been made public.
118.70 is offline  
Old 19th May 2015, 21:54
  #177 (permalink)  
 
Join Date: Jul 2004
Location: On the wireless...
Posts: 1,901
Likes: 0
Received 0 Likes on 0 Posts
More time for his aircraft spotting...
Talkdownman is offline  
Old 20th May 2015, 15:17
  #178 (permalink)  
 
Join Date: Jan 2009
Location: HANTS
Posts: 193
Likes: 0
Received 0 Likes on 0 Posts
But not at Birmingham or Gatwick presumably
GAPSTER is offline  
Old 20th May 2015, 15:30
  #179 (permalink)  
 
Join Date: Jul 2004
Location: On the wireless...
Posts: 1,901
Likes: 0
Received 0 Likes on 0 Posts
Maybe that's the underlining problem…he's copped it in more ways than one...
Talkdownman is offline  
Old 23rd May 2015, 07:36
  #180 (permalink)  
 
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Likes: 0
Received 0 Likes on 0 Posts
The Times apologises :

Corrections and clarifications : May 23, 2015

We wrongly reported (Business, May 19) that Richard Deakin, who is standing down as head of National Air Traffic Services, had been dismissed. We apologise for the error.
Corrections and clarifications: May 23, 2015 | The Times
118.70 is offline  

