All London airspace closed
Join Date: Jan 2008
Location: Bracknell, Berks, UK
Age: 52
Posts: 1,133
Thank you, IanW. I suspected that was the case. Now, if the plonkers above you could stop wittering about Network Attached Storage (and getting most of it wrong; this means you, Eengr), the thread can get back on topic.
NAS Host. That is the National Airspace System - Host Computer.
There are two mentions of RAID. One states that RAID was not used (mirrored disks were used instead), with no further explanation. The other states that
"File corruption occurred on the primary server which then transferred to the hot standby as they were linked via RAID*."
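For anyone who wants the mirroring point spelled out, here is a rough sketch of why a byte-for-byte mirror copies corruption to the standby just as faithfully as it copies good data. It is purely illustrative: the `Server` class, the file name, and the checksum manifest are all made up for the example, and nothing here is taken from the report or from how NATS actually works.

```python
import hashlib


def checksum(data):
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


class Server:
    """Hypothetical server holding named files in memory."""

    def __init__(self, name):
        self.name = name
        self.files = {}


def mirror(primary, standby):
    """Naive mirroring: copy every file as-is, good or bad."""
    standby.files = dict(primary.files)


def mirror_validated(primary, standby, known_good):
    """Copy only files whose checksum matches a known-good manifest.

    Returns the names of files that were rejected.
    """
    rejected = []
    for name, data in primary.files.items():
        if checksum(data) == known_good.get(name):
            standby.files[name] = data
        else:
            rejected.append(name)
    return rejected


good = b"sector config v1"
manifest = {"startup.cfg": checksum(good)}  # assumed known-good manifest

primary = Server("primary")
primary.files["startup.cfg"] = good

naive_standby = Server("naive standby")
checked_standby = Server("checked standby")
mirror(primary, naive_standby)
mirror_validated(primary, checked_standby, manifest)

# Simulate corruption on the primary, then replicate again.
primary.files["startup.cfg"] = b"sector c\x00\x00fig v1"
mirror(primary, naive_standby)
rejected = mirror_validated(primary, checked_standby, manifest)

naive_is_corrupt = naive_standby.files["startup.cfg"] != good
checked_is_intact = checked_standby.files["startup.cfg"] == good
```

The naive standby ends up with the corrupt file, while the validated one keeps its last good copy and reports the rejection. Whether anything like that check existed in the real system is not something the quoted text tells us.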
Join Date: Jan 2008
Location: Bracknell, Berks, UK
Age: 52
Posts: 1,133
It is therefore a badly-written report, and you therefore will not empirically glean any information as to the inner workings of NATS as a result.
FYI, anyone referring to 'NAS' in any form related to storage would not be let near the quoting process for such a mission-critical service. 'NAS' stops being useful beyond the home-user/small-business arena, hence my points above. SAN, on the other hand, is more likely, but still, given the vintage, a distributed system based on shared fabric is not very likely.
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
the top brass won't be back until around 5th January.............
Guest
Posts: n/a
I am a late arrival to this discussion, but a little clarity should be brought to the table.
NAS is virtually irrelevant at the core of mission critical systems. What we probably should be discussing is real time, Multi-Host File Sharing, mass storage replication, SAN Disks, and high speed cache, to achieve a resilient transaction based platform.
A true multi-host live environment, with a directly connected hot-standby system "in the same room", shared files, full on/off-site backup, and release-managed test and development environments, would go some way to delivering the level of resiliency needed for this Application.
Current thinking across many systems integrators and suppliers suggests that clients are more concerned with the cost of everything and the value of nothing, and this means that comprehensive hardware and software fault recovery, error checking, correction and reporting, and resiliency slip quietly out the window.
It is therefore completely unacceptable that start-up files are exercised for the first time on a live platform. These are usually scripts which are run to pre-configure systems prior to release of a new version from dev to test, or test to live. In my opinion this is a "find an alternative career" error.
Perhaps a little of our Oriental Friends' technology ethos could do with being imported, for example: "No single hardware or software system failure is acceptable WITHIN THE SERVICE LIFE OF THE PRODUCT". (What manufacturer or integrator can or would offer that to NATS today, and would they pay the price?) How many full system down events are contractually acceptable to NATS?
I don't know and very few do.
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
Independent Enquiry Terms of Reference have been published at
Enquiry Terms of Reference | Regulatory Policy | About the CAA
John McDermid seems useful, coming from the High Integrity Systems Engineering Group at York University
Professor John A. McDermid, HISE Research Group, Department of Computer Science, The University of York
Is Mike Toms the ex Planning Manager from BAA?
....Overall the Enquiry will address:
- The root causes of the incident on 12 December 2014 affecting the Area Control Operations Room, including the measures that had been put in place to prepare for routine changes to systems that occurred on the 11 December 2014 and for support to the military task that was re-locating onto the AC system.
- NATS’ handling of the incident to minimise disruption without compromising safety, including the measures to suppress and re-generate traffic and associated communications with airlines, airports and other stakeholders.
- Whether the lessons identified in the review of the disruption in December 2013 have been fully embedded and were effective during this incident.
- Levels of future resilience and service delivery that should be expected across the en route air traffic network taking into account relevant aviation benchmarks and costs.
- Further measures to avoid or reduce the impact of technology or process failures in the future (either by NATS or within the wider industry).
- Recommendations on how NATS can improve its response to any future service disruption caused by a system failure.
In order to fulfil its objectives the scope of the Enquiry will focus on:
- NATS’ ability to maintain a safe operation during periods of operational contingency caused by failures of its systems and how this is balanced against the disruption to normal operations.
- The functioning of the NERL operation and the interdependencies of the systems that support it including communication, surveillance and flight data and their failure modes, contingencies and operational workarounds.
- The preparation and testing of planned changes to systems and procedures linked to regular Aeronautical Information Publication updates or in association with other infrastructure changes.
- The effectiveness of NATS’ incident communications process triggered during the event both in terms of NATS’ customers (principally airlines and airports), other ATM agencies including the ATM Network Manager, the regulator, and the government.
- The linkage to previous operational failures, their handling and the lessons that have been learned from them.
- How NATS’ investment and efficiency plans have previously, and will in future, contribute to operational resilience and the speed of restoring normal working. In particular would an earlier than currently planned introduction of new technology improve resilience and be operationally feasible.
- The effectiveness of the CAA oversight arrangements that are in place and under consideration for normal operations, changes to operations and incident/contingency arrangements.
The Enquiry is jointly sponsored by and will report to the two chairs of CAA and NATS.
Enquiry Panel Members
The Enquiry panel will consist of the following members:
- Sir Robert Walmsley KCB (Chair)
- Sir Timothy Anderson KCB DSO
- Clayton Brendish CBE
- Prof. John McDermid OBE
- Mike Toms
- Joe Sultana (Director Network Management, Eurocontrol)
- Mark Swan (Group Director Safety and Airspace Regulation, CAA)
- Martin Rolfe (Managing Director Operations, NATS).
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
The interim report should have been completed by now :
Enquiry Process
The Enquiry will be conducted on the following basis:
The Enquiry will produce a written report that will be made public.
The Enquiry will start on 13th January 2015 and is expected to deliver its report no later than 14th May 2015.
The Enquiry will provide an interim report by 31st January 2015 focused on the NATS internal investigation of the 12th December 2014 incident
Join Date: May 2000
Location: On top of the world
Age: 73
Posts: 116
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
I wonder where the erroneous number 151 came from!
Or was that an originally valid maximum system capacity that was not updated when other modifications to expand the system were carried out?
And living with the high frequency of pressing the wrong button seems peculiar.
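To illustrate the stale-limit theory, here is a minimal sketch of how a hard-coded capacity that was never raised could take out redundant channels together, because both run the same check against the same input. Only the figure 151 comes from the discussion above; what it counted, the `Channel` class, and the failover logic are assumptions made up for the example.

```python
MAX_ITEMS = 151  # figure quoted above; what it actually counted is an assumption


class CapacityError(Exception):
    """Raised when a channel is asked to load more items than its limit."""


class Channel:
    """Hypothetical processing channel with a hard-coded capacity."""

    def __init__(self, name, limit=MAX_ITEMS):
        self.name = name
        self.limit = limit
        self.up = True

    def load(self, items):
        if len(items) > self.limit:
            self.up = False
            raise CapacityError(f"{self.name}: {len(items)} > {self.limit}")
        return len(items)


def process(channels, items):
    """Try each channel in turn; return the first that copes, else None."""
    for channel in channels:
        try:
            channel.load(items)
            return channel.name
        except CapacityError:
            continue
    return None


# At the limit the primary copes and the standby is never needed.
within_limit = process([Channel("A"), Channel("B")], list(range(151)))

# One item over: the standby applies the identical check and fails too.
pair = [Channel("A"), Channel("B")]
over_limit = process(pair, list(range(152)))
both_down = not any(channel.up for channel in pair)
```

The point of the sketch is that a standby only helps against faults the two channels do not share. If both carry the same limit and are fed the same data, failing over just repeats the failure.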
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
The Register has
UK air traffic mega cockup: BOTH server channels failed - report • The Register
UK air traffic mega cockup: BOTH server channels failed - report
'First time ever in server's history' says independent panel
The IT cockup at the National Air Traffic Services (NATS) that grounded hundreds of flights in December occurred because both of its System Flight Server (SFS) channels went down, an independent report has revealed.
"The disruption on 12 December 2014 arose because – for the first time in the history of the SFS – both channels failed at the same time," said the NATS System Failure 12 December 2014 – Interim Report.
The cockup resulted in 120 flights being cancelled and 500 flights being delayed for 45 minutes, and affected 10,000 passengers in total.......
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
I see that the Telegraph reports that the departure of Richard Deakin from NATS is completely unrelated to the Walmsley Inquiry report :
Air traffic control boss stands down - Telegraph
Mr Deakin came under pressure to resign before Christmas when a computer malfunction on a single day led to the cancellation and delay of hundreds of flights.
The findings of an independent inquiry into the debacle were given to the chairmen of both NATS and the Civil Aviation Authority last Wednesday.
A spokesman for the air traffic service insisted that the resignation of Mr Deakin, whose £1m pay package last year also drew ire, was not linked to the inquiry.
The final report has not yet been made public.
Join Date: Mar 2008
Location: London
Age: 69
Posts: 148
The Times apologises :
Corrections and clarifications: May 23, 2015 | The Times
Corrections and clarifications : May 23, 2015
We wrongly reported (Business, May 19) that Richard Deakin, who is standing down as head of National Air Traffic Services, had been dismissed. We apologise for the error.