I work on mainframe airline Res and DCS systems, most recently for a certain carrier which had a large cross on its tail, so I can imagine, with knowing dread, the kind of situation that happened last week.
I've written and tested stuff as well as it can be, loaded it and it's gone wrong. OK, we follow the fallback plan, clean up any mess, re-test and try again. It happens to everyone at some point.
The systems are damn complex, but we work equally damn hard to make sure we've thought of everything before going live, and we do take it personally when others say things like "outsource IT!" or "don't these programmers/engineers know what they're doing?".
We want to deliver quality all the time, because we know the business and the terrible effects of even the smallest cock-up, but sometimes it's like trying to add another storey to a building between two existing floors. It ain't easy, but that's the existing architecture we're working with!
Back to Friday's snagettes:
The worst kind of problem is when a software change has been loaded and it doesn't go wrong till some hours later. At that stage, the fallback option might not be on the cards. It's fall forward, but the morning shift may not know exactly what happened the night before, the logs may have crashed, or whatever.
To try and prevent these situations, you need:
1 Decent test systems with real live system data.
2 Investment in automated volume testing tools (programmers dislike repetitive testing and anything that automates it is a great benefit).
3 For big changes, get the right people in on the night.
4 Check out as much as you can during the quiet hours at night.
5 Pay the people you bring in decent compensation. They should stay behind till the morning shift comes in and handover is complete.
I don't know the set-up at ATC other than through second-hand sources, so flame me if I'm jumping to conclusions, but it seems like not all of these points were actioned for the change which went wrong on Friday.
I also fear that Point 5 - Paying Overtime - was something the management wanted to avoid - or am I speaking out of turn there?