Dreamliner in emergency landing at Dublin Airport

26th Oct 2015, 13:20

#41 (permalink)

oldoberon

Join Date: Mar 2014

Location: wales

Age: 81

Posts: 316

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by beardy

IF it's a software problem AND it's the same software on both engines I would have expected that to impinge on ETOPs certification since the risk of the second engine doing the same thing is higher than if it were a mechanical (as opposed to design) fault. ETOPs is defined by acceptable risk of the other engine failing within a set time period and whilst demonstrated failure rate is a very good metric it should not, IMHO, be considered in isolation.

When put that way it is so logical; I do hope the certifying bodies used the same logic. It's a long time since I did a long flight over water, and I always preferred 4 engines to 2. Having read your post, that preference will remain in place.

26th Oct 2015, 13:56

#42 (permalink)

Ian W

Join Date: Dec 2006

Location: Florida and wherever my laptop is

Posts: 1,350

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by peekay4

Those pilots were effectively performing verification, not validation. They were testing whether or not their aircraft performed to specs, not whether the specs were correct.

NASA did many studies over the decades and surprisingly (?) found that it is actually impossible to find all safety-critical software bugs by testing!

That's because as complexity increases, the time required to test all possible conditions rises exponentially. Completely and exhaustively testing an entire suite of avionics software could literally take thousands of years.

Therefore, instead of full exhaustive testing, we selectively test what we determine to be the most important conditions to test. Metrics are gathered and analysis is performed to provide the required test coverage, check boundary conditions, ensure that there are no regressions, etc.

However, one can't prove that a piece of software is "bug free" this way, because not all possible conditions are tested.

Today as an alternative, the most critical pieces of software are verified using formal methods (i.e., using mathematical proofs) to augment -- or entirely replace -- functional testing. Unlike testing, formal methods can prove design/implementation correctness to specifications. Unfortunately, formal methods verification is a very costly process and thus is not used for the vast majority (>99.9%) of code.

The rest of the code relies on fault-tolerance. Instead of attempting to write "zero bug" software, safety is "assured" by having multiple independent modules voting for an outcome, and/or having many defensive layers so failure of one piece of code doesn't compromise the safety of the entire system (swiss-cheese model applied to software).

This "fault-tolerance" approach isn't perfect but provides an "acceptable" level of risk.
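As a rough illustration of the exponential growth described in the quoted post, here is a minimal sketch. The function name, the number of inputs, the values per input, and the test rate are all made-up assumptions for the arithmetic, not figures from any real avionics suite.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_test_years(num_inputs: int, values_per_input: int,
                          tests_per_second: float) -> float:
    """Years needed to try every combination of inputs once.

    Hypothetical model: each input takes one of `values_per_input`
    values independently, so combinations = values_per_input ** num_inputs.
    """
    combinations = values_per_input ** num_inputs
    return combinations / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumed figures: 100 values per input, one million tests per second.
    for n in (4, 8, 12, 16):
        years = exhaustive_test_years(n, 100, 1_000_000)
        print(f"{n:2d} inputs: about {years:.3g} years")
```

Each extra input multiplies the total by the number of values it can take, which is why selective testing is used instead.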

Exhaustive testing: Is when either the tester or the funds are exhausted, it has no bearing on the number of bugs yet to be found.

Mathematical proof of software is an example of the 'streetlight effect': more and more effort is expended looking for bugs in an area where they are simple to find but very unlikely (the code that can be mathematically checked) rather than where they most often are, which is in system design. However, it makes some companies a lot of money and delays and even prevents implementation of modern hardware and software.

Fault tolerance by voting triplex is fine until there is a three way disagreement and/or the voting software makes a mistake and shuts down the process whose software is correct and follows the output of the two other processes whose software is incorrect. This happens surprisingly often.
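A minimal sketch of the triplex voting failure modes described above. The `majority_vote` function is hypothetical and compares integer outputs exactly; real voters typically compare within tolerances and are considerably more involved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(outputs: Sequence[int]) -> Optional[int]:
    """Return the value a strict majority of channels agree on, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None


if __name__ == "__main__":
    # Normal case: one faulty channel is outvoted.
    print(majority_vote([10, 10, 11]))
    # Three-way disagreement: the voter has no answer at all.
    print(majority_vote([10, 11, 12]))
    # Suppose 10 is the correct value: two channels sharing the same
    # bug outvote the one channel that is right.
    print(majority_vote([11, 11, 10]))
```

The voter only measures agreement, not correctness, so channels sharing a common-mode software error win the vote.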

26th Oct 2015, 14:01

#43 (permalink)

Ian W

Join Date: Dec 2006

Location: Florida and wherever my laptop is

Posts: 1,350

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by oldoberon

Unfortunately, if all four have the same software version then all four could in theory crash and such faults do happen even on fully tested systems. Such as the F-22 Squadron Shot Down by the International Date Line

26th Oct 2015, 15:38

#44 (permalink)

oldoberon

Join Date: Mar 2014

Location: wales

Age: 81

Posts: 316

Likes: 0

Received 0 Likes on 0 Posts

Yes, most of them have been around longer, so in theory the SW is more proven (I hope).

Your link: wow, close one!!

26th Oct 2015, 16:10

#45 (permalink)

peekay4

Join Date: Sep 2014

Location: Canada

Posts: 1,257

Likes: 0

Received 0 Likes on 0 Posts

There is an overarching software design & architecture requirement that any "catastrophic failure" -- a failure resulting in the loss of the airplane and deaths of its occupants -- must be "extremely improbable".

For FAR 25 aircraft, "extremely improbable" is defined as a failure rate of no more than 1 per billion flight hours (1E-9), established by a quantitative safety assessment.

However, as we found out with the Challenger shuttle disaster, this kind of quantitative assessment can be a bit pie in the sky. Still, critical software does tend to be extremely reliable. Just remember to reboot from time to time........

26th Oct 2015, 16:20

#46 (permalink)

lomapaseo

Join Date: Mar 2002

Location: Florida

Posts: 4,569

Likes: 0

Received 1 Like on 1 Post

Quote:

For FAR 25 aircraft, "extremely improbable" is defined as a failure rate of no more than 1 per billion flight hours (1E-9), established by a quantitative safety assessment.

In General

To put this into perspective, catastrophic failures (Part 25) from all causes occur at a rate 100 times higher (1E-7).

I'm a lot less worried about the system causing the crash than I am about the pilot's contribution.
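To make the two rates concrete, here is a small sketch of the arithmetic. The fleet figure of 50 million flight hours per year is a made-up assumption for illustration only, and treating events as a Poisson process is likewise an assumption.

```python
import math


def expected_events(rate_per_flight_hour: float, flight_hours: float) -> float:
    """Expected number of events at a constant rate over the given hours."""
    return rate_per_flight_hour * flight_hours


def prob_at_least_one(rate_per_flight_hour: float, flight_hours: float) -> float:
    """Chance of one or more events, assuming a Poisson process."""
    return 1.0 - math.exp(-expected_events(rate_per_flight_hour, flight_hours))


if __name__ == "__main__":
    fleet_hours = 50e6  # hypothetical fleet-wide flight hours in a year
    for label, rate in (("1E-9 target", 1e-9), ("1E-7 all causes", 1e-7)):
        print(label,
              expected_events(rate, fleet_hours),
              prob_at_least_one(rate, fleet_hours))
```

Whatever fleet size is assumed, the ratio between the two expected counts stays at 100.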

26th Oct 2015, 16:32

#47 (permalink)

Liffy 1M

Join Date: Feb 2004

Location: Dublin, Ireland

Posts: 495

Likes: 7

Received 5 Likes on 4 Posts

Quote:

There is little logic to the way things happen in DUB. Whilst the rest of the world is pretty much standard-ICAO Dublin carries on business in its own little bubble.

Most of the issues you raise about taxiway nomenclature and layout and also where the Ethiopian 787 was directed to park can hardly be laid at the door of ATC. Those are all matters for the airport authority. Amongst the recommendations of a recent AAIU report into a ground collision between two 737s at Dublin was that:

"The Dublin Airport Authority (DAA) conduct a critical review of the taxiway system at Dublin Airport, to ensure that taxiway routes are as simple as possible in order to avoid pilot confusion and the need for complicated instructions."

The report also states that Dublin Airport accepts the recommendation and will undertake a critical review of the taxiway system to ensure that taxiway routes are as simple as possible.

26th Oct 2015, 18:37

#48 (permalink)

tatelyle

Join Date: Sep 2015

Location: Nice

Posts: 19

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Exhaustive testing: Is when either the tester or the funds are exhausted, it has no bearing on the number of bugs yet to be found.

Nothing new. The problem of bugs in systems has always been with us in design, it is just that computers probably give more opportunities for error.

Ask the captain of the BA flight that lost two donkeys on short finals into LHR. That bug in the fuel system had managed to hide itself for several years, before raising its ugly head.

26th Oct 2015, 23:31

#49 (permalink)

Infieldg

Join Date: Jan 2015

Location: Delete me

Age: 58

Posts: 28

Likes: 0

Received 0 Likes on 0 Posts

I've been a software developer for 26 years and 100% bug free software can simply mean you and whoever tested the software both misinterpreted the spec in the same way, OR the analyst misinterpreted the requirements and you coded their mistake perfectly and the tester agreed. This is (literally, not kidding) why I genuinely fear passengering on an Airbus. Nothing can ever replace you guys and we shouldn't be trying.

27th Oct 2015, 15:36

#50 (permalink)

AR1

Join Date: May 2007

Location: Nottinghamshire

Age: 63

Posts: 710

Likes: 2

Received 4 Likes on 1 Post

You really need to get out more.

Before SW failure was mechanical failure. And that's not gone away either. Despite the way software can never be 100% bug free (not my assertion) you fly in an era of unprecedented safety in air travel.

Unfortunately those same technical advances also give us the ability to spout tripe in an unprecedented way. And that scares me.

27th Oct 2015, 16:41

#51 (permalink)

MG23

Join Date: Jun 2009

Location: Canada

Posts: 464

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by AR1

Before SW failure was mechanical failure. And that's not gone away either.

But you can usually detect mechanical problems early: for example, wear or cracks in metal parts. Software may work perfectly for ten years, then finally hit the rare bug that causes it to fail for no apparent reason. Worse, every instance of that software may fail at the same time all over the world (e.g. the various leap second bugs). Gears don't even know about leap seconds.

The other issue is that a third party can examine all the mechanical parts for cracks, and tell you there's a problem. A third party usually can't examine the software that runs those parts, because it's closed source. They can only test it as a black box.

There's a truly scary document online from one of the software guys who was given access to Toyota's software as an expert witness for the 'unintended acceleration' trials. Some of the things in there are quite mind-boggling, but no-one knew about them because they had no access to the software.

Software has definitely made many things far more reliable. But it's also replaced many predictable failures with unpredictable ones.

27th Oct 2015, 18:07

#52 (permalink)

beardy

Join Date: Nov 2000

Location: UK

Age: 69

Posts: 1,405

Likes: 26

Received 40 Likes on 22 Posts

Who judges the acceptable risk of software bugs?

27th Oct 2015, 18:24

#53 (permalink)

GlobalNav

Join Date: Aug 2013

Location: Washington.

Age: 74

Posts: 1,077

Likes: 278

Received 151 Likes on 53 Posts

Software Failure?

I doubt there is any such thing. Software just is whatever has been coded. Unless the memory media on which it is stored fails somehow, the software remains intact, just as coded and compiled.

Software error? Yes, and as so many have explained, testing to find every software error (AKA bug) is fairly impractical in the large bodies of complex code used in avionics, engines and such. So rather than completely exhaustive testing, though some testing indeed is done, there is required to be a disciplined software development process, the rigor of which is driven by the safety effects that a function affected by software error might be considered to have.

Highly critical functions with potentially catastrophic effects from software errors must have a "design assurance level" of A, which of course is the highest and most expensive development process.

27th Oct 2015, 20:26

#54 (permalink)

llondel

Join Date: Jan 2007

Location: San Jose

Posts: 727

Likes: 0

Received 0 Likes on 0 Posts

Quote:

It can happen with mechanical stuff too, Air Midwest 5481 back in 2003. Someone introduced a mechanical 'bug' in that they rigged the elevator cables incorrectly. It flew OK for several flights until circumstances conspired to trip the bug, in the form of a CofG too far aft and it pitched up and stalled. OK, not quite ten years, but no one detected the error and had the error not been made, it would have been recoverable - the limited elevator travel due to the error meant it couldn't cope.

28th Oct 2015, 02:46

#55 (permalink)

roulishollandais

Join Date: Jun 2011

Location: france

Posts: 760

Likes: 0

Received 0 Likes on 0 Posts

Quote:

Originally Posted by beardy

Who judges the acceptable risk of software bugs?

Refer to the Ariane 501 report (4 June 1996 crash) from Jacques-Louis Lions about the best practices.

28th Oct 2015, 04:10

#56 (permalink)

andrasz

Join Date: Sep 2008

Location: Where it is comfortable...

Age: 60

Posts: 911

Likes: 2

Received 13 Likes on 2 Posts

Quote:

Software just is whatever has been coded.

That is very simplistic and incorrect. Software comprises the original set of specifications on what the system is supposed to achieve, the algorithm (which is a translation of the specs into the particulars of the coding language used), the actual code, the set of static and dynamic data which are used by the code, and the user instructions/manual on how to operate the software.

"Bugs" can be introduced everywhere along this process, and coding bugs (where there is an actual syntax or logical error in the code) are usually the smallest percentage of them, and the easiest to catch. The most difficult part is the specifications, where a professional in a particular subject needs to describe his/her knowledge to someone who is at best marginally versed in the profession, but is able to develop efficient algorithms to achieve what the specifications say. There are many things which may get lost in translation here, and the most dangerous are those which were 'forgotten' from the specifications simply because a particular scenario was not considered. These scenarios are usually in the realm of valid data, as basic software design principles mandate that invalid data ranges must be considered and treated (e.g. if a parameter must be positive, in a critical system there MUST be a loop which handles the case if that parameter is negative).

A further layer of "bugs" is, as Microsoft once famously said, not bugs but features. Errors can be introduced in the user manual, which may not correctly describe how the system works, especially in remote and unlikely scenarios. This causes the software to behave as specified, but differently from what users expect. More issues are introduced through the user interface, when users try things which are explicitly disallowed in the manual, with totally unpredictable outcomes as those scenarios were neither considered nor tested.

From the user perspective all above are "bugs", but only a very small portion are actually attributable to the code itself.

28th Oct 2015, 12:38

#57 (permalink)

DType

Join Date: Jan 2010

Location: Edinburgh

Age: 85

Posts: 74

Likes: 0

Received 16 Likes on 9 Posts

Whenever I wrote in a manual "Whatever you do, don't press button 'A'", I had to go back to the product design and delete or protect button 'A'. Eventually, I got round to writing the manual before I started the design. That only took half a lifetime to figure out, but then I'm not the sharpest knife in the box!

28th Oct 2015, 15:08

#58 (permalink)

wanabee777

Join Date: Jan 2006

Location: Ijatta

Posts: 435

Likes: 0

Received 0 Likes on 0 Posts

I never could keep my fingers off the bloody buttons. Especially on long haul flights.

Used to drive my F/O's nuts!

Last edited by wanabee777; 28th Oct 2015 at 19:20.

28th Oct 2015, 17:03

#59 (permalink)

Nialler

Join Date: May 2008

Location: Paris

Age: 60

Posts: 101

Likes: 0

Received 0 Likes on 0 Posts

@peekay4:

Quote:

Yea, although that might be indicative of something more than just a requirements error -- pointing to a larger process breakdown.

Typically there are high level requirements, specific system / software requirements, low-level requirements, etc., which all need to be traceable up and down between them, and also have full traceability to the code, to the binary, and to all the test cases (and/or formal methods verifications as applicable).

For all data elements, there should be specifications to check for valid ranges for values (data domain), missing values (null checks), etc. Functions also need to have preconditions & postconditions on what parameter values are acceptable as part of the interface contract, and assertions which must hold true.

There should've also been models of both the specifications and the design and processes to check these models for completeness.

And even if there are data errors, as mentioned before the software should be designed to be fault-tolerant and fail safe instead of simply freezing up at 400' AGL.

What you don't want to do is to fix this one specific requirement while there may be other missing/incomplete/incorrect requirements out there. So you have to take a look into the SDLC process and figure out why the requirement was missed to begin with.
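A minimal sketch of the range checks, null checks, preconditions and postconditions mentioned in the quote. The function name, the altitude limits and the "last good value" fallback are all hypothetical, and real flight software would not rely on Python `assert` statements; this only shows the shape of an interface contract.

```python
import math
from typing import Optional

# Hypothetical valid data domain for an altitude input, in feet.
MIN_ALT_FT = -1000.0
MAX_ALT_FT = 60000.0


def validated_altitude(raw: Optional[float], last_good: float) -> float:
    """Return `raw` if it passes the checks, otherwise fall back to `last_good`."""
    # Precondition: the fallback value must itself be inside the domain.
    assert MIN_ALT_FT <= last_good <= MAX_ALT_FT, "invalid fallback"

    missing = raw is None or math.isnan(raw)         # null check
    if missing or not (MIN_ALT_FT <= raw <= MAX_ALT_FT):  # range check
        result = last_good  # degrade gracefully instead of freezing up
    else:
        result = raw

    # Postcondition: callers always receive a value inside the domain.
    assert MIN_ALT_FT <= result <= MAX_ALT_FT
    return result


if __name__ == "__main__":
    print(validated_altitude(400.0, 500.0))
    print(validated_altitude(None, 500.0))
    print(validated_altitude(float("nan"), 500.0))
```

Note that this kind of check only catches invalid data; a scenario that was never written into the requirements passes straight through.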

You may have worked in the past with Orthogonal Defect Classification. This is where things get scary. In nailing down a coding error at one stage we drilled through to the conclusion that the error was a "missing typo". At the meeting we collapsed in laughter. The problem essentially consisted of the fact that a typo hadn't been propagated right throughout the development cycle. When we recovered ourselves we realised how utterly catastrophic such an error might be.

With teams using US and UK English there were multiple risks of variable typos, with each being separately close enough to the other to pass muster, but with as yet untested fallback routines failing in the event.

Avionic software at least appears to fall back to the backstop of handing things over to the pilot(s). The day that they stop doing so is the day that I keep my feet on the ground.

Systems are never perfect, and they don't exist in a vacuum; parallel systems may make undesired demands of them.

I'm not flying when the person in the seat is a systems administrator; I want a pilot up there. One who can override every damn system. Yes, they make mistakes, but at least they can react according to their skills, and at least their ass is on the line too.

29th Oct 2015, 05:21

#60 (permalink)

esa-aardvark

Join Date: Apr 2007

Location: moraira,spain-Norfolk, UK

Age: 82

Posts: 389

Likes: 0

Received 0 Likes on 0 Posts

Challenger disaster

Hello peekay4,
I think you will find the Challenger disaster was in the numbers.
The relevant engineers voted against flight. It flew because of the
common management idea that if it (they) flew several times then they were OK.
