PPRuNe Forums - View Single Post - Airbus crash/training flight
Old 20th Sep 2010, 23:03
infrequentflyer789
 
Originally Posted by PJ2
PBL;
Consider a system-level requirement to tolerate any two faults (which includes Byzantine faults). Such a system requires seven-fold (3m + 1 = 3(2) + 1 = 7) redundancy.
[...]
You reference "5" sensors. In my non-trained view I am slogging through the paper (and the Lamport paper) with much more to read/absorb, so I need to understand how the above comments relate, if at all.
It has been a while since I read this sort of research, and I have only skimmed the references tonight, but I think I can explain the discrepancy. I am sure PBL will correct me if I have misunderstood (and any correction is welcome).

The general result is that 3m + 1 redundancy is needed to cope with m failures; however, Lamport also showed that this can be reduced to 2m + 1 (this is PBL's "5 sensors", with m = 2) for digital systems, if the values are cryptographically signed. That makes it impossible for a faulty processor to corrupt a value and pass it on without the change being detected. Consider the Byzantine generals from the analogy communicating by email with digital signatures - the traitors cannot corrupt what they pass on.
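The signed-values idea can be sketched in a few lines. This is purely illustrative (the sensor names, keys and readings are made up, and HMAC stands in for digital signatures): with n = 2m + 1 = 5 sensors and m = 2 faulty, a tampered relay message fails verification, and a majority vote over the verified readings still recovers the true value.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical setup: 5 sensors, each with its own signing key.
KEYS = {i: f"sensor-{i}-key".encode() for i in range(5)}

def sign(sensor_id, reading):
    # A sensor authenticates its own reading.
    mac = hmac.new(KEYS[sensor_id], reading.encode(), hashlib.sha256).hexdigest()
    return (sensor_id, reading, mac)

def verify(sensor_id, reading, mac):
    expected = hmac.new(KEYS[sensor_id], reading.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)

# Three good sensors report the true value; two faulty ones lie.
# Note a faulty sensor can still validly sign its own lie -- signatures
# stop relays corrupting forwarded values; the 2m + 1 majority handles lies.
msgs = [sign(i, "aoa=4.2") for i in (0, 1, 2)] + [sign(i, "aoa=30.0") for i in (3, 4)]

# A relay that alters a forwarded reading is caught by the signature check:
sid, reading, mac = msgs[0]
assert not verify(sid, "aoa=30.0", mac)

# Majority vote over the readings that verify:
valid = [r for (s, r, m) in msgs if verify(s, r, m)]
winner, votes = Counter(valid).most_common(1)[0]
print(winner, votes)   # aoa=4.2 3
```

With only 2m + 1 = 5 units, two lying sensors are outvoted three to two, and any value corrupted in transit is simply discarded before the vote.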

The proof is at the very end of Lamport's paper and the discussion is a bit limited (it looks like the sort of thing that gets added to a paper and then almost cut back out again to fit within word/page limits) - there is almost more on it in his comments on the page pointing to the paper.


For what it's worth, my own take is that there probably were, and are, people at both A and B who are aware of and understand these results, and the level of redundancy is designed in with its limitations known. Triple-redundant design is really only there to cope with a single failure, which in most cases it can. The chance of a single failure is then supposed to be made small enough that the chance of an uncorrelated double failure is acceptably low.
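A back-of-envelope calculation shows why that design argument works when failures really are independent (the per-channel probability below is a made-up illustrative figure, not a certified number):

```python
# Assuming independent channel failures with equal probability p,
# a 2-of-3 voter is defeated when at least two channels fail:
# P = 3*p^2*(1-p) + p^3.
p = 1e-4                                     # illustrative per-channel figure
p_two_of_three = 3 * p**2 * (1 - p) + p**3   # at least two of three fail
print(p_two_of_three)   # roughly 3e-8
```

The squared term is what buys the safety margin - and it is exactly what a common-mode failure takes away, because then the channel failures are no longer independent.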

What bites us then is the correlated / common-mode failures. Same frozen fuel in the pipes to both engines, all static ports taped over, all pitot ports sticking into the same ice etc.

More redundancy doesn't solve these kinds of issues.

Going back to the incident under discussion, how much AOA redundancy would you really need to prevent it? PBL has noted that at least 5-way redundancy is needed to tolerate two failed (misreporting) sensors, but is that enough?

My take:

1. the dodgy aircraft washing corrupted 2 out of 3 sensors
2. assume that failure rate carries on across the aircraft, however many sensors we have (reasonable?)
3. therefore we have to cope with m (lying sensors) = 2/3 * n (total sensors)
4. we also know that n must be >= 2m + 1

So we need to solve: n >= 2m + 1 where m = 2n/3 (oh, and n > 0)

Good luck with that one.