ATSB probes 'cosmic rays' link to QF72 A330 jet upset
This was reported in the West Australian newspaper today as "breaking news". I'm not a pilot nor involved in the aviation industry but merely a passenger with more than a passing interest in aviation. My question to those in the know is, "Is this possible, feasible or just another ho hum theory"?:confused:
|
Another Factual Report was released by the ATSB today. Here is the link:
MEDIA RELEASE : 18 November 2009 - 2009/16: ATSB Second Interim Factual Report into the Qantas Airbus A330-303 in-flight upset, 154 km west of Learmonth WA, on 7 October 2008 |
Hi,
That's very interesting ... I wonder when the International Space Station will go upside down .. and fall on the Earth ? :} But maybe there they know .. and have some protections ? :rolleyes: |
To save others the time....link to article
ATSB probes 'cosmic rays' link to Qantas jet plunge - The West Australian |
"Is this possible, feasible or just another ho hum theory"? We want to do more calculations so need more transistors running faster = more power more power = more heat To keep everything working OK the individual transistors must be smaller. A side effect of smaller transistors is they are affected more by radiation and are more likely to flip a bit. What happens next is depends on what bit is flipped, it can be short term if in RAM and recovered by rebooting alternatively it can be long term if in ROM or flash affecting the program or stored data eg locations. It is known to be a serious problem for space applications and special chips are used although they don't have the computing power of newer consumer type chips. The RCA 1802 chips in Voyager have outlasted RCA. |
ATSB Interim Factual Report No.2
|
Originally Posted by BorneoFly
(Post 5323660)
My question to those in the know is, "Is this possible, feasible or just another ho hum theory"?:confused:
|
hahaha.. and everybody said I was crazy with my tin foil hat!
On a serious note, it will be interesting to read the report after they finish this tangent of their investigation. I would be flabbergasted if a solar flare could bring an aircraft down. Putting my tin foil hat back on... I always knew CASA/ATSB were apologists for Qantas... but attributing an incident to cosmic forces would be an interesting low! :hmm: |
I wonder when the International Space Station will go upside down .. and fall on the Earth ? I assumed that aircraft would have similar protection in place to the ISS with regards to multiple backup flight control systems, but if they're engineered differently then cosmic rays could still pose a problem. Not particularly likely though, the odds of a cosmic ray hitting the wrong thing are spectacularly tiny |
Los Alamos Helps Industry By Simulating Circuit Failures From Cosmic Rays
Los Alamos Simulates Circuit Failures From Cosmic Rays....snip... We can't fully predict the effect of these interactions, which makes having a standardized way to test circuits extremely valuable" Wender said. "Very similar devices show radically different failure rates due to neutron interactions, and we have some evidence that the smaller transistors and lower operating voltages in newer devices produce higher failure rates....snip.... In the case of the latest, totally computer-controlled aircraft, these tiny cosmic gremlins could cause trouble, especially because the problem gets worse as atmospheric shielding dwindles at higher altitudes. At sea level, the shielding provided by the air is equivalent to more than ten feet of concrete shielding. The neutron flux at LANSCE, 7,000 feet above sea level is approximately three times greater than at sea level; and at 40,000 feet, the cosmic-ray neutron flux is several hundred times greater than the neutron flux seen on the earth's surface....snip... The Laboratory and NASA recently placed a complete aircraft control system in the LANSCE beam and linked it locally with a computer simulation for a Boeing 737. A future experiment will examine whether pilots can compensate for control system upsets during simulated flight, by remotely linking a computer undergoing tests in the ICE House to the flight simulator located in the NASA System Airframe Failure Emulation Testing and Integration Laboratory at the Langley Research Center. Los Alamos is collaborating in NASA's development of the SAFETI Laboratory, with networked links to individual NASA labs for aircraft structures, cockpit motion and propulsion systems http://www.ewh.ieee.org/r6/scv/rl/ar...3-talk-ref.pdf |
dkaarma,
I always knew CASA/ATSB were apologists for Qantas |
Actually, this isn't too far fetched. Many years ago, in a previous life, we had a civil computer being adapted to a military program. One of the tests was to change every bit in the program code from a one to a zero or vice versa and ensure that nothing bad would happen. (The system halting in this test was not considered "bad.")
I've read the material on the QANTAS A-330 upset and am favorably impressed with the ATSB's work in this case. Dick Newman |
This is all very cute but beside the point. The thing is any component may fail and the Airbus has more than one ADIRU. A failure of a single component whether caused by cosmic rays or little green men should never lead to near-catastrophic results. The computer should have been able to detect an ADIRU disagree and identify the bad data, or if not possible just discard the AOA data altogether.
What happened is simply unacceptable |
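As a rough illustration of "detect a disagree and identify the bad data", here is a minimal Python sketch of mid-value selection across three redundant sources. This is a generic textbook technique, not the actual Airbus algorithm; the function name `vote`, the tolerance and the sample values are all assumptions made for the example.

```python
from statistics import median

def vote(samples, tolerance=1.0):
    """Mid-value select across redundant sources.

    Returns (voted_value, suspects), where suspects lists the indices of
    the sources that differ from the median by more than `tolerance`.
    """
    mid = median(samples)
    suspects = [i for i, s in enumerate(samples) if abs(s - mid) > tolerance]
    return mid, suspects

# Hypothetical angle-of-attack readings from three units, one of them spiking.
value, suspects = vote([2.1, 50.6, 2.3])
print(value, suspects)
```

With three sources, a single wild value never becomes the median, so it is both outvoted and flagged. The scheme only breaks down when two sources are wrong in the same direction at the same time.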
"the Airbus has more than one ADIRU"
Interesting. The B777 has only one ADIRU, plus a secondary unit called SAARU. The latter will take over attitude and airdata indications (but no lat/long IRS data) in case of a(n) (partial) ADIRU failure, but I wonder how Boeing has solved the problem of identifying an ADIRU failure in the first place. Majority vote is not an option with only one unit installed, is it? Or does the B777's ADIRU have more elaborate internal fault recognition capabilities and more built-in redundancy than the Airbus's? (I can imagine that it is probably merely a matter of defining what components constitute a 'unit', but then the use of the same acronym is sort of puzzling.) Sorry for being off-topic... |
vovachan and xetroV,
You have me scratching my head.... I'm an ancient from the Concord era, when the integrated circuits were still so huge that a single cosmic ray, neutron or alpha particle couldn't really upset the electronics. But components could fail. So (to stay with flight control computers, analog in those far-off days), each computer had two virtually identical channels, dubbed "command" and "monitor", and comparators all down the chain (and those were duplicated too) checked that "C" and "M" told exactly the same story. If they didn't ... "boing" and the computer disengaged. Then, on the other side of the aircraft, a second computer, until then in standby, would take over. Checks for passive failures (like a comparator failing "healthy") were dealt with by preflight BITE (built-in tests), and some of those tests were repeated just before an autoland, reducing the "period at risk" to only minutes. I wasn't directly involved in the earliest DAFS, but by looking over their shoulder, I saw most of the same principles were applied. So what's been going on since? Even with ROMs, RAMs and everything else today being far far smaller, I still would think the probability of two particles hitting the same spot in two "halves" of a system provoking an identical spike, that then would be missed by the comparators, would be infinitesimal. So has there been a fundamental change in architecture? And wouldn't that kind of change be equally bad at catching component and software failures as "cosmic ray" events? I can see the line of thinking of the ATSB... I would have been tempted too, if everything afterwards worked perfectly, and there was no way of reproducing the fault. But there still seems to be something wrong with that reasoning.... CJ |
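For readers who have not met the command/monitor idea before, a minimal Python sketch of it follows. The class name, the tolerance and the two toy channel functions are all hypothetical; the real thing was analogue hardware, so this only illustrates the principle of comparing two channels and disengaging on a mismatch.

```python
class DualChannelComputer:
    """Two channels compute the same output; a comparator trips on mismatch."""

    def __init__(self, command, monitor, tolerance):
        self.command = command      # channel whose output drives the controls
        self.monitor = monitor      # independent channel used only for checking
        self.tolerance = tolerance
        self.engaged = True

    def step(self, sensor_input):
        """Return the command output, or None once the computer has disengaged."""
        if not self.engaged:
            return None
        c = self.command(sensor_input)
        m = self.monitor(sensor_input)
        if abs(c - m) > self.tolerance:
            self.engaged = False    # "boing": latch off and hand over
            return None
        return c

# Toy channels: the monitor channel develops a fault for inputs above 5.
computer = DualChannelComputer(
    command=lambda x: 2.0 * x,
    monitor=lambda x: 2.0 * x if x < 5 else 0.0,
    tolerance=0.1,
)
for x in (1, 2, 6, 1):
    print(x, computer.step(x))
```

Note that the comparator cannot tell which channel is wrong. It only knows they disagree, which is why the safe response is to disconnect and let the standby computer or the pilot take over.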
xetroV- The 777 does indeed only have one ADIRU unit but that unit consists of multiple accelerometers and laser gyros. This redundancy within the unit didn't prevent an incident to an MAS 777 doing something very similar to the QF incident, also off the coast of WA. I don't think it's a problem unique to one manufacturer or another but an indication of the lack of understanding of how software interacts.
|
"Cosmic ray" = IT equivalent of "Gremlins", i.e. joking term for an unexplained failure and especially one caused by human factors.
|
vovachan and xetroV, You have me scratching my head.... Also electronics are prone to transient failures. You take it to the repair shop, they plug it in and it works fine. |
OK, ancient here again....
In my days, when two 'halves' of a computer disagreed, it was "ping", "boing", "click", and the (analogue) computer took itself off-line, with usually a blinking light on the CWS (central warning system) as well, and handed over to the pilot, who then had the choice of staying in manual, or engaging the standby on the 'other side'. Only during the last minutes of an autoland, the failed computer would hand over automatically to n° 2, which would already be synchronised, and would already have been tested and found healthy. It woiked well, mostly because the probability of two identical components on two sides failing in the same way within a few minutes could be shown to be in the order of 10^-9 to 10^-12, depending on the "time at risk". From the little I know about DAFS, much the same was achieved initially with the two 'halves' using different processors, different languages for the software, and different compilers. Sure, if the software spec was wrong, there could still be problems, but that was no different in the analogue-and-logic world. So what's happened since? Leaving a computer in control of an aircraft while responding to "data spikes" gives me the cold shivers...... yet that seems what has been happening.... Can anybody elucidate....? CJ |
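To show where figures of that order can come from, here is a back-of-envelope Python sketch. It assumes two independent components, each with a constant failure rate, and uses a purely hypothetical rate of 1e-4 failures per hour. None of these numbers come from the post or from any certification document.

```python
def dual_failure_probability(rate_per_hour, time_at_risk_hours):
    """Probability that two independent components both fail within the window.

    Uses the small-probability approximation p = rate * time for each unit,
    which is reasonable only while rate * time is much smaller than 1.
    """
    p_one = rate_per_hour * time_at_risk_hours
    return p_one * p_one

ASSUMED_RATE = 1e-4  # hypothetical failures per hour for a single component

for minutes in (30, 6, 0.6):
    p = dual_failure_probability(ASSUMED_RATE, minutes / 60)
    print(f"time at risk {minutes:>4} min -> {p:.1e}")
```

Because the single-unit probability is squared, cutting the time at risk by a factor of ten cuts the dual-failure probability by a factor of a hundred. That is the payoff from re-testing just before an autoland.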
"Cosmic ray" = IT equivalent of "Gremlins", i.e. joking term for an unexplained failure and especially one caused by human factors. I am not sure, but you would have to think for this to be published (Cosmic Rays) that it is designed to take the blame away from a nasty - potentially catastrophic software/hardware fault within the ADIRUs. Yet surely, saying that the 330 is subject to random cosmic rays would have to be even less reassuring. If they had said the ADIRU can be replaced due 'this' (ie whatever fault they find) particular hardware fault, then most people would be satisfied - but now the whole jet can be susceptible to complete lack of control from unseen random cosmic rays! FFS...Really? I can just imagine the punters now (or the random sandwhich shop worker interview) "I cant hop on an Q airbus again now due to cosmic rays" |
I am not sure, but you would have to think for this to be published (Cosmic Rays) that it is designed to take the blame away from a nasty - potentially catastrophic software/hardware fault within the ADIRUs. Yet surely, saying that the 330 is subject to random cosmic rays would have to be even less reassuring. If they had said the ADIRU can be replaced due to 'this' (ie whatever fault they find) particular hardware fault, then most people would be satisfied - but now the whole jet can be susceptible to complete lack of control from unseen random cosmic rays! FFS...Really? I can just imagine the punters now (or the random sandwich shop worker interview) "I can't hop on a Q airbus again now due to cosmic rays" In any event, "cosmic rays" is utter bollocks. It's not like they were suddenly invented. Stay tuned |
Simple solution for those Oz types: why speculate about such important safety matters when the technology is available (and has been for decades) ? Just mandate a Wilson Cloud Chamber in every cockpit, with detection system linked to the flight computer for instant pilot awareness.
Better yet, wrap the plane with said Cloud Chamber, for more complete coverage. ayyyyyy .... :ugh: |
Before we all talk the cosmic ray theory to death, remember that a great deal of accident investigation is going down many leads to see what didn't happen. When I first began working on NTSB teams, I was surprised by how much effort was spent explaining what couldn't have happened.
After you have discarded all the impossible explanations, whatever is left, no matter how improbable must be the truth --- Sherlock Holmes. Dick Newman |
....But Sherlock Holmes was a figment of someone's imagination, wasn't he? :}
|
But Dick, we haven't ...
... eliminated all the possible explanations.
We are rightly concerned when a pilot dozes off without warning, but when an ADIRU goes into dozing mode it is a non-critical rare event? In my experience, when a computer goes into doze mode and has to be rebooted, either the hardware failed, something in the software did other than what the programmer intended, or the system design failed to take into account all the possible consequences of all the programmers' different intentions. Hardware faults caused by cosmic rays should happen at a statistically predictable rate depending on known parameters. Dozing faults can be caused by software. For example, a process may end up in a tight loop (unintended) or when memory is tight, several processes may end up waiting (intended) for other processes to release memory - and they don't release it (also possibly intended). This type of fault is statistically more likely on computers that run for longer than average between reboots. If something like dozing can happen, how can we be sure enough that something else other than what is intended will not happen? |
One way is to use more sophisticated watchdog timers that check the computer is awake and not spending all its time looping. If the correct actions aren't taken the hardware gets reset (or something less drastic).
|
Rightbase,
cwatters has the right answer... The kind of real-time software in digital autopilots, etc. is very different from 'data-processing' software, be it PCs or mainframes, which is mostly interrupt-driven. Watchdog timers are small bits of independent hardware which have to be reset at regular intervals (say 100 msec, possibly less). Any fault, software or hardware, that results in the watchdog not being reset in time (such as "hanging up" in a loop, as you mentioned), will promptly produce a failure warning, and cause the computer to disconnect. CJ |
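The watchdog principle described above can be sketched in a few lines of Python. This is a software simulation with a made-up clock, not real hardware. The 100 ms timeout comes from the post, while the class name, the 50 ms loop period and the 10 ms check interval are assumptions for illustration.

```python
class Watchdog:
    """Trips (and stays tripped) if it is not kicked within `timeout_ms`."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.last_kick_ms = 0
        self.tripped = False

    def kick(self, now_ms):
        self.last_kick_ms = now_ms

    def check(self, now_ms):
        if now_ms - self.last_kick_ms > self.timeout_ms:
            self.tripped = True  # latched: a later kick does not clear it
        return self.tripped

def simulate(hang_at_ms, duration_ms=1000, period_ms=50, timeout_ms=100):
    """Return the time at which the watchdog trips, or None if it never does."""
    dog = Watchdog(timeout_ms)
    for now in range(0, duration_ms, 10):    # independent 10 ms check tick
        healthy = hang_at_ms is None or now < hang_at_ms
        if healthy and now % period_ms == 0:
            dog.kick(now)                    # main loop proves it is alive
        if dog.check(now):
            return now                       # failure warning and disconnect
    return None

print(simulate(None))   # healthy software
print(simulate(300))    # software hangs in a loop from t = 300 ms
```

The important design point is that the watchdog runs independently of the software it supervises, so a program stuck in a tight loop cannot prevent the trip.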
The kind of real-time software in digital autopilots, etc. is very different from 'data-processing' software, be it PCs or mainframes, which is mostly interrupt-driven. OKAY, but from the report: One type of fault event associated with the ADIRU model is known as ‘dozing’. Once ‘dozing’ commences, the ADIRU stops outputting data for the remainder of the flight. |
"Dozing" ... not really a technical term any of my more knowledgeable software engineer friends have heard of. From what they say:
There are without a doubt watchdog timers which reset parts of the system and restore the system into a meaningful and known state - known here means stable. The way processes in these systems are organised is NOT the same as a home PC but more or less fixed at design-time so timing and other interrelations are known and can be tested for or even proven. Dozing appears to mean - according to some - that the ADIRU placed itself into a known state where the functions provided are effectively suspended. Why it ended up in such a state is the question - that is, what set of events caused the ADIRU to "fail" in that way. Fail means "fail safe". As I understand there are two other ADIRUs and voters - were there failures there as well? Because failure of one ADIRU shouldn't cause an upset. fc101 E145 driver --- some text rephrased from sources who know more about safety critical systems than me. |
Of course, I don't mean to suggest that we shouldn't worry about the software code. ADIRUs have a failure rate of the order of 1/1000 to 1/10000. We still need triple redundancy to avoid a catastrophe. We need to ensure that independent computer software errors do not go uncorrected, whether caused by an ADIRU failure or by a cosmic ray upsetting a single bit.
In general, we've done a pretty good job of not having the software make mistakes in calculations. Where we may have fallen short is in writing our requirements to take these ADIRU failures or other single events into account. I was distressed during my previous employment when my boss reacted to the QANTAS upset with "Well, it was only an ADIRU failure" when the response should have been "How could an ADIRU failure make its way through to the flight control surfaces?" Dick |
Hmm, Intel thinks this is a real problem...
From an article in New Scientist (March 2008):
"But Intel thinks we may still be living on borrowed time: "Cosmic ray induced computer crashes have occurred and are expected to increase with frequency as devices (for example, transistors) decrease in size in chips. This problem is projected to become a major limiter of computer reliability in the next decade. " Their patent suggests built-in cosmic ray detectors may be the best option. The detector would either spot cosmic ray hits on nearby circuits, or directly on the detector itself.When triggered, it could activate error-checking circuits that refresh the nearby memory, repeat the most recent actions, or ask for the last message from outside circuits to be sent again. But if cosmic ray detectors make it into desktops, would we get to know when they find something? It would be fun to suddenly see a message pop up informing a cosmic ray had been detected. I haven't seen any recent figures on how often they happen, but back in 1996 IBM estimated you would see one a month for every 256MB of RAM." Although I'm not directly involved in aircraft avionics, the problem of cosmic ray effects on computing devices is REAL. Don't dismiss this as goofy pseudo-science - there is a lot of money being spent investigating this. - GY :ooh: |
If I may put my 5 cents worth in (used to be a penny)? There is a general misrepresentation of the colloquial term 'cosmic rays'. Did I say anything about the 'media stock phrases and cliches' handbook? Wash my mouth out!
This discussion concerns high energy particles, and a reading of Cosmic ray - Wikipedia, the free encyclopedia will bring one up to speed. They are singularities, and although they can occur in 'showers', read high_incidence_of, they are problematic, and how much so depends on each individual particle's very variable energy level. They are not just a threat to electronics, but also to DNA and indeed any of your cells. On the well-known-in-the-trade basis that such a particle can 'take out' an individual electronic component, whether temporarily if low energy or sometimes permanently if high energy, any problem should be an isolated event that can in no way known to wo/man be specifically guarded against, short of using lead wrapping on all boxes. As another ancient here says, the design must fully guard against any individual failure. On a related matter, here's a snippet of information related to Airbus's design philosophy. I haven't seen this mentioned since my engineering course on the second lot of free range A320s. (Gosh! Have they been flying for that long?) It was stated then that Airbus went to what I would have thought were excessive pains to diversify the build parameters and supply sources of all duplicated equipment. We were told by an Airbus rep that duplicate suppliers were given design parameters which they were free to achieve electronically any way they chose, but obviously to tight aviation constraints. The ultimate black boxes. The idea was that a design flaw in one element of the control architecture would be isolated to one item in the control chain by default. To the best of my recall this philosophy was applied across the entire airframe, and I have been surprised at reports that certain Airbus aircraft have finished up flying with all pitots from the same manufacturer. That certainly was not the original designers' intent.
No doubt the cost of extensive duplication of non-identical but similarly functioning components has attracted the attention of the financial fine tuners. <sigh> (Written from the future as this appears, the comment re pitots seems rather relevant to the current (20100820) thread mulling the AF447 loss. Amended by Jencluse.) |
PS I stand corrected:
The flight computer does filter and compare AOA data coming from the 3 ADIRUs, but there is a scenario when it can be fooled:
• there were at least two short duration, high amplitude spikes
• the first spike was shorter than 1 second
• the second spike occurred and was still present 1.2 seconds after the detection of the first spike. |
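The three conditions above can be reproduced with a toy model in Python. To be clear, this is NOT the actual flight computer logic. It assumes a filter that, on seeing a spike, holds the last good value for 1.2 seconds and then accepts whatever the input is at that moment without checking it again. The 1.2 s figure comes from the post. The sample period, the spike threshold and the AOA values are all hypothetical.

```python
HOLD_MS = 1200     # from the post: 1.2 s after detection of the first spike
STEP_MS = 100      # hypothetical sample period
THRESHOLD = 10.0   # hypothetical jump (degrees) treated as a spike

def filter_aoa(samples):
    """Toy spike filter: hold the last good value for HOLD_MS after a spike,
    then accept the current input without re-checking it."""
    out = []
    good = samples[0]
    hold_until = None                    # time at which the hold window ends
    for i, raw in enumerate(samples):
        now = i * STEP_MS
        if hold_until is not None:
            if now < hold_until:
                out.append(good)         # still holding the memorised value
                continue
            hold_until = None
            good = raw                   # window over: input accepted unchecked
        elif abs(raw - good) > THRESHOLD:
            hold_until = now + HOLD_MS   # spike detected: start holding
        else:
            good = raw
        out.append(good)
    return out

single = [2.0] * 17
single[3:5] = [50.6, 50.6]               # one short spike, 300 to 400 ms

double = list(single)
double[14:17] = [50.6, 50.6, 50.6]       # second spike, present at 1500 ms

print(max(filter_aoa(single)))
print(max(filter_aoa(double)))
```

In this toy model a single short spike never reaches the output, but a second spike that happens to be present when the hold window expires sails straight through. That matches the shape of the scenario in the three bullet points.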
Originally Posted by Dick Newman
I was distressed during my previous employment when my boss reacted to the QANTAS upset with "Well, it was only an ADIRU failure" when the response should have been "How could an ADIRU failure make its way through to the flight control surfaces?"
|
Originally Posted by Lookleft
xetroV- The 777 does indeed only have one ADIRU unit but that unit consists of multiple accelerometers and laser gyros. This redundancy within the unit didn't prevent an incident to an MAS 777 doing something very similar to the QF incident, also off the coast of WA. I don't think it's a problem unique to one manufacturer or another but an indication of the lack of understanding of how software interacts.
I agree with your statement about software interaction. This is becoming increasingly important as more and more aircraft systems are being integrated and interconnected, while at the same time the required navigation performance and vertical separation are continuously being reduced, as the skies get busier. At the very least, accurate and quick internal error detection algorithms should provide smooth systems degradation that is immediately obvious and totally transparent to the flight crew. Sudden uncommanded autopilot upsets are not what I call "fail passive" (let alone "fail safe"). |
A&WST had an article about the criteria for 'space hardened' electronics; if I can remember to look, I'll post it. But CJ has given me a lot to think about :)
PA |
During the incident flight the bad ADIRU produced 42 data spikes. The computer caught 40 of them; the other 2 caused the upset. All these dozens of spikes did not make the computer realize the ADIRU was bad. |
A&WST had an article about the criteria for 'space hardened' electronics; if I can remember to look, I'll post it. But CJ has given me a lot to think about :) ECSS-E-ST-40C - Software general requirements; ECSS-Q-ST-60C Rev.1 - Electrical, electronic and electromechanical (EEE) components. You have to log in to see them (registration is free). Slightly more generalised versions are online here: https://escies.org/ReadArticle?docId=167 |
It's very important to know if your error correction circuit is being triggered and take some action. |
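As an illustration of an error-correcting scheme that also tells you when it has been triggered, here is a minimal Python sketch of a Hamming(7,4) code. It is a standard textbook code chosen for brevity, not a claim about what any particular avionics or memory chip uses. The function names are made up for the example.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(codeword):
    """Return (data_bits, corrected), where corrected is True if a bit was repaired."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # 1-based index of the bad bit, 0 if none
    if position:
        c[position - 1] ^= 1          # repair the single flipped bit
    return [c[2], c[4], c[5], c[6]], position != 0

word = encode([1, 0, 1, 1])
word[4] ^= 1                          # simulate a single-event upset
data, corrected = decode(word)
print(data, corrected)
```

Returning the `corrected` flag alongside the data is the point of the example: a system can count or log corrections and act if they become frequent, instead of silently repairing a unit that is going bad.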
I am not a pilot, just an interested observer. Is this incident relevant to what happened on the Qantas aircraft?
Incident: US Airways A333 over Atlantic on Nov 17th 2009, computer issues By Simon Hradecky, created Friday, Nov 20th 2009 14:30Z, last updated Friday, Nov 20th 2009 14:30Z A US Airways Airbus A330-300, flight US-740 from Philadelphia,PA (USA) to Madrid,SP (Spain), was enroute at FL390 about 350nm east of Philadelphia overhead the Atlantic about 40 minutes into the flight, when the crew announced they needed to return and was cleared to turn to the left. About 40 seconds later during the turn the crew declared emergency and requested to descend. About another 5 minutes later while levelling at FL300 the crew reported, that everything had returned to normal explaining, that they had experienced computer problems they were unable to resolve and they had been "missing control". The emergency was cancelled, the airplane continued back to Philadelphia. The airplane landed safely on Philadelphia's runway 09R about 75 minutes after the onset of trouble. |