PPRuNe Forums

Boeing 787 integer overflow bug (https://www.pprune.org/tech-log/560793-boeing-787-integer-overflow-bug.html)

SAMPUBLIUS 3rd May 2015 02:01

RE OVERFLOW BUG
 
High and Flighty/FAA said

"...That'll be bad news if all four of the GCUs aboard a 787 were powered up at the same time, because all will then shut down, “resulting in a loss of all AC electrical power regardless of flight phase.”"
But normally there are a few seconds or minutes between the start of one engine and the start of the second. And the APU will be shut off after climbout. Which then leads to the following:

One engine GCU times out and shuts off its two generators. No biggie - the other engine and APU can easily carry the load. But a few minutes later, the 2nd engine generator system cuts out. Oh well, we still have the APU to start engines? Then a few minutes later, the APU generator times out??

Or is the GCU involved a single point join so that its timer overloads - and the battery system cuts in with ' nearer my god to thee ' ??:8

roulishollandais 3rd May 2015 03:02

Building software is like building a house: the first thing you have to do is list all the materials/variables that you need, defining size, use, purpose, movement, range and so on. The integer counter is one of the easiest variables to verify throughout software design and implementation. No need for sophisticated stats, only very, very basic methods with paper and pencil in your armchair. No excuse after the Ariane501 crash and report. NO!

cattletruck 3rd May 2015 03:35

Congratulations to the tester that found the bug. Good testers think outside the box, as this one did. 248 days was perhaps an unlikely scenario, but bringing it up as a test fail really got everyone's attention on what a simple oversight can do.

Once fixed it's going to take another 248 days to re-run the test.

Anyhow, methinks a 787 version 1.0 probably flies better if you reboot it first.

poorjohn 3rd May 2015 05:30


Once fixed it's going to take another 248 days to re-run the test.
I'm pretty sure it can be qualified "by inspection".

Guptar 3rd May 2015 09:46

Very interesting thread, something I had never even thought about. I just wish I had even the faintest idea of what you guys are talking about.

So, can someone answer a couple of questions for a simple guy.

Why does a GCU have an integer counter, does it need to count something to measure time or cycles of something?

If all modern computers are coded in 64 bit sizes, why did Boeing stick with 32 bit.

I gather, from googling it (nothing of which I understood anyway), integer counters are fairly common in computing software, so how do banks not have this problem as their computer hardware boxes have times between power downs measured in years.

If software is not hand coded, ie someone pounding away on a keyboard writing lines of code, how is it written if it's not hand coded.

All this stuff, sounds like you're talking about the warp drive of the Starship Enterprise.

I have such a headache now!

Amadis of Gaul 3rd May 2015 11:04


Originally Posted by p.j.m (Post 8962491)
Boeing must be using Windows programmers these days.

Pilot: "Hello Help desk - the aircraft has lost power"
Indian "have you rebooted?"

Hey, that's racist!

poorjohn 3rd May 2015 13:26


Why does a GCU have an integer counter, does it need to count something to measure time or cycles of something?
That's a question for the hardware guys.

If all modern computers are coded in 64 bit sizes, why did Boeing stick with 32 bit.
A "counter" is basically a unit of data storage in memory and the associated software that manipulates the value being stored, e.g. increments the value and tests it against limit(s). Computers typically have instructions that let them access memory in chunks smaller than the default size, and to not waste memory (which for critical real-time devices can be expensive) the programmer selects a size appropriate to the need.

I gather, from googling it (nothing of which I understood anyway), integer counters are fairly common in computing software, so how do banks not have this problem as their computer hardware boxes have times between power downs measured in years.
This 787 counter counted units of time, so it was a timer. You'd have to know what it was for and why it was designed to force the hardware it controlled into some inoperative mode when the value became zero. There could have been a valid reason, e.g. the device had reached a critical time limit where it had to be shut down and lubricated, and what the program designer didn't allow for was that the service could take place without powering off the device and resetting the timer/counter.

If software is not hand coded, ie someone pounding away on a keyboard writing lines of code, how is it written if it's not hand coded.
Programmers may insert into their own program software modules written by other programmers. Hand-coded, but by others' hands.
(The design fault here is that the software does not count characters I've typed within a quote, so I have to say something I didn't need to say outside the quote or it will flog me because my message was "too short".)

HighWind 3rd May 2015 18:33


Why does a GCU have an integer counter, does it need to count something to measure time or cycles of something?
That's a question for the hardware guys.
I'm not working in aerospace, but in my field of engineering (wind turbines) it definitely would have integer counters.
Often those systems run at a constant scan rate, and software filters and timers are used to slow down the reaction of the system in a configurable manner.
Since systems have boxes connected together with communication links, monitoring for broken communication links has to be implemented (typically as timeouts).
Another purpose could be shutting things down in case of faults, e.g. stop the engine if the lubrication pressure is lower than 2 bar for 5 secs.
Timers are also used to delay and prevent erratic state changes of an output, i.e. prevent a valve from being turned off/on on every 10 ms scan.
(Persistent) Counters are also used for statistics for maintenance and trouble shooting.
It is good system-engineering practice to separate/compartmentalize safety critical control, from datalogging for diagnostics.
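The "pressure lower than 2 bar for 5 secs" kind of timer can be sketched roughly as below. This is only an illustration in Python (a real controller would more likely be written in C or a PLC language), and the names `FaultTimer`, `first_trip_scan`, `SCAN_PERIOD_MS` and the 10 ms scan period are assumptions made up for the example, not taken from any real system.

```python
SCAN_PERIOD_MS = 10      # assumed constant scan period
LOW_PRESSURE_BAR = 2.0   # threshold from the example in the post
TRIP_DELAY_MS = 5000     # the fault must persist for 5 seconds


class FaultTimer:
    """Counts consecutive scans in which a fault condition is present."""

    def __init__(self, trip_delay_ms, scan_period_ms):
        self.trip_scans = trip_delay_ms // scan_period_ms
        self.count = 0

    def update(self, fault_present):
        """Call once per scan; returns True once the fault has persisted long enough."""
        if fault_present:
            # Saturate at the trip value so the counter can never overflow.
            self.count = min(self.count + 1, self.trip_scans)
        else:
            self.count = 0
        return self.count >= self.trip_scans


def first_trip_scan(pressures_bar):
    """Return the index of the scan at which the trip occurs, or None if it never does."""
    timer = FaultTimer(TRIP_DELAY_MS, SCAN_PERIOD_MS)
    for scan, pressure in enumerate(pressures_bar):
        if timer.update(pressure < LOW_PRESSURE_BAR):
            return scan
    return None
```

Note that the counter here saturates instead of counting up forever, so there is nothing left to overflow however long the system stays powered.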

If all modern computers are coded in 64 bit sizes, why did Boeing stick with 32 bit
The size of a counter value is primarily related to software, not hardware architecture.
Using a 64-bit desktop microprocessor in such an environment is often a bad idea; if possible, microcontrollers like the ARM Cortex are used instead.
A bigger, more complex CPU uses more power, generates more heat, and is 100 times more unreliable than a small microcontroller.
An Intel desktop CPU is only on the market for 3 years, and industrial/aerospace products have to be supported for 20 years.
Some of the newer micro controllers like the TMS570 have features that make safety certification easier.

rh200 3rd May 2015 20:56

The type of variable you declare can depend upon several things:

1) You misunderstand the requirements.

2) you are sloppy.

3)architecture coupled with the above.

It used to be that people tried to keep their code small; with today's CPUs and resources, people have become very sloppy. But there are a couple of places (probably more) that I know of which force me to declare small variable types.

1) Where the output of the variable results in an excessive amount of data to capture and store.

2) Using micro controllers. These usually have limited on board space. I would imagine in an environment such as these, with access to the best, they would still be constrained.

And there could be many other reasons.

underfire 3rd May 2015 22:40

FAA directive issued for 787
 
(CNN)The headaches for Boeing over its 787 Dreamliner continue.

The Federal Aviation Administration on Friday issued a directive mandating "a repetitive maintenance task" for that model of airliners due to issues with its power supply. Specifically, the FAA explained testing revealed that 787s could lose all AC electrical power after being continuously powered for 248 days, a problem that -- if left unchecked -- would leave an aircrew unable to control the plane.

The order took effect immediately, with the federal agency finding that there's no good reason to delay the decision.

FAA finds Boeing Dreamliner could lose all power, issues maintenance mandate

Radix 4th May 2015 05:22

..........

peekay4 4th May 2015 06:47


Why does a GCU have an integer counter, does it need to count something to measure time or cycles of something?
The purpose of an integer counter is to provide a standard measurement of time.

Remember that hardware can run at varying speeds, so we can't rely on hardware cycle speed to measure time. E.g., suppose today a CPU runs at 1 GHz, but tomorrow a replacement CPU comes out at 2 GHz, so each hardware cycle is now twice as fast. We don't want all of our time measurements to suddenly be off by a factor of two!

Therefore a counter is provided which always increases at a predictable, set time period (called the tick time period) regardless of the underlying hardware speed.

A common tick rate is 100 Hz, i.e. the time counter will always increment once every 1/100th of a second, regardless of the speed of the hardware. An elapsed time of 100 ticks means 1 second has passed, on any hardware.

Most real-time systems are completely tick based. At each and every tick, the system "kernel" is activated and every running task re-scheduled for execution based on their priority and allocated processing time budget (also measured in ticks).
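To make the tick idea concrete, here is a minimal sketch in Python (only for illustration; an RTOS would do this in C or similar). The names `TICK_HZ`, `ticks_to_seconds`, `seconds_to_ticks` and `has_expired` are made up for this example, and the 100 Hz rate is simply the figure used above.

```python
import math

TICK_HZ = 100  # assumed tick rate: one tick every 1/100th of a second


def ticks_to_seconds(ticks):
    """Convert a tick count to seconds; independent of CPU clock speed."""
    return ticks / TICK_HZ


def seconds_to_ticks(seconds):
    """Convert a duration to ticks, rounding up so a timeout never fires early."""
    return math.ceil(seconds * TICK_HZ)


def has_expired(start_tick, now_tick, timeout_s):
    """True once at least timeout_s seconds' worth of ticks have elapsed.

    This simple version ignores counter wraparound entirely.
    """
    return now_tick - start_tick >= seconds_to_ticks(timeout_s)
```

Software written against ticks like this behaves the same on a 1 GHz or a 2 GHz part, which is the point of having the counter in the first place.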


If all modern computers are coded in 64 bit sizes, why did Boeing stick with 32 bit.
Boeing probably had little to do with this bug. The affected GCUs would have been supplied by a third-party company.

And that third-party company probably used a Real Time Operating System (RTOS) supplied by yet another company.

My guess is this integer overflow is probably in the RTOS or related code. The bug might have been discovered in some completely unrelated software (maybe not even aviation software) using the same RTOS.

The speculation is that the buggy code is a 32-bit signed counter measuring 100 Hz ticks. So with one bit taken for the sign (+/-), that leaves 31 bits for the counter, and 2^31/(60*60*24*100) = 248.55 days.
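The arithmetic can be checked with a few lines of Python. Bear in mind this only reproduces the speculation above (a signed 32-bit counter at 100 Hz); the helper names `days_until_overflow` and `to_int32` are invented for the example, and Python integers do not overflow by themselves, so the wraparound is emulated by hand.

```python
TICK_HZ = 100                    # speculated tick rate
SECONDS_PER_DAY = 60 * 60 * 24


def days_until_overflow(value_bits, tick_hz=TICK_HZ):
    """Days until a counter with value_bits usable bits runs out of range."""
    return 2 ** value_bits / (tick_hz * SECONDS_PER_DAY)


def to_int32(n):
    """Emulate signed 32-bit two's-complement wraparound."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


signed_days = days_until_overflow(31)    # signed 32-bit: 31 value bits
unsigned_days = days_until_overflow(32)  # unsigned 32-bit: twice the range

# One tick past the largest positive value lands on the most negative value.
after_last_tick = to_int32((2 ** 31 - 1) + 1)
```

Here `signed_days` comes out at about 248.55 (roughly 5965 hours), `unsigned_days` at about 497.1, and `after_last_tick` is -2147483648.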

roulishollandais 5th May 2015 01:33


Originally Posted by Peekay4
Boeing probably had little to do with this bug. The affected GCUs would have been supplied by a third-party company.

And that third-party company probably used a Real Time Operating System (RTOS) supplied by yet another company.

My guess is this integer overflow is probably in the RTOS or related code. The bug might have been discovered in some completely unrelated software (maybe not even aviation software) using the same RTOS.

If you use software from a third party, you need not only the software or the RTOS but all of its documentation and the complete test data. The supplier of the RTOS or the software may have designed it for a toy, but Boeing uses it for an aircraft.
The certifiers are at fault too: they have to verify that the documentation and test data are there and that the tests have actually been run after implementation. It seems easy to ask a third party to share the work, but in fact you have to verify all the links.
To be sure the work is done, you pay only when you have received everything and it is OK. Everyone must sign off his work as complete. Certifiers should not have certified the B787 before all the tests were done and on the table.

TURIN 6th May 2015 10:16

Back in the real world...

It takes about 20 minutes to downpower and reboot the a/c. Not good on a quick turn round but if the a/c has just come out of the shed after an A check, no big issue.
It is common practice to park the a/c without power if it is not required for several hours.

Just another card on the check.

vapilot2004 6th May 2015 12:10

The 'lazy' certification issue RH mentions is truer today than ever before. Regulators' growing reliance on manufacturer-designed testing regimes for airborne computer systems has had the odd chicken coming home to roost in recent times (the past few decades).

This lack of complete knowledge of the widest operational range (extremes/faulty sensors/etc) at the confluence of hardware/software interface has the potential for the occasional dicey consequence - particularly after human factors are added into the melange.

roulishollandais 10th May 2015 16:41

integer overflow
 
If you don't want to read the whole of that report:

Originally Posted by Ariane 501 full report (12 pages only to read)
The internal SRI software exception was caused during execution of a data conversion from 64-bit floating point to 16-bit signed integer value. The floating point number which was converted had a value greater than what could be represented by a 16-bit signed integer. This resulted in an Operand Error.

The data conversion instructions (in Ada code) were not protected from causing an Operand Error, although other conversions of comparable variables in the same place in the code were protected.

The error occurred in a part of the software that only performs alignment of the strap-down inertial platform. This software module computes meaningful results only before lift-off. As soon as the launcher lifts off, this function serves no purpose.

The alignment function is operative for 50 seconds after starting of the Flight Mode of the SRIs which occurs at H0 - 3 seconds for Ariane 5. Consequently, when lift-off occurs, the function continues for approx. 40 seconds of flight. This time sequence is based on a requirement of Ariane 4 and is not required for Ariane 5.

The Operand Error occurred due to an unexpected high value of an internal alignment function result called BH, Horizontal Bias, related to the horizontal velocity sensed by the platform. This value is calculated as an indicator for alignment precision over time.

The value of BH was much higher than expected because the early part of the trajectory of Ariane 5 differs from that of Ariane 4 and results in considerably higher horizontal velocity values.
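For readers who want to see what "protecting" such a conversion means, here is a rough sketch. It is Python, not the Ada of the actual SRI software, and the function names and sample values are invented for illustration; the real flight code and the real BH values are not reproduced here.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(x):
    """Convert a float to a 16-bit signed range, raising if it does not fit."""
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in a 16-bit signed integer")
    return n


def to_int16_saturated(x):
    """Convert a float to a 16-bit signed range, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))


in_range = to_int16_checked(123.9)          # fits, gives 123
clamped_high = to_int16_saturated(40000.7)  # too large, clamps to 32767
clamped_low = to_int16_saturated(-1e9)      # too small, clamps to -32768
```

Whether to raise, saturate, or do something else entirely is a design decision for each variable; the point of the report is that the decision has to be made deliberately.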


blackbeard1 10th May 2015 18:51

Ariane 501 and Cluster
 
There is no such thing as a "free launch" or lunch, I was involved in Cluster and almost 10 years of my life went up in smoke.
Cluster (spacecraft) - Wikipedia, the free encyclopedia

roulishollandais 12th May 2015 10:23

@blackbeard1
10 years of your life up in smoke because of that crazy overflowing bit and the wider malfunction of that rocket project! Condolences!

Boeing will probably find other things to take care of…:ugh:

atakacs 12th May 2015 11:21

Wasn't Cluster II re-launched a few years later, and isn't it still happily operating?

blackbeard1 12th May 2015 12:21

Cluster
 
Cluster was rebuilt and launched from Baikonur and is still working and giving good scientific results, as you said. I am now retired, as are most of the original team, sadly some have died but it is good to know that the original design and objectives are still giving good scientific results.

ESA Science & Technology: Cluster

roulishollandais 22nd May 2015 00:38

Thank you blackbeard1 for that wonderful Cluster and link.
I had the pleasure of learning more about the aurora borealis and the latest studies.

Sunamer 22nd May 2015 08:16

"32bit signed value used as a counter running at 100Hz?"

I was wondering, why would you need to have a signed value if it is a simple counter...
Unsigned one would give twice the range of the signed one... 248*2 = 496 days, more than a year. :}

"If your aircraft was powered for more than a year, don't forget to power cycle it..." kind of thing

The reaction to this non issue from outlets like CNN was just... :yuk:

EEngr 22nd May 2015 15:34


I was wondering, why would you need to have a signed value if it is a simple counter...
Because signed integer math and conditional logic can give you positive/negative interval values. As in one event occurred before or after another. And there may be places in the code where this would be expected.:8
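A small sketch of that idea, with 32-bit wraparound emulated in Python. The names `to_int32` and `tick_diff` are invented for the example, and nothing here is claimed to be how the GCU code works.

```python
def to_int32(n):
    """Emulate signed 32-bit two's-complement wraparound."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def tick_diff(a, b):
    """Signed number of ticks from event b to event a.

    Positive means a happened after b, negative means before. The answer
    stays correct across a counter rollover as long as the two events are
    less than 2**31 ticks apart.
    """
    return to_int32(a - b)


before_wrap = 0xFFFFFFFB  # counter value 5 ticks before it rolls over
after_wrap = 0x00000005   # counter value 5 ticks after it rolled over

later = tick_diff(after_wrap, before_wrap)    # +10: after_wrap is the later event
earlier = tick_diff(before_wrap, after_wrap)  # -10: before_wrap is the earlier event
```

So the signed type is not wasted: it is what lets the difference carry a before/after meaning.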

dClbydalpha 24th May 2015 09:43


non issue
Sorry but this is anything but a non-issue. Looking at the information in the public domain, this is a systematic design failure.

1. The GCU control system fails after ~7000 hours.
2. It is a common mode failure so no credit can be given to multiple systems.
3. The failure leads to loss of all AC.
4. Loss of all AC is at least HAZARDOUS.

Therefore a target of 1x10^-7 has to be met by a design struggling to meet 1x10^-4.
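The back-of-envelope numbers behind that comparison, as a sketch. It assumes both figures are rates per hour of continuous power-on and treats the ~7000 hours from point 1 as a guaranteed time to failure; the names are made up for the example. (248.55 days is nearer 5965 hours, which gives the same order of magnitude.)

```python
HOURS_TO_FAILURE = 7000.0  # "~7000 hours" from point 1 above
TARGET_PER_HOUR = 1e-7     # target quoted in the post, assumed to be per hour


def average_failure_rate(hours_to_failure):
    """Average failures per hour if the unit is certain to fail by hours_to_failure."""
    return 1.0 / hours_to_failure


def shortfall_factor(hours_to_failure, target_per_hour):
    """How many times worse than the target the average rate is."""
    return average_failure_rate(hours_to_failure) / target_per_hour


rate = average_failure_rate(HOURS_TO_FAILURE)             # about 1.4e-4 per hour
factor = shortfall_factor(HOURS_TO_FAILURE, TARGET_PER_HOUR)  # about 1400
```

In other words, on these assumptions the design is roughly three orders of magnitude away from the target.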

Firstly the overflow error should be trapped at source. It adds complexity to design, but it needs to be done in safety critical systems.
Secondly it appears the safety analysis has not fully analysed all the software failures ... if the software design process guidelines for safety critical systems had been followed then this should have stood out like a sore thumb. This is the kind of thing that happens when people use the analysis from old designs, without re-validating the original assumptions against the new design.

In mechanical terms, if a fastener repeatedly loosens in flight there is something wrong. It is not acceptable to say that it didn't come totally undone, so as long as we tighten it up each time it is OK; the fastener should be redesigned.

I have not seen a statement from Boeing that denies any of the 4 assumptions I have made, but I emphasise that I have no detailed knowledge so this is based only on the public domain information ... but based on that it really worries me, because it isn't a "bug" it is a systematic failure.

roulishollandais 24th May 2015 17:51

dClbydalpha,
I agree with that.
Once again, learn the lesson from the Ariane501 report: not only is it easy to avoid integer overflow by being very methodical, but the report showed that a long list of other failures led up to the fatal 37th second. Fixing any one item on that list would have avoided the destruction of the rocket.

DeafOldFart 24th May 2015 22:06

Er..... how about tacking another 32 bit address on, to make it 64 bit count.... should take us into intergalactic durations....
Or running a slower clock speed, like the 1 kHz machines I cut my teeth on...

roulishollandais 25th May 2015 08:44

Hello msbbarratt,
Excellent post ! :)

In which case the spec was junk
In the case of Ariane501 somebody said the spec said that in case of double IRS failure stop the trajectory calculation... So that spec was not very smart !

And we often read on PPRuNe "It worked as designed".
The difficulty for the IT analyst is to guess where something could be missing or wrong in the specification! And we have to warn the people who are building the spec: "That could happen, do you want to accept that?" because we know the hidden side of the system and architecture that the final boss is not aware of (like DeafOldFart suggesting to replace the B787's overflowed 32-bit integer with a 64-bit integer or to modify the frequency :})

Let us hope this is the cheapest case for Boeing, but it probably will not be, as the certifiers skipped over the bug too..:{

dClbydalpha 25th May 2015 10:53

I have just finished reading the linked item below.

http://www.faa.gov/about/plans_repor...port_final.pdf


As suspected, the usual observations are there. Lack of ownership of requirements, inadequate v&v coverage and use of previous design experience without re-validating the design assumptions.

Nothing new, and that is what concerns me. Not disastrous as an individual item, no outright condemnations, but as the report shows that the GCUs were a deep-dive item, the process seems to be struggling to manage the complexity and nature of these next-generation projects. In this case the inevitable system-level impact of a low-level design decision was not spotted, perhaps due to the number of responsibility boundaries that had to be crossed.

Uplinker 25th May 2015 11:20

Forgive me because I am not a software programmer, but any airborne safety critical system - such as a GCU - that is required to work should not be even slightly open to being compromised or shut down by just a clock, or a clock malfunction.

The GCUs in this case do not fail, they are switched off because a clock says so. What does a mere clock know about the generator load, the CSD oil temperature and pressure, the serviceability of the other electrical systems in the network etc?

To have a healthy system shut down because a mere timer or a timer fault says so is crazy!!

How was it ever allowed to be designed this way?

EEngr 25th May 2015 14:53


However this all seems to have been some sort of surprise, and it shouldn't be.
This is the primary problem as I see it. The fact that the spec/design/test process appears to have a large hole in it through which this bug slipped needs to be investigated further.

The whole GCU reset every 248 days, by itself, is a non issue. That (like many other maintenance items) can easily be taken care of once the issue is known and few people would care. Some might. Every maintenance step, no matter how trivial, incurs a cost to document and track at the operator's expense. So even one extra check box would raise a few questions. Particularly if they understood how trivial the fix would have been back in the design stage.

But what with the industry's increasing reliance on manufacturers' self-certification and the regulators' hesitance to question anything process-related within a company, I'm not hopeful that other bugs haven't slipped through as well.

roulishollandais 25th May 2015 17:52

Thank you dClbydalpha

roulishollandais 25th May 2015 18:01

EEngr
The issue may be very different depending on whether your system is analogue (Concorde) or digital (Ariane5, Airbus320 family, B787).
In the former you may have saturation; in the latter, unknown consequences of a carry/overflow indicator, like the destruction of the rocket (8 billion FF) for an unused variable BH.

roulishollandais 26th May 2015 06:53

"Tout va très bien, Madame la Marquise"!

Although some people had retired, other teams had been working on Ariane4 V33 and on Ariane5... But they were focused on terrorism instead of science! They did not have a sufficient level of IT knowledge :mad: and were pursuing hidden geopolitical aims... :suspect:

There was confusion between the fact that both IRS do not work and how that diagnosis is made: with a double crazy carry, followed by a long list of failures and a loss of rigor with an excess of optimism, trusting the first positive statistical results instead of seeking the best proof.

Uplinker 26th May 2015 06:55

Clock, counter, whatever.

My point remains the same. We simply cannot have safety critical and perfectly functional systems shutting down because of mere "housekeeping trivia". This needs to be addressed. Safety critical systems should never be shut down by mere admin processes.

If it overheats: maybe. If the oil pressure drops: maybe. If it over speeds: yes. But an overflowing clock/counter? Definitely not!

I am a current line pilot, and although I am not a software programmer, I have written simple software programs, so I know all too well that a computer will very literally only do what you tell it to. It will not do what a human would do. It will not make assumptions or "know" the consequences of its actions or non actions. Something as important as a main generator should not be subject to anything more than a simple logic network which keeps it operational as long as its basic parameters remain within limits.

SAMPUBLIUS 26th May 2015 14:29

Computers are Super-Fast Idiots
 

so I know all too well that a computer will very literally only do what you tell it to. It will not do what a human would do. It will not make assumptions or "know" the consequences of its actions or non actions.
Amen Amen.

To err is human- to really screw up takes a computer.

The above comment is/was the point of my initial post in this thread.
Other comments along that line also apply.

There should/must be NO way an ' administrator ' should be able to shut down a critical system without recourse. PERIOD:mad:

EEngr 26th May 2015 15:55


Something as important as a main generator should not be subject to anything more than a simple logic network
Good luck with that. Modern aircraft have electrical systems far too complex to operate for a 'simple logic network'. And airlines are not going back to the days of a flight engineer with a panel full of gauges and switches. 'Software' is the only practical way of controlling and reconfiguring such a system to account for generators going on or off line and bus reconfiguring for various external power or autoland configurations.

What we need are sound software development processes that catch these simple kinds of mistakes and get them fixed. Or at least exposed to examination before a product is put into service. This isn't a big deal in the embedded s/w world. The RTOS (Real Time Operating Systems) vendors have been producing libraries that handle such trivial things for years. In everything from my TV set to a controller in a nuclear power plant. My question is: Who has the clout to hold Boeing's feet to the fire to adopt such processes?

roulishollandais 26th May 2015 17:36


vendors have been producing libraries that handle such trivial things for years. In everything from my TV set to a controller in a nuclear power plant. My question is: Who has the clout to hold Boeing's feet to the fire to adopt such processes?
Don't dream, other sectors are not perfect! Fukushima is a disaster where for years they refused to heed the warnings of some hydrologists who said that water was the first threat to the plant!
Where a fault can be made, somebody will eventually make it.
We have to learn from our faults, sins and other mistakes...

Radix 26th May 2015 20:45

..........

peekay4 26th May 2015 21:03

There is no such thing as a perfect process or a perfect system. And furthermore, expecting (or depending on) perfection is the wrong thing to do, because it is unrealistic.

In fact, during certification of (new) aircraft, there is an acknowledgement that some defects will remain.

Hence, defects such as these -- while they should have been caught -- are not indicative of a process breakdown, certification breakdown, etc., but simply a reflection of reality.

The effects of any potential defect, however, should not be catastrophic. So what should be expected is a "graceful degradation" when failures do occur.

Actually a better analogy might be "defense in depth" used in security practice -- having multiple layers so that even a complete failure of one layer does not bring down the entire system.

The real question is then: even given a quadruple GCU failure taking down all four AC busses (due to this bug or some other malfunction) -- will that crash a 787?

Someone more familiar with 787s can correct me, but I think the answer is generally NO, as there is still the DC bus, which will automatically run from batteries before the ram air system kicks in (or possibly from the APU as well).

DozyWannabe 26th May 2015 23:49

Time for a bit of a reality check, I feel...


Originally Posted by roulishollandais (Post 8963377)
No excuse after Ariane501 crash and report.NO !

In all fairness, the Ariane 501 scenario is a completely different kettle of fish from what we're talking about here. The former was a case of a hard-coded logical error involving number format translation and bit-depth conversion, whereas the latter was a case of integer counter overflow - however the crucial difference is that the former error occurred in a part of the program which was always expected to be executed, whereas the latter is very much an edge case (i.e. a scenario which is unlikely to occur in the real world). In practical terms we're talking about a scenario in which the aircraft has not once been in a "cold and dark" state for two-thirds of a year (ref: TURIN at post 54).


Originally Posted by cattletruck (Post 8963384)
Congratulations to the tester that found the bug. Good testers think outside the box as this one had done...

Heh - I very much doubt that it was a single tester. Real-time software testing works rather differently from other disciplines. I suspect that it would more likely have been part of a suite of edge-case regressions intended to be added from the start.


Once fixed it's going to take another 248 days to re-run the test.
Nope, far more likely that the testing suite can increment the counter at any rate desired. :) Remember, it's not the counter itself that is the root of the issue as much as it is the dependent systems' ability to interpret the rollover correctly.
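As a rough illustration of how such a test might avoid the 248-day wait, the counter can simply be preset to just below its limit. Everything here (`TickCounter`, `naive_elapsed`, `wrap_safe_elapsed`, the Python emulation of 32-bit wraparound) is a made-up sketch, not the actual test suite or GCU code.

```python
def to_int32(n):
    """Emulate signed 32-bit two's-complement wraparound."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


class TickCounter:
    """A signed 32-bit tick counter whose start value a test can inject."""

    def __init__(self, start=0):
        self.value = to_int32(start)

    def tick(self):
        self.value = to_int32(self.value + 1)
        return self.value


def naive_elapsed(start, now):
    """Elapsed ticks with no thought given to rollover."""
    return now - start


def wrap_safe_elapsed(start, now):
    """Elapsed ticks computed modulo 2**32, so a rollover is harmless."""
    return to_int32(now - start)


# Start three ticks short of the largest positive value, then run five ticks.
counter = TickCounter(start=2 ** 31 - 3)
start = counter.value
for _ in range(5):
    counter.tick()

naive = naive_elapsed(start, counter.value)     # hugely negative after rollover
safe = wrap_safe_elapsed(start, counter.value)  # 5, as expected
```

The rollover is reached in five iterations instead of eight months, and the difference between the two "dependent" calculations shows up immediately.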


Originally Posted by Guptar (Post 8963617)
If all modern computers are coded in 64 bit sizes, why did Boeing stick with 32 bit.

Home/business computing and real-time/safety-critical computing are entirely different worlds. I'm not going to go into detail now, but it's worth pointing out that safety-critical systems tend to use obsolete hardware because of its proven nature and significantly lesser complexity. (Engineering maxim : more complexity means more things that can go wrong). More to the point, using a 64-bit signed integer would just have kicked the "can" (problem) down the road.


Originally Posted by msbbarratt (Post 8988412)
2) the specification defines an up time consistent with normal aircraft operations, < 248 days, in which case the software would have to have been tested against it before it was certified

3) as per 2) but someone has also taken the trouble to go beyond the spec in their testing and discovered the true system up time.

My "money" would be on this.


However this all seems to have been some sort of surprise, and it shouldn't be. It should be there in black and white in the paperwork. And it may well be the case that it is all written down in the right place, but that someone else simply hasn't read it. I'm expecting that to be the case, actually.
Possibly - I was thinking that they were applying additional "layers" of edge case testing based on the likelihood of the scenario occurring as development time became less critical.


Originally Posted by msbbarratt (Post 8989650)
We can't even keep a GCU running for 249 days.

Er, I'd argue not only that we can, but also that we just did - by applying standardised software reliability metrics and techniques that we've been developing and perfecting for decades.

As another software person, I'm also well aware of the limitations you're talking about - but we're not talking about the same kind of inherently dynamic logic required for a "self-driving" car or a fully-automated aircraft here, we're talking about bog-standard systems monitoring logic behaviour in scenarios which are extremely unlikely to occur in the real world.


Originally Posted by EEngr (Post 8990217)
What we need are sound software development processes that catch these simple kinds of mistakes and get them fixed.
...
Who has the clout to hold Boeing's feet to the fire to adopt such processes?

Again, I'd argue that the very fact we're discussing this now means that Boeing (and/or their subcontractors) already have those processes in place. We're not talking about a glaring software mistake that slipped through the cracks, it's far more likely to be a missed edge-case in the specification - and the reason it wasn't covered until now is precisely because we're talking about an extremely unlikely real-world scenario. As I said above, we're in the realms of a hypothetical scenario in which the aircraft has not been powered down ("cold and dark") for *eight months*. Furthermore that each of the power units were brought online around the same time and all of them were kept running for the entirety of those eight months.

As you quite rightly state, modern aircraft systems are incredibly complex these days, and it's therefore much more sensible to focus testing on the most likely scenarios first and then adding layers of testing for less likely scenarios as the development and lifecycle of the product continues.

Don't get me wrong, this was undoubtedly an "oops" - I'm sure that several people who worked on these systems are now a little wiser and will swear to be more thorough in their work for the rest of their lives. Nevertheless, it's important that we all try to retain a little bit of perspective!

[I'd also be willing to bet money that this would barely have troubled the media had a few journalists not been fishing for B787 issues in the wake of the battery problems...]

