Boeing 737 Max Recertification Testing - Finally.

Old 25th Mar 2023, 09:51
  #1001 (permalink)  
 
Join Date: Dec 2019
Location: OnScreen
Posts: 415
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by MechEngr
.....
In hindsight it was a terrible idea to allow the AoA system to produce valid, but incorrect AoA data. In hindsight it was terrible the FAA approved that.
Sure, though that was 50+ years ago. Such a concept was completely normal in those days and was inherited with the original airframe, etc.

This is one of the many, many reasons the B737 should never have reached the MAX version. It looks like hindsight, though a proper engineering company would have understood that the whole B737 design had become obsolete and should not have been given another life extension.
WideScreen is offline  
Old 25th Mar 2023, 10:04
  #1002 (permalink)  
 
Join Date: Dec 2019
Location: OnScreen
Posts: 415
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by MechEngr
....
At the time of ET-302 it wasn't a latent defect. It was clearly exposed.
No, it wasn't.

The instructions only worked when it was immediately recognized what the issue was. After that, the airplane was doomed.

The Boeing instructions did not even consider the aspect of delayed recognition of the problem.

You know, there are reasons why the B737MAX was grounded for 18+ months (and in China 2+ years):

THERE ARE NO SUITABLE INSTRUCTIONS POSSIBLE to realistically overcome the AoA vane mishap.

IIRC later on, in a flight simulator, experienced pilots had to react within 4 seconds after an AoA vane mishap to be able to save the aircraft.
WideScreen is offline  
Old 25th Mar 2023, 10:39
  #1003 (permalink)  
fdr
 
Join Date: Jun 2001
Location: 3rd Rock, #29B
Posts: 2,944
Received 847 Likes on 251 Posts
Originally Posted by WideScreen
No, it wasn't.

The instructions only worked when it was immediately recognized what the issue was. After that, the airplane was doomed.

The Boeing instructions did not even consider the aspect of delayed recognition of the problem.

You know, there are reasons why the B737MAX was grounded for 18+ months (and in China 2+ years):

THERE ARE NO SUITABLE INSTRUCTIONS POSSIBLE to realistically overcome the AoA vane mishap.

IIRC later on, in a flight simulator, experienced pilots had to react within 4 seconds after an AoA vane mishap to be able to save the aircraft.

We may need to go back to the AOM and AD to confirm the exact wording. However, the system logic, flawed as it was, would not have been unrecoverable if the crews had been given enough information to counter a runaway trim in the first instance with a manual trim input, to ensure that the trim returned to a normal range before the STAB CUTOUT SW....... CUTOUT was selected. If the first action was to go to CUTOUT before putting the aircraft into an "in trim" condition, then, yes, it was possible that the speed, trim and residual elevator authority, in conjunction with the minimal torque available from the manual trim wheel in a severe out-of-trim case, would result in a stabiliser that would overpower the manual trim system unless the elevators were unloaded, a technique which is pretty exciting for passengers and pilots alike as they see the world big in their windows. IIRC, the AD included comments related to being in trim, but did not at any time expand on the criticality of that action, or on the industry's awareness of the limitations of a '60s-accepted trim architecture that barely had a fully compliant backup. In the absence of knowledge about out-of-trim v manual trim torque constraints, this was an ill-considered document, and it had the potential to result in a bad outcome. Blaming the ET302 pilot for merely being a pilot and not being Tex Johnston is hardly the standard of excellence of the Old Boeing, pre contamination with MDD management. "The New Boeing" is the one that sacked QA managers for doing their job, that managed systems that resulted in fasteners on the B787 being a different size on the east coast and west coast... (Coriolis?), that gave us MCAS and the KC46 debacles, VOL I and VOL II, and generally messed up a proud company... yeah.

The weasel words applied did not make it easy for a crew confronted with a change that was very recent and which had not been trained or explained in depth to the flight crew.

The evidence is that of the 3 events, the one the sector prior to Lion Air's splash, and the ET302 one, where 2 of those beat the driver, and one driver set had the new instructions, it would seem to have been quite reasonable, nay, necessary to go back and sort it out at fort fumble.

All corporate management of TBC since 1995 bears responsibility for the damage done to the engineering reputation of that company and should be held accountable. Their actions damaged shareholder value, and there was repeated evidence that they were heading into the weeds with their myopic management practices.

fdr is offline  
Old 25th Mar 2023, 13:59
  #1004 (permalink)  
 
Join Date: Dec 2019
Location: OnScreen
Posts: 415
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by fdr
We may need to go back to the AOM and AD to confirm the exact wording. However, the system logic, flawed as it was, would not have been unrecoverable if the crews had been given enough information to counter a runaway trim in the first instance with a manual trim input, to ensure that the trim returned to a normal range before the STAB CUTOUT SW....... CUTOUT was selected. If the first action was to go to CUTOUT before putting the aircraft into an "in trim" condition, then, yes, it was possible that the speed, trim and residual elevator authority, in conjunction with the minimal torque available from the manual trim wheel in a severe out-of-trim case, would result in a stabiliser that would overpower the manual trim system unless the elevators were unloaded, a technique which is pretty exciting for passengers and pilots alike as they see the world big in their windows. IIRC, the AD included comments related to being in trim, but did not at any time expand on the criticality of that action, or on the industry's awareness of the limitations of a '60s-accepted trim architecture that barely had a fully compliant backup. In the absence of knowledge about out-of-trim v manual trim torque constraints, this was an ill-considered document, and it had the potential to result in a bad outcome. Blaming the ET302 pilot for merely being a pilot and not being Tex Johnston is hardly the standard of excellence of the Old Boeing, pre contamination with MDD management. "The New Boeing" is the one that sacked QA managers for doing their job, that managed systems that resulted in fasteners on the B787 being a different size on the east coast and west coast... (Coriolis?), that gave us MCAS and the KC46 debacles, VOL I and VOL II, and generally messed up a proud company... yeah.
Yep, it WAS recoverable when diagnosed properly right away (i.e. knowing upfront what was going to happen, as in the yearly sim session). Diagnose fast and react fast, and the official Boeing procedure would work. Be less than 100% prepared (and so react slowly), and things get hairy. When the AP drops out, together with the cockpit cacophony, things get confusing right away, and it is pretty likely that an out-of-trim situation will not be the first priority to correct (normally it is only a nuisance, though one requiring attention to get a proper trim balance). And once you have 20 kg of out-of-trim yoke force, it takes a lot of trim time to get back to neutral. Combine that with the awkward fact that the B737 easily goes way out of trim in the other direction (Rostov-on-Don), and it is certainly understandable that all but very experienced pilots hesitate to keep trimming (the Lion Air 2nd attempt, and maybe ET302 too).

We should not forget that a Parkinson-like trim usage would have saved the show, though that was left out of the Boeing documentation, maybe because it was unknown, or just because it would have brought a significant change in the trim computer to light and required the additional training?

Originally Posted by fdr
The weasel words applied did not make it easy for a crew confronted with a change that was very recent and which had not been trained or explained in depth to the flight crew.

The evidence is that of the 3 events, the one the sector prior to Lion Air's splash, and the ET302 one, where 2 of those beat the driver, and one driver set had the new instructions, it would seem to have been quite reasonable, nay, necessary to go back and sort it out at fort fumble.
?

Originally Posted by fdr
All corporate management of TBC since 1995 bears responsibility for the damage done to the engineering reputation of that company and should be held accountable. Their actions damaged shareholder value, and there was repeated evidence that they were heading into the weeds with their myopic management practices.
Yep, how to make a company great again: get Trump-style leadership!
WideScreen is offline  
Old 25th Mar 2023, 16:55
  #1005 (permalink)  
Psychophysiological entity
 
Join Date: Jun 2001
Location: Tweet Rob_Benham Famous author. Well, slightly famous.
Age: 84
Posts: 3,263
Received 24 Likes on 15 Posts
Being an old retired guy, I had the time to read in on every post in PPRuNe's threads (plural). It was a long haul and would take a month . . . or two, to summarise.

Just a few memories to ponder, in no particular order:-

I found what might be the only mention of MCAS in a South American pilot's handbook. Four or five shortened lines on a right hand page. It was found by chance. When I posted it on PPRuNe folk looked at my link, but as I recall, no one had found another reference to MCAS world-wide. (as of back then)

The Ethiopian captain might well have been more affected by the chaos having had minimal and vague explanations of a mysterious system. One thing that would be burning into his brain would have been that the other aircraft crashed. This is not just an idle response to an above post, but something I pursued at length back then. I still doubt that any instruction in that 5 months was equivalent to part of a type rating written.

"it's as though STS is working in reverse". An odd quote - more indicative of the pilot's state of mind than reasoned systems analysis.

That nine second MCAS pitch down run.

After some considerable time, Sully's quote. "That could have claimed me."

My own self-opinionated original thoughts . . . slowly weighed down by the vivid descriptions of chaotic sights and sounds. Memories of how distracting 20 mins of stick-shaker had been for me. Just the stick-shaker, everything else spot on normal. Later, I was astonished at how it had soaked into my brain.

World wide lack of awareness about the Toronto 707 hand cranking - and how close it had been to disaster. And now the 47' horizontal stabilizer has to be cranked by a wheel with a smaller radius. This is not a linear burden.

Not our members of course, but an almost world-wide lack of understanding about losing the Pickle Switch function after switching the two switches that all good pilots would have switched - and doing it in a microsecond.

For weeks on Quora I posted much of what I'd learned on PPRuNe. I had to be careful, for some hours there was just me, thousands of hits world wide. Some Boeing skippers let us know how American pilots would have done it. Soon everyone and their uncle was a Boeing MAX instructor. The point of all this is the confusion. I'd take an hour to write a few lines, yet still manage to confuse someone. Good reporting certainly deserved that Pulitzer Prize.


Loose rivets is online now  
Old 26th Mar 2023, 12:24
  #1006 (permalink)  
 
Join Date: Dec 2003
Location: Tring, UK
Posts: 1,823
Received 2 Likes on 2 Posts
What struck me at the time was that there wasn’t an immediate trigger for action in the way the fault presented itself. The 737 trim is active all the time during flight; in fact it is unusual for the trim wheels to *not* be in motion for any length of time. STS, MCAS, config changes, CofG changes, etc.

The Boeing checklist trigger for trim runaway at the time was “continuous uncommanded trim motion”, which guards against an electromechanical runaway, but that wasn’t what happened - it was a software failure that only moved the trim under certain circumstances. How could you tell the difference between, say, STS doing its job and an MCAS failure? The answer is, in the short term you couldn't, and abnormal operation appeared the same as normal operation unless you had a long diagnosis period, by which time it was too late.
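
To make the point concrete, here is a minimal sketch in Python. It is not Boeing's logic or any real checklist criterion: the names, the 15-second threshold and the pause lengths are all made up for illustration, and only the roughly nine-second burst comes from figures mentioned in this thread. It shows how a trigger worded as "continuous" uncommanded motion can stay silent when the motion comes in bursts that the pilot keeps interrupting with the trim switch.

```python
from dataclasses import dataclass


@dataclass
class TrimSample:
    t: float         # time in seconds
    moving: bool     # stabiliser trim is in motion
    commanded: bool  # pilot is holding the yoke trim switch


def make_trace(pattern, dt=1.0):
    """Expand (duration_s, moving, commanded) segments into per-dt samples."""
    samples, t = [], 0.0
    for duration, moving, commanded in pattern:
        for _ in range(round(duration / dt)):
            samples.append(TrimSample(t, moving, commanded))
            t += dt
    return samples


def continuous_runaway(samples, threshold_s):
    """True if uncommanded trim motion ever runs unbroken for threshold_s."""
    run_start = None
    for s in samples:
        if s.moving and not s.commanded:
            if run_start is None:
                run_start = s.t
            if s.t - run_start >= threshold_s:
                return True
        else:
            run_start = None  # any pause or pilot input resets the clock
    return False


# Hypothetical figure: how long before a crew would call motion "continuous".
THRESHOLD_S = 15.0

# Classic electromechanical runaway: trim runs and nothing stops it.
stuck = make_trace([(30, True, False)])

# Intermittent pattern (assumed): ~9 s uncommanded burst, pilot trims back,
# a short quiet period, then the cycle repeats.
intermittent = make_trace(
    [(9, True, False), (5, True, True), (5, False, False)] * 4
)

print(continuous_runaway(stuck, THRESHOLD_S))         # True
print(continuous_runaway(intermittent, THRESHOLD_S))  # False
```

In the second trace the trim has been moved uncommanded for 36 seconds in total, yet the "continuous" test never fires, because each burst ends before the threshold and every pilot input resets the clock.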
FullWings is online now  
Old 26th Mar 2023, 13:07
  #1007 (permalink)  
Psychophysiological entity
 
Join Date: Jun 2001
Location: Tweet Rob_Benham Famous author. Well, slightly famous.
Age: 84
Posts: 3,263
Received 24 Likes on 15 Posts
I wish I'd written that.
Loose rivets is online now  
Old 26th Mar 2023, 18:37
  #1008 (permalink)  
 
Join Date: Oct 2019
Location: USA
Posts: 815
Received 151 Likes on 82 Posts
Originally Posted by FullWings
What struck me at the time was that there wasn’t an immediate trigger for action in the way the fault presented itself. The 737 trim is active all the time during flight; in fact it is unusual for the trim wheels to *not* be in motion for any length of time. STS, MCAS, config changes, CofG changes, etc.

The Boeing checklist trigger for trim runaway at the time was “continuous uncommanded trim motion”, which guards against an electromechanical runaway, but that wasn’t what happened - it was a software failure that only moved the trim under certain circumstances. How could you tell the difference between, say, STS doing its job and an MCAS failure? The answer is, in the short term you couldn't, and abnormal operation appeared the same as normal operation unless you had a long diagnosis period, by which time it was too late.
STS tries to ensure that the trim load is zero. This is why the Lion Air crew reported of MCAS "STS is running backwards": it was adding to the trim load and not making it go away. The fact that an unexpected 10, 20, 30, 40, 50, 60 pounds of trim load was on the wheel is enough to tell there is a trim problem, and using the wheel trim switch to counter the trim load occurred to the first Lion Air crew and to the captain of the second Lion Air crew, who apparently thought using it was obvious enough that he didn't mention it to the First Officer.

How would the crew know there was an electromechanical failure? Do they rip the wiring apart looking for the short circuit before turning off the trim switches? How long is "continuous?" STS doesn't run at top speed for 30 solid seconds, which is more than enough to put 100 pounds on the wheel. Trim will stop at the upper or lower limits of travel, so by definition it cannot be "continuous." I had a recent electrical issue in my house - power would cut out and come back on - from a loose wire at the distribution transformer waving in the breeze and sometimes making a short circuit to ground. If a similar situation happened, an intermittent but interfering trim problem caused by a wiring defect, say chafing or a loose bit of solder in a trim switch, would that also be a hands-up, cannot-be-solved situation?
MechEngr is online now  
Old 26th Mar 2023, 18:54
  #1009 (permalink)  
 
Join Date: Jun 2002
Location: Saigon SGN/VVTS
Posts: 6,625
Received 58 Likes on 42 Posts
The 737 trim is active all the time during flight; in fact it is unusual for the trim wheels to *not* be in motion for any length of time.
That was the one thing that surprised me many years ago when I had a jumpseat ride in a 200 - the almost continual noisy motion of the trim wheels.
India Four Two is online now  
Old 26th Mar 2023, 19:40
  #1010 (permalink)  
 
Join Date: Dec 2003
Location: Tring, UK
Posts: 1,823
Received 2 Likes on 2 Posts
Originally Posted by MechEngr
STS tries to ensure that the trim load is zero. This is why the Lion Air crew reported of MCAS "STS is running backwards": it was adding to the trim load and not making it go away. The fact that an unexpected 10, 20, 30, 40, 50, 60 pounds of trim load was on the wheel is enough to tell there is a trim problem, and using the wheel trim switch to counter the trim load occurred to the first Lion Air crew and to the captain of the second Lion Air crew, who apparently thought using it was obvious enough that he didn't mention it to the First Officer.

How would the crew know there was an electromechanical failure? Do they rip the wiring apart looking for the short circuit before turning off the trim switches? How long is "continuous?" STS doesn't run at top speed for 30 solid seconds, which is more than enough to put 100 pounds on the wheel. Trim will stop at the upper or lower limits of travel, so by definition it cannot be "continuous." I had a recent electrical issue in my house - power would cut out and come back on - from a loose wire at the distribution transformer waving in the breeze and sometimes making a short circuit to ground. If a similar situation happened, an intermittent but interfering trim problem caused by a wiring defect, say chafing or a loose bit of solder in a trim switch, would that also be a hands-up, cannot-be-solved situation?
I am trying to point out that the initial symptoms of the MCAS failure were, for all intents and purposes, so similar to normal operation of the trim system that it wouldn’t immediately trigger a SOP disconnect of power to the stabiliser. It ran for a bit, then stopped, then did some more, and was able to be countered by use of the manual trim switches. None of that screams “runaway” until you look at it post-hoc with system knowledge that was not disseminated to line crews at the time.

A (plausible) electromechanical failure would be when nothing you can do with the normal flight deck controls can stop the trim running in a particular direction, so swift intervention is necessary before it goes to the stops. On my current type (777) you get a warning as soon as the monitoring picks this up. If you disconnect the trim every time it moves automatically, you’d do it shortly after takeoff on every flight. There is no indication to the pilots as to whether it’s MCAS, STS or even the other pilot doing the trimming, apart from the speed, and that doesn’t really help much; an intermittent fault would, again, look like normal operation until it really showed its hand.

In a critical, high workload phase of flight, near the ground, experiencing something novel that doesn’t easily categorise and requires cognition and an accurate mental systems model (not present, through no fault of the pilots) to diagnose would confuse even experienced operators. That’s why we use rule-based behaviour for Time Critical Events, such as RTO, GPWS, Windshear and Trim Runaway, but these are triggered by specific criteria which are learnt and practiced by rote because there is not time for pontification. Sadly, I think the accident crews never really got beyond the startle/react phase as there were too many audible, tactile and mental distractions to allow much in the way of a diagnostic loop to develop.
FullWings is online now  
Old 26th Mar 2023, 20:22
  #1011 (permalink)  
 
Join Date: Jan 2004
Location: Canada
Age: 63
Posts: 5,172
Received 133 Likes on 60 Posts
Sadly, I think the accident crews never really got beyond the startle/react phase as there were too many audible, tactile and mental distractions to allow much in the way of a diagnostic loop to develop.
I think this is a very important part of the issue that doesn't get enough attention. The pilot was not just dealing with a sudden change in pitch forces caused by MCAS, he was also dealing with stick shaker activation and multiple alarms. He had only seconds to correctly identify the cause, MCAS activation, before the airplane became unrecoverable. The truly tragic aspect of this is that, due to Boeing butt covering, the crews were not provided with the information they needed. I can almost see the Lion Air crash as a case where the engineering swiss cheese holes lined up without any one person being in a position to have enough information to say definitely "this needs to be fixed". However everything about the single point of failure = likely loss of control due to MCAS was known by the time of the Ethiopian accident, and yet Boeing minimized the problem because they did not want to admit any liability and pay for the cost of correcting the problem. It was only when another plane load of people died that they were forced to act.

Since with Boeing it is all about the dollars, maybe they should have thought of the old adage "If you think paying for safety is expensive, try paying for the accident".
Big Pistons Forever is offline  
Old 26th Mar 2023, 20:49
  #1012 (permalink)  
 
Join Date: Jul 2013
Location: Within AM radio broadcast range of downtown Chicago
Age: 71
Posts: 815
Received 0 Likes on 0 Posts
Originally Posted by MechEngr
STS tries to ensure that the trim load is zero. This is why the Lion Air crew reported of MCAS "STS is running backwards": it was adding to the trim load and not making it go away. The fact that an unexpected 10, 20, 30, 40, 50, 60 pounds of trim load was on the wheel is enough to tell there is a trim problem, and using the wheel trim switch to counter the trim load occurred to the first Lion Air crew and to the captain of the second Lion Air crew, who apparently thought using it was obvious enough that he didn't mention it to the First Officer.
Or: "who for some reason didn't mention it to the F/O, possibly out of growing startle reaction, or because he thought it was obvious enough"?
Serious question, I'm not arguing that "it was obvious enough" was not the reason, rather asking whether it's a necessary inference?
WillowRun 6-3 is online now  
Old 27th Mar 2023, 00:29
  #1013 (permalink)  
Psychophysiological entity
 
Join Date: Jun 2001
Location: Tweet Rob_Benham Famous author. Well, slightly famous.
Age: 84
Posts: 3,263
Received 24 Likes on 15 Posts
Handing over to the FO to free up a bit of brain-load and he doesn't do the one thing - use the Pickle Switches - that might well have given the clue how to save the aircraft.


Originally Posted by MechEngr
STS tries to ensure that the trim load is zero. This is why the Lion Air crew reported of MCAS "STS is running backwards" because it was adding to the trim load and not making it go away.
An important point. Would the captain be referring to the load while hand flying, or just the spin direction of the manual trim wheel? My highlight.

PPRuNe 2nd Nov 2018
. . . The purpose of the STS is to return the airplane to a trimmed speed by commanding the stabilizer in a direction opposite the speed change . . .
The speed of course was a bit of electronic guesswork by then.
Loose rivets is online now  
Old 27th Mar 2023, 03:10
  #1014 (permalink)  
 
Join Date: Mar 2005
Location: N/A
Posts: 5,882
Received 362 Likes on 192 Posts
FullWings, your post #1010 is the most succinct explanation of the state of affairs the crew faced that I've seen.
megan is offline  
Old 27th Mar 2023, 05:50
  #1015 (permalink)  
 
Join Date: Oct 2019
Location: USA
Posts: 815
Received 151 Likes on 82 Posts
Loose rivets, OK - been reading a lot more.

The speed target for STS appears to be either a previously set speed or the speed the plane was going when a pilot last let off the trim switch - sounds like an action similar to cruise control in a car. Lock it in at a set speed, but if I trim that speed up or down, the cruise control uses that new speed. However, it's smarter in that one of the problems is needing to handle the undamped phugoid which it does by reacting more quickly than the natural oscillation of the plane.

From the 737 page Flight Controls :
Speed trim is applied to the stabilizer automatically at low speed, low weight, aft C of G and high thrust. Sometimes you may notice that the speed trim is trimming in the opposite direction to you, this is because the speed trim is trying to trim the stabilizer in the direction calculated to provide the pilot with positive speed stability characteristics. The speed trim system adjusts stick force so the pilot must provide significant amount of pull force to reduce airspeed or a significant amount of push force to increase airspeed. Whereas, pilots are typically trying to trim the stick force to zero. Occasionally these may be in opposition.
Per B-737 Speed Trim System
By the sounds of everything, the Cessna 172 behaves the same way: When you get off the trim speed, a stick force develops. The STS only increases this stick force because otherwise it's too weak to meet certification.
From that thread - it was to solve the problem that at aft CG and high thrust there isn't enough trim reaction force to meet the minimum gradient of 3 pounds per 1 Degree AoA change. If the CG was at the Center of Pressure no stick force is required for any AoA change - hence this moves the trim opposite to the pilot input as the CG approaches that (hopefully unreached) condition. That is, if the pilot pulls back to slow the plane the STS supplies nose down trim to encourage the pilot to speed it back up.

---
It appears the effect of STS should have been to push the nose up as the plane accelerated and MCAS was pushing the nose down; the opposite. While STS doesn't move to relieve trim loads, it moves to reset the speed to where the trim load is zero unless the pilot is pulling or pushing.
MechEngr is online now  
Old 27th Mar 2023, 05:55
  #1016 (permalink)  
 
Join Date: Dec 2019
Location: OnScreen
Posts: 415
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by FullWings
.....
Sadly, I think the accident crews never really got beyond the startle/react phase as there were too many audible, tactile and mental distractions to allow much in the way of a diagnostic loop to develop.
It's not only the overwhelming cockpit cacophony, but also the conflicting alarms themselves (i.e. overspeed and stall warning at the same time, and probably some more), as well as the AP dropping out and the friendly "you have control" to the pilot(s). All at the same time. Go figure.

And with the stick-shaker shaking your teeth out, there is little capacity left for monitoring muscle tension to determine whether the aircraft is out of trim, until the yoke forces reach the order of magnitude of the stick-shaker forces. This happens in seconds, so yeah, before you realize it, the yoke force gets immense and the whole situation is beyond recovery.
WideScreen is offline  
Old 27th Mar 2023, 05:59
  #1017 (permalink)  
 
Join Date: Dec 2019
Location: OnScreen
Posts: 415
Likes: 0
Received 0 Likes on 0 Posts
Originally Posted by MechEngr
.....
---
It appears the effect of STS should have been to push the nose up as the plane accelerated and MCAS was pushing the nose down;
......
I have my serious doubts about that. MCAS overrules STS, otherwise the MCAS would never have the opportunity to do its intended work (MCAS as well as MCAS-2).
WideScreen is offline  
Old 27th Mar 2023, 07:49
  #1018 (permalink)  
 
Join Date: Oct 2019
Location: USA
Posts: 815
Received 151 Likes on 82 Posts
I didn't say it did. Perhaps I need more words. Let me clarify to unwind your concern.

The effect expected by the pilot from STS was to push the nose up and, instead, MCAS pushed the nose down, so it appeared to the pilot to be operating in the opposite sense, which would have been reason for reporting it that way to maintenance.
MechEngr is online now  
Old 27th Mar 2023, 13:27
  #1019 (permalink)  
Psychophysiological entity
 
Join Date: Jun 2001
Location: Tweet Rob_Benham Famous author. Well, slightly famous.
Age: 84
Posts: 3,263
Received 24 Likes on 15 Posts
I went back to the 2nd Nov 18 and read ManaAdaSystem with more care. His paste is from an FCOM? Must be right, surely? Gasp! This is exactly the kind of answer ChatGPT churns out when it runs out of specific knowledge. (ChatGPT can become the wandering mind of infant artificial intelligence.)

The thread goes on with good blokes trying to make head or tail of it. I'll come back tonight when I've had a drink.

B-737 Speed Trim System
Loose rivets is online now  
Old 27th Mar 2023, 16:31
  #1020 (permalink)  
 
Join Date: Jul 2003
Location: An Island Province
Posts: 1,257
Likes: 0
Received 1 Like on 1 Post
"Systems are designed and constructed from components that are expected to fail.
As the complexity of a system increases, the accuracy of any single agent's (person's) own model of that system decreases rapidly."

A quote from a report on coping with complexity in IT malfunctions. Many similarities with operator and design issues as the Max, except for the timescales and number of people involved.

Other 'cherry picked' quotes; read the full report for context.
  • Each anomaly arose from unanticipated, unappreciated interactions between system components.
  • There was no 'root' cause. Instead, the anomalies arose from multiple latent factors that combined to generate a vulnerability.
  • The vulnerabilities themselves were present for weeks or months before they played a part in the evolution of an anomaly.
  • The events involved both external software/hardware
  • The vulnerabilities were activated by specific events, conditions, or situations.
  • The activators were minor events, near-nominal operating conditions, or only slightly off-normal situations.

Surprise
In all cases, the participants experienced surprise. … mainly discoveries of previously unappreciated dependencies that generated the anomaly or obstructed its resolution or both. The fact that experts can be surprised in this way is evidence of systemic complexity and also of operational variety.
A common experience was "I didn't know that it worked this way." People are surprised when they find out that their own mental model of The System doesn't match the behavior of the system.

More rarely a surprise produces astonishment, a sense that the world has changed or is unrecognizable in an important way. This is sometimes called fundamental surprise … four characteristics of fundamental surprise that make it different from situational surprise:


1. situational surprise is compatible with previous beliefs about ‘how things work’; fundamental surprise refutes basic beliefs;
2. it is possible to anticipate situational surprise; fundamental surprise cannot be anticipated;
3. situational surprise can be averted by tuning warning systems; fundamental surprise challenges models that produced success in the past;
4. learning from situational surprise closes quickly; learning from fundamental surprise requires model revision and changes that reverberate.

This adjustment of the understanding of what the system was and how it worked was important to both immediate anomaly management and how post-anomaly system repairs add to the ongoing processes of change.
Uncertainty and escalating consequences combine to turn the operational setting into a pressure cooker and workshop participants agreed that such situations are stressful in ways that can promote significant risk taking.

Reread the surprise section with alternative viewpoints; operators were surprised, manufacturer, regulator, self; which types of surprise.
Pprune - surprise; a forum for ill considered post-mortems.

Experts are typically much better at solving problems than at describing accurately how problems are solved. Eliciting expertise usually depends on tracing how experts solve problems. … experts demonstrated their ability to use their incomplete, fragmented models of the system as starting points for exploration and to quickly revise and expand their models during the anomaly response in order to understand the anomaly and develop and assess possible solutions.

… focused on hypothesis generation.
[ not seeking to follow SOPs existent or not ] These efforts were sweeping looks across the environment looking for cues. This behavior is consistent with recognition primed decision making.

organizations which design systems... are constrained to produce designs which are copies of the communication structures of these organizations.

The alerts draw attention but they are usually not in themselves, diagnostic. Instead, alerts trigger a complex process of exploration and investigation that allows the responders to build a provisional understanding of the source(s) of the anomalous behavior that generated the alert.


It is unanticipated problems that tend to be the most vexing and difficult to manage.… unappreciated, subtle interactions between tenuously connected, distant parts of the system.

Don't overlook the end sections; how much dark debt is the industry carrying? An ever-increasing amount, due to automation and operational complexity, yet with constant, limited human performance.

"dark debt"; vulnerability was not recognized or recognizable until the anomaly revealed it. … found in complex systems and the anomalies it generates are complex system failures

Dark debt is not recognizable at the time of creation. … it is a product of complexity, adding complexity is unavoidable as systems change.

Ref https://snafucatchers.github.io

Last edited by alf5071h; 27th Mar 2023 at 16:42.
alf5071h is offline  
