Consider The Common Cause Method

EP Editorial Staff | November 16, 2015

What appear to be stand-alone issues in different areas of a plant may actually have some things in common. Rooting out those factors is the only way to prevent such problems from recurring.

By Randall Noon, P.E.

Common-cause analysis is a relatively straightforward root-cause method. Sometimes referred to as “determining the cause of causes,” it’s used to identify an often latent, overarching causal factor creating problems in various places or departments within an organization. Occasionally, what appears to be a stand-alone problem in one area results from the same factor that is creating what appear to be stand-alone problems in another area (or areas).

Logically, the fact that the same basic problem is occurring in different departments is a significant clue that the culpable causal factor is something shared by, or otherwise common to, the affected departments. This is akin to two strangers becoming similarly sick, one in the morning and one in the afternoon, after drinking from the same contaminated well. A doctor concludes that each has ingested a large amount of harmful Escherichia coli. Unfortunately, while this diagnosis is accurate in both cases, people will continue to become ill until the common cause, the contaminated well, is identified and addressed.

Two organizational characteristics frequently prevent ready identification of a common causal factor:

The tendency of a department’s administration to solve its own problems and resist outside help. This might be due to pride, e.g., “we solve our own problems in this department, thank you very much.” Or it might be related to a perception of competency, as in “real managers don’t need help to put their own houses in order.”
The tendency of a department’s administration to mind its own business. This might be due to nothing more than lack of familiarity with another department’s day-to-day problems. Or it could be associated with a perception by some managers that another department head might be meddling or encroaching upon their territories.

In any case, the result is the same. The problem is solved as well as possible within the administrative envelope and resource limitations of the affected department. If, however, the causal factor originates outside the department’s envelope, only the symptoms occurring within the affected area are addressed. The fundamental deficiency remains in place. Alas, much like a contaminated well, the common cause continues to create problems.

Not-too-hypothetical example

The following example of a problem solved by common-cause analysis demonstrates how the method can be applied.

Tom, Rick, and Sherry make up a three-person department tasked with writing procedures, instructions, and work orders for the entire company. In the past year, there were several instances in different areas of the operation where infrequently performed work wasn’t done correctly. These situations resulted in significant rework and, on occasion, regulatory problems.

Each affected department investigated its own problem and determined that the cause was a type of human-performance error. One department blamed non-compliance of workers with the given procedure. Another attributed its issue to a lack of self-checking by personnel. A third department questioned the competency of the supervisor in charge of the problematic work—which led to the inclusion of a career-limiting letter in his file for not providing sufficient oversight.

In the sports world, when a basketball goes out of bounds during play, the referee establishes fault based on which team touched it last. A similar rule appears to have been applied in this workplace example: In each case, the last person to touch the work was blamed for the work “going out of bounds.” While this is a simple way to determine fault, it generally doesn’t prevent recurrence of the problem.

Consequently, after operations in this not-too-hypothetical example experienced a larger-than-expected number of “human-performance errors” over the course of a year, a third party in the company undertook a common-cause analysis. The individual causal-analysis reports documenting each department’s investigation, 15 in all, were gathered and reviewed as a group to identify any commonalities. The first-cut review of these reports identified the following common factors:

In 6 of 15 cases, electricians performed the work.
In 8 of 15 cases, mechanics performed the work.
In 1 of 15 cases, a welder performed the work.
All 15 cases involved reading a document and then executing the work instructions contained in the document.
All 15 cases involved work tasks that were performed infrequently, and by crews often composed of people who had not done the work before. Thus, workers could not depend upon memory.
All 15 of the cases had supervisory oversight, and all of the workers had pre-job briefings. Across the 15 cases, three supervisors were involved.
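
The first-cut tally above can be reproduced with a short script. This is only a sketch: the record layout and field names (craft, document_driven, and so on) are hypothetical and stand in for whatever the 15 reports actually recorded. Only the counts come from the list above.

```python
from collections import Counter

# Hypothetical records: one dict per causal-analysis report. The field names
# are illustrative; the counts mirror the first-cut findings listed above.
reports = (
    [{"craft": "electrician"} for _ in range(6)]
    + [{"craft": "mechanic"} for _ in range(8)]
    + [{"craft": "welder"}]
)
for report in reports:
    # Attributes that every one of the 15 reports had in common.
    report.update(
        document_driven=True,
        infrequent_task=True,
        supervised=True,
        pre_job_briefing=True,
    )


def shared_factors(records):
    """Return the (field, value) pairs that appear in every record."""
    counts = Counter(item for record in records for item in record.items())
    return sorted(pair for pair, n in counts.items() if n == len(records))


craft_counts = Counter(report["craft"] for report in reports)
common = shared_factors(reports)
print(craft_counts)
print(common)
```

Factors that split across the reports, such as craft, drop out of the shared list, while factors present in every report remain as candidates for a common cause.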

In critically assessing these first-cut findings against the original departmental conclusions, the third-party reviewer reached the following conclusions:

— Procedure non-compliance, which was cited as the primary causal factor in several cases, is a fact—not a cause. It does not explain why the workers did not comply with the procedure. The fact that there were 15 similar instances of non-compliance among a variety of workers is incongruous with their otherwise good work records. In logic, this type of conclusion is called “affirming the antecedent.” The conclusion that the procedure wasn’t followed follows readily from the initial problem statement that the work wasn’t done correctly. No investigation was needed to figure that out. Consequently, the conclusion provided nothing useful going forward to prevent recurrence.

— A lack of self-checking, which was cited as a primary causal factor in several cases, is fundamentally an assumption. The line of thought assumes that, had the workers self-checked “properly,” mistakes would not have been made. There are two deficiencies in this reasoning:

The first is the assumption that self-checking will catch errors. More often than not, this is not true. Various studies indicate the probability that a person who made the error will then detect it during a self-check review is too low to be considered a substantive error-prevention method. For example, an experienced proofreader will often catch simple grammar and spelling errors that the author has made despite the author having personally self-checked the document several times. Likewise, the common spell-check feature embedded in most word-processing programs regularly demonstrates that self-checking of spelling doesn’t work very well.
The second deficiency is that finding errors through self-checking requires that the standard against which the check is made be correct. For example, a person who believes that connecting a blue wire to lug A is the right thing to do will not detect that the red wire should actually be connected to lug A. He can self-check the connection many times, but as long as he believes the blue wire should be connected to lug A, the error will be repeated. On further consideration, a conclusion that self-checking was lacking when a procedure is involved is no different from a conclusion of procedure non-compliance. This is another variant of affirming the antecedent.

Lack of oversight was cited in one case. The supervisor and workers in this situation reviewed the procedure prior to starting it and all agreed how the work should be done. The supervisor, however, was held accountable because he had been in charge of the work. A finding of accountability is not generally synonymous with a finding of cause. Further, assuming that the group of 15 events may have had a common causal factor, a lack of oversight in one case doesn’t explain any of the other 14 instances.

Importance of the written word

To determine whether something about the written procedures, instructions, and work orders played a role in inducing errors, the documents involved in the events were vetted. First, the authorship of each document was determined. None of the 15 reports involved documents written by Tom; two involved documents written by Sherry, and 13 involved documents written by Rick.

Tom, Rick, and Sherry shared writing duties more or less equally. If the documents had played no role in the errors, each writer would be expected to account for about 5 +/- x of the 15 cases, where x might be 1 or 2. Since Rick’s 13 is 2.6 times that expected average of 5, this was a clear line of inquiry to follow.
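
The arithmetic behind that comparison can be sketched as follows. The binomial calculation is an added illustration, not part of the original investigation. It assumes that each of the 15 cases would be equally likely to involve any one of the three writers if authorship were unrelated to the errors, and that the cases are independent of one another.

```python
from math import comb

total_cases = 15      # reports reviewed
writers = 3           # Tom, Rick, and Sherry
rick_cases = 13       # reports involving Rick's documents

expected_per_writer = total_cases / writers        # 5.0
ratio = rick_cases / expected_per_writer           # 2.6


def binomial_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that 13 or more of 15 cases land on one given writer if authorship
# were unrelated to the errors (assumption: equal shares, independent cases).
p_chance = binomial_tail(total_cases, rick_cases, 1 / writers)
print(expected_per_writer, ratio, p_chance)
```

Under those assumptions the chance works out to roughly 3 in 100,000, which supports treating the authorship pattern as a lead worth following rather than a coincidence.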

An examination of the writing process found that all three writers were meticulous in following the established protocols for checking content, spelling, and punctuation. (Note: At this point, to shorten the story a bit and quickly report the findings, some of the investigative steps have been omitted.)

Eventually, it was noted that Rick was consistently writing documents that were determined to be at the 12th grade level, or higher, as measured quantitatively using the Flesch-Kincaid readability test. Sherry was writing consistently at the 9th to 10th grade level, and Tom was writing at the 5th to 7th grade level.
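
For readers unfamiliar with the measure, the Flesch-Kincaid grade level is computed from average sentence length and average syllables per word. The sketch below uses the standard formula, but the syllable counter is a rough heuristic chosen for illustration; dedicated readability tools count syllables more carefully, so scores from this sketch will differ somewhat from theirs.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, then drop a trailing
    silent 'e' (but keep '-le' endings such as 'table')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from word, sentence, and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


simple = "Drill a hole. Fit the screw."
dense = ("Subsequently, affix the pinion utilizing an appropriately "
         "dimensioned retaining apparatus.")
print(flesch_kincaid_grade(simple), flesch_kincaid_grade(dense))
```

Short sentences built from short words score low; long sentences built from multi-syllable words score high, which is the pattern that separated the three writers.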

Moreover, Rick had another writing trait that contributed to misinterpretation. He regularly used the connective “and/or,” which the other two writers did not use. In some cases, “and/or” had the meaning of “do both A and B.” In other cases, the person doing the work could choose which action to take, as in “You can do A, or you can do B.” Therefore, the simple sentence “You can drill a dimple in the shaft for a set-screw to affix the pinion gear on the shaft and/or you can use a retaining wire around the pinion gear” could be interpreted to mean:

Drill a dimple and use a set-screw, and install the retaining wire on the pinion gear. That is, do both.
Drill a dimple and use a set-screw, but you don’t have to install the wire on the pinion gear.
Install the wire on the pinion gear, but you don’t have to drill a dimple and use a set-screw.
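
The three readings listed above are exactly the combinations that satisfy an inclusive "or." A minimal sketch, with action labels paraphrased from the example sentence, enumerates them:

```python
from itertools import product

actions = ("drill a dimple and use a set-screw", "install the retaining wire")

# "A and/or B" is satisfied by any combination in which at least one
# of the two actions is carried out.
valid_readings = [
    combo for combo in product((True, False), repeat=len(actions)) if any(combo)
]

for combo in valid_readings:
    done = [name for name, chosen in zip(actions, combo) if chosen]
    print(" + ".join(done))
```

A reader in the field has no way to tell from the connective alone which of the three combinations the writer intended.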

Thus, a hypothesis was put forward that Rick’s writing style was the common cause. Two factors apparently made Rick’s documents more prone to execution errors:

They were more difficult to read and interpret.
Use of the connective “and/or” caused confusion.

Still, a question remained: Since all three writers were required to have a second party read their work for content, why wasn’t this problem detected prior to implementation?

The answer was that Rick, an engineer, gave his work to other engineers to review. Those engineers did not notice that the reading level was at the 12th grade or higher. They were used to material written at that reading level. On the other hand, Tom and Sherry gave their work to supervisors in the electrical and mechanical departments who had come up through the ranks.

Also, the engineers who reviewed Rick’s documents could determine from context which alternative was indicated by the “and/or,” and they often used that connective in their own writing. Hence, the protocol for checking documents was found to be deficient in that it didn’t require review by a person from the same audience that would use the document.

Because Tom and Sherry had been with the company for many years, a further test of the hypothesis was available: before Rick’s documents were used to do work, the number of procedure-compliance problems should have been measurably lower. And, yes, prior to Rick’s work being implemented, the number of similar human-performance problems was found to be lower. Subsequently, the problem was fixed by:

Instructing Rick to write shorter, less complex sentences and to check his work against the Flesch-Kincaid readability index.
Directing Rick to discontinue using the “and/or” connective.
Revising the document review protocol to require that a member of the audience who would actually use the document review it for content, spelling, and punctuation.
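
Parts of the first two fixes lend themselves to an automated pre-check before a draft goes to a human reviewer. The sketch below is illustrative only: the function name and the 20-word sentence limit are hypothetical and are not taken from the company's protocol. It does not replace review by a member of the intended audience.

```python
import re

MAX_WORDS_PER_SENTENCE = 20  # hypothetical threshold, not from the article


def review_flags(text, max_words=MAX_WORDS_PER_SENTENCE):
    """Return (sentence_number, issue) pairs for a draft work instruction."""
    flags = []
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for number, sentence in enumerate(sentences, start=1):
        if re.search(r"\band/or\b", sentence, flags=re.IGNORECASE):
            flags.append((number, "uses 'and/or'"))
        if len(sentence.split()) > max_words:
            flags.append((number, "long sentence"))
    return flags


draft = "Drill a dimple and/or use a retaining wire. Tighten the screw."
print(review_flags(draft))
```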

Bottom line

Keep in mind that the conclusions reached in the departmental causal factor investigations in this example did not determine why otherwise good workers were making incorrect decisions. The common-cause method, however, did uncover factors that contributed to the making of those decisions.

Randy Noon is a Root Cause Team leader at Nebraska’s Cooper Nuclear Station near Brownville, NE. A licensed professional engineer in the United States and Canada, this noted author and frequent contributor to Maintenance Technology has been investigating failures for more than three decades. Contact him at rknoon@nppd.com.
