Problem-Solving Logic
Problem-solving, and particularly the part that focuses on root cause analysis (RCA), has always been a topic of special interest to me. I have coached many problem-solving teams, and besides the sometimes superficial use of problem-solving tools, two questions have always lingered in my head, viz.:
- whether you can speak of one root cause, or should rather speak of multiple (root) causes; and
- whether you should speak of the root cause or rather the root condition.
I started to explore these questions with the idea of turning them into a blog post. Ultimately it has become a series of blog posts rather than one. So, reader beware: this will be a long read, although I have tried to chop the story into more readable chunks. To grasp the whole thing, we will need to go through some basics first, so that we can develop a common language. As you are most probably aware: “there’s no instant pudding”.
Based upon the way my mentors have trained and coached me in problem-solving, as well as a lot of self-education and practice, I will try to explain how rigorous problem-solving logic (using an example) can help us answer these questions. At the same time, I hope the example and the logic will be of use in your problem-solving efforts or your coaching thereof.
In this first post, we will set the stage by defining our starting points and some basic initial concepts (like problem, cause, agent, target, event and Tripod Beta’s causal diagramming technique), and we will introduce the example that we will use throughout the series.
In the second post, we will introduce some more concepts (necessary condition, defensive and control barriers) and again apply these concepts to our example.
In the third post of the series, I will then dive into tracing back the causal event chain, introducing and applying concepts like causing events, the initial causing event, and the initial active cause.
The fourth post will take us to the systemic level of our problem and I will further explore the concept of barriers and make the link to standards in Lean thinking. At this point in the logic, I will also introduce the problem of occurrence and the problem of non-detection and how this can help us in finding the root of the problem.
In the fifth post then, I will again dive into the causal event chain, but now at the systemic level. And I will discuss the problem of people not adhering to the standard.
Finally, in the sixth and final blog post, I will summarize the analysis, say something about prioritizing counter-measures and conclude on the two questions. I will end the series with some words on how this can help you in your problem-solving efforts and your coaching of problem-solving teams.
So, off we go!
Improving the System of Work
In Lean thinking, we try to continually improve our system of work, through the improvement of our standards by which we work. And we do so by solving problems. Solving problems in Lean means proposing, validating, implementing and ultimately standardizing counter-measures that eliminate the causes of our problems. Linking this to our question in this blog post, this implies that root causes of problems should relate to our system of work, materialized by our standards. Additionally, in Lean we do not seek to blame the individual person for problems. This stems from the Respect for Humanity principle in Lean. So, we focus on the system-related causes for problems.
Logical Thinking
I still remember that in one of the areas at one of my previous employers where we did report outs on our structured problem-solving efforts, there was a huge banner, saying “LOGICAL THINKING”. I still consider this to be at the core of rigorous and effective problem-solving. Therefore, I will introduce some typical logic and related concepts that are (or: should be) used in root cause analysis (RCA).
Furthermore, I will make use of some of the concepts used in the Tripod Beta incident investigation method. Tripod Beta is an incident and accident analysis methodology made available by the Tripod Foundation via the Energy Institute in the UK. It was originally developed by Shell International Exploration and Production B.V. as the result of Shell-funded academic research in the 1980s and 1990s. You can find more on Tripod Beta at the Energy Institute’s website at http://publishing.energyinst.org/tripod.
I introduce an example here that I will use throughout this series of posts to illustrate the concepts and the logic in problem analysis.
In our example, we are asked to investigate a problem where a person fell off a vehicle, hit their head on a hard floor and subsequently suffered a head injury.
The Cause of a Problem
Let’s start with the basics first.
A problem: I define a problem as an unwanted or undesirable situation (i.e., a situation where the actual state differs from the desirable state, in Lean often described by the standard). A problem comes into existence through a specific event (when and where a change takes place).
You can read more on this view of a problem in my earlier blog post on the two problem types (https://dumontis.com/2014/11/two-problem-types/). These two types are sometimes referred to as the gap type (“what happened” in a specific case, type I) and the setting type (“what is in the way” of getting to a new situation, type II). In this post, I focus on type I, the gap type of problem.
To cause: to cause as a verb means to make something (especially something bad, unwanted or undesirable) happen. It is an act that produces, brings about or gives rise to a change, an event where something becomes different, introducing the problem.
In this act or event, Tripod Beta distinguishes between a target (or object), being a person or thing, that was changed and to which the problem is related, and an agent (or hazard), being the thing that acted upon or changed the target. This change brings about the problem in an incident, which is the event where the agent and target come together in space and time. The incident is an unplanned and unwanted happening involving the release of the agent to the target or the exposure of the target to the agent. It leads to the target being changed into an undesirable state (the problem).
To distinguish between all kinds of events that take place before, after and at the same time, but that did not change the target, and the event in which the target was changed giving rise to the problem, we will call this latter one the causing event. When there is a sequence of events, we call the final causing event that we are investigating the prime event.
From a grammatical point of view, in the active voice, the subject and verb relationship is straightforward: the subject is the do-er. The actor or agent is the subject in the active voice and the target is the object. In the passive voice, however, subject and object change places in the grammatical sense. So, then suddenly the agent becomes the object and the target the subject. I therefore strongly discourage the semantic use of the word object for the target, as the grammatical use of the word object does not always coincide with the semantic use of the same word. This can be confusing.
Let’s try to dissect our case, using the concepts we introduced so far. The undesirable state (and problem) clearly is the person with the head injury. As it is the person that was changed, it is the person that is the target. The change was brought about by the floor, being the agent in this case. But although stating that the floor (the agent and subject from a grammar point of view) injured the person (the target and grammatically seen the object) using the active voice may be correct logically, it doesn’t sound right in normal conversation as the floor is not typically seen as active. When using the passive voice in normal conversation, we would say that the person (the target and grammatically now the subject) was injured by the hard floor (the agent and now the object).
To summarize, the hard floor was the agent and the person the target in the prime causing event of the person being injured by the hard floor, which led to the head injury problem.
Now let’s turn to the important concept of a cause. Normally, the cause of a change is said to be the agent, actor, and subject of the active voice.
In our case, the agent is the hard floor. It was the hard floor that actually brought the head injury into existence. At the same time, intuitively we would hardly call the hard floor the cause. The hard floor is something that was also there before and after the causing event, and also in other areas. It is hardly specific to the problem at hand. It is a state or condition, and not really active as we have already seen.
The target, however, was active and exposed itself to the hard floor as a result of a fall. Furthermore, the person falling can be seen as a problem as well, as it is an undesirable state. In this case, therefore, we would rather call the actively falling person the cause.
So, a cause can be defined as the active agent or target in the (prime) causing event, whereby the active agent or target in itself represents a problem as well (preceding the subsequent (prime) causing event in time and space, and in itself again brought into existence via a preceding causing event). Therefore, to define a cause as the agent in the causing event is, in my opinion, incorrect. It can be either an agent or a target.
To further clarify things, I also suggest adding the adjectives active and passive to agent and target. A causing event can then be said to be caused by an active agent (and a passive target), an active target (and a passive agent), or an agent and a target that are both active.
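For readers who like to see such definitions written down formally, the following is a minimal sketch of the idea in Python. It is only an illustration and not part of Tripod Beta: the names `Role`, `Participant`, `CausingEvent` and `causes` are hypothetical, chosen here to mirror the vocabulary above. Under that assumption, a cause is simply whichever participant in the causing event is marked as active.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The two roles a participant can play in a causing event."""
    AGENT = "agent"
    TARGET = "target"


@dataclass(frozen=True)
class Participant:
    # Hypothetical structure: a named agent or target, flagged active or passive.
    name: str
    role: Role
    active: bool


@dataclass(frozen=True)
class CausingEvent:
    # Hypothetical structure: the event where agent and target come together.
    description: str
    agent: Participant
    target: Participant
    problem: str  # the undesirable state the event gives rise to

    def causes(self) -> list[Participant]:
        """Return the active participants; each one is a cause by the definition above."""
        return [p for p in (self.agent, self.target) if p.active]


# The example from this post: a passive agent and an active target.
prime_event = CausingEvent(
    description="the person's head hits the hard floor",
    agent=Participant("hard floor", Role.AGENT, active=False),
    target=Participant("falling person", Role.TARGET, active=True),
    problem="head injury",
)

for cause in prime_event.causes():
    print(f"cause: {cause.name} (active {cause.role.value})")
```

In this sketch the hard floor never shows up as a cause, because it is passive, while the falling person does. An event with both participants flagged as active would return both.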
Building upon and at the same time somewhat extending Tripod Beta’s use of symbols in visualizing the way a problem came into being (a so-called causal diagram), our example looks like this:
Figure: the hard floor (the agent) acts upon the falling person’s head (the target) when the person’s head hits the floor (the prime event), giving rise to the head injury (the problem).
Next Post: Continuing the Logic
In this first post, I have set the stage by defining the starting points and some basic initial concepts (like problem, cause, agent, target, event and Tripod Beta’s causal diagramming technique). I have also introduced an example that I will use throughout the series. In the second post of this series, I will continue to introduce some more concepts (necessary condition, defensive and control barriers) and will again apply these concepts to the example.