The Root Cause is the factor, or factors, that, when eliminated or changed, will prevent the recurrence of a given or like problem.
Root Cause Analysis (RCA) is a technique to identify, document and implement cost-effective solutions to eliminate the actual cause of a problem, not just the symptoms, and prevent recurrence of the same or like problem.
Why is Root Cause Analysis (RCA) Important?
- Improve customer satisfaction
- Improve our processes/procedures
- Prevent recurrence of problems
- Make the best use of people and resources
When is Root Cause Analysis (RCA) performed?
Perform the analysis as soon as possible after service is restored to facilitate documentation of all pertinent details:
- A complete description of the business impact
- Technical details, including error codes, etc.
- Timeline of events, including actions taken and the responsible personnel
Problem Cause vs Root Cause Analysis (RCA)
A Problem Cause Investigation is required for all problems and failed changes. It is the description of how the problem was resolved and should be documented in the Resolution text of the problem record (or change record). Documentation should include:
- Cause(s) of a problem
- Contributing factor(s)
- Description of how service was restored
- Description of resolution of the problem
Root Cause Analysis (RCA) is determining the true cause of the problem, as opposed to the symptoms which appear as a result of the problem. This is a more in-depth analysis that is performed in a separate RCA record. Documentation in the RCA record should include at least the following:
- Root cause(s)
- Contributing factor(s) and symptoms
- References to incident(s) and problem(s)
- Description of permanent fix and future prevention
A Root Cause Analysis (i.e., a “formal RCA”) is to be documented for:
- Major Incidents (Priority 1 and 2)
- Special requests, for example those based on trend analysis, Service Level reviews, or problem analysis requested by the business
- Failed changes causing an unplanned outage
- Any unauthorized changes
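The criteria above can be read as a simple check. The following is a minimal sketch only: the `Record` class, its field names, and `requires_formal_rca` are hypothetical names chosen for illustration and do not correspond to any particular ticketing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical problem or change record; all field names are assumptions."""
    priority: Optional[int] = None        # incident priority, 1 being the highest
    special_request: bool = False         # e.g. trend analysis or Service Level review
    failed_change: bool = False
    caused_unplanned_outage: bool = False
    unauthorized_change: bool = False


def requires_formal_rca(record: Record) -> bool:
    """Return True if any of the listed criteria for a formal RCA applies."""
    major_incident = record.priority in (1, 2)
    failed_change_outage = record.failed_change and record.caused_unplanned_outage
    return (
        major_incident
        or record.special_request
        or failed_change_outage
        or record.unauthorized_change
    )
```

For instance, `requires_formal_rca(Record(priority=2))` is true, while a failed change that caused no unplanned outage does not trigger a formal RCA on its own.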
The following are the guiding principles when implementing a Root Cause Analysis Methodology:
Define the Problem
- A problem is any deviation from an expected norm. That is, any event resulting in a loss or potential loss of the availability or performance of a managed IT resource or its supporting environment. (This includes errors related to systems, networks, hardware, software, and applications. This can also include problems identified during the implementation of (failed) changes. The recognition of problems can come from any point in the environment and can be identified using a variety of automated and non-automated methods.)
- Clearly identify and describe the problem: What was affected (resource name)? What was the impact (e.g. system down)? Who was impacted? When did the problem happen? How long did the problem last? Keep the following in mind: (1) Focus the RCA and preventative actions on specific issues. (2) Ensure the focus is not just on solving the symptoms, but on getting to the actual root of the problem. (3) During the root cause analysis, multiple problems may appear; address them separately, but capture them as action items. (4) Restate the problem and resolution in the RCA document to ensure everyone understands what the issue was when it occurred.
List Presumptive Causes
- Presumptive causes are identified at the beginning of the investigation. They are the initial suppositions or thoughts on the root cause of the problem. Thorough root cause analysis may later show they are only symptoms or contributing factors. (Example of a presumptive cause: When the cable modem connection is plugged into a workstation, the green light does not appear. The cable is plugged into another workstation and it works, so the cable can be ruled out. The problem now appears to be the network adaptor card, which is a hardware issue. However, further discussion reveals that the device driver had been changed while diagnosing a faulty router. Once the setting was reverted to the original device driver, the light appeared and connectivity was obtained. The device driver was determined to be the root cause. The network adaptor and cable were presumptive causes.)
5-WHY Decomposition Methodology
- The ‘Five Whys’ is the simplest method for root cause analysis. Take each presumptive cause and ask ‘why’ repeatedly until you exhaust that line of questioning. (Five Whys is the most commonly used method for determining root cause. The root cause is usually found by the fifth ‘why’ but can take more or fewer iterations, depending on the problem.)
- Note: If there are multiple presumptive causes, you should complete the ‘five whys’ for each one.
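One way to keep track of the questioning is to record one chain of answers per presumptive cause. The sketch below is an assumption about how that could be modeled; `WhyChain`, `ask_why`, and `deepest_cause` are hypothetical names, and the sample answers are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyChain:
    """One line of 'why' questioning for a single presumptive cause."""
    presumptive_cause: str
    answers: List[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> None:
        """Record the answer to the next 'why' in this line of questioning."""
        self.answers.append(answer)

    def deepest_cause(self) -> str:
        """Return the last answer, i.e. the candidate root cause.

        With no answers recorded yet, fall back to the presumptive cause.
        """
        return self.answers[-1] if self.answers else self.presumptive_cause


# Illustrative usage: one chain per presumptive cause.
chain = WhyChain("Network adaptor card is faulty")
chain.ask_why("The adaptor does not establish a link")
chain.ask_why("The device driver was changed while diagnosing a faulty router")
print(chain.deepest_cause())
```

The chain has no fixed length of five, which matches the point that the questioning may take more or fewer iterations depending on the problem.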
Identify Root Cause(s) and Contributing Factor(s)
- Root Cause(s), if eliminated or changed, will prevent the recurrence of a given or like problem. Contributing factor(s) alone would not have caused the problem but are important enough to need corrective action to improve the quality of the process. They could also be items that made problem determination or recovery more difficult.
- For each item identified during the ‘five whys’ decomposition, ask these questions to determine whether it is a Root Cause or just a Contributing Factor:
- If this item is fixed, will it prevent the problem from recurring? If yes, the item is a Root Cause, not a Contributing Factor.
- Did this item delay restoration of service or recovery? If yes, the item is a Contributing Factor, not a Root Cause.
- Did this item delay problem determination or make it more difficult? If yes, the item is a Contributing Factor, not a Root Cause.
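Taken together with the definitions above, the three questions amount to a small decision rule: an item whose fix prevents recurrence is a root cause, while an item that only delayed restoration or made problem determination harder is a contributing factor. The sketch below is one possible reading of that rule, not a definitive procedure; `Finding` and `classify` are hypothetical names, and the `NEITHER` outcome is an assumption for items that match none of the questions.

```python
from enum import Enum


class Finding(Enum):
    ROOT_CAUSE = "root cause"
    CONTRIBUTING_FACTOR = "contributing factor"
    NEITHER = "neither"   # assumption: the item matched none of the questions


def classify(prevents_recurrence: bool,
             delayed_restoration: bool,
             hindered_determination: bool) -> Finding:
    """Classify one item from the 'five whys' decomposition."""
    # Question 1: would fixing this item prevent the problem from recurring?
    if prevents_recurrence:
        return Finding.ROOT_CAUSE
    # Questions 2 and 3: did it delay recovery or make diagnosis harder?
    if delayed_restoration or hindered_determination:
        return Finding.CONTRIBUTING_FACTOR
    return Finding.NEITHER
```

The recurrence question is checked first, so an item that both caused the problem and slowed recovery is still reported as a root cause.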
Identify Action Plans
- What must be changed to prevent the recurrence of each root cause and contributing factor? (Hardware, Software, Procedures, Resources).
- Circumvention – Immediate action taken to restore function.
- Future Prevention – Additional actions required to ensure the problem does not recur. May require Change Management.
- Action plans are used for future prevention. The immediate action (circumvention) might be the only action needed for future prevention. However, if it is deemed a temporary solution, or if other items need to be addressed to eliminate contributing factors, an action plan should be put in place.
- Each action item should contain: Description of the activity, Target completion date, Person responsible for implementation.
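The three required elements of an action item can be checked mechanically before an RCA is closed. The following is a minimal sketch under that assumption; `ActionItem`, its field names, and `missing_fields` are hypothetical and only mirror the list above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    """Hypothetical action item with the three elements listed above."""
    description: str = ""
    target_date: Optional[date] = None   # target completion date
    owner: str = ""                      # person responsible for implementation

    def missing_fields(self) -> List[str]:
        """Return the names of required elements that are still empty."""
        missing = []
        if not self.description.strip():
            missing.append("description")
        if self.target_date is None:
            missing.append("target_date")
        if not self.owner.strip():
            missing.append("owner")
        return missing
```

An item such as `ActionItem(description="Restore the original device driver")` would report `target_date` and `owner` as missing, which flags it as incomplete.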
Communicate Lessons Learned
- Share lessons learned through RCA meetings, team meetings, Outlook, team newsletters, updated procedures and checklists, and data repositories.
- Use a quantitative approach to capture and assess the impact.
- Use a qualitative approach to document the details of the impact.
Summary: Finding the Root Cause will prevent the recurrence of a given or like problem. Identifying and implementing good action plans improves customer satisfaction and availability and maximizes productivity. Root Cause Analysis is not the assignment of blame; it helps ensure continuous improvement.
The Five Whys is a technique that can help with solving problems. It provides a structure for understanding the relationships between possible causes of a problem, gives us a framework for planning what data to collect, and serves as a visual display.