Root cause analysis
Root Cause Analysis (RCA) is the umbrella discipline of structured techniques (5 Whys, Ishikawa fishbone, Fault Tree Analysis, Failure Modes and Effects Analysis, Apollo, Pareto, change analysis, barrier analysis, human-performance investigation) for identifying the underlying cause(s) of an observed problem so that corrective and preventive action can prevent recurrence. It is required in substance by every regulated framework: 21 CFR 820.100 / QMSR §4 (devices), 21 CFR 211 / ICH Q10 (pharma), ISO 9001 §10.2 (quality generally), ISO 13485 §8.5.2 / §8.5.3 (devices), MDSAP Chapter 4 (CAPA). CAPA, where RCA output lands, is the single most-cited area in FDA 483s issued to medical-device manufacturers.
01 What RCA actually is
Root Cause Analysis is the umbrella discipline of structured techniques used to identify the underlying cause(s) of an observed problem — a non-conformance, a deviation, an out-of-specification result, a customer complaint, an adverse event, a near miss, a process capability drift — so that corrective and preventive action can prevent recurrence rather than merely suppress the symptom. RCA is not a single technique; it is a portfolio of techniques (5 Whys, Ishikawa fishbone, Fault Tree Analysis, Failure Modes and Effects Analysis, Apollo, Pareto, change analysis, barrier analysis, human-performance investigation), each suited to particular problem types, and a methodology for selecting, applying, validating and documenting them rigorously enough to support a defensible CAPA.
Every regulated framework requires RCA in substance, even where the literal phrase is not used. FDA 21 CFR 820.100 requires procedures for 'analysing processes, work operations... and other sources of quality data to identify existing and potential causes of non-conforming product or other quality problems'. QMSR (effective 2 Feb 2026) preserves and clarifies the obligation through ISO 13485 incorporation. ICH Q10 §3 makes CAPA a pharmaceutical-quality-system element with explicit reference to 'a structured approach to investigation' and 'root cause determination'. ISO 9001 §10.2 requires evaluation of the need for action to eliminate the causes of non-conformities. ISO 13485 §8.5.2 / §8.5.3 mirror the device-specific obligations. MDSAP Chapter 4 (CAPA) is the most-sampled MDSAP chapter and the most-cited source of grade-3+ findings.
02 Why RCA is the single most-cited inspection area
CAPA has been the single most-cited area in FDA Form 483 inspections of medical-device manufacturers every year for over two decades. The substantive criticism is consistent across the decades: 'corrective action does not adequately address the underlying cause', 'root-cause analysis is inadequate', 'preventive action is not extended to other products that share the same cause', 'effectiveness verification does not confirm the underlying cause has been addressed', 'repeat occurrences of similar non-conformances'. MDSAP findings catalogue similar patterns under Chapter 4. Notified Bodies cite the same under EU MDR Annex IX QMS audits. Pharmaceutical 483 patterns under 21 CFR 211 (especially 211.192 production-record review and 211.100 written procedures) carry equivalent themes.
The reason is structural. RCA is hard: it requires the investigator to suppress the natural human tendency to converge on a cause-that-makes-sense before enumerating alternatives, to look beyond the operator on the line, to follow evidence rather than intuition, to widen the scope from the specific event to the systemic factor that allowed it, and to design effectiveness verification that would actually detect failure of the fix. The shortcut (pattern-match against a familiar template, stop at a training gap, retrain the operator, close the CAPA) is fast, seductive and exactly what regulators have been documenting for decades.
03 The technique portfolio: which one to use when
| Technique | Best for | Strength | Weakness |
|---|---|---|---|
| 5 Whys | Routine moderate-complexity events; one or a few active branches; clear evidence chain available. | Cheap, fast, accessible, well-understood, suitable for daily front-line use. | Single-thread bias; misses multi-cause chains; weak without fishbone pairing; when misused, stops at operator error. |
| Ishikawa fishbone (cause-and-effect) | Enumerating categories of cause before convergence; structured team session. | Forces enumeration; team-friendly; prevents 5-Whys single-thread trap; surfaces unconsidered branches. | Output is enumeration not conclusion; needs a follow-up technique (5 Whys per branch) for convergence. |
| Fault Tree Analysis (FTA) | High-severity, multi-cause events; safety-critical systems; quantifiable failure probabilities. | Rigorously handles AND/OR combinations; supports quantitative analysis; recognised in IAEA, NUREG, IEC 61025. | Heavy; needs trained facilitator; over-engineered for routine problems. |
| Failure Modes and Effects Analysis (FMEA) | Proactive enumeration of failure modes in design (DFMEA), process (PFMEA), or monitoring (FMEA-MSR per AIAG/VDA 2019). | Comprehensive; risk-prioritised; pre-event; integrates with ISO 14971 device risk management. | Proactive only; not a reactive RCA technique on its own but feeds the candidate-cause list for one. |
| Apollo Root Cause Analysis (ARCA) | Complex events; safety-critical; high-stakes investigations; multi-organisation incidents. | Cause-and-effect map with action / condition pairs; rigorous evidence requirement; cross-disciplinary. | Training-intensive; software-licensed; heavy for routine problems. |
| Pareto analysis | Identifying which causes drive most occurrences; selecting where to invest preventive action. | Quantitative; data-driven; 80/20 prioritisation; pairs with histograms and run charts. | Requires sufficient data volume; not a technique for single-event RCA. |
| Change analysis (Kepner-Tregoe IS/IS-NOT) | Events that occurred after a recent change; comparing what is vs what is not affected. | Surfaces the differentiator; ideal for post-change deviations; structured team format. | Limited utility when no recent change is identifiable. |
| Barrier analysis | Safety / hazard events where defences failed; healthcare event analysis. | Identifies which barriers existed, which failed, which were missing; supports defence-in-depth design. | Suited to safety / hazard contexts; less applicable to routine deviations. |
| Human Performance Investigation (HPI / HOP) | Events with significant human-action contribution; learning-team investigations. | Treats human action as a symptom of system design; reduces operator-blame bias; modern safety science. | Cultural change required; not a substitute for a structured RCA technique. |
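
The "AND/OR combinations" and "quantitative analysis" credited to Fault Tree Analysis in the table can be illustrated with a minimal sketch of how a top-event probability is rolled up through gates. It assumes the basic events are statistically independent; the event names and probabilities are invented for illustration and do not come from any real analysis.

```python
from math import prod

def and_gate(probabilities):
    """All inputs must occur: multiply the probabilities (independence assumed)."""
    return prod(probabilities)

def or_gate(probabilities):
    """At least one input occurs: 1 minus the chance that none of them occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical basic events for a mis-dispense top event (illustrative values only).
p_scale_drift = 0.02       # scale out of tolerance
p_check_skipped = 0.10     # working-weight check not performed
p_wrong_material = 0.001   # wrong material staged

# Drift reaches the product only if the check is also skipped (AND gate).
p_undetected_drift = and_gate([p_scale_drift, p_check_skipped])

# Either undetected drift or a wrong material produces the top event (OR gate).
p_top_event = or_gate([p_undetected_drift, p_wrong_material])

print(f"P(undetected drift) = {p_undetected_drift:.4f}")
print(f"P(top event)        = {p_top_event:.6f}")
```

If the basic events share a common cause, the independence assumption fails and the simple products above no longer hold, which is one reason the table lists a trained facilitator as a cost of the technique.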
04 The RCA process: the ten-step canonical flow
- Trigger — non-conformance, deviation, OOS, complaint, adverse event, near miss, audit finding, statistically significant trend. Logged within the procedural clock (typically 1 business day for opening the investigation).
- Containment / immediate action — protect against further harm or escalation while the investigation runs. Quarantine affected lots, hold downstream batches, suspend the line, isolate the equipment, freeze the released product. Documented as a discrete action with its own approval and effectiveness check.
- Problem statement — quantified, specific, time-bounded. What happened, when, where, on what product, against what specification, observed by whom, with what immediate impact. The single most under-invested step.
- Investigation team — the operator who saw the event, the supervisor on shift, the engineer who owns the process, the QA reviewer, plus subject-matter expertise (microbiology / analytical / software / supplier-quality depending on the problem). Operator presence is non-negotiable.
- Evidence gathering — audit trail (Part 11 / Annex 11 compliant), batch record / DHR, equipment logs, environmental data, training records, change-control history, prior CAPAs on the same product / process / equipment / supplier, supplier-quality data, complaints data, internal-audit findings. The evidence list is the foundation of every later step.
- Technique selection — based on severity, complexity and recurrence (see table above). Documented with rationale. For routine problems, fishbone + 5 Whys is the default; for safety-critical or recurrent problems, escalate.
- Cause identification — apply the chosen technique. Validate each step against evidence. Branch where multiple chains are credible. Converge on one or more root causes only when supported.
- Cross-check: independent peer review, a second technique, or an independent reviewer who is not on the investigation team. The single biggest defence against technique-of-choice bias.
- CAPA design — corrective action addresses the immediate cause for the affected event; preventive action addresses the systemic cause across all affected products / lines / sites; scope-assessment documented; both with owners, due dates and approval.
- Effectiveness verification — observable, measurable, time-bound. The plan is part of the CAPA approval, not an afterthought. Typical window: 3-12 months post-implementation; trigger: predefined metric or audit observation; closure: explicit pass/fail with documented justification.
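
The ten steps above can be read as an approval checklist. The sketch below shows one way such a gate might be expressed; the `Investigation` record and its field names are hypothetical and are not taken from any particular QMS product or regulation.

```python
from dataclasses import dataclass, field

@dataclass
class Investigation:
    """Hypothetical investigation record; field names are illustrative only."""
    containment_documented: bool = False
    problem_statement_quantified: bool = False
    team_roles: set = field(default_factory=set)
    evidence_refs: list = field(default_factory=list)
    technique: str = ""
    technique_rationale: str = ""
    root_causes: list = field(default_factory=list)
    cross_check_done: bool = False
    scope_assessment_documented: bool = False
    effectiveness_plan_defined: bool = False

def approval_gaps(inv: Investigation) -> list:
    """Return the unmet steps; an empty list means the record is ready for approval."""
    gaps = []
    if not inv.containment_documented:
        gaps.append("containment action not documented")
    if not inv.problem_statement_quantified:
        gaps.append("problem statement not quantified")
    if "operator" not in inv.team_roles:
        gaps.append("operator missing from team")
    if not inv.evidence_refs:
        gaps.append("no evidence attached")
    if not inv.technique or not inv.technique_rationale:
        gaps.append("technique or rationale missing")
    if not inv.root_causes:
        gaps.append("no root cause identified")
    if not inv.cross_check_done:
        gaps.append("cross-check not performed")
    if not inv.scope_assessment_documented:
        gaps.append("scope assessment missing")
    if not inv.effectiveness_plan_defined:
        gaps.append("effectiveness plan not defined")
    return gaps

# A draft that looks complete on the surface but skipped four steps.
draft = Investigation(
    containment_documented=True,
    problem_statement_quantified=True,
    team_roles={"qa", "engineer"},
    evidence_refs=["audit-trail entry (hypothetical id)"],
    technique="fishbone + 5 Whys",
    technique_rationale="routine event, clear evidence chain",
    root_causes=["calibration cadence inadequate"],
)
for gap in approval_gaps(draft):
    print("BLOCKED:", gap)
```

A boolean checklist like this only confirms that each step was recorded; it cannot judge whether the reasoning inside a step is sound, which is what the cross-check is for.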
05 Evidence rigour: the inspectable foundation
Inspectors sampling a CAPA do not assess the reasoning in the abstract; they assess whether each step of the reasoning is grounded in evidence that they can themselves inspect. The audit-trail review is the most powerful single tool — a Part 11 / Annex 11 audit trail captures who did what when, which equipment was used at what state, which materials were dispensed at what mass, which procedure version was effective, which training records were current. An investigation that does not reference the audit trail at every relevant step is treated with suspicion.
Batch records (pharma / dietary supplements / food) and DHRs (devices) are the second pillar — the official record of what was done on the line, what materials went in, what tests were run, what deviations were noted, what review was performed. Equipment logs (calibration, maintenance, cleaning, qualification status) and environmental data (temperature, humidity, particulate, microbial) round out the technical evidence. Training records, change-control history, prior CAPAs on the same product / process / equipment / supplier, supplier-quality data and complaints data round out the systemic evidence.
A typical inspection follow-up question: 'show me the audit trail entry for this dispense event'; 'show me the calibration history for this scale'; 'show me the change control that updated this procedure'; 'show me the training record for this operator on the current procedure version'. If the investigation says 'the operator was trained' and the training record does not confirm currency on the procedure version effective at the time of the event, the investigation is unanchored.
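The training-record question can be made mechanical. The following sketch checks whether an operator was trained on the procedure version that was effective on the event date; the record types, identifiers and dates are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ProcedureVersion:
    version: str
    effective_from: date

@dataclass
class TrainingRecord:
    operator: str
    version: str
    completed_on: date

def effective_version(versions, event_date: date) -> Optional[ProcedureVersion]:
    """Latest procedure version whose effective date is on or before the event."""
    in_force = [v for v in versions if v.effective_from <= event_date]
    return max(in_force, key=lambda v: v.effective_from) if in_force else None

def trained_on_effective_version(records, versions, operator, event_date) -> bool:
    """True only if the operator completed training on the in-force version
    on or before the event date."""
    current = effective_version(versions, event_date)
    if current is None:
        return False
    return any(
        r.operator == operator
        and r.version == current.version
        and r.completed_on <= event_date
        for r in records
    )

# Hypothetical data: the procedure was revised, the operator was not retrained.
versions = [
    ProcedureVersion("v3", date(2023, 1, 10)),
    ProcedureVersion("v4", date(2024, 3, 1)),
]
records = [TrainingRecord("op-17", "v3", date(2023, 1, 12))]
event_date = date(2024, 4, 15)

print(trained_on_effective_version(records, versions, "op-17", event_date))
```

Here the check returns `False`: the operator holds a training record, but only for the superseded version, which is exactly the "unanchored" case described above.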
06 Scope assessment: the most under-done step
A root cause is rarely confined to the affected product, line or site. The scope-assessment step asks: what other products use this process? what other lines run this equipment? what other sites have the same SOP? what other suppliers have the same control gap? what other software systems use this validated component? Each generalisation tests whether the preventive action must extend further. A CAPA whose preventive action is limited to the affected product when the root cause is systemic is, in regulator language, 'inadequate'.
Documented scope assessment is itself an inspectable artefact. It should enumerate the affected scope explicitly (products, lines, sites, suppliers, equipment classes, software components), the basis for inclusion or exclusion, and the resulting preventive-action plan. Generic 'no other products affected' statements without a basis are 483 magnets.
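As a rough illustration of what "a basis for inclusion or exclusion" means in practice, the sketch below flags scope categories that were never assessed and decisions recorded without a basis. The category list follows the paragraph above; the `ScopeItem` structure and the sample entries are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class ScopeItem:
    category: str   # e.g. "product", "line", "site", "supplier"
    name: str
    included: bool  # True if the preventive action extends to this item
    basis: str = "" # documented reason for the inclusion or exclusion

REQUIRED_CATEGORIES = {
    "product", "line", "site", "supplier", "equipment class", "software component",
}

def scope_findings(items):
    """List the gaps an inspector would be likely to question."""
    findings = []
    covered = {item.category for item in items}
    for missing in sorted(REQUIRED_CATEGORIES - covered):
        findings.append(f"category not assessed: {missing}")
    for item in items:
        if not item.basis.strip():
            decision = "inclusion" if item.included else "exclusion"
            findings.append(
                f"{decision} of {item.category} '{item.name}' has no documented basis"
            )
    return findings

# Hypothetical assessment with one bare exclusion and one category skipped.
items = [
    ScopeItem("product", "Product B", False, ""),
    ScopeItem("line", "Line 2", True, "same scale class in use"),
    ScopeItem("site", "Site 2", False, "different calibration SOP, verified"),
    ScopeItem("supplier", "n/a", False, "no supplier involvement in cause"),
    ScopeItem("equipment class", "bench scales", True, "shared drift profile"),
]
for finding in scope_findings(items):
    print(finding)
```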
07 Effectiveness verification: closing the loop
Effectiveness verification is the single discipline that separates a CAPA programme that prevents recurrence from one that documents the past. The plan must be defined at CAPA approval, not invented at closure. It must be observable (data exists or can be collected), measurable (against a pre-defined criterion) and time-bound (with a specific post-implementation window — typically 3 / 6 / 12 months by severity).
Worked example: root cause was calibration SOP cadence inadequate for the equipment drift profile. Effectiveness verification: (a) 6-month review of all scale calibration records — zero out-of-tolerance events at the working-weight check; (b) OOS rate for products dispensed on this scale class trended against the prior 24 months — no statistically significant increase; (c) internal audit of the change-control checklist confirms the equipment-changeover impact-assessment is being executed for all subsequent equipment changes. Each criterion has a target, a data source, a measurement window and an explicit pass/fail decision. Pass closes the CAPA; fail re-opens the investigation.
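The worked example can be written down as data plus an explicit decision rule. This is a sketch only: the 6-month window is stated in the text for criterion (a) and is assumed here for (b) and (c), the statistical test behind (b) is treated as an input flag rather than implemented, and the sample results are invented.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    name: str
    data_source: str
    window_months: int
    target: str
    passed: Callable[[dict], bool]  # evaluates collected data against the target

criteria = [
    Criterion(
        "calibration out-of-tolerance events", "scale calibration records", 6,
        "zero events at the working-weight check",
        lambda d: d["out_of_tolerance_events"] == 0,
    ),
    Criterion(
        "OOS rate trend", "OOS log for this scale class", 6,
        "no statistically significant increase vs prior 24 months",
        lambda d: not d["oos_increase_significant"],
    ),
    Criterion(
        "changeover impact assessment", "internal audit of change-control checklist", 6,
        "executed for all subsequent equipment changes",
        lambda d: d["changes_assessed"] == d["changes_total"],
    ),
]

def verify(criteria, data):
    """Evaluate every criterion; any single failure re-opens the investigation."""
    results = {c.name: c.passed(data) for c in criteria}
    if all(results.values()):
        decision = "pass: close CAPA"
    else:
        decision = "fail: re-open investigation"
    return results, decision

# Hypothetical data collected at the end of the window.
collected = {
    "out_of_tolerance_events": 0,
    "oos_increase_significant": False,
    "changes_assessed": 4,
    "changes_total": 5,
}
results, decision = verify(criteria, collected)
print(results)
print(decision)
```

With one equipment change lacking an impact assessment, criterion (c) fails and the decision is to re-open, even though the calibration and OOS criteria both passed.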
08 Common RCA failure modes (the 483 / Notified Body / MDSAP catalogue)
- Problem statement vague — no quantification of magnitude, frequency or scope.
- Investigation team excludes the operator — QA-only investigations inherit QA's filtered view.
- Technique not chosen — investigation is free-text narrative without structured analysis.
- Single-thread reasoning — multiple credible branches not enumerated; fishbone skipped.
- Evidence not pulled before reasoning — investigation built on memory and intuition.
- Audit trail not referenced — Part 11 / Annex 11 system in place but not used as the evidence backbone.
- Chain stops at operator error — 'operator did not follow procedure' with no further iteration.
- Root cause not validated by cross-check — one technique, one team, one conclusion.
- Corrective action does not address the identified cause — 'too expensive to fix systemically, so we'll fix locally'.
- Preventive action scope too narrow — limited to affected product when root cause is generalisable.
- Scope assessment absent — 'no other products affected' without basis.
- Effectiveness verification deferred to 'we'll see' rather than a defined plan.
- Effectiveness verification declared pass without supporting data.
- Repeat occurrences not detected — closed CAPAs not tracked for recurrence on the same product / process.
- No trend analysis across CAPAs — multiple unconnected CAPAs against the same systemic cause.
- Management review does not surface CAPA-system performance.
09 RCA and risk management: the bidirectional loop
For medical devices, ISO 14971 §10 (production and post-production information) requires that field data — including complaints, adverse events, service data, returns, vigilance reports — feed back into the risk-management process. A complaint that reveals an unanticipated hazard or a higher-than-anticipated probability of an existing hazard requires re-evaluation of the risk-management file. RCA is the investigation engine that produces the substantive cause statement that the risk-management file consumes. The bidirectional loop is inspectable: complaints → RCA → CAPA → RMF update (if applicable) → design / process change (if applicable) → effectiveness verification → next PMS report.
For pharma, ICH Q9 quality risk management plays an equivalent role — RCA outputs feed risk-based decisions on batch disposition, on process re-validation, on supplier re-qualification. ICH Q10 explicitly connects CAPA to knowledge management and to continual improvement.
10 Metrics worth tracking
- CAPA cycle time (trigger → root cause approved → CAPA closed → effectiveness verified)
- % of investigations using a structured technique (with the technique named)
- % of investigations with operator on the team
- % of investigations attributing root cause to operator error alone
- % of investigations using multi-branch fishbone vs single-thread 5 Whys
- Scope-assessment completeness (% of CAPAs where scope extended to other products / lines / sites)
- Effectiveness verification on-time rate; pass rate; fail-and-reopen rate
- Repeat-after-CAPA rate (closed CAPAs that re-open within 12 / 24 months on same product / process / supplier / equipment)
- Cross-CAPA trend signal-detection cadence (e.g. monthly Pareto of root causes)
- External finding rate citing inadequate RCA (483 / Notified Body / MDSAP grade)
- CAPA closure on-time rate by severity tier
- Audit-trail reference rate (% of investigations citing specific audit-trail entries)
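
Several of the metrics above reduce to simple counts over CAPA records. The sketch below computes a handful of them; the `Capa` record layout, the 365-day repeat window and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class Capa:
    capa_id: str
    opened: date
    closed: Optional[date]
    technique: str              # empty string when the investigation was narrative only
    operator_on_team: bool
    root_causes: tuple
    reopened_on: Optional[date] = None

def pct(part, whole):
    """Percentage rounded to one decimal; 0.0 when there is nothing to count."""
    return round(100.0 * part / whole, 1) if whole else 0.0

def capa_metrics(capas, repeat_window_days=365):
    closed = [c for c in capas if c.closed]
    cycle_days = [(c.closed - c.opened).days for c in closed]
    repeats = [
        c for c in closed
        if c.reopened_on and (c.reopened_on - c.closed).days <= repeat_window_days
    ]
    return {
        "mean_cycle_days": sum(cycle_days) / len(cycle_days) if cycle_days else None,
        "pct_structured_technique": pct(sum(1 for c in capas if c.technique), len(capas)),
        "pct_operator_on_team": pct(sum(c.operator_on_team for c in capas), len(capas)),
        "pct_operator_error_only": pct(
            sum(1 for c in capas if c.root_causes == ("operator error",)), len(capas)
        ),
        "pct_repeat_after_capa": pct(len(repeats), len(closed)),
    }

# Hypothetical records.
start = date(2024, 1, 1)
capas = [
    Capa("C1", start, start + timedelta(days=60), "fishbone + 5 Whys", True,
         ("calibration cadence",)),
    Capa("C2", start, start + timedelta(days=30), "", False,
         ("operator error",), reopened_on=start + timedelta(days=200)),
    Capa("C3", start, start + timedelta(days=90), "FTA", True,
         ("supplier spec gap", "incoming inspection gap")),
    Capa("C4", start, None, "5 Whys", True, ("operator error",)),
]
metrics = capa_metrics(capas)
for name, value in metrics.items():
    print(f"{name}: {value}")
```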
11 RCA and management review
ISO 13485 §5.6 and the equivalent pharma / food QMS requirements make CAPA-system performance a mandatory management-review input. Management cannot delegate RCA discipline to QA — it must observe CAPA-system health through the metrics, identify systemic patterns (repeat root causes across CAPAs, repeat operator-error attributions, scope-assessment under-investment, effectiveness-verification fail rate), and direct resource allocation accordingly. A management review that consumes CAPA counts without the underlying root-cause patterns is itself an inspection finding.
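One way to surface "repeat root causes across CAPAs" for a management review is a Pareto ranking. The sketch below is a minimal version using only the standard library; the 80% threshold is the conventional Pareto cut-off, and the root-cause labels and counts are invented.

```python
from collections import Counter

def pareto(root_causes, threshold=0.8):
    """Rank causes by frequency and return (rows, vital_few).

    rows: (cause, count, cumulative share) in descending order of count.
    vital_few: causes up to and including the first one whose cumulative
    share reaches the threshold.
    """
    counts = Counter(root_causes).most_common()
    total = sum(n for _, n in counts)
    rows, cumulative = [], 0
    for cause, n in counts:
        cumulative += n
        rows.append((cause, n, cumulative / total))
    vital_few = []
    for cause, _, share in rows:
        vital_few.append(cause)
        if share >= threshold:
            break
    return rows, vital_few

# Hypothetical root causes recorded on 20 closed CAPAs.
closed_capa_causes = (
    ["calibration cadence"] * 8
    + ["procedure ambiguity"] * 6
    + ["supplier spec gap"] * 3
    + ["operator error"] * 2
    + ["labelling"] * 1
)
rows, vital_few = pareto(closed_capa_causes)
for cause, n, share in rows:
    print(f"{cause:22s} {n:3d}  {share:6.1%}")
print("vital few:", vital_few)
```

The ranking is only as good as the root-cause coding: if investigators label the same systemic cause differently from one CAPA to the next, the counts fragment and the pattern stays hidden.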
12 How V5 Ultimate runs RCA end-to-end
V5 Ultimate models RCA as a controlled workspace inside the CAPA module. On trigger (deviation, NCR, OOS, complaint, adverse event, audit finding, trend signal), the system opens an investigation record with a procedural clock, a containment-action sub-record, a problem-statement template that enforces quantification fields, and a team-builder that requires at least one operator alongside QA.
Evidence gathering is pre-staged — the system surfaces the relevant audit-trail entries, batch record / DHR sections, equipment logs, training records, change-control history and prior CAPAs on the same product / process / equipment / supplier with one click. The investigator selects evidence to attach to specific reasoning nodes; the evidence reference becomes a permanent part of the investigation record.
Technique selection is guided — severity and complexity inputs surface the recommended technique with the rationale (fishbone + 5 Whys for routine; FTA for safety-critical; Apollo for cross-organisation; change analysis when a recent change is identified). The chosen technique opens its structured workspace (branching tree for 5 Whys, category enumeration for fishbone, AND/OR gates for FTA). Cross-check is enforced — investigations cannot be approved without either a second technique applied or an independent peer-review e-signature.
CAPA design enforces scope assessment as a discrete field with an enumerated list of related products / lines / sites / suppliers / equipment classes; closing without scope assessment is blocked. Effectiveness verification plans are defined at approval with observable, measurable, time-bound criteria; the system schedules the verification, surfaces the data, requires the pass/fail decision and routes fail back to investigation re-opening. Trends across CAPAs (root-cause Pareto, repeat-after-CAPA rate, operator-error attribution rate) feed the management review automatically.
Frequently asked questions
Q. Is one RCA technique enough?
No. Cross-checking with a second technique or by independent peer review is the single biggest defence against the technique-of-choice bias. Routine investigations typically use fishbone + 5 Whys (one for enumeration, one for convergence). High-severity or recurrent investigations should additionally use FTA, Apollo, or a second-team peer review. A single-technique investigation is a 483 risk on complex events.
Q. Can RCA stop at human error?
Almost never. Modern Human Performance / HOP doctrine treats human action as a symptom of system design — procedures, training, equipment usability, supervision, environment, scheduling, fatigue, distraction, communication. An RCA that stops at 'operator error' has not asked why the system allowed the error; FDA, MDSAP and Notified Bodies all cite this pattern as inadequate. Acceptable closure requires demonstrated investigation of the system-of-work factors.
Q. What is the difference between proximate cause and root cause?
Proximate cause is the immediate event in the causal chain — the specific action or condition that produced the observed problem. Root cause is the systemic factor that allowed the proximate cause to occur and that, if eliminated, prevents recurrence. Corrective action addresses the proximate cause for the affected event; preventive action addresses the root cause for all affected scope. Both are needed; collapsing them is one of the most common failure modes.
Q. How long should an investigation take?
Procedural clocks vary by framework. FDA OOS guidance establishes a phased approach with Phase I (laboratory investigation, typically days) and Phase II (full manufacturing investigation, typically weeks). Pharma deviations typically target 30 days. Device CAPAs typically target 60-90 days depending on severity. The clock is documented in the SOP and tracked per investigation; exceeding the clock requires documented justification and is a metric for management review. Speed must not compromise rigour: a rushed RCA that misses the root cause produces a 483 finding for inadequate investigation.
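A procedural clock of this kind is straightforward to track. The sketch below uses the typical figures quoted above; the tier names, the use of calendar days, and the mapping of 60 days to high-severity device CAPAs are assumptions, since the actual values belong in the site's SOP.

```python
from datetime import date, timedelta

# Target durations in calendar days (assumed tiers; the SOP is the real source).
TARGET_DAYS = {
    "pharma_deviation": 30,
    "device_capa_high_severity": 60,
    "device_capa_standard": 90,
}

def clock_status(kind, opened, today, justification=""):
    """Report the due date and whether the investigation is inside its clock."""
    due = opened + timedelta(days=TARGET_DAYS[kind])
    days_over = (today - due).days
    if days_over <= 0:
        return {"due": due, "status": "on time", "days_over": 0}
    if justification.strip():
        status = "overdue, justified"
    else:
        status = "overdue, justification required"
    return {"due": due, "status": status, "days_over": days_over}

# Hypothetical deviation opened on 1 January and still open on 10 February.
status = clock_status("pharma_deviation", date(2025, 1, 1), date(2025, 2, 10))
print(status)
```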
Q. When should we escalate beyond 5 Whys?
When severity is high (death, serious deterioration, public-health implication); when the event is recurrent (already had 5 Whys done and recurred); when the system is safety-critical (life-supporting, life-sustaining, surgical); when multiple causal chains co-exist; when the cause spans organisations (manufacturer + supplier + service provider); when human action is significant (use Human Performance investigation); when a recent change is suspected (use change analysis). The technique-selection guide should be in the CAPA SOP.
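A technique-selection guide of this kind can be captured as a small rule set. The sketch below is one possible reading of the escalation criteria above, not a prescribed mapping: the `EventProfile` fields and the choice of which technique answers which criterion are assumptions that a real CAPA SOP would define for itself.

```python
from dataclasses import dataclass

@dataclass
class EventProfile:
    """Hypothetical triage inputs for a single event."""
    high_severity: bool = False
    recurrent: bool = False
    safety_critical: bool = False
    multiple_causal_chains: bool = False
    spans_organisations: bool = False
    significant_human_action: bool = False
    recent_change_suspected: bool = False

def recommend_techniques(event: EventProfile) -> list:
    """Start from the routine default and add techniques as criteria are met."""
    techniques = ["fishbone", "5 Whys"]
    if event.high_severity or event.safety_critical or event.multiple_causal_chains:
        techniques.append("Fault Tree Analysis")
    if event.spans_organisations:
        techniques.append("Apollo RCA")
    if event.recurrent:
        techniques.append("second-team peer review")
    if event.significant_human_action:
        techniques.append("Human Performance investigation")
    if event.recent_change_suspected:
        techniques.append("change analysis")
    return techniques

routine = EventProfile()
post_change_repeat = EventProfile(recurrent=True, recent_change_suspected=True)

print(recommend_techniques(routine))
print(recommend_techniques(post_change_repeat))
```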
Q. Does RCA apply to near misses?
Yes, and aggressively so. Near misses are the cheapest investigation opportunity in the QMS: same root-cause-detection benefit, no actual harm, no batch impact. Mature CAPA programmes treat near-miss reporting as a leading indicator and run RCA at lower triggering thresholds for near misses than for actual non-conformances. The management-review trend on near-miss reporting rate is a culture metric.
Q. Who owns RCA: QA or operations?
Both. QA owns the methodology, the documentation, the inspectability and the closure approval. Operations owns the event context, the evidence at the moment, the implementation of the CAPA. The investigation team must include both — QA-only investigations miss the on-floor context; operations-only investigations miss the systemic perspective. Management owns the system-level patterns through management review.
Primary sources
- 21 CFR 820.100, Corrective and preventive action
- FDA QMSR, Final Rule (effective 2 Feb 2026)
- ICH Q10, Pharmaceutical Quality System (with §3 CAPA)
- FDA Guidance, Investigating Out-of-Specification (OOS) Test Results for Pharmaceutical Production (2006)
- ISO 9001:2015 §10.2, Nonconformity and corrective action
- ISO 13485:2016 §8.5.2 / §8.5.3, Corrective + Preventive action
- MDSAP Audit Model, Chapter 4 (CAPA)
- AIAG / VDA, Failure Mode and Effects Analysis Handbook (2019)
- U.S. NRC, Fault Tree Handbook (NUREG-0492)
- ASQ, Root Cause Analysis
Further reading
- 5 Whys: The most common single technique used inside an RCA, covered with the rigour regulators expect.
- CAPA: Where RCA output lands: corrective + preventive action plan and verification.
- CAPA effectiveness: The loop back. If RCA missed the cause, effectiveness verification fails.
- Deviation: Pharma trigger that requires an investigation with documented RCA.
- OOS: Out-of-Specification investigations under FDA 2006 guidance. Phase II is structured RCA.
- Non-conformance: Discrete-manufacturing trigger.
- Customer complaints: Field-detected triggers, with RCA on the device and on the manufacturing process.
- ISO 14971: Device risk management. RCA feeds the RMF when post-market data reveals an unanticipated hazard.
- How V5 Ultimate runs RCA end-to-end: Technique selector by severity + complexity, evidence-gated tree, cross-product scope assessment, effectiveness loop.
