Analysis of Competing Hypotheses (ACH)

Status: Documented, not shipped · Evidence: X · Family: Assumption and belief challenge · Verdict: reject (2026-06-11)

What it is

Analysis of Competing Hypotheses (ACH) is Richards J. Heuer Jr.’s procedure for judging which of several rival explanations best fits a body of evidence, developed at the CIA in the 1970s and published in Psychology of Intelligence Analysis (Center for the Study of Intelligence, 1999). The mechanism is an evidence-by-hypothesis disconfirmation matrix:

Enumerate a full set of competing hypotheses (including the unpleasant ones), rather than evaluating the favorite alone.
List every significant item of evidence and argument bearing on any of them.
Build a matrix with hypotheses across the top and evidence down the side, and mark each cell consistent, inconsistent, or not applicable.
Weigh each item of evidence by its diagnosticity: evidence consistent with every hypothesis discriminates nothing and should carry no weight.
Work to disprove hypotheses rather than confirm them, and tentatively accept the hypothesis with the least inconsistent evidence, not the one with the most consistent evidence.
Sensitivity-check the conclusion against the few most diagnostic items, report the relative standing of all hypotheses rather than only the winner, and name indicators that would signal a different hypothesis becoming true.
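
The steps above can be sketched in a few lines of Python. This is a minimal illustration of the mechanics only, not Heuer's or the PARC tool's implementation: the hypotheses, evidence labels, ratings, and every function name are hypothetical, and all items are weighted equally apart from a simple diagnosticity filter.

```python
from typing import Dict, List, Set

Ratings = Dict[str, str]  # hypothesis -> "C" (consistent), "I" (inconsistent), "N" (n/a)

# Hypothetical toy data, invented for illustration only.
HYPOTHESES: List[str] = ["H1", "H2", "H3"]
MATRIX: Dict[str, Ratings] = {
    "E1": {"H1": "C", "H2": "C", "H3": "C"},  # fits every rival: non-diagnostic
    "E2": {"H1": "C", "H2": "I", "H3": "I"},
    "E3": {"H1": "I", "H2": "C", "H3": "I"},
    "E4": {"H1": "C", "H2": "I", "H3": "N"},
    "E5": {"H1": "N", "H2": "C", "H3": "I"},
}


def is_diagnostic(ratings: Ratings) -> bool:
    """An item discriminates only if its rating differs across hypotheses."""
    return len(set(ratings.values())) > 1


def inconsistency_counts(matrix: Dict[str, Ratings], hypotheses: List[str]) -> Dict[str, int]:
    """Count 'I' cells per hypothesis, skipping non-diagnostic evidence."""
    counts = {h: 0 for h in hypotheses}
    for ratings in matrix.values():
        if not is_diagnostic(ratings):
            continue
        for h in hypotheses:
            if ratings[h] == "I":
                counts[h] += 1
    return counts


def leaders(matrix: Dict[str, Ratings], hypotheses: List[str]) -> Set[str]:
    """Hypotheses tied for the fewest inconsistencies (tentatively accepted)."""
    counts = inconsistency_counts(matrix, hypotheses)
    fewest = min(counts.values())
    return {h for h, c in counts.items() if c == fewest}


def sensitivity(matrix: Dict[str, Ratings], hypotheses: List[str]) -> Dict[str, Set[str]]:
    """Re-run the ranking with each evidence item removed in turn."""
    return {
        dropped: leaders({e: r for e, r in matrix.items() if e != dropped}, hypotheses)
        for dropped in matrix
    }


counts = inconsistency_counts(MATRIX, HYPOTHESES)
# Report the relative standing of all hypotheses, not only the winner.
standing = sorted(counts.items(), key=lambda kv: kv[1])
print(standing)
print(sensitivity(MATRIX, HYPOTHESES))
```

In this toy matrix H1 leads with one inconsistency, but dropping E2 or E4 leaves it tied with H2, which is the kind of dependence on a few items that the sensitivity check is meant to surface. Under pure inconsistency counting an all-consistent item such as E1 already contributes nothing, so the filter mainly makes the diagnosticity step explicit.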

The durable cognitive idea inside the ritual is real and important: confirmation accumulates cheaply because most evidence is consistent with many explanations, so the discriminating question is which evidence is incompatible with which hypothesis. ACH operationalizes that idea as matrix-wide inconsistency scoring. The method is institutionally pervasive: it anchors the US intelligence community’s structured-analytic-techniques tradecraft and the UK Professional Head of Intelligence Assessment (PHIA) training, is codified in Heuer and Pherson’s Structured Analytic Techniques for Intelligence Analysis, and is supported by dedicated software (the PARC ACH tool, built with Heuer at the Palo Alto Research Center).

When it helps / when it misleads

Where it plausibly helps. The front half of the procedure - forcing a full set of rivals onto the table before any evidence is weighed - attacks a genuine failure mode (single-hypothesis satisficing), and the matrix leaves an auditable trail of which evidence was held to bear on what, which is useful for communicating disagreement inside a team. The experimental positives are narrow: Lehner, Adelman, Cheikes and Brown (2008) found ACH reduced confirmation bias for participants without intelligence-analysis experience but not for experienced analysts, and Folker (2000) found military analysts using a simplified hypothesis-testing matrix outperformed intuition on one scenario in a small experiment.

Where it misleads. The controlled record (next section) says the core promise - that running the matrix debiases the analyst - largely does not hold, and three specific failure modes recur:

The ritual certifies the conclusion. Otzipka (2025) found participants applying ACH became significantly more confident in their guilt assessments without any accompanying accuracy or bias benefit. A procedure that raises confidence without improving judgment is worse than no procedure on exactly the high-stakes calls it is marketed for.
The matrix is not the active ingredient. Dhami, Belton, De Werd, Hadzhieva and Wicke (2024) found the ACH-style layout specifically (hypotheses in columns, evidence in rows) did not reduce confirmation bias and did not increase sensitivity to evidence credibility, while a transposed layout did reduce bias - the one part of ACH that is unmistakably ACH failed its own test.
Trained users skip the steps. Dhami, Belton and Mandel (2019) found ACH-trained analysts did not follow the procedure’s steps, and the method may have increased judgment inconsistency and error. A long-standing methodological critique points at the mechanism: counting inconsistencies treats evidence items as independent and equally weighted, which they almost never are.
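
The independence critique in the last item can be illustrated with a toy case. In this sketch three reports from a single source are each counted as a separate inconsistency, which flips the ranking. The data, the source labels, and the per-source correction are all hypothetical; collapsing by source is only one possible adjustment and is not something the cited studies propose.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Item(NamedTuple):
    source: str              # hypothetical provenance label
    ratings: Dict[str, str]  # hypothesis -> "C" (consistent) or "I" (inconsistent)


# Hypothetical evidence: two independent items cut against H1,
# while three reports from the same informant cut against H2.
ITEMS: List[Item] = [
    Item("witness_a", {"H1": "I", "H2": "C"}),
    Item("document_b", {"H1": "I", "H2": "C"}),
    Item("informant_s", {"H1": "C", "H2": "I"}),
    Item("informant_s", {"H1": "C", "H2": "I"}),
    Item("informant_s", {"H1": "C", "H2": "I"}),
]


def naive_counts(items: List[Item]) -> Dict[str, int]:
    """Equal-weight counting: every item is treated as independent."""
    counts: Dict[str, int] = {}
    for item in items:
        for h, rating in item.ratings.items():
            counts[h] = counts.get(h, 0) + (1 if rating == "I" else 0)
    return counts


def per_source_counts(items: List[Item]) -> Dict[str, int]:
    """Count at most one inconsistency per (source, hypothesis) pair."""
    seen: Set[Tuple[str, str]] = set()
    counts: Dict[str, int] = {}
    for item in items:
        for h, rating in item.ratings.items():
            counts.setdefault(h, 0)
            if rating == "I" and (item.source, h) not in seen:
                seen.add((item.source, h))
                counts[h] += 1
    return counts


naive = naive_counts(ITEMS)
by_source = per_source_counts(ITEMS)
print(naive)      # H1 has fewer inconsistencies, so H1 leads
print(by_source)  # once the informant counts once, H2 leads
```

The same evidence base favors H1 under naive counting and H2 once the repeated informant reports are collapsed, so the conclusion depends on an independence assumption the matrix never states.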

When NOT to use: as a debiasing guarantee. Whatever value the matrix has as a bookkeeping and communication device, the evidence says it does not deliver the bias protection that is its stated purpose, and it can add unearned confidence on top.

What the evidence says

The honest grade is X (poor or contradictory), confirming the registry’s preliminary grade. ACH is the unusual case where the actual move - not a cousin, not a transfer from another domain - has been tested repeatedly in controlled experiments, on its target populations (intelligence analysts, legal decision-makers, forensic evaluators), against its stated purpose (mitigating confirmation bias), and the record is null-to-negative:

Dhami, Belton and Mandel (2019, Applied Cognitive Psychology 33(6), 1080-1090). Fifty intelligence analysts randomly assigned to use ACH or not on a hypothesis-testing task with probabilistic ground truth. ACH-trained analysts did not follow all the steps, the evidence for confirmation-bias reduction was mixed, and ACH may have increased judgment inconsistency and error. The single most direct test of the method on its home population.
Whitesmith (2019, Intelligence and National Security 34(2)). The ACH variant taught to the UK intelligence community had no statistically significant mitigative effect on serial position effects or on confirmation bias in an intelligence-analysis scenario; expanded in her book Cognitive Bias in Intelligence Analysis (Edinburgh University Press, 2020).
Maegherman, Ask, Horselenberg and van Koppen (2021, Applied Cognitive Psychology 35(1), 62-70). 191 law students; ACH instruction produced no benefit over generic bias information in evidence-seeking on a criminal case (participants in both conditions already chose disconfirming questions). The authors suggest the method’s effectiveness “continues to be found to be problematic.”
Karvetski and Mandel (2020, Judgment and Decision Making 15(6), 939-958). 227 participants judged posterior probabilities from uncertain evidence (varying source reliability and information credibility) across six cases. Participants who used ACH first were no more additive, no more Bayesian-coherent, no more choice-consistent, and no better at avoiding conservatism than controls; ACH users were in fact slightly less reliable across isomorphic cases. A second randomized null on probabilistic judgment quality, from one of the same researchers.
Dhami, Belton, De Werd, Hadzhieva and Wicke (2024, Cognitive Research: Principles and Implications 9, 37). Study 1 (N = 161): the ACH-style matrix layout did not reduce confirmation bias and conferred no benefit in sensitivity to evidence credibility; a transposed layout (hypotheses in rows) did reduce bias. Study 2 (N = 62): most Dutch military analysts did not exhibit confirmation bias in the first place, undercutting the problem framing ACH is sold on.
Otzipka (2025, Applied Cognitive Psychology 39(5), e70115). 222 law students assessing an ambiguous homicide investigation; no confirmation bias emerged to debias, but applying ACH significantly increased participants’ confidence in their guilt assessments - confidence without accuracy.
Otzipka and Volbert (2026, PLOS One). 159 participants with credibility-assessment knowledge as mock expert witnesses; adversarial allegiance appeared in evidence emphasis, and applying ACH did not significantly influence credibility ratings or evidence selection - no debiasing of adversarial allegiance.
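
The additivity, Bayesian-coherence, and conservatism terms in the Karvetski and Mandel entry can be made concrete with a small sketch. These are simplified textbook definitions with hypothetical numbers and function names, not the measures or data used in the paper.

```python
from typing import Dict


def additivity_gap(judged: Dict[str, float]) -> float:
    """Distance from 1.0 of probabilities over exhaustive, exclusive hypotheses."""
    return abs(sum(judged.values()) - 1.0)


def bayes_posterior(priors: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(H | E) from priors P(H) and likelihoods P(E | H)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: j / total for h, j in joint.items()}


def is_conservative(prior: float, judged: float, bayes: float) -> bool:
    """True if the judgment moved toward the Bayesian answer but not far enough."""
    shift_needed = bayes - prior
    shift_made = judged - prior
    return shift_needed != 0 and 0 <= shift_made / shift_needed < 1


# Hypothetical case: two rivals, equal priors, evidence four times likelier under H1.
priors = {"H1": 0.5, "H2": 0.5}
likelihoods = {"H1": 0.8, "H2": 0.2}
posterior = bayes_posterior(priors, likelihoods)

# A hypothetical participant who judges P(H1) = 0.7 and P(H2) = 0.5 is non-additive.
gap = additivity_gap({"H1": 0.7, "H2": 0.5})

# A hypothetical judged posterior of 0.65 for H1 under-updates relative to Bayes.
under_updates = is_conservative(priors["H1"], 0.65, posterior["H1"])
print(posterior, gap, under_updates)
```

A debiasing procedure that worked on this dimension would be expected to shrink the additivity gap and move judged posteriors closer to the Bayesian ones; the study reports no such improvement for ACH users.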

Against this stand two partial positives, both on the matrix’s weakest claim and both predating the negative wave: Lehner, Adelman, Cheikes and Brown (2008, IEEE Transactions on Systems, Man, and Cybernetics - Part A 38(3), 584-592), where ACH reduced confirmation bias only for participants without intelligence-analysis experience, and Folker (2000, Joint Military Intelligence College occasional paper), a small experiment in which a simplified hypothesis-testing matrix beat intuition on one scenario. A novices-only, small-sample positive fringe does not offset randomized and replicated nulls on the method’s own population and purpose.

Two honesty notes. First, what X does and does not mean here: this is not “undertested” (that would be C); it is tested and found wanting - the strongest possible documentation case and the weakest possible shipping case. Second, what the grade is not based on: ACH’s institutional adoption (US tradecraft primers, PHIA, the PARC tool, decades of training curricula) is adoption evidence, not outcome evidence, and has not been counted toward the grade. One external research run argued for building ACH because “its structural artifact is unparalleled” while conceding the human evidence is mixed; by this library’s rules that is grade laundering - artifact elegance is not an evidence tier - and the argument is rejected. Every numeric claim above maps to a named source; no unverifiable statistics were used.

Why it is / is not a skill here

Verdict: Reject (status: excl), confirming the preliminary registry verdict after independent verification of its claimed record. The decisive ground is the evidence gate: X never ships in this library, and ACH grades X on direct, controlled, repeatedly replicated tests of the actual move.

What makes this rejection worth documenting is that ACH passes the distinctness test it is usually rejected on. No shipped skill builds an evidence-by-hypothesis disconfirmation matrix weighted by diagnosticity:

vs think-evidence-vs-inference-sort (closest shipped neighbor, medium overlap): the sort classifies the claims inside one line of reasoning into evidence, inference, and assumption; it has no rival-hypothesis axis and no diagnosticity scoring. ACH would have added the cross-tabulation.
vs think-red-team-light (medium-low): shares the disconfirmation spirit, but red-team-light attacks one proposal adversarially; it does not hold a set of rivals simultaneously against a common evidence base.
vs think-decision-option-review (low, structural only): that matrix scores actions against preference criteria; ACH scores explanations against evidence consistency. The grid shape is the only resemblance.
vs think-what-would-have-to-be-true and think-ladder-of-inference-check (low): one option’s truth conditions, and one conclusion’s reasoning trace, respectively; neither compares rivals.

Distinctness cannot rescue a failed evidence gate. Shipping ACH because nothing else does this would be exactly the build-on-artifact-quality argument rejected above.

Hard walls against the cluster siblings (vetted in the same batch, sharing the rival-hypothesis / configurational causal space):

vs process-tracing (cand, P, build): the nearest sibling, and the wall matters in both directions. ACH cross-tabulates all evidence against all hypotheses and ranks rivals by matrix-wide inconsistency counts; process tracing types each piece of within-case evidence by its necessity and sufficiency for a causal claim (hoop, smoking-gun, straw-in-the-wind, doubly decisive) and updates explicitly, case by case. The evidence verdicts also diverge honestly: ACH’s move has direct controlled trials and they are null-to-negative (X), while process tracing claims methodological validity in case research and has not been controlled-tested as a debiasing protocol (P). ACH’s exclusion therefore does not transfer to process tracing, and the defensible core ACH gestures at - weigh evidence by its power to discriminate rivals - survives in the catalog through that candidate, with per-item test-typing in place of the inconsistency arithmetic the trials condemned.
vs qualitative-comparative-analysis (cand, P, reject): different unit of analysis entirely. QCA codes many cases as condition configurations in a truth table and derives necessary or sufficient recipes across cases; ACH scores many evidence items against rival explanations of a single situation. Beyond “both are grids,” no mechanism is shared.
vs system-archetypes (cand, C, build): no mechanism contact. Archetype-matching operates on feedback structure (its real collision is with shipped think-causal-loop-diagrams and think-iceberg-model, not with anything in ACH’s space). No wall needed beyond noting the spaces are disjoint.
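
The per-item test-typing that separates process tracing from ACH's matrix-wide counting can be sketched as a lookup on necessity and sufficiency. The four test names follow the standard typology in the process-tracing literature; the class, function names, and outcome labels are hypothetical and are not taken from the process-tracing candidate itself.

```python
from enum import Enum


class EvidenceTest(Enum):
    STRAW_IN_THE_WIND = "straw-in-the-wind"
    HOOP = "hoop"
    SMOKING_GUN = "smoking-gun"
    DOUBLY_DECISIVE = "doubly decisive"


def classify(necessary: bool, sufficient: bool) -> EvidenceTest:
    """Type a test by whether passing it is necessary and/or sufficient
    for affirming the causal claim."""
    if necessary and sufficient:
        return EvidenceTest.DOUBLY_DECISIVE
    if necessary:
        return EvidenceTest.HOOP
    if sufficient:
        return EvidenceTest.SMOKING_GUN
    return EvidenceTest.STRAW_IN_THE_WIND


def implication(test: EvidenceTest, passed: bool) -> str:
    """Coarse effect of a pass or fail on the hypothesis being tested."""
    if passed:
        if test in (EvidenceTest.SMOKING_GUN, EvidenceTest.DOUBLY_DECISIVE):
            return "confirmed"
        if test is EvidenceTest.HOOP:
            return "still viable"
        return "slightly strengthened"
    if test in (EvidenceTest.HOOP, EvidenceTest.DOUBLY_DECISIVE):
        return "eliminated"
    return "weakened, not eliminated"


# Failing a hoop test eliminates a hypothesis; failing a smoking-gun test does not.
print(classify(necessary=True, sufficient=False), implication(EvidenceTest.HOOP, passed=False))
print(classify(necessary=False, sufficient=True), implication(EvidenceTest.SMOKING_GUN, passed=False))
```

Each item carries its own asymmetric consequence here, whereas an inconsistency count gives a failed hoop test and a failed straw-in-the-wind test the same weight of one.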

Why reject rather than fold: a fold requires a shipped skill whose mechanism subsumes this one, and none does - the mechanism is distinct, it is just discredited. Why excl rather than flag: flag is for methods worth including with caveats; a method whose randomized record on its own population shows no benefit and possible harm (inconsistency, error, unearned confidence) is excluded on the merits, with this dossier as the public reasoning.

The learning value of this decision: ACH is the most institutionally entrenched method this library has rejected - taught to two national intelligence communities, embedded in tradecraft doctrine, supported by dedicated software - and the controlled record still says the matrix does not do the one thing it promises. Pedigree, adoption, and a genuinely distinct artifact are all real here, and none of them is outcome evidence. If you want disconfirmation discipline today, the shipped homes are think-red-team-light (attack one thesis), think-evidence-vs-inference-sort (audit the claim structure), and think-what-would-have-to-be-true (test a named option’s load-bearing conditions); if rival-explanation discrimination ships here, it will be through the better-grounded process-tracing candidate.

Lineage and who to read

ACH was developed by Richards J. Heuer Jr. in the mid-1970s at the CIA’s Directorate of Intelligence and published in Psychology of Intelligence Analysis (CIA Center for the Study of Intelligence, 1999), chapter 8. Heuer and Randolph Pherson systematized it for the structured-analytic-techniques canon in Structured Analytic Techniques for Intelligence Analysis (CQ Press, 2011; 3rd ed. 2021), and the PARC ACH software (Palo Alto Research Center, with Heuer, mid-2000s) gave it a tool form. It entered doctrine through the US intelligence community’s tradecraft primers and the UK’s PHIA training. For the empirical reckoning, read Mandeep K. Dhami and Ian K. Belton (the 2019 randomized test with David R. Mandel, and the 2024 layout studies with Peter De Werd, Velichka Hadzhieva and Lars Wicke), Christopher W. Karvetski and David R. Mandel (the 2020 probabilistic-coherence null), Martha Whitesmith (the 2019 experiment and the 2020 Edinburgh University Press book, the fullest book-length treatment of why ACH fails its debiasing brief), and the legal-domain line of Enide Maegherman and colleagues (2021), Jana Otzipka (2025), and Otzipka and Renate Volbert (2026). “Analysis of Competing Hypotheses” is a generic descriptive term from a US government publication in the public domain; there is no trademark holder, so the entry is documented with attribution and is not flagged as branded.

Named sources

Richards J. Heuer Jr., Psychology of Intelligence Analysis (CIA Center for the Study of Intelligence, 1999), ch. 8. The origin: the eight-step procedure, diagnosticity, and the least-inconsistency decision rule. (Foundational; not outcome evidence)
Robert D. Folker Jr., Intelligence Analysis in Theater Joint Intelligence Centers: An Experiment in Applying Structured Methods (Joint Military Intelligence College occasional paper, 2000). Small experiment; a simplified hypothesis-testing matrix beat intuitive analysis on one scenario. (Weak positive, small N)
Paul E. Lehner, Leonard Adelman, Brant A. Cheikes and Mark J. Brown, “Confirmation Bias in Complex Analyses,” IEEE Transactions on Systems, Man, and Cybernetics - Part A 38(3) (2008): 584-592. ACH reduced confirmation bias only for participants without intelligence-analysis experience; no effect for experienced analysts. (Mixed)
Mandeep K. Dhami, Ian K. Belton and David R. Mandel, “The ‘analysis of competing hypotheses’ in intelligence analysis,” Applied Cognitive Psychology 33(6) (2019): 1080-1090. Fifty analysts randomized; steps not followed, mixed bias effects, possibly increased inconsistency and error. (Negative; the key randomized test)
Martha Whitesmith, “The efficacy of ACH in mitigating serial position effects and confirmation bias in an intelligence analysis scenario,” Intelligence and National Security 34(2) (2019); and Cognitive Bias in Intelligence Analysis: Testing the Analysis of Competing Hypotheses Method (Edinburgh University Press, 2020). No statistically significant mitigation of either bias. (Negative)
Enide Maegherman, Karl Ask, Robert Horselenberg and Peter J. van Koppen, “Test of the analysis of competing hypotheses in legal decision-making,” Applied Cognitive Psychology 35(1) (2021): 62-70. 191 law students; no ACH benefit over generic bias information. (Null)
Christopher W. Karvetski and David R. Mandel, “Coherence of probability judgments from uncertain evidence: Does ACH help?,” Judgment and Decision Making 15(6) (2020): 939-958. 227 participants; ACH did not improve additivity, Bayesian coherence, choice-consistency, or conservatism, and slightly reduced reliability across isomorphic cases. (Null on probabilistic judgment quality)
Mandeep K. Dhami, Ian K. Belton, Peter De Werd, Velichka Hadzhieva and Lars Wicke, “Effects of task structure and confirmation bias in alternative hypotheses evaluation,” Cognitive Research: Principles and Implications 9, 37 (2024). The ACH-style matrix layout did not reduce confirmation bias or improve credibility sensitivity; a transposed layout did; most military analysts showed no confirmation bias at baseline. (Negative on the ACH-specific layout)
Jana Otzipka, “The Analysis of Competing Hypotheses in Legal Proceedings,” Applied Cognitive Psychology 39(5) (2025): e70115. 222 law students; no baseline bias to debias, and ACH significantly increased confidence in guilt assessments. (Negative-leaning: confidence without accuracy)
Jana Otzipka and Renate Volbert, “The analysis of competing hypotheses and expert witness testimony: Counteracting adversarial allegiance in witness credibility assessments?,” PLOS One (2026). 159 mock expert witnesses; ACH did not significantly influence credibility ratings or evidence selection; no debiasing of adversarial allegiance. (Null)

Thinking Framework Skills v0.8.0 · 56 frameworks