
Cognitive bias checklist

Status: Documented, not shipped · Evidence: C · Family: Assumption and belief challenge · Verdict: reject (2026-06-09)

A cognitive bias checklist takes a decision or a conclusion you have already reached and runs it against a list of named cognitive biases - anchoring, confirmation bias, sunk-cost, availability, overconfidence, base-rate neglect, and so on - asking of each, “is this one operating here?” The move is recognition by taxonomy: instead of open-ended self-criticism, the analyst walks a fixed catalogue of known failure modes and flags the ones that seem to apply, then adjusts. It is the lightweight, self-administered end of a much larger debiasing tradition; the appeal is that a checklist is cheap, repeatable, and needs no facilitator.

A sharp distinction has to be drawn at the start, because it decides everything downstream. There are two different things both called a “bias checklist,” and only one of them is the candidate here:

  • (1) A list of biases to scan a decision against - the recognition/awareness sense, “which of these biases am I committing?” This is the candidate’s stated mechanism (“run a decision against relevant biases”).
  • (2) A domain-specific procedural checklist that forces a counter-bias working method - “always compute the base rate before estimating,” “always seek one disconfirming source,” the Kahneman-Lovallo decision-audit style. This is not a list of biases at all; it is a fixed procedure that happens to neutralise a bias as a side effect, and where it works it works by being that specific procedure (a base-rate step, a disconfirmation step), not by enumerating biases.

This dossier grades sense (1), the bias-recognition scan, because that is what the registry entry describes. Sense (2) is real and more defensible, but it is not a single move - it is whatever specific counter-bias procedure you bolt on, and several of those already ship here (natural-frequency Bayesian framing, reference-class forecasting, premortem, evidence-vs-inference sort).
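The difference between the two senses can be made concrete with a minimal Python sketch. Every name in it (`BIAS_LIST`, `recognition_scan`, `ProceduralAudit`, `missing_steps`) is a hypothetical illustration, not part of this library; the two required steps are the ones quoted in sense (2) above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Sense (1): a recognition scan. The only input is the analyst's own
# introspective yes/no answer per bias, so the output can be no better
# than that self-report.
BIAS_LIST = [
    "anchoring",
    "confirmation bias",
    "sunk cost",
    "availability",
    "overconfidence",
    "base-rate neglect",
]


def recognition_scan(self_report: Dict[str, bool]) -> List[str]:
    """Return the biases the analyst says are operating."""
    return [bias for bias in BIAS_LIST if self_report.get(bias, False)]


# Sense (2): a procedural checklist. It names no biases at all; it asks for
# artefacts that must exist before the estimate counts as complete.
@dataclass
class ProceduralAudit:
    base_rate: Optional[float] = None           # "always compute the base rate"
    disconfirming_source: Optional[str] = None  # "always seek one disconfirming source"


def missing_steps(audit: ProceduralAudit) -> List[str]:
    """List the required steps that have not produced an artefact yet."""
    missing = []
    if audit.base_rate is None:
        missing.append("compute the base rate")
    if not audit.disconfirming_source:
        missing.append("find one disconfirming source")
    return missing
```

An analyst who sees no bias in themselves gets an empty list from `recognition_scan` whether or not a bias is operating, while `missing_steps(ProceduralAudit())` keeps reporting both steps until the work has actually been done.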

It helps, modestly, as a prompt for breadth when someone is about to commit and has done no self-criticism at all: a checklist guarantees you at least consider sunk-cost and confirmation bias rather than forgetting them, the way any recall-prompt beats free recall. For a fast, low-stakes sanity pass it is better than nothing, and practitioners report it makes them “slow down.”

It misleads, or simply fails, in the ways the evidence section documents in detail:

  • You cannot see your own bias by looking inward. The whole move assumes introspective access to your own reasoning errors, and that assumption is the single best-replicated finding against it (the bias blind spot, below). Running the checklist on yourself, the most common use, is the use the evidence most directly undercuts.
  • Naming the bias does not disarm it. Knowing you are anchored, or that $19.99 reads as much less than $20.00, does not stop the pull (the G.I. Joe phenomenon). Recognition is not correction; a checklist delivers recognition and stops there.
  • It invites motivated mislabelling. A fixed taxonomy is just as available for dismissing an inconvenient objection (“that critic is just loss-averse”) as for catching your own error - the checklist has no mechanism to keep you honest about which way you apply it.
  • It can manufacture false confidence. Having “checked for biases” can license the original decision rather than improve it - the audit becomes a ritual that certifies the answer you already wanted.

The honest summary: a bias checklist is a recall aid for failure modes, not a debiasing intervention. The situations where it adds real value are the ones where it stops being a list of biases and becomes a specific counter-procedure - which is then better run as that procedure.

The honest grade is C (conceptually plausible but under-tested), revised down from the catalog’s prior P tag. The reason for the revision is the heart of this dossier: the robust, nameable evidence in the debiasing literature is for interventions that are not a self-administered bias-recognition checklist, and the evidence that bears directly on the checklist-scan move is either absent or negative.

What the record supports - but for the wrong intervention (transferred evidence, capped). There is genuinely good evidence that some debiasing works. Sellier, Scopelliti and Morewedge (2019, Psychological Science) ran a field experiment: graduate students trained on a single 80-100 minute serious video game (“Missing: The Pursuit of Terry Hughes”) were about 29% less likely to fall for confirmation bias on an unannounced later business case (58.8% vs 72.2% choosing the inferior hypothesis-confirming option), and the effect held weeks later. A 2025 Nature Human Behaviour systematic review and meta-analysis (Swaryandini et al., 54 RCTs, 383 effect sizes, 10,941 participants) found that educational debiasing interventions produced a small but significant reduction in committing biases (g = 0.26, 95% CI 0.14 to 0.39). Both are real and both are positive. Neither tests a checklist. The Sellier intervention is an interactive game with play-teach loops, after-action reviews, personalised feedback, and practice problems; the authors explicitly attribute its large effects to “the personalized feedback and practice” it delivers - the high end of Fischhoff’s hierarchy, not the low end. The Nature meta-analysis covers classroom education, and it flags exactly the two caveats that matter here: some biases (the representativeness heuristic) resisted intervention, and “the depth and transferability of learning beyond classroom settings” is uncertain. Leaning on either study to grade a checklist would launder the robustness of a heavier, different cousin into this move. The conservative rule forbids that, so both are capped at supporting “debiasing can work,” not “this checklist works.”
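As a rough check on what “small but significant” means for the reported pooled effect, the sketch below backs an approximate standard error out of the published interval. It assumes a symmetric normal-approximation 95% interval, which is an assumption of this sketch (the review's own model may differ), and the variable names are illustrative.

```python
# Reported in the text: pooled g = 0.26, 95% CI 0.14 to 0.39.
g = 0.26
ci_low, ci_high = 0.14, 0.39

# Assumption: the interval is roughly g +/- 1.96 * SE.
Z_95 = 1.96
se = (ci_high - ci_low) / (2 * Z_95)  # approximate standard error
z = g / se                            # approximate z statistic

print(f"approx. SE = {se:.3f}")
print(f"approx. z  = {z:.1f}")
print("interval excludes zero:", ci_low > 0)
```

Under that assumption the pooled effect sits clearly above zero, yet it is only about a quarter of a standard deviation, which fits the “small but significant” reading. It says nothing about a checklist, because no checklist was among the interventions pooled.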

What the record says about the checklist move specifically - and it is not encouraging. Fischhoff’s foundational “Debiasing” (1982, in Kahneman/Slovic/Tversky, Judgment under Uncertainty) lays out four escalating levels of debiasing treatment: (A) warn that bias exists, (B) describe the typical bias, (C) give personalised feedback, (D) run an extended training programme. A bias checklist is level A/B - tell people the biases exist and describe them - which is the level Fischhoff and the subsequent literature find weakest. Two well-replicated findings cut directly against the self-scan:

  • The bias blind spot (Pronin, Lin and Ross, 2002, Personality and Social Psychology Bulletin; directly replicated in a 2024 preregistered Brazilian sample, Collabra: Psychology): people rate themselves as markedly less susceptible to biases than others, and - critically - they continue to insist their self-assessments are accurate even after the relevant bias and its mechanism are explained to them. Because we judge our own bias by introspection (which returns nothing, since the processes are not consciously accessible) and others’ by behaviour, a self-administered checklist is pointed exactly where introspection is blind.
  • The G.I. Joe phenomenon / fallacy (Kristal and Santos, Harvard Business School working paper 21-084; building on Gendler and Santos at Yale): “knowing is half the battle” is false for many biases. Some biases are informationally encapsulated - knowledge of the bias cannot penetrate the representations or affect that generate it - so awareness, which is all a checklist delivers, leaves the behaviour intact.

On retention and transfer. Korteling, Gerritsma and Toet (2021, Frontiers in Psychology), a systematic review of bias-mitigation retention and transfer, concludes “there is currently insufficient evidence that bias mitigation interventions will substantially help people to make better decisions in real life conditions.” Notably, that same review points toward sense-(2) procedural aids (“checklists or premortems”) as the more promising path - i.e., not the bias-recognition scan, but strict working methods that circumvent biased thinking. Mixed direct evidence exists for narrow, well-built procedural checklists (for example Hallihan and colleagues’ design-science work validating a checklist-style intervention that reduced availability bias in professional designers), but those are domain-specific procedures, not a general “scan for biases” list, and other controlled tests of cognitive forcing in medicine found no significant reduction in diagnostic error.

Excluded on the evidence rule. A figure surfaced in search aggregation claiming that adding “structural checks” raises debiasing effectiveness from roughly 10% to 40-60%. It traces to no nameable primary source in the materials reviewed; per this library’s rule it is excluded and has not influenced the grade - stated here so the absence is on the record rather than laundered into a number.

Net. The conceptual case is plausible and the breadth-prompt value is real, so this is not poor-or-contradictory (X). But the controlled evidence is for heavier, different interventions (transferred, capped), and the move’s central assumption - introspective self-detection of bias - is specifically contradicted by the best-replicated findings. That profile is C, not the solid practitioner P of a tool with genuine uncontested adoption for its actual function.

Verdict: Reject (status: excl). This overturns the catalog’s prior cand / build / P tag. The decision rests on two independent grounds, either of which would be enough.

Ground one - the evidence gate. A standalone skill needs evidence for its move. The move here is the self-administered bias-recognition scan, and that specific move is the most-undercut intervention in the debiasing literature (Fischhoff level A/B; contradicted on its core assumption by the bias blind spot and the G.I. Joe phenomenon). The library’s identity is honest evidence grading. Shipping a standalone “run your decision against a list of biases” skill at a P grade that the evidence does not support - while the only robust evidence is for a heavier game and for classroom education on specific biases - would be exactly the laundering this library exists to prevent. The catalog has rejected candidates on these grounds before (six-thinking-hats flagged X on weak evidence; key-assumptions-check rejected as overlap).

Ground two - distinctness below the ~20% ceiling. Even granting the move, it does not clear distinctness against the shipped catalog. Its working mechanism - take one conclusion/decision and surface where reasoning went wrong - is already produced, and produced better, by skills that ship:

  • vs think-ladder-of-inference-check (the closest auditing neighbour): the ladder already reconstructs how a single conclusion was reached - observable data, the data actually selected, the meaning and assumptions added - then flags the riskiest leap and tests an alternative interpretation. That is a structured audit of a conclusion for distortion, and selection-plus-interpretation is where most of the checklist’s biases live (confirmation, availability, anchoring all show up as selective data and added meaning). The ladder does it as a causal reconstruction rather than a taxonomy lookup, which is the more rigorous form of the same move. High overlap.
  • vs think-red-team-light (the closest adversarial neighbour): red-team-light constructs the strongest case against a thesis and ranks objections by force. A bias checklist is, functionally, “objections drawn from a fixed taxonomy of cognitive failure modes” - a templated, weaker subset of what red-team-light already does open-endedly. If anyone wants the bias-taxonomy prompt at all, it belongs as an optional checklist inside red-team-light’s objection-generation step, not as a separate skill. High overlap.
  • vs think-evidence-vs-inference-sort (not named in the brief but directly adjacent): labelling claims evidence / inference / assumption catches the unwarranted-leap family of biases by construction.
  • vs think-decision-option-review: lower overlap (it compares options rather than auditing one decision), but note it already internalises a specific bias guard - the false-precision flag and “what would flip it” - which is the procedural, sense-(2) way to handle a bias, done well.

The residual that is genuinely checklist-specific is only the taxonomy lookup (the named list of biases). A list is a prompt, not a cognitive move; it is the kind of asset that folds in as an optional reference inside an existing skill (the way the 6M/8P checklist folds into issue-tree), not a skill of its own.

Why reject rather than fold: a clean fold needs one shipped skill whose mechanism is essentially identical (as fishbone folds into issue-tree). Here the overlap is real but diffuse - the move is partly red-team-light, partly ladder-of-inference-check, partly evidence-vs-inference-sort - with no single skill that mechanically subsumes it, and it independently fails the evidence gate on its own merits. That combination (under-evidenced standalone move plus diffuse sub-threshold overlap) is a reject on the merits, not a single-target fold. If the move is wanted operationally, the home is an optional bias-prompt within red-team-light or ladder-of-inference-check; and the more defensible sense-(2) counter-procedures already ship as their own skills (natural-frequency Bayesian framing for base-rate neglect, reference-class forecasting for the planning fallacy and optimism, premortem for overconfidence).
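The reject-versus-fold reasoning can be written down as a small decision sketch. This is one reading of the prose above, not the library's actual tooling: the function name `triage`, the overlap labels, the "two or more high overlaps" test for diffuseness, and the `ship` fallback are all assumptions made for illustration.

```python
from typing import Dict


def triage(passes_evidence_gate: bool, overlap: Dict[str, str]) -> str:
    """Classify a candidate as 'fold into <skill>', 'reject', or 'ship'.

    overlap maps each shipped neighbour to a hand-assigned label:
    'identical', 'high', or 'low' (hypothetical labels for this sketch).
    """
    identical = [skill for skill, label in overlap.items() if label == "identical"]
    # A clean fold needs exactly one shipped skill with essentially the same mechanism.
    if len(identical) == 1:
        return f"fold into {identical[0]}"
    # Either ground alone is enough to reject: a failed evidence gate, or
    # overlap spread across several shipped skills with no single home.
    diffuse = sum(label == "high" for label in overlap.values()) >= 2
    if not passes_evidence_gate or diffuse:
        return "reject"
    return "ship"


# Labels taken from the comparison above; think-evidence-vs-inference-sort is
# left out because the text gives it no overlap rating.
bias_checklist = triage(
    passes_evidence_gate=False,
    overlap={
        "think-ladder-of-inference-check": "high",
        "think-red-team-light": "high",
        "think-decision-option-review": "low",
    },
)

# The text does not state fishbone's evidence status; True is a placeholder
# and does not affect the result, because the fold check runs first.
fishbone = triage(passes_evidence_gate=True, overlap={"issue-tree": "identical"})

print(bias_checklist)
print(fishbone)
```

In this reading the bias checklist has no single skill to fold into, so it falls through to the reject branch on both grounds, whereas fishbone resolves to a fold into issue-tree.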

The learning value of this decision: “be aware of your biases” is the most popular debiasing advice and one of the least supported. A famous, intuitive, universally-recommended move can still fail both gates at once - the evidence says self-scanning for bias largely does not work, and what little the move adds over the catalog is a list, not a method. Rejecting it keeps the catalog honest and points users at the specific counter-procedures that actually move the needle.

The idea that decisions are systematically distorted by biases is Amos Tversky and Daniel Kahneman’s (heuristics-and-biases programme, from 1974; Judgment under Uncertainty, 1982; Kahneman, Thinking, Fast and Slow, 2011). The popular “checklist of biases” descends from that work and from decision-audit practice (Kahneman, Lovallo and Sibony’s “before-you-make-that-big-decision” checklist in Harvard Business Review, 2011, which is closer to sense (2): a process audit, not a bias-recognition list). For the honest limits, read Baruch Fischhoff’s “Debiasing” (1982) for the four-level hierarchy that places a checklist at the weak end; Pronin, Lin and Ross (2002) on the bias blind spot for why self-scanning is pointed at a blind area; Kristal and Santos (and Gendler and Santos) on the G.I. Joe fallacy for why awareness is not correction; and Korteling, Gerritsma and Toet (2021) for the retention-and-transfer verdict and the pivot toward procedural aids. For the positive side - what real debiasing looks like - read Sellier, Scopelliti and Morewedge (2019) and the Swaryandini et al. (2025) meta-analysis, and note that both describe interventions far heavier than a checklist. “Cognitive bias” and “debiasing” are generic scientific terms in common use; there is no trademark and no branded owner, so this entry is documented descriptively and is not flagged as branded.

  • Baruch Fischhoff, “Debiasing,” in D. Kahneman, P. Slovic & A. Tversky (eds.), Judgment under Uncertainty: Heuristics and Biases (Cambridge University Press, 1982), pp. 422-444. The foundational treatment; four escalating levels of debiasing (warn / describe / personalised feedback / extended training), with a bias checklist at the weak A/B end. (Foundational)
  • Emily Pronin, Daniel Y. Lin & Lee Ross, “The Bias Blind Spot: Perceptions of Bias in Self Versus Others,” Personality and Social Psychology Bulletin 28 (2002): 369-381. People see bias in others but not themselves, and persist even after the mechanism is explained; preregistered direct replication in a Brazilian sample, Collabra: Psychology 10 (2024). The strongest evidence against self-administered scanning. (S, replicated)
  • Ariella S. Kristal & Laurie R. Santos, “G.I. Joe Phenomena: Understanding the Limits of Metacognitive Awareness on Debiasing,” Harvard Business School working paper 21-084 (building on Gendler and Santos, Yale). Awareness of a bias does not, by itself, correct it; some biases are informationally encapsulated. (Conceptual/empirical)
  • Anne-Laure Sellier, Irene Scopelliti & Carey K. Morewedge, “Debiasing Training Improves Decision Making in the Field,” Psychological Science 30(9) (2019): 1371-1379. A serious-game training intervention (not a checklist) cut confirmation bias ~29% on an unannounced field case; effect attributed to personalised feedback and practice. The positive evidence is for the heavy end, not the checklist. (Strong - but for a different intervention)
  • Ghassani Swaryandini et al., “Systematic review and meta-analysis of educational approaches to reduce cognitive biases among students,” Nature Human Behaviour 9(12) (2025): 2510-2538. 54 RCTs, 383 effect sizes, 10,941 participants; small significant effect g = 0.26 (95% CI 0.14-0.39); representativeness heuristic resistant; transfer beyond the classroom uncertain. (Meta-analysis - of education, not a checklist)
  • J.E. (Hans) Korteling, J.Y.J. Gerritsma & Alexander Toet, “Retention and Transfer of Cognitive Bias Mitigation Interventions: A Systematic Literature Study,” Frontiers in Psychology 12 (2021): 629354. Insufficient evidence that bias-mitigation interventions improve real-life decisions; points toward strict procedural aids (checklists, premortems) over bias-recognition training. (Systematic review)
  • Daniel Kahneman, Dan Lovallo & Olivier Sibony, “Before You Make That Big Decision…,” Harvard Business Review (June 2011). The decision-audit checklist; closer to sense (2) - a process audit run by a third party - than to a self-administered bias-recognition list. (Practitioner)

Excluded on the evidence rule: the claimed “structural checks raise debiasing from ~10% to 40-60%” figure traces to no nameable primary source in the materials reviewed and is not counted toward the grade.

Thinking Framework Skills v0.8.0 · 56 frameworks