Consider the Unknowns

Confidence tracks the strength and coherence of the evidence you actually considered, and people systematically neglect what is missing - the consumer-psychology literature calls this bias omission neglect. A judgment built on three observations feels as solid as one built on thirty if the three cohere. Every other belief-challenge move works on material that is PRESENT: claims made, assumptions held, counterarguments available, failures imaginable. Consider-the-unknowns works on material that is ABSENT. The durable move is to make the absence itself an object of attention - list the relevant variables you do NOT have, classify each as resolvable or genuinely unobservable, rate how much each would change the call, and then re-state your confidence against that mapped gap. The output is a known-unknowns ledger: the judgment, the relevant unknown variables with their bearing and obtainability, a flag on the ones worth resolving before committing, and a re-rated confidence with the delta and the reason its size is what it is.

When to Use

A one-off, consequential judgment is being made from thin or partial evidence, and no usable base-rate class exists: a competitive read, a market-entry call, a diagnosis from incomplete data, a hiring or vendor judgment where the file is mostly silence.
A confident call rests on a small, coherent slice of evidence, and it is worth knowing how much of that confidence comes from the evidence held versus from never having looked at what is missing.
The judge is plausibly overconfident. The controlled evidence shows the move is selective - it cut confidence where people were overconfident and left well-calibrated judgments alone, closer to targeted medicine than to a blanket confidence tax.

When NOT to Use

Do not use it when a genuine reference class exists. Base rates beat introspective gap-mapping; route to think-reference-class-forecasting. Mapping unknowns when you could just look up the outside view is the slower, weaker path.
Do not use it to widen a numeric interval. The nearest controlled test found post-estimate reasoning prompts largely ineffective for interval overprecision; mechanical widening, re-elicitation, and calibration training did better (Ferretti, Montibeller and von Winterfeldt, 2023). This is medicine for item- and domain-level confidence, not interval width. Selling it as interval-width repair is overclaiming.
Do not use it when the unknowns are cheap to resolve. Go get the information. Cataloging resolvable unknowns instead of resolving them is procrastination with a worksheet.
Do not use it on an already-calibrated or underconfident judge. The evidence shows little effect there, and on an anxious, underconfident call the ledger only feeds doubt.
Do not run it on a low-stakes, reversible decision. An unknowns audit on a two-way door is process for its own sake. Triage one-way-vs-two-way first, and time-box the ledger when you do run it - the space of things you do not know is unbounded, so the ledger covers RELEVANT unknowns, not all unknowns.
Do not use it when a sibling skill owns the task. Testing one specific named assumption is think-what-would-have-to-be-true; imagining how a plan fails is think-premortem; generating the strongest KNOWN case against a favored view is think-red-team-light. This move enumerates the absent, not the present.
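
The walls above can be read as a routing check that runs before any ledger is built. What follows is a minimal sketch in Python, not part of the skill itself: the Situation fields, the check_walls function, the message strings, and the order in which the walls are tested are all assumptions made for illustration. Only the redirect targets (such as think-reference-class-forecasting) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Situation:
    """Hypothetical description of the judgment context (field names are illustrative)."""
    has_reference_class: bool
    is_interval_width_task: bool
    unknowns_cheap_to_resolve: bool
    judge_calibrated_or_underconfident: bool
    low_stakes_reversible: bool
    sibling_skill: Optional[str] = None  # e.g. "think-premortem" if a sibling owns the task


def check_walls(s: Situation) -> Optional[str]:
    """Return a stop/redirect message if any wall applies, else None (proceed).

    The order of the checks is an assumption; the source does not rank the walls.
    """
    if s.has_reference_class:
        return "redirect: think-reference-class-forecasting"
    if s.is_interval_width_task:
        return "stop: not interval-width medicine"
    if s.unknowns_cheap_to_resolve:
        return "stop: go resolve the unknowns"
    if s.judge_calibrated_or_underconfident:
        return "stop: judge already calibrated or underconfident"
    if s.low_stakes_reversible:
        return "stop: triage one-way vs two-way door first"
    if s.sibling_skill is not None:
        return f"redirect: {s.sibling_skill}"
    return None


# Example: no wall applies, so the ledger can proceed.
clear = Situation(False, False, False, False, False)
print(check_walls(clear))  # None
```

Returning the first wall that applies keeps the check short; a variant that collects every applicable wall would be just as defensible.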

Instructions

When asked to pressure-test the confidence behind a judgment made from incomplete information, follow these steps:

State the judgment and the current confidence in one line. Name the specific call under consideration and the confidence currently attached to it (a level or a rough percentage). The ledger exists to re-rate THIS confidence; if either the judgment or the stated confidence is missing, stop.
Confirm this is the right move. Check the walls: is there a real reference class (route to think-reference-class-forecasting)? Is the task widening a numeric interval (this is not that medicine)? Are the unknowns cheap to just resolve (go resolve them)? Is the judge already calibrated or underconfident (skip)? Is the decision low-stakes and reversible (triage first)? Does a sibling skill own the task (redirect to it)? If any wall applies, say so and stop or redirect.
Enumerate the relevant unknowns. List the variables that bear on the judgment but are unknown, unobserved, or unobtainable - the evidence you do NOT have. Push past the obvious; the whole point is to surface what attention skipped. Keep it to RELEVANT unknowns and time-box: the space of the unknown is unbounded.
Rate each unknown’s bearing on the judgment. For each, how much would knowing it move the call - high, medium, or low? A high-bearing unknown is one whose resolution could flip or substantially change the judgment.
Classify each unknown’s obtainability. Mark each as resolvable (obtainable, and at roughly what cost or effort) or genuinely unobservable. This is what separates a gap you can close from a gap you must live with.
Flag the unknowns worth resolving before committing. The high-bearing AND resolvable unknowns are the action list: resolve these before the judgment hardens. High-bearing but unobservable unknowns are the irreducible uncertainty the confidence must honestly absorb.
Re-rate confidence against the mapped gap. State the new confidence, the delta from the original, and the reason the delta is the size it is. The move is selective - if the original confidence was already honest about the gap, the delta may be small or zero, and that is a valid result, not a failure.
Emit the known-unknowns ledger artifact per references/TEMPLATE.md: the judgment and original confidence, the table of unknowns with bearing and obtainability, the resolve-before-committing flags, and the re-rated confidence with its delta and rationale - including the pre-printed evidence caveat.
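
One way to picture the artifact these steps produce is as a small data structure. The sketch below is a hedged illustration in Python: the names Unknown, Bearing, Obtainability, resolve_first, and irreducible are hypothetical, and the vendor unknowns are made up. references/TEMPLATE.md remains the actual output format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Bearing(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class Obtainability(Enum):
    RESOLVABLE = "resolvable"
    UNOBSERVABLE = "unobservable"


@dataclass
class Unknown:
    name: str
    bearing: Bearing
    obtainability: Obtainability
    cost_note: str = ""  # rough cost or effort, when resolvable


def resolve_first(unknowns: List[Unknown]) -> List[Unknown]:
    """High-bearing AND resolvable: the action list to close before committing."""
    return [u for u in unknowns
            if u.bearing is Bearing.HIGH
            and u.obtainability is Obtainability.RESOLVABLE]


def irreducible(unknowns: List[Unknown]) -> List[Unknown]:
    """High-bearing but unobservable: uncertainty the confidence must absorb."""
    return [u for u in unknowns
            if u.bearing is Bearing.HIGH
            and u.obtainability is Obtainability.UNOBSERVABLE]


# Made-up unknowns for a hypothetical vendor judgment (illustration only).
ledger = [
    Unknown("Vendor's uptime for customers our size", Bearing.HIGH,
            Obtainability.RESOLVABLE, "two reference calls"),
    Unknown("Vendor's roadmap after a possible acquisition", Bearing.HIGH,
            Obtainability.UNOBSERVABLE),
    Unknown("Exact discount other buyers received", Bearing.LOW,
            Obtainability.UNOBSERVABLE),
]

print([u.name for u in resolve_first(ledger)])
print([u.name for u in irreducible(ledger)])
```

The two filters mirror the flagging step directly. The re-rated confidence is deliberately left out of the code, since the steps treat it as a judgment made against the mapped gap rather than a computed value.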

Output Format

Use the template in references/TEMPLATE.md. The deliverable is the filled known-unknowns ledger - the judgment, the table of relevant unknowns with bearing and obtainability, the resolve-first flags, and the re-rated confidence with its delta and reason - not a prose essay. Do not pad the ledger with every conceivable unknown; relevance and bearing are the filters.

Quality Checklist

Before finalizing, verify:

The judgment under consideration and its original confidence are both stated in one line.
The walls were checked: no real reference class, not an interval-width task, unknowns not cheap to just resolve, judge not already calibrated or underconfident, decision not low-stakes-reversible, no sibling skill owning the task. If a wall applied, the skill redirected or stopped instead of producing a ledger anyway.
The unknowns listed are RELEVANT and absent - variables not in hand that bear on the call - not a restatement of claims already present or counterarguments already known.
Each unknown has a bearing rating (how much it would move the call) AND an obtainability classification (resolvable, at what cost / unobservable).
The high-bearing resolvable unknowns are flagged as the resolve-before-committing list.
Confidence is re-rated with an explicit delta and the reason for its size; a small or zero delta is accepted as a valid selective result, not forced downward.
The output is the known-unknowns ledger artifact, not prose.
No overclaiming: the evidence is moderate (M) and transferred from human studies. Claim a selective overconfidence-reduction and calibration aid, not a measured gain in AI decision outcomes, and never claim interval-width repair (see evidence/dossier.md).

Evidence

Tier M (governing; moderate). The move has direct controlled support: Walters, Fernbach, Fox and Sloman (2017, Management Science 63(12): 4298-4307) ran three studies in which prompting people to list unknowns before stating confidence substantially reduced overconfidence, beat the classic consider-the-alternative technique in a head-to-head comparison, and acted selectively - cutting confidence where judges were overconfident while leaving well-calibrated and underconfident domains alone. The underlying mechanism (neglect of missing information inflates confidence and judgment extremity) is independently confirmed by the omission-neglect program (Kardes et al., 2006; the Sanbonmatsu-Kardes line), with Koriat, Lichtenstein and Fischhoff (1980) as the adjacent antecedent. It is M and not S because the exact prompt rests on a single research line with no named independent replication, the comparison claim comes from that same paper, and the populations are students and online panels on trivia, not field decisions. It is NOT interval-width medicine: post-estimate reasoning prompts were largely ineffective for interval overprecision (Ferretti, Montibeller and von Winterfeldt, 2023). All evidence is transferred from human subjects; none of it studies an agent-produced ledger, which is why the skill ships as an M-tier calibration aid with hard walls, never as a measured decision-outcome improver. Full grading, sources, and caveats: evidence/dossier.md.

Examples

See references/EXAMPLE.md for a completed known-unknowns ledger on a real decision.

Deep dive: worked example

A full worked run (the shared Northwind scenario)

Known-Unknowns Ledger - Worked Example

A completed run of the consider-the-unknowns skill on a real, consequential judgment. This is the quality bar a generated ledger should meet.

Uses the shared recurring scenario (Northwind, a B2B SaaS weighing a self-serve free-tier launch) so examples across skills read as one coherent product. Here the move is applied to a thin-evidence COMPETITIVE READ that the free-tier strategy quietly rests on - the kind of one-off judgment with no clean base-rate class where omission neglect inflates confidence. Where think-red-team-light would generate the strongest KNOWN case against the read, and think-what-would-have-to-be-true would test the conditions a favored option needs, this skill enumerates the relevant evidence Northwind does NOT have and re-rates the confidence against that gap. See docs/internal/AUTHORING.md.

Judgment under consideration

Judgment: “Wedge, the AI-native competitor that just announced a free tier, is not a real threat to our self-serve launch - they will run out of funding before they convert enough free users to matter, so we should proceed with our launch plan unchanged.”
Original confidence: High (the exec team put it around 80%).
Why an unknowns audit fits here: This is a one-off competitive read built from a thin, coherent slice of evidence - one funding rumor, a slick launch page, and the team’s prior that AI-native startups burn fast. There is no clean reference class for “will THIS specific competitor’s free tier beat ours,” the call is consequential (it sets whether Northwind’s launch plan changes), and the 80% feels driven as much by a tidy story as by evidence actually held. Classic omission-neglect setup.

Wall check (confirmed before building the ledger)

No genuine reference class exists - “this named competitor’s free-tier outcome” is a single case, not a base-rate class. (A reference-class read of “free-tier B2B startups that survive 24 months” would be a useful SEPARATE input, but it does not answer the specific threat read.)
Not an interval-width task - this is a discrete competitive judgment, not a numeric estimate with a stated range.
The unknowns are NOT all cheap to resolve - some (Wedge’s runway, the job their product does) are partly or fully obtainable, but others that bear just as heavily (their conversion curve, a possible acquisition) are not.
The judge is plausibly OVERconfident, not underconfident - 80% on a story-driven read is the target case.
Consequential and not trivially reversible - changing or not changing the launch plan is a real bet.

The unknowns

The relevant variables that bear on the “Wedge is not a real threat” judgment but are not in hand.

Unknown variable	Bearing (how much it would move the call)	Obtainability (resolvable, at what cost / unobservable)	Resolve before committing?
Wedge’s actual runway and burn rate	High - the entire judgment rests on “they run out of funding first”	Partly resolvable - approximate via headcount, recent raise filings, hiring pace; exact numbers are private	Yes - this is the load-bearing assumption and it is partly knowable
Wedge’s free-to-paid conversion rate and activation curve	High - a high conversion rate flips “won’t convert enough to matter”	Mostly unobservable - private; only inferable later from public traction signals	No (cannot resolve now) - becomes an irreducible unknown the confidence must absorb
Whether Wedge’s AI-native product does a job our workflow does NOT	High - if it solves a different, larger job, funding math is the wrong lens entirely	Resolvable - hands-on trial of their free tier, win/loss notes, customer interviews	Yes - cheap and decisive
Our own free-tier conversion rate at launch (unlaunched, so unknown)	High - “they won’t convert enough” implicitly assumes WE will; we have no data either	Resolvable only by launching or by a small pilot; not knowable pre-launch	Partial - run a limited pilot rather than assume
Whether a well-funded incumbent backs or acquires Wedge	High - removes the funding constraint the whole read depends on	Unobservable - depends on third-party intent	No - irreducible
Switching costs and lock-in for buyers choosing between us and Wedge	Medium - shapes how fast either free tier compounds	Resolvable - customer and prospect interviews	Yes - feeds the read and is cheap
Wedge’s geographic / segment focus vs ours	Medium - they may not contest our core segment at all	Resolvable - their site, job posts, customer logos	Yes - cheap
Macro funding climate over the next 12 months	Low-medium - shifts the base rate of “startups run out of money”	Partly resolvable - public market signals; coarse	No - too coarse to move this specific read

Resolve-before-committing list (high bearing AND resolvable, plus two cheap medium-bearing checks): (1) triangulate Wedge’s runway from headcount, filings, and hiring pace; (2) actually trial Wedge’s free tier and run win/loss interviews to learn whether it does a DIFFERENT job; (3) run a limited free-tier pilot of our own to get a real conversion signal rather than assuming one; (4) interview prospects on switching costs and check segment overlap (both medium bearing, but cheap). These four are obtainable in days-to-weeks and each could move the call.
Irreducible unknowns (high bearing but unobservable): Wedge’s private conversion curve, and whether a deep-pocketed incumbent backs or buys them. No amount of work closes these now; they are the uncertainty the confidence has to honestly carry.
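
The two lists above can be recomputed from the table’s own columns. The sketch below, in Python, copies each row’s bearing and resolve-before-committing flag from the table (row names are shortened) and filters on them. The tuple layout and variable names are assumptions for illustration.

```python
# (unknown, bearing, resolve-before-committing flag), copied from the table above.
rows = [
    ("Wedge runway and burn rate", "High", "Yes"),
    ("Wedge free-to-paid conversion and activation curve", "High", "No"),
    ("Wedge product does a different job", "High", "Yes"),
    ("Our own free-tier conversion at launch", "High", "Partial"),
    ("Incumbent backs or acquires Wedge", "High", "No"),
    ("Switching costs and lock-in", "Medium", "Yes"),
    ("Wedge geographic / segment focus", "Medium", "Yes"),
    ("Macro funding climate", "Low-medium", "No"),
]

# Action list: every row the table flags Yes or Partial.
to_resolve = [name for name, _bearing, flag in rows if flag in ("Yes", "Partial")]

# Irreducible: high bearing, but flagged No because it cannot be resolved now.
irreducible = [name for name, bearing, flag in rows
               if bearing == "High" and flag == "No"]

print(to_resolve)
print(irreducible)
```

The filter keys on the table’s flag column, so it returns five rows, which the prose groups into four actions by pairing the switching-cost and segment-overlap checks. The macro funding climate is flagged No but is not high bearing, so it lands in neither list.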

Re-rated confidence

Re-rated confidence: Medium - roughly 50-55%, pending the four resolve-first items.
Delta from original: Down from ~80% to ~50-55% (a large drop).
Reason the delta is this size: The original 80% rested almost entirely on ONE unknown treated as known - “they will run out of funding first” - while four other high-bearing variables (whether Wedge does a different job, what their conversion curve is, what OUR conversion will actually be, and whether an incumbent backs or acquires Wedge) were never in the frame at all. The drop is large precisely because so much of the confidence was the comfort of a coherent story rather than evidence held. Note the selectivity: the parts of the read that ARE well-grounded (Northwind’s existing distribution and brand in its core segment) are not discounted - the audit cuts the confidence that came from omission, not the confidence that came from evidence. If the four resolve-first items come back favorable, confidence can rise again on a firmer basis; if they come back unfavorable, the launch plan should change, which is exactly the decision this ledger protects.
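
The delta arithmetic is simple but worth writing down, because the re-rated confidence is a range rather than a point. A minimal sketch in Python, where confidence_delta is a hypothetical helper and the inputs are the rough percentages from this example:

```python
from typing import Tuple


def confidence_delta(original: float, rerated_low: float,
                     rerated_high: float) -> Tuple[float, float]:
    """Return (delta at the low end, delta at the high end) in percentage points."""
    if rerated_low > rerated_high:
        raise ValueError("rerated_low must not exceed rerated_high")
    return (rerated_low - original, rerated_high - original)


# Original ~80%, re-rated to roughly 50-55%.
print(confidence_delta(80, 50, 55))  # (-30, -25)
```

A drop of 25 to 30 points is what the ledger reports as large. The number alone is not the deliverable; the stated reason for its size is.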

Evidence caveat (ships with every ledger)

This ledger is a calibration aid, not a measured improvement in the decision’s outcome. Its evidence tier is M (moderate): the claim that listing relevant unknowns before stating confidence reduces overconfidence selectively (where the judge is overconfident) has direct controlled support (Walters, Fernbach, Fox and Sloman, 2017) plus an independent mechanism line (omission neglect; Kardes et al., 2006). It is M and not S because the exact prompt rests on a single research line with no named independent replication, tested on student and online-panel populations. All of that evidence is transferred from human studies and has not been validated on AI agents. The move does NOT repair the width of a numeric interval (Ferretti, Montibeller and von Winterfeldt, 2023), and it is no substitute for a real reference class when one exists. Treat the re-rated ~50-55% as a more honest confidence, not a proven-more-accurate one.

Note how this differs from its neighbors on the same Northwind question. think-red-team-light would build the strongest KNOWN argument that Wedge IS a threat - working from evidence and arguments already available. think-what-would-have-to-be-true would take the favored option (proceed unchanged) and test the conditions it needs to hold. think-evidence-vs-inference-sort would classify the claims already on the table into evidence, inference, and assumption. This skill does none of those: it enumerates the relevant evidence Northwind does NOT have, rates each gap by bearing and obtainability, and lets the size and resolvability of the gap - not a stronger argument - re-rate the confidence. The deliverable is a mapped absence and an honest confidence, not a counter-case and not a sorted set of present claims.

Grounding: the full evidence dossier

What the research does and does not show, with graded sources

Evidence Dossier: Consider the Unknowns

The single source of truth for the consider-the-unknowns skill. The SKILL.md, the sidecar (skill.meta.yml), and the eval cases all derive from this file. If a claim is not here, it does not belong in the skill. Promoted from frameworks/_proposed/consider-the-unknowns/dossier.md and admitted as a Build at tier M.


Skill	`thinking-framework-skills.consider-the-unknowns` (installable name `think-consider-the-unknowns`)
Family	assumption-and-belief-challenge
Evidence tier	M governing (moderate; see “What the evidence shows” for what holds it at M and not S, and what keeps it from dropping to P)
Confidence	Moderate that listing the relevant unknowns before stating confidence reduces overconfidence selectively (where the judge is overconfident); low that any specific human-subject effect transfers unchanged to AI agents
Status	cand (admitted from the v0.7.0 phase-2 reconciliation; the flagged single-source publication record verified real and citable, so the M tier holds)

1. The mechanism (what actually does the work)

Before committing to a judgment, explicitly enumerate the relevant evidence you do NOT have - the variables that bear on the question but are unknown, unobserved, or unobtainable - then weigh the gap that absence leaves and re-rate your confidence against it. The mechanism rests on a robust finding: confidence tracks the strength and coherence of the evidence actually considered, and people systematically neglect what is missing (the consumer-psychology literature names the bias omission neglect). A judgment built on three observations feels as solid as one built on thirty if the three cohere. The corrective move is to make the absence itself an object of attention: list the relevant unknowns, classify each as resolvable (obtainable, and at what cost) or genuinely unobservable, rate each unknown’s bearing on the judgment, and then re-state confidence against the mapped gap.

The durable cognitive move is not the worksheet. It is turning attention onto the absent evidence - the variables outside the material in front of you that would change the call if you had them - and letting the size and obtainability of that gap discipline the confidence you report. Two things distinguish it from every other belief-challenge move: the object is material that is ABSENT (not claims made, assumptions held, counterarguments available, or failures imaginable), and the output is a re-rated confidence justified by a mapped gap rather than a stronger argument or a defended conclusion.

The output is a known-unknowns ledger: the judgment under consideration; the relevant unknown variables, each with its bearing on the judgment and its obtainability; a flag on the unknowns worth resolving before committing; and a re-rated confidence with the delta and the reason its size is what it is.

2. Lineage

The phrase “known unknowns” is generic, popularized by Donald Rumsfeld’s February 2002 Department of Defense briefing, with earlier engineering and project-risk usage; the method name used here is the descriptive prompt drawn from the research line itself, not a brand. The measured, debiasing content of the term comes from three threads, listed here alongside the neighboring tradition it should not be confused with:

The core: Daniel J. Walters (INSEAD), Philip M. Fernbach, Craig R. Fox and Steven A. Sloman, “Known Unknowns: A Critical Determinant of Confidence and Calibration,” Management Science (2017). The INSEAD Knowledge piece “How Managers Can Curb Overconfidence” is the practitioner gloss. Walters and Fernbach (2021), “Investor Memory of Past Performance Is Positively Biased and Predicts Overconfidence,” PNAS, continues the overconfidence line in the field.
The independent mechanism line: David M. Sanbonmatsu, Frank R. Kardes and colleagues on omission neglect, from the early 1990s through Kardes et al. (2006), “Debiasing Omission Neglect,” Journal of Business Research.
The antecedent: Koriat, Lichtenstein and Fischhoff (1980), “Reasons for Confidence,” the original demonstration that what you list before judging changes calibration.
The neighboring tradition this is NOT: consider-the-opposite (Lord, Lepper and Preston 1984) and multiple-explanation (Hirt and Markman 1995), the consider-an-alternative family that the Walters studies used as the comparison arm; in this library that move lives in red-team-light.

Attribution: Daniel J. Walters, Philip M. Fernbach, Craig R. Fox and Steven A. Sloman (2017, Management Science); the mechanism line via the Sanbonmatsu-Kardes omission-neglect program. Not branded; no trademark.

3. What the evidence shows, and what it does NOT show

The honest grade is M (moderate), verified 2026-06-11. This candidate was admitted on a single external-research source with the publication record explicitly flagged for verification; that record verifies as real and citable, and on inspection it is stronger than its single-run provenance suggested.

What the record supports.

Daniel J. Walters, Philip M. Fernbach, Craig R. Fox and Steven A. Sloman (2017), “Known Unknowns: A Critical Determinant of Confidence and Calibration,” Management Science 63(12): 4298-4307 (published online December 2016). Three studies. Study 1 (correlational): participants who spontaneously thought about unknowns while answering two-alternative trivia questions were less overconfident. Studies 2 and 3 (experimental): prompting participants to list unknowns before stating confidence reduced overconfidence substantially, outperformed the classic consider-the-alternative debiasing technique in a head-to-head comparison, and selectively reduced confidence in domains where participants were overconfident while leaving well-calibrated and underconfident domains unaffected. Grade: M. Controlled, top-journal, and measured on the actual move this skill proposes, including the comparison arm that establishes its distinctness from counterargument generation.
Frank R. Kardes, Steven S. Posavac, David H. Silvera, Maria L. Cronley, David M. Sanbonmatsu, Paul Herr and Murali Chandrashekaran (2006), “Debiasing Omission Neglect,” Journal of Business Research 59: experiments showing judges form overly extreme, confident evaluations from limited evidence and that sensitivity to missing information can be increased (considering judgment criteria before receiving information; rating presented and missing attributes before evaluating). Grade: M for the mechanism. This is an independent second research program (the Sanbonmatsu and Kardes omission-neglect line, running from the early 1990s) confirming that neglect of missing information inflates judgment extremity and confidence, and that surfacing the missing information moderates it.
Asher Koriat, Sarah Lichtenstein and Baruch Fischhoff (1980), “Reasons for Confidence,” Journal of Experimental Psychology: Human Learning and Memory 6(2): 107-118. Listing reasons contradicting one’s chosen answer improved calibration; listing supporting reasons did not. Adjacent rather than direct: it operates on known counter-reasons, not on unknowns, so it frames the tradition but is NOT counted toward this grade.

What the record does NOT support.

No named independent direct replication of the exact consider-the-unknowns prompt was found, so the intervention’s controlled record remains a single research line (one author team, one paper, multiple experiments). The “more effective than consider-the-alternative” claim comes from that same paper, not from a meta-analysis.
The samples are the usual student and online-panel populations on trivia and general-knowledge domains, not field decisions.
It is NOT interval-width medicine. The nearest controlled test of post-estimate reasoning prompts in interval elicitation found them largely ineffective; mechanical widening plus re-elicitation and calibration training did better - Valentina Ferretti, Gilberto Montibeller and Detlof von Winterfeldt (2023), “Testing the Effectiveness of Debiasing Techniques to Reduce Overprecision in the Elicitation of Subjective Continuous Probability Distributions,” European Journal of Operational Research 304(2): 661-675. Consider-the-unknowns was not itself tested there, but this caps the claim: the M grade applies to item- and domain-level confidence judgments, not to the width of a numeric interval. Do not sell this move as interval-width repair.

These caps are exactly what hold the grade at M rather than S; the existence of the independent omission-neglect mechanism line is what keeps it from dropping to P.

Excluded-claims check: no numeric effect size is restated in this dossier, because the published abstract reports direction and substantiality without a portable single number and no independent quantification was found; every named finding above maps to an author-and-year source.

4. Transferred-evidence flag (required honesty for this library)

Every study above is on human subjects - students and online panels answering trivia and general-knowledge questions - in lab settings. None studies a known-unknowns ledger produced by or with an AI agent, nor whether an agent-produced ledger improves a human’s calibration. The evidence is transferred from human contexts and not validated for AI-augmented use. The AI value is mechanical and modest: an agent makes the move cheap to run, forces the discipline (a real enumeration of relevant absent variables, an obtainability classification, an honest re-rate rather than a reflexive confidence), and produces a durable, inspectable artifact - benefits that do not depend on the human-subject effect transferring unchanged. The skill ships honestly as an M-tier calibration aid for thin-evidence one-off judgments, with hard walls against the cases where the evidence does not reach (a real reference class exists; interval-width repair; unknowns cheap to resolve; an already-calibrated or underconfident judge).

5. When it works / when it fails (drives the eval negative cases and “When NOT to Use”)

Works best when:

A one-off, consequential judgment is being made from thin or partial evidence where no usable base-rate class exists: a competitive read, a market-entry call, a diagnosis from incomplete data, a hiring or vendor judgment where the file is mostly silence.
The judge is plausibly overconfident, and it is worth knowing how much of the confidence rests on evidence actually held versus on not having looked at what is missing. The controlled evidence shows the effect is usefully selective - it reduced confidence where people were overconfident and left well-calibrated domains alone, closer to targeted medicine than to a blanket confidence tax.

Fails or misleads when (poor-fit / anti-patterns):

A genuine reference class exists. Base rates beat introspective gap-mapping; route to reference-class-forecasting.
The task is a numeric interval that is too narrow. Post-estimate reasoning prompts are largely ineffective for interval overprecision (Ferretti, Montibeller and von Winterfeldt 2023); mechanical widening, re-elicitation, and calibration training do better. Do not sell this as interval-width repair.
The unknowns are cheap to resolve. Go get the information. Cataloging resolvable unknowns instead of resolving them is procrastination with a worksheet.
The judge is already well calibrated or underconfident. The evidence shows little effect there, and on an anxious, underconfident call the ledger only feeds doubt.
The decision is low-stakes and reversible. An unknowns audit on a two-way door is process for its own sake; triage one-way-vs-two-way-door first, and time-box the ledger when you do run it - the space of things you do not know is unbounded, so the ledger covers RELEVANT unknowns, not all unknowns.
The real task is something a sibling skill owns. Testing a specific named assumption is what-would-have-to-be-true; imagining how a plan fails is premortem; generating the strongest known case against a favored view is red-team-light.

6. Why it is a skill here (distinctness)

The distinct durable move: enumerate the absent. Every shipped neighbor in this family operates on PRESENT material, which is the wall in each case:

evidence-vs-inference-sort (medium overlap, the closest shipped skill): classifies the claims you HAVE into evidence, inference, and assumption, and flags uncited claims inside the material under audit. It never generates the list of relevant variables OUTSIDE the material. The shared mechanism is only the re-rate-confidence ending, well under the overlap ceiling.
what-would-have-to-be-true (medium): backward-chains from a favored option to the conditions required for it to be best, then tests the named conditions. Direction-committed and proposition-based; consider-the-unknowns is direction-agnostic and absence-based, run before a favored conclusion hardens.
red-team-light (low-medium): generates the strongest KNOWN case against the focal view. The distinctness experiment exists in print - Walters and colleagues ran consider-the-alternative as their comparison arm and found the unknowns prompt both different in mechanism and stronger in effect. Counterargument generation and absence inventory are different operations.
premortem (low): imagines concrete failure events; an unknown is an unobserved variable, not an imagined outcome.
ladder-of-inference-check, decision-journal, reference-class-forecasting, fermi-estimation, linear-model-aggregation (low): audit the climb on data you used, record the prediction for later review, substitute the outside view, decompose a number, or mechanize a repeated judgment. None enumerates absence.

No sequence of shipped skills produces the artifact: chaining evidence-vs-inference-sort into what-would-have-to-be-true still only processes stated claims and a favored option’s conditions; the unknowns ledger requires the enumerate-the-absent move itself.

Hard walls against the estimation/calibration cluster siblings vetted alongside it:

dialectical-bootstrapping: a numeric device - estimate, assume the estimate is wrong, re-estimate, average the two numbers. It produces a better point estimate; consider-the-unknowns produces no number and no average. Disjoint artifacts; both can run on the same judgment without redundancy.
interval-calibration-check: trains the confidence scale itself across many items via equivalent bets and scored feedback. Consider-the-unknowns is a per-judgment, qualitative audit with no betting device and no feedback cadence, and the Ferretti boundary result keeps their lanes separate in the literature itself.
estimate-talk-estimate: a multi-judge social protocol behind the facilitation wall. Consider-the-unknowns is single-judge and fully agent-executable; it does not touch that wall.
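
To make the contrast with dialectical-bootstrapping concrete: that device ends in an average of two numbers, which a known-unknowns ledger never produces. A minimal sketch in Python of the averaging step as described above; the function name and the example figures are invented for illustration.

```python
def dialectical_bootstrap(first_estimate: float, second_estimate: float) -> float:
    """Average a first estimate with a re-estimate made after assuming the first was wrong."""
    return (first_estimate + second_estimate) / 2


# Hypothetical figures: a first guess of 120 and a dissenting re-estimate of 90.
print(dialectical_bootstrap(120, 90))  # 105.0
```

The output is a point estimate. The ledger’s output is a table of absent variables plus a re-rated confidence, which is why the two artifacts are described as disjoint.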

7. Sources

Daniel J. Walters, Philip M. Fernbach, Craig R. Fox and Steven A. Sloman, “Known Unknowns: A Critical Determinant of Confidence and Calibration,” Management Science 63(12): 4298-4307 (2017, online December 2016). https://doi.org/10.1287/mnsc.2016.2580 . Three studies; studies 2-3 are controlled tests of the exact prompt, showing that listing unknowns before stating confidence substantially reduced overconfidence, beat the consider-the-alternative arm head-to-head, and acted selectively where judges were overconfident. The single most direct controlled support. Experimental study. (M; student and online-panel subjects, single research line.)
Frank R. Kardes, Steven S. Posavac, David H. Silvera, Maria L. Cronley, David M. Sanbonmatsu, Paul Herr and Murali Chandrashekaran, “Debiasing Omission Neglect,” Journal of Business Research 59 (2006). https://www.sciencedirect.com/science/article/abs/pii/S0148296306000324 . Independent omission-neglect program showing limited evidence inflates judgment extremity and confidence and that surfacing missing information moderates it. The second, independent mechanism line. Experimental study. (M for the mechanism.)
Asher Koriat, Sarah Lichtenstein and Baruch Fischhoff, “Reasons for Confidence,” Journal of Experimental Psychology: Human Learning and Memory 6(2): 107-118 (1980). https://iipdm.haifa.ac.il/images/publications/Asher_Koriat/1980-Koriat-Lichtenstein-Fischhoff-JEPHLM.pdf . The antecedent: listing contradicting reasons improved calibration, listing supporting reasons did not. Adjacent (operates on known counter-reasons, not unknowns); frames the tradition, NOT counted toward the grade. Experimental study.
Valentina Ferretti, Gilberto Montibeller and Detlof von Winterfeldt, “Testing the Effectiveness of Debiasing Techniques to Reduce Overprecision in the Elicitation of Subjective Continuous Probability Distributions,” European Journal of Operational Research 304(2): 661-675 (2023). https://www.sciencedirect.com/science/article/pii/S0377221722003046 . Boundary evidence: in interval elicitation, post-estimate reasoning-style debiasers were not very effective; mechanical range-stretching worked better. Caps the claim to item/domain confidence, not interval width. Experimental study.
INSEAD Knowledge, “How Managers Can Curb Overconfidence” (practitioner gloss on the known-unknowns line). https://knowledge.insead.edu/marketing/how-managers-can-curb-overconfidence . Practitioner-popular.

Excluded on the evidence rule: no numeric effect size or single quantified result is asserted as fact in this dossier, because the published record reports direction and substantiality without a portable single figure and no independent quantification was found. Every named finding maps to an author-and-year source. The “more effective than consider-the-alternative” claim is reported as coming from the same single paper, not from a meta-analysis, and the move is explicitly NOT claimed for interval-width repair.

Thinking Framework Skills v0.8.0 · 56 frameworks