Ladder of Inference Check
People move from observation to action up an invisible, near-instant ladder: all available data, the data they select, the meaning they add, the assumptions they make, the conclusion they draw, the action they take. The leaps feel like facts. This skill slows the climb back down for a given conclusion: it reconstructs the rungs, exposes where selection and interpretation crept in, flags the riskiest leap, and tests at least one alternative interpretation of the same data. The output is an annotated reasoning trace, not prose.
When to Use
- A conclusion feels certain but actually rests on interpretation.
- A disagreement traces to two people reading the same situation differently.
- Auditing a contested inference, including the agent’s own conclusion.
- As a step in a reasoning-audit workflow, often after an evidence/inference sort.
When NOT to Use
- The conclusion follows from direct, verifiable data with no real inferential leap.
- To generate ideas or options (wrong tool).
- On trivial matters where the climb does not matter.
- As a way to dress up and defend the conclusion you already hold.
Instructions
When asked to check the ladder of inference, follow these steps:
- State the conclusion being examined, in one sentence.
- List the observable data available - everything that could have been noticed, not just what was used.
- Identify the data actually selected - which subset the conclusion was built on, and what was left out.
- Surface the meaning and assumptions added - the interpretation laid on the selected data, and the assumptions that interpretation requires.
- Flag the riskiest rung - the single leap most likely to be wrong or selective.
- Test an alternative interpretation - give at least one credible different reading of the same data, and what it would imply.
- Emit the reasoning trace per references/TEMPLATE.md.
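The steps above can be held in a small structured record before the trace is written out. The sketch below is one possible shape, not the template itself: the `LadderTrace` class, its field names, and the sample values are all hypothetical, and the real layout is whatever references/TEMPLATE.md defines.

```python
from dataclasses import dataclass, field


@dataclass
class LadderTrace:
    """Hypothetical container for one ladder-of-inference check."""

    conclusion: str            # the conclusion under examination, one sentence
    data_available: list[str]  # everything that could have been noticed
    data_selected: list[str]   # the subset the conclusion was built on
    meaning_added: str         # interpretation laid on the selected data
    assumptions: list[str]     # what that interpretation requires
    riskiest_rung: str         # the single leap most likely to be wrong
    alternatives: list[str] = field(default_factory=list)  # other credible readings

    def data_left_out(self) -> list[str]:
        """Return observations that were available but not selected."""
        return [d for d in self.data_available if d not in self.data_selected]


# Illustrative values only; invented for this sketch, not taken from a real case.
trace = LadderTrace(
    conclusion="The release caused the outage.",
    data_available=["deploy at 14:00", "errors from 14:05", "disk alert at 13:50"],
    data_selected=["deploy at 14:00", "errors from 14:05"],
    meaning_added="Errors began right after the deploy, so the deploy broke something.",
    assumptions=["Nothing else changed around 14:00."],
    riskiest_rung="data actually selected",
    alternatives=["The disk alert at 13:50 suggests a storage problem that predates the deploy."],
)
print(trace.data_left_out())
```

Keeping available and selected data as separate fields makes the left-out observations computable rather than something the author has to remember to list.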
Output Format
Use the template in references/TEMPLATE.md. The deliverable is the reconstructed ladder with a flagged rung and an alternative interpretation, not prose.
Quality Checklist
Before finalizing, verify:
- Observable data includes what was left out, not only what was used.
- The meaning and assumptions added are stated explicitly, separate from the data.
- The single riskiest rung is named.
- At least one credible alternative interpretation is given.
- The trace tests the climb rather than rationalizing the conclusion.
- The output is the reasoning trace artifact, not prose.
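Some of these checks are mechanical and could be automated as a first pass. The sketch below assumes a trace is stored as a plain dict with hypothetical keys (`data_available`, `data_selected`, `meaning_added`, `assumptions`, `riskiest_rung`, `alternatives`). It cannot judge whether an alternative is credible or whether the trace rationalizes the conclusion; those items still need a reader.

```python
def checklist_failures(trace: dict) -> list[str]:
    """Return the mechanically checkable checklist items that a trace fails."""
    failures = []
    available = trace.get("data_available", [])
    selected = trace.get("data_selected", [])

    # Observable data should include what was left out, not only what was used.
    if not [d for d in available if d not in selected]:
        failures.append("no left-out data listed")
    # Meaning and assumptions must be stated explicitly, separate from the data.
    if not trace.get("meaning_added") or not trace.get("assumptions"):
        failures.append("meaning or assumptions missing")
    # A single riskiest rung must be named (a non-empty string in this sketch).
    if not trace.get("riskiest_rung"):
        failures.append("riskiest rung not named")
    # At least one alternative interpretation must be given.
    if not trace.get("alternatives"):
        failures.append("no alternative interpretation")
    return failures
```

An empty list means only that the structural items pass, not that the trace is sound.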
Evidence
Tier P. The ladder is an influential practitioner model (Argyris; Senge, The Fifth Discipline, 1990). The underlying phenomenon (people select data and add interpretation, then treat conclusions as fact) is well grounded in cognitive psychology, but the ladder itself is a map, not a validated intervention, and evidence is transferred from human contexts, not AI-validated. Full grading: evidence/dossier.md.
Examples
See references/EXAMPLE.md for a completed reasoning trace.
Deep dive: worked example
A full worked run (the shared Northwind scenario)
Reasoning Trace - Worked Example
A completed run of think-ladder-of-inference-check, on the shared Northwind scenario. This is the quality bar a generated trace should meet.
Northwind is a B2B SaaS. Here the skill audits a conclusion the team reached during the free-tier debate.
Conclusion under examination
- “Our trials are not converting because the product is missing a free tier.”
The ladder, reconstructed
| Rung | Content |
|---|---|
| Observable data available | Trial-to-paid conversion is down 6 points QoQ; two lost deals mentioned wanting to “try before buying”; support tickets up; a competitor launched a free tier; onboarding completion rate also dropped; sales hired three new reps last quarter. |
| Data actually selected | Only the two “try before buying” comments and the competitor’s free tier. |
| Meaning added | Prospects need a no-commitment way to try, and competitors are setting that expectation. |
| Assumptions | That the two comments are representative; that the conversion drop is about packaging, not onboarding or new-rep ramp; that a free tier is what “try before buying” means. |
| Conclusion | We need a free tier to fix conversion. |
Riskiest rung
- Data actually selected. The conclusion ignores two strong alternative signals present in the same data: onboarding completion also dropped, and three new reps were ramping. Either could explain the conversion fall without any packaging change.
Alternative interpretation
- A credible different reading: the conversion drop is an onboarding and ramp problem, not a packaging gap. The two “try before buying” comments are real but may be a small, vocal minority.
- What it would imply: fix the onboarding funnel and support new-rep ramp first (cheap, reversible), and verify the packaging hypothesis with data before building a free tier. (This hands off cleanly to think-evidence-vs-inference-sort and think-what-would-have-to-be-true.)
Note: the value is exposing that the conclusion was built on two anecdotes plus a competitor move, while three other data points pointing elsewhere were silently dropped at the “selected data” rung.
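The count in the note can be reproduced by comparing the two data rungs directly. This is a minimal sketch using the observations from the table above. It treats the conversion drop as the outcome being explained rather than as a candidate signal, which is an assumption of the sketch, and the variable names are illustrative.

```python
# Assumed split: the conversion drop is the outcome to explain, not a candidate signal.
outcome = "trial-to-paid conversion down 6 points QoQ"

# Remaining observations from the "Observable data available" row.
available = [
    "two lost deals mentioned wanting to try before buying",
    "support tickets up",
    "competitor launched a free tier",
    "onboarding completion rate dropped",
    "sales hired three new reps last quarter",
]

# The "Data actually selected" row.
selected = [
    "two lost deals mentioned wanting to try before buying",
    "competitor launched a free tier",
]

# Anything available but not selected was dropped on the way up the ladder.
dropped = [d for d in available if d not in selected]

print(f"Outcome to explain: {outcome}")
print(f"Dropped at the selected-data rung: {len(dropped)}")
for item in dropped:
    print("-", item)
```

Under that split, the difference is the three observations the conclusion never used: support tickets, onboarding completion, and new-rep hiring.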
Grounding: the full evidence dossier
What the research does and does not show, with graded sources
Evidence Dossier: Ladder of Inference Check
Single source of truth for the ladder-of-inference-check skill. The SKILL.md, sidecar, and evals derive from this. If a claim is not here, it does not belong in the skill.
| Field | Value |
|---|---|
| Skill | thinking-framework-skills.ladder-of-inference-check (installable name think-ladder-of-inference-check) |
| Family | assumption-and-belief-challenge |
| Evidence tier | P (practitioner model; influential but thin controlled evidence) |
| Confidence | High that unexamined inference is real; the ladder is a useful map, not a validated instrument |
| Status | draft (authored 2026-05-31 from the discovery corpus) |
1. The mechanism (what actually does the work)
People move from raw observation to action up an invisible, near-instant “ladder”: all observable data -> the data we select -> the meaning we add -> the assumptions we make -> the conclusions we draw -> the beliefs that harden -> the actions we take. The leaps feel like facts. The check slows the climb back down: for a given conclusion, reconstruct the rungs, expose where selection and interpretation crept in, and deliberately ask what other data and interpretations were available. The work is done by making a silent inferential jump inspectable, so it can be challenged before it drives action.
2. Lineage
- Chris Argyris developed the ladder of inference (organizational learning, 1970s-1990). Peter Senge popularized it in The Fifth Discipline (1990). Widely taught via The Systems Thinker and similar practitioner sources.
No trademark. Named descriptively.
3. What the evidence shows, and what it does NOT show
Supported: the underlying phenomenon - that people unconsciously select data and add interpretation, then treat conclusions as facts - is well grounded in cognitive psychology (selective attention, confirmation bias).
NOT shown: the ladder itself is a practitioner model, not a validated intervention. There is little controlled evidence that running the ladder improves decisions or reasoning accuracy by a measured amount. Grade it P: a useful map of where inference goes wrong, presented as a disciplined trace, not as a proven technique.
4. Transferred-evidence flag
Evidence is from human organizational-learning and cognitive contexts, not AI-augmented use. Transferred, not AI-validated. The AI value: a model asserts conclusions fluently, but it can also be asked to reconstruct its own (or another’s) climb from data to conclusion. Doing so exposes the selection and assumptions behind the claim, which directly counters confident, unsourced conclusions.
5. When it works / when it fails
Works best when: a conclusion feels certain but rests on interpretation; a disagreement traces to different readings of the same situation; auditing a contested inference (the agent’s own or another’s).
Fails or misleads when (poor-fit / anti-patterns):
- The conclusion follows from direct, verifiable data with no real inferential leap.
- It is used to rationalize the existing conclusion rather than genuinely test the climb (the central failure mode).
- “Selected data” is presented as if it were all the data.
- The “alternative interpretations” step is skipped, which is where most of the value is.
- Trivial matters where the climb does not matter.
6. Output artifact
An annotated reasoning trace: the conclusion at the top, the ladder rungs reconstructed beneath it (observable data available -> data actually selected -> meaning added -> assumptions -> conclusion), the single riskiest rung flagged, and at least one credible alternative interpretation of the same data.
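As a rough illustration of that artifact, the sketch below renders a trace as plain text with the conclusion first, a rung table in the same `| Rung | Content |` shape as the worked example, a marker on the flagged rung, and the alternative last. The function name, the `(RISKIEST)` marker, and the exact layout are assumptions; the authoritative layout is references/TEMPLATE.md.

```python
def render_trace(conclusion, rungs, riskiest_rung, alternative):
    """Render a trace as text: conclusion, rung table, flagged rung, alternative.

    `rungs` is a list of (rung name, content) pairs in ladder order.
    """
    names = [name for name, _ in rungs]
    if riskiest_rung not in names:
        # The flag must point at a rung that is actually in the trace.
        raise ValueError(f"unknown rung: {riskiest_rung!r}")

    lines = [
        f"Conclusion under examination: {conclusion}",
        "",
        "| Rung | Content |",
        "|---|---|",
    ]
    for name, content in rungs:
        # Mark the flagged rung so it stands out in the table.
        flag = " (RISKIEST)" if name == riskiest_rung else ""
        lines.append(f"| {name}{flag} | {content} |")
    lines += [
        "",
        f"Riskiest rung: {riskiest_rung}",
        f"Alternative interpretation: {alternative}",
    ]
    return "\n".join(lines)
```

Rejecting an unknown rung name keeps the flag tied to a row in the table rather than to free text.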
7. Sources
- Argyris, C. (1990). Overcoming Organizational Defenses; and Argyris’s action-science work on the ladder of inference.
- Senge, P. (1990). The Fifth Discipline - popularized the ladder.
- The Systems Thinker - practitioner write-up of the ladder and how to use it.
Verification status: Argyris/Senge attribution is well-attested. Treat effectiveness as practitioner-reported; do not attach a quantified effect in public-facing text.