Decision Journal

A decision journal records a decision at the moment it is made - the decision, the rationale, the predicted outcome, an explicit confidence level, and the assumptions it rests on - so it can be reviewed later against what actually happened. The load-bearing move is timing: the prediction is fixed in place before the outcome is known, while the reasoning and the felt confidence are still uncontaminated by the result. That contemporaneous record is one of the few reliable defenses against hindsight bias (“I knew it all along”). It also separates decision quality from outcome quality and supplies the recorded-prediction half of a calibration loop. The output is a structured decision journal entry, not prose, designed to be reopened and scored later.

When to Use

At the point of committing to a consequential, genuinely uncertain decision: a launch, hire, investment, vendor choice, bet, or strategic direction.
When you can still state an honest prediction, confidence level, and set of assumptions before the outcome is known.
When you intend to review the decision later against reality - it pairs with an after-action review (record now, review later).
When you want to build calibration over many decisions, not judge a single one.

When NOT to Use

To review a decision after the outcome is already known. That is an after-action review (think-after-action-review); writing a “journal entry” after the result back-fits the prediction, the exact distortion this method exists to prevent.
For trivial or fully reversible (two-way-door) decisions. The capture overhead is not worth it for a cheaply undone choice with no real uncertainty.
When no expectation can be honestly stated. If there is no genuine prediction, confidence, or assumption to record, the entry is theater.
To surface only the conditions that must hold for a choice to be right. That is think-what-would-have-to-be-true; the journal captures the whole decision plus a predicted outcome and confidence for calibration.
As a substitute for actually reviewing entries later. A journal nobody revisits delivers no calibration; if there is no intent to review, skip it.
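The use and do-not-use conditions above amount to a short triage, which can be sketched in code. This is a minimal illustration, assuming a hypothetical `DecisionContext` with hand-set flags; the field names and the returned strings are invented here and are not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class DecisionContext:
    """Hypothetical flags describing a candidate decision (names are illustrative)."""
    outcome_known: bool
    trivial_or_reversible: bool
    can_state_expectation: bool
    only_needs_conditions: bool
    will_review_later: bool


def triage(ctx: DecisionContext) -> str:
    """Return 'journal', or the reason or tool to use instead, per the lists above."""
    if ctx.outcome_known:
        return "use think-after-action-review"
    if ctx.trivial_or_reversible:
        return "skip: capture overhead not worth it"
    if not ctx.can_state_expectation:
        return "skip: nothing honest to record"
    if ctx.only_needs_conditions:
        return "use think-what-would-have-to-be-true"
    if not ctx.will_review_later:
        return "skip: no intent to review"
    return "journal"
```

The order of the checks is a design choice of this sketch: the known-outcome case is tested first because it is the near-miss the method most needs to rule out.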

Instructions

When asked to record a decision journal entry, follow these steps:

Confirm the decision is worth journaling, and not already resolved. State the decision in one or two sentences. If it is trivial, fully reversible, or the outcome is already known, say so and stop (and for a known outcome, point to think-after-action-review).
Record the situation and the rationale. The context as it stands now, and why this choice is being made. Capture the real reasoning, including what makes it a hard call.
Record the options considered and not taken. The alternatives on the table and the short reason each was set aside. This is part of the rationale a later review will check.
State the predicted outcome and an explicit confidence. What is expected to happen by a stated date, and a confidence as a percentage or band (for example, “70%”). The confidence must be explicit; a prediction without a stated confidence cannot be calibrated.
Name the assumptions the decision rests on. The specific things being taken as true that, if wrong, would change the call. List them so they can be checked later, not left implicit.
Set a review date and the signals to check then. A concrete date for an after-action review, and the observable signals that will tell you whether the prediction held. This is what makes the entry reviewable rather than a diary note.
Emit the decision journal entry per references/TEMPLATE.md.
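The fields these steps collect can be pictured as one structured record. The sketch below is an illustration only: `JournalEntry`, its field names, and the `render` layout are assumptions made for this example, and the authoritative layout remains references/TEMPLATE.md. Confidence is held as a single probability for simplicity, although the steps also allow a band.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class JournalEntry:
    """Illustrative structure only; the real layout lives in references/TEMPLATE.md."""
    recorded_on: date
    decision: str
    rationale: str
    options_not_taken: dict[str, str]   # option -> short reason it was set aside
    predicted_outcome: str
    predicted_by: date                  # the stated date the prediction is tied to
    confidence: float                   # explicit probability in [0, 1], e.g. 0.70
    assumptions: list[str]
    review_date: date
    review_signals: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the entry as labeled plain-text lines (a stand-in for the template)."""
        lines = [
            f"Date recorded: {self.recorded_on.isoformat()}",
            f"Decision: {self.decision}",
            f"Rationale: {self.rationale}",
            "Options not taken:",
            *[f"  {option}: {reason}" for option, reason in self.options_not_taken.items()],
            f"Predicted outcome (by {self.predicted_by.isoformat()}): {self.predicted_outcome}",
            f"Confidence: {self.confidence:.0%}",
            "Assumptions:",
            *[f"  {assumption}" for assumption in self.assumptions],
            f"Review date: {self.review_date.isoformat()}",
            f"Signals to check: {'; '.join(self.review_signals)}",
        ]
        return "\n".join(lines)
```

Making every field except `review_signals` a required constructor argument means an entry cannot be created with the confidence or assumptions left implicit.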

Output Format

Use the template in references/TEMPLATE.md. The deliverable is the filled decision journal entry - dated, with a predicted outcome, an explicit confidence, named assumptions, and a review date - not a prose essay.

Quality Checklist

Before finalizing, verify:

The entry is dated and written before the outcome is known (not a back-fitted record of a decision already resolved).
The predicted outcome is concrete and tied to a stated date.
Confidence is explicit (a percentage or band), so it can be scored later.
The assumptions the decision rests on are named specifically, not left implicit.
A review date and the signals to check then are set, so the entry is actually reviewable.
The output is the decision journal entry artifact, not prose.
No overclaiming: the entry enables honest review and calibration; it is not claimed to make the decision turn out better (see evidence/dossier.md).
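Several of these checks are mechanical and could be automated before an entry is saved. The sketch below assumes the entry is a plain dict with hypothetical keys (`recorded_on`, `outcome_known`, `predicted_outcome`, `predicted_by`, `confidence`, `assumptions`, `review_date`, `review_signals`); none of these names come from the template. The last two checklist items (artifact rather than prose, no overclaiming) still need a human or model read.

```python
from datetime import date


def checklist_problems(entry: dict, today: date) -> list[str]:
    """Return the mechanical checklist items an entry fails; an empty list means it passes."""
    problems = []
    recorded = entry.get("recorded_on")
    if recorded is None or recorded > today:
        problems.append("missing or future recorded date")
    if entry.get("outcome_known"):
        problems.append("outcome already known: back-fitted entry")
    if not entry.get("predicted_outcome") or entry.get("predicted_by") is None:
        problems.append("prediction not tied to a stated date")
    confidence = entry.get("confidence")
    # Assumption: confidence is stored as a probability in [0, 1], not a band.
    if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
        problems.append("confidence not explicit")
    if not entry.get("assumptions"):
        problems.append("no named assumptions")
    review = entry.get("review_date")
    if review is None or recorded is None or review <= recorded:
        problems.append("no review date after the recorded date")
    if not entry.get("review_signals"):
        problems.append("no review signals")
    return problems
```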

Evidence

Tier P (practitioner). The mechanism rests on well-supported findings - hindsight bias is real and hard to suppress by willpower (Fischhoff 1975), and recorded probabilistic predictions plus feedback improve calibration over time (Tetlock & Gardner 2015) - but the practice itself, popularized by practitioners (Duke 2018), has limited controlled evidence that journaling improves decision outcomes. The journal’s value is in making later review honest and calibration possible, not in guaranteeing better results. The evidence is transferred from human studies and practice and has not been validated for AI-augmented use. Full grading, sources, and caveats: evidence/dossier.md.

Examples

See references/EXAMPLE.md for a completed decision journal entry on a real decision.

Deep dive: worked example

A full worked run (the shared Northwind scenario)

Decision Journal Entry - Worked Example

A completed run of the decision-journal skill on a real, consequential decision, recorded at the moment of commitment. This is the quality bar a generated entry should meet. Note that it is written before the outcome is known - there is no hindsight in it.

Entry header

Date recorded: 2026-05-31
Decision: Launch a self-serve free tier of Northwind (our B2B SaaS) in 6 weeks to accelerate top-of-funnel growth ahead of the Q3 board target, rather than doubling down on the existing sales-led motion.
Decided by / owner: VP Product (with sign-off from CEO and VP Sales)
Reversibility: One-way door in practice. Pulling a free tier after launch is possible but damages trust and is publicly visible, so we are treating it as hard to reverse.

Situation and rationale

Sales-led growth is hitting a ceiling; pipeline is flat quarter over quarter and the Q3 board target assumes a step change in top-of-funnel volume that the current motion will not deliver. A self-serve free tier is the fastest lever we believe we have to widen the funnel. We are choosing it now, over the safer alternatives, because we judge the growth risk of doing nothing to be larger than the cannibalization risk of acting, and because the window to show movement before the Q3 board meeting is closing. It is a hard call: the team is split on whether free will feed paid or eat it.

Options considered and not taken

Option	Why chosen or set aside
Launch a self-serve free tier in 6 weeks (chosen)	Fastest path to the funnel step change the board target needs; we judge upside to outweigh cannibalization risk
Double down on sales-led motion (hire 3 more reps)	Slower ramp, will not move top-of-funnel volume in time for Q3; ceiling concern unaddressed
Time-limited free trial instead of a free tier	Lower cannibalization risk but weaker top-of-funnel pull; team judged it insufficient for the target
Do nothing this quarter, revisit in Q4	Misses the board target window; the growth risk of inaction is the thing we are most worried about

Prediction

Predicted outcome: By 6 months after launch (end of Q4), self-serve drives 3x sign-up volume and a measurable, positive lift in net-new paid conversions, without a net decline in sales-led paid MRR.
Confidence: 60%. (We are genuinely split; the 60% reflects real disagreement, not a round-number placeholder.)
What would surprise me: Net-new paid MRR falling below the pre-launch trend - that would mean free cannibalized paid rather than feeding it, and would prove the cautious half of the team right.

Assumptions this rests on

The free tier can be scoped so that the highest-value features stay behind paid, so free feeds paid rather than replacing it.
Support and infrastructure cost per free user stays within the modeled ceiling (free users do not swamp the team or blow the unit economics).
Sales comp and lead-routing are redesigned before launch, so reps do not undercut or resent the motion.
The 6-week timeline is enough to ship a thin but secure self-serve billing-and-auth path.
“Free” attracts users who fit our ICP, not a different, never-converting segment.

Review

Review date: 2026-11-30 (end of Q4, 6 months post-launch). Reopen this entry via an after-action review.
Signals to check then: net-new paid MRR trend vs pre-launch; free-to-paid conversion rate among ICP-fit users; support tickets and cloud cost per free user vs the model; sales-team behavior and pipeline complaints; whether sign-up volume actually hit 3x.

Value added: recorded at commitment, this entry fixes a 60% confidence and five named assumptions in place before the outcome is known. When the Q4 after-action review reopens it, the team will compare results against what they actually predicted - not a hindsight-rewritten version where “we always knew free would work” or “it was obviously a mistake.” That honest expected-vs-actual comparison, and the calibration it feeds over many such decisions, is the value; the entry makes no claim that journaling made the launch itself succeed.
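To make the later scoring concrete, here is a minimal sketch of how the 60% prediction above could be scored once the outcome is known. The Brier score used here is an assumption of this sketch (it is a common scoring rule in the forecasting literature), not something the entry or the template prescribes, and `brier_score` is a hypothetical helper.

```python
def brier_score(confidence: float, happened: bool) -> float:
    """Squared error between the stated probability and the 0/1 outcome (lower is better)."""
    outcome = 1.0 if happened else 0.0
    return (confidence - outcome) ** 2


# The recorded 60% confidence, scored under each possible result.
print(round(brier_score(0.60, True), 2))   # 0.16 if the prediction holds
print(round(brier_score(0.60, False), 2))  # 0.36 if it does not
```

Either result on its own says little about calibration: a 50% guess would score 0.25 in both cases. The score becomes informative only when averaged over many recorded entries.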

Grounding: the full evidence dossier

What the research does and does not show, with graded sources

Evidence Dossier: Decision Journal

The single source of truth for the decision-journal skill. The SKILL.md, the sidecar (skill.meta.yml), and the eval cases all derive from this file. If a claim is not here, it does not belong in the skill.


Skill	`thinking-framework-skills.decision-journal` (installable name `think-decision-journal`)
Family	meta-thinking-and-reflection
Evidence tier	P (practitioner - useful method, limited controlled evidence; see “What the evidence shows”)
Confidence	Moderate that recording-at-decision-time defeats hindsight distortion; low that journaling alone improves decision outcomes
Status	draft (first authored 2026-05-31, against discovery corpus)

1. The mechanism (what actually does the work)

A decision journal records a decision at the moment it is made - the decision itself, the rationale, the expected outcome, the confidence, and the assumptions it rests on - so that the decision can be reviewed later against what actually happened. The load-bearing move is timing: the record is written before the outcome is known, while the reasoning and the felt confidence are still intact and uncontaminated by the result. It does three things:

Defeats hindsight bias by pre-committing the prediction. Once an outcome is known, memory silently rewrites what “we knew all along.” A contemporaneous record fixes the prediction in place so the later review compares against what was actually expected, not a back-fitted version of it. This is the durable cognitive move; the notebook is just the means.
Makes confidence and assumptions explicit and checkable. Forcing a stated confidence level and a list of named assumptions turns a vague feeling (“I’m pretty sure”) into something that can later be scored, which is the raw material for calibration over many decisions.
Separates decision quality from outcome quality. Because the rationale is recorded independent of the result, a later review can ask “was this a good decision given what was knowable then?” rather than only “did it work out?” - guarding against outcome bias (judging a good process by a bad roll of the dice, or vice versa).

The mechanism is what we implement. “Decision journal” is the descriptive packaging; the durable move is contemporaneous capture of decision, rationale, predicted outcome, confidence, and assumptions, structured for honest later review.

2. Lineage

Decision journaling as a calibration practice is most associated with poker player and decision researcher Annie Duke, Thinking in Bets (2018), and with investor/practitioner writing (e.g. Shane Parrish / Farnam Street’s “decision journal” templates, drawing on Daniel Kahneman’s advice to record reasoning at decision time).
The cognitive bias it counters - hindsight bias (“I knew it all along”) - is well established in the psychology literature: Fischhoff, B. (1975), “Hindsight is not equal to foresight,” Journal of Experimental Psychology: Human Perception and Performance, 1(3), 288-299; Roese, N. J., & Vohs, K. D. (2012), “Hindsight bias,” Perspectives on Psychological Science, 7(5), 411-426.
Calibration of subjective probability through recorded prediction and feedback draws on the forecasting and calibration literature: Lichtenstein, Fischhoff & Phillips (1982); and the practice of scored forecasting (Tetlock & Gardner, Superforecasting, 2015), where writing down a probability and later scoring it is what makes calibration possible.

No trademark. “Decision journal” is a generic, descriptive term in common practitioner use; no attribution is required and none is claimed. We name the skill descriptively and cite the lineage here.

3. What the evidence shows, and what it does NOT show

This is the honest core of the dossier. The skill must not overclaim.

What is reasonably supported:

Hindsight bias is real, robust, and hard to suppress by willpower. Fischhoff (1975) and the subsequent literature show that once people know an outcome, they systematically misremember their prior predictions as closer to the truth. A contemporaneous record is one of the few reliable defenses, because it removes the need to remember the prediction at all.
Calibration improves with recorded predictions and feedback. The forecasting literature (Tetlock; Lichtenstein et al.) shows that people who record probabilistic predictions and then score them against outcomes can become measurably better calibrated over time. A decision journal supplies exactly the recorded-prediction half of that loop.
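The recorded-prediction-plus-feedback loop can be illustrated by grouping reviewed entries by stated confidence and comparing each group's hit rate with that confidence. This is a sketch under stated assumptions: `calibration_table` is a hypothetical helper, buckets are formed by rounding to the nearest 10%, and the sample records are invented for illustration, not taken from any study.

```python
from collections import defaultdict


def calibration_table(records: list[tuple[float, bool]]) -> dict[float, tuple[int, float]]:
    """Group (stated confidence, came true?) pairs by confidence rounded to one decimal.

    Returns {bucket: (number of predictions, observed hit rate)}.
    """
    buckets = defaultdict(list)
    for confidence, came_true in records:
        buckets[round(confidence, 1)].append(came_true)
    return {
        bucket: (len(hits), sum(hits) / len(hits))
        for bucket, hits in sorted(buckets.items())
    }


# Invented sample: five predictions recorded at 60%, two at 90%.
reviewed = [
    (0.6, True), (0.6, True), (0.6, False), (0.6, True), (0.6, False),
    (0.9, True), (0.9, False),
]
table = calibration_table(reviewed)
print(table)  # {0.6: (5, 0.6), 0.9: (2, 0.5)}
```

In this invented sample the 60% bucket came true 60% of the time, while the 90% bucket came true only half the time, which would suggest overconfidence at the high end. Two entries are far too few to conclude anything, so the table only becomes meaningful across many decisions.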

What is NOT shown (the caveat that keeps the skill honest):

There is no strong controlled evidence that keeping a decision journal improves decision outcomes. The supported claims are about (a) the existence of hindsight bias and (b) calibration improving under recorded-prediction-plus-feedback regimes. The leap from “journaling defeats hindsight bias and enables calibration” to “journaling makes your decisions turn out better” is plausible but not established by controlled study. The journal’s value is in making later review honest and calibration possible, not in guaranteeing better results.
Much of the practitioner enthusiasm (Duke, Parrish, investor blogs) is experience-based, not experimental. It is credible and internally consistent, but it is testimony, not a trial. Grade it as practitioner evidence, not strong evidence.
A journal only pays off if it is actually reviewed later. A drawer full of unreviewed entries delivers none of the calibration benefit; it is a cost with no return. The evidence for benefit is contingent on the review half of the loop happening.

Net grade: P (practitioner). A genuinely useful method with a sound mechanistic rationale (hindsight bias is real; recorded predictions enable calibration) but limited controlled evidence that the practice itself improves outcomes. The skill should claim the honest-review and calibration-enabling benefits and explicitly disclaim a guaranteed-better-outcome benefit.

4. Transferred-evidence flag (required honesty for this library)

All of the evidence above comes from human subjects and human practitioners - lab studies of hindsight bias, forecasting tournaments, and the experience of human decision-makers keeping journals. There is no direct study of decision journals authored by, or with, an AI agent, and none of whether an agent-produced journal entry improves a human’s later calibration. The evidence supporting this skill is therefore transferred from human contexts, not validated for AI-augmented use. This skill must say so. Treat the AI value as: the agent makes the capture cheap and immediate (the moment-of-decision friction is what kills the practice in humans), enforces the full structure (decision, rationale, predicted outcome, confidence, assumptions), and produces a durable, reviewable artifact - benefits that do not depend on the unproven outcome-improvement claim.

5. When it works / when it fails (drives the eval negative cases)

Works best when:

The decision is consequential and not yet resolved, so a genuine prediction can still be recorded before the outcome is known.
An honest expectation, confidence level, and set of assumptions can actually be stated (there is real uncertainty and a real basis for a prediction).
There is an intention to review the entry later, against the actual outcome (it pairs with an after-action review).

Fails or misleads when (poor-fit / anti-patterns):

Reviewing a decision after the outcome is already known - that is an after-action review (think-after-action-review), which compares expected-vs-actual after the fact. The decision journal records the expectation at decision time precisely so that the later AAR has something honest to compare against. Writing a “journal entry” after you know the result is back-fitting, the exact distortion the method exists to prevent. (Anti-trigger and the key near-miss.)
Trivial or fully reversible (two-way-door) decisions - the capture overhead is not worth it for a choice you can cheaply undo or that has no meaningful uncertainty.
When no expectation can be honestly stated - if there is no real prediction, confidence, or assumption to record (the outcome is already determined, or the “decision” is a formality), there is nothing to calibrate against and the entry is theater.
Surfacing only the conditions that must hold for a choice to be right - that is a different tool (think-what-would-have-to-be-true). The journal captures the whole decision plus a predicted outcome and confidence for later calibration, not just the load-bearing conditions.
As a substitute for actually reviewing entries - a journal nobody revisits delivers no calibration. If there is no intent to review, the cost is unrecovered.

6. Output artifact

The skill must emit a decision journal entry, not prose: a dated, structured record with the decision, the situation/context, the rationale, the options considered and not taken, the predicted outcome, an explicit confidence (a percentage or band), the named assumptions the decision rests on, and a review date with the expected signals to check then. The artifact is the deliverable; a discursive write-up is not. It is designed to be reopened later (by an after-action review) and scored against reality.

7. Sources

1. Fischhoff, B. (1975), “Hindsight is not equal to foresight,” J. Experimental Psychology: Human Perception and Performance 1(3):288-299 - establishes hindsight bias.
2. Roese, N. J., & Vohs, K. D. (2012), “Hindsight bias,” Perspectives on Psychological Science 7(5):411-426 - review of the robustness of the effect.
3. Duke, A. (2018), Thinking in Bets - decision journaling, separating decision quality from outcome quality, practitioner source.
4. Tetlock, P., & Gardner, D. (2015), Superforecasting - recorded probabilistic predictions plus scoring as the basis for calibration.
5. Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982), “Calibration of probabilities,” in Kahneman, Slovic & Tversky (eds.), Judgment under Uncertainty - calibration literature.

Verification status: citations 1-2 and 4-5 are standard and well-attested. Citation 3 (Duke) and the Parrish/Farnam Street decision-journal templates are practitioner sources, credible but experience-based; they are cited as lineage and as the origin of the practice, not as controlled evidence of outcome improvement. The “no strong controlled evidence that journaling improves outcomes” statement in section 3 reflects the absence of such a study in the discovery corpus as of authoring; it should be re-checked before any public-facing claim, but the honest default is to not claim outcome improvement.

Thinking Framework Skills v0.3.0 · 38 frameworks