After Action Review

Most retrospectives are an unstructured “how did it go?” that produces venting and vague lessons. The After Action Review imposes the structure that actually carries the benefit: compare what was expected to what actually happened, diagnose why the gaps occurred (in both directions), and convert that into what to sustain and what to change, specifically and with owners. The expected-vs-actual comparison is load-bearing; without a recorded expectation there is nothing to learn against, only hindsight narrative. The output is an after-action review, and it must be blameless to work.

When to Use

A project, launch, sprint, experiment, or incident has finished.
There was a real expectation to compare the outcome against (or you can reconstruct it honestly).
The team wants to learn, not assign fault.

When NOT to Use

Before the event (that is a premortem).
As a status update or a summary of what shipped.
When it will become blame (it stops working the moment people fear fault).
When there is genuinely no expectation and none can be honestly reconstructed.

Instructions

When asked to run an after-action review, follow these steps:

1. State what was expected. The goal, the plan, and the predicted outcome going in. If it was not recorded, reconstruct it honestly and say you are doing so - do not back-fit it to the result.
2. State what actually happened. The real outcome, concretely, including what went better than expected, not only what went worse.
3. Diagnose the gaps. For each meaningful difference (in both directions), ask why - the actual cause, not the convenient one. Keep it blameless.
4. Capture what to sustain. The things that worked and should be repeated. Do not skip this in favor of the failures.
5. Specify what to change. Concrete, owned changes for next time - not a vague “communicate better.”
6. Emit the after-action review per references/TEMPLATE.md.
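
The steps above map naturally onto a small record. The following is a minimal Python sketch of one possible shape for it. The class and field names (`Gap`, `Change`, `AfterActionReview`, `expectation_reconstructed`) are illustrative assumptions, not the layout of references/TEMPLATE.md, which is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Gap:
    """One expected-vs-actual difference and its blameless cause."""
    difference: str
    why: str
    better_than_expected: bool = False  # gaps run in both directions


@dataclass
class Change:
    """A concrete change for next time, with a named owner."""
    action: str
    owner: str


@dataclass
class AfterActionReview:
    """Hypothetical container for the review; not the official template."""
    event: str
    expected: str
    actual: str
    # True when the expectation was not recorded and had to be reconstructed.
    expectation_reconstructed: bool = False
    gaps: list[Gap] = field(default_factory=list)
    sustain: list[str] = field(default_factory=list)
    changes: list[Change] = field(default_factory=list)


# Placeholder content, only to show how the pieces fit together.
aar = AfterActionReview(
    event="Two-week pilot",
    expected="Conversion at or above breakeven",
    actual="Conversion below breakeven; onboarding fixes helped elsewhere",
    gaps=[Gap("Conversion lower than expected", "Gating was too loose")],
    sustain=["Run a cheap, time-boxed pilot first"],
    changes=[Change("Tighten the gate before the next test", "PM (Growth)")],
)
```

Keeping `expected` as its own field, separate from `actual`, mirrors the point that the comparison is what carries the benefit; the `expectation_reconstructed` flag is one way to make an honest reconstruction visible rather than silent.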

Output Format

Use the template in references/TEMPLATE.md. The deliverable is the expected-vs-actual review with sustain/change actions, not prose and not a status report.

Quality Checklist

Before finalizing, verify:

What was expected is stated (recorded or honestly reconstructed), separate from the outcome.
What actually happened includes the better-than-expected, not only failures.
Each gap has a real “why,” kept blameless.
What to sustain is captured, not just what to change.
Changes are specific and owned, not vague.
The output is the AAR artifact, not a status update.
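
Parts of this checklist can be checked mechanically. Below is a rough Python sketch, assuming the review is held as a plain dictionary with hypothetical keys (`expected`, `actual`, `gaps`, `sustain`, `changes`) and an illustrative list of vague phrases. It cannot judge whether a cause is honest or whether the tone is blameless; those still need a human read.

```python
# Illustrative phrases only; extend or replace for your own team.
VAGUE_PHRASES = ("communicate better", "be more careful", "try harder")


def check_aar(aar: dict) -> list[str]:
    """Return checklist problems found; an empty list means none were found."""
    problems = []
    if not aar.get("expected", "").strip():
        problems.append("expected outcome is missing")
    if not aar.get("actual", "").strip():
        problems.append("actual outcome is missing")

    gaps = aar.get("gaps", [])
    if not any(g.get("better_than_expected") for g in gaps):
        # Not always an error, but worth confirming none existed.
        problems.append("no better-than-expected result recorded")
    for g in gaps:
        if not g.get("why", "").strip():
            problems.append(f"gap without a why: {g.get('difference', '?')}")

    if not aar.get("sustain"):
        problems.append("nothing captured to sustain")

    for c in aar.get("changes", []):
        action = c.get("action", "")
        if not c.get("owner", "").strip():
            problems.append(f"change without an owner: {action}")
        if any(p in action.lower() for p in VAGUE_PHRASES):
            problems.append(f"change looks vague: {action}")
    return problems


# A deliberately weak draft, to show what gets flagged.
draft = {
    "expected": "Conversion at or above breakeven",
    "actual": "Conversion below breakeven",
    "gaps": [{"difference": "Conversion lower than expected", "why": ""}],
    "sustain": [],
    "changes": [{"action": "Communicate better", "owner": ""}],
}
for problem in check_aar(draft):
    print(problem)
```

On this draft the sketch flags five problems: no better-than-expected result, a gap with no why, nothing to sustain, and a change that is both unowned and vague.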

Evidence

Tier S. A meta-analysis of team and individual debriefs (Tannenbaum & Cerasoli, 2013) found that structured debriefs improve performance substantially (effect size ~0.79). The effect depends on the structure - intent, comparison to expectations, specific takeaways - not on merely holding a meeting; unstructured retros do not carry it. The AAR format itself originated in the US Army (TC 25-20). The evidence comes from human teams and is transferred to AI use, not AI-validated. Full grading: evidence/dossier.md.

Examples

See references/EXAMPLE.md for a completed after-action review.

Deep dive: worked example

A full worked run (the shared Northwind scenario)

After Action Review - Worked Example

A completed run of think-after-action-review, on the shared Northwind scenario. This is the quality bar a generated AAR should meet.

Northwind is a B2B SaaS company. It ran the two-week ICP (ideal customer profile) free-to-paid conversion pilot (the one the decision skills called for and WOOP committed the team to). Here the skill reviews it.

Event

The two-week gated free-tier conversion pilot, run before deciding whether to build the full free tier.

What was expected

Going in (recorded in the WWHTBT ledger): the pilot would show ICP free-to-paid conversion at or above the breakeven the cost model needed (~5%), telling us cleanly whether to build.

What actually happened

ICP free-to-paid came in at ~3% - below breakeven. But unexpectedly, the onboarding fixes shipped for the pilot lifted the existing paid-trial conversion by 4 points - a bigger near-term win than the free tier would have been. The pilot also ran a few days late.

Why the gaps (both directions)

Difference (expected vs actual)	Why it happened (real cause, blameless)
Free-to-paid lower than expected (3% vs 5%)	The free cohort skewed less ICP-fit than the gating assumed; the value gate was too loose
Paid-trial conversion rose (not predicted)	The onboarding work done to support the pilot fixed friction that was the actual constraint all along
Pilot ran late	Billing edge cases took longer than scoped, as the reference-class estimate had warned
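
The quantified rows above reduce to a signed difference between expected and actual. Here is a small Python sketch of that arithmetic using the figures from this example. The function name `signed_gap` is hypothetical, the unpredicted paid-trial lift is treated as an expected change of 0 points, and both metrics are assumed to be "higher is better".

```python
def signed_gap(expected: float, actual: float) -> float:
    """Actual minus expected, in the metric's own units."""
    return actual - expected


# (expected, actual) pairs taken from the worked example.
# Assumption: the paid-trial lift was not predicted, so expected change = 0.
metrics = {
    "ICP free-to-paid conversion (%)": (5.0, 3.0),
    "Paid-trial conversion change (points)": (0.0, 4.0),
}

for name, (expected, actual) in metrics.items():
    gap = signed_gap(expected, actual)
    if gap > 0:
        direction = "better than expected"
    elif gap < 0:
        direction = "worse than expected"
    else:
        direction = "as expected"
    print(f"{name}: {gap:+.1f} ({direction})")
```

Recording the sign, not just the size, keeps the better-than-expected result in view alongside the shortfall.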

Sustain (what worked, repeat it)

Running a cheap, time-boxed pilot before a one-way-door build - it changed the decision and cost two weeks, not a quarter.
Instrumenting ICP fit on signups from day one.

Change (specific and owned)

Change for next time	Owner
Default to fixing the funnel before assuming a packaging gap; the pilot showed packaging was not the constraint	PM (Growth)
Tighten the ICP value gate before any future free-access test	PM (Growth)
Budget billing/auth work at the reference-class rate (~1.5x), not the inside estimate	Eng lead

Note: the value is the expected-vs-actual structure. A “how did the pilot go?” retro would have recorded “free tier didn’t convert, oh well.” The AAR caught the more important, unexpected result - the onboarding fix was the real win - and turned it into an owned change of strategy.

Grounding: the full evidence dossier

What the research does and does not show, with graded sources

Evidence Dossier: After Action Review

Single source of truth for the after-action-review skill. The SKILL.md, sidecar, and evals derive from this. A strong-evidence anchor and the library’s first reflection-family skill.


Skill	`thinking-framework-skills.after-action-review` (installable name `think-after-action-review`)
Family	meta-thinking-and-reflection
Evidence tier	S (strong meta-analytic support for structured debriefs)
Confidence	High that structured debriefs improve performance and that unstructured retros largely do not
Status	draft (authored 2026-05-31 from the discovery corpus)

1. The mechanism (what actually does the work)

Most retrospectives are an unstructured “how did it go?” that produces venting and vague lessons. The After Action Review imposes a structure that is the source of the effect: compare what was expected to what actually happened, diagnose why the gaps occurred (in both directions - what went better than expected, too), and convert that into what to sustain and what to change, specifically and with owners. The “expected vs actual” comparison is the load-bearing move: without a recorded expectation, there is nothing to learn against, only hindsight narrative.

It must be blameless to work: the moment it becomes about fault, people stop surfacing the real causes.

2. Lineage

Originated in the US Army (TC 25-20, “A Leader’s Guide to After-Action Reviews”). The broader research base is the team-debrief literature.

No trademark. Named descriptively.

3. What the evidence shows, and what it does NOT show

Strongly supported (the S): a meta-analysis of team and individual debriefs (Tannenbaum & Cerasoli, 2013) found that structured debriefs improve performance substantially - roughly a 20-25% improvement, effect size around 0.79. The effect depends on structure (intent, comparison to expectations, specific behavioral takeaways), not on simply holding a meeting.

What it does NOT show: that an unstructured retro helps (it largely does not), or that an AAR fixes anything if its “lessons” are vague and never change behavior. The brand “AAR” is practitioner packaging; the mechanism (structured, expectation-anchored, blameless debrief) is what carries the evidence.

4. Transferred-evidence flag

The evidence is from human teams debriefing real events, not AI-augmented use. Transferred, not AI-validated. The AI value: a model asked “how did it go?” produces a tidy summary; this skill forces the expected-vs-actual comparison, the why, and specific owned sustain/change items - the structure the evidence says is the active ingredient - and produces a durable artifact.

5. When it works / when it fails

Works best when: a project, launch, sprint, experiment, or incident has finished and there was a real expectation to compare against; a team wants to actually learn and is willing to be blameless.

Fails or misleads when (poor-fit / anti-patterns):

There is no recorded expectation to compare the actual outcome against (the central failure - you get hindsight narrative, not learning). Reconstruct the expectation honestly if it was not written down.
It turns into blame, so people stop surfacing real causes.
“Lessons” stay vague and unowned, changing no future behavior.
Only failures are captured and what to sustain is skipped.
It is run before the event (that is a premortem) or used as a status update (wrong tool).

6. Output artifact

An after-action review: what was expected, what actually happened, why the gaps occurred (both better and worse than expected), what to sustain, and what to change - each change specific and owned. Blameless throughout.

7. Sources

US Army, TC 25-20 - the original After-Action Review guide.
Tannenbaum, S. I., & Cerasoli, C. P. (2013). Do team and individual debriefs enhance performance? A meta-analysis. Human Factors - meta-analysis of debriefs and performance (ES ~0.79).

Verification status: the Tannenbaum & Cerasoli meta-analysis and its effect size are well attested; confirm the exact figure against the paper before making a public quantified claim. The “structure is the active ingredient” point is the honest core.

Thinking Framework Skills v0.3.0 · 38 frameworks