After Action Review
Most retrospectives are an unstructured “how did it go?” that produces venting and vague lessons. The After Action Review imposes the structure that actually carries the benefit: compare what was expected to what actually happened, diagnose why the gaps occurred (in both directions), and convert that into what to sustain and what to change, specifically and with owners. The expected-vs-actual comparison is load-bearing; without a recorded expectation there is nothing to learn against, only hindsight narrative. The output is an after-action review, and it must be blameless to work.
When to Use
- A project, launch, sprint, experiment, or incident has finished.
- There was a real expectation to compare the outcome against (or you can reconstruct it honestly).
- The team wants to learn, not assign fault.
When NOT to Use
- Before the event (that is a premortem).
- As a status update or a summary of what shipped.
- When it will become blame (it stops working the moment people fear fault).
- When there is genuinely no expectation and none can be honestly reconstructed.
Instructions
When asked to run an after-action review, follow these steps:
- State what was expected. The goal, the plan, and the predicted outcome going in. If it was not recorded, reconstruct it honestly and say you are doing so - do not back-fit it to the result.
- State what actually happened. The real outcome, concretely, including what went better than expected, not only what went worse.
- Diagnose the gaps. For each meaningful difference (both directions), ask why - the actual cause, not the convenient one. Keep it blameless.
- Capture what to sustain. The things that worked and should be repeated. Do not skip this for the failures.
- Specify what to change. Concrete, owned changes for next time - not vague “communicate better.”
- Emit the after-action review per references/TEMPLATE.md.
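The steps above map onto a small record type. The following is a minimal sketch, not the layout of references/TEMPLATE.md: the class and field names are hypothetical, and the sample values are borrowed from the Northwind worked example later in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Gap:
    difference: str  # expected vs actual, stated neutrally
    direction: str   # "better" or "worse" than expected
    why: str         # the actual cause, kept blameless


@dataclass
class Change:
    action: str  # a concrete change for next time
    owner: str   # the role or person who owns it


@dataclass
class AfterActionReview:
    # Field names are illustrative assumptions, not the template's.
    expected: str                    # step 1
    expectation_reconstructed: bool  # True if it was not recorded beforehand
    actual: str                      # step 2
    gaps: list[Gap] = field(default_factory=list)        # step 3
    sustain: list[str] = field(default_factory=list)     # step 4
    changes: list[Change] = field(default_factory=list)  # step 5


aar = AfterActionReview(
    expected="ICP free-to-paid conversion at or above ~5% breakeven",
    expectation_reconstructed=False,
    actual="Free-to-paid came in at ~3%; paid-trial conversion rose 4 points",
)
aar.gaps.append(Gap("Free-to-paid 3% vs 5%", "worse", "The value gate was too loose"))
aar.changes.append(Change("Tighten the ICP value gate", "PM (Growth)"))
```

Keeping `expectation_reconstructed` as an explicit flag is one way to make an honest reconstruction visible in the artifact rather than implied.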
Output Format
Use the template in references/TEMPLATE.md. The deliverable is the expected-vs-actual review with sustain/change actions, not free-form prose and not a status report.
Quality Checklist
Before finalizing, verify:
- What was expected is stated (recorded or honestly reconstructed), separate from the outcome.
- What actually happened includes the better-than-expected, not only failures.
- Each gap has a real “why,” kept blameless.
- What to sustain is captured, not just what to change.
- Changes are specific and owned, not vague.
- The output is the AAR artifact, not a status update.
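Most of the checklist above can be applied mechanically to a draft. The following is a rough sketch under stated assumptions: the dictionary keys, the `check_aar` helper, and the list of vague phrases are all hypothetical, and the last item (artifact versus status update) still needs a human read.

```python
# Hypothetical examples of vague wording; extend for your own team.
VAGUE_PHRASES = ("communicate better", "be more careful", "try harder")


def check_aar(aar: dict) -> list[str]:
    """Return checklist problems found in a draft AAR (empty list means none found)."""
    problems = []
    if not aar.get("expected", "").strip():
        problems.append("expected: missing")
    if not aar.get("actual", "").strip():
        problems.append("actual: missing")
    gaps = aar.get("gaps", [])
    # A prompt to look again, not proof of a flaw: some events have no upside surprise.
    if not any(g.get("direction") == "better" for g in gaps):
        problems.append("gaps: nothing better than expected recorded")
    for g in gaps:
        if not g.get("why", "").strip():
            problems.append(f"gaps: no why for {g.get('difference', '?')!r}")
    if not aar.get("sustain"):
        problems.append("sustain: empty")
    for c in aar.get("changes", []):
        action = c.get("action", "")
        if not c.get("owner", "").strip():
            problems.append(f"changes: no owner for {action!r}")
        if any(p in action.lower() for p in VAGUE_PHRASES):
            problems.append(f"changes: vague wording in {action!r}")
    return problems


draft = {
    "expected": "Conversion at or above ~5%",
    "actual": "Conversion came in at ~3%",
    "gaps": [{"difference": "3% vs 5%", "direction": "worse", "why": ""}],
    "sustain": [],
    "changes": [{"action": "Communicate better", "owner": ""}],
}
problems = check_aar(draft)
for problem in problems:
    print(problem)
```

The sample draft trips five checks: no upside recorded, a gap without a why, an empty sustain list, and a change that is both unowned and vague.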
Evidence
Tier S. A meta-analysis of team and individual debriefs (Tannenbaum & Cerasoli, 2013) found that structured debriefs improve performance substantially (effect size ~0.67). The effect depends on the structure (intent, comparison to expectations, specific takeaways), not on merely holding a meeting; unstructured retros do not carry it. The AAR format itself originated in US Army TC 25-20. The evidence comes from human teams and is transferred to AI use, not AI-validated. Full grading: evidence/dossier.md.
Examples
See references/EXAMPLE.md for a completed after-action review.
Deep dive: worked example
A full worked run (the shared Northwind scenario)
After Action Review - Worked Example
A completed run of think-after-action-review, on the shared Northwind scenario. This is the quality bar a generated AAR should meet.
Northwind is a B2B SaaS company. It ran the two-week ICP (ideal customer profile) free-to-paid conversion pilot (the one the decision skills called for and WOOP committed it to). Here the skill reviews it.
- The two-week gated free-tier conversion pilot, run before deciding whether to build the full free tier.
What was expected
- Going in (recorded in the WWHTBT ledger): the pilot would show ICP free-to-paid conversion at or above the breakeven the cost model needed (~5%), telling us cleanly whether to build.
What actually happened
- ICP free-to-paid came in at ~3% - below breakeven. But unexpectedly, the onboarding fixes shipped for the pilot lifted the existing paid-trial conversion by 4 points - a bigger near-term win than the free tier would have been. The pilot also ran a few days late.
Why the gaps (both directions)
| Difference (expected vs actual) | Why it happened (real cause, blameless) |
|---|---|
| Free-to-paid lower than expected (3% vs 5%) | The free cohort skewed less ICP-fit than the gating assumed; the value gate was too loose |
| Paid-trial conversion rose (not predicted) | The onboarding work done to support the pilot fixed friction that was the actual constraint all along |
| Pilot ran late | Billing edge cases took longer than scoped, as the reference-class estimate had warned |
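The "both directions" comparison in the table above can be sketched as a signed difference per metric. This is an illustration only: `classify_gap` is a hypothetical helper, the percentages are the approximate figures from this example, and treating the unpredicted paid-trial lift as an expected lift of 0 points is an assumption.

```python
def classify_gap(expected: float, actual: float, higher_is_better: bool = True) -> tuple[float, str]:
    """Return (actual - expected, direction) for one metric."""
    delta = actual - expected
    if delta == 0:
        return delta, "as expected"
    improved = (delta > 0) == higher_is_better
    return delta, "better" if improved else "worse"


# Values are in percentage points, taken from the table above.
free_to_paid = classify_gap(expected=5.0, actual=3.0)
# Assumption: "not predicted" is modeled as an expected lift of 0 points.
paid_trial_lift = classify_gap(expected=0.0, actual=4.0)

print(free_to_paid)     # (-2.0, 'worse')
print(paid_trial_lift)  # (4.0, 'better')
```

For a metric where lower is better, such as days of delay, pass `higher_is_better=False`. The pilot's lateness is left out here because the source gives no number of days. The sign only says where to look; the "why" column still has to come from the team.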
Sustain (what worked, repeat it)
- Running a cheap, time-boxed pilot before a one-way-door build - it changed the decision and cost two weeks, not a quarter.
- Instrumenting ICP fit on signups from day one.
Change (specific and owned)
| Change for next time | Owner |
|---|---|
| Default to fixing the funnel before assuming a packaging gap; the pilot showed packaging was not the constraint | PM (Growth) |
| Tighten the ICP value gate before any future free-access test | PM (Growth) |
| Budget billing/auth work at the reference-class rate (~1.5x), not the inside estimate | Eng lead |
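The budgeting change in the last row is simple arithmetic: scale the inside estimate by the reference-class rate. A minimal sketch follows, assuming the ~1.5x figure from the table; the function name and the 4-day inside estimate are hypothetical.

```python
REFERENCE_CLASS_MULTIPLIER = 1.5  # the ~1.5x rate from the table above


def budgeted_days(inside_estimate_days: float,
                  multiplier: float = REFERENCE_CLASS_MULTIPLIER) -> float:
    """Scale an inside estimate by the reference-class overrun rate."""
    return inside_estimate_days * multiplier


# Hypothetical inside estimate of 4 days for billing edge cases.
budget = budgeted_days(4)
print(budget)  # 6.0
```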
Note: the value is the expected-vs-actual structure. A “how did the pilot go?” retro would have recorded “free tier didn’t convert, oh well.” The AAR caught the more important, unexpected result - the onboarding fix was the real win - and turned it into an owned change of strategy.
Grounding: the full evidence dossier
What the research does and does not show, with graded sources
Evidence Dossier: After Action Review
Single source of truth for the after-action-review skill. The SKILL.md, sidecar, and evals derive from this. A strong-evidence anchor and the library’s first reflection-family skill.
| Field | Value |
|---|---|
| Skill | thinking-framework-skills.after-action-review (installable name think-after-action-review) |
| Family | meta-thinking-and-reflection |
| Evidence tier | S (strong meta-analytic support for structured debriefs) |
| Confidence | High that structured debriefs improve performance; unstructured retros do not |
| Status | draft (authored 2026-05-31 from the discovery corpus) |
1. The mechanism (what actually does the work)
Most retrospectives are an unstructured “how did it go?” that produces venting and vague lessons. The After Action Review imposes a structure that is the source of the effect: compare what was expected to what actually happened, diagnose why the gaps occurred (in both directions - what went better than expected, too), and convert that into what to sustain and what to change, specifically and with owners. The “expected vs actual” comparison is the load-bearing move: without a recorded expectation, there is nothing to learn against, only hindsight narrative.
It must be blameless to work: the moment it becomes about fault, people stop surfacing the real causes.
2. Lineage
- Originated in the US Army (TC 25-20, “A Leader’s Guide to After-Action Reviews”). The broader research base is the team-debrief literature.
No trademark. Named descriptively.
3. What the evidence shows, and what it does NOT show
Strongly supported (the S): a meta-analysis of team and individual debriefs (Tannenbaum & Cerasoli, 2013) found that structured debriefs improve performance substantially: roughly a 20-25% improvement, effect size around 0.67. The effect depends on structure (intent, comparison to expectations, specific behavioral takeaways), not on simply holding a meeting.
What it does NOT show: that an unstructured retro helps (it largely does not), or that an AAR fixes anything if its “lessons” are vague and never change behavior. The brand “AAR” is practitioner packaging; the mechanism (structured, expectation-anchored, blameless debrief) is what carries the evidence.
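An effect size like the one quoted in this section is easier to read as a share of people. The sketch below assumes the figure is a standardized mean difference (Cohen's d) and that both groups are normally distributed with equal variance; neither assumption is stated in this dossier, and the function name is hypothetical. Pass in whatever figure you confirm against the paper.

```python
from statistics import NormalDist


def share_of_control_exceeded(d: float) -> float:
    """Cohen's U3: fraction of the control group scoring below the average debriefed performer.

    Assumes both groups are normal with equal variance.
    """
    return NormalDist().cdf(d)


# Cohen's conventional small / medium / large benchmarks, for orientation only.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {share_of_control_exceeded(d):.0%} of control below the debriefed mean")
```

A value of 0 returns 50%, meaning no difference between groups. This reading describes a shift between group averages, not a guaranteed gain for any single team.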
4. Transferred-evidence flag
The evidence is from human teams debriefing real events, not AI-augmented use. Transferred, not AI-validated. The AI value: a model asked “how did it go?” produces a tidy summary; this skill forces the expected-vs-actual comparison, the why, and specific owned sustain/change items - the structure the evidence says is the active ingredient - and produces a durable artifact.
5. When it works / when it fails
Works best when: a project, launch, sprint, experiment, or incident has finished and there was a real expectation to compare against; a team wants to actually learn and is willing to be blameless.
Fails or misleads when (poor-fit / anti-patterns):
- No recorded expectation to compare actual against (the central failure - you get hindsight narrative, not learning). Reconstruct the expectation honestly if it was not written down.
- It turns into blame, so people stop surfacing real causes.
- “Lessons” stay vague and unowned, changing no future behavior.
- Capturing only failures and skipping what to sustain.
- Run before the event (that is a premortem) or as a status update (wrong tool).
6. Output artifact
An after-action review: what was expected, what actually happened, why the gaps occurred (both better and worse than expected), what to sustain, and what to change - each change specific and owned. Blameless throughout.
7. Sources
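- US Army, TC 25-20, “A Leader’s Guide to After-Action Reviews” - the original After-Action Review guide.
- Tannenbaum, S. I., & Cerasoli, C. P. (2013). Do team and individual debriefs enhance performance? A meta-analysis. Human Factors, 55(1), 231-245 - meta-analysis of debriefs and performance (ES ~0.67).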
Verification status: the Tannenbaum & Cerasoli meta-analysis and its effect-size are well-attested; confirm the exact figure against the paper before a public quantified claim. The “structure is the active ingredient” point is the honest core.