Plus/Delta, Start/Stop/Continue, Rose/Thorn/Bud

Status: Folded · Evidence: P · Family: Meta-thinking and reflection · Verdict: fold (2026-06-03)

Use instead: After Action Review

What it is

These are three of the most popular “fast retro” formats: simple sorting templates that take a finished experience (a sprint, an event, a day, a meeting) and route it into a fixed handful of named buckets. Plus/Delta sorts observations into two columns - what went well (Plus) and what to change next time (Delta, the math symbol for change, chosen to avoid the negative charge of “minus” or “cons”). Start/Stop/Continue sorts behaviors into three: things to begin doing, things to stop doing, and things that worked and should keep going. Rose/Thorn/Bud sorts into a positive (Rose), a problem (Thorn), and a nascent opportunity worth growing (Bud).

Strip the branding and the three formats are the same move: a categorized debrief. Take a bounded experience, label each observation as keep / change (and, in the three-bucket variants, an explicit new-thing-to-try), and walk out with a short list of forward actions. The bucket labels differ cosmetically - two columns versus three, “Delta” versus “Stop + Start,” a horticultural metaphor versus a verb pair - but the underlying generative beats are identical: gather what happened, sort it into “sustain” and “change” piles, and convert the change pile into next steps.

The honest description has to separate the durable move from the packaging. The durable move is the structured retrospective: pause after an experience, surface observations as a team, decide what to keep and what to alter, and commit to actions. That move is real and valuable. The Plus/Delta, Start/Stop/Continue, and Rose/Thorn/Bud labels are interchangeable presets for the data-sorting step inside that retrospective - in the canonical five-phase retrospective structure (Derby and Larsen), they are activities that live in the “gather data” and “generate insight” phases. They are facilitation conveniences, not three separate cognitive mechanisms.

When it helps / when it misleads

As a stance, a fast retro format helps when a team needs a low-friction container to convert a just-finished experience into a short list of changes. The two-or-three-bucket templates are genuinely good at this: they are fast, require no training, and the deliberately balanced labels (a Plus for every Delta, a Rose for every Thorn) keep a debrief from collapsing into pure complaint. For a new team, a short timebox, or a lightweight check-in, that simplicity is the whole point.

They mislead or waste effort when:

The sort is mistaken for the analysis. Dropping observations into Plus and Delta columns categorizes them; it does not explain why the deltas happened or which matter most. The most common failure mode is a tidy two-column board that names problems without diagnosing causes or prioritizing them, so the “action” is a wish list with no owner.
There was a prior expectation to compare against. The moment the experience is a plan with a predicted outcome (a launch, an incident, a campaign, a forecast), the sharper move is to compare expected against actual and diagnose the gap - which is the after-action review, a more structured and far better-evidenced version of the same debrief. A generic keep/change sort throws away the expectation anchor that makes a debrief diagnostic rather than impressionistic.
“Change” or “Start” produces an intention with no follow-through. Left unstructured, the change column yields items with no owner, no trigger, and no review - a whiteboard photo, not a change. The discipline that makes a debrief actionable (name the sustain-and-change actions, assign them, set a follow-up) is exactly what the structured debrief adds and the bare buckets omit.
The format is treated as three distinct tools. Choosing between Plus/Delta and Start/Stop/Continue and Rose/Thorn/Bud is choosing a label set, not a method. Teaching them as three separate skills implies a depth of difference that is not there; the real choice is whether to run a structured debrief at all, and against what expectation.
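The follow-through gap described above (a change item with no owner, no trigger, and no review) can be made concrete with a minimal sketch. The `ActionItem` shape, its field names, and the `missing_follow_through` helper are hypothetical, invented here for illustration; none of the retro formats prescribes them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionItem:
    """One entry from the "change" column (all field names are hypothetical)."""
    description: str
    owner: Optional[str] = None      # who is responsible for the change
    trigger: Optional[str] = None    # when, or under what condition, it happens
    review_on: Optional[str] = None  # when the team checks whether it happened


def missing_follow_through(item: ActionItem) -> list[str]:
    """Return the names of the follow-through fields that are still empty."""
    required = {
        "owner": item.owner,
        "trigger": item.trigger,
        "review_on": item.review_on,
    }
    return [name for name, value in required.items() if not value]


# A bare "Delta" or "Start" note: a wish, not yet a change.
wish = ActionItem("Shorten the release checklist")

# The same note after the structured-debrief discipline is applied.
owned = ActionItem(
    "Shorten the release checklist",
    owner="Sam",
    trigger="before the next release branch is cut",
    review_on="next retrospective",
)

print(missing_follow_through(wish))   # ['owner', 'trigger', 'review_on']
print(missing_follow_through(owned))  # []
```

The check is deliberately trivial: the point is that the bucket templates never ask for these three fields, while a structured debrief does.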

What the evidence says

The honest grade for the move these formats name - “sort a finished experience into keep / change buckets and act on it” - is P (practitioner), and this entry has to be careful, because fast-retro write-ups routinely borrow robustness from a neighboring, far-better-tested method.

What the record supports. Plus/Delta, Start/Stop/Continue, and Rose/Thorn/Bud are real, named, widely used practitioner formats with documented lineages (academic feedback practice, a 1970s leadership feedback model, and a Scouting / design-thinking reflection respectively), and they are staples of Agile retrospectives and facilitation. As lightweight containers for a team debrief they are plausibly useful. That is the extent of the directly-supported claim: respectable practitioner heuristics for facilitating a retrospective.

What the record does NOT support, and the laundering trap. There is no controlled or comparative study I can locate that measures any of these specific formats - “does sorting into Plus/Delta produce better outcomes than Start/Stop/Continue, or than no retro?” - against an alternative. The academic literature notes the opposite: despite the importance of retrospectives in Agile, the topic has received relatively little rigorous study, and existing studies report a lack of retrospective measurements confirming whether outcomes were achieved (Andriyani, Hoda and Amor, Learning in the Large, 2018). Practitioner sources describe and compare the formats; they do not test them.

The figures quoted to make fast retros look evidence-backed measure adjacent constructs, and attaching them to the bucket templates would be exactly the transferred-evidence laundering this library exists to prevent:

The after-action review has its own large meta-analytic effect. Keiser and Arthur (2021), a bare-bones meta-analysis of 61 studies (107 effect sizes, 915 teams and 3,499 individuals) in the Journal of Applied Psychology, report d = 0.79 for the AAR across training evaluation criteria - larger than many of the largest training-method effects on record. That number belongs to the AAR, a structured expected-versus-actual debrief with disciplined action conversion, not to a two-column keep/change sort. Counting it toward Plus/Delta or Start/Stop/Continue would launder a more rigorous cousin’s robustness onto a looser frame the cousin did not test. It is cited here to mark the boundary and to locate where the “debriefs work” evidence actually lives, not to lift this grade.

The conservative governing grade is therefore P: recognized practitioner formats, no direct controlled evidence for their own move, with the AAR meta-analysis explicitly not counted toward them because it measures a more structured method.

Transfer caveat (required). All of the adjacent evidence is from human teams in training, military, clinical, and field settings. None of it studies Plus/Delta, Start/Stop/Continue, or Rose/Thorn/Bud (or Agile retrospectives generally) performed by or with an AI agent. The evidence is transferred from human contexts and is not validated for AI-augmented use.

Excluded figures (required). The frequently repeated claim that retrospectives deliver “42% higher quality” (and similar unattributed percentage claims about retro effectiveness) traces to no nameable primary source I could locate; it circulates in consulting and vendor write-ups without an author-and-year citation, so under the evidence rule it is excluded as fact and does not influence the grade. The only sourced quantity in this neighborhood is the AAR meta-analytic d = 0.79 (Keiser and Arthur 2021), which measures a different, more structured construct and is recorded above as a boundary marker only.

Why it is / is not a skill here

Verdict: Fold into after-action-review. The registry records the reasoning in one line - “Subsumed: retro modes / AAR variants” - and the concrete case follows.

The Build burden is to name one distinct, durable cognitive move and to show that no shipped skill already produces it above the roughly 20 percent overlap ceiling. These formats fail that burden because the move they share - reconstruct a finished experience, sort observations into sustain and change, and convert the change pile into forward actions, so that the debrief ends in next steps - is the same three-beat generative move the after-action review runs at higher rigor. AAR’s “expected versus actual, then why, then sustain or change” and a retro’s “Plus and Delta, then so what, then next time” overlap far above the ceiling; AAR adds exactly the two things the bare buckets lack and that the evidence calls for: an explicit prior expectation to diagnose the gap against, and a disciplined conversion of lessons into owned, trackable sustain-and-change actions. A fast retro is AAR with the expectation anchor removed and the depth left optional. The schema target resolves: after-action-review is status: shipped.
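What the expectation anchor buys can be shown in a small sketch. The `Observation` record and the `gap` helper are hypothetical and use a single numeric measure for simplicity; real debrief observations are usually qualitative, so read this as an illustration of the structural difference rather than a model of either method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """One debrief observation (hypothetical shape, for illustration only)."""
    topic: str
    actual: float
    expected: Optional[float] = None  # prior expectation; a fast retro leaves this empty


def gap(obs: Observation) -> Optional[float]:
    """Return actual minus expected, or None when there is no anchor to compare against."""
    if obs.expected is None:
        return None
    return obs.actual - obs.expected


# Fast retro: the observation can be sorted into keep/change, but not diagnosed.
retro_note = Observation("deploy duration (minutes)", actual=45.0)

# AAR-style: the same observation with the expectation recorded up front.
aar_note = Observation("deploy duration (minutes)", actual=45.0, expected=30.0)

print(gap(retro_note))  # None
print(gap(aar_note))    # 15.0
```

With no `expected` value there is nothing to ask "why" about; with one, the 15-minute overrun becomes the object of the diagnosis.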

The three formats are also not distinct from each other. Plus/Delta is the keep/change sort with two buckets; Start/Stop/Continue splits “change” into “stop” and “start” and keeps “continue”; Rose/Thorn/Bud renames keep/problem/opportunity with a metaphor. Choosing among them is choosing a label set for one data-sorting step, not selecting a separate mechanism - which is why they are bundled into a single registry entry rather than three. A library that ships distinct cognitive moves cannot honestly ship three near-identical bucket presets as three skills, nor one of them as a fourth member of the same family the AAR already anchors.
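The claim that the three formats are label sets over one sort can be sketched as a lookup table. The canonical bucket names (`keep`, `change`, `try`) and the exact label-to-bucket assignments are this sketch's assumptions, read off the descriptions in this entry; they are not a published standard.

```python
# Canonical buckets assumed for this sketch: sustain, alter, and a new thing to try.
CANONICAL = ("keep", "change", "try")

# Each format is only a label set over the same buckets (assumed mapping).
FORMATS = {
    "Plus/Delta": {"Plus": "keep", "Delta": "change"},
    "Start/Stop/Continue": {"Continue": "keep", "Stop": "change", "Start": "try"},
    "Rose/Thorn/Bud": {"Rose": "keep", "Thorn": "change", "Bud": "try"},
}


def normalize(format_name: str, board: dict[str, list[str]]) -> dict[str, list[str]]:
    """Translate a board written in one format's labels into the canonical buckets."""
    mapping = FORMATS[format_name]
    result: dict[str, list[str]] = {bucket: [] for bucket in CANONICAL}
    for label, notes in board.items():
        result[mapping[label]].extend(notes)
    return result


plus_delta = normalize(
    "Plus/Delta",
    {"Plus": ["pairing on reviews"], "Delta": ["late handoffs"]},
)
rose_thorn_bud = normalize(
    "Rose/Thorn/Bud",
    {"Rose": ["pairing on reviews"], "Thorn": ["late handoffs"], "Bud": []},
)

# The same observations land in the same buckets regardless of the label set.
print(plus_delta == rose_thorn_bud)  # True
```

Nothing in `normalize` depends on which format was used, which is the sense in which choosing among them is choosing labels rather than a mechanism.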

Why fold rather than reject or recipe. A reject would be less informative than a fold - the move is real, famous, and worth locating, so the honest service is to point the reader to where it already lives (the after-action review, run on any retrospective, with a Plus/Delta or Start/Stop/Continue sort as an optional gather-data preset). It is not a clean recipe either: a recipe is a fixed multi-skill chain, whereas these are interchangeable label sets that map onto a single existing move. The fold is the precise verdict, and it sets the precedent the catalog then reused: when What / So What / Now What came up for vetting it was folded into the same target, citing this very entry as the prior decision that “the catalog has already folded this exact class of generic retro format into AAR.”

The learning value of the NO: a famous, genuinely useful retro template is not automatically a skill. Plus/Delta, Start/Stop/Continue, and Rose/Thorn/Bud are ways of holding a debrief, and a library that ships artifacts rather than facilitation labels documents them and folds them into the better-evidenced after-action review.

Lineage and who to read

The three formats have separate, well-documented origins that converge on the same retrospective use.

Plus/Delta is an adaptation of academic feedback practice into a two-column keep/change debrief; the “Delta” label borrows the mathematical symbol for change to keep the second column constructive rather than a list of complaints. It has no single nameable inventor and is generic and unbranded.

Start/Stop/Continue is a 1970s leadership-feedback model. One of the earliest documented attributions runs through Thomas DeLong (Harvard Business School), who credits learning it from Phil Daniels, a psychology professor at Brigham Young University; the format itself is generally listed as origin-unknown and is unbranded and widely reused.

Rose/Thorn/Bud is widely traced to the Boy Scouts of America as a reflection prompt (one positive, one negative, one new goal or insight) and was later adapted as a design-research and facilitation method - the LUMA Institute packages it as a named technique in its human-centered-design toolkit. The horticultural metaphor (rose, thorn, bud) is generic; LUMA’s particular curricular packaging is the only place a trademark-style claim could attach, and this entry uses the generic descriptive form.

For the canon these formats live inside, read Esther Derby and Diana Larsen, Agile Retrospectives: Making Good Teams Great (Pragmatic Bookshelf, 2006), which defines the five-phase retrospective (set the stage, gather data, generate insight, decide what to do, close) into whose “gather data” and “generate insight” steps these formats slot, and Norman L. Kerth, Project Retrospectives: A Handbook for Team Reviews (Dorset House, 2001), the source of the retrospective Prime Directive. For the honest read on effectiveness, read the after-action-review meta-analysis (Keiser and Arthur 2021), the better-evidenced cousin this entry folds into, and note the Agile-retrospective literature’s own observation that rigorous outcome measurement of specific formats is largely absent. All three formats are generic descriptive templates in common use; no trademark gates them, so this entry is documented descriptively and is not flagged as branded.

Named sources

Esther Derby and Diana Larsen, Agile Retrospectives: Making Good Teams Great (Pragmatic Bookshelf, 2006). The standard practitioner canon for team retrospectives; defines the five-phase structure into whose “gather data” and “generate insight” steps Plus/Delta, Start/Stop/Continue, and Rose/Thorn/Bud slot as interchangeable activities. Foundational / practitioner, no controlled evaluation of the formats. (P)
Norman L. Kerth, Project Retrospectives: A Handbook for Team Reviews (Dorset House, 2001). Origin of the modern project retrospective and the Prime Directive; practitioner guidance, not an effectiveness study. (P)
Nathanael L. Keiser and Winfred Arthur Jr., “A Meta-Analysis of the Effectiveness of the After-Action Review (or Debrief) and Factors That Influence Its Effectiveness,” Journal of Applied Psychology 106(7) (2021): 1007-1032. Bare-bones meta-analysis of 61 studies (107 ds, 915 teams, 3,499 individuals); the AAR improves training criteria at d = 0.79. Evidence for the after-action review - the more structured cousin this entry folds into - cited to mark where the debrief evidence lives, NOT counted toward the fast-retro formats. (S/M, for AAR - not for these formats)
Yngve Lindsjorn, Viktoria Stray et al. / Andriyani, Hoda and Amor, “Learning in the Large - An Exploratory Study of Retrospectives in Large-Scale Agile Development” (XP 2018 / arXiv 1805.10310). Notes that despite their importance, retrospectives have received relatively little rigorous study and that existing work lacks measurements confirming whether retrospective outcomes were achieved. Locates the honest ceiling of the retro evidence base. (P, exploratory)
LUMA Institute, Rose, Thorn, Bud (human-centered design method toolkit). The design-research packaging of the Scouting reflection prompt as a named facilitation technique; a practitioner method reference, not an evidentiary one. (P)

Excluded under the evidence rule: the often-repeated claim that retrospectives deliver “42% higher quality” (and similar unattributed “retros improve X by N%” framings) traces to no nameable primary source and is excluded; it does not influence the grade. The only sourced quantity in this neighborhood is the AAR meta-analytic d = 0.79 (Keiser and Arthur 2021), which measures a more structured method than these bucket formats and is recorded as a boundary marker only.

Thinking Framework Skills v0.8.0 · 56 frameworks