Measure OKR Grader: Storevine Campaigns Q3

Sample: measure-okr-grader. Storevine Campaigns Q3 2026 Cycle Review

Scenario

Storevine’s Campaigns team is closing the Q3 2026 cycle. The OKR set was authored in late June using /okr-writer (see the corresponding writer sample at library/skill-output-samples/foundation-okr-writer/sample_foundation-okr-writer_storevine_campaigns-q3.md). The cycle ended September 30. Final values are now in for KR1 and KR3; KR2’s 90-day cohorts are partially complete (the 60-day intermediate is available, the 90-day final is not yet observable).

The team wants a cycle review they can take to the Q4 planning workshop. The growth-pm runs /okr-grader with the original OKR set, the final and interim KR values, the cycle’s narrative, and the initiative status.

The cycle had a mixed result. KR1 landed just short of its aspirational target. KR2 trended below projection. The KR3 guardrail held. Initiative 2 (Templates v2) underperformed expectations, and the team needs to decide whether to retire the thesis or carry it forward.

Source Notes:

  • Storevine is fictional
  • All metrics [fictional]
  • Pairs with library/skill-output-samples/foundation-okr-writer/sample_foundation-okr-writer_storevine_campaigns-q3.md
  • Aspirational OKR scoring follows the Google convention (0.6 to 0.7 sweet spot for aspirational)
  • Committed and compliance_or_safety scoring conventions are not exercised in this sample; see the workbench thread for committed and compliance_or_safety scoring examples

Prompt

/okr-grader
Original OKR: see sample_foundation-okr-writer_storevine_campaigns-q3.md
Cycle: Q3 2026 (July 1 to September 30, 2026)
OKR type: aspirational
Final KR values:
- KR1 (weekly active senders): 26% [fictional] (target was 28%, baseline 14%)
- KR2 (90-day campaign retention): 60-day cohort interim is 19% [fictional];
full 90-day target was 38% (baseline 22.8%); 90-day final not yet observable
- KR3 (guardrail, median CTR): 3.6% [fictional] (target was hold at or above
3.4%, baseline 3.4%)
Guardrails:
- Unsubscribe rate ended cycle at 0.81% [fictional] (baseline 0.72%, threshold 0.95%)
- Spam complaint rate ended at 0.05% [fictional] (baseline 0.04%, threshold 0.1%)
Initiative status:
- Initiative 1 (Lifecycle email triggers): shipped July 22 as planned. Open
rate on triggered emails 38% [fictional]. Click-through to send action
21% [fictional].
- Initiative 2 (Templates v2): shipped 8 of 12 templates by Sept 30
[fictional]. Adoption among first-time repeat senders was 31% [fictional],
below the 60% [fictional] target. Templates that did get adopted showed
no significant retention difference vs. generic templates (Q3 cohort).
- Initiative 3 (Campaign performance feedback loop): shipped Aug 28.
Adoption among repeat senders 64% [fictional]. Self-reported usefulness
4.1 / 5 [fictional].
Cycle context:
- Onboarding team's Q3 KR (boost Campaigns activation rate to 50%)
hit 47% [fictional]; close enough that we did not rebaseline KR1
mid-cycle.
- Mid-cycle, the team caught a bug in lifecycle trigger eligibility logic
(week 3 of August). Fixed in 4 days but ~120 merchants [fictional]
received duplicate triggers. No measurable impact on unsub rate.
Stakeholder interpretation: Campaigns leadership reads the cycle as a
clear win on activation-to-recurring-send (KR1) and a learning result
on retention-thesis (KR2 + Templates v2). Looking for grader's
independent read.

Output

# OKR Cycle Review: Storevine Campaigns Team, Q3 2026
## Summary
- Scope: Campaigns team
- Cycle: 2026-Q3 (July 1 to September 30, 2026)
- Overall interpretation: strong aspirational outcome on KR1, partial signal
on KR2, guardrail held. The KR2 90-day window extends past cycle close, so
a single objective-level number is misleading. The honest read is that the
team validated the engagement thesis (KR1) and invalidated the
templates-as-retention-driver thesis (Initiative 2).
- Evidence confidence: high for KR1 and KR3, medium for KR2 (interim only),
high for guardrails.
- Main learning: lifecycle triggers drove the engagement lift; templates v2
did not move retention. The Campaigns engagement-causes-retention belief
held at the leading-indicator level (KR1) but the lagging-indicator
evidence is not yet in.
## Scorecard
- Objective: Make Campaigns the marketing tool merchants reach for every
  week.
  - Rough qualitative read: validated lift in weekly engagement; retention
    causality still open. Do not collapse this to a single average; KR2
    is not yet observable at 90 days.
- KR1: Increase weekly active senders from 14% to 28% [fictional] of
  Campaigns-activated merchants by 2026-09-30.
  - Actual: 26% [fictional] (as_of 2026-09-30, trailing 4-week average).
  - Score: 0.86 (aspirational). Calculation:
    (26 - 14) / (28 - 14) = 12 / 14 ≈ 0.857.
  - Evidence confidence: high. Amplitude `campaign_sent` event is
    well-instrumented; cohort definition stable across cycle (Onboarding
    team's near-miss kept the activated-merchant denominator within
    tolerance).
  - Interpretation: lifecycle triggers (Initiative 1) materially moved the
    metric. The triggered-email-to-send conversion (21%) translates cleanly
    into the weekly cohort lift.
- KR2: Increase 90-day campaign retention from 22.8% to 38% [fictional] for
  merchants who send their first campaign in Q3.
  - Actual: not-yet-observable. Q3 cohorts' 90-day windows extend into late
    December. Interim 60-day signal is 19% [fictional].
  - Score: deferred. Final score available 2026-12-31. Best-estimate
    projection if forced today is 0.10 to 0.20 (aspirational), well below
    the 0.6 to 0.7 sweet spot. See Risks in Interpretation below; do not
    treat the projection as the final score.
  - Evidence confidence: medium. The 60-day interim is reliable; the 90-day
    projection assumes Q2 cohort decay shape, which may not apply to a
    cohort with different engagement composition.
  - Interpretation: retention thesis is trending below projection. Hold the
    final score until 2026-12-31 and grade then.
- KR3 (operational_health; indicator class guardrail): Hold median
  campaign click-through rate at or above 3.4% [fictional] across all
  Q3 sends.
  - Actual: 3.6% [fictional].
  - Score: pass (operational_health; threshold held within band).
    The actual is 0.2 percentage points above the baseline.
  - Evidence confidence: high.
  - Interpretation: lifecycle triggers did not degrade send quality. This
    is meaningful. The most common failure mode for "send more" initiatives
    is engagement collapse; the team avoided it. Per the indicator-class
    `guardrail` rule, KR3 is reported as its own signal and is NOT averaged
    into the primary objective score.
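
The KR1 arithmetic and the KR3 pass check above can be reproduced with a minimal sketch. The function names `kr_score` and `guardrail_holds` are hypothetical, and clamping the score to the 0 to 1 range is an assumption rather than something this review specifies.

```python
def kr_score(baseline: float, target: float, actual: float) -> float:
    """Linear progress from baseline to target.

    Clamping to [0, 1] is an assumption; the review does not say how
    overshoot or regression below baseline should be scored.
    """
    if target == baseline:
        raise ValueError("target must differ from baseline")
    progress = (actual - baseline) / (target - baseline)
    return max(0.0, min(1.0, progress))


def guardrail_holds(actual: float, floor: float) -> bool:
    """A hold-at-or-above guardrail passes when actual >= floor."""
    return actual >= floor


# KR1: baseline 14%, target 28%, actual 26% (all fictional).
print(round(kr_score(baseline=14.0, target=28.0, actual=26.0), 3))  # 0.857

# KR3: hold median CTR at or above 3.4%; actual 3.6% (fictional).
print(guardrail_holds(actual=3.6, floor=3.4))  # True
```

KR2 is deliberately not passed through `kr_score` here, because its 90-day actual is not yet observable.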
## Objective Interpretation
- Result: aspirational success on activation engagement (KR1); aspirational
shortfall (likely) on the retention thesis (KR2, score deferred). The
guardrail held.
- Why: Initiative 1 (lifecycle triggers) was the load-bearing bet for KR1
and it worked roughly as hypothesized. Initiative 2 (Templates v2) was
the load-bearing bet for KR2 and it under-shipped (8 of 12) and
under-adopted (31% vs 60% target). Even templates that were adopted did
not show a retention effect.
- What changed during the cycle: mid-cycle bug in lifecycle eligibility
(4 days, no measurable impact). No external surprises. Onboarding team's
near-miss on its own KR did not destabilize our KR1 denominator.
- What the score does and does not mean:
  - Does mean: lifecycle triggers are a validated engagement lever and
    should continue at full scale.
  - Does mean: the templates-as-retention-driver thesis is weak. Adopted
    templates showed no retention difference vs generic. This is
    invalidating evidence, not just inconclusive evidence.
  - Does NOT mean: engagement does not drive retention. The KR1 lift is
    too recent to have produced measurable retention impact yet.
  - Does NOT mean: Campaigns is a finished product. KR2 remains an open
    thesis that needs a different bet next cycle.
## Evidence Quality
- Issue 1: KR2's 90-day window extends past the cycle boundary. Q3 grade
  is a projection until late December.
  - Impact: cannot give a confident final score this cycle. Forcing a
    numeric score now would mislead readers.
  - Recommended fix: in next cycle's `/okr-writer` run, either
    (a) reframe KR2 as a 30-day or 60-day intermediate metric, or
    (b) accept that retention KRs are graded one quarter in arrears and
    document the convention explicitly. Choosing (b) is cleaner if the
    team plans to keep retention as a primary KR.
- Issue 2: KR2's underperformance correlates with Initiative 2's
  under-shipping (8 of 12 templates) and low adoption (31%). This confounds
  the templates-as-retention-driver test: the team did not run the bet at
  full scale.
  - Impact: the invalidating evidence is partial. A team could argue
    "we did not really try."
  - Recommended fix: do not relitigate Templates v2 next cycle without a
    clearer hypothesis about why adoption was low. The grader's read is
    that the thesis is weak enough to retire; the team should explicitly
    decide rather than carry an ambiguous bet forward.
- Issue 3: Mid-cycle lifecycle-trigger bug affected ~120 merchants with
  duplicate sends. No measurable unsub or CTR impact, but the bug is
  unrecorded in standard dashboards.
  - Impact: low for this cycle. Documented here for the cycle log.
  - Recommended fix: add a `triggered_email_dedup_failures` event so future
    cycles can detect this class of bug in real time.
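
One way the Issue 3 fix might look is sketched below. Only the event name `triggered_email_dedup_failures` comes from this review. The `TriggerSend` record, its fields, the same-day duplicate rule, and the payload shape are all hypothetical, and the merchant IDs in the usage lines are made-up placeholders.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class TriggerSend(NamedTuple):
    """Hypothetical record of one triggered email send."""
    merchant_id: str
    trigger_id: str  # hypothetical identifier for the lifecycle trigger
    send_date: str   # ISO date, e.g. "2026-08-18"


def dedup_failures(sends: Iterable[TriggerSend]) -> list[dict]:
    """Return one event payload per (merchant, trigger, day) sent more than once.

    Assumption: a duplicate means the same trigger reached the same
    merchant more than once on the same day.
    """
    counts = Counter(sends)
    return [
        {
            "event": "triggered_email_dedup_failures",
            "merchant_id": send.merchant_id,
            "trigger_id": send.trigger_id,
            "send_date": send.send_date,
            "duplicate_count": n - 1,
        }
        for send, n in counts.items()
        if n > 1
    ]


# Placeholder data: merchant "m1" received the same trigger twice in one day.
sends = [
    TriggerSend("m1", "first_send_nudge", "2026-08-18"),
    TriggerSend("m1", "first_send_nudge", "2026-08-18"),
    TriggerSend("m2", "first_send_nudge", "2026-08-18"),
]
for event in dedup_failures(sends):
    print(event["merchant_id"], event["duplicate_count"])
```

The real definition of a duplicate and the event schema belong in the `/instrumentation-spec` hand-off; this sketch only shows the shape of the check.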
## Initiative Review
- Initiative 1 (Lifecycle email triggers):
  - Linked to: KR1 primarily, KR2 secondarily.
  - Status: shipped on time (July 22).
  - Apparent contribution: high. Triggered emails reached 38% open rate
    and 21% click-through-to-send rate, translating into the KR1 weekly
    cohort lift.
  - Recommendation: continue at full scale next cycle.
- Initiative 2 (Templates v2):
  - Linked to: KR1 and KR2.
  - Status: partially shipped (8 of 12 templates).
  - Apparent contribution: low. Adopted templates showed no retention
    effect in the Q3 cohort. Even at full ship, the underlying thesis is
    not supported by the partial evidence.
  - Recommendation: retire the current framing. If the team wants to
    revisit, run `/hypothesis` first to sharpen the sub-thesis
    (which segment, which template type, which trigger), and validate via
    `/experiment-design` before baking into a KR.
- Initiative 3 (Campaign performance feedback loop):
  - Linked to: KR2 primarily.
  - Status: shipped late (August 28 vs target of mid-August).
  - Apparent contribution: unclear. 64% adoption among repeat senders and
    4.1 / 5 self-reported usefulness suggest merchant demand, but the
    contribution to KR2 cannot be isolated from Initiative 1's effects.
  - Recommendation: continue next cycle to gather more data; consider
    promoting from supporting bet to candidate primary initiative if the
    Q4 retention cohort shows a feedback-loop effect.
## Learning
- Validated assumptions:
  - Lifecycle triggers materially increase weekly engagement.
  - Engagement gains do not require sacrificing send quality (KR3 held).
  - Empowered-team initiative ownership produced a learning-grade Q3, not
    just a delivery-grade Q3.
- Invalidated assumptions:
  - Templates v2 as the primary retention-driver lever. The adopted cohort
    showed no retention effect. This is the strongest invalidating signal
    of the cycle.
  - Designer capacity to ship 12 seasonal templates in Q3 was overestimated;
    8 of 12 was the actual capacity. Revise next cycle's planning.
- Surprises:
  - Initiative 3 (feedback loop) shipped late but saw high adoption. The
    adoption rate suggests merchant demand for post-send analytics is
    stronger than the team expected. Worth a deeper investigation.
  - The 60-day interim for KR2 came in lower than the Q2 baseline cohort
    despite KR1 success. If engagement causally drives retention, the team
    should have seen at least a small interim lift. The absence of any
    lift is itself information.
- Decision implications:
  - Continue Initiative 1 at full scale next cycle.
  - Retire Initiative 2's current framing.
  - Promote Initiative 3 from supporting bet to candidate primary initiative
    for KR2 next cycle.
  - Reframe KR2 measurement boundary (Issue 1) before Q4 OKR drafting.
## Next-cycle Recommendations
1. Continue lifecycle triggers as a primary lever. Set Q4 KR1 target based
on Q3's 26% landing point, not Q3's pre-cycle 14% baseline.
2. Retire the Templates v2 thesis as currently framed. Do not re-run the
bet without sharpening the sub-thesis first.
3. Reframe KR2 to either 60-day retention (gradeable within cycle) or
90-day retention (graded one quarter in arrears). The former gives
clearer cycle accountability; the latter is methodologically truer to
the underlying behavior. Pre-decide before Q4 OKR drafting.
4. Investigate the engagement-without-retention finding. If KR1 lifts
continue without moving retention, the engagement-causes-retention
belief is wrong and the team's strategy needs to change.
5. Hand-off:
   - `/lessons-log` for the Templates v2 invalidation (organizational
     memory).
   - `/retrospective` for the team-process learning on designer capacity
     estimation.
   - `/hypothesis` for "Campaigns engagement causally affects 12-month
     merchant retention" with a defined invalidation criterion.
   - `/instrumentation-spec` for the `triggered_email_dedup_failures`
     event.
   - `/okr-writer` for next-cycle drafting once KR2 measurement boundary
     is resolved.
## Risks in Interpretation
- A naive 0.62 rough-average objective score would include a forced KR2
score that is not yet observable. Reading 0.62 as "the team hit the
aspirational sweet spot" would be misleading. The honest read is "0.86
on KR1, retention thesis still open at 90 days, guardrail held." Avoid
collapsing heterogeneous KR types into a single number.
- Initiative 2's under-shipping confounds the Templates v2 invalidation. A
reasonable counter-read is "we did not really test it." The grader's
stronger evidence is the no-retention effect among the 31% who did adopt;
that is the part that says the thesis is weak even at full adoption.
- Stakeholder framing of "clear win on activation, learning on retention"
is broadly correct but understates the invalidating signal on Templates
v2. The Q4 planning workshop should explicitly decide whether to retire
or rework the thesis rather than leave it as ambiguously "ongoing."
- KR3 (guardrail) holding is good news but is not by itself proof of
safety. Two cycles of held guardrails would strengthen the case that
lifecycle triggers do not degrade send quality at scale.
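
The first risk above, collapsing heterogeneous KRs into one number, can be guarded against mechanically. The sketch below is illustrative only: `KeyResult` and `objective_rollup` are hypothetical names, and refusing to return any average while a primary KR is deferred is one possible policy, not a rule stated by this review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyResult:
    name: str
    score: Optional[float]      # None means deferred or not numerically scored
    is_guardrail: bool = False  # guardrails are reported on their own


def objective_rollup(krs: list[KeyResult]) -> Optional[float]:
    """Average primary KR scores, or return None if any is still deferred.

    Guardrail KRs are excluded from the average entirely.
    """
    primary = [kr for kr in krs if not kr.is_guardrail]
    if not primary or any(kr.score is None for kr in primary):
        return None
    return sum(kr.score for kr in primary) / len(primary)


q3 = [
    KeyResult("KR1", 12 / 14),                  # about 0.857
    KeyResult("KR2", None),                     # deferred until 2026-12-31
    KeyResult("KR3", None, is_guardrail=True),  # pass/fail, not averaged
]
print(objective_rollup(q3))  # None
```

Under this policy the Q3 review reports KR1, KR2, and KR3 individually, and a single objective-level number only becomes available once KR2's 90-day value is graded.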
## Source of Truth
go/okrs-q3-2026-campaigns (Confluence). This artifact is a review document,
not the canonical OKR record.