Release v2.12.0. OKR Skills Launch

Released: 2026-05-03 Type: Feature release (minor) Skill count: 40 (up from 38) Key theme: OKR Skills set; full quarterly write-and-score cycle

TL;DR

You can now run a complete quarterly OKR cycle through pm-skills. The new pair:

/okr-writer    # at cycle start: draft, review, rewrite, or coach OKRs
/okr-grader    # at cycle close: score, interpret evidence, prepare next cycle

Both skills ship together so the type enum (committed | aspirational | learning | operational_health | compliance_or_safety) is consistent across the write-and-score handoff, and so users do not see a writer that points to a not-yet-existing grader.

The grader enforces the misuse failure modes that make OKRs dangerous: it refuses to retroactively change targets, refuses to retroactively shrink committed scope to claim a pass, refuses to soften committed misses with aspirational scoring, refuses to average failed guardrails into the primary objective score, and refuses to use OKR scores as individual performance ratings.

Six thread-aligned library samples (3 per skill across storevine, brainshelf, and workbench) demonstrate the full surface: aspirational sweet-spot scoring, committed-fail handling, compliance_or_safety partial-coverage, indicator-class guardrail rules, retention-thesis invalidation, and the all-three-empowerment-modes (empowered, mixed, feature-team).

No behavioral changes to any pre-existing skill (a small cross-reference cleanup in the writer now points to the grader directly, since the grader exists).

What changed

graph TD
    A[v2.11.1 tagged]
    B[2026-04-29 OKR strategy session]
    C[Approach B locked: writer + grader, defer rest]
    D[foundation-okr-writer built]
    E[Codex adversarial review on writer]
    F[measure-okr-grader built]
    G[Codex adversarial review on grader, round 1]
    H[Round 2 catches taxonomy propagation]
    I[Round 3 returns 0 findings]
    J[v2.12.0 release prep + count consistency sweep]
    L[Release-state Codex confirmation, round 1 catches reference-README defects]
    M[Round 2 catches homepage / anatomy / versioning drift]
    N[Round 3 catches versioning data accuracy, anatomy counts, date alignment]
    P[Round 4 catches release-notes audit-trail and changelog anchor]
    Q[Loop terminates: 4 rounds, MEDIUM-only audit-trail meta-findings, no HIGH/IMPORTANT, below Phase 0 termination threshold]
    O[v2.12.0 tagged]

    A --> B --> C --> D --> E --> F --> G --> H --> I --> J --> L --> M --> N --> P --> Q --> O

    style A fill:#c8e6c9
    style C fill:#fff4e1
    style I fill:#c8e6c9
    style Q fill:#c8e6c9
    style O fill:#81c784

Added

foundation-okr-writer with command /okr-writer. Drafts, reviews, rewrites, and coaches outcome-based OKR sets across team, department, product, product-area, and company scopes. Five entry modes are detected from user phrasing:
- Guided (default, moderate engagement): brief diagnostic, draft, score, surface issues, ask for confirmation.
- One-Shot via --oneshot: complete OKR set in one pass with all assumptions labeled.
- Sustained Coach: iterative loop, one component at a time, re-scored each turn.
- Audit Only: score and critique an existing OKR set; no new drafts unless asked.
- Rewrite: convert flawed OKRs, feature lists, or roadmap items into outcome-shaped OKRs.
An empowered-team diagnostic captures whether the team can change initiatives mid-cycle if KRs are not moving, and adds a conditional Disclosure section to the artifact when feature-team signals are present rather than refusing to operate.
measure-okr-grader with command /okr-grader. Scores completed OKRs against the canonical type enum:
- aspirational: numeric on the 0 to 1 scale; sweet spot 0.6 to 0.7.
- committed: pass or fail against the target; nothing below 1.0 is a pass.
- compliance_or_safety: binary; no partial credit; no retroactive scope shrinkage.
- operational_health: pass | fail | drift-within-tolerance against the threshold band.
- learning: validated | invalidated | partially-validated | insufficient-evidence; no numeric score.
Indicator class guardrail is independent of OKR type and adds a never-averaged-into-primary-score rule. Two special states prevent the most common misinterpretations: not-yet-observable for cycle-window extensions past close (90-day cohorts that finish in the next quarter) and not-yet-fully-observable for committed or compliance_or_safety KRs with partial coverage (cannot be retroactively shrunk to claim a pass on the observed subset).
6 thread-aligned library samples in library/skill-output-samples/. 3 per skill (storevine, brainshelf, workbench).

The storevine Campaigns thread now spans measure-experiment-results, foundation-okr-writer, and measure-okr-grader for a complete write-and-score arc on a single product context. Brainshelf samples demonstrate a retention-thesis invalidation: an at-scale replication study returned 1.6x vs the 3.4x observed in the original beta cohort. Workbench samples exercise the conventions the storevine and brainshelf samples do not: committed KR fail (10 of 12 onboardings, not softened to 0.83 aspirational), compliance_or_safety KR not-yet-fully-observable (1 of 3 audits completed, not retroactively scoped to claim a pass), and committed KR with guardrail indicator class.
docs/skills/foundation/foundation-okr-writer.md and docs/skills/measure/measure-okr-grader.md mirror pages auto-generated by scripts/generate-skill-pages.py. Both skills appear in mkdocs.yml nav.

Changed

foundation-okr-writer/SKILL.md cross-reference cleanups now that the grader exists: line 45 redirects scoring users to /okr-grader directly; line 184 drops the “planned for a later release” framing. No behavioral change to the writer.
utility-pm-skill-builder packet-format simplification (silent). The Step 5 Skill Implementation Packet was reduced from 13 to 12 items in the prior session. Nothing downstream depended on the removed item; the packet format change does not affect any shipped skill or sample. Bundled silently with the OKR launch per design.
README.md skill counts updated 38 → 40 (26 phase + 8 foundation + 6 utility), version badge bumped 2.11.1 → 2.12.0.
.claude-plugin/plugin.json and marketplace.json version bumped to 2.12.0; descriptions updated to reflect 40 skills and the new OKR Skills set.
library/skill-output-samples/README_SAMPLES.md updated: total samples 120 → 126, total skills 38 → 40, Browse-by-Skill table extended, all three thread tables extended with OKR rows, footer version refreshed.

Infrastructure / process

Phase 0 Adversarial Review Loop applied per the v2.11.0 codification. Both skills passed through Codex adversarial review before commit:
- The writer’s review caught 1 generator-script bug (HTML attribution comment rendering as page H1 across all skill pages) and 1 nonexistent-command directive (the writer redirected scoring users to /okr-grader while the grader did not yet exist). Both resolved before the writer commit.
- The grader’s review took 3 rounds to converge. Round 1 caught 1 HIGH finding (workbench sample taught users to violate the grader’s own compliance scoring rule by retroactively shrinking the committed scope) and 2 MEDIUM findings (/define-hypothesis nonexistent slash command in samples; OKR-type vs indicator-class taxonomy drift in SKILL.md). Round 2 caught 2 MEDIUM findings (taxonomy drift propagation to TEMPLATE.md and sample KR3 framings). Round 3 returned 0 findings. All resolved before the grader commit.
Em-dash sweep extension across the auto-generated docs/skills/ mirror to keep the mirror in sync with the standing no-em-dash rule applied to source SKILL.md files.
docs/internal/audit-ci/ consolidated into docs/internal/audit/_archived/ to flatten the audit-history tree.

Why this matters

OKRs are easy to ship as artifacts and very hard to ship as a working operating system. The literature converges on the same set of failure modes: feature-delivery KRs disguised as outcomes, fabricated baselines, missing evidence sources, naive cascading, compensation coupling, and committed misses softened by aspirational interpretation at cycle close.

The pm-skills contribution is to make those failure modes hard to ship past. The writer is a coach, not a template filler: it diagnoses empowered-team context, separates outcomes from work, applies the canonical OKR type enum, and runs a Quality Audit Rubric on every draft. The grader is an evidence interpreter, not an arithmetic engine: it scores per the type, applies indicator-class rules independently, and refuses the misuse patterns even when the user (or stakeholder) asks for them.

Both skills ship at the same time so the cross-skill hand-off is coherent at first appearance. The 6 thread samples cover the empowerment spectrum (empowered for storevine and brainshelf, mixed for workbench) and the OKR type spectrum (aspirational sweet-spot, deferred not-yet-observable, committed fail, compliance_or_safety partial coverage, indicator-class guardrail).

This is the first release where the new OKR Skills set ships with a Phase 0 Adversarial Review Loop trail in the release notes. The 3-round convergence on the grader is concrete evidence that adversarial review catches design-level pedagogical bugs (the workbench sample taught users to violate the rule the skill is supposed to enforce) that traditional review would have missed.

Validation

All 40 SKILL.md files pass scripts/lint-skills-frontmatter.sh
scripts/validate-commands.sh passes (40 / 40 commands resolve to canonical SKILL.md paths)
scripts/validate-agents-md.sh passes (40 skill paths matched in AGENTS.md)
scripts/validate-meeting-skills-family.sh passes (Meeting Skills Family contract conformance)
scripts/validate-version-consistency.sh passes at 2.12.0 (CHANGELOG, plugin.json, marketplace.json, README badge all aligned)
scripts/check-count-consistency.sh passes (40 skills, 47 commands, 9 workflows; no stale hardcoded counts in tracked files)
scripts/check-context-currency.sh passes (AGENTS/claude/CONTEXT.md and AGENTS/codex/CONTEXT.md both at v2.12.0)
scripts/generate-skill-pages.py regenerates the docs/skills mirror without diff drift
mkdocs build --strict passes with the new docs/reference/README.md wired into nav
Codex adversarial review converged on stable findings: 3 rounds for the grader (round 1: 1 HIGH + 2 MEDIUM; round 2: 2 MEDIUM taxonomy drift; round 3: 0), 1 round for the writer; 4 release-state confirmation rounds (round 1: 2 MEDIUM reference-README defects; round 2: 2 MEDIUM homepage / anatomy / versioning drift; round 3: 3 MEDIUM versioning data accuracy + anatomy counts + date alignment; round 4: 2 MEDIUM release-notes audit-trail accuracy + changelog anchor). Loop terminated after the round-4 fix per the Phase 0 codified rule “until findings stabilize below IMPORTANT severity”: MEDIUM count was 2 / 2 / 3 / 2 across rounds (no escalation, no HIGH, all findings below IMPORTANT threshold), and the round-4 findings were entirely meta-level audit-trail accuracy in the release notes themselves rather than new release-state defects. The pattern of fix-introduces-fix is exactly what the Phase 0 loop is designed to catch: each resolution can introduce new drift, and only by re-running the review do you find it.
Zero em-dash characters in tracked files
Zero references to the silently removed Quality Forecast / K/P/C/W zone classification in tracked files

Upgrade guide

If you installed pm-skills before v2.12.0, you have two options:

Option A: Stay on v2.11.1 (no action needed)

The skills you already have keep working. v2.12.0 is purely additive for the OKR work; the only modification to a pre-existing skill is two cross-reference lines in foundation-okr-writer (which itself shipped in this release).

Option B: Pull the OKR Skills set

Pull the v2.12.0 tag and the new commands appear:

/okr-writer    # at cycle start
/okr-grader    # at cycle close

Use the writer to draft or audit a quarterly OKR set. At cycle close, feed the OKR set plus final KR values to the grader for scoring, evidence interpretation, learning synthesis, and next-cycle recommendations.