Design Sprint Test and Score (Friday)
Quick facts
Classification: tool | Version: 0.1.0 | Category: discovery | License: Apache-2.0
Try it: /tool-design-sprint-test-and-score "Your context here"
Friday is the sprint’s payoff. 5 target-profile customers run the prototype while the team observes; the team synthesizes observations into a scorecard against the sprint questions; the Decider makes the build / iterate / pivot / stop call by end-of-day. The week’s 35-40 person-days plus customer recruiting cost converts into one actionable decision.
Family contract: docs/reference/skill-families/design-sprint-skills-contract.md. This skill is a member of design-sprint-skills.
When to Use
- It is Day 5 of the Design Sprint and Thursday’s prototype passed trial run.
- 5 confirmed participants are scheduled (canonical; or 4 if 1 cancelled-and-no-buffer; pause if below 4).
- The team can observe interviews live (in-person or via Zoom breakout room) and synthesize during the day.
- The Decider is present Friday PM for the post-interview review (canonically 14:00-18:00 PT window covering observation of slots 4-5 plus Decider review by 17:30 PT).
When NOT to Use
- Thursday prototype did not pass trial run. Re-run trial; if still failing at 19:00 PT Thursday, postpone Friday.
- Fewer than 3 customers confirmed. Per Ratified Decision 3, the canonical guidance is 5 customers; 3-4 or 6-7 gets a documented warning; below 3 or above 7 should trigger a re-decision (postpone or split testing). Note: the v0.1.0 family validator does NOT mechanically enforce these thresholds (cohort count is in the EXAMPLE artifact, not in frontmatter); enforcement is a v2.16 validator-expansion candidate.
- Decider unavailable for the post-interview review window. Without Decider, the day produces observations without a call.
- The team plans to use this skill to write the executive memo. Per Ratified Decision 4: exec memo authoring is delegated to
foundation-stakeholder-update(existing pm-skills foundation skill); this skill produces the Decider summary only.
How to Use
Use the /tool-design-sprint-test-and-score slash command:
/tool-design-sprint-test-and-score "Your context here"Or reference the skill file directly: skills/tool-design-sprint-test-and-score/SKILL.md
Output Template
Design Sprint Friday Artifact: [Initiative or Challenge name]
Per-Customer Interview Observation Notes
Customer 1: [Anonymized ID] (slot [time])
Profile summary: [Demographics + relevance to target profile; pre-screen score if applicable]
Context (Act 2) reactions:
- [Behavior or quote 1]
- [Behavior or quote 2]
- [Behavior or quote 3]
Tasks (Act 4) behavior:
- Task 1: [Behavioral observations with timestamps where relevant; e.g., “completed Task 1 in 47 seconds; hesitated 6 sec at Panel 3 before tapping the chip”]
- Task 2: [Behavioral observations]
- Task 3: [Behavioral observations]
Debrief (Act 5) reactions:
- [Overall reaction summary]
- [Pricing answer if relevant]
- [Anything-else moments]
Customer 2: [Anonymized ID] (slot [time])
[Same structure as Customer 1]
Customer 3: [Anonymized ID] (slot [time])
[…]
Customer 4: [Anonymized ID] (slot [time])
[…]
Customer 5: [Anonymized ID] (slot [time])
[…]
Best Quotes
[5-15 verbatim customer quotes the team flags as most signal-bearing. Format: quote attributed to anonymized customer ID. Selected to inform Decider summary or downstream pitch / planning artifact.]
- “[Verbatim quote]” - C[N]
- “[Verbatim quote]” - C[N]
- “[Verbatim quote]” - C[N]
- “[Verbatim quote]” - C[N]
- “[Verbatim quote]” - C[N]
- “[Verbatim quote]” - C[N]
Scorecard Grid
Rows: sprint questions from Monday’s map-and-target. Columns: 5 customers. Each cell: Y / N / partial / unclear with one-line note. Rightmost column: day-end decision per question.
| C1 | C2 | C3 | C4 | C5 | Day-end decision | |
|---|---|---|---|---|---|---|
| Q1: [Question wording from Monday] | [Y/N/p/u + note] | […] | […] | […] | […] | [Validated / Invalidated / Inconclusive with N-of-5 reasoning] |
| Q2: [Question wording] | […] | […] | […] | […] | […] | [Decision] |
| Q3: […] | […] | […] | […] | […] | […] | [Decision] |
| Q4: […] | […] | […] | […] | […] | […] | [Decision] |
| Q5: […] | […] | […] | […] | […] | […] | [Decision] |
| Q6: […] | […] | […] | […] | […] | […] | [Decision] |
[Decider override notes: if any day-end decision differs from the mechanical rule, the Decider records the reasoning here.]
Observed Patterns
[4 buckets. Each pattern names how many customers showed it (N of 5).]
Worked (validated patterns)
- [Pattern] (N of 5): [Description; which customers showed it; why this matters.]
- [Pattern] (N of 5): […]
Hesitated (friction patterns)
- [Pattern] (N of 5): [Description; where in the flow hesitation appeared; what the customers said about it.]
- [Pattern] (N of 5): […]
Broke trust (trust failures)
- [Pattern] (N of 5): [Description; what triggered the trust break; whether the customer recovered.]
- [Pattern] (N of 5): […]
Unexpected (surprising patterns the team did not predict)
- [Pattern] (N of 5): [Description; why this is novel; whether it suggests a future opportunity.]
- [Pattern] (N of 5): […]
Hot Takes
[Each team member writes one short paragraph silently and in parallel before group synthesis. Captures personal read on Friday before consensus bias sets in.]
[Team member 1 name + role]
[2-4 sentences. Personal read. Strongly opinionated. Specific.]
[Team member 2 name + role]
[2-4 sentences.]
[Team member 3 name + role]
[2-4 sentences.]
[Team member 4 name + role]
[2-4 sentences.]
Decider Summary
The call: [Build / Iterate / Pivot / Stop / Reframe]
Rationale: [3-5 sentences explaining the call. Reference scorecard day-end decisions and the most signal-bearing patterns. Acknowledge what the call rules out.]
Highest-confidence learning from this sprint: [One sentence stating the thing the team is now most confident about. Should be supported by 4-of-5 or 5-of-5 scorecard evidence.]
Most important revision the team would make: [One sentence stating the most important change to the direction. Could be a feature adjustment, an audience adjustment, a positioning adjustment, or a model adjustment.]
Next artifact: [The post-sprint deliverable. Examples: “PRD for v0.1 build via deliver-prd”; “smaller follow-on experiment design via measure-experiment-design”; “pivot rationale documented via iterate-pivot-decision”; “stakeholder update via foundation-stakeholder-update”.]
Decider Checkpoint
Decider sign-off required before the sprint closes.
- Decider confirms the scorecard reflects the team’s collective read (with override reasoning where applicable).
- Decider commits to the call (build / iterate / pivot / stop / reframe) and acknowledges the team will not re-litigate without explicit new evidence.
- Decider names the highest-confidence learning and accepts it as the load-bearing finding of the sprint.
- Decider names the most important revision and assigns ownership.
- Decider names the next artifact and the owner who will produce it (canonically within 5 business days of sprint close).
- Decider acknowledges Monday context handoff: any pre-existing roadmap commitments dependent on the build direction are re-confirmed or revised.
Signed: [Decider name, role], [ISO date and local time]
Sprint closed.
Example Output
Design Sprint Friday Artifact: Brainshelf Camera-Capture Validation
Design Sprint Friday Artifact: Brainshelf Camera-Capture Validation
Friday 2026-06-05. Trial run passed Thursday 17:15 PT (one minor copy fix landed at 17:30 PT). 5 customers completed the full Five-Act interview Friday 09:00-16:30 PT (4 from the primary roster + Discord-4 from the activated buffer). Jamie’s Decider review concluded 17:30 PT. This is the sprint’s payoff artifact.
Per-Customer Interview Observation Notes
Customer 1: UI-Panel-A (09:00 slot)
Profile summary: US, age 38, M, reads ~35 books per year, currently uses paper journal + memory; pre-screen score 8/10 on books-as-memory framing.
Context (Act 2) reactions:
- “I have a notebook by my bed, but I only write in it when I finish something, not when I find something.”
- Described 3 specific instances in the last month of forgetting whether he had already read a book; one resulted in a duplicate purchase.
- Has tried Goodreads twice; abandoned both times within a week because “it felt like homework.”
Tasks (Act 4) behavior:
- Task 1: Completed in 38 seconds. Hovered on Panel 3 chip selection for 5 sec (“I’m not sure if I want to commit to Read or Want to Read yet”). Tapped “Want to read.” Did not notice the “where you saw it: Twig and Cover Books” tag.
- Task 2: Found The Overstory in 12 seconds via the “Recently captured” row. Asked unprompted: “What’s the Twig and Cover Books thing?” When explained, said: “That’s the thing I’d actually use this for.”
- Task 3: Searched “trees” (not “forests”). Found The Overstory; was surprised it matched (“I didn’t write ‘trees’ anywhere, did I?”). When explained that the captured-time context tag is searchable, said: “That’s interesting but it feels like it might be too magic.”
Debrief (Act 5) reactions:
- “It’s faster than I expected. Way faster than Goodreads.”
- “The geolocation thing is the killer feature. I’d want it for the bookstore moment specifically.”
- Pricing: “Maybe USD 4-5 a month. I’d pay for it if it works as fast as you just showed me.”
Customer 2: Discord-1 (10:30 slot)
Profile summary: UK, age 31, F, reads ~50 books per year, currently uses Goodreads grudgingly + memory; pre-screen score 9/10.
Context (Act 2) reactions:
- “Goodreads is fine for the books I finish but I don’t remember to log books I’m just curious about.”
- Described the pain of being in a bookstore wanting to remember a book but feeling friction with current tools (“opening the Goodreads app is enough to lose the moment”).
- Reads in genre clusters (currently on a Le Guin re-read); finds Goodreads’ chronological view unhelpful for that pattern.
Tasks (Act 4) behavior:
- Task 1: Completed in 31 seconds. Tapped chip immediately (“Want to read”). Noticed the geolocation tag and said “oh that’s clever.”
- Task 2: Found The Overstory in 8 seconds. Spent 30 seconds exploring the recently captured row, scrolling and commenting on how the visual layout felt “like a real shelf.”
- Task 3: Searched “old growth forests”; found The Overstory. “Did it find that because the book mentions forests, or because I was somewhere?” When explained: “Okay, I love that, but I’d want to know which one matched, like a little chip on the result.”
Debrief (Act 5) reactions:
- “I’d actually use this. That’s not a thing I say about most apps.”
- “The undo is good. The whole ‘are you sure?’ thing in Goodreads is the worst.”
- Pricing: “Five pounds a month. Easy. I’d pay seven if it had the search-by-context thing more prominently.”
Customer 3: Discord-4 (12:00 slot; buffer-activated Tuesday)
Profile summary: US, age 36, M, reads ~28 books per year, currently uses no system; pre-screen score 7/10.
Context (Act 2) reactions:
- “I don’t really have a system. I just remember the books I want to talk about.”
- Described forgetting a recommendation from a friend three weeks earlier and being embarrassed when the friend asked about it.
- Skeptical of book-tracking tools: “I tried Goodreads once and it stressed me out.”
Tasks (Act 4) behavior:
- Task 1: Completed in 42 seconds. Hesitated on Panel 2 (“Is it taking a picture? Did it work?”). After Panel 3 appeared said: “Oh, I see. It just did it.”
- Task 2: Found The Overstory in 18 seconds. Did not notice the geolocation tag until prompted; then said “I don’t go to bookstores much, so that’s not really for me.”
- Task 3: Searched “forests” (no result match); then “trees” (found The Overstory). When explained: “Cool, but I’d probably just search ‘overstory’ if I remembered the title.”
Debrief (Act 5) reactions:
- “I’d try this. I’m not sure I’d stick with it.”
- “The capture is good. The library felt empty - I’d want to see what it’s like with a bunch of books in there.”
- Pricing: “Maybe USD 3 a month? I’m not sure I’d actually use it enough to pay much more.”
Customer 4: Discord-3 (14:00 slot)
Profile summary: Canada, age 29, M, reads ~30 books per year, currently uses Notes app + paper journal; pre-screen score 8/10.
Context (Act 2) reactions:
- “My Notes app has a ‘books’ note that I add to sometimes. It’s a mess.”
- “I want to remember what I thought when I bought a book, not just the title.”
- Described wanting to find “the book about distributed systems I bought last summer” and failing.
Tasks (Act 4) behavior:
- Task 1: Completed in 33 seconds. Tapped “Reading” chip (added a step the script didn’t anticipate). Noticed geolocation immediately: “Oh nice, that’s the context I want.”
- Task 2: Found The Overstory in 6 seconds. Asked: “Can I add a note here?” When told the prototype doesn’t have that yet but the real product would, he said “That’s the feature I’d want.”
- Task 3: Searched “forests”; found The Overstory. “This works the way I want it to. Most search in apps is bad.”
Debrief (Act 5) reactions:
- “The capture flow is the cleanest I’ve seen for this kind of thing.”
- “I want notes. I want to add my own context to each capture. The geolocation is great but I want my words too.”
- Pricing: “Six dollars a month feels right. Eight if you had the notes thing.”
Customer 5: UI-Panel-B (15:30 slot)
Profile summary: Australia, age 52, F, reads ~60 books per year, currently uses Bookly + paper notebooks; pre-screen score 9/10.
Context (Act 2) reactions:
- “Bookly is okay but it’s designed for tracking what you’re reading, not for capturing what you find.”
- Described a workflow of photographing book covers in her phone’s Camera app then losing track of which folder they’re in.
- Strong articulator of the books-as-memory framing: “I read the way some people garden. I’m not collecting; I’m building a memory.”
- Volunteered a recent example of recall pain: “Last month I almost bought a Le Guin novella I’d already read three years ago. I caught it at the register only because the cover felt familiar. That happens to me maybe four times a year and I always think ‘I should have a way to know.’”
Tasks (Act 4) behavior:
- Task 1: Completed in 29 seconds (fastest of all 5). Did not hesitate; tapped chip immediately. Noticed geolocation and laughed: “That solves the folder problem.”
- Task 2: Found The Overstory in 5 seconds. Spent 90 seconds exploring the library: “Show me my whole library. Show me the books I have but haven’t started. Show me books grouped by month I captured them.” (Probed all three; the prototype doesn’t support all three but she filled in the gap intuitively.)
- Task 3: Searched “old growth”; found The Overstory. “Yes. This is what I want. Bookly doesn’t do this.”
Debrief (Act 5) reactions:
- “This is the first time I’ve seen a book app that thinks the way I do about books.”
- “The undo is good. The geolocation is essential. I want to add notes.”
- Pricing: “I’d pay USD 10 a month. I pay USD 5 for Bookly and this is more useful.”
Best Quotes
- “The geolocation thing is the killer feature. I’d want it for the bookstore moment specifically.” - C1
- “I’d actually use this. That’s not a thing I say about most apps.” - C2
- “Five pounds a month. Easy. I’d pay seven if it had the search-by-context thing more prominently.” - C2
- “Cool, but I’d probably just search ‘overstory’ if I remembered the title.” - C3
- “I want notes. I want to add my own context to each capture. The geolocation is great but I want my words too.” - C4
- “I read the way some people garden. I’m not collecting; I’m building a memory.” - C5
- “This is the first time I’ve seen a book app that thinks the way I do about books.” - C5
- “Yes. This is what I want. Bookly doesn’t do this.” - C5
- “The capture is good. The library felt empty - I’d want to see what it’s like with a bunch of books in there.” - C3
- “I’d pay USD 10 a month. I pay USD 5 for Bookly and this is more useful.” - C5
Scorecard Grid
| C1 | C2 | C3 | C4 | C5 | Day-end decision | |
|---|---|---|---|---|---|---|
| Q1: Will 25+/year readers complete sub-3-second capture without abandoning, and is the library valuable for personal recall? | Y (capture interaction ~2.5s; full task 38s incl. script-read; “way faster than Goodreads”) | Y (capture interaction ~2.2s; full task 31s; “I’d actually use this”) | partial (capture interaction ~3.4s with mid-flow pause; full task 42s; “not sure I’d stick with it”) | Y (capture interaction ~2.6s; full task 33s; “cleanest I’ve seen”) | Y (capture interaction ~2.0s; full task 29s; “solves the folder problem”) | Validated (4-of-5 Y on capture-interaction sub-3-sec bar; 1 partial; 0 N) |
| Q2: Does OCR + cover-recognition feel acceptable in the prototype, or do mis-resolutions break trust? | Y (didn’t notice it was simulated) | Y (no comment) | unclear (asked “did it work?” once) | Y (no comment) | Y (no comment) | Validated for Figma simulation; real-OCR test deferred (4-of-5 Y; 1 unclear) |
| Q3: Sustainable price point above USD 4/month? | Y (USD 4-5) | Y (GBP 5-7 = USD 6-9) | partial (USD 3; below threshold) | Y (USD 6-8) | Y (USD 10) | Validated above USD 4 (4-of-5 Y; median USD 6) |
| Q4: Do customers describe “did I already read this?” as frequent and painful? | Y (described 3 instances; 1 duplicate purchase) | Y (Goodreads abandoned partly for this) | N (“I just remember the books I want to talk about”) | Y (Notes app mess; “distributed systems” failure) | Y (paper-notebook fallback for this) | Validated (4-of-5 Y; 1 N) |
| Q5: Do customers understand within 15 sec where the book lives and how to find it? | Y (12s in Task 2) | Y (8s) | Y (18s; just over threshold) | Y (6s) | Y (5s) | Validated (4-of-5 Y; 1 borderline at 18s; median 8s; all below 20s) |
| Q6: Do customers see value as fast capture or as trustworthy recall, and does it depend on context? | Capture (bookstore moment specifically) | Both (capture flow + library shelf metaphor) | Capture (skeptical on recall stickiness) | Both, leaning recall (wants notes) | Both, leaning recall (memory framing) | Inconclusive trend toward “both, leaning recall for power users; capture for casual users.” Not a binary; depends on the use context as the question anticipated. |
Decider override notes: Q3 day-end decision marked Validated despite C3’s USD 3 partial; Jamie reasoning: C3 self-described as marginal target (“not sure I’d actually use it enough to pay much more”); the other 4 customers (median USD 6) carry the validation.
Observed Patterns
Worked (validated patterns)
- The “where you saw it” geolocation tag (5 of 5 noticed; 4 of 5 enthusiastically endorsed): the single most-praised element of the prototype. C1 called it the “killer feature”; C5 said it “solves the folder problem.” C3 was the lone partial because he “doesn’t go to bookstores much.”
- Sub-3-second capture interaction (4 of 5 hit on the discrete capture interaction: Panel 1 tap through Panel 4 toast; median 2.5s. Full Task 1 completion median was 35 sec including script-read and chip selection; the capture interaction is the load-bearing measurement against assumption A1): the speed worked. Even C3’s 3.4s capture interaction with mid-flow pause was perceived as fast in debrief because the alternative is Goodreads/manual entry.
- The 4-chip status selection (Read / Want / Reading / Reference) (5 of 5 used without confusion): four chips was the right number; nobody asked for more or fewer.
- The library “Recently captured” row visual (4 of 5 explored beyond Task 2’s requirement; C5 spent 90 sec): the visual metaphor of a row of recent captures landed.
Hesitated (friction patterns)
- Panel 2 recognition feedback was ambiguous for some (1 of 5; C3 specifically): the pulsing outline alone wasn’t enough confirmation; C3 wasn’t sure if recognition completed until Panel 3 appeared.
- The chip-selection moment is the longest hesitation in the capture flow (3 of 5 hesitated 3-6 sec on Panel 3): customers want a moment to decide status; this is friction worth accepting if it produces accurate status data, but worth considering whether a “decide later” option would speed initial capture.
Broke trust (trust failures)
- None of 5 customers experienced a trust-breaking moment in the prototype. This is the headline trust finding: the simulated OCR did not break trust because the script staged it to succeed. Real-OCR trust testing remains a post-sprint risk.
Unexpected (surprising patterns the team did not predict)
- 3 of 5 customers asked for notes/annotation on each capture (C2 indirectly, C4 explicitly, C5 explicitly). The team did not build a notes feature in the prototype; this consistent unprompted request suggests the recall pillar needs notes, not just metadata + geolocation.
- C5 spontaneously asked for 3 library views the prototype didn’t support (whole library, unread-only, grouped by month captured). This suggests heavy users (60+/year readers) want richer library affordances; the v0.1 single library view may be insufficient for the top of the target segment.
- Pricing came in higher than the team predicted (median USD 6; one outlier at USD 10). The team’s pre-sprint conservative estimate was USD 4 as a ceiling; the actual market may be more elastic.
Hot Takes
Jamie (founder, PM)
The lead question validated cleanly. I’m now most worried about the recall-pillar gap; 3 of 5 unprompted asks for notes is a stronger signal than I expected, and our v0.1 build plan does not include notes. I think the answer is build, but with notes added to the v0.1 scope, not deferred to v0.2. The “where you saw it” element is genuinely a wedge; I want to figure out how to market it explicitly without making it sound creepy.
Alex (design lead)
The prototype worked because the visual hierarchy made the capture moment feel like the primary surface, not the library. The library is what people want to live in, though; that’s the thing I underdesigned. If we build, I want to spend the first design week of build on the library surfaces, not on capture polish. Capture is solved at the storyboard level; library is not.
Sam (engineering lead)
Real OCR is the next risk. Our prototype simulated OCR to succeed; the question of “will customers tolerate mis-resolutions” is still open. I want to ship a sub-product or experiment specifically to test real-OCR acceptance before we commit to the full v0.1 build cycle. Maybe a Figma to TestFlight prototype with real Vision framework integration as a 1-week spike before the 6-week MVP.
Riley (customer expert)
The framing matters more than I expected. C5’s “I read the way some people garden” is the customer we should design for; C3 is the customer who will churn unless we add notes. If we’re choosing a primary persona, it’s C5. If we’re choosing a marketing message, it’s “remember every book the way you remember every garden.” I’d build with C5 as the north star.
Decider Summary
The call: Build, with one scope amendment.
Rationale: The lead question Q1 validated 4-of-5 with no N; the assumption A1 from the Foundation Sprint that 25+/year readers are switchable from “do nothing” with sub-3-second capture is supported by Friday’s evidence. Q3 (pricing) validated above expectations (median USD 6 vs pre-sprint USD 4 ceiling). Q5 (capture-to-library understanding) validated 5-of-5. The scope amendment: notes/annotation per capture moves from v0.2 deferral to v0.1 inclusion based on 3-of-5 unprompted requests; this changes the 6-week MVP cycle to a 7-8 week cycle but keeps the launch target within the seed-round pitch window. The Red Bookstore Mode backup is not activated.
Highest-confidence learning from this sprint: The “where you saw it” contextual capture (geolocation today; notes tomorrow) is the wedge that differentiates Brainshelf from Goodreads, StoryGraph, paper journals, and doing nothing. 4-of-5 customers responded to it as a primary value driver, not a secondary feature.
Most important revision the team would make: Add notes/annotation per capture to the v0.1 scope (not v0.2 deferral). This was the consistent unprompted ask across 3 of 5 customers and the single most-mentioned gap in the prototype.
Next artifact: PRD for v0.1 build with the notes-added scope; produced via deliver-prd skill within 5 business days of sprint close (target: 2026-06-12). The PRD will reference this Friday artifact and the FS Founding Hypothesis as its load-bearing context.
Decider Checkpoint
Decider sign-off required before the sprint closes.
- Jamie confirms the scorecard reflects the team’s collective read (with the one override on Q3 reasoned above).
- Jamie commits to the call (Build with notes added to v0.1 scope) and acknowledges the team will not re-litigate without explicit new evidence (e.g., real-OCR trust failure surfaces in the Sam-proposed 1-week spike).
- Jamie names the highest-confidence learning: “where you saw it” contextual capture is the differentiating wedge.
- Jamie names the most important revision: notes/annotation per capture moves from v0.2 deferral to v0.1 inclusion (6-week MVP becomes 7-8 weeks).
- Jamie names the next artifact: PRD for v0.1 build; Jamie owns; target 2026-06-12.
- Jamie acknowledges the Monday context handoff: the seed-round pitch deck timing slips 1-2 weeks; investor schedule re-confirmation goes out Monday 2026-06-08. Sam’s real-OCR spike scheduled week of 2026-06-15 in parallel with PRD finalization.
Signed: Jamie (founder, PM), 2026-06-05 17:30 PT.
Sprint closed. Foundation Sprint + Design Sprint arc complete; Build authorized; PRD work begins Monday 2026-06-08.