Design Sprint Test and Score: Workbench One-Screen UX Validation

Scenario

Friday 2026-08-07. Trial run passed Thursday 17:15 PT. 5 SREs completed the full Five-Act with simulated-incident Tasks Friday 09:00-16:30 PT. The team invokes tool-design-sprint-test-and-score to capture observations, score, and frame Priya’s call.

Per-Customer Interview Observation Notes

Customer 1: Fintech-SRE-A (09:00; design partner)

Profile: Series C fintech; 4 years on-call; uses Datadog + Sentry primary.

Context: Most recent on-call: 3 alerts in 7 days. First tool typically Datadog dashboard. Disorientation time typically 4-8 min for unfamiliar incident types.

Task 1: Opened Workbench Panel 2 deeplink. Read service-map in 4 sec; identified “user-service red, spreading to checkout” in another 6 sec. Total “what is broken” comprehension: 12 sec. Top action callout read in 18 sec. Total orient: 35 sec.

“Would I stay here? Yes. The blast-radius answer is exactly what I open Datadog for and this gives it in 10 seconds.”

Task 2: Used inline drill-down on Panel 6 deployment view. “I would not have left Workbench to check this. That’s the win.”

Task 3: “Slow degradation is harder for the one-screen. I’d want a 1-hour time selector option.”

Debrief: Strong yes. Pricing USD 200/seat.

Customer 2: Fintech-SRE-B (10:30; design partner)

Profile: Series C fintech (Jin’s colleague); 6 years on-call; Datadog + Honeycomb.

Task 1: Orient 50 sec. “Slower than C1 but I needed to read the anomaly band twice to trust the ranking.”

Task 2: Drill-down used; appreciated inline. “Would not revert.”

Task 3: “Slow-degradation case: agree with C1; this UI is for the sudden-break incident.”

Debrief: Yes with caveats. Pricing USD 175.

Customer 3: SeriesB-SRE-C (12:00)

Profile: Series B AI; 3 years on-call; lean stack (Sentry only).

Task 1: Orient 80 sec. “I don’t have a service graph at my company so the service-map area felt empty when I imagined this for my context.”

Task 2: Drill-down worked but “I would have wanted to see the raw error rate alongside, not just the deployment diff.”

Task 3: “Slow degradation: yes the one-screen would not work for that.”

Debrief: “I’d try it. Pricing USD 150.”

Customer 4: SeriesB-SRE-D (14:00)

Profile: Series B fintech; 5 years on-call; Datadog primary.

Task 1: Orient 45 sec. “This is the layout I’ve been wishing for. The ‘is it spreading’ question is the one Datadog makes me click 3 times to answer.”

Task 2: Drill-down loved.

Task 3: Slow degradation noted.

Debrief: Strong yes. Pricing USD 250.

Customer 5: SeriesD-SRE-E (15:30)

Profile: Series D ecommerce; 10 years on-call; Datadog + custom dashboards.

Task 1: Orient 55 sec. “Trustable. I’d still want to verify in Datadog the first few uses but then I’d switch.”

Task 2: Drill-down: “This is the bit that changes my behavior. Inline drill-down is how you keep me from reverting.”

Task 3: Slow degradation noted.

Debrief: Yes after 2-week trial. Pricing USD 225.

Best Quotes

“The blast-radius answer is exactly what I open Datadog for and this gives it in 10 seconds.” - C1
“I would not have left Workbench to check this. That’s the win.” - C1
“This is the layout I’ve been wishing for.” - C4
“Inline drill-down is how you keep me from reverting.” - C5
“I don’t have a service graph at my company so the service-map area felt empty.” - C3
“Slow degradation is harder for the one-screen.” - C1
“The ‘is it spreading’ question is the one Datadog makes me click 3 times to answer.” - C4
“I’d still want to verify in Datadog the first few uses but then I’d switch.” - C5

Scorecard Grid

	C1	C2	C3	C4	C5	Day-end
Q1: 60-sec orient	Y (35s)	Y (50s)	partial (80s)	Y (45s)	Y (55s)	Validated 4-of-5 (median 50s under 60-sec bar; 1 partial at 80s tied to service-graph absence)
Q2: revert-rate <20%	Y (no revert)	Y	partial (would verify in Datadog first; then switch)	Y	partial (would verify Datadog first 2 weeks)	Directional 3-of-5 hard Y; 2 partial all signal eventual stay
Q3: augment-not-replace	Y	Y	Y	Y	Y	Validated 5-of-5
Q4: USD 150-300 pricing	Y (200)	Y (175)	Y (150)	Y (250)	Y (225)	Validated 5-of-5 (median USD 200; range USD 150-250)
Q5: revert reason = data gap vs comprehension	n/a (no revert)	“trust building” comprehension	”service graph absence” data gap	n/a	”trust building” comprehension	Mixed: 2 data, 2 comprehension

Decider override: Q2 marked Directional (not Validated) due to 2 customers self-describing 2-week trust-build period.

Observed Patterns

Worked

Blast-radius spatial answer (5 of 5): “what is broken + is it spreading” comprehensible in <15 sec for all 5.
Inline drill-down (5 of 5): unanimous; this is the revert-prevention pattern.
Top action callout (4 of 5): “what to look at first” answered without prompting.
Augment positioning (5 of 5): no SRE described Workbench as replacing existing tools.

Hesitated

Service-graph dependency (1 of 5 hard; C3): SREs without service-graph config see empty area.
Trust-build period (2 of 5; C3, C5): 2-week period of verifying against Datadog before fully trusting Workbench.

Broke trust

None observed in simulation.

Unexpected

Slow-degradation case (3 of 5 unprompted): the one-screen designed for sudden-break incidents does not work for slow-degradation incidents (3+ hr timescale). Future v0.2 question.
Higher pricing tolerance than expected (5 of 5; median USD 200): exceeds the Datadog price-sensitivity intuition Priya had pre-sprint.

Hot Takes

Priya

Build. The blast-radius + inline-drill-down combination is the wedge. Q2 directional (3-of-5 hard + 2 partial) is acceptable because both partials self-describe an eventual switch. We need to add the Sketch B inline-drill-down patterns to v0.1 scope (Marcus already advocating for it). Slow-degradation is a v0.2 question worth scoping into the roadmap.

Marcus

Inline drill-down is build-feasible from existing API integrations. The service-graph-absence concern is a config issue for onboarding, not a UX problem; we can guide customers through service-graph generation in setup. Slow-degradation needs a different UI mode; that’s v0.2.

Ari

The spatial-mental-model layout worked. C3’s empty service-map is the only design failure mode and it’s an onboarding fix, not a layout problem. The Sketch B inline-drill-down patterns merge cleanly into Sketch A; this is the v0.1 design.

Jin

The trust-build comment from C3 and C5 is the biggest finding. Build a 2-week trial onboarding that gives SREs explicit “verify in Datadog” prompts during the first few incidents; once they verify, they switch fully. The product roadmap should include this trial-mode.

Decider Summary

The call: Build, with scope amendment to add Sketch B inline-drill-down patterns to v0.1 + design a 2-week trial-mode onboarding.

Rationale: Q1 validated 4-of-5 on 60-sec orient. Q3 augment-positioning validated 5-of-5. Q4 pricing validated above expectations (median USD 200; pre-sprint estimate was USD 150). Q2 directional with all 5 SREs signaling eventual stay. Inline-drill-down (Sketch B) is the v0.1 revert-prevention pattern. Trust-build period is the v0.1 onboarding design challenge. Slow-degradation is v0.2.

Highest-confidence learning: The blast-radius spatial answer + inline drill-down combination is the wedge that converts senior SREs from multi-tool juggling to one-screen-first incident response.

Most important revision: Add inline-drill-down patterns (from Sketch B) to v0.1 + design 2-week trial-mode onboarding.

Next artifact: PRD for v0.1 build with notes-added scope and trial-mode onboarding; produced via deliver-prd within 5 business days (target: 2026-08-14).

Decider Checkpoint

Priya confirms scorecard
Priya commits to Build with scope amendment
Priya names highest-confidence learning
Priya names revision: inline-drill-down + 2-week trial-mode
Priya names next artifact: PRD by 2026-08-14
Priya acknowledges Monday handoff: Series C fintech retention conversations resume 2026-08-10

Signed: Priya (founder, PM), 2026-08-07 17:25 PT.

Sprint closed. Foundation Sprint + pilot + Design Sprint arc complete; Build authorized with scope amendment; PRD work begins Monday 2026-08-10.