Foundation Sprint Differentiation: Workbench Debugging Toolchain

Candidate Differentiators (generated; pre-scoring)

| # | Differentiator | Source |
|---|---|---|
| D1 | Incident-time focus (UX optimized for the disorientation phase) | Priya |
| D2 | One screen, not 5-7 tabs (consolidated view during incidents) | Jin |
| D3 | Auto-correlated dependency + state + trace timeline | Marcus |
| D4 | SRE-vocabulary first-class UX (incidents, runbooks, MTTR; not “spans” and “log lines”) | Ari |
| D5 | Open-source friendly (export, replay, integrate with existing tools) | Marcus |
| D6 | Replay/time-travel within an incident window | Marcus |
| D7 | Sub-30-second setup-to-data for a new service (low integration friction) | Priya |
| D8 | Pricing model that scales with incidents, not with data volume | Priya |

Scored Differentiators

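Scoring criteria: each differentiator is scored 1-5 on (a) feasibility for the team to deliver well, (b) defensibility against competitor copying within 12 months, and (c) SRE-judged importance from the 19-interview synthesis. Total is the sum of the three scores (maximum 15); differentiators with equal totals share a rank.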

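| Differentiator | Feasibility | Defensibility | Importance | Total | Rank |
|---|---|---|---|---|---|
| D1: Incident-time focus | 5 | 5 | 5 | 15 | 1 |
| D2: One screen during incidents | 5 | 3 | 5 | 13 | 2 |
| D3: Auto-correlated state + trace + deps | 4 | 5 | 4 | 13 | 2 |
| D6: Replay / time-travel | 3 | 5 | 4 | 12 | 4 |
| D4: SRE-vocabulary UX | 5 | 3 | 3 | 11 | 5 |
| D7: Sub-30-second setup | 4 | 3 | 4 | 11 | 5 |
| D8: Incident-priced model | 4 | 4 | 3 | 11 | 5 |
| D5: Open-source friendly | 4 | 3 | 3 | 10 | 8 |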
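
The totals and ranks above can be re-derived with a short script. This is a minimal sketch, assuming Total is the unweighted sum of the three criteria and that tied totals share a rank, with the next rank skipping ahead (the 1, 2, 2, 4 pattern in the table). The `Score` type and `rank` helper are illustrative names, not part of any Workbench tooling.

```python
from typing import NamedTuple


class Score(NamedTuple):
    name: str
    feasibility: int
    defensibility: int
    importance: int

    @property
    def total(self) -> int:
        # Assumption: unweighted sum of the three 1-5 criteria.
        return self.feasibility + self.defensibility + self.importance


# Scores copied from the Scored Differentiators table.
SCORES = [
    Score("D1", 5, 5, 5),
    Score("D2", 5, 3, 5),
    Score("D3", 4, 5, 4),
    Score("D4", 5, 3, 3),
    Score("D5", 4, 3, 3),
    Score("D6", 3, 5, 4),
    Score("D7", 4, 3, 4),
    Score("D8", 4, 4, 3),
]


def rank(scores):
    """Return (rank, Score) pairs, highest total first; ties share a rank."""
    # sorted() is stable, so tied differentiators keep their input order.
    ordered = sorted(scores, key=lambda s: s.total, reverse=True)
    ranked = []
    for position, score in enumerate(ordered, start=1):
        if ranked and ranked[-1][1].total == score.total:
            # Same total as the previous row: reuse its rank.
            ranked.append((ranked[-1][0], score))
        else:
            ranked.append((position, score))
    return ranked


RANKED = rank(SCORES)
for r, s in RANKED:
    print(f"rank {r}: {s.name} (total {s.total})")
```

Re-running this after any score change shows quickly whether the top-3 set (D1, D2, D3) still holds; D6 trails the tied second place by a single point.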

2x2 Chart

Axes (chosen via note-and-vote at 14:45 PT):

  • X-axis: Always-on vs Incident-time focus
  • Y-axis: Disorientation-phase support (weak to strong)

                  STRONG ON DISORIENTATION
                              |
                              | Workbench .
                              | (specialized direction)
                              |
                              |
                              |
                              | . Honeycomb (best in class on traces)
                              |
                              | . Datadog
                    ALWAYS-ON | INCIDENT-TIME
<------ . New Relic ----------+------------ . Sentry ------>
                              | . Grafana stack
                              |
                              |
                              |
                              |
                              | . internal homegrown
                              |
                              | . multi-tool juggling (status quo)
                              |
                   WEAK ON DISORIENTATION

Reading the chart: the “incident-time + strong on disorientation” quadrant is largely empty. Datadog and Sentry sit closer to incident-time than New Relic, but none of them is optimized for the cognitive disorientation phase. Multi-tool juggling, the dominant status quo, sits in the bottom-right. Workbench targets the open upper-right quadrant.

Decision Principles

The Differentiation work produces 4 decision principles that constrain all Workbench v0.1 product decisions:

  1. Incident-time first; always-on never. Workbench is optimized for the moment when the pager fires. We do not build “always-on” dashboards, alert configuration, or metrics-tracking features that are not load-bearing for incident-time. If a feature is only useful between incidents, it does not ship in v0.1.
  2. One screen during the incident. SREs juggle 5-7 tools today. Workbench’s job is to be the one tool they need open during the incident. Every UI decision pulls toward consolidation.
  3. SRE vocabulary, SRE workflow. The language is “incident,” “service,” “deployment,” “dependency,” “runbook,” not “span,” “log line,” “metric,” “trace ID.” If an SRE has to translate between vendor vocabulary and their team’s vocabulary, the tool is failing them in the worst moment.
  4. Augment, don’t replace existing observability investment. Customers keep Datadog or Honeycomb or their open-source stack; Workbench imports from them during an incident. We do not ask customers to rip-and-replace.

Mini Manifesto

What Workbench is:

Workbench is the incident-time companion tool for senior SREs at growth-stage startups. When the pager fires at 03:00 and the SRE is staring at 5 browser tabs trying to figure out which service in their distributed system is misbehaving, Workbench is the one screen that shows the auto-correlated picture: which service, which version, which dependency, which state preceded the failure, plotted on one incident-time timeline.

Workbench’s entire reason to exist is to compress the disorientation phase of incident response. Across 19 customer interviews, the team measured the disorientation phase at 5-20 minutes per incident; cutting that in half is the product’s value claim. Workbench is not always-on; it is the tool you grab the moment something breaks.

What Workbench is NOT:

Workbench is NOT a Datadog replacement. Customers keep Datadog (or Honeycomb, or their internal Grafana stack) for everything between incidents. Workbench imports data from those tools during the incident.

Workbench is NOT an always-on observability platform. We do not compete on retention windows, custom dashboards, or alert-rule engines. The product surface is intentionally narrow.

Workbench is NOT trying to debug code. Workbench shows what was happening; it does not propose code fixes or call out specific bugs. The SRE retains full diagnostic authority.

Workbench is NOT for Series A startups or for enterprises. The Series A team is too small to have the distributed-systems complexity Workbench solves for. The enterprise team has different procurement, scale, and tooling-context realities than v0.1 can serve.

Decider Checkpoint

Priya sign-off required to proceed to Approach Options (Day 2 AM).

  • Priya confirms the 4 decision principles, especially principle 1 (incident-time first; always-on never).
  • Priya confirms the Mini Manifesto including all 4 negative-positioning paragraphs.
  • Priya accepts that this Differentiation block has effectively pre-committed Workbench to the specialized-debugger direction; Day 2 Approach Options will be variations within that direction, not a re-litigation against the general-observability path.
  • Priya confirms the 2x2 axis choice (always-on vs incident-time x disorientation-phase strength).
  • Priya accepts the top-3 differentiators (D1 incident-time + D2 one-screen + D3 auto-correlation).

Signed: Priya, 2026-05-21 17:45 PT