
Measure Survey Analysis: Workbench DevEx Pulse

Scenario

The Workbench platform team ran a dev-experience pulse survey and got 65 responses from engineers. They want to know what to fix. This demonstrates the exploratory analysis pattern with a weak sample - and the most valuable thing the analysis does is refuse to overstate it. With N=65, the data is directional only; it cannot support capital-allocation decisions. The survey is illustrative; invented data is tagged [fictional].

Source Notes:

  • Jeff Sauro and James R. Lewis, “Quantifying the User Experience” (Morgan Kaufmann) - the sample-size guidance behind the “N=65 is direction-only” stance and the refusal to imply statistical precision.
  • Virginia Braun and Victoria Clarke, “Using thematic analysis in psychology” (Qualitative Research in Psychology, 2006) - the thematic-clustering method for the open-text responses, which is where a small exploratory survey delivers its real value.

Prompt

/survey-analysis
analyze our dev-experience pulse survey. 65 engineers responded out of ~280.
mix of likert questions (rate your dev experience 1-5 across a few areas)
plus an open text "biggest friction in your day?". tell us what to prioritize.

Output

Survey Analysis: Workbench Dev-Experience Pulse

Read this first: N=65 (of ~280 engineers, ~23% response). This sample is large enough to spot directional themes but too small for statistically reliable conclusions or capital-allocation decisions. Everything below is direction-only. Treat it as a signal of where to look, not as a mandate of what to fund.

Executive Summary

Sixty-five engineers responded. The data is directional only and should not drive headcount or large investment decisions on its own. The clearest signal - consistent across the likert scores and the open text - is slow and flaky CI as the top friction. Local environment setup is a strong second. Confidence: Low (sample size), but the convergence of likert and open-text on the same theme raises the qualitative confidence in CI as the place to look next. The most important caveat: this is a self-selected ~23% of the org.

Survey Methodology Summary

  • Sample size (N): 65 of ~280 engineers (~23% response [fictional])
  • Recruitment method: Slack announcement + email; voluntary
  • Who responded vs. who was invited: Self-selected; engineers with strong feelings (especially frustrations) over-represented
  • Selection bias risks: Voluntary response skews toward the dissatisfied; quiet-but-content engineers under-represented
  • Question-design risks: Likert “rate your dev experience” is vague; open-text “biggest friction” prompts complaints

These choices affect interpretation: the survey is a frustration-finder, not a representative satisfaction measure.
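The response figures quoted above follow from simple arithmetic. A minimal sketch, assuming the approximate headcount of 280 is used as the denominator (the function name is illustrative, not part of any tool):

```python
# Figures from the methodology summary; the headcount is approximate (~280).
INVITED = 280
RESPONDED = 65


def response_summary(responded: int, invited: int) -> dict:
    """Return the response rate and the number of people not heard from."""
    if invited <= 0 or not 0 <= responded <= invited:
        raise ValueError("need invited > 0 and 0 <= responded <= invited")
    return {
        "response_rate": responded / invited,
        "non_respondents": invited - responded,
    }


summary = response_summary(RESPONDED, INVITED)
print(f"response rate: {summary['response_rate']:.0%}")
print(f"non-respondents: ~{summary['non_respondents']}")
```

The non-respondent count matters as much as the rate: it is the group the findings cannot speak for.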

Per-Question Analysis

With N=65, report direction only. No margin of error is quoted because the sample and self-selection do not support implied precision.

| Q# | Question | Distribution | Confidence | What it shows | What it does NOT show |
| --- | --- | --- | --- | --- | --- |
| Q1 | Overall dev experience (1-5) | Mean ~2.9 [fictional] | Direction only | Middling sentiment | Whether it generalizes to the 215 non-respondents |
| Q2 | CI/build experience (1-5) | Mean ~2.2 [fictional] | Direction only | CI is the weakest area | Magnitude across the org |
| Q3 | Local env setup (1-5) | Mean ~2.5 [fictional] | Direction only | Second weakest | - |
| Q4 | Biggest daily friction (open text) | 58 responses [fictional] | Direction only | Themes below | A count that represents all engineers |

Persona / Segment Breakdown

Segment cuts are NOT reported. Splitting 65 responses by team or tenure produces sub-segments well below n=30, which cannot support any defensible claim. If segment differences matter, they require a larger sample (see Next Steps).
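The reasoning behind withholding segment cuts can be checked with a few lines. This is a sketch only: the group counts are hypothetical, since the survey does not say how many teams or tenure bands exist, and the helper names are illustrative.

```python
MIN_SEGMENT_N = 30   # reporting threshold used in this analysis
RESPONSES = 65       # survey sample size


def even_split_size(total: int, groups: int) -> float:
    """Average responses per group if `total` were spread evenly."""
    if groups <= 0:
        raise ValueError("groups must be positive")
    return total / groups


def max_segments_at_threshold(total: int, min_n: int = MIN_SEGMENT_N) -> int:
    """Most disjoint groups that could each reach `min_n` responses."""
    return total // min_n


# Hypothetical group counts, chosen only to illustrate the arithmetic.
for groups in (3, 4, 6):
    size = even_split_size(RESPONSES, groups)
    print(f"{groups} groups -> ~{size:.0f} each, reportable: {size >= MIN_SEGMENT_N}")

print("best case, groups that can reach n=30:", max_segments_at_threshold(RESPONSES))
```

Even in the best case only two disjoint groups could reach the threshold, and that requires a near-perfect split, so any realistic cut by team or tenure falls short.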

Open-Text Thematic Clustering

AI-assisted clustering of 58 open-text responses; quotes illustrative [fictional]. This is where a small exploratory survey earns its keep.

| Theme | Approx. mentions | Representative quotes | Valence | Note |
| --- | --- | --- | --- | --- |
| Slow / flaky CI | ~30 | “I rerun CI 3 times to get a green build” [fictional] | Negative | Matches the low Q2 score |
| Local env setup | ~16 | “Setting up a new service eats a whole day” [fictional] | Negative | Matches Q3 |
| Unclear ownership / docs | ~12 | “I never know which team owns what” [fictional] | Negative | New theme not in likert |
| Too many tools | ~8 | “context-switching across 5 dashboards” [fictional] | Negative | Minor |
| Things are fine | ~5 | “honestly it’s pretty good” [fictional] | Positive | Under-represented by design |

Convergence note: CI is both the lowest likert score (Q2) and the most-mentioned open-text theme. When two different question formats point the same way, the qualitative confidence in that direction rises even though the sample is small. Both come from the same self-selected respondents, so the agreement does not correct for selection bias.
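The convergence check can be made explicit. The sketch below copies the [fictional] figures from the tables above. The mapping from likert area to open-text theme is an assumption made for illustration. The mention counts sum to more than 58, which is consistent with one response being coded under several themes, though the source does not say so.

```python
# Figures copied from the tables above; all are tagged [fictional] in the source.
OPEN_TEXT_RESPONSES = 58
theme_mentions = {
    "Slow / flaky CI": 30,
    "Local env setup": 16,
    "Unclear ownership / docs": 12,
    "Too many tools": 8,
    "Things are fine": 5,
}
likert_means = {"CI/build": 2.2, "Local env setup": 2.5}  # Q2 and Q3

# Assumed mapping from likert area to open-text theme (not in the source).
area_to_theme = {
    "CI/build": "Slow / flaky CI",
    "Local env setup": "Local env setup",
}

# Share of open-text responses that mention each theme.
shares = {theme: n / OPEN_TEXT_RESPONSES for theme, n in theme_mentions.items()}
total_mentions = sum(theme_mentions.values())

weakest_area = min(likert_means, key=likert_means.get)
top_theme = max(theme_mentions, key=theme_mentions.get)
converges = area_to_theme[weakest_area] == top_theme

for theme, share in shares.items():
    print(f"{theme}: ~{share:.0%} of open-text responses")
print(f"total mentions: {total_mentions} across {OPEN_TEXT_RESPONSES} responses")
print(f"lowest likert area and top theme agree: {converges}")
```

The shares describe the 58 people who answered the open-text question, not the org, so they carry the same direction-only caveat as everything else.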

Hypothesis Validation

| Hypothesis | Status | Evidence | Confidence |
| --- | --- | --- | --- |
| CI is the top friction | SUPPORTED (directionally) | Lowest likert + top open-text theme converge | Low (N=65) but cross-measure consistent |
| We can rank the full friction list reliably | NOT TESTED BY THIS SURVEY | Sample too small to rank beyond the top 1-2 | - |

What the Data Does NOT Show

  • Population not represented: The ~215 engineers who did not respond, especially those who are content with the current experience
  • Questions not answered: How much time CI friction actually costs; whether fixing it would change retention or velocity
  • Confounds: Voluntary response skews toward the frustrated
  • Follow-up that would close the biggest gap: A larger, randomized or census survey, and CI telemetry (actual build times, flake rates)
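The telemetry follow-up could start from something as small as the sketch below. Everything in it is an assumption: the `BuildRun` record shape is hypothetical, the sample rows are made up purely to exercise the functions, and "flaky" is defined here as a commit with both failed and passed runs, which is one common working definition rather than the team's own.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class BuildRun:
    """One CI run. Hypothetical schema; real CI exports will differ."""
    commit: str
    passed: bool
    duration_s: float


def flake_rate(runs):
    """Fraction of commits that both failed and passed with no code change."""
    by_commit = defaultdict(list)
    for run in runs:
        by_commit[run.commit].append(run.passed)
    if not by_commit:
        return 0.0
    flaky = sum(
        1 for results in by_commit.values()
        if any(results) and not all(results)
    )
    return flaky / len(by_commit)


def median_duration_s(runs):
    """Median wall-clock duration across all runs, in seconds."""
    return median(run.duration_s for run in runs)


# Made-up records for illustration only; not survey or CI data.
sample = [
    BuildRun("a1", False, 900.0),
    BuildRun("a1", True, 880.0),   # same commit, rerun went green: flaky
    BuildRun("b2", True, 840.0),
    BuildRun("c3", False, 910.0),  # failed only: counted as a real failure
    BuildRun("d4", True, 860.0),
]

print(f"flake rate: {flake_rate(sample):.0%}")
print(f"median build time: {median_duration_s(sample):.0f}s")
```

Counting per commit rather than per run keeps a single heavily retried commit from dominating the rate.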

Prioritized Recommendations

| # | Recommendation | Evidence | Confidence | Counter-evidence | Research that would strengthen it |
| --- | --- | --- | --- | --- | --- |
| 1 | Investigate CI flakiness with telemetry before funding a fix | Convergent top theme | Low-Medium | Self-selected sample | CI build-time and flake-rate data |
| 2 | Run a larger pulse to make this decision-grade | N=65 is direction-only | High (on the method) | Costs time | A near-census of the ~280-engineer org (target n approx 200, ~70% response) |
| 3 | Do NOT reallocate headcount on this survey alone | Sample size | High | Pressure to act fast | Combine survey + telemetry |
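The near-census target in recommendation 2 can be sanity-checked with the same arithmetic used elsewhere in this analysis. A sketch, assuming a headcount of exactly 280 and reusing the n=30 segment threshold; the variable names are illustrative.

```python
import math

ORG_SIZE = 280             # approximate headcount from the survey
TARGET_RESPONSE_PCT = 70   # target response rate, in percent
TARGET_N = 200             # rounded target quoted in the recommendation
MIN_SEGMENT_N = 30         # segment reporting threshold used in this analysis

# Responses needed to reach the target rate, rounded up to a whole person.
needed = math.ceil(ORG_SIZE * TARGET_RESPONSE_PCT / 100)

# Response rate implied by the rounded target of 200.
implied_rate = TARGET_N / ORG_SIZE

# Best-case number of disjoint segments that could each reach the threshold.
best_case_segments = TARGET_N // MIN_SEGMENT_N

print(f"responses needed for {TARGET_RESPONSE_PCT}%: {needed}")
print(f"rate implied by n={TARGET_N}: {implied_rate:.0%}")
print(f"best-case segments at n>={MIN_SEGMENT_N}: {best_case_segments}")
```

A higher response count only helps if the added respondents are not self-selected in the same way, which is why the recommendation asks for a near-census rather than simply a bigger voluntary pulse.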

Next Steps

  • Pull CI telemetry now (build times, flake rates) - behavioral data will confirm or deny the survey signal at far higher confidence than 65 self-reports
  • Run a follow-up quarterly dev-experience survey as a near-census of the ~280-engineer org (target n approx 200, ~70% response) so segment cuts and a full friction ranking become possible
  • This analysis can tell you where to look (CI); it explicitly cannot justify capital allocation or headcount on its own