Measure Survey Analysis: Workbench DevEx Pulse

Scenario

The Workbench platform team ran a dev-experience pulse survey and got 65 responses from engineers. They want to know what to fix. This demonstrates the exploratory analysis pattern with a weak sample - and the most valuable thing the analysis does is refuse to overstate it. With N=65, the data is directional only; it cannot support capital-allocation decisions. The survey is illustrative; invented data is tagged [fictional].

Source Notes:

Jeff Sauro and James R. Lewis, “Quantifying the User Experience” (Morgan Kaufmann) - the sample-size guidance behind the “N=65 is direction-only” stance and the refusal to imply statistical precision.
Virginia Braun and Victoria Clarke, “Using thematic analysis in psychology” (Qualitative Research in Psychology, 2006) - the thematic-clustering method for the open-text responses, which is where a small exploratory survey delivers its real value.

Prompt

measure-survey-analysis

analyze our dev-experience pulse survey. 65 engineers responded out of ~280.
mix of likert questions (rate your dev experience 1-5 across a few areas)
plus an open text "biggest friction in your day?". tell us what to prioritize.

Output

Survey Analysis: Workbench Dev-Experience Pulse

Read this first: N=65 (of ~280 engineers, ~23% response). This sample is large enough to spot directional themes but too small for statistically reliable conclusions or capital-allocation decisions. Everything below is direction-only. Treat it as a signal of where to look, not as a mandate of what to fund.

Executive Summary

Sixty-five engineers responded. The data is directional only and should not drive headcount or large investment decisions on its own. The clearest signal - consistent across the likert scores and the open text - is slow and flaky CI as the top friction. Local environment setup is a strong second. Confidence: Low (sample size), but the convergence of likert and open-text on the same theme raises the qualitative confidence in CI as the place to look next. The most important caveat: this is a self-selected fifth of the org.

Survey Methodology Summary

Sample size (N): 65 of ~280 engineers (~23% response [fictional])
Recruitment method: Slack announcement + email; voluntary
Who responded vs. who was invited: Self-selected; engineers with strong feelings (especially frustrations) over-represented
Selection bias risks: Voluntary response skews toward the dissatisfied; quiet-but-content engineers under-represented
Question-design risks: Likert “rate your dev experience” is vague; open-text “biggest friction” prompts complaints

These choices affect interpretation: the survey is a frustration-finder, not a representative satisfaction measure.

Per-Question Analysis

With N=65, report direction only. No margin of error is quoted because the sample and self-selection do not support implied precision.

Q#	Question	Distribution	Confidence	What it shows	What it does NOT show
Q1	Overall dev experience (1-5)	Mean ~2.9 `[fictional]`	Direction only	Middling sentiment	Whether it generalizes to the 215 non-respondents
Q2	CI/build experience (1-5)	Mean ~2.2 `[fictional]`	Direction only	CI is the weakest area	Magnitude across the org
Q3	Local env setup (1-5)	Mean ~2.5 `[fictional]`	Direction only	Second weakest	-
Q4	Biggest daily friction (open text)	58 responses `[fictional]`	Direction only	Themes below	A count that represents all engineers

Persona / Segment Breakdown

Segment cuts are NOT reported. Splitting 65 responses by team or tenure produces sub-segments well below n=30, which cannot support any defensible claim. If segment differences matter, they require a larger sample (see Next Steps).

Open-Text Thematic Clustering

AI-assisted clustering of 58 open-text responses; quotes illustrative [fictional]. This is where a small exploratory survey earns its keep.

Theme	Approx. mentions	Representative quotes	Valence	Note
Slow / flaky CI	~30	”I rerun CI 3 times to get a green build” `[fictional]`	Negative	Matches the low Q2 score
Local env setup	~16	”Setting up a new service eats a whole day” `[fictional]`	Negative	Matches Q3
Unclear ownership / docs	~12	”I never know which team owns what” `[fictional]`	Negative	New theme not in likert
Too many tools	~8	”context-switching across 5 dashboards” `[fictional]`	Negative	Minor
Things are fine	~5	”honestly it’s pretty good” `[fictional]`	Positive	Under-represented by design

Convergence note: CI is both the lowest likert score (Q2) and the most-mentioned open-text theme. When two independent measures point the same way, the qualitative confidence in that direction rises even though the sample is small.

Hypothesis Validation

Hypothesis	Status	Evidence	Confidence
CI is the top friction	SUPPORTED (directionally)	Lowest likert + top open-text theme converge	Low (N=65) but cross-measure consistent
We can rank the full friction list reliably	NOT TESTED BY THIS SURVEY	Sample too small to rank beyond the top 1-2	-

What the Data Does NOT Show

Population not represented: The ~215 engineers who did not respond; content engineers especially
Questions not answered: How much time CI friction actually costs; whether fixing it would change retention or velocity
Confounds: Voluntary response skews toward the frustrated
Follow-up that would close the biggest gap: A larger, randomized or census survey, and CI telemetry (actual build times, flake rates)

Prioritized Recommendations

#	Recommendation	Evidence	Confidence	Counter-evidence	Research that would strengthen it
1	Investigate CI flakiness with telemetry before funding a fix	Convergent top theme	Low-Medium	Self-selected sample	CI build-time and flake-rate data
2	Run a larger pulse to make this decision-grade	N=65 is direction-only	High (on the method)	Costs time	A near-census of the ~280-engineer org (target n approx 200, ~70% response)
3	Do NOT reallocate headcount on this survey alone	Sample size	High	Pressure to act fast	Combine survey + telemetry

Next Steps

Pull CI telemetry now (build times, flake rates) - behavioral data will confirm or deny the survey signal at far higher confidence than 65 self-reports
Run a follow-up quarterly dev-experience survey as a near-census of the ~280-engineer org (target n approx 200, ~70% response) so segment cuts and a full friction ranking become possible
This analysis can tell you where to look (CI); it explicitly cannot justify capital allocation or headcount on its own