# Hypothesis

**Quick facts:** Phase: Define | Version: 2.0.0 | Category: ideation | License: Apache-2.0
A hypothesis is a testable prediction about how a change will affect user behavior or business outcomes. It transforms assumptions into explicit statements that can be validated or invalidated through experimentation. Well-formed hypotheses prevent teams from building features based on untested beliefs and create shared understanding of what success looks like.
## When to Use
- After problem framing, before committing to a solution
- When designing experiments or A/B tests
- When team members have differing assumptions about user behavior
- Before investing significant engineering resources in a feature
- When pivoting direction and need to validate the new approach
## How to Use

Use the `/hypothesis` slash command, or reference the skill file directly: `skills/define-hypothesis/SKILL.md`
## Instructions

When asked to create a hypothesis, follow these steps:

1. **State the Belief.** Articulate what you believe will happen. Use the structured format: "We believe that [action/change] for [target user] will [expected outcome]." Be specific about the intervention — vague hypotheses can't be tested.
2. **Identify the Target User.** Define who this hypothesis applies to. A hypothesis about "users" is too broad. Specify the segment: new users in their first week, power users with 10+ sessions, churned users returning, etc.
3. **Define the Expected Outcome.** What behavior change or result do you expect? Frame it in terms of user actions (complete onboarding, make a purchase, return within 7 days) rather than internal metrics when possible.
4. **Set Success Metrics.** Choose a primary metric that directly measures the expected outcome. Include secondary metrics that provide context and guardrail metrics that ensure you're not causing harm elsewhere.
5. **Describe the Validation Approach.** How will you test this hypothesis? A/B test, user interviews, prototype testing, cohort analysis? Be specific about sample size, duration, and statistical requirements.
6. **Document Risks and Assumptions.** What could invalidate this hypothesis beyond the test results? What are you assuming to be true that you haven't validated?
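As an illustration, the structured statement from step 1 can be captured in code. This is a minimal sketch with hypothetical field names, not part of the skill itself:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """Sketch of the structured hypothesis format (field names are illustrative)."""
    change: str   # the specific action or change
    segment: str  # the target user segment
    outcome: str  # the expected behavior change
    metric: str   # the primary success metric

    def statement(self) -> str:
        # Renders the canonical "We believe that ..." sentence from the skill
        return (f"We believe that {self.change} for {self.segment} "
                f"will {self.outcome}, as measured by {self.metric}.")

h = Hypothesis(
    change="reducing onboarding from 7 steps to 3",
    segment="new free-trial users",
    outcome="increase onboarding completion",
    metric="first-session completion rate",
)
print(h.statement())
```

Forcing each field to be filled in separately makes it obvious when a hypothesis is missing a testable piece, such as a named segment or a measurable metric.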
## Output Template

### Hypothesis: [Brief Title]

#### Hypothesis Statement

We believe that [specific action or change]
for [target user segment]
will [expected outcome/behavior change]
as measured by [primary success metric]

#### Background & Rationale

**Problem Context**

[Problem context]

**Supporting Evidence**

[Evidence that supports this belief]

**Alternative Hypotheses Considered**

[Alternative approaches]

#### Target User Segment

**Definition**

[User segment definition]

**Segment Size**

[Estimated count or percentage]

**Current Behavior**

[Current state]

#### Success Metrics

**Primary Metric**

| Metric | Current Baseline | Target | Minimum Detectable Effect |
|---|---|---|---|
| [Metric name] | [Current value] | [Target value] | [MDE %] |

**Secondary Metrics**

| Metric | Current Baseline | Expected Direction |
|---|---|---|
| [Metric 1] | [Value] | [Increase/Decrease/No change] |
| [Metric 2] | [Value] | [Increase/Decrease/No change] |

**Guardrail Metrics**

| Metric | Current Value | Acceptable Range |
|---|---|---|
| [Metric 1] | [Value] | [Range] |

#### Validation Approach

**Method**

[Validation method]

**Sample Size & Duration**

- Sample size: [Number per variant]
- Duration: [Time period]
- Traffic allocation: [Percentage]

**Pass/Fail Criteria**

- Validated if: [Specific criteria]
- Invalidated if: [Specific criteria]
- Inconclusive if: [Specific criteria]

#### Risks & Assumptions

**Key Assumptions**

- [Assumption 1]
- [Assumption 2]

**Risks**

- [Risk 1]
- [Risk 2]

#### Timeline

| Phase | Dates | Duration |
|---|---|---|
| Setup & instrumentation | [Dates] | [Duration] |
| Test running | [Dates] | [Duration] |
| Analysis | [Dates] | [Duration] |
| Decision | [Date] | — |
## Example Output

### Hypothesis: Simplified Onboarding Flow
#### Hypothesis Statement
We believe that reducing the onboarding flow from 7 steps to 3 essential steps
for new users signing up for a free trial
will increase onboarding completion rate
as measured by percentage of users who complete all onboarding steps within their first session
#### Background & Rationale

**Problem Context**
Our SaaS product has a 34% onboarding completion rate — meaning 66% of new signups never finish setup and experience the core value proposition. User research indicates the current 7-step onboarding feels overwhelming, with significant drop-off occurring at steps 4 and 5 (team invitation and integration setup). Users who don't complete onboarding are 4x more likely to churn within 14 days.
**Supporting Evidence**
- Session recordings show users hesitating and abandoning at the team invitation step
- Support tickets frequently ask "Can I skip some of these steps?"
- Competitor analysis shows market leaders use 3-4 step onboarding flows
- Exit survey data: 42% of churned users cite "too complicated to get started"
- Hotjar heatmaps show users scrolling to find a "skip" button that doesn't exist
**Alternative Hypotheses Considered**
- Progress indicators: Adding a progress bar might reduce anxiety without changing steps — rejected because underlying issue is step count, not visibility
- Tooltips/guidance: More help content might reduce confusion — rejected because it adds more cognitive load
- Optional steps: Making steps skippable might work — considered as fallback if simplification fails
#### Target User Segment

**Definition**

New users who:
- Sign up for a free trial (not paid conversion from trial)
- Are the first user from their organization (not invited team members)
- Access the product via web (not mobile app)
**Segment Size**
- 12,400 new trial signups per month meeting these criteria
- 8,200 (66%) currently fail to complete onboarding
**Current Behavior**
- Average time to complete current onboarding: 18 minutes
- Step 1-3 completion: 78%
- Step 4 (team invitation) completion: 52%
- Step 5 (integration) completion: 41%
- Full completion (all 7 steps): 34%
- Users who complete onboarding activate core feature within 24h: 89%
#### Success Metrics

**Primary Metric**
| Metric | Current Baseline | Target | Minimum Detectable Effect |
|---|---|---|---|
| Onboarding completion rate | 34% | 50% | 10% relative lift |
**Secondary Metrics**
| Metric | Current Baseline | Expected Direction |
|---|---|---|
| Time to complete onboarding | 18 min | Decrease to <8 min |
| Day-1 core feature activation | 30% | Increase |
| Support tickets (first 24h) | 8.2% of users | Decrease |
| User satisfaction (post-onboarding) | 3.2/5 | Increase |
**Guardrail Metrics**
| Metric | Current Value | Acceptable Range |
|---|---|---|
| 14-day trial-to-paid conversion | 12% | No decrease >5% relative |
| Team invitation rate (within 7 days) | 23% | No decrease >10% relative |
| Integration connection rate (within 7 days) | 31% | No decrease >10% relative |
#### Validation Approach

**Method**

A/B test with a 50/50 traffic split between:
- Control: Current 7-step onboarding flow
- Treatment: New 3-step onboarding (account basics, workspace setup, first task creation)

Deferred steps (team invitation, integrations) will be prompted via in-app messaging after initial activation.
**Sample Size & Duration**
- Sample size: 3,000 users per variant (6,000 total)
- Duration: 14 days of enrollment + 7 days observation window
- Traffic allocation: 50% control / 50% treatment
- Statistical significance: 95% confidence level
- Statistical power: 80%
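As a sanity check on these figures, the required per-variant sample size can be estimated with the standard normal-approximation formula for comparing two proportions. The choice of a two-sided two-proportion test is an assumption here; the document states only the confidence level and power:

```python
from math import ceil, sqrt
from statistics import NormalDist

def two_proportion_sample_size(p1: float, p2: float,
                               alpha: float = 0.05,
                               power: float = 0.80) -> int:
    """Per-variant n for a two-sided two-proportion z-test (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Baseline 34%; a 10% relative lift is the MDE, i.e. 37.4%
n = two_proportion_sample_size(0.34, 0.374)
print(n)  # slightly above 3,000 per variant under these assumptions
```

Under these assumptions the formula lands close to (slightly above) the stated 3,000 per variant, so treat that figure as a ballpark rather than an exact requirement.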
**Pass/Fail Criteria**
- Validated if: Onboarding completion increases by ≥10% relative (34% → 37.4%+) with 95% confidence AND guardrail metrics stay within acceptable range
- Invalidated if: Onboarding completion shows no significant change or decreases, OR guardrail metrics breach acceptable range
- Inconclusive if: Results don't reach statistical significance within test window — extend test or increase sample
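To make the "validated if" check concrete, here is a minimal sketch of evaluating the primary metric with a pooled two-proportion z-test. The test choice and the counts below are illustrative assumptions, not results from the document:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(success_a: int, n_a: int,
                           success_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in proportions (pooled z-test)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical observed counts: control 1020/3000 (34.0%),
# treatment 1180/3000 (39.3%)
p = two_proportion_p_value(1020, 3000, 1180, 3000)

# Significance alone is not enough to declare "validated":
# the guardrail metrics must also stay within their acceptable ranges.
print(p < 0.05)
```

A relative lift of at least 10% plus a p-value below 0.05 would satisfy the statistical half of the criteria; the guardrail check is a separate, non-statistical gate.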
#### Risks & Assumptions

**Key Assumptions**
- Users who complete a shorter onboarding will still discover team/integration features later
- The 3 essential steps are sufficient to demonstrate core product value
- In-app prompts can effectively drive deferred actions
- Onboarding completion is a leading indicator of retention (not just correlated)
**Risks**
- Feature discovery risk: Users might never set up teams/integrations if not prompted during onboarding
- Segment spillover: Results might not generalize to invited users or mobile signups
- Novelty effect: Initial lift might fade as users become accustomed to flow
- Selection bias: Users who would have completed 7-step flow might be different from marginal completers
#### Timeline
| Phase | Dates | Duration |
|---|---|---|
| Setup & instrumentation | Jan 15-17, 2026 | 3 days |
| Test running | Jan 18-31, 2026 | 14 days |
| Observation window | Feb 1-7, 2026 | 7 days |
| Analysis | Feb 8-10, 2026 | 3 days |
| Decision | Feb 11, 2026 | — |
## Real-World Examples

See this skill applied to three different product contexts:

### Storevine (B2B)

Storevine B2B ecommerce platform — Campaigns v1 first-campaign guided flow hypothesis
Prompt:
/hypothesis
Project: Campaigns — native email marketing for Storevine merchants
Stage: Post-discovery, pre-PRD finalization
Hypothesis I want to define:
- Non-adopter merchants (no active external email tool, <250 customers)
are ~38% of our active base [fictional] and represented 3 of 8 merchant
interview participants (P3, P6, and P8)
- Core belief: setup complexity is the barrier — not awareness or price
- Specific hypothesis: a guided first-campaign flow with product-seeded
templates will drive first-send rate from ~12% [fictional] to ≥30%
[fictional] within 60 days of GA
Prior work to reference:
- Merchant interview synthesis (Jan 12–28, 2026): P3, P6, and P8 described
email as "too overwhelming to start" or perennially "on the list"
- Competitive analysis (Feb 2026): Shopify Email's template-first + free
tier activation is their primary new-merchant onboarding lever
- Problem statement: email-related churn estimated at 4.8 pp [fictional]
of overall 22% [fictional] annual merchant churn rate
Need: full hypothesis document with success metrics, validation approach,
pass/fail criteria, and risks. Will attach to PRD as primary testable belief.
Output:
**Hypothesis: Pre-Populated Templates Drive First Campaign Sends for Non-Adopter Merchants**
### Brainshelf (Consumer)

Brainshelf consumer PKM app — Resurface morning email digest hypothesis
Prompt:
/hypothesis
trying to figure out if a morning digest email will actually get people to re-read
their saved stuff. context: brainshelf pkm app, 22k MAU [fictional]. users save
~47 items/month but only go back to read ~9% within 30 days [fictional]. classic
guilt pile problem from interviews.
want to run an A/B test on a morning email that surfaces 3-5 items from their
library based on what they've been reading lately. need a hypothesis doc to
align the team before we commit to building it.
primary metric: resurface item click rate. secondary: actual read completion.
guardrail: don't tank unsubscribe rate.
Output:
**Hypothesis: Morning Resurface Email Increases Re-Read Rate**
### Workbench (Enterprise)

Workbench enterprise collaboration platform — required-section enforcement hypothesis
Prompt:
/hypothesis
Product: Workbench Blueprints (enterprise doc templates with required sections and approval gates)
Stage: Define phase, post-discovery interviews and problem statement
Hypothesis: Requiring all Blueprint sections to be completed before an author can submit for approval will reduce median time to first approved Blueprint.
Context:
- 38% of Blueprints in closed beta reach approval with ≥1 empty required section [fictional]
- Median time to first approval: 4.0 days [fictional]
- Most rejections are for missing content, not quality [fictional]
- Approvers (dept heads, compliance leads) are the bottleneck -- they reject and wait, or approve with risk
- Target: reduce median approval time to ≤1 day [fictional] (aspirational)
- MDE for experiment: 1.0 day reduction (to ≤3.0 days) [fictional]
Target users: Project leads and document authors at enterprise Workbench accounts
Validation: A/B test in closed beta (80 accounts, ~300 Blueprints/week [fictional])
Primary metric: median time-to-first-approval (days)
Guardrails: author abandonment, author NPS
Stakeholders: Sandra C. (Head of Product), Karen L. (Eng Lead), Leo M. (Data Analyst)
Output:
**Hypothesis: Required Blueprint Sections Reduce Time-to-Approval**
## Quality Checklist
Before finalizing, verify:
- Hypothesis is falsifiable (possible to prove wrong)
- Success metric has a specific numeric target
- Target user segment is clearly defined
- Validation approach is practical and time-bound
- Pass/fail criteria are unambiguous
- Hypothesis doesn't assume the solution works
## Output Format
Use the template in references/TEMPLATE.md to structure the output.