Fermi Estimation

Sometimes you need a number and there is nothing to look up: no dataset, no genuine reference class, no precedent to borrow. A single all-at-once guess at the whole magnitude is badly anchored and hides its own uncertainty. The Fermi move is to factor the unknown into a short chain of sub-quantities - each one small and familiar enough to guess to within a small multiplicative factor (an order of magnitude or better) - then multiply the chain back into a point estimate and compound the per-factor bands into a low/high range. The reason it can beat one wild guess is partial error cancellation: if the per-factor errors are roughly independent and centered, over-guessing one factor and under-guessing another tend to offset in the product. The output is a Fermi decomposition worksheet, not a lone number. The honest constraint: the cancellation only works when the factor errors are roughly independent, and the benefit is real mainly for large, unfamiliar quantities - not ordinary ones you could estimate directly.

When to Use

You need a numeric magnitude and no lookup-able data and no genuine reference class exists, so the number has to be built from factors.
The quantity is large and unfamiliar (market size, total load, total cost, a conversion count you cannot look up) - the regime where decomposition actually helps.
An order-of-magnitude answer with an honest band is useful for sizing, sanity-checking, or triage; the number does not have to be exact.
You want the estimate inspectable: each factor, its basis, and its band exposed so a reader can challenge one number, not an opaque total.

When NOT to Use

A genuine reference class with real base-rate data exists. Then anchor on that data, not on invented factors - use think-reference-class-forecasting. Fermi is precisely the build-from-factors method for when no such class exists; if you have real base rates, reference-class forecasting is the better-grounded choice.
The task only needs the question decomposed for coverage, not a number. If you want a mutually-exclusive, collectively-exhaustive breakdown of a question and explicitly no estimate, use think-issue-tree, which produces a tree and produces no number. Fermi exists to produce a number; do not use it when a number is not wanted.
The quantity is ordinary and familiar. Decomposing something you could estimate directly adds noise; in the controlled studies the decomposition benefit was absent or negative in that regime (see evidence/dossier.md).
The factors share a driver (correlated). Multiplicative error-cancellation fails when factors move together; the chain can be worse than one careful guess. Flag it and restructure to independent factors, or stop.
Never emit a point estimate with no low/high band. A Fermi number without its range hides the uncertainty the method exists to expose.
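
The routing rules in the two lists above can be sketched as a small decision function. This is an illustrative sketch only: the function name, its boolean parameters, the order of the checks, and the "direct-estimate" label are assumptions introduced here; only the three skill names come from this document. The correlated-factor case is left out because it can only be judged after a chain has been drafted.

```python
def choose_method(has_reference_class: bool,
                  number_wanted: bool,
                  quantity_is_familiar: bool) -> str:
    """Return the method the routing rules point to (hypothetical helper)."""
    if not number_wanted:
        # Coverage-only decomposition: a tree, no estimate.
        return "think-issue-tree"
    if has_reference_class:
        # Real base rates exist: anchor on them instead of invented factors.
        return "think-reference-class-forecasting"
    if quantity_is_familiar:
        # Ordinary quantity: decomposing it tends to add noise.
        return "direct-estimate"  # hypothetical label, not a skill in this document
    return "think-fermi-estimation"


print(choose_method(has_reference_class=False,
                    number_wanted=True,
                    quantity_is_familiar=False))
```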

Instructions

When asked to estimate a magnitude with no data to look up, follow these steps:

State the target quantity precisely, with its unit. Confirm there is no real base-rate data or reference class to use instead (if there is, route to think-reference-class-forecasting). Confirm a number is actually wanted (if not, route to think-issue-tree).
Build the multiplicative factor chain. Write the unknown as a product of sub-quantities, each one small and familiar enough to guess to within a small multiplicative factor (an order of magnitude or better). Keep the chain short; prefer factors you can anchor.
Give each factor a band and a basis. For every factor, state a low / best / high guess, and where the number came from (a known datum, an analogy, a plausible range). A factor with no stated basis is just a guess in disguise.
Run the independence check. Ask whether any two factors share a driver (move together). If they do, the error-cancellation premise breaks - flag the correlated factors and either restructure to independent factors or note that the range is unreliable.
Combine. Multiply the best-guesses for the point estimate. Multiply the lows for the range floor and the highs for the range ceiling to get the compounded low/high range. (If a factor enters the chain as a divisor, swap its ends: use its high for the floor and its low for the ceiling.) Report the estimate as the point value and the range, never the point alone.
Flag the dominant uncertainty. Name the one factor whose band most widens the combined range - that is where tightening one guess would most improve the answer.
Emit the Fermi decomposition worksheet per references/TEMPLATE.md.
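
The combine and dominant-uncertainty steps above can be sketched in a few lines. This is a minimal sketch, not the skill's implementation: the `Factor` class, `combine`, and `widest_band` are hypothetical names, the two-factor chain is made-up illustration data, and the code assumes every factor is positive and enters the chain as a multiplier (no divisors).

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Factor:
    name: str
    low: float
    best: float
    high: float
    basis: str  # where the guess came from


def combine(factors):
    """Multiply lows, bests, and highs separately to get (floor, point, ceiling)."""
    low = prod(f.low for f in factors)
    point = prod(f.best for f in factors)
    high = prod(f.high for f in factors)
    return low, point, high


def widest_band(factors):
    """Names of the factors with the largest high/low ratio (ties are all returned)."""
    ratios = {f.name: f.high / f.low for f in factors}
    top = max(ratios.values())
    return [name for name, ratio in ratios.items() if ratio == top]


# Hypothetical chain, for illustration only.
chain = [
    Factor("households", 1_000, 2_000, 4_000, "hypothetical census-style figure"),
    Factor("items per household", 2, 3, 5, "hypothetical analogy"),
]
low, point, high = combine(chain)
print(low, point, high)    # 2000 6000 20000
print(widest_band(chain))  # ['households']
```

Returning every tied factor rather than picking one is deliberate: when several bands are equally wide, the choice of dominant uncertainty has to rest on how well each guess is anchored, which the code cannot judge.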

Output Format

Use the template in references/TEMPLATE.md. The deliverable is the worksheet: the factor chain, per-factor low/best/high with bases, the point estimate, the compounded low/high range, the independence check, and the dominant-uncertainty flag - not a single number and not prose.

Quality Checklist

Before finalizing, verify:

The target is genuinely a build-from-factors magnitude: no real base-rate data or reference class was available (else use reference-class-forecasting), and a number is actually wanted (else use issue-tree).
The unknown is written as a multiplicative chain of factors, each small enough to guess to within a small multiplicative factor.
Every factor has a low/best/high band and a stated basis for the guess.
The independence check was run, and any correlated factors (sharing a driver) are flagged rather than silently multiplied.
The output gives a point estimate and a compounded low/high range - never a point estimate alone.
The dominant-uncertainty factor is named.
No overclaim: the method gives directional, order-of-magnitude help for extreme uncertain quantities under an independence condition; it does not give a precise or proven number (see evidence/dossier.md).
The output is the Fermi decomposition worksheet artifact, not prose.
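
The mechanical items on this checklist can be checked automatically; the judgment items (whether a reference class really was unavailable, whether the claim is overstated) cannot. The following is a rough sketch under assumptions: `check_worksheet` and its argument layout are hypothetical and do not reflect the structure of references/TEMPLATE.md.

```python
def check_worksheet(factors, point, low, high, dominant, independence_note):
    """Return a list of checklist failures; an empty list means the checks passed.

    factors maps a factor name to a (low, best, high, basis) tuple.
    """
    problems = []
    for name, (f_low, f_best, f_high, basis) in factors.items():
        if not (0 < f_low <= f_best <= f_high):
            problems.append(f"{name}: band is not ordered low <= best <= high")
        if not basis.strip():
            problems.append(f"{name}: no stated basis")
    if low is None or high is None:
        problems.append("point estimate reported without a low/high range")
    elif not (low <= point <= high):
        problems.append("point estimate falls outside its own range")
    if dominant not in factors:
        problems.append("dominant-uncertainty factor is not named")
    if not independence_note.strip():
        problems.append("independence check is missing")
    return problems


# Hypothetical single-factor worksheet with everything missing except the band.
incomplete = {"visitors": (1, 2, 4, "")}
for problem in check_worksheet(incomplete, 2, None, None, "", ""):
    print(problem)
```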

Evidence

Tier M/P, transferred-evidence. The mechanism - judgmental multiplicative decomposition - has some controlled support: MacGregor & Armstrong (1994, International Journal of Forecasting, “Judgmental Decomposition: When Does It Work?”) and MacGregor (2001, in Armstrong ed., Principles of Forecasting) report that breaking an estimate into parts and recombining can reduce error - that earns the M half. Three facts cap it below a clean M and demand honesty: (1) the benefit is conditional, present for extreme/uncertain (large, unfamiliar) quantities and absent or negative for ordinary ones; (2) the multiplicative cancellation premise is sensitive to correlated component errors, which erode the benefit; (3) the base is essentially a single multi-problem study line plus field lore (the “within an order of magnitude” track record) plus a statistical argument (log-normal / geometric-mean cancellation), not replicated or meta-analytic. This skill therefore cites no effect-size figure; widely-repeated numbers could not be verified to a primary source and would overstate the grade. The evidence is human-subject, not AI-agent-validated. Full grading, sources, and the deliberately-omitted statistics: evidence/dossier.md.

Examples

See references/EXAMPLE.md for a completed Fermi decomposition worksheet on the shared Northwind scenario.

Deep dive: worked example

A full worked run (the shared Northwind scenario)

Fermi Decomposition Worksheet - Worked Example

A completed run of think-fermi-estimation, on the shared Northwind scenario. This is the quality bar a generated worksheet should meet.

Northwind is a B2B SaaS weighing a self-serve free-tier launch. The question on the table: how many new paying accounts would the free tier convert in its first year? There is no data to look up - the tier has never existed, and there is no genuine reference class with real base rates for Northwind’s funnel. So the number has to be built from factors. (If real comparable base rates existed, this should route to think-reference-class-forecasting instead.)

Target quantity

What is being estimated: new paying accounts acquired via the free tier in year one - unit: paid accounts / year.
Why Fermi (not a lookup): the free tier does not exist yet and Northwind has no comparable self-serve history, so there is no dataset and no genuine reference class - the number is built from factors.

Factor chain

paid accounts/year = monthly site visitors x free-signup rate x free-to-paid conversion rate x 12 months

Per-factor bands

Factor	Low	Best	High	Basis for the guess
Monthly site visitors	20,000	40,000	80,000	Current marketing-site traffic is ~40k/mo per analytics; band allows for a launch bump or a soft quarter
Free-signup rate (visitor -> free account)	1%	2%	4%	Self-serve signup CTRs cluster low single digits; 2% is a common mid-funnel anchor, basis is analogy not Northwind data
Free-to-paid conversion (free -> paid, year one)	2%	4%	8%	Freemium B2B free-to-paid is widely cited in low single digits; wide band because Northwind’s tier design is unset
Months active	12	12	12	One year, fixed (not a source of uncertainty)

Independence check

Do any two factors share a driver? Yes - partially. Free-signup rate and free-to-paid conversion both depend on how qualified the people signing up are: a low-friction signup flow that turns more low-intent visitors into free accounts would lift the signup rate but depress conversion among those accounts (and a stricter, higher-intent funnel would do the reverse). They are negatively correlated through traffic quality, which means the true range is somewhat narrower than the naive low-times-low / high-times-high band suggests (the extremes partly cancel). Months is fixed, and visitor volume is treated here as independent of both rates.
Action: keep the band but read the floor and ceiling as conservative outer bounds, not equally likely; do not treat signup-rate and conversion as freely independent when reasoning about the tails.
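
To see why a negative correlation narrows the spread, it helps to work in log space, where the errors of a product add. The sketch below is illustrative only: reading each low/high band as roughly plus or minus two standard deviations of the log value is an assumption made here, and the correlation values are hypothetical, not estimated from any Northwind data.

```python
from math import log, sqrt


def combined_log_sd(sd_a: float, sd_b: float, rho: float) -> float:
    """Standard deviation of log(a * b) when log(a) and log(b) have correlation rho."""
    return sqrt(sd_a ** 2 + sd_b ** 2 + 2 * rho * sd_a * sd_b)


# Assumption: a low/high band covers about +/- 2 sd in log space.
sd_signup = log(0.04 / 0.01) / 4      # signup-rate band, 1% to 4%
sd_conversion = log(0.08 / 0.02) / 4  # conversion band, 2% to 8%

for rho in (0.5, 0.0, -0.5):  # hypothetical correlations
    spread = combined_log_sd(sd_signup, sd_conversion, rho)
    print(f"rho={rho:+.1f}  combined log-sd={spread:.3f}")
```

The combined spread shrinks as the correlation goes negative and grows as it goes positive, which is the reason the worksheet reads the naive floor and ceiling as outer bounds here rather than as equally likely outcomes.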

Combined estimate

Point estimate (best-guesses): 40,000 x 2% x 4% x 12 = ~384 paid accounts/year (call it ~400).
Low (lows): 20,000 x 1% x 2% x 12 = ~48/year.
High (highs): 80,000 x 4% x 8% x 12 = ~3,072/year (call it ~3,000).
So the answer is roughly ~400 paid accounts in year one, plausibly between ~50 and ~3,000 - and, per the independence check, the true spread is likely tighter than that raw 50-to-3,000 because signup rate and conversion partly offset.
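
The three products above can be re-derived mechanically from the per-factor table. This is a small check, using exact fractions so the percentages do not pick up floating-point noise; the variable names are illustrative.

```python
from fractions import Fraction as F

# (low, best, high) per factor, copied from the per-factor bands table.
bands = {
    "monthly visitors": (20_000, 40_000, 80_000),
    "free-signup rate": (F(1, 100), F(2, 100), F(4, 100)),
    "free-to-paid rate": (F(2, 100), F(4, 100), F(8, 100)),
    "months": (12, 12, 12),
}

low = best = high = F(1)
for f_low, f_best, f_high in bands.values():
    low, best, high = low * f_low, best * f_best, high * f_high

print(int(low), int(best), int(high))  # 48 384 3072
```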

Dominant uncertainty

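By band width alone there is no single winner: visitors, signup rate, and conversion each span 4x from low to high, so each contributes equally to the 64x spread between floor and ceiling. The free-to-paid conversion rate is flagged as the dominant uncertainty because it is the least anchored of the three - traffic rests on Northwind’s own analytics, while the 2%-8% conversion band rests on general freemium lore and an unset tier design, so its true band may well be wider than stated. Tightening this one guess - by running a small free-tier pilot or finding a true comparable - is therefore the most valuable next step, ahead of refining traffic or signup rate. That, not the headline ~400, is the worksheet’s most useful output: it says where to spend effort before betting on the number.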
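
A quick way to compare how much each band contributes is to compute its high/low ratio. The sketch below uses the numbers from the per-factor table; with these bands the three uncertain factors come out equal, so the ratios alone do not rank them, and how well each guess is anchored has to break the tie.

```python
from math import prod

# (low, high) per factor from the per-factor bands table; rates are in percent.
bands = {
    "monthly visitors": (20_000, 80_000),
    "free-signup rate": (1, 4),
    "free-to-paid rate": (2, 8),
    "months": (12, 12),
}

span = {name: high / low for name, (low, high) in bands.items()}
print(span)
# {'monthly visitors': 4.0, 'free-signup rate': 4.0, 'free-to-paid rate': 4.0, 'months': 1.0}

print(prod(span.values()))  # 64.0, the ratio of the combined ceiling to the combined floor
```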

Note: the value is not the point estimate. It is that the worksheet (a) makes every assumption challengeable instead of hiding them in one number, (b) refuses to report ~400 without the ~50-3,000 band, (c) catches that two factors share a driver so the tails do not multiply naively, and (d) names conversion rate as the thing to de-risk first. A bare “we’d get about 400 signups converting” would have buried all four. And the honest caveat stands: this is an order-of-magnitude build-from-factors estimate, not a forecast - if a real reference class turns up, switch methods.

Grounding: the full evidence dossier

What the research does and does not show, with graded sources

Evidence Dossier: Fermi Estimation

Single source of truth for the fermi-estimation skill. The SKILL.md, sidecar, and evals derive from this. A moderate/practitioner-tier method (M/P) with a transferred-evidence flag: the controlled support is real but conditional, and the field track record is lore, not measurement.


Skill	`thinking-framework-skills.fermi-estimation` (installable name `think-fermi-estimation`)
Family	decision-and-option-evaluation
Evidence tier	M/P (some controlled support for the decomposition mechanism, capped below clean M; transferred-evidence)
Confidence	Moderate that multiplicative decomposition helps for extreme/uncertain quantities; low that it helps for ordinary ones
Status	draft (authored 2026-06-01 from the discovery corpus)

1. The mechanism (what actually does the work)

You need a number for which no lookup-able data exists (“how many paying accounts would a free tier convert in year one?”, “how much support load would self-serve signups add?”). The Fermi move is multiplicative decomposition of a magnitude: factor the unknown into a short chain of sub-quantities, each one guessable to within a factor (an order of magnitude or better), then multiply the chain back into a point estimate, and compound the per-factor bands into a low/high range.

The reason this can beat a single all-at-once guess is partial cross-factor error cancellation. If the per-factor errors are roughly independent and centered (over-guess one factor, under-guess another), they tend to offset in the product rather than compound. Stated more formally: multiplicative errors add in log space, so when the per-factor log-errors are independent and centered on zero, the spread of their sum grows roughly with the square root of the number of factors rather than linearly, and the product of several such factors is approximately log-normal (the log-normal / geometric-mean cancellation argument). The work is done by replacing one wild guess about a large magnitude with several smaller, more anchorable guesses whose errors partly wash out.

That same formalism names the failure condition: cancellation depends on the factors being independent. If two factors share a driver (both scale with the same underlying thing), their errors are correlated, do not offset, and the decomposition can be worse than a single guess.
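
The independence condition can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not evidence for the method: the per-factor log-error spread of 0.5, the three-factor chain, and the fully shared driver are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
N_TRIALS = 20_000
N_FACTORS = 3
SIGMA = 0.5         # hypothetical per-factor log-error spread


def log_error_independent() -> float:
    """Total log-error of the product when each factor errs independently."""
    return sum(random.gauss(0, SIGMA) for _ in range(N_FACTORS))


def log_error_shared_driver() -> float:
    """Total log-error when one shared driver moves every factor the same way."""
    shared = random.gauss(0, SIGMA)
    return N_FACTORS * shared


independent = [log_error_independent() for _ in range(N_TRIALS)]
shared = [log_error_shared_driver() for _ in range(N_TRIALS)]

sd_independent = statistics.pstdev(independent)
sd_shared = statistics.pstdev(shared)
print(f"independent: {sd_independent:.2f}   shared driver: {sd_shared:.2f}")
```

Under these assumptions, theory puts the independent case near SIGMA times the square root of 3 (about 0.87) and the shared-driver case near 3 times SIGMA (1.5): with a common driver the errors stack instead of partly washing out.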

2. Lineage

Named for Enrico Fermi and the back-of-the-envelope “Fermi problems” tradition (the canonical teaching example being “how many piano tuners in Chicago?”). The technique is a staple of physics pedagogy, quantitative-reasoning courses, and analyst/consulting case interviews.
The controlled-research thread is judgmental decomposition: MacGregor & Armstrong (1994), “Judgmental Decomposition: When Does It Work?” (International Journal of Forecasting); and D. MacGregor (2001), “Decomposition for judgmental forecasting and estimation,” in J. S. Armstrong (ed.), Principles of Forecasting. These study breaking an estimate into parts that are judged separately and recombined.

No trademark. Named descriptively after the technique, not licensed.

3. What the evidence shows, and what it does NOT show

Supported (the M half): judgmental decomposition - splitting an estimate into components, estimating each, and recombining - has some controlled support for reducing estimation error relative to a single holistic guess. MacGregor & Armstrong (1994) and MacGregor (2001) report that decomposition can improve accuracy. That is a real empirical signal for the underlying mechanism this skill uses, and it is why the grade is not pure practitioner-lore.

Three facts cap it below a clean M and demand honest framing:

The benefit is strictly conditional. It shows up for extreme, uncertain targets - quantities that are large and unfamiliar, where a holistic guess is badly anchored. For ordinary, familiar quantities the decomposition benefit was absent, and could even be negative (you add noise by guessing several things instead of one you already know). So this is not “decomposition always helps”; it is “decomposition helps where a direct guess is hopeless.”
The cancellation premise is fragile. The error-offsetting that makes multiplicative decomposition work assumes the component errors are roughly independent. Correlated component errors erode or reverse the benefit. Multiplicative chains are specifically sensitive to this, because errors that share a driver push in the same direction and compound in the product instead of offsetting.
The evidentiary base is thin. It is essentially a single multi-problem study line (MacGregor / MacGregor & Armstrong), plus field lore (the practitioner claim that Fermi estimates land “within an order of magnitude”), plus a statistical argument (the log-normal / geometric-mean cancellation result). It is not replicated across many independent labs and not meta-analytic. Treat it as “promising and partly demonstrated,” not “established.”

Explicitly NOT claimed - no laundered statistics. Practitioner and secondary write-ups float specific figures for the size of the decomposition effect (for instance a “median error factor of 99 for holistic guesses versus 3 for decomposed ones,” or “roughly a 42% error reduction”). Those numbers could not be verified to a primary source in this dossier’s research, and stating them would read as far more precise than the grade warrants. This skill therefore cites no effect-size figure. The honest claim is directional only: decomposition can reduce error for extreme uncertain quantities, under an independence condition, on a thin evidence base.

4. Transferred-evidence flag

All of the above is human-subject evidence: human estimators making judgmental forecasts. It has not been validated for an AI agent. Transferred, not AI-validated. A model can run the same multiplicative decomposition and may inherit the same correlated-factor trap, plus model-specific failure modes (over-confident sub-estimates, anchoring on a remembered figure). The AI value is real but unproven: the structure forces the agent to expose each factor, its basis, and its band, so a reader can challenge one number instead of one opaque total - and the worksheet makes the independence assumption and the dominant uncertainty inspectable.

5. When it works / when it fails

Works best when:

The target is a magnitude with no lookup-able data and no usable reference class - you genuinely have to build the number from factors.
The quantity is large and unfamiliar (the regime where the controlled benefit appeared).
The factors are roughly independent (do not share a single driver), so per-factor errors can partly cancel.
An order-of-magnitude answer with an honest band is useful (sizing, sanity-checking, triage), not a number that must be exact.

Fails or misleads when (poor-fit / anti-patterns):

A genuine reference class with real base-rate data exists. Then build the estimate from that data, not from invented factors - use think-reference-class-forecasting. Fermi is precisely the method for when no such class exists.
The task only needs the question decomposed for coverage, not a number. If the goal is a mutually-exclusive, collectively-exhaustive breakdown of a question (and explicitly no number), use think-issue-tree, which produces a tree, not an estimate.
The quantity is ordinary and familiar. Decomposing something you could estimate directly adds noise; the benefit was absent or negative in that regime.
Factors share a driver (correlated). Multiplicative error-cancellation fails; the chain can be worse than one careful guess. Flag it and stop, or restructure to independent factors.
A point estimate is emitted with no band. A Fermi number without its low/high range hides exactly the uncertainty the method is supposed to make legible. Never emit a point estimate alone.

6. Output artifact

A Fermi decomposition worksheet:

The target quantity (and its unit) stated precisely.
The multiplicative factor chain - the unknown written as a product of sub-quantities.
A per-factor band: low / best / high for each factor, each with the basis for the guess (where the number came from).
The combined point estimate (multiply the best-guesses) and a low/high range (compound the per-factor bands).
An explicit independence check that flags any correlated factors (factors sharing a driver), because correlation breaks the cancellation premise.
A dominant-uncertainty flag naming the one factor whose band most drives the width of the combined range - i.e. where to spend effort to tighten the answer.

7. Sources

MacGregor, D. G., & Armstrong, J. S. (1994). “Judgmental Decomposition: When Does It Work?” International Journal of Forecasting, 10(4), 495-506 (study of when decomposing an estimate into parts improves accuracy; benefit concentrated on extreme/uncertain quantities).
MacGregor, D. G. (2001). “Decomposition for judgmental forecasting and estimation.” In J. S. Armstrong (ed.), Principles of Forecasting. Kluwer.
Fermi-problem tradition / quantitative-reasoning and case-interview pedagogy (the “within an order of magnitude” field lore) - practitioner, not controlled.
Statistical argument: a product of independent factors is approximately log-normal; the geometric mean of independent over/under estimates cancels (the cross-factor error-cancellation premise).

Verification status: the existence of a conditional decomposition benefit (present for extreme/uncertain quantities, absent or negative for ordinary ones) and the independence sensitivity are the defensible claims and set the M/P grade. Specific effect-size numbers (e.g. “error factor 99 vs 3”, “42% reduction”) are deliberately omitted as unverifiable to a primary source. The honest scope - “directional help for build-from-factors magnitudes under an independence condition, on a thin base, human-subject not AI-validated” - is the core caveat.
