ICE / RICE / WSJF

Status: Documented, not shipped · Evidence: V · Family: Decision and option evaluation · Verdict: reject (2026-06-03)

SAFe and WSJF are trademarks of Scaled Agile, Inc.; RICE originates at Intercom; ICE is attributed to Sean Ellis.

What it is

This entry is not one method but the three most widely used backlog-scoring formulas in product and growth work, grouped because all three do the same thing: turn a small set of subjective ratings into a single number you sort a backlog by.

ICE (Sean Ellis): score each idea on Impact, Confidence, and Ease, each roughly 1-10, then multiply the three. Highest product wins. Built for ranking growth experiments fast.
RICE (Sean McBride at Intercom): score Reach (people affected per period), Impact (a fixed scale, 3 = massive down to 0.25 = minimal), and Confidence (a percentage), then divide by Effort in person-months: (Reach x Impact x Confidence) / Effort. A value-per-effort ratio.
WSJF (Don Reinertsen, adopted by SAFe): Weighted Shortest Job First = Cost of Delay / Job Size, where in SAFe Cost of Delay = Business Value + Time Criticality + Risk Reduction/Opportunity Enablement (a single combined term, not a division), each rated on a Fibonacci scale. Highest ratio is sequenced first.
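
The three formulas as described above can be sketched in a few lines. This is a minimal illustration, not any vendor's reference implementation: the function names, the backlog items, and every rating are invented for the example, and Confidence is assumed to be passed as a fraction (0.8 for 80 percent).

```python
def ice(impact: float, confidence: float, ease: float) -> float:
    """ICE as described above: product of three ratings, each roughly 1-10."""
    return impact * confidence * ease


def rice(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE: (Reach x Impact x Confidence) / Effort.

    Assumption: confidence is a fraction (0.8 = 80 percent) and effort is
    in person-months.
    """
    if effort <= 0:
        raise ValueError("effort must be positive")
    return reach * impact * confidence / effort


def wsjf(business_value: float, time_criticality: float,
         risk_opportunity: float, job_size: float) -> float:
    """WSJF: Cost of Delay / Job Size, with Cost of Delay as a three-term sum."""
    if job_size <= 0:
        raise ValueError("job_size must be positive")
    cost_of_delay = business_value + time_criticality + risk_opportunity
    return cost_of_delay / job_size


# Hypothetical backlog, scored with RICE and sorted highest first.
backlog = {
    "onboarding-tour": rice(reach=500, impact=2, confidence=0.8, effort=2),
    "dark-mode": rice(reach=2000, impact=0.5, confidence=0.5, effort=1),
}
ranked = sorted(backlog.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.0f}")
```

Every argument here is a subjective rating (only Reach leans toward a countable estimate), so the sorted output is an ordering of opinions, whatever the number of digits printed.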

The durable move under all three is the same one, and it is worth naming honestly: weighted multi-criteria scoring of a list of options. You pick a few criteria, rate each option on each criterion, combine the ratings with a fixed arithmetic rule (multiply, or value-over-cost), and read off a rank. That is a real and useful operation. It is also exactly the operation the catalog already ships, with the criteria named explicitly rather than pre-canned. The three brands differ only in their fixed presets: which criteria are baked in (Reach/Impact/Confidence/Effort vs Impact/Confidence/Ease vs Cost-of-Delay/Job-Size), what scale each rating uses, and whether the combining rule is a product or a ratio. The presets are the packaging; the cognitive move is the same scoring move in each case.

The reason this entry carries a false-precision flag rather than a clean grade is that the packaging adds a specific hazard the bare move does not: each formula multiplies or divides several coarse, subjective ratings and presents the result as a precise score (RICE 4.8 beats RICE 4.6), which reads as measurement when it is opinion run through arithmetic.

When it helps / when it misleads

As lightweight presets these genuinely help when a team has a long undifferentiated backlog, needs a shared and fast way to argue about order, and the cost of getting any single rank slightly wrong is low. The honest value is forcing the conversation: ICE makes a growth team say out loud how confident they actually are; RICE’s Reach term pushes people from “I like this” toward a countable estimate of who it touches; WSJF’s Cost-of-Delay framing reminds a team that delivering value sooner is itself worth money. Used as a structured prompt to surface and compare assumptions, the scoring works.

They mislead, or manufacture confidence, when:

The score is treated as a measurement instead of a sorted opinion. Impact “7” means different things to different people on different days; the same idea scored by two teammates can land far apart because nothing pins the scale. The number looks objective; it is a vote in disguise.
Coarse ratings are multiplied and divided as if they were real quantities. This is the central technical objection. The Fibonacci and 1-to-10 scales these formulas use are ordinal (a ranking), not ratio (a true zero and equal intervals). Multiplying and dividing ordinal ratings is not a legitimate arithmetic operation, so the product 4.8 is not “20 percent better than” 4.0 in any defensible sense - it is two rank orders that have been multiplied together.
Error compounds instead of cancelling. Because the formulas multiply and divide rather than add, the margins of error on each input combine: to first order, the relative errors of multiplied and divided inputs add, so a plus-or-minus 20 percent uncertainty on each of four inputs can, in the worst case, yield roughly an 80 percent uncertainty on the resulting score. The precise-looking output is less reliable than any one of the rough inputs that fed it.
The score becomes the decision. When a roadmap is sorted purely by RICE or WSJF, the formula quietly fixes the weighting (Effort always divides; Confidence always multiplies) and hides the judgment calls inside opaque ratings. A “2.3 vs 2.1” gap is treated as a verdict when it is well inside the noise, and genuinely strategic bets lose to easy wins because Effort sits in the denominator.
A preset stands in for the rigorous move. If the job is to compare options against the criteria that actually matter for this decision, with weights you can see and defend, that is decision-option-review. If you cannot put options on a stable scale at all, the honest move is pairwise-comparison. If real probabilities and payoffs are available, it is expected-value-decision-tree. Reaching for a branded scorer in those cases buys a fuzzier version of a tool the catalog already ships.
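
The compounding arithmetic behind the error objection can be checked directly. The sketch below assumes a RICE-shaped score (three inputs multiplied, one divided) with every input off by the same relative amount; the function names are hypothetical. It computes three standard quantities: the first-order worst case, where relative errors simply add; the root-sum-of-squares estimate that applies when the input errors are independent; and the exact worst-case range.

```python
import math


def first_order_worst_case(rel_err: float, n_inputs: int) -> float:
    """First-order worst case: relative errors of multiplied/divided inputs add."""
    return rel_err * n_inputs


def independent_estimate(rel_err: float, n_inputs: int) -> float:
    """First-order estimate for independent errors: root-sum-of-squares."""
    return rel_err * math.sqrt(n_inputs)


def exact_worst_case(rel_err: float, n_multiplied: int, n_divided: int) -> tuple[float, float]:
    """Exact (lowest, highest) relative change in the score when every
    input is off by rel_err in the least and most favourable direction."""
    high = (1 + rel_err) ** n_multiplied / (1 - rel_err) ** n_divided
    low = (1 - rel_err) ** n_multiplied / (1 + rel_err) ** n_divided
    return low - 1, high - 1


# RICE shape: Reach, Impact, Confidence multiplied; Effort divided.
low, high = exact_worst_case(0.20, n_multiplied=3, n_divided=1)
print(f"first-order worst case: +/-{first_order_worst_case(0.20, 4):.0%}")
print(f"independent errors:     +/-{independent_estimate(0.20, 4):.0%}")
print(f"exact worst-case range: {low:+.0%} to {high:+.0%}")
```

With 20 percent on each of four inputs, the first-order worst case is the 80 percent figure, independent errors give about 40 percent, and the exact worst-case range is asymmetric at roughly minus 57 to plus 116 percent. Under any of the three readings, a gap like 2.3 versus 2.1 (under 10 percent) sits well inside the uncertainty.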

What the evidence says

Honest governing grade: V (vendor / practitioner-marketing tier). These are useful, real, widely adopted heuristics, but the evidence for them as better decision procedures does not exist, and the way they are usually presented imports a false-precision problem. The flag on this entry is exactly that: include it only with the false-precision caveat.

What the record supports. All three are real, named, traceable, and heavily used. ICE has been a default in growth teams for over fifteen years; RICE was published by Intercom in 2016 and became a product-management staple; WSJF is the SAFe-recommended sequencing method, inherited from a serious body of lean product-development thinking. The qualitative logic behind WSJF is the strongest piece here: Reinertsen’s argument that you should quantify and prioritize by Cost of Delay, and prefer shorter jobs of equal value, is sound queueing-theory reasoning about flow. That is a defensible economic principle. It is not the same claim as “scoring your backlog with these specific Fibonacci ratings produces better outcomes than not.”

What the record does NOT support. There is no controlled or comparative study I can locate that measures ICE, RICE, or WSJF against any alternative on decision quality or downstream outcomes. The critique literature, by contrast, is specific and converges. On measurement theory, S. S. Stevens’ classification of scales establishes that multiplication and division are permissible only on ratio scales, not on the ordinal scales these formulas use - so the arithmetic at the heart of all three is operating on numbers that do not license it. On error propagation, practitioner analyses note that because the formulas multiply and divide rather than add, input errors compound (the roughly plus-or-minus 80 percent example above) instead of cancelling. On reliability, the same critiques document that scores vary widely with who is scoring and when, because nothing anchors “Impact = 7.” The fair reading is: the underlying move (weighted multi-criteria scoring) is a legitimate practitioner technique, but these branded presets add a precision claim the inputs cannot bear, and no evidence shows the presets beat naming your own criteria.
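
One way to make the measurement-theory objection concrete: if ratings carry only order, then relabeling the scale with any order-preserving map should leave the ranking unchanged. Under multiplication it does not. The sketch below uses two invented ideas with invented ICE-style ratings and shifts every rating by 10 (that is, scoring on 11-20 instead of 1-10); the helper name and data are hypothetical.

```python
from math import prod

# Hypothetical (impact, confidence, ease) ratings on a 1-10 scale.
ideas = {
    "A": (9, 2, 5),
    "B": (5, 5, 4),
}


def rank_by_product(ratings, relabel=lambda r: r):
    """Rank ideas by the product of their (optionally relabeled) ratings."""
    scores = {
        name: prod(relabel(r) for r in triple)
        for name, triple in ratings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Original labels: A = 9*2*5 = 90, B = 5*5*4 = 100, so B ranks first.
original = rank_by_product(ideas)

# Same orderings on an 11-20 scale: A = 19*12*15 = 3420, B = 15*15*14 = 3150,
# so A now ranks first even though no individual rating changed its order.
shifted = rank_by_product(ideas, relabel=lambda r: r + 10)

print(original, shifted)
```

Nothing about either idea changed between the two runs; only the labels on the scale did. A ranking that depends on the labels is relying on information an ordinal scale does not contain.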

Transferred-evidence flag (required). Everything here - the adoption history, the measurement-theory objection, the error-propagation analysis - comes from human product, growth, and statistics contexts. None of it studies any of these formulas executed by or with an AI agent. The evidence is transferred from human use and not validated for AI-augmented prioritization.

Excluded figures (required). No effect-size or “teams using RICE ship N percent more” figure survives the evidence rule, because no traceable primary study produces one; any such number circulating on vendor blogs is excluded and has not influenced the grade. The plus-or-minus 80 percent compounding figure is used only as an illustrative arithmetic consequence of multiplying and dividing four uncertain inputs, not as a measured empirical result.

Why it is / is not a skill here

Verdict: Reject as a standalone skill; documented with a false-precision flag (status: flag, tier V). The library does not ship ICE, RICE, or WSJF as a skill, for two reinforcing reasons.

One - the move is already shipped, with the false precision removed. The Build burden is to name one distinct, durable cognitive move that no shipped skill produces. These three cannot meet it: their shared mechanism is weighted multi-criteria scoring of a list of options, which is exactly what think-decision-option-review does (the registry already folds the generic multi-criteria-decision-analysis into it). The branded scorers are that skill with the criteria pre-canned to a fixed set, the weights hard-wired into the formula, and an ordinal-times-ordinal arithmetic stapled on top. Where decision-option-review makes the criteria and weights explicit and inspectable, the branded presets hide them inside Reach/Impact/Confidence/Effort and a single product or ratio. For the two cases the scorers handle badly - options you cannot put on a stable scale, or options where real probabilities exist - the catalog already has the better-matched moves: pairwise-comparison (rank without an absolute scale) and expected-value-decision-tree (price genuine uncertainty). There is no residual cognitive move left to ship; only the brand-specific templates, and a template is a prompt, not a new mechanism.

Two - shipping a branded scorer would import the false-precision hazard the catalog exists to strip out. The recurring pattern in this library is to keep the durable move and discard the marketing claim attached to it (the same disposition applied to six-thinking-hats and to the eisenhower-moscow-pareto bundle, where this very entry is already cross-referenced as the flagged sibling vendor scorer). Here the marketing claim is the precise-looking number. Shipping think-rice or think-wsjf as a first-class skill would put the library’s name behind multiplying ordinal ratings and reading a four-decimal verdict off the noise - the opposite of the honesty the catalog is built on. Documenting it with the false-precision caveat, and routing users to decision-option-review with their criteria named explicitly, keeps the useful move and drops the false confidence.

Why flag rather than fold, recipe, or pm: it is recorded as flag (not a clean fold) because there is no single shipped target it collapses into - the scoring reading is decision-option-review, the no-stable-scale reading is pairwise-comparison, the priced-uncertainty reading is expected-value-decision-tree, so the honest map is one-to-three, exactly as with the Eisenhower/MoSCoW/Pareto bundle. It is not a recipe (it is three independent presets, not a fixed multi-skill chain). And it is not handed off to pm-skills: even though RICE and WSJF live mostly in product and agile practice, the underlying move - weighted scoring of options - is a general thinking job whose rigorous form already ships here. The learning value of the flag: three of the most popular prioritization tools in product work share one legitimate move and one shared hazard, and the right use is to treat the score as a structured opinion to argue about, not a measurement to obey.

Lineage and who to read

ICE was created by Sean Ellis, the growth practitioner who coined the term “growth hacking,” to rank growth experiments quickly at companies including LogMeIn and Dropbox. Impact x Confidence x Ease, each scored about 1-10. It is a generic acronym in common use, not a registered product; documented descriptively. The form to read for the honest framing is Ellis’s own growth writing and his later book Hacking Growth (Ellis and Brown, 2017).
RICE was published by Sean McBride and the product team at Intercom in 2016 (“RICE: Simple prioritization for product managers,” the Intercom blog), as an internal standard for choosing what to build next: (Reach x Impact x Confidence) / Effort. RICE’s one genuine improvement over ICE is the Reach term, which pushes one input toward a countable fact rather than pure opinion. RICE originates at Intercom; the acronym itself is generic and is documented descriptively.
WSJF comes from Donald G. Reinertsen, whose The Principles of Product Development Flow (2009) made the economic case for quantifying Cost of Delay and sequencing by value-over-duration, drawing on queueing theory and lean product development. It was then adopted as the recommended prioritization method in the Scaled Agile Framework (SAFe) by Scaled Agile, Inc., in the now-standard Fibonacci-rated form (Cost of Delay = Business Value + Time Criticality + Risk Reduction / Opportunity Enablement, over Job Size). SAFe and WSJF are trademarks of Scaled Agile, Inc.; the queueing principle beneath WSJF is generic and is the part worth reading Reinertsen for.
For the critical read, pair the brands with the measurement objection: S. S. Stevens, “On the Theory of Scales of Measurement” (Science, 1946), establishes that multiplication and division are licensed only on ratio scales, which is why multiplying the ordinal ratings inside these formulas is not a legitimate operation. Read the brand documentation for the procedure and Stevens for why the resulting number is not the measurement it looks like.

Named sources

Sean Ellis (with Morgan Brown), Hacking Growth (Crown Business, 2017), and Ellis’s growth-experiment writing. Origin and intended use of ICE (Impact x Confidence x Ease) for ranking experiments. Practitioner/foundational; no controlled evaluation of the scoring as a method. (V/P)
Sean McBride / Intercom, “RICE: Simple prioritization for product managers” (Intercom blog, 2016). The origin and canonical statement of RICE, (Reach x Impact x Confidence) / Effort. Practitioner/foundational; documents the method, not its comparative effectiveness. (V/P)
Donald G. Reinertsen, The Principles of Product Development Flow: Second Generation Lean Product Development (Celeritas, 2009). The economic and queueing-theory case for Cost of Delay and shortest-weighted-job-first sequencing that WSJF operationalizes. Strong for the underlying flow principle; not a controlled study of the Fibonacci-scored WSJF procedure. (P, for the principle)
Scaled Agile, Inc., “WSJF” (Scaled Agile Framework guidance). The authoritative SAFe definition of WSJF and its Fibonacci-rated Cost-of-Delay-over-Job-Size form. Practitioner/vendor; the source of the specific scored procedure, with no comparative effectiveness evidence. (V)
S. S. Stevens, “On the Theory of Scales of Measurement,” Science 103(2684) (1946): 677-680. Establishes nominal/ordinal/interval/ratio scales and the permissible operations on each; multiplication and division are admissible only on ratio scales. The basis for the objection that multiplying the ordinal ratings inside ICE/RICE/WSJF is illegitimate. (foundational, measurement theory)
Practitioner error-propagation critiques of WSJF and RICE (for example the “Why WSJF is Nonsense” analysis and the RICE compounding-error write-ups). Document the ordinal-scale and divide-rather-than-add objections and the resulting score variability across raters. Used to locate the false-precision hazard, not as primary effectiveness evidence. (critique/practitioner)

Excluded under the evidence rule: any “teams using RICE/WSJF ship X percent more / decide Y percent faster” figure traces to vendor or productivity-site marketing, not a primary study, and is excluded; it has not influenced the V grade. The roughly plus-or-minus 80 percent compounding figure is reported as an arithmetic consequence of multiplying and dividing four uncertain inputs, not as a measured empirical result.

Thinking Framework Skills v0.8.0 · 56 frameworks