Expected Value Decision Tree

When a decision turns on outcomes you do not control, the reflex is to argue the options in prose and decide on a hunch. An expected-value decision tree refuses that. It prices the uncertainty: lay out the options as a tree of choice nodes (branches the decider controls) and chance nodes (branches nature controls, each carrying a probability), put values at the leaves, then roll the tree back right to left so every chance node collapses to its expected value (the sum of probability times value) and every choice node keeps its best branch. What survives the rollback is the highest-EV option and the path that produces it. The load-bearing ingredient, the thing a deterministic option matrix cannot express, is the chance node. The output is a decision tree with rolled-back EVs, the chosen branch, and a what-flips-it note, never a bare EV number presented as the answer.

When to Use

A decision genuinely hinges on uncertain outcomes you can put rough, sourceable probabilities on (a launch with a real failure rate, an investment with uncertain payoffs).
The structure is sequential - a choice now opens chance events that open later choices (“test first, then decide” vs “commit now”).
The stakes justify making the probability assumptions explicit and inspectable, so a disagreement becomes a disagreement about a named number rather than a clash of intuitions.
You already have a probability to work with (or can source one), and the remaining question is what to do with it.

When NOT to Use

The probabilities and values are guessed and then trusted. A tree renders fabricated inputs in the authoritative grammar of arithmetic, manufacturing false precision - the central failure mode. A number with no defensible source does not become trustworthy by being multiplied. Where the probability is the hard part, source a base rate with think-reference-class-forecasting instead of inventing one inside the tree.
The decision is a one-shot with intolerable downside. EV is an average over many independent repetitions; the law of large numbers guarantees convergence across many bets, not on the single bet in front of you. A positive-EV gamble that includes a small chance of ruin is the wrong call for a one-time decision. The criterion there is risk of ruin or a risk-averse utility, not raw EV - treating the average as the answer is a category error.
It is mistaken for descriptive truth. EV is normative (what a coherent decider should do given those numbers), not a description of good judgment. People predictably depart from it via the certainty effect and nonlinear probability weighting (Allais 1953; prospect theory, Kahneman and Tversky 1979), and some of those departures are real risk preferences. The tool’s job is to surface the tradeoff, not to declare the risk-neutral answer “correct” and the decider’s risk aversion a bias.
The outcome space cannot be enumerated or priced. Deep uncertainty (you cannot even list the outcomes) and incommensurable values that resist a common scale both break the rollback and produce tidy-but-fictional EVs.
The call is reversible and low-stakes. A two-way door does not need a tree; building one is its own small over-process. Triage with think-one-way-vs-two-way-door first, before reaching for quantitative machinery.
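The one-shot ruin caveat above can be made mechanical. The following is a minimal sketch, not part of the skill itself: the gamble, the `ruin_floor` threshold, and the function names are all hypothetical, chosen only to show a positive-EV bet that still fails the ruin check.

```python
from typing import List, Tuple

Outcome = Tuple[float, float]  # (probability, value in a common unit)


def expected_value(outcomes: List[Outcome]) -> float:
    """Probability-weighted sum of the outcome values."""
    return sum(p * v for p, v in outcomes)


def ruin_probability(outcomes: List[Outcome], ruin_floor: float) -> float:
    """Total probability of landing at or below the non-recoverable loss."""
    return sum(p for p, v in outcomes if v <= ruin_floor)


# Hypothetical one-shot gamble: usually a gain, rarely a fatal loss.
gamble = [(0.95, 500.0), (0.05, -4000.0)]
ruin_floor = -2000.0  # assumed: any loss this large cannot be recovered

ev = expected_value(gamble)
p_ruin = ruin_probability(gamble, ruin_floor)

# Positive EV plus any ruin probability means raw EV should not decide alone.
raw_ev_is_wrong_criterion = ev > 0 and p_ruin > 0
print(round(ev, 6), round(p_ruin, 6), raw_ev_is_wrong_criterion)
```

The flag does not pick the answer; it only marks that risk of ruin or a risk-averse utility should govern instead of the average.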

Instructions

When asked to choose under uncertainty by pricing outcomes, follow these steps:

Frame the decision and the options. State the one-line choice and list the real, distinct actions under consideration. If the call is reversible and low-stakes, stop and triage with think-one-way-vs-two-way-door instead of building a tree.
Lay out the tree. For each option, draw the sequence of choice nodes (branches the decider controls) and chance nodes (branches nature controls). Put the outcomes at the leaves.
Source the probabilities, do not invent them. For every chance-node fan, assign probabilities that sum to 1, and name where each number came from (a base rate via think-reference-class-forecasting, a measured rate, a stated assumption). Flag any probability that is a guess at the node it enters.
Price the outcomes. Put a value on each leaf in a common unit. Note any outcome whose value resists a common scale (an incommensurable cost), rather than forcing a fake number.
Roll the tree back (fold back), right to left. At each chance node, replace the fan with its expected value (the sum of probability times value). At each choice node, keep the best-EV branch and prune the rest. Carry the arithmetic explicitly so it can be checked.
Run the what-flips-it (sensitivity) step. Identify the single probability or value the recommendation is most fragile to, and state the threshold at which it would flip the chosen branch. This is the deliverable’s spine, not an optional extra.
Check for ruin and risk attitude before recommending. If any branch carries a small probability of an intolerable, non-recoverable loss on a one-shot decision, say so and flag that raw EV is the wrong criterion (risk of ruin or a risk-averse utility governs). If the decider’s risk aversion is a real preference, surface it rather than overriding it with the risk-neutral EV.
Recommend and emit the artifact. State the chosen option, its EV, the path that produces it, the what-flips-it note, and any ruin or incommensurability flags, per references/TEMPLATE.md.
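The roll-back step above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the skill's implementation: the `Leaf`, `Chance`, and `Choice` classes, the `rollback` function, and the toy tree are all hypothetical names and numbers introduced here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    value: float  # priced outcome in the common unit


@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]  # (probability, child); nature controls


@dataclass
class Choice:
    options: List[Tuple[str, "Node"]]  # (label, child); decider controls


Node = Union[Leaf, Chance, Choice]


def rollback(node: Node) -> Tuple[float, List[str]]:
    """Return (expected value, labels of the choices kept before any chance node)."""
    if isinstance(node, Leaf):
        return node.value, []
    if isinstance(node, Chance):
        total = sum(p for p, _ in node.branches)
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"chance fan sums to {total}, not 1")
        # A chance node collapses to the probability-weighted sum of its children.
        # Later choices depend on which branch occurs, so the path stops here.
        return sum(p * rollback(child)[0] for p, child in node.branches), []
    # A choice node keeps its best-EV branch and prunes the rest.
    best_label, best_ev, best_path = "", -math.inf, []
    for label, child in node.options:
        ev, path = rollback(child)
        if ev > best_ev:
            best_label, best_ev, best_path = label, ev, path
    return best_ev, [best_label] + best_path


# Toy tree (hypothetical numbers): a sure 10 versus a 50/50 shot at 40 or -10.
tree = Choice([
    ("safe", Leaf(10.0)),
    ("risky", Chance([(0.5, Leaf(40.0)), (0.5, Leaf(-10.0))])),
])
ev, path = rollback(tree)
print(ev, path)
```

The sketch covers only the arithmetic of step 5. Sourcing the probabilities, the what-flips-it step, and the ruin check still have to be done around it.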

Output Format

Use the template in references/TEMPLATE.md. The deliverable is the filled tree - options, the choice/chance structure, sourced probabilities, the rolled-back EVs with arithmetic shown, the recommendation and path, the what-flips-it note, and a ruin/risk flag - not a prose essay and not a bare EV number.

Quality Checklist

Before finalizing, verify:

The tree separates choice nodes (decider controls) from chance nodes (nature controls), and each chance fan’s probabilities sum to 1.
Every probability names its source, and any guessed probability is flagged at the node it enters - no fabricated input is laundered into the arithmetic.
Outcomes are priced in a common unit; any incommensurable value is noted, not forced into a fake number.
The rollback is shown right to left with explicit arithmetic (chance node to EV, choice node to best branch), so it can be checked.
A what-flips-it note names the single probability or value the recommendation is most fragile to and the threshold at which the choice flips.
A ruin check is run - any small-probability intolerable one-shot loss is flagged, with the note that raw EV is the wrong criterion there.
The output is the tree artifact with the chosen path, not a bare EV presented as the answer.
No overclaiming: the evidence is practitioner-grade and transferred; claim “prices uncertain outcomes and makes the assumptions inspectable,” not a measured gain in decision outcomes (see evidence/dossier.md).
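Two of the checklist items (each chance fan sums to 1; every probability names a source) lend themselves to a mechanical check. A minimal sketch, assuming a hypothetical representation where each branch is a dict with `p` and `source` keys; the function name and the data are illustrative only.

```python
import math
from typing import Dict, List


def check_fan(name: str, branches: List[Dict]) -> List[str]:
    """Return a list of checklist problems for one chance-node fan."""
    problems = []
    total = sum(b["p"] for b in branches)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"{name}: probabilities sum to {total:.3f}, not 1")
    for b in branches:
        # An empty or missing source means the number is unsourced.
        if not b.get("source"):
            problems.append(f"{name}: p={b['p']} has no named source")
    return problems


good_fan = [
    {"p": 0.35, "source": "base rate"},
    {"p": 0.65, "source": "base rate"},
]
bad_fan = [
    {"p": 0.5, "source": "measured rate"},
    {"p": 0.4, "source": ""},  # unsourced, and the fan does not sum to 1
]

print(check_fan("conversion", good_fan))
print(check_fan("pilot", bad_fan))
```

A check like this catches structural slips only; whether a named source is actually defensible remains a judgment call.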

Evidence

Tier P (governing; honest read split, capped at P). The split must not be laundered upward. That expected-value or expected-utility maximization is the normatively correct rule given coherent probabilities and utilities rests on von Neumann and Morgenstern (1944) and Savage (1954) - an S-tier mathematical result, but it measures the wrong thing for a skill (it establishes the rule given the inputs; it is not evidence that drawing a tree decides better than a cheaper rule). The claim a skill actually makes - building a tree and computing EV makes a real decider’s decisions better than the rule they would otherwise use - has only practitioner-level, transferred support (clinical decision analysis: Raiffa 1968; Pauker and Kassirer 1980; Bae 2014). The one nameable comparative finding is mixed and indirect (Mhaskar et al. 2014: decision-analysis results concorded with matching RCT systematic reviews in 73% of cases, 27/37, and with single RCTs in only 50%) - it bounds reliability, it does not lift the grade. Every effectiveness datum is from human deciders; none is from an AI-produced EV tree, so the evidence is transferred and the conservative P governs. No “decision-tree analysis improves decisions by N%” figure traces to a primary source, and none is asserted. Full grading, sources, and caveats: evidence/dossier.md.

Examples

See references/EXAMPLE.md for a completed expected-value decision tree on a real decision.

Deep dive: worked example

A full worked run (the shared Northwind scenario)

Expected Value Decision Tree - Worked Example

A completed run of the think-expected-value-decision-tree skill on a real, consequential decision. This is the quality bar a generated tree should meet.

Uses the shared recurring scenario (Northwind, a B2B SaaS weighing a self-serve free-tier launch) so examples across skills read as one coherent product. Where think-decision-option-review scored the growth options on weighted criteria, this skill takes the one option whose payoff genuinely hinges on an uncertain outcome - the free-to-paid conversion rate - and prices it. See docs/internal/AUTHORING.md.

Decision

Should Northwind launch the self-serve free tier now, run a paid pilot first to learn the conversion rate before committing, or not launch at all?

Options

A: Launch now - ship the free tier to the whole market this quarter.
B: Test first - run a 6-week paid conversion pilot to a matched segment, then decide go / no-go on the real rate.
C: No-go - stay on the current paid-trial motion.

Tree

Choice nodes are squares; chance nodes are circles. The load-bearing uncertainty is the same in A and B - whether free-to-paid conversion lands high or low - but A bets on it blind while B buys the signal first. Values are annualized 12-month contribution in $K, net of infra, support, and cannibalization. Base rate for “high” is sourced, not guessed (see the probability note).

[Decision]
 |
 |--[A: Launch now] ---( conversion )-- p=0.35 high [base rate] --> +1200
 |                                      p=0.65 low  [base rate] --> -300
 |
 |--[B: Test first] -- pilot cost -80 --( pilot signal )-- p=0.35 "high" --> [commit] --> +1200
 |                                                         p=0.65 "low"  --> [do not launch] --> 0
 |
 |--[C: No-go] --> 0   (status-quo baseline, no chance node)

Outcome values

High-conversion launch: +1200 (durable self-serve motion pays for the infra and lifts paid pipeline).
Low-conversion launch: -300 (infra + support + some paid cannibalization, with too-thin paid uptake to cover it).
Pilot cost: -80 (6 weeks of build + run, paid before the signal arrives).
No launch: 0 (baseline).
Common unit: $K annualized contribution vs the status quo.
Incommensurable / unpriced: board optics of shipping (or not shipping) a visible “free tier”, and brand signal. Real, but left out of the arithmetic rather than given a fake dollar value - flagged here so they are weighed as judgment, not laundered into the EV.

Rollback (fold back, right to left)

Probability note: p(high) = 0.35 is a base rate from comparable self-serve B2B launches at this ACP, sourced with think-reference-class-forecasting, not invented inside the tree. The pilot in B is treated, for this illustration, as a clean read of which branch is true; in practice it is itself noisy, and an imperfect read is worth less than a perfect one, so B’s +340 is an upper bound and its margin over A should be rechecked against the pilot’s real accuracy.
Option A chance node: EV = 0.35 x 1200 + 0.65 x (-300) = 420 - 195 = +225.
Option B pilot signal node: EV = 0.35 x 1200 + 0.65 x 0 = +420, then subtract the pilot cost 80 -> +340. (At the choice node after each read, the fold-back keeps the better branch: commit after a “high” read, since +1200 beats 0, and do not launch after a “low” read, since 0 beats -300. That option to not launch after a “low” read is exactly what the test buys.)
Option C: 0.
Per-option EV: A = +225; B = +340; C = 0.
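The rollback arithmetic above can be re-run as a check. A minimal sketch using the example's own numbers; the variable names are illustrative, and the pilot is treated as a clean read, as in the example.

```python
P_HIGH = 0.35          # sourced base rate from the example
HIGH, LOW = 1200.0, -300.0   # leaf values in $K
PILOT_COST = 80.0
NO_LAUNCH = 0.0

# Option A: a single chance node, committed blind.
ev_a = P_HIGH * HIGH + (1 - P_HIGH) * LOW

# Option B: after each pilot read there is a choice node (commit vs do not launch);
# keep the better branch at each, then take the expectation and subtract the cost.
after_high_read = max(HIGH, NO_LAUNCH)
after_low_read = max(LOW, NO_LAUNCH)
ev_b = P_HIGH * after_high_read + (1 - P_HIGH) * after_low_read - PILOT_COST

# Option C: status-quo baseline, no chance node.
ev_c = NO_LAUNCH

evs = {"A": round(ev_a, 6), "B": round(ev_b, 6), "C": round(ev_c, 6)}
best = max(evs, key=evs.get)
print(evs, best)
```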

Recommendation

Chosen: B, Test first - EV +340, the highest of the three.
Path that produces it: run the paid pilot; commit to the full launch only on a “high” read, walk away on a “low” read. The pilot’s value is that it converts the blind -300 downside of A into an avoidable 0, for an 80 premium.
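The pilot's value can be priced directly, and the same fold-back can be rerun for a pilot that sometimes misreads. The sketch below is an assumption-laden illustration: `accuracy` (the chance the pilot reports the true branch, taken as symmetric for both branches) is a hypothetical parameter, not a figure from the example, and the function name is invented here.

```python
def ev_test_first(accuracy: float, p_high: float = 0.35, high: float = 1200.0,
                  low: float = -300.0, cost: float = 80.0) -> float:
    """EV of piloting first when the pilot reports the true branch with `accuracy`."""
    total = 0.0
    for read_is_high in (True, False):
        # Joint probabilities of (true branch, this pilot read).
        if read_is_high:
            joint_high = accuracy * p_high
            joint_low = (1 - accuracy) * (1 - p_high)
        else:
            joint_high = (1 - accuracy) * p_high
            joint_low = accuracy * (1 - p_high)
        launch = joint_high * high + joint_low * low
        # Choice node after the read: launch, or do not launch (worth 0).
        total += max(launch, 0.0)
    return total - cost


ev_launch_now = 0.35 * 1200.0 + 0.65 * (-300.0)

# Value of a perfect, free signal over committing blind.
evpi = ev_test_first(1.0, cost=0.0) - ev_launch_now

clean = ev_test_first(1.0)   # the clean-read case used in the example
noisy = ev_test_first(0.8)   # hypothetical 80%-accurate pilot
print(round(evpi, 6), round(clean, 6), round(noisy, 6))
```

Under these assumptions a perfect signal is worth 195 against an 80 price, which reproduces B's 115 edge over A. At a hypothetical 80% accuracy the same calculation gives B about 217, below A's 225, so the pilot's real accuracy is itself an input worth sourcing.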

What-flips-it (sensitivity)

Most fragile input: p(high), the probability that free-to-paid conversion lands in the high branch.
Flip threshold: B beats A as long as p(high) is below 0.73; above that, the value of waiting no longer justifies the 80 pilot premium and Launch now (A) wins. We are at 0.35, far on the test-first side - the recommendation is robust. (For completeness: A only beats No-go once p(high) > 0.20, so even committing blind is positive-EV here, but it leaves 115 of avoidable downside on the table versus testing.)
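The two thresholds quoted above follow from setting the per-option EVs equal. A minimal sketch with the example's numbers; the function names are illustrative, and the pilot is again treated as a clean read.

```python
HIGH, LOW, PILOT_COST = 1200.0, -300.0, 80.0


def ev_launch_now(p: float) -> float:
    return p * HIGH + (1 - p) * LOW


def ev_test_first(p: float) -> float:
    # A clean "low" read leads to no launch (0), so only the high branch pays.
    return p * HIGH - PILOT_COST


# B = A  ->  p*HIGH - cost = p*HIGH + (1 - p)*LOW  ->  p = 1 + cost/LOW
flip_b_vs_a = 1 + PILOT_COST / LOW
# A = 0  ->  p*HIGH + (1 - p)*LOW = 0  ->  p = -LOW / (HIGH - LOW)
flip_a_vs_c = -LOW / (HIGH - LOW)

print(round(flip_b_vs_a, 4), round(flip_a_vs_c, 4))
print(round(ev_test_first(0.35) - ev_launch_now(0.35), 6))
```

Note that the B-versus-A threshold depends only on the pilot cost and the size of the low-branch loss, not on the high payoff, which cancels out of the comparison.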

Ruin / risk flag

Ruin check: a public free-tier launch is hard to unwind - pricing and packaging are visible to the market and to existing customers, so the “-300” low branch is not a clean financial loss you can quietly reverse; it carries reputational and channel-conflict tails the dollar figure understates. That is not literal ruin (it does not end the company), but it is exactly the kind of asymmetric, hard-to-reverse downside that makes the option to test before committing worth more than its raw EV margin suggests. It is also why this decision earns a tree at all rather than a quick reversible call.
Risk attitude: Northwind is capital-constrained this year, so a mild risk aversion is a real preference, not a bias - it reinforces B (cap the downside) over A. Surfaced, not used to override the arithmetic, which already points the same way.

Note how the value is in pricing the uncertainty instead of arguing it: unaided, a strong model tends to debate “should we launch the free tier?” in prose and land on a hedge. The tree forces the one number that matters (the conversion base rate) into the open, shows that the right move is not launch-vs-no-launch at all but buy the signal first, and states exactly how good conversion would have to look (p > 0.73) to justify committing blind. Source the chance-node probability with think-reference-class-forecasting; if the call had been reversible and low-stakes, think-one-way-vs-two-way-door would have triaged it away before any tree was warranted.

Grounding: the full evidence dossier

What the research does and does not show, with graded sources

Evidence Dossier: Expected Value Decision Tree

The single source of truth for the expected-value-decision-tree skill. The SKILL.md, the sidecar (skill.meta.yml), and the eval cases all derive from this file. If a claim is not here, it does not belong in the skill. Research verdict: Build at governing tier P, a deliberate downgrade from the catalog’s M prior.


Skill	`thinking-framework-skills.expected-value-decision-tree` (installable name `think-expected-value-decision-tree`)
Family	decision-and-option-evaluation
Evidence tier	P governing (honest read S/P, capped at P; flags: false-precision risk, single-shot-ruin)
Confidence	Moderate that an explicit EV tree makes the probability assumptions inspectable and the arithmetic checkable; low that any specific decision-outcome gain transfers to agents
Status	draft (admitted from the v0.5.0 discovery shortlist)

1. The mechanism (what actually does the work)

Most consequential choices turn on outcomes the decider does not control. The default response is to argue the options in prose and decide on a hunch. Expected-value (EV) analysis refuses that: it prices the uncertainty. EV weighs each possible outcome by its probability and its magnitude, then sums - EV = sum over outcomes of (probability times value). A decision tree is the structure that makes this tractable when the outcome depends on a sequence of choices and chance events.

The move has two named parts:

Lay out choice nodes and chance nodes. The tree alternates two kinds of node: choice nodes (drawn as a square - branches the decider controls) and chance nodes (drawn as a circle - branches nature controls, each carrying a probability, with the probabilities across the fan summing to one). Values sit at the leaves. The chance node is the defining, load-bearing ingredient: an explicit, probability-weighted representation of outcomes the decider does not control. That is what separates this move from every deterministic comparison - a plain pros-and-cons list or a weighted-criteria matrix scores options on attributes you can assert, while a decision tree forces you to say “with probability p the world does X, worth v” and lets those probabilities, not just your preferences, drive the choice.
Roll back (fold back), right to left. At each chance node, replace the fan with its expected value; at each choice node, keep the branch with the best expected value and prune the rest. What survives the rollback is the option with the highest expected value and the path that produces it.
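The two parts above can be written as one recursive definition. The notation is an assumption introduced here for illustration: V(n) is the rolled-back value of node n, c_i are its children, p_i the branch probabilities, and v_n the priced value at a leaf.

```latex
V(n) =
\begin{cases}
  v_n & \text{if } n \text{ is a leaf} \\[4pt]
  \sum_{i} p_i \, V(c_i), \quad \text{with } \sum_{i} p_i = 1 & \text{if } n \text{ is a chance node} \\[4pt]
  \max_{i} \, V(c_i) & \text{if } n \text{ is a choice node}
\end{cases}
```

The chosen branch at a choice node is the child that attains the maximum, and the rolled-back value of the root is the EV of the recommended option.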

The output is a decision tree with rolled-back expected values, the chosen branch, and a what-flips-it (sensitivity) note that names the single probability or value which, if it moved past a stated threshold, would reverse the decision. The deliverable is the tree plus the rollback plus the sensitivity note, never a bare EV number presented as the answer. Soft or sourced-by-guess inputs are flagged at the node where they enter, not laundered into the arithmetic. Its durable virtue is auditability: the tree exposes exactly which probability and which value drove the answer.

A second, optional layer is expected utility: replacing raw value with a utility function that bends the value scale to capture risk attitude (the painfulness of a large loss, the diminishing worth of a large gain). EV maximization is the risk-neutral special case of expected-utility maximization. The skill-level move is the EV tree; the utility layer is the principled extension for when variance and ruin matter, not just the average - and the variance / risk-of-ruin dimension is exactly what the “when it fails” walls below protect.
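To make the utility layer concrete, here is a minimal sketch that re-scores the Northwind options from the worked example under a risk-averse utility. The exponential (CARA) form and the `risk_tolerance` of 1000 ($K) are assumptions chosen for illustration; the source does not specify a utility function.

```python
import math
from typing import List, Tuple

Lottery = List[Tuple[float, float]]  # (probability, net value in $K)


def expected_value(lottery: Lottery) -> float:
    return sum(p * x for p, x in lottery)


def certainty_equivalent(lottery: Lottery, risk_tolerance: float) -> float:
    """Sure amount valued equally to the lottery under u(x) = 1 - exp(-x / R)."""
    expected_disutility = sum(p * math.exp(-x / risk_tolerance) for p, x in lottery)
    return -risk_tolerance * math.log(expected_disutility)


# Net outcomes per option; B's values already include the 80 pilot cost.
launch_now = [(0.35, 1200.0), (0.65, -300.0)]
test_first = [(0.35, 1120.0), (0.65, -80.0)]

R = 1000.0  # hypothetical risk tolerance; smaller means more risk-averse
ce_a = certainty_equivalent(launch_now, R)
ce_b = certainty_equivalent(test_first, R)
print(round(ce_a, 1), round(ce_b, 1))
```

Under these assumed parameters the certainty equivalents come out at roughly 17 for A and 200 for B, both below their EVs of 225 and 340. Risk aversion penalizes A far more because of its -300 branch. As `risk_tolerance` grows large, the certainty equivalent approaches the EV, which is the risk-neutral special case the text describes.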

2. Lineage

The normative spine: John von Neumann and Oskar Morgenstern, Theory of Games and Economic Behavior (1944), which axiomatized expected-utility maximization, and Leonard J. Savage, The Foundations of Statistics (1954), which extended it to subjective probability. If you accept the rationality axioms, expected-utility maximization follows.
The applied decision-tree tradition: Ronald A. Howard coined “decision analysis” in 1966; Howard Raiffa’s Decision Analysis: Introductory Lectures on Choices under Uncertainty (1968) is the accessible founding text for the trees and the rollback; Keeney and Raiffa, Decisions with Multiple Objectives (1976) extended it to multi-attribute utility. In medicine, Stephen Pauker and Jerome Kassirer brought decision trees into clinical practice (the threshold approach, NEJM 1980).
The descriptive counter-tradition (the limits): Maurice Allais (1953), whose paradox is the best-known violation of the expected-utility independence axiom, and Daniel Kahneman and Amos Tversky, “Prospect Theory” (Econometrica, 1979), which models the systematic ways people depart from EV.
Naming and IP: “decision tree”, “expected value”, and “decision analysis” are generic descriptive terms in common use. No trademark and no attribution required beyond crediting the originators above. This entry is documented descriptively and is not flagged as branded.

3. What the evidence shows, and what it does NOT show

The honest read is split, S/P, and capped at the conservative governing grade of P. The catalog’s prior M tag is overturned here on a conservative read of what the strong research actually covers. Both the split and the cap matter.

The split, stated plainly. There are two very different evidentiary claims hiding under “expected-value / decision-tree”, and they grade differently:

EV / expected-utility maximization is the normatively correct decision rule given coherent probabilities and utilities. This is genuinely strong - it rests on the axiomatic foundations of von Neumann and Morgenstern (1944) and Savage (1954), among the most consequential results in decision theory. That is an S-tier mathematical result.
Building a decision tree and computing EV makes a real decider’s decisions better than the alternative they would otherwise use. This is the claim a skill actually makes, and the support for it is practitioner-level and transferred, not strong. The richest application base is clinical decision analysis (Howard, 1966; Raiffa, 1968; Pauker and Kassirer, 1980). That base is a respected modeling methodology with candidly stated limits, not a body of controlled evidence that using the tool beats not using it.

Why the governing grade is the conservative half (P), not the optimistic half (S/M). The S-tier work measures the wrong thing for this skill: it establishes that EV-maximization is the right rule given the inputs, and (via Allais and prospect theory) that humans deviate from it. Neither is evidence that an agent who draws an EV tree decides better than one who uses, say, a weighted-criteria matrix or a base-rate anchor. The one nameable comparative finding is mixed and indirect: Mhaskar et al. (2014) found decision-analysis results concorded with matching systematic reviews of RCTs in 73% of cases (27/37), and with single RCTs in only 50% - the model tracked the trial evidence only when fed comprehensive inputs, and even then disagreed a quarter of the time. That bounds the move’s reliability; it does not lift it to S. Per this library’s rule, when the honest read is split and the strong evidence is for a sibling claim (the normative axioms) rather than for this move improving an agent’s decisions, the tier emitted is the conservative one: P, capped from an S/P split. Calling it M or S would launder the axioms’ robustness into a claim about the tool’s effectiveness that the record does not make.

What the record does NOT support, and the excluded number. No controlled study locatable shows that constructing an EV tree improves decision outcomes versus a cheaper rule, for humans or for AI agents. No specific effect-size figure for “decision-tree analysis improves decisions by N%” traces to a nameable primary source; none is asserted here. The Mhaskar concordance figures (73% / 50%) are the only quantitative claims in this dossier and both are sourced to that paper. The clinical-decision-analysis literature is itself explicit that models oversimplify and that outputs “should only be used as a reference… and are not guaranteed as absolute” (Bae 2014).

4. Transferred-evidence flag (required honesty for this library)

Transferred-evidence is true. Every effectiveness datum above is from human deciders - clinicians, traders, experimental subjects. None studies an EV tree produced by or with an AI agent. The evidence is transferred from human contexts and not validated for AI-augmented use; for an agent the realistic value is mechanical - force the probabilities to be named, compute the rollback without arithmetic slips, run sensitivity - and even that is unproven, which is a second reason the conservative P stands. Treat the AI value as: the agent makes the price-the-uncertainty pass cheap and disciplined, refuses to launder guessed inputs, holds the chance-node probabilities to a stated source, and enforces the ruin / risk-attitude check - benefits that do not depend on any unproven outcome claim.

5. When it works / when it fails (drives the eval negative cases and “When NOT to Use”)

Works best when:

A decision genuinely hinges on uncertain outcomes you can put rough, sourceable probabilities on (investing under uncertain payoffs, go/no-go on a project with a real failure probability, clinical “treat / test / wait” choices).
The structure is sequential - a choice now opens chance events that open later choices (“test first or commit?”).
The stakes justify making the probability assumptions explicit and inspectable, so a disagreement becomes a disagreement about a named number rather than a clash of intuitions.

Fails or misleads when (poor-fit / anti-patterns):

The probabilities and values are guessed and then trusted. A tree renders fabricated inputs in the authoritative grammar of arithmetic - the central failure mode: false precision. A number with no defensible source does not become trustworthy by being multiplied. Where the probability is the hard part, source a base rate with think-reference-class-forecasting instead of inventing one inside the tree.
The decision is a one-shot with intolerable downside. EV is an average over many independent repetitions; the law of large numbers guarantees convergence across many bets, not on the single bet in front of you. A positive-EV gamble that includes a small probability of ruin is the wrong call for a one-time decision. The criterion there is risk of ruin or a risk-averse utility, not raw EV; treating the average as the answer is a category error. The worksheet must run the ruin check before recommending.
It is mistaken for descriptive truth. EV is normative (what a coherent decider should do given those numbers), not a description of good judgment. People predictably violate it (Allais 1953; prospect theory, Kahneman and Tversky 1979) via the certainty effect and nonlinear probability weighting, and some of those deviations are real risk preferences the tree must surface, not override.
The outcome space cannot be enumerated or priced. Deep uncertainty (you cannot list the outcomes, let alone assign them probabilities) and incommensurable values that resist a common scale both break the rollback and produce tidy-but-fictional EVs.
The call is reversible and low-stakes. A two-way door does not need a tree; building one is its own small over-process. Triage with think-one-way-vs-two-way-door first.

6. Output artifact

The skill must emit a decision tree with rolled-back expected values, not prose: the decision in one line; the options; the tree with choice nodes (square) and chance nodes (circle), each chance fan’s probabilities summing to one with a source per probability; the outcome values in a common unit (incommensurables noted); the rollback shown right to left with explicit arithmetic (chance node to EV, choice node to best branch); the recommendation (chosen option, its EV, and the path that produces it); the what-flips-it note (the single probability or value plus the threshold at which the choice flips); and a ruin / risk flag line. A short summary sits above the tree. The deliverable is never a bare EV number presented as the answer.

7. Sources

John von Neumann and Oskar Morgenstern, Theory of Games and Economic Behavior (Princeton University Press, 1944). The axiomatic foundation of expected-utility maximization. Foundational; S-tier mathematics (normative, not effectiveness evidence for the tool).
Leonard J. Savage, The Foundations of Statistics (Wiley, 1954). Subjective expected utility; the normative case for acting on personal probabilities. Foundational.
Howard Raiffa, Decision Analysis: Introductory Lectures on Choices under Uncertainty (Addison-Wesley, 1968). The founding applied text for decision trees and rollback. Practitioner / foundational.
Jong-Myon Bae, “The clinical decision analysis using decision tree,” Epidemiology and Health 36 (2014): e2014025. Describes the four-stage tree method and states the limits explicitly - oversimplification (e.g. QALY indices), unquantifiable factors (harm, cost, patient preference), and that results “are not guaranteed as absolute.” Teaching article; P. https://pmc.ncbi.nlm.nih.gov/articles/PMC4251295/
Rahul Mhaskar et al., “Concordance between decision analysis and matching systematic review of randomized controlled trials in assessment of treatment comparisons: a systematic review,” BMC Medical Informatics and Decision Making (2014). Decision-analysis results concorded with matching RCT systematic reviews in 73% (27/37) of cases, and with single RCTs in only 50% - the one nameable comparative finding, and it bounds reliability. Survey / comparative; P. https://pmc.ncbi.nlm.nih.gov/articles/PMC4107557/
Maurice Allais (1953), “Le comportement de l’homme rationnel devant le risque” (Econometrica); the Allais paradox - systematic violation of the expected-utility independence axiom. The canonical evidence that EV is normative, not descriptive. Foundational (descriptive critique).
Daniel Kahneman and Amos Tversky, “Prospect Theory: An Analysis of Decision under Risk,” Econometrica 47 (1979): 263-291. Models the certainty effect and nonlinear probability weighting - the predictable ways people depart from EV. S-tier descriptive research; bounds the tool’s normative claim.
Stephen G. Pauker and Jerome P. Kassirer, “The Threshold Approach to Clinical Decision Making,” New England Journal of Medicine 302 (1980): 1109-1117. Foundational application of decision-tree reasoning to clinical choice. Practitioner / foundational.

Verification status: The mechanism descriptions (von Neumann and Morgenstern, Savage, Raiffa, Bae) and the normative-vs-descriptive split (Allais; Kahneman and Tversky) are well-attested and mutually consistent. The Bae and Mhaskar et al. findings (the fourth and fifth sources above) were read via the PMC-hosted articles; the Mhaskar concordance figures (73% / 50%) are reported as that paper’s results, not an independently audited constant. Neither caveat changes the conservative governing grade of P.

Excluded on the evidence rule: no specific “decision-tree analysis improves decisions by N%” figure traces to a nameable primary source; none is counted toward the grade. The only quantitative claims used are Mhaskar et al.’s 73% / 50% concordance figures.

Thinking Framework Skills v0.8.0 · 56 frameworks