
Natural-Frequency Bayesian Framing

A conditional probability that feels impossible as a percentage becomes obvious once you count people. Imagine 1,000 people, a condition that affects 1% of them, and a test that catches 90% of true cases but also flags about 9% of the people who do not have it:

graph TD
  A["1,000 people"] --> B["10 have it<br/>(1% base rate)"]
  A --> C["990 do not"]
  B --> D["9 test positive<br/>true positives"]
  C --> F["about 89 test positive<br/>false positives"]
  D --> H["About 98 positive tests,<br/>only 9 are real:<br/>roughly 9% truly have it"]
  F --> H
  classDef has fill:#fde7e7,stroke:#dc2626,color:#7f1d1d
  classDef hasnt fill:#e3f5e8,stroke:#16a34a,color:#14532d
  classDef ans fill:#e6e9ff,stroke:#6366f1,color:#1e1b4b,font-weight:bold
  class B,D has
  class C,F hasnt
  class H ans

Illustrative numbers. The point is the move: stated as natural frequencies (9 of 98), the base-rate trap that “a positive test means I probably have it” is visibly wrong.

People - including experts - reason badly about conditional probabilities stated as percentages, because they neglect the base rate. Re-expressing the same facts as natural frequencies over a concrete population makes the correct answer nearly visible: “Out of 1,000, 10 have it; 9 of those test positive; of the 990 without it, ~89 also test positive; so of ~98 positives, only 9 truly have it - about 9%.” The format does the work by keeping the base rate in the counts. The output is a natural-frequency breakdown. Honest constraint: the base rate, the hit rate, and the false-positive rate must be real. The format makes correct reasoning tractable; it does not invent the inputs.
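The counting in the quoted breakdown can be reproduced in a few lines. This is a minimal sketch using the illustrative numbers from the example; the variable names are made up for this sketch, and the 9% false-positive rate is an assumption chosen because it yields the roughly 89 false positives quoted above.

```python
# Illustrative inputs only (not real test data).
population = 1_000
base_rate = 0.01              # 1% of people have the condition
hit_rate = 0.90               # P(positive | condition)
false_positive_rate = 0.09    # P(positive | no condition), assumed for this sketch

# Turn the rates into head counts over the concrete population.
with_condition = round(population * base_rate)
without_condition = population - with_condition
true_positives = round(with_condition * hit_rate)
false_positives = round(without_condition * false_positive_rate)

# The posterior is simply "real positives out of all positives".
all_positives = true_positives + false_positives
posterior = true_positives / all_positives

print(f"{true_positives} of {all_positives} positives are real: {posterior:.1%}")
```

The base rate never disappears: it is what makes `without_condition` so much larger than `with_condition`, and that is why the false positives dominate.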

Use it for:

  • Interpreting a test or screening result (medical, fraud, security, lead-scoring, A/B).
  • Any “given a positive signal, what is the actual probability the thing is true?” question.
  • Communicating risk to others so they do not over-read a positive.

Do not use it:

  • When you do not have real input rates and would have to invent them.
  • When there is no conditional-probability structure to the question.
  • For general project forecasting (use reference-class forecasting).
  • When a single point estimate is wanted and the base-rate structure is irrelevant.

When asked to reason about a conditional probability, follow these steps:

  1. State the question precisely. What posterior is being asked - usually P(condition | positive signal). Distinguish it from P(positive | condition), which people confuse it with.
  2. Gather the real inputs. The base rate, the true-positive (hit) rate, and the false-positive rate. If any is unknown, say so and stop or clearly flag the estimate as illustrative - do not fabricate numbers.
  3. Build a frequency tree over a concrete population. Pick a round number (e.g., 1,000). Work out: how many have the condition; of those, how many test positive; of those without, how many also test positive.
  4. Compute the posterior as true positives / all positives, and state it plainly.
  5. Name the wrong intuition it corrects. State the answer most people give (usually near the hit rate) and why it is wrong (base-rate neglect).
  6. Emit the natural-frequency breakdown per references/TEMPLATE.md.
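Steps 2 to 4 could be sketched as a small helper. Everything named here (`Breakdown`, `frequency_breakdown`) is hypothetical and not part of the skill or its template. The one design point it tries to capture is that a missing input raises an error instead of being filled in with a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Breakdown:
    """Counts over a concrete population (hypothetical structure)."""
    population: int
    with_condition: float
    true_positives: float
    false_positives: float

    @property
    def all_positives(self) -> float:
        return self.true_positives + self.false_positives

    @property
    def posterior(self) -> float:
        # Step 4: true positives / all positives.
        return self.true_positives / self.all_positives


def frequency_breakdown(
    base_rate: Optional[float],
    hit_rate: Optional[float],
    false_positive_rate: Optional[float],
    population: int = 1_000,
) -> Breakdown:
    inputs = {
        "base_rate": base_rate,
        "hit_rate": hit_rate,
        "false_positive_rate": false_positive_rate,
    }
    # Step 2: refuse to proceed when a real input is missing.
    missing = [name for name, value in inputs.items() if value is None]
    if missing:
        raise ValueError(f"missing real inputs: {', '.join(missing)}")
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Step 3: build the frequency tree over the population.
    with_condition = population * base_rate
    without_condition = population - with_condition
    result = Breakdown(
        population=population,
        with_condition=with_condition,
        true_positives=with_condition * hit_rate,
        false_positives=without_condition * false_positive_rate,
    )
    if result.all_positives == 0:
        raise ValueError("no positives expected; the posterior is undefined")
    return result


# Usage with illustrative rates (1% base rate, 90% hit rate, 9% false positives).
result = frequency_breakdown(base_rate=0.01, hit_rate=0.90, false_positive_rate=0.09)
print(f"P(condition | positive) is about {result.posterior:.1%}")
```

This sketch leaves the counts unrounded, so the false positives come out as 89.1 and not "about 89". For a breakdown meant for people, rounding to whole individuals usually reads better and barely moves the posterior.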

Use the template in references/TEMPLATE.md. The deliverable is the frequency tree, the posterior, and the plain-language meaning, not a bare percentage.

Before finalizing, verify:

  • The question distinguishes P(condition | positive) from P(positive | condition).
  • The base rate, true-positive rate, and false-positive rate are real (or missing data is flagged, not invented).
  • A frequency tree over a concrete population is shown.
  • The posterior is computed as true positives / all positives.
  • The common wrong intuition (base-rate neglect) is named.
  • The output is the breakdown artifact, not a bare number.

Tier S. Presenting conditional-probability information as natural frequencies substantially improves Bayesian-inference accuracy - accuracy on these problems rises from roughly 10% to 50-90% with the same facts in frequency format (Gigerenzer & Hoffrage 1995; Sedlmeier & Gigerenzer 2001), replicated across populations including physicians. The format does not supply the inputs; real rates are required. Evidence is from human reasoners, transferred to AI use, not AI-validated. Full grading: evidence/dossier.md.

See references/EXAMPLE.md for a completed breakdown.

A full worked run (the shared Northwind scenario)

Natural-Frequency Breakdown - Worked Example

A completed run of think-natural-frequency-bayesian, on the shared Northwind scenario. This is the quality bar a generated breakdown should meet.

Northwind is a B2B SaaS. Sales treats every account its new model flags as “high-intent” as if it almost certainly is. This skill checks what a flag actually means.


  • Posterior asked: P(truly high-intent | flagged “high-intent”) = ?
  • (This is NOT the model’s 80% sensitivity, which is P(flagged | truly high-intent). Sales is confusing the two.)
  • Base rate P(high-intent): 5% - source: historical share of accounts that became opportunities.
  • True-positive (hit) rate P(flagged | high-intent): 80% - source: model validation set.
  • False-positive rate P(flagged | not high-intent): 10% - source: model validation set.
  • Of 1,000: 50 are truly high-intent; 950 are not.
    • Of the 50 high-intent: 40 are flagged (80%).
    • Of the 950 not high-intent: 95 are also flagged (10%).
  • Total flagged: 40 + 95 = 135.
  • P(high-intent | flagged) = 40 / 135 = ~30%.
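As a cross-check, the same inputs can be run through Bayes' rule in probability form, and the result should match the frequency count. The symbols are notation introduced for this check only: H stands for "truly high-intent" and F for "flagged".

```latex
\[
P(H \mid F)
  = \frac{P(F \mid H)\,P(H)}{P(F \mid H)\,P(H) + P(F \mid \neg H)\,P(\neg H)}
  = \frac{0.80 \times 0.05}{0.80 \times 0.05 + 0.10 \times 0.95}
  = \frac{0.040}{0.135}
  \approx 0.296
\]
```

Multiplying numerator and denominator by 1,000 gives 40 / 135, which is the frequency-tree result.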

What it means / the wrong intuition it corrects

  • Plain language: when the model flags an account, it is truly high-intent only about 30% of the time - so roughly 7 of every 10 flagged accounts are not.
  • Common wrong answer: ~80% (people read the flag as the sensitivity). Wrong because it ignores that high-intent accounts are rare (5% base rate), so the many false positives from the large low-intent pool swamp the true positives.
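To see how much of the result is driven by rarity alone, the sketch below holds the model's rates fixed at 80% and 10% and varies only the base rate. Only the 5% figure comes from the Northwind scenario. The 20% and 50% base rates are hypothetical values added purely for contrast, and `posterior` is an illustrative helper name.

```python
def posterior(base_rate: float, hit_rate: float, false_positive_rate: float) -> float:
    """Share of flagged accounts that are truly high-intent."""
    true_pos = base_rate * hit_rate
    false_pos = (1.0 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)


HIT_RATE = 0.80             # from the worked example
FALSE_POSITIVE_RATE = 0.10  # from the worked example

# 0.05 is the scenario's base rate; 0.20 and 0.50 are hypothetical comparisons.
for base_rate in (0.05, 0.20, 0.50):
    share = posterior(base_rate, HIT_RATE, FALSE_POSITIVE_RATE)
    print(f"base rate {base_rate:.0%}: a flag is right {share:.0%} of the time")
```

With the same model, a flag only approaches the 80% that people intuitively expect when high-intent accounts stop being rare.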

Note: the value is converting “the model is 80% accurate” into “a flag is right ~30% of the time,” which completely changes how Sales should treat flags (triage, not trust). The honest constraint held: the three input rates came from real validation data, not invented numbers - without them the right output would have been “we cannot compute this yet.”

What the research does and does not show, with graded sources

Evidence Dossier: Natural-Frequency Bayesian Framing

Single source of truth for the natural-frequency-bayesian skill. The SKILL.md, sidecar, and evals derive from this. A strong-evidence anchor.

Skill: thinking-framework-skills.natural-frequency-bayesian (installable name think-natural-frequency-bayesian)
Family: reasoning-clarity
Evidence tier: S (well-replicated)
Confidence: High - the format effect is one of the most robust findings in judgment research
Status: draft (authored 2026-05-31 from the discovery corpus)

1. The mechanism (what actually does the work)

People - including experts - reason badly about conditional probabilities when they are stated as percentages or probabilities. Given “the test is 90% sensitive, the condition affects 1%, the false-positive rate is 9%,” most people (including doctors) wildly overestimate the chance that a positive result means the condition is present, because they neglect the base rate.

Re-expressing the identical information as natural frequencies over a concrete population makes the correct answer almost visible: “Out of 1,000 people, 10 have the condition; of those, 9 test positive. Of the 990 without it, about 89 also test positive. So of ~98 positives, only 9 truly have it - about 9%.” The format does the work: it preserves the base rate in the counts instead of hiding it in a rate. Accuracy on these problems jumps from roughly 10% to 50-90% when the same facts are presented as natural frequencies.
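One way to see why the two formats carry identical information: the frequency counts are the terms of Bayes' rule scaled by the population size N, which then cancels. The notation is introduced here for illustration, with C for "has the condition" and + for "tests positive".

```latex
\[
\mathrm{TP} = N \, P(C) \, P(+ \mid C),
\qquad
\mathrm{FP} = N \, \bigl(1 - P(C)\bigr) \, P(+ \mid \neg C)
\]
\[
P(C \mid +)
  = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}
  = \frac{P(C)\,P(+ \mid C)}{P(C)\,P(+ \mid C) + \bigl(1 - P(C)\bigr)\,P(+ \mid \neg C)}
\]
```

With N = 1,000, P(C) = 0.01, P(+ | C) = 0.90 and P(+ | ¬C) = 0.09, this gives TP = 9 and FP = 89.1, about 89, so roughly 9 of 98 positives are real. The arithmetic is the same in both formats, but the counts keep P(C) visible as "10 versus 990" instead of leaving it as a factor that is easy to drop.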

  • Gigerenzer & Hoffrage (1995) on how natural-frequency formats improve Bayesian reasoning; Sedlmeier & Gigerenzer (2001) on teaching it; widely applied in medical decision-making and risk communication.

No trademark. Named descriptively.

3. What the evidence shows, and what it does NOT show

Strongly supported (the S): presenting conditional-probability information as natural frequencies substantially improves the accuracy of Bayesian inference (the ~10% to ~50-90% jump is well-replicated across studies and populations, including physicians).

What it does NOT do: it does not invent the inputs. The base rate, the true-positive rate, and the false-positive rate must be real; the format makes correct reasoning from those numbers tractable, it does not supply them. And it applies only where there is genuine conditional-probability structure - it is not a general forecasting tool (that is reference-class forecasting).

The evidence is from human reasoners and is transferred to AI use, not validated on AI. The model can already do the arithmetic, so the value is the same as for humans, plus communication: the format forces the base rate to be used (countering base-rate neglect in the model’s own answers and in how it explains risk), and it produces an inspectable frequency breakdown instead of a bare percentage. The model must still refuse to fabricate the input rates.

Works best when: interpreting a test or screening result (medical, fraud, security, lead-scoring, A/B); any “given a positive signal, what is the real probability” question; communicating risk to others.

Fails or misleads when (poor-fit / anti-patterns):

  • No real input rates - inventing the base rate or hit rate is worse than admitting they are unknown (the central failure for an AI).
  • Ignoring the base rate (base-rate neglect) - the very error the method exists to fix.
  • Confusing P(positive | condition) with P(condition | positive) - state which is which.
  • No conditional-probability structure (then this is the wrong tool).
  • General project forecasting (use reference-class forecasting).

A natural-frequency breakdown: the question; the inputs (base rate, true-positive rate, false-positive rate) with sources or an explicit missing-data flag; a frequency tree over a concrete population (e.g., 1,000); the computed posterior; and a plain-language statement of what it means plus the common wrong intuition it corrects.

  1. Gigerenzer, G., & Hoffrage, U. (1995) - improving Bayesian reasoning with natural-frequency formats.
  2. Sedlmeier, P., & Gigerenzer, G. (2001) - teaching Bayesian reasoning (accuracy gains).

Verification status: the natural-frequency format effect and the rough accuracy gain from about 10% to 50-90% are well-attested; confirm exact figures against the papers before a public quantified claim. The “must use real input rates” constraint is the honest core for AI use.

Thinking Framework Skills v0.3.0 · 38 frameworks