Bibliography

This bibliography aggregates the graded sources from every skill’s evidence dossier, so that any claim in the library can be traced to its grounding. Tiers are explained in the evidence model. Where evidence is transferred from human studies and has not been validated for AI use, the dossier says so.

Abstraction Laddering P

Hayakawa, S. I. (1939). Language in Thought and Action - the ladder of abstraction (conceptual root).
Getzels, J. W., & Csikszentmihalyi, M. (1976). The Creative Vision: A Longitudinal Study of Problem Finding in Art - problem finding and how problem formulation shapes outcomes.
Nutt, P. C. (work on decision-making failures, e.g. Why Decisions Fail, 2002) - decisions that fail from poor problem definition / premature framing.
Wedell-Wedellsborg, T. (2017). “Are You Solving the Right Problems?” Harvard Business Review - reframing practice; framing precedes good solutions.
Interaction Design Foundation and design-facilitation toolkits - “abstraction laddering” as the why-up / how-down exercise to find the right altitude for a problem statement.

Verification status: Hayakawa (1) and Wedell-Wedellsborg (4) are well-attested. The problem-finding and decision-failure citations (2, 3) support framing in general and are drawn from secondary synthesis; they should be confirmed against the primary works before any public-facing claim, and they must not be presented as validating the ladder technique specifically. Citation 5 is practitioner literature, not peer-reviewed evidence. These are safe to use inside this dossier because the dossier’s job is to be honest about exactly this gap.

Affinity Mapping P

Kawakita, Jiro (1967). Hassoho - the original KJ method for synthesizing field data bottom-up.
Mizuno, Shigeru, ed. (1988). Management for Quality Improvement: The Seven New QC Tools - affinity diagram in quality management.
Nielsen Norman Group, “Affinity Diagramming: Collaboratively Sort UX Findings & Design Ideas” - the standard UX-practice description of the technique.
Scupin, R. (1997). “The KJ Method: A Technique for Analyzing Data Derived from Japanese Ethnology.” Human Organization 56(2):233-237 - documents the method’s anthropological origin and use.

Verification status: citations 1-2 are the standard historical attributions and well-attested in the discovery corpus; the exact NN/g phrasing (citation 3) and the Scupin page reference (citation 4) were drawn from a secondary research synthesis and should be confirmed against the primary sources before they appear in any public-facing README. They are safe to use inside this dossier because the dossier’s job is to be honest about exactly this uncertainty. The “limited controlled evidence” claim in section 3 is a deliberate statement of absence: it should stay phrased as “no strong controlled evidence found,” not as a positive finding.

After Action Review S

US Army, TC 25-20 - the original After-Action Review guide.
Tannenbaum, S. I., & Cerasoli, C. P. (2013) - meta-analysis of debriefs and performance (ES ~0.79).

Verification status: the Tannenbaum & Cerasoli meta-analysis and its effect-size are well-attested; confirm the exact figure against the paper before a public quantified claim. The “structure is the active ingredient” point is the honest core.

Argument Mapping S

van Gelder, T. (2015) and the Reason!/Rationale argument-mapping studies - effect sizes ~0.7-0.85 for critical-thinking gains.
Toulmin, S. (1958) - the model of argument structure (claim, grounds, warrant, rebuttal) that underpins mapping.

Verification status: the van Gelder effect-size range is well-attested for course-length instruction; keep the “single map != course effect” caveat visible in any public claim. Do not attach the 0.7-0.85 figure to a one-shot use.

Assumption Reversal P

Lateral-thinking / creative-problem-solving practice on assumption reversal and assumption-busting (distinct from inversion).

Verification status: the technique is well-attested in creativity practice; treat effectiveness as practitioner-level. Do not present reversed assumptions as validated options.

Authentic Dissent S

Nemeth, C. et al. (2001) - dissent and decision quality; role-played devil’s advocacy does not replicate authentic dissent.
Nemeth, C. (2018) - In Defense of Troublemakers: The Power of Dissent in Life and Business.

Verification status: the authentic-vs-role-played finding is well-attested and is the load-bearing, honesty-defining result for this skill. Do not let the skill present the model’s own contrarian output as authentic dissent.

Backcasting P

Lovins, A. B. (1976). “Energy Strategy: The Road Not Taken?” Foreign Affairs, 55(1) - early backwards-looking energy-path analysis.
Robinson, J. B. (1982). “Energy backcasting: A proposed method of policy analysis.” Energy Policy, 10(4):337-344 - names and defines backcasting.
Robinson, J. B. (1990). “Futures under glass: A recipe for people who hate to predict.” Futures, 22(8):820-842 - backcasting as normative alternative to forecasting.
Dreborg, K. H. (1996). “Essence of backcasting.” Futures, 28(9):813-828 - when backcasting is appropriate vs forecasting.
Holmberg, J., & Robert, K-H. (2000). “Backcasting from non-overlapping sustainability principles.” International Journal of Sustainable Development & World Ecology, 7(4) - The Natural Step operationalization.

Verification status: citations 1-5 are standard and well-attested references for backcasting’s lineage, drawn from the discovery-corpus synthesis. The exact page numbers and the framing of each finding should be confirmed against the primary papers before they appear in any public-facing README. They are safe to use inside this dossier because the dossier’s job is to be honest about exactly this uncertainty, and the evidence tier (P) is deliberately conservative.

Belief-Update Routine P

Edwards, W. (1968), “Conservatism in human information processing,” in B. Kleinmuntz (ed.), Formal Representation of Human Judgment, Wiley - establishes systematic under-updating relative to Bayes.
Atanasov, P., Witkowski, J., Ungar, L., Mellers, B., & Tetlock, P. (2020), “Small steps to accuracy: Incremental belief updaters are better forecasters,” Organizational Behavior and Human Decision Processes 160:19-35 - incremental evidence-weighted updating tracks forecasting accuracy.
Tappin, B. M., Pennycook, G., & Rand, D. G. (2020), work relating analytic / actively open-minded thinking to more normative belief updating - a dispositional cousin.
Tetlock, P., & Gardner, D. (2015), Superforecasting - recorded probabilistic predictions plus scoring and frequent small updates as the basis for calibration (scored-regime evidence).

Verification status: citations 1, 2, and 4 are standard and well-attested in the judgment-and-decision-making and forecasting literature. Citation 3 (Tappin/Pennycook/Rand) is a dispositional correlate, cited as lineage, not as a test of the routine. The direct experimental evidence for the routine itself is sparse and weak (small, mixed studies, no robust controlled effect), so the skill takes the conservative reading and advertises no effect size. The load-bearing caveat - that the typical non-resolving use sits outside the scored-forecasting regime where the supporting evidence was gathered - does not depend on any single direct study.

Brainwriting S

Rohrbach, B. (1968) - Method 6-3-5 (brainwriting).
Delbecq, A., & Van de Ven, A. (1971) - Nominal Group Technique.
Diehl, M., & Stroebe, W. (1987) - production blocking in brainstorming.
Mullen, B., Johnson, C., & Salas, E. (1991) - meta-analysis: nominal groups outperform interacting brainstorming groups.

Verification status: these are well-attested, frequently-cited findings. The “S” grade is justified for the human evidence; keep the AI-adaptation caveat (mechanism transferred, effect size not measured for AI) visible in any public claim.

Causal Loop Diagrams M/P

Sterman, J. (1989). “Misperceptions of Feedback in Dynamic Decision Making.” Management Science. (Pool A - the failure.)
Sweeney, L. B., & Sterman, J. (2000). “Bathtub Dynamics: Initial Results of a Systems Thinking Inventory.” System Dynamics Review. (Pool A - the failure, in educated subjects.)
“Influence of Causal Loop Diagrams on Systems Thinking” (2025). ScienceDirect, article S2451958825000284. (Pool B - CLD-specific, conditional effect.)
Schaffernicht, M. (2010). “Causal Loop Diagrams: An Analysis of the Reliability of an Inference Tool.” Systems Research and Behavioral Science. (Counter-evidence - subjectivity and non-reproducibility; cited against inflation.)
Sterman, J. (2000). Business Dynamics; Meadows, D. (2008). Thinking in Systems. (Lineage and CLD notation.)

Verification status: Pool A (Sterman 1989; Sweeney & Sterman 2000) is well-attested and is the same misperception base the stocks-and-flows dossier banks - it does NOT by itself prove CLDs work. The 2025 CLD study (S2451958825000284) reports a conditional effect; confirm the exact conditions and any effect size from the source before quoting a number - none is quoted here. Schaffernicht (2010) is cited deliberately as a reliability caution. No effect size is stated in this dossier because none has been verified against the source; do not add one without checking. The honest scope - “externalize and sign loop structure,” not “predict behavior” - is the core caveat.

Concept Mapping M/P

Novak, J. D., & Canas, A. J. (2008). The Theory Underlying Concept Maps and How to Construct Them. IHMC Technical Report (Florida Institute for Human and Machine Cognition). - definition: labeled nodes, labeled linking phrases forming propositions, cross-links as the integrative marker.
Davies, M. (2011). “Concept mapping, mind mapping and argument mapping: what are the differences and do they matter?” Higher Education, 62(3), 279-301. - the three techniques are distinct.
Nesbit, J. C., & Adesope, O. O. (2006). “Learning With Concept and Knowledge Maps: A Meta-Analysis.” Review of Educational Research, 76(3), 413-448. - 55 studies, 67 effect sizes, n = 5,818; outcome = human learning/retention.
Schroeder, N. L., Nesbit, J. C., Anguiano, C. J., & Adesope, O. O. (2018). “Studying and Constructing Concept Maps: a Meta-Analysis.” Educational Psychology Review, 30, 431-455. - 142 effect sizes, n = 11,814; overall g = 0.58; constructing g = 0.72 > studying g = 0.43; outcome = human learning/retention.
Farrand, P., Hussain, F., & Hennessy, E. (2002). “The efficacy of the mind map study technique.” Medical Education, 36(5), 426-431. - cited only to mark the excluded unlabeled mind-mapping (Buzan) method as X-tier.

Verification status: The Novak & Canas definition and the Davies distinction are well-attested and safe to cite. The Nesbit & Adesope and Schroeder et al. study/effect-size figures (counts, n, g = 0.58 / 0.72 / 0.43) are reported as the published meta-analytic values; confirm the exact figures against the primary articles before any public-facing quantified claim, and note that they are cited here as human-retention findings that do NOT transfer to an AI agent, never as evidence of agent benefit. The core caveat - large base, wrong outcome for transfer, therefore M/P not S - is the load-bearing honesty of this dossier.

Decision Journal P

Fischhoff, B. (1975), “Hindsight is not equal to foresight,” J. Experimental Psychology: Human Perception and Performance 1(3):288-299 - establishes hindsight bias.
Roese, N. J., & Vohs, K. D. (2012), “Hindsight bias,” Perspectives on Psychological Science 7(5):411-426 - review of the robustness of the effect.
Duke, A. (2018), Thinking in Bets - decision journaling, separating decision quality from outcome quality, practitioner source.
Tetlock, P., & Gardner, D. (2015), Superforecasting - recorded probabilistic predictions plus scoring as the basis for calibration.
Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982), “Calibration of probabilities,” in Kahneman, Slovic & Tversky (eds.), Judgment under Uncertainty - calibration literature.

Verification status: citations 1-2 and 4-5 are standard and well-attested. Citation 3 (Duke) and the Parrish/Farnam Street decision-journal templates are practitioner sources, credible but experience-based; they are cited as lineage and as the origin of the practice, not as controlled evidence of outcome improvement. The “no strong controlled evidence that journaling improves outcomes” statement in section 3 reflects the absence of such a study in the discovery corpus as of authoring; it should be re-checked before any public-facing claim, but the honest default is to not claim outcome improvement.

Decision Option Review P

UK Government, MCDA guidance (multi-criteria decision analysis as a support for, not a replacement of, judgment).

Verification status: the UK MCDA guidance and the “support not replace judgment” framing are well-attested. Do not present weighted totals as proof of the right choice.

Evidence vs Inference Sort P

Facione, P. A. (1990). Critical Thinking: A Statement of Expert Consensus (the Delphi Report) - evaluation vs inference as distinct skills.
Structured analytic techniques literature (intelligence analysis) - separating evidence from judgment; key-assumptions checks.
(Adjacent) van Gelder and others on argument mapping effect sizes - supports the broader critical-thinking competence, not this exact technique.

Verification status: the Facione/Delphi distinction is well-attested. The argument-mapping effect sizes are adjacent evidence and should not be presented as evidence for this technique in any public claim; they support the family, not the sort.

Far-Analogy Ideation S

Gentner, D. - structure-mapping theory; Gentner & Smith (2013).
Gick, M., & Holyoak, K. (1980) - analogical problem solving (radiation/fortress).
Dahl & Moreau - far analogies and originality in new-product ideation.

Verification status: the “distant analogies -> more original solutions” and “surface vs structural mapping” findings are well-attested in the analogy literature. Keep the surface-mapping failure mode prominent; it is what separates the evidenced method from a party trick.

Fermi Estimation M/P

MacGregor, D. G., & Armstrong, J. S. (1994). “Judgmental Decomposition: When Does It Work?” International Journal of Forecasting 10:495-506 (study of when decomposing an estimate into parts improves accuracy; benefit concentrated on extreme/uncertain quantities).
MacGregor, D. G. (2001). “Decomposition for judgmental forecasting and estimation.” In J. S. Armstrong (ed.), Principles of Forecasting. Kluwer.
Fermi-problem tradition / quantitative-reasoning and case-interview pedagogy (the “within an order of magnitude” field lore) - practitioner, not controlled.
Statistical argument: a product of independent factors is approximately log-normal; the geometric mean of independent over/under estimates cancels (the cross-factor error-cancellation premise).

Verification status: the existence of a conditional decomposition benefit (present for extreme/uncertain quantities, absent or negative for ordinary ones) and the independence sensitivity are the defensible claims and set the M/P grade. Specific effect-size numbers (e.g. “error factor 99 vs 3”, “42% reduction”) are deliberately omitted as unverifiable to a primary source. The honest scope - “directional help for build-from-factors magnitudes under an independence condition, on a thin base, human-subject not AI-validated” - is the core caveat.
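
The cross-factor error-cancellation premise in the statistical argument above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a result from any cited study: the function name `product_log_error_sd`, the four-factor setup, and the per-factor error of 0.3 in log10 units are all hypothetical, and each factor's error is assumed to be normally distributed in log space.

```python
import math
import random


def product_log_error_sd(n_factors, per_factor_sd, shared=False,
                         n_trials=20000, seed=0):
    """Empirical SD of the log10 error of a product of factor estimates.

    Each factor estimate is assumed (hypothetically) to be off by a
    multiplicative error whose log10 is normal with SD `per_factor_sd`.
    With shared=True one error draw is reused for every factor, which
    models fully correlated (non-independent) errors.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        if shared:
            # Same bias on every factor: log errors add up, nothing cancels.
            total = n_factors * rng.gauss(0.0, per_factor_sd)
        else:
            # Independent errors: over- and under-estimates partly cancel.
            total = sum(rng.gauss(0.0, per_factor_sd)
                        for _ in range(n_factors))
        totals.append(total)
    mean = sum(totals) / n_trials
    variance = sum((t - mean) ** 2 for t in totals) / (n_trials - 1)
    return math.sqrt(variance)


independent_sd = product_log_error_sd(4, 0.3)
correlated_sd = product_log_error_sd(4, 0.3, shared=True)
print(f"independent: {independent_sd:.2f}  correlated: {correlated_sd:.2f}")
```

Under independence the spread of the product's log error grows with the square root of the number of factors (close to 0.3 × √4 = 0.6 here), while fully shared errors grow linearly (close to 0.3 × 4 = 1.2). That gap is the independence sensitivity the grade depends on.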

Framework Advisor M/C

Verified in a 5-cluster web-verification pass (2026-06-01); reliability noted per item.

The advisor’s own basis (structured-method value):

1. Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). “Clinical versus mechanical prediction: A meta-analysis.” Psychological Assessment 12(1):19-30. (primary; S; narrow scope)
2. Dawes, R. M. (1979). “The robust beauty of improper linear models in decision making.” American Psychologist 34(7):571-582. (primary; S; narrow scope)
3. Meehl, P. E. (1954). Clinical versus Statistical Prediction. Univ. of Minnesota Press. (primary; foundational)
4. Lovallo, D., & Sibony, O. (2010). “The Case for Behavioral Strategy.” McKinsey Quarterly. (field study, 1,048 decisions; M; correlational, not peer-reviewed)
5. Kahneman, D., Lovallo, D., & Sibony, O. (2011). “Before You Make That Big Decision.” Harvard Business Review 89(6):50-60. (primary; M; practitioner checklist)
6. Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A Flaw in Human Judgment. Little, Brown Spark. (synthesis/advocacy; P)
7. Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). “How Can Decision Making Be Improved?” Perspectives on Psychological Science 4(4):379-383. (peer-reviewed survey; honest about mixed debiasing evidence)

The heft calibrator (reversibility):

8. Bezos, J. P. 2015 Letter to Shareholders (Amazon; released spring 2016), “Invention Machine” section - the Type 1/Type 2, one-way/two-way door framing. (primary; P. NOTE: it is the 2015 letter, not the 2016 letter - a common citation error.)
9. Arrow, K. J., & Fisher, A. C. (1974), QJE 88(2):312-319 (quasi-option value); Bernanke (1983), QJE 98(1):85-106; McDonald & Siegel (1986), QJE 101(4):707-728; Dixit & Pindyck (1994), Investment under Uncertainty. (decision-theoretic shadow; M; supports the principle by analogy, not the specific calibrator.)

The contingency stance (method-fit):

10. Snowden, D. J., & Boone, M. E. (2007). “A Leader’s Framework for Decision Making.” Harvard Business Review 85(11):68-76 (Cynefin). (primary; C; sense-making model, proprietary, limited independent validation.)
11. Klein, G. A. (1998). Sources of Power: How People Make Decisions. MIT Press (RPD/NDM). (primary; M; field/observational.) Plus Klein et al. (1993); Mosier, Fischer, Hoffman & Klein (2018), Cambridge Handbook of Expertise (2nd ed., ch. 23).

The subtraction principle (over-application / choice overload):

12. Kaplan, A. (1964), The Conduct of Inquiry, p. 28; Maslow, A. H. (1966), The Psychology of Science, pp. 15-16. (aphorisms; C - origin of the concept, not proof.)
13. Luchins, A. S. (1942), “Mechanization in problem solving: The effect of Einstellung,” Psychological Monographs 54(6); Luchins & Luchins (1959). (M for the effect; supports the failure mode by analogy.)
14. Iyengar, S. S., & Lepper, M. R. (2000), JPSP 79(6):995-1006; Scheibehenne, Greifeneder & Todd (2010), J. Consumer Research 37(3):409-425 (near-zero mean effect); Chernev, Bockenholt & Goodman (2015), J. Consumer Psychology 25(2):333-358 (moderated). (choice overload is contested; C as a hard justification - soft motivation only.)

Verification status: all citations above were checked in a web-verification pass on 2026-06-01; primary vs reputable-secondary reliability is noted per item. Items 1-3, 5, 8, 10, 11, 14 were confirmed against primary or publisher records; items 9 and 13 are confirmed by citation metadata and standard secondary literature (treat exact internal wording as not line-verified). The honest split grade (M/C) is the load-bearing conclusion and is what the skill claims.

Futures Wheel P

Glenn, J. (1971). The Futures Wheel (foresight method).
UNICEF (2025) foresight primer; foresight and transport/policy literature describing the wheel’s use for second- and third-order consequences.

Verification status: Glenn attribution and foresight usage are well-attested. Do not attach forecast-accuracy claims; the method’s validation is qualitative.

Iceberg Model P

Senge, P. (1990). The Fifth Discipline: The Art and Practice of the Learning Organization - systems thinking, the discipline of surfacing mental models, levels of perspective.
Meadows, D. (1999). Leverage Points: Places to Intervene in a System; and Meadows, D. (2008). Thinking in Systems: A Primer - structures, feedback loops, and where intervention has the most leverage.
Waters Center for Systems Thinking (and related systems-education materials) - the iceberg as a teaching tool for moving from events to patterns to structures to mental models.

Verification status: the Senge and Meadows attributions are standard and well-attested; the specific framing of the four-level iceberg as a teaching diagram is drawn from systems-education practice and a secondary research synthesis and should be confirmed against primary curricula before any public-facing claim. Do not attach outcome-improvement or measured-leverage claims; the method’s validation is qualitative and pedagogical.

Issue Tree P

Minto, B. (1987/2009), The Pyramid Principle - origin of MECE and the grouping/decomposition discipline.
Rasiel, E. (1999), The McKinsey Way; Rasiel & Friga (2001), The McKinsey Mind - issue trees / logic trees as the standard structured-problem-solving device.
Hammond, Keeney & Raiffa (1999), Smart Choices - structuring a problem into its parts before solving.
Adjacent decomposition support: decision-analysis and divide-and-conquer estimation literatures (cited as plausibility, not as direct issue-tree evidence).

Verification status: citations 1-3 are standard and well-attested attributions for issue trees and MECE. The phrasing of the adjacent decomposition support in section 3 (citation 4) is drawn from secondary synthesis and should be confirmed against primary sources before any public-facing claim. The dossier states the tier as P precisely because the issue-tree-specific controlled evidence is thin; that honesty is the point.

Ladder of Inference Check P

Argyris, C. (1990). Overcoming Organizational Defenses; and Argyris’s action-science work on the ladder of inference.
Senge, P. (1990). The Fifth Discipline - popularized the ladder.
The Systems Thinker - practitioner write-up of the ladder and how to use it.

Verification status: Argyris/Senge attribution is well-attested. Treat effectiveness as practitioner-reported; do not attach a quantified effect in public-facing text.

Linear-Model Aggregation S

Meehl, P. (1954). Clinical versus Statistical Prediction.
Dawes, R. (1979). “The robust beauty of improper linear models in decision making.” American Psychologist.
Grove, W. et al. (2000). Meta-analysis of clinical vs mechanical prediction.
Kahneman, D., Sibony, O., & Sunstein, C. (2021). Noise - inconsistency as the mechanism.

Verification status: the Meehl/Dawes/Grove results are well-attested and frequently replicated; confirm the Grove 2000 meta-analytic specifics before a public quantified claim. The “only as good as its cues” and fairness caveats are mandatory honest framing.
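
As an illustration of what mechanical aggregation means in practice, the sketch below scores cases with a unit-weighted sum of standardized cues, one common form of the improper linear model Dawes discusses. It is a hedged example, not the skill's implementation: the function name `unit_weight_scores`, the three cases, and the cue directions are all hypothetical.

```python
from statistics import mean, pstdev


def unit_weight_scores(cases, signs):
    """Score each case as the sum of its z-scored cues, weighted +1 or -1.

    `cases` is a list of equal-length tuples of numeric cues; `signs` gives
    the assumed direction of each cue (+1 if higher is better, -1 if lower).
    No weights are fitted: every cue counts equally after standardizing.
    """
    columns = [[case[j] for case in cases] for j in range(len(signs))]
    stats = [(mean(col), pstdev(col)) for col in columns]
    scores = []
    for case in cases:
        total = 0.0
        for j, sign in enumerate(signs):
            mu, sd = stats[j]
            if sd == 0:
                continue  # a constant cue cannot separate the cases
            total += sign * (case[j] - mu) / sd
        scores.append(total)
    return scores


# Hypothetical data: cue 0 is "higher is better", cue 1 is "lower is better".
cases = [(1.0, 30.0), (2.0, 20.0), (3.0, 10.0)]
scores = unit_weight_scores(cases, signs=[+1, -1])
ranking = sorted(range(len(cases)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

The score is only as good as the cues and directions supplied. The code applies them consistently but cannot tell whether they are valid or fair.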

Natural-Frequency Bayesian Framing S

Gigerenzer, G., & Hoffrage, U. (1995) - improving Bayesian reasoning with natural-frequency formats.
Sedlmeier, P., & Gigerenzer, G. (2001) - teaching Bayesian reasoning (accuracy gains).

Verification status: the natural-frequency format effect and the rough 10%->50-90% accuracy gain are well-attested; confirm exact figures against the papers before a public quantified claim. The “must use real input rates” constraint is the honest core for AI use.
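
A natural-frequency restatement turns a base rate, a hit rate, and a false-alarm rate into whole-number counts that can be compared directly. The sketch below is illustrative only: the function name `natural_frequencies` and the three input rates are placeholders, not figures from the cited papers, and under the “must use real input rates” constraint they would have to be replaced with sourced values.

```python
def natural_frequencies(base_rate, hit_rate, false_alarm_rate,
                        population=10_000):
    """Restate a Bayesian problem as counts in an imagined population."""
    affected = round(population * base_rate)
    unaffected = population - affected
    true_positives = round(affected * hit_rate)
    false_positives = round(unaffected * false_alarm_rate)
    all_positives = true_positives + false_positives
    return {
        "affected": affected,
        "unaffected": unaffected,
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Share of positives that are truly affected (the posterior).
        "posterior": true_positives / all_positives if all_positives else 0.0,
    }


# Placeholder rates for illustration only; real use needs real input rates.
example = natural_frequencies(base_rate=0.01, hit_rate=0.8,
                              false_alarm_rate=0.096)
print(
    f"Of {example['affected'] + example['unaffected']} people, "
    f"{example['true_positives']} affected and "
    f"{example['false_positives']} unaffected test positive."
)
```

Read as counts, the answer is the number of true positives divided by the number of all positives, which is the comparison the natural-frequency format makes visible.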

One-Way vs Two-Way Door P

Bezos, J. - Amazon.com 2015 Letter to Shareholders, released spring 2016 (the “Type 1 / Type 2” and “one-way door / two-way door” framing of decision reversibility; the argument that growing orgs over-apply Type 1 process to Type 2 decisions).
Bezos, J. - Amazon.com 2016 Letter to Shareholders (reversible decisions described as “two-way doors” that warrant a lightweight process).

Verification status: the Bezos shareholder-letter attribution (citations 1-2) is well-attested and the framing is widely reproduced, but the exact letter-by-letter wording was drawn from secondary synthesis in the discovery corpus and should be confirmed against the primary letters before any public-facing claim quotes them directly. There is, by design, no outcome-effectiveness citation: section 3 states plainly that none exists, which is the honest position for a P-tier practitioner method.

Parallel Perspectives Review P

de Bono, E. (1985). Six Thinking Hats - the branded lineage (trademarked; mechanism used descriptively).
Moseley, D. et al. - Cambridge review of thinking-skills frameworks (sparse evidence base for the branded framework).
Practitioner/education studies reporting moderate effects of deliberate mode separation.

Verification status: the de Bono lineage and the “493% is uncited” point are well-attested and are deliberately surfaced as an honesty demonstration. Treat the positive effect studies as moderate and context-specific; do not generalize them into a quantified product claim.

Premortem S/M

Mitchell, Russo & Pennington (1989), J. Behavioral Decision Making 2(1):25-38 - prospective hindsight; the ~30%-more-reasons finding.
Klein (2007), Harvard Business Review 85(9) - “Performing a Project Premortem.”
Veinott, Klein & Wiggins (2010), ISCRAM 2010 - premortem reduces overconfidence / improves plan calibration.
Kahneman (2011), Thinking, Fast and Slow - popularization; ties premortem to overcoming optimism bias and groupthink.

Verification status: citations 1-4 are standard and well-attested in the discovery corpus, but the exact effect-size phrasings and the 2024 meta-analysis claim in section 3 were drawn from a secondary research synthesis and should be confirmed against the primary papers before they appear in any public-facing README. They are safe to use inside this dossier because the dossier’s job is to be honest about exactly this uncertainty.

Problem Restatement M/P

Stanford d.school, design-thinking process guide (define mode; framing improves ideation quantity and quality). Practitioner/design-research.
Wedell-Wedellsborg, T. (2017). “Are You Solving the Right Problems?” Harvard Business Review.
Getzels, J. W., & Csikszentmihalyi, M. (1976). The Creative Vision - problem finding and creative performance. Plus later problem-construction work (Runco; Mumford et al.).
Nutt, P. C. (2002). Why Decisions Fail - premature/narrow problem definition as a failure driver.

Verification status: citations 1-2 are well-attested and safe to cite. Citations 3-4 are correctly attributed in substance (problem-finding research; Nutt’s decision-failure work) but the exact page-level claims should be confirmed against the primary sources before any public-facing quantified claim. The Einstein quote is explicitly excluded as apocryphal.

Pyramid Principle P

Minto, B. (1987). The Pyramid Principle: Logic in Writing and Thinking. - the method, MECE grouping, and SCQA introduction; originated as McKinsey house guidance.
Reading-comprehension / advance-organizer literature (e.g. Ausubel’s advance-organizer work and thesis-first expository-text studies) - the general finding that stating the main point up front aids comprehension and recall of structured text. Cited as an adjacent plausibility anchor for the “answer first” move, not as a test of the pyramid method.

Verification status: Minto (citation 1) is the well-attested primary source for the method and its components. The comprehension/advance-organizer link (citation 2) is drawn from secondary synthesis and is deliberately framed as adjacent support, not direct validation; confirm the specific studies against primary sources before any public-facing claim, and never upgrade this from a plausibility anchor to evidence that the pyramid method itself was tested. The “no controlled studies of the named method” statement in section 3 is the honest default and should stand unless a primary study is found.

Question Burst P

Gregersen, H. (MIT Sloan), “Better Brainstorming” (HBR) and the Question Burst method.

Verification status: Gregersen/MIT Sloan attribution is well-attested. Treat participant benefits as practitioner-reported, not a measured decision-quality effect.

Red Team Light P

Red teaming practice (military / intelligence / security).
Nemeth, C. et al. (2001) - authentic dissent vs role-played devil’s advocacy (role-play does not replicate the gains).

Verification status: the Nemeth finding is well-attested and is deliberately surfaced as the honesty flag. Do not present an AI red team as equivalent to genuine dissent.

Reference Class Forecasting S

Kahneman, D., & Lovallo, D. (1993) - timid choices, bold forecasts; the planning fallacy and the outside view.
Flyvbjerg, B. - reference class forecasting for infrastructure; documented cost/schedule overruns; institutional adoption (e.g. UK guidance).
Kahneman, D. (2011), Thinking, Fast and Slow - inside vs outside view popularization.

Verification status: the planning-fallacy and Flyvbjerg findings, and the institutional adoption, are well-attested; the “S” grade is justified. Keep the “use real base rates, not invented ones” constraint front and center for AI use.

SCAMPER P

Eberle, B. (1971). SCAMPER: Games for Imagination Development - the mnemonic, built on Osborn’s idea-spurring checklists.
Delft design guide; IMD innovation guide - SCAMPER as a standard later-stage ideation method.
(Contrast) Diehl & Stroebe and the brainwriting/NGT literature - the methods with stronger generation evidence, noted here so SCAMPER’s tier is not overstated.

Verification status: the Eberle/Osborn lineage is well-attested. The “structured prompts help break fixedness” claim is directional from the creativity literature; do not attach a quantified effect to SCAMPER specifically in any public-facing text.

Stocks and Flows Reasoning S

Sterman, J. (2000). Business Dynamics; and Sterman’s accumulation experiments.
Cronin, M., Gonzalez, C., & Sterman, J. (2009). “Why don’t well-educated adults understand accumulation?” Organizational Behavior and Human Decision Processes.
Meadows, D. (2008). Thinking in Systems.

Verification status: the Sterman accumulation-failure finding is well-attested; confirm the Cronin/Gonzalez/Sterman citation specifics before a public quantified claim. The honest scope - “corrects a specific accumulation error,” not “teaches systems thinking” - is the core caveat.

What Would Have to Be True P

Lafley, A. G., & Martin, R. L. (2013). Playing to Win: How Strategy Really Works.
Martin, R. L. - HBR strategy writing on reverse-engineering options and testing the assumptions you are least sure of.

Verification status: the attribution to Lafley & Martin / Playing to Win is well-attested. Treat all effectiveness as practitioner-reported; do not attach a quantified outcome claim in any public-facing text.

WOOP (Mental Contrasting with Implementation Intentions) S

Oettingen, G. - mental contrasting; Rethinking Positive Thinking (2014).
Gollwitzer, P. - implementation intentions.
Wang, G. et al. (2021) - meta-analysis of mental contrasting with implementation intentions.

Verification status: the 25+ RCT / meta-analysis claim and the “positive fantasy alone backfires” finding are well-attested in Oettingen’s program; confirm the Wang 2021 specifics against the paper before a public quantified claim.

Thinking Framework Skills v0.3.0 · 38 frameworks