
Boundary Critique

Every problem frame draws a line before any reasoning starts: who counts, what counts, whose improvement is the point, and who is left on the other side of the line. Those prior decisions are boundary judgments, and they condition both the facts you collect and the values you weigh. The reflex is to reason inside the frame as given. Boundary critique refuses that reflex and makes the frame itself the suspect object. It interrogates each boundary judgment in two modes - how the frame currently draws the line (the is mode) and how it ought to (the ought mode) - and forces an explicit account of the parties who have a stake in the consequences but no seat in the frame. The move is the de-branded core of Critical Systems Heuristics (Werner Ulrich, from 1983, later with Martin Reynolds). The output is a boundary-judgment audit, not a discussion.

When to Use

  • A plan, metric, proposal, or “solution” already encodes who matters and who does not, and the risk is solving a tidy problem for the people inside the line while externalizing harm onto people outside it.
  • An “improvement” claim rests on an unexamined judgment about whose improvement (a growth metric, an efficiency gain, a success measure that quietly picks a beneficiary).
  • The situation is contested, value-laden, or multi-party: policy, programme or intervention design, evaluation, or any decision where reasonable parties would draw the boundary differently.
  • You suspect a frame is illegitimately bounded - that the affected-but-excluded exist - and want them surfaced before the frame is acted on.

When NOT to Use

  • The frame is genuinely settled and uncontested. If the stakeholder set and the success measure are already agreed and legitimate, auditing the boundary manufactures doubt and stalls execution. This is the same failure as reframing a correct problem. This is the central wall.
  • The problem is technical or single-party. With one obviously-correct beneficiary and no excluded affected parties, the audit produces empty “ought” columns - ritual, not insight.
  • You expect it to resolve the disputed boundary. Boundary critique surfaces and debates the is-vs-ought gap; it does not adjudicate it. It exposes the boundary question and names who is excluded; it does not settle who is right or compel a powerful actor to widen the line. When the gap is real, route the decision onward (e.g. to think-decision-option-review) - the audit informs that choice, it is not the choice.
  • You only need to hear the stakeholders. If you just want each in-scope party voiced, that is a stakeholder round-up - use think-parallel-perspectives-review (stakeholder mode) or think-problem-restatement (its stakeholder shift). Boundary critique is the different, upstream move: it takes the stakeholder set itself as suspect and audits inclusion-versus-exclusion in is/ought terms. Run it as a round-up and it collapses into the move the library already ships.
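
The fit conditions above can be read as a small routing check. The sketch below is one hedged way to encode them; `FrameSignals` and `triage` are hypothetical names introduced for illustration, and only the routed skill names come from the text.

```python
from dataclasses import dataclass


@dataclass
class FrameSignals:
    """Hypothetical yes/no signals about the frame, mirroring the bullets above."""
    settled_and_legitimate: bool   # stakeholder set and success measure agreed
    single_party_technical: bool   # one obvious beneficiary, nobody excluded
    only_need_voices: bool         # a stakeholder round-up is all that is wanted
    expects_resolution: bool       # caller wants the boundary dispute decided


def triage(s: FrameSignals) -> str:
    """Return a routing recommendation; the check order is an assumption."""
    if s.settled_and_legitimate:
        return "skip: settled frame, an audit would manufacture doubt"
    if s.single_party_technical:
        return "skip: single-party problem, the ought columns would be empty"
    if s.only_need_voices:
        return "route: think-parallel-perspectives-review (stakeholder mode)"
    if s.expects_resolution:
        return "run, then route the gap to think-decision-option-review"
    return "run boundary critique"


print(triage(FrameSignals(False, False, False, True)))
```

The order of the checks is a design choice, not something the text prescribes: the two "skip" conditions come first because they make the audit pointless regardless of what else is true.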

When asked to audit who a frame includes and excludes, follow these steps:

  1. State the frame under audit. Record the plan, metric, proposal, or decision exactly as given, in one line, plus the improvement it claims to deliver. If the frame is already settled, agreed, and legitimate, say so and stop - do not manufacture doubt.
  2. Audit the four sources, each in is then ought mode. For each source, answer the boundary question first descriptively (how the frame currently draws the line), then normatively (how it ought to):
    • Motivation - who benefits. Who is the client/beneficiary; what is the purpose; what is the measure of improvement or success? (Is: whose improvement does the current frame actually serve? Ought: whose improvement should it serve?)
    • Power/control - who decides. Who is the decision-maker; what is under their control; what lies outside their control, in the decision environment? (Is vs ought for who holds the decision.)
    • Knowledge - whose expertise counts. Who is treated as expert; what expertise actually applies; what is assumed to guarantee success - and are those false guarantors? (Is vs ought for whose knowledge is admitted.)
    • Legitimacy - who has standing. Who witnesses for those affected but not involved; what worldview is treated as authoritative; how are competing worldviews reconciled? (Is vs ought for who has standing.)
  3. Name the is-vs-ought gaps. For each source, state the gap between the is-boundary and the ought-boundary in one line. The gap is the finding - where the frame draws the line versus where it should. A source with no gap is a legitimate boundary; say so rather than padding.
  4. List the affected-but-excluded. This is the move the rest of the library does not have. Enumerate the parties with a real stake in the consequences who hold no seat, no voice, and no expertise-standing inside the frame, and for each note who, if anyone, currently witnesses for them. Do not fold this into “voice the stakeholders” - these are precisely the parties a stakeholder walk-through cannot reach, because they are outside the line.
  5. Emit the boundary-judgment audit. Produce the artifact in references/TEMPLATE.md: the four sources answered in both is and ought modes, the is-vs-ought gaps named, and the explicit list of the affected-but-excluded. End by stating what the audit does NOT do - it surfaces the boundary question, it does not adjudicate it - and, where a real gap exists, the onward route for deciding under it. The template’s pre-printed evidence caveat is part of the artifact; carry it through verbatim.
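
The steps above imply a simple record shape for the audit. The following is a minimal sketch of that shape, not the contents of references/TEMPLATE.md; the class and field names (`SourceJudgment`, `ExcludedParty`, `BoundaryAudit`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NO_GAP = "no gap - legitimate boundary"


@dataclass
class SourceJudgment:
    source: str            # "motivation", "power/control", "knowledge", "legitimacy"
    is_boundary: str       # descriptive: how the frame draws the line now
    ought_boundary: str    # normative: how it should draw the line
    gap: Optional[str]     # one-line finding, or None when the boundary is legitimate


@dataclass
class ExcludedParty:
    name: str
    stake: str
    witness: Optional[str] = None   # who, if anyone, currently speaks for them


@dataclass
class BoundaryAudit:
    frame: str                      # step 1: the frame exactly as given, in one line
    claimed_improvement: str
    judgments: List[SourceJudgment] = field(default_factory=list)   # steps 2-3
    excluded: List[ExcludedParty] = field(default_factory=list)     # step 4

    def gaps(self) -> Dict[str, str]:
        """Step 3: one line per source, with an honest 'no gap' where none exists."""
        return {j.source: j.gap or NO_GAP for j in self.judgments}

    def unwitnessed(self) -> List[str]:
        """Step 4: excluded parties nobody currently witnesses for."""
        return [p.name for p in self.excluded if p.witness is None]


# Abbreviated entries drawn from the Northwind example later on this page.
audit = BoundaryAudit(
    frame="Should we launch a self-serve free tier?",
    claimed_improvement="New-logo growth and qualified pipeline",
    judgments=[
        SourceJudgment("motivation", "The growth team's PLG metric",
                       "The people the product serves",
                       "Users are treated as inputs to the funnel"),
    ],
    excluded=[
        ExcludedParty("Existing paying customers",
                      "Support and roadmap attention degrade"),
        ExcludedParty("Finance / cost-to-serve owner",
                      "Cost-to-serve hits a budget the metric does not see",
                      witness="partial: only if cost-to-serve is in the model"),
    ],
)
print(audit.gaps())
print(audit.unwitnessed())
```

Keeping `gap` optional lets a source with a legitimate boundary be recorded as such rather than padded with an invented finding.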

Use the template in references/TEMPLATE.md. The deliverable is the filled audit - the four sources in is/ought, the named gaps, and the affected-but-excluded list - not a prose essay.

Before finalizing, verify:

  • The frame under audit is captured verbatim, with the improvement it claims, in one line.
  • All four sources (who benefits, who decides, whose knowledge counts, who has standing) are answered in both is and ought modes - no source left in one mode only.
  • Each source has an explicit is-vs-ought gap stated (or an honest “no gap - legitimate boundary”), not a restatement of the answers.
  • The affected-but-excluded are listed as a distinct section - parties outside the line with a stake but no voice - not merged into a stakeholder voicing of in-scope parties.
  • The output names what it does NOT do (surfaces the boundary question, does not adjudicate it) and routes a real gap onward (e.g. to think-decision-option-review) rather than presenting itself as the resolution.
  • The output is the boundary-judgment audit artifact, not prose.
  • No overclaiming: the evidence is conceptual and transferred; claim “surfaces who the frame illegitimately includes or excludes, descriptively versus normatively,” not a measured improvement in decisions (see evidence/dossier.md).
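
Several of the checks above are mechanical and could be automated before a human reads the audit. The sketch below assumes the audit is held as a plain dictionary with hypothetical keys (`frame`, `sources`, `affected_but_excluded`, `does_not_do`, `onward_route`); the "artifact, not prose" and "no overclaiming" checks still need a reader and are not attempted here.

```python
REQUIRED_SOURCES = ("motivation", "power", "knowledge", "legitimacy")
NO_GAP = "no gap - legitimate boundary"


def check_audit(audit):
    """Return a list of checklist failures; an empty list means the mechanical checks pass."""
    problems = []

    frame = audit.get("frame", "").strip()
    if not frame or "\n" in frame:
        problems.append("frame: must be captured in one line")
    if not audit.get("claimed_improvement", "").strip():
        problems.append("frame: claimed improvement missing")

    has_real_gap = False
    sources = audit.get("sources", {})
    for name in REQUIRED_SOURCES:
        entry = sources.get(name, {})
        answers = {mode: entry.get(mode, "").strip() for mode in ("is", "ought")}
        for mode, text in answers.items():
            if not text:
                problems.append(f"{name}: '{mode}' mode unanswered")
        gap = entry.get("gap", "").strip()
        if not gap:
            problems.append(f"{name}: gap not stated")
        elif gap in answers.values():
            problems.append(f"{name}: gap restates an answer")
        elif gap != NO_GAP:
            has_real_gap = True

    if not isinstance(audit.get("affected_but_excluded"), list):
        problems.append("affected-but-excluded: distinct section missing")
    if not audit.get("does_not_do", "").strip():
        problems.append("closing: must state what the audit does NOT do")
    if has_real_gap and not audit.get("onward_route", "").strip():
        problems.append("closing: real gap found but no onward route")
    return problems


# Placeholder answers only; a real audit would carry the actual is/ought text.
draft = {
    "frame": "Should we launch a self-serve free tier?",
    "claimed_improvement": "New-logo growth and qualified pipeline",
    "sources": {
        name: {"is": "current line", "ought": "proposed line", "gap": NO_GAP}
        for name in REQUIRED_SOURCES
    },
    "affected_but_excluded": [],
    "does_not_do": "Surfaces the boundary question; does not adjudicate it.",
}
print(check_audit(draft))  # → []
```

Returning a list of named failures, rather than a single pass/fail flag, keeps each failed check traceable to the bullet it came from.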

Tier C (governing; honest read C/P, capped at C). Critical Systems Heuristics is an influential, well-developed framework in systems thinking, operational research, and evaluation, with a forty-year literature and a clear, teachable apparatus (the twelve boundary questions, the four sources, the is/ought pairing). A 2024 systematic review (Hutcheson, Morton and Blair, Systemic Practice and Action Research 37(4): 499-514) examined 77 peer-reviewed papers and found a real body of applied case work, with utility “best exemplified in an action research context.” But there is no controlled, comparative, or outcome study of boundary critique - the same review reports many papers are theoretical rather than applied, contains no trials or comparison groups, and calls CSH a “relatively underutilised method.” The evidence is the existence and reasoned application of the method, not evidence that running it produces better frames - which is the line between C and P, and why the grade caps at C. All of it is transferred from human policy, evaluation, and action-research practice; none studies an AI-produced boundary critique. No effect-size figure is cited, because none exists. Full grading, sources, and caveats: evidence/dossier.md.

See references/EXAMPLE.md for a completed boundary-judgment audit on a real decision.

A full worked run (the shared Northwind scenario)

A completed run of the boundary-critique skill on a real, consequential decision. This is the quality bar a generated audit should meet.

Uses the shared recurring scenario (Northwind, a B2B SaaS weighing a self-serve free-tier launch) so examples across skills read as one coherent product. Where abstraction-laddering relocated the altitude at which to work the free-tier problem, and contradiction-resolution tested whether the generous-vs-limited trade-off could be dissolved, this skill audits whose decision the free-tier launch is - who the frame counts, and who it leaves outside the line. See docs/internal/AUTHORING.md.


  • Frame as given: “Should we launch a self-serve free tier? The model says it lifts top-of-funnel signups and our PLG growth metric; we’re optimizing the free-tier design to maximize qualified conversions.”
  • Improvement it claims: A “better” measured almost entirely as new-logo growth and qualified pipeline - the company’s PLG funnel metric.
  • The user’s actual goal: Grow durable, profitable revenue without quietly trading it away elsewhere.

The free-tier frame is drawn tightly around one beneficiary (the company’s growth team) and one measure (top-of-funnel conversion). Audited in is/ought terms, three boundary judgments show real gaps: the beneficiary is the funnel metric, not the people the product is for (motivation); the decision sits with growth, while the support and existing-customer-success functions who absorb the consequences are outside the decision environment (power); and the worldview treated as authoritative is “users are a funnel,” which has no standing for the free-tier users themselves or for existing paying customers (legitimacy). The most consequential affected-but-excluded parties are existing paying customers, whose support and roadmap attention degrade as free-tier volume floods in, and free-tier users, who are framed as conversion fuel rather than people getting a job done. This audit surfaces those boundary questions; it does not decide whether to launch the free tier. It hands the growth-versus-existing-revenue and the user-as-funnel gaps to a decision step, with the excluded parties now on the table.

| Source (boundary question) | Is (how the frame draws it now) | Ought (how it should) | Gap |
| --- | --- | --- | --- |
| Motivation - who benefits | The beneficiary is the growth team’s PLG metric; the purpose is more signups; success is qualified-conversion rate. The “improvement” is the company’s funnel, full stop. | The beneficiary should include the people the product serves - free-tier users getting real value, and existing customers whose experience funds the company. Success should be measured net of harm to them, not just funnel lift. | Large. The frame optimizes the company’s growth number and silently treats users (free and paying) as inputs to it, not as parties whose improvement is the point. |
| Power/control - who decides | The growth/PLG team decides; pricing and funnel design are under their control. Support capacity, existing-customer success, and infra cost-to-serve are treated as the environment - outside the decision. | The functions who absorb the consequences (support, customer success, finance/cost-to-serve) should have a real seat, because the decision spends their capacity and budget, not only growth’s. | Large. The people who decide are not the people who bear the cost; the decision externalizes load onto functions with no vote. |
| Knowledge - whose expertise counts | Growth-modeling and conversion-analytics expertise is authoritative. The assumed guarantor of success is “the model says signups go up.” | Front-line support knowledge (what flood-of-free-users actually does to response times) and existing-customer-success knowledge (what attention shifts away from renewals) should count. The model is a false guarantor if it omits cost-to-serve and churn of paying accounts. | Moderate-to-large. The expertise admitted is the expertise that supports the launch; the expertise that would surface its costs is outside the frame. |
| Legitimacy - who has standing | The authoritative worldview is “users are a funnel; more top-of-funnel is better.” Standing belongs to whoever moves the growth metric. | Free-tier users (as people doing a job, not leads) and existing paying customers (whose service quality is at stake) should have standing. Someone must witness for both, since neither is in the room. | Large. No one currently witnesses for the affected-but-excluded; the funnel worldview has no category for “a free user who is treated as fuel” or “a paying customer who quietly gets worse service.” |

| Affected-but-excluded party | Stake in the consequences | Who (if anyone) witnesses for them now |
| --- | --- | --- |
| Existing paying customers | Support response times and roadmap attention degrade as free-tier volume floods in; they fund the company but were never part of the free-tier decision. | No one - the frame counts new logos, not the experience of current ones. (Customer success could, if given a seat.) |
| Free-tier users (as people, not leads) | Framed as conversion fuel; their actual job-to-be-done and their experience are instrumental to the metric, not an end. | No one - the funnel worldview has no standing for them except as a conversion rate. |
| The support / on-call team | Absorbs the load the decision creates (ticket volume, off-hours pressure) with no vote and no budget adjustment in the frame. | No one in the decision; their capacity is treated as free environment. |
| Finance / cost-to-serve owner | Infra and support cost-to-serve of a large free base hits a budget the growth metric does not see. | Partially - only if cost-to-serve is forced into the model rather than left in the environment. |

  • It surfaces the boundary question; it does not adjudicate it. Whether Northwind should launch the free tier is still open. What the audit establishes is that the current frame answers “whose improvement, decided by whom, on whose knowledge, with whose standing” in a way that excludes the parties who bear the cost - so a launch decision made inside this frame would optimize the funnel while externalizing harm onto existing customers, free users, and support.
  • Onward route: take the widened frame to think-decision-option-review - compare “launch as framed”, “launch with a cost-to-serve and existing-customer-experience guardrail”, and “do not launch” with the excluded parties’ stakes now scored, so the choice is made under the real boundary rather than the tidy one. The audit informs that decision; it is not the decision.

Note how the value is in auditing the frame’s membership rather than reasoning inside it: an unaided pass would have helped design the free tier to maximize the PLG metric - accepting the frame’s beneficiary (the funnel) and its measure (conversions) as given. The skill took the stakeholder set itself as the suspect object, contrasted who the frame counts (is) against who it ought to, and surfaced the parties outside the line - existing paying customers, free users-as-people, support - whom a stakeholder walk-through of the in-scope team could never have reached. It did not decide the launch; it made sure the launch would not be decided for a tidy problem at the excluded parties’ expense.

What the research does and does not show, with graded sources

The single source of truth for the boundary-critique skill. The SKILL.md, the sidecar (skill.meta.yml), and the eval cases all derive from this file. If a claim is not here, it does not belong in the skill. Drafted by the think-research-framework engine and admitted as a Build.

Skill: thinking-framework-skills.boundary-critique (installable name think-boundary-critique)
Family: problem-framing
Evidence tier: C governing (honest read C/P, capped at C - see “What the evidence shows”)
Confidence: Moderate that the is/ought boundary interrogation surfaces real exclusions on contested frames; low that any outcome benefit transfers to agents
Status: draft (admitted from the SP6 discovery shortlist)

1. The mechanism (what actually does the work)

Boundary critique is the central operation of Critical Systems Heuristics (CSH), developed by Werner Ulrich. Its durable cognitive move is to interrogate the boundary judgments that silently define a problem frame - the prior decisions about what and who count as relevant, and what and who are left out - rather than to reason inside the frame as given. In Ulrich’s terms, boundary judgments “determine which empirical observations and value considerations count as relevant and which others are left out,” and they “condition both facts and values”: the facts you collect and the values you weigh both depend on where you drew the line first.

The move is operationalized as a checklist of twelve boundary questions grouped under four sources of influence over any design or intervention:

  1. Motivation (who benefits): who is the client/beneficiary; what is the purpose; what is the measure of improvement or success.
  2. Power/control (who decides): who is the decision-maker; what resources and conditions are under their control; what is outside their control (the decision environment).
  3. Knowledge (whose expertise counts): who is treated as expert/professional; what relevant expertise actually applies; what is assumed to guarantee success (and whether those are false guarantors).
  4. Legitimacy (who has standing): who witnesses for those affected but not involved; where the affected draw emancipation from the premises of the involved; what worldview is treated as authoritative, and how competing worldviews are reconciled.

The decisive structural feature is that every question is asked twice, in two modes: the “is” mode (descriptive - how the frame currently draws the boundary) and the “ought” mode (normative - how it should draw it). The gap between the is-boundary and the ought-boundary is the finding. The fourth group, legitimacy, contains the move the rest of the library does not have: it forces an explicit account of those affected but not involved - parties with a stake in the consequences who hold no seat, no voice, and no expertise-standing inside the frame - and asks who, if anyone, witnesses for them.
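
To make the "asked twice" structure concrete, the sketch below expands the twelve questions into the twenty-four is/ought prompts an audit has to answer. The question wording paraphrases the list above and is not Ulrich's canonical phrasing; the names `BOUNDARY_QUESTIONS` and `prompts` are hypothetical.

```python
# Each entry pairs the descriptive ("is") wording with the normative ("ought") wording.
BOUNDARY_QUESTIONS = {
    "motivation": [
        ("Who is the beneficiary?", "Who ought to be the beneficiary?"),
        ("What is the purpose?", "What ought to be the purpose?"),
        ("What is the measure of improvement?",
         "What ought to be the measure of improvement?"),
    ],
    "power/control": [
        ("Who is the decision-maker?", "Who ought to be the decision-maker?"),
        ("What resources and conditions are under their control?",
         "What resources and conditions ought to be under their control?"),
        ("What is outside their control (the decision environment)?",
         "What ought to be outside their control (the decision environment)?"),
    ],
    "knowledge": [
        ("Who is treated as expert?", "Who ought to be treated as expert?"),
        ("What expertise actually applies?", "What expertise ought to apply?"),
        ("What is assumed to guarantee success?",
         "What ought to be assumed to guarantee success?"),
    ],
    "legitimacy": [
        ("Who witnesses for those affected but not involved?",
         "Who ought to witness for those affected but not involved?"),
        ("Where do the affected draw emancipation from the premises of the involved?",
         "Where ought the affected to draw emancipation from the premises of the involved?"),
        ("What worldview is treated as authoritative?",
         "What worldview ought to be treated as authoritative?"),
    ],
}


def prompts():
    """Return (source, mode, question) triples, descriptive before normative."""
    out = []
    for source, pairs in BOUNDARY_QUESTIONS.items():
        for is_q, ought_q in pairs:
            out.append((source, "is", is_q))
            out.append((source, "ought", ought_q))
    return out


for source, mode, question in prompts()[:2]:
    print(f"[{source} / {mode}] {question}")
```

Storing both wordings explicitly, rather than deriving the "ought" form by string substitution, avoids ungrammatical prompts and keeps the two modes visibly distinct.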

The deliverable is a boundary-judgment audit: the four sources answered in both is and ought modes, the is-vs-ought gaps named, and an explicit list of the affected-but-excluded the current frame omits. The point is not to round up stakeholders; it is to test whether the frame’s membership - whom it includes and excludes - is legitimate, descriptively versus normatively, before the frame is acted on.

  • Critical Systems Heuristics (CSH) and boundary critique: Werner Ulrich, first set out in Critical Heuristics of Social Planning: A New Approach to Practical Philosophy (Haupt, 1983) and condensed in his 1987 European Journal of Operational Research paper “Critical heuristics of social systems design.” The framework is grounded in practical philosophy - Kant’s regulative ideas, C. West Churchman’s systems approach, and Jürgen Habermas’s theory of communicative action.
  • Later elaboration: developed with Martin Reynolds, through their CSH chapter in Systems Approaches to Managing Change (the standard teaching reference for the four sources and twelve questions in is/ought form) and Reynolds’s work on CSH-based evaluation.
  • Naming and IP: “Critical Systems Heuristics” and “boundary critique” are generic academic terms in common scholarly use, not trademarks. This skill credits Ulrich (and Reynolds) as lineage but is not branded and needs no trademark string; the attribution-not-branding treatment applies. It ships under a mechanism-over-brand name, boundary-critique, per the library’s first commitment.

3. What the evidence shows, and what it does NOT show

The honest read is C/P, capped at the conservative governing grade of C. Both the read and the cap matter.

What the record supports. CSH is an influential, well-developed framework in systems thinking, operational research, and evaluation practice, with a forty-year literature and a clear, teachable apparatus (the twelve questions, the four sources, the is/ought pairing). A 2024 systematic review by Hutcheson, Morton and Blair (Systemic Practice and Action Research 37(4): 499-514; epub 29 Nov 2023) examined 77 peer-reviewed papers and found a real body of applied case work across multiple problem domains, with CSH’s “utility … best exemplified in an action research context.” The review also credits CSH’s distinctive reach: it “surpasses soft systems frameworks in its potential to provide insights into coercion” and enables “deep reflection on a problem through the lens of negatively impacted groups.” Ulrich himself frames CSH as “a critical methodology for identifying and debating boundary judgements” rooted in “practical philosophy and systems thinking” - that is, an explicitly philosophical, reflective-practice heuristic, not a technique making an effect-size claim.

What the record does NOT support, and why the grade caps at C. There is no controlled, comparative, or outcome study of boundary critique. The same systematic review reports that “several of the papers reviewed are not practical applications of the framework but contributions to theoretical and methodological discussions,” contains no randomized trials, comparison groups, or quantified effectiveness metrics, and characterizes CSH as a “relatively underutilised method” whose outcomes are largely unmeasured. So there is no demonstration that running a boundary critique produces better decisions, frames, or interventions than not running one. The evidence is the existence and reasoned application of the method, not evidence that it works. That is the line between C and P: there is no practitioner-outcome base to anchor a P, only theory plus applied cases - hence C governs.

No laundered statistics. No effect size or success-rate figure is cited for boundary critique, because there is no nameable primary source for one. None is invented or implied here. A famous, deeply-cited framework can clear the distinctness bar while still grading only C, because influence and a deep literature are not the same thing as evidence that the method works.

Net grade: C (governing), honest read C/P. Claim “surfaces who the frame illegitimately includes or excludes, descriptively versus normatively, and names the affected-but-excluded”; do not claim a measured improvement in decisions, frames, or interventions.

4. Transferred-evidence flag (required honesty for this library)

The entire literature is human practitioners in policy, evaluation, operational research, and action-research settings. None studies a boundary critique produced by or with an AI agent. The evidence is transferred from human contexts and not validated for AI-augmented use - a second reason the conservative governing grade is C, not higher. There is no S- or M-tier research on this move to borrow from, so there is no optimistic half to inflate from. Treat the AI value as: the agent makes the twelve-question, is/ought pass cheap and disciplined, resists the reason-inside-the-frame reflex, holds the descriptive and normative modes distinct, and enforces the affected-but-excluded accounting that a stakeholder walk-through structurally cannot reach - benefits that do not depend on any unproven outcome claim.

5. When it works / when it fails (drives the eval negative cases and “When NOT to Use”)

Works best when:

  • The frame itself is the suspect object - a plan, metric, proposal, or “solution” already encodes who matters and who does not, and the risk is solving a tidy problem for the people inside the line while externalizing harm onto people outside it.
  • An “improvement” claim rests on an unexamined judgment about whose improvement.
  • The situation is contested, value-laden, or multi-party (policy, programme and intervention design, evaluation) - where the systematic review locates CSH’s strength, in its reach into coercion and negatively-impacted groups.

Fails or misleads when (poor-fit / anti-patterns):

  • The frame is genuinely settled and uncontested. If the stakeholder set and the success measure are agreed and legitimate, auditing the boundary manufactures doubt and stalls execution - the same failure as reframing a correct problem. This is the central wall.
  • The problem is technical or single-party. One obviously-correct beneficiary and no excluded affected parties produce empty “ought” columns - ritual, not insight.
  • It is mistaken for a method that resolves the disputed boundary. Boundary critique surfaces and debates the is-vs-ought gap; it does not adjudicate it. The systematic review is blunt that CSH “is predicated on reflective debate” and “may be insufficient in uncovering the cause-and-effect relationships” behind a coercive situation. It exposes the boundary question; it does not settle who is right or compel a powerful actor to widen the line. Route a real gap onward to a decision skill.
  • It is run as a stakeholder round-up. If it degrades into “list the parties and voice each,” it collapses into the move the library already ships (think-parallel-perspectives-review stakeholder mode, or think-problem-restatement’s stakeholder shift) and loses its distinct contribution - the is/ought interrogation and the affected-but-excluded.

The skill must emit a boundary-judgment audit, not prose: the frame under audit captured verbatim with its claimed improvement; the four sources (who benefits, who decides, whose knowledge counts, who has standing) each answered in both is and ought modes; the is-vs-ought gap named for each source (or an honest “no gap - legitimate boundary”); and a distinct, explicit list of the affected-but-excluded - parties outside the line with a stake but no voice, each with a note on who, if anyone, witnesses for them. The artifact closes by stating what it does NOT do (surfaces the boundary question, does not adjudicate it) and, where a real gap exists, the onward route for deciding under it. A short summary sits above the audit.

  1. Werner Ulrich, Critical Heuristics of Social Planning: A New Approach to Practical Philosophy (Haupt, 1983). The foundational statement of CSH and boundary critique. Foundational.
  2. Werner Ulrich, “Critical heuristics of social systems design,” European Journal of Operational Research 31 (1987): 276-283. The condensed OR-facing formulation of the twelve boundary questions and the is/ought modes. Foundational. (C - theoretical/methodological.)
  3. Werner Ulrich, “A Brief Introduction to Critical Systems Heuristics (CSH)” (2005), and the mini-primer of boundary critique on his homepage. The clearest first-person account of boundary judgments conditioning facts and values, and of those affected-but-not-involved. Author’s primer. (C.)
  4. Werner Ulrich and Martin Reynolds, “Critical Systems Heuristics,” ch. 6 in Systems Approaches to Managing Change: A Practical Guide (Reynolds and Holwell eds., Springer/Open University, 2010/2020). The standard teaching reference; the four sources and twelve questions in is/ought form. Practitioner/teaching reference. (P-as-pedagogy, not outcome.)
  5. Mark Hutcheson, Alec Morton and Susan Blair, “Critical Systems Heuristics: a Systematic Review,” Systemic Practice and Action Research 37(4) (2024): 499-514 (epub 29 Nov 2023). 77 peer-reviewed papers; documents the applied-case-and-theory base, the absence of controlled/outcome evidence, and CSH’s underutilisation - the source that bounds the grade at C. Systematic review.
  6. Better Evaluation and Integration and Implementation Insights (i2insights, 2022) primers on CSH for evaluation and research practice. Used here only to corroborate the mechanism (the twelve questions, the four sources, the witness/affected-but-not-involved role), not as evidence of effect. Practitioner reference.

Verification status: The mechanism descriptions (Ulrich 1983/1987/2005, the Reynolds teaching chapter, and the practitioner primers) are well-attested and mutually consistent, as is the generic-term / no-trademark status of “Critical Systems Heuristics” and “boundary critique.” The Hutcheson, Morton and Blair systematic review (5) is the source for the 77-paper count, the theory-versus-application split, the action-research utility note, and the “relatively underutilised” and coercion-reach characterizations; its quoted phrases are reported from the review’s summary rather than an independently audited full read. No effect-size or success-rate figure exists for boundary critique, so none is cited; the absence is itself the reason the governing grade is C. None of these gaps changes the conservative governing grade.

Thinking Framework Skills v0.8.0 · 56 frameworks