Affinity Mapping

Affinity mapping takes a pile of many individual items - raw notes, observations, quotes, data points - and groups them bottom-up by felt similarity until a small set of emergent themes appears, then names each theme so the names become the structure. The load-bearing move is deferred, bottom-up categorization: you do not sort items into predefined buckets, you let the categories surface from the items themselves. This externalizes comparison so patterns hidden in a linear list become visible, resists the frame you walked in with, and compresses many items into a few themes while keeping every item traceable to its theme. The output is a clustered theme map, not a discussion.

When to Use

When dozens to hundreds of existing items - user-research notes, interview quotes, support tickets, survey free-text, retro stickies, workshop output - need to become a few themes.
When the right structure is not known in advance and should emerge from the data rather than be imposed.
When the items already exist and the job is synthesis, not generation.
When traceability matters: you want each theme to point back to the specific items that support it.

When NOT to Use

When there are only a handful of items. With a dozen or fewer you can reason about them directly; the clustering ceremony adds overhead without insight.
When you need a top-down logical structure - a question decomposed into MECE sub-questions or a hypothesis tree. That work starts from a question and decomposes downward; use an issue-tree skill. Affinity mapping runs bottom-up, from items.
When you need to generate ideas or options. Affinity mapping only organizes items that already exist and produces no new ideas. Use an ideation skill (for example brainwriting) to create the items first, then affinity-map them.
When the categories are already fixed and authoritative (a required taxonomy, a compliance schema). Then you are coding into known buckets, not discovering emergent themes.
When it would be run as a ritual. Grouping into a few buckets and slapping confident names on them with no traceability is cargo-cult synthesis, not insight.
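The fit conditions above can be written as a pre-flight check. This is a minimal sketch, not part of the skill: the `Request` fields, the `fit_check` name, and the exact cutoff of 13 (read from "a dozen or fewer") are illustrative assumptions.

```python
from dataclasses import dataclass

# Assumption: "a dozen or fewer" is read as "fewer than 13 items".
MIN_ITEMS = 13


@dataclass
class Request:
    item_count: int              # discrete items that already exist
    starts_from_question: bool   # top-down decomposition is wanted
    needs_new_ideas: bool        # the job is generation, not synthesis
    categories_fixed: bool       # a required taxonomy or schema applies


def fit_check(req: Request) -> list[str]:
    """Return reasons NOT to run an affinity map; an empty list means proceed."""
    reasons = []
    if req.item_count < MIN_ITEMS:
        reasons.append("too few items: reason about them directly")
    if req.starts_from_question:
        reasons.append("top-down structure needed: use an issue tree")
    if req.needs_new_ideas:
        reasons.append("no items yet: run ideation first")
    if req.categories_fixed:
        reasons.append("categories are fixed: code into known buckets")
    return reasons
```

A pile of 81 existing items with no fixed taxonomy returns an empty list; a pile of 9 returns the "too few items" reason and the run should stop there.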

Instructions

When asked to run an affinity map, follow these steps:

Frame the question and gather the items. State in one sentence what the synthesis is for (for example, “what is blocking free-tier activation?”), and assemble every item as a discrete, comparable unit. If there are only a handful of items, or the categories are already fixed, say so and stop.
Cluster before naming. Place items that feel related together, bottom-up, by similarity. Do not start from predefined buckets and do not name groups yet. Let clusters form, split, and merge as items accumulate. This deferral is the mechanism; naming first defeats it.
Name each emergent theme from its contents. Once clusters are stable, give each a short descriptive name that answers to the items inside it, not to your prior frame. A theme whose items do not cohere is a signal to split or dissolve it, not to force a label.
Keep every item traceable. Each theme records the source items it contains (by list, or by count plus representative examples). Items that did not cluster go to an explicit outliers / parking lot, never silently dropped.
Weight and read the themes. Note each theme’s relative size or strength, and state what the map tells you - which themes dominate, which are thin, what surprised you. Size is a signal of salience, not of truth; flag thin or borderline clusters as tentative.
Emit the theme map and a short summary. Produce the artifact defined in references/TEMPLATE.md: a one-paragraph “themes and what they tell us” summary above the named-theme table, with outliers kept visible.
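The ordering in the steps above (cluster first, name afterwards, park what does not cluster) can be sketched in code. Everything here is an illustrative assumption: word overlap stands in for "felt similarity", the 0.25 threshold is arbitrary, the sample items and theme names are invented, and the single greedy pass does not split or merge clusters the way a real run should.

```python
# Stand-in for "felt similarity": shared words. A real run relies on judgment.
STOPWORDS = {"a", "an", "the", "to", "of", "we", "it", "was"}


def tokens(text: str) -> set[str]:
    """Lowercased words with punctuation and stopwords removed."""
    words = {w.strip(".,;:!?\"'").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(items: list[str], threshold: float = 0.25):
    """Group item indices bottom-up; no names and no predefined buckets."""
    clusters: list[list[int]] = []
    for i, item in enumerate(items):
        best, best_score = None, 0.0
        for c in clusters:
            # Score against the closest member already in the cluster.
            score = max(jaccard(tokens(item), tokens(items[j])) for j in c)
            if score > best_score:
                best, best_score = c, score
        if best is not None and best_score >= threshold:
            best.append(i)
        else:
            clusters.append([i])
    themes = [c for c in clusters if len(c) > 1]
    outliers = [c[0] for c in clusters if len(c) == 1]  # kept, never dropped
    return themes, outliers


ITEMS = [
    "data import took all week",
    "import of data was slow",
    "routed to sales call",
    "sales call required first",
    "loved the mobile app",
]

themes, outliers = cluster(ITEMS)
# Names are written only now, after the groups exist (hypothetical labels).
named = dict(zip(["Slow data import", "Sales gating"], themes))
```

The point of the sketch is the sequence: `cluster` never sees a theme name, and singletons land in `outliers` instead of disappearing.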

Output Format

Use the template in references/TEMPLATE.md. The deliverable is the filled clustered theme map plus its summary, not a prose essay.

Quality Checklist

Before finalizing, verify:

Clustering happened before naming (themes emerged from items, not from predefined buckets).
Each theme has a short name that answers to the items inside it, not to a prior frame.
Every item is traceable to a theme, and items that did not cluster are kept in an explicit outliers / parking lot, not dropped.
Themes are weighted by relative size or strength, and thin or borderline clusters are flagged as tentative, not laundered by a confident label.
The output is the clustered theme map artifact, not prose.
No overclaiming: the skill organizes a scattered pile into named, traceable themes; it does not promise objectively better or bias-free themes (see evidence/dossier.md).
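Two of the checklist items (traceability and flagging thin clusters) are mechanical enough to check in code. A minimal sketch under stated assumptions: the dictionary layout, the `check_theme_map` name, and the `thin_below` cutoff are all hypothetical, and the skill itself does not define a numeric threshold for "thin".

```python
def check_theme_map(item_ids, themes, outliers, thin_below=8):
    """Return checklist problems; an empty list means these checks pass.

    themes maps a theme name to {"items": [...], "confidence": "Firm" or "Tentative"}.
    thin_below is an assumed cutoff, not a value taken from the skill.
    """
    problems = []
    placed = [i for t in themes.values() for i in t["items"]] + list(outliers)

    # Every source item must appear in a theme or in the parking lot.
    missing = sorted(set(item_ids) - set(placed))
    if missing:
        problems.append(f"untraced items: {missing}")

    # Nothing in the map may be absent from the source pile.
    unknown = sorted(set(placed) - set(item_ids))
    if unknown:
        problems.append(f"items not in the source pile: {unknown}")

    # A thin cluster must not carry a confident label.
    for name, t in themes.items():
        if len(t["items"]) < thin_below and t["confidence"] != "Tentative":
            problems.append(f"thin theme not flagged Tentative: {name}")
    return problems
```

The remaining checklist items (clustering before naming, names that answer to the items, no overclaiming) are judgment calls and are not covered by this sketch.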

Evidence

Tier P (practitioner). Affinity mapping is a long-standing, widely taught practitioner standard for synthesizing large qualitative piles (the KJ method; Kawakita 1967), with a plausible cognitive basis in external representation and chunking. There is no strong controlled evidence that it produces better, more accurate, or less biased themes than another synthesis method, and “group by similarity” remains a subjective judgment. The evidence is transferred from human practice and has not been validated for AI-augmented use. Full grading, sources, and caveats: evidence/dossier.md.

Examples

See references/EXAMPLE.md for a completed run.

Deep dive: worked example

A full worked run (the shared Northwind scenario)

Affinity Map (Clustered Theme Map) - Worked Example

A completed run of the affinity-mapping skill on a real synthesis task. This is the quality bar a generated affinity map should meet.

Uses the shared recurring scenario: Northwind, a B2B SaaS weighing a self-serve free-tier launch. Here the team has already collected a large, scattered pile of qualitative signal and needs to turn it into a few themes before deciding. See docs/internal/AUTHORING.md.

Synthesis subject

Question: What do prospects and trial users actually struggle with, so we know whether a self-serve free tier would help and where it must be strong?
Source of items: 38 sales-call notes from lost or stalled deals + 24 trial-user onboarding survey free-text answers + 19 support tickets tagged “evaluation.”
Item count: 81 discrete items.

Themes and what they tell us (summary)

The pile collapses into five themes. Two dominate: time-to-first-value is too slow (people cannot tell if Northwind works for them before their evaluation window or patience runs out) and buyers cannot evaluate without committing (procurement, seat minimums, and sales gating block hands-on trial). Both point the same way: a low-friction self-serve path that delivers value fast is responding to a real, repeated signal, not a hunch. A thinner but sharp theme, wrong-fit prospects waste cycles, is a warning that a free tier could amplify unqualified volume if there is no light qualification. The two smallest themes (integration gaps, pricing-page confusion) are real but secondary. The headline: demand for self-serve evaluation is well-evidenced; the risk is fit and activation speed, not appetite.

Theme map

#	Theme name	What unifies it (one line)	Size (item count)	Weight	Representative items	Confidence
1	Slow time-to-first-value	Users cannot reach a useful result before patience or the eval window runs out	23	H	“Spent the whole trial just importing data”; “couldn’t get a real report out in two weeks”; “gave up before I saw anything work”	Firm
2	Cannot evaluate without committing	Procurement, seat minimums, and sales gating block hands-on trial	19	H	“Wanted to just try it, got routed to a sales call”; “needed a PO before we could touch it”; “min 10 seats killed our pilot”	Firm
3	Wrong-fit prospects waste cycles	People who were never a fit consumed trials and sales time	14	M	“Solo user, no team to collaborate with”; “expected a free CRM, we’re not that”; “wrong industry, no use case”	Firm
4	Integration gaps block adoption	A missing connector stalls the evaluation before value lands	12	M	“No Salesforce sync so we couldn’t test the real workflow”; “needed our SSO before security would approve a trial”	Firm
5	Pricing-page confusion	Prospects misread what each tier includes and disqualify wrongly	7	L	“Couldn’t tell what was in the Pro plan”; “thought feature X was paid-only, it isn’t”	Tentative

Outliers / parking lot

“Loved the mobile app” - positive, off-question, not a struggle. One-off.
“Competitor offered a free migration” - a single competitive-loss note; could be an early signal of a switching-cost theme if more accumulate, but a singleton today.
“Asked for an on-prem option” - one regulated prospect; parked, not a pattern.

Note how the value is in the deferred, bottom-up clustering with traceability: the themes emerged from 81 scattered items rather than from the team’s prior assumptions, each theme points back to the verbatim signal that supports it, and the thin “pricing-page confusion” cluster is flagged Tentative rather than dressed up as a finding. A naive prompt would summarize the pile into a confident paragraph and lose both the weighting and the trail back to the evidence.
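The sizes in the map can be tallied against the pile as a quick reconciliation. This is a minimal sketch using the counts exactly as printed above; the variable names are illustrative, and it assumes the three listed outliers are the whole parking lot.

```python
# Counts copied from the theme map and the source-of-items line above.
theme_sizes = {
    "Slow time-to-first-value": 23,
    "Cannot evaluate without committing": 19,
    "Wrong-fit prospects waste cycles": 14,
    "Integration gaps block adoption": 12,
    "Pricing-page confusion": 7,
}
outlier_count = 3            # assumption: the three parked items are all of them
total_items = 38 + 24 + 19   # sales-call notes + survey answers + tickets

accounted = sum(theme_sizes.values()) + outlier_count
gap = total_items - accounted

# Each theme's share of the whole pile, in whole percent.
shares = {name: round(100 * n / total_items) for name, n in theme_sizes.items()}
```

On the printed numbers, `accounted` comes to 78 of 81 items, so under that assumption three items would still need a home in a theme or the parking lot before the map satisfies "every item is traceable". The two dominant themes hold about 28% and 23% of the pile.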

Grounding: the full evidence dossier

What the research does and does not show, with graded sources

Evidence Dossier: Affinity Mapping

The single source of truth for the affinity-mapping skill. The SKILL.md, the sidecar (skill.meta.yml), and the eval cases all derive from this file. If a claim is not here, it does not belong in the skill.


Skill	`thinking-framework-skills.affinity-mapping` (installable name `think-affinity-mapping`)
Family	synthesis
Evidence tier	P (practitioner; limited controlled evidence)
Confidence	Moderate that the mechanism organizes a scattered pile usefully; low that any specific quality or speed gain is established by controlled study
Status	draft (first authored 2026-05-31, against discovery corpus)

1. The mechanism (what actually does the work)

Affinity mapping takes a pile of many individual items - raw notes, observations, quotes, data points, sticky-note ideas - and groups them bottom-up by felt similarity until a small set of emergent themes appears. Each theme is then named, and the names become the structure.

The load-bearing move is deferred, bottom-up categorization. You do not start from predefined buckets and sort items into them. You start from the items, place ones that “feel related” together, and let the categories surface from the data. Three things follow:

It externalizes and parallelizes comparison. A scattered list is hard to hold in mind; laying every item out as a peer and grouping by proximity turns a working-memory problem that grows with every added item into a spatial one, so patterns that were invisible in a linear list become visible.
It resists premature structure. Because the categories are discovered rather than imposed, the grouping is less likely to just confirm the frame you walked in with. The themes are answerable to the items.
It compresses without discarding. Many items collapse into a few named themes, but every item stays attached to its theme, so the synthesis is traceable back to its evidence rather than replacing it.
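One way to see why a linear list hides patterns is to count the comparisons involved. The sketch below assumes that spotting similar items means comparing them pairwise; that framing is ours, not a claim from the cited sources.

```python
from math import comb


def pairwise_comparisons(n: int) -> int:
    """Number of distinct item pairs in a pile of n items."""
    return comb(n, 2)


small_pile = pairwise_comparisons(12)   # "a dozen or fewer"
large_pile = pairwise_comparisons(81)   # the size of the worked example's pile
```

A dozen items give 66 pairs, which is workable by direct reading; 81 items give 3,240, which is where laying items out spatially starts to earn its overhead.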

The mechanism is what we implement. The branded “KJ method” / “affinity diagram” ritual (sticky notes, silent sorting, dot voting) is the packaging; the durable move is bottom-up clustering of existing items into named, traceable themes.

2. Lineage

The KJ method, the original formulation, named for its creator: Kawakita, Jiro (1967). Hassoho (Abduction / The Idea-Generation Method). Tokyo: Chuokoron-sha. Developed for synthesizing field-research data in cultural anthropology.
Affinity diagram as one of the “Seven Management and Planning Tools” in Japanese quality management, from which it entered Western quality and design practice (Mizuno, Shigeru, ed., Management for Quality Improvement: The Seven New QC Tools, 1988).
Adoption in UX / design / product practice as the standard way to synthesize user-research notes and workshop output into themes (widely documented by IDEO, the Nielsen Norman Group, and the design-sprint literature).

“KJ Method” is associated with Jiro Kawakita and is sometimes treated as a registered designation in Japan. We name the skill descriptively (affinity-mapping) after the durable mechanism rather than the named method, cite the lineage here, and require no attribution.

3. What the evidence shows, and what it does NOT show

This is the honest core of the dossier. The skill must not overclaim.

What is reasonably supported (the practitioner basis):

Affinity mapping is a long-standing, widely-taught practitioner standard for synthesizing large qualitative piles in anthropology, quality management, UX research, and facilitation. Its longevity and breadth of adoption are real and are the main evidence for it.
The underlying cognitive idea - that externalizing items and grouping by similarity makes patterns easier to see than a linear list - is consistent with well-established findings on external representation and chunking in cognition. That is supporting, not direct, evidence.

What is NOT shown (the caveat that keeps the skill honest):

There is no strong body of controlled evidence that affinity mapping produces better themes, more accurate synthesis, or better downstream decisions than another synthesis method or than an expert reading the items closely. The claim for it is practitioner consensus and plausibility, not measured outcome.
The method is sensitive to the grouper’s bias. “Group by similarity” is a subjective judgment; two people (or two runs) can produce different theme sets from the same items. Bottom-up framing reduces, but does not remove, the risk of the grouping merely re-encoding the analyst’s prior frame.
Theme names can launder weak groupings. A confident label on a thin or incoherent cluster makes it look like a finding. The presence of named themes is not evidence that the themes are real.
It does not generate the items. Affinity mapping only organizes what is already in the pile; if the inputs are sparse, skewed, or low-quality, the themes inherit those defects.
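The point that two runs can produce different theme sets can be made measurable. A minimal sketch using the Rand index, a standard agreement measure between two groupings of the same items; using it here is our suggestion and not part of the skill, and the function name and input layout are illustrative.

```python
from itertools import combinations


def rand_index(run_a: dict, run_b: dict) -> float:
    """Share of item pairs on which two groupings agree (1.0 = identical).

    Each argument maps an item id to the cluster label it received in that run.
    Labels need not match across runs; only "together or apart" is compared.
    """
    if run_a.keys() != run_b.keys():
        raise ValueError("both runs must cover the same items")
    pairs = list(combinations(sorted(run_a), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (run_a[x] == run_a[y]) == (run_b[x] == run_b[y]) for x, y in pairs
    )
    return agree / len(pairs)


first_run = {1: "x", 2: "x", 3: "y", 4: "y"}
second_run = {1: "p", 2: "p", 3: "p", 4: "q"}
score = rand_index(first_run, second_run)
```

Here the two runs agree on 3 of 6 pairs, giving 0.5. A low score between two analysts, or between two agent runs, is a reason to treat the theme set as tentative; a high score shows consistency, not correctness.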

Net grade: P. Useful, durable, widely-practiced synthesis method with a plausible cognitive basis, but limited controlled evidence for any specific quality or speed gain. The skill should claim “organizes a scattered pile into a small set of named, traceable themes” and explicitly disclaim “produces objectively better or bias-free themes.”

4. Transferred-evidence flag (required honesty for this library)

All of the basis above comes from human practice and human-subject cognition research in research-synthesis, quality, and design settings. There is no direct study of affinity mapping run by, or with, an AI agent, and none of whether an agent-produced affinity map improves a human’s synthesis or decision. The evidence supporting this skill is therefore transferred from human practice, not validated for AI-augmented use. This skill must say so. Treat the AI value as: the agent makes the clustering cheap to run at scale, enforces the bottom-up discipline (cluster before naming), keeps every item traceable to its theme, and produces a durable, reusable artifact - benefits that do not depend on any contested quality claim.

5. When it works / when it fails (drives the eval negative cases and “When NOT to Use”)

Works best when:

There are many items (roughly dozens to hundreds) - research notes, support tickets, survey free-text, retro stickies, interview quotes - that need to become a few themes.
The right structure is not known in advance and should emerge from the data rather than being imposed.
The items already exist; the job is synthesis, not generation.
Traceability matters: you want each theme to point back to the specific items that support it.

Fails or misleads when (poor-fit / anti-patterns):

Only a handful of items. With a dozen or fewer items you can reason about them directly; the clustering ceremony adds overhead without insight. (Anti-trigger.)
You need a top-down logical structure - a question decomposed into MECE sub-questions or a hypothesis tree. That is issue-tree decomposition (top-down, from a question), not affinity mapping (bottom-up, from items). (Near-miss anti-trigger.)
You need to generate ideas or options. Affinity mapping organizes items that already exist; it produces no new ideas. Use an ideation method (for example brainwriting) to create the items first, then affinity-map them. (Near-miss anti-trigger.)
The categories are already fixed and authoritative (a required taxonomy, a compliance schema). Then you are coding/sorting into known buckets, not discovering emergent themes.
Run as ritual - grouping into a few buckets and slapping confident names on them with no traceability and no discipline against the analyst’s prior frame produces cargo-cult themes. The skill must keep items attached to themes and force naming to come after grouping.

6. Output artifact

The skill must emit a clustered theme map, not prose: a small set of named themes, each with a one-line description of what unifies it, the list (or count plus representative examples) of source items it contains, and its relative size/weight, preceded by a short “themes and what they tell us” summary. Items that did not cluster (“outliers / parking lot”) are kept visible, not silently dropped. The artifact is the deliverable; the conversation is not.
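The artifact described above can be held in a small data structure and rendered as rows. This is a sketch under stated assumptions: references/TEMPLATE.md is not reproduced in this file, so the column layout is copied from the worked example, and the `Theme` and `render` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Theme:
    name: str
    unifier: str             # one line on what unifies the theme
    items: list[str]         # the source items, kept attached for traceability
    weight: str              # "H", "M", or "L", as in the worked example
    confidence: str = "Firm"


HEADER = "#\tTheme name\tWhat unifies it\tSize\tWeight\tRepresentative items\tConfidence"


def render(summary: str, themes: list[Theme], outliers: list[str]) -> str:
    """Summary first, then themes largest-first, then the visible parking lot."""
    lines = [summary, "", HEADER]
    ordered = sorted(themes, key=lambda t: len(t.items), reverse=True)
    for n, t in enumerate(ordered, 1):
        examples = "; ".join(t.items[:3])  # up to three representative items
        lines.append(
            f"{n}\t{t.name}\t{t.unifier}\t{len(t.items)}\t{t.weight}\t{examples}\t{t.confidence}"
        )
    lines += ["", "Outliers / parking lot", *outliers]
    return "\n".join(lines)
```

Size is derived from the attached items, not typed in by hand, so a theme's count cannot drift away from the evidence behind it.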

7. Sources

1. Kawakita, Jiro (1967). Hassoho - the original KJ method for synthesizing field data bottom-up.
2. Mizuno, Shigeru, ed. (1988). Management for Quality Improvement: The Seven New QC Tools - affinity diagram in quality management.
3. Nielsen Norman Group, “Affinity Diagramming: Collaboratively Sort UX Findings & Design Ideas” - the standard UX-practice description of the technique.
4. Scupin, R. (1997). “The KJ Method: A Technique for Analyzing Data Derived from Japanese Ethnology.” Human Organization 56(2):233-237 - documents the method’s anthropological origin and use.

Verification status: citations 1-2 are the standard historical attributions and well-attested in the discovery corpus; the exact NN/g phrasing (citation 3) and the Scupin page reference (citation 4) were drawn from a secondary research synthesis and should be confirmed against the primary sources before they appear in any public-facing README. They are safe to use inside this dossier because the dossier’s job is to be honest about exactly this uncertainty. The “limited controlled evidence” claim in section 3 is a deliberate statement of absence: it should stay phrased as “no strong controlled evidence found,” not as a positive finding.

Thinking Framework Skills v0.3.0 · 38 frameworks