Myers-Briggs Type Indicator (MBTI)

Status: Documented, not shipped · Evidence: X · Family: Self and team awareness · Verdict: reject (2026-06-11)

“Myers-Briggs Type Indicator” and “MBTI” are trademarks of The Myers and Briggs Foundation; the assessment is published by The Myers-Briggs Company. Developed by Katharine Cook Briggs and Isabel Briggs Myers, from Jungian types.

What it is

The Myers-Briggs Type Indicator is a self-report questionnaire that sorts a person into one of sixteen four-letter types built from four dichotomies: Extraversion vs Introversion (E-I), Sensing vs Intuition (S-N), Thinking vs Feeling (T-F), and Judging vs Perceiving (J-P). You answer forced-choice items, the instrument allocates you to one pole of each dichotomy, and the four poles concatenate into a type label such as INTJ or ESFP. The labels then drive interpretation: type descriptions, team workshops, “type tables,” and advice about how the sixteen types communicate, lead, or pair.
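The cut-and-concatenate step described above can be sketched in a few lines. This is a hypothetical illustration, not the publisher's scoring procedure: the function name `type_label`, the score range of -1.0 to 1.0, and the cut at zero are all assumptions made for the example.

```python
# Hypothetical illustration only: this is NOT the publisher's scoring algorithm.
# Each score is an assumed continuous preference in [-1.0, 1.0];
# negative leans toward the first pole, positive toward the second.
DICHOTOMIES = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))

def type_label(scores):
    """Cut each continuous score at zero and concatenate the four poles."""
    if len(scores) != len(DICHOTOMIES):
        raise ValueError("expected one score per dichotomy")
    return "".join(
        first if score < 0 else second
        for (first, second), score in zip(DICHOTOMIES, scores)
    )

# Two people who barely differ on the first scale get different labels,
# while two people who differ a lot in strength can share one.
print(type_label([-0.02, 0.6, -0.5, -0.4]))  # ENTJ
print(type_label([0.02, 0.6, -0.5, -0.4]))   # INTJ
print(type_label([0.95, 0.9, -0.9, -0.9]))   # INTJ
```

The label discards how far each score sits from the cut, so a near-midpoint respondent and a strongly polarized one become indistinguishable once typed.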

The mechanism that matters for this library is the typological one. MBTI does not present its output as a position on continuous traits; it presents it as membership in a qualitatively distinct category. The claim built into the instrument is that people come in kinds - that an INTJ and an ENFP differ in type, not merely in degree - and that knowing the kind tells you something stable and useful about how the person thinks and decides. That categorical, whole-person typing is the thing to grade, and it is the thing the evidence does not support.

Two framings have to be kept apart, because the popular defense of MBTI quietly swaps between them:

(1) MBTI as a typology - sixteen discrete kinds of person, the product’s actual output and the basis of nearly all MBTI practice. This is what the registry entry describes and what this dossier grades.
(2) The four underlying dimensions read continuously - E-I, S-N, T-F, J-P treated as graded scales rather than as cut-at-the-middle categories. Read this way the instrument correlates respectably with mainstream trait dimensions, but read this way it is no longer “MBTI the typology”; it is a worse-calibrated restatement of trait measurement that already has a better-validated home.

When it helps / when it misleads

The honest “when it helps” is narrow and social, not epistemic. As an icebreaker, a shared vocabulary, or a low-stakes prompt that gets a team talking about differences in working style, MBTI is harmless and sometimes useful - the value is the conversation it sparks, not the accuracy of the label. Many people also report that reading their type description feels insightful, which is exactly the problem below.

It misleads whenever the label is treated as a measurement and used to decide anything:

It manufactures false stability. A four-letter type feels like a fixed fact about you, but the four-letter classification is not stable on retest (see the evidence), so any decision keyed to the whole type is keyed to a coin that lands differently on a second flip.
It invites the Forer/Barnum effect. Type descriptions are written in broadly flattering, near-universal language that almost anyone endorses as accurate, which produces a strong felt sense of validity that is not evidence the category is real.
It is wrong for selection, placement, and pairing. Using a type to hire, assign, promote, or match people imports a categorical claim the data do not support; even the publisher states the instrument should not be used for hiring or selection.
It crowds out the better tool. Because MBTI is famous and free-feeling, it gets used where a graded trait read or a structured perspective move would do real work, and the type label ends the inquiry instead of opening it.

The single rule: MBTI is fine as a conversation starter and unfit as a decision input. The moment a four-letter type is used to predict, select, or assign, it is being asked to carry weight it cannot bear.

What the evidence says

Honest grade: X (poor or contradictory) for the typology - the move the instrument actually sells. This confirms the registry’s preliminary X. The grade is not “no evidence exists”; it is “the best evidence contradicts the core claim,” which is the specific meaning of X. One nuance has to be stated honestly and up front, because getting it wrong is how MBTI defenders launder the grade: some narrow psychometric properties of the four scales are actually decent. That does not rescue the typology, and the section below separates the two carefully.

The four-letter type is not reliable on retest. This is the most damaging finding because reliability is a precondition for everything else. Across studies, a large share of people get a different four-letter type when retested after intervals as short as five weeks; reviews of this literature report that roughly half of respondents change on at least one of the four dichotomies on a second administration. The mechanism is built into the design: because each continuous scale is cut at the middle to force a categorical pole, people who sit near a midpoint flip poles on small measurement noise, and four near-midpoint cuts compound into frequent whole-type change. Capraro and Capraro (2002, Educational and Psychological Measurement), a meta-analytic reliability generalization of the MBTI, found that the individual scale scores show reasonable internal consistency and continuous test-retest reliability (median stability coefficients in roughly the .69-.78 range across the four scales) - but that reliability lives at the level of the continuous score, not the dichotomized category, and it is the category that the product reports and that practice uses. So the most-quoted reliability defense (“the scales are reliable”) is true and beside the point: the typology is what flips.
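The flip mechanism can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not a model fitted to any MBTI data: the function name `retest_agreement`, the Gaussian noise model, the noise level, and the example score profiles are all hypothetical.

```python
import random

def retest_agreement(true_scores, noise_sd, trials=20_000, seed=0):
    """Fraction of simulated test-retest pairs that yield the same four-pole type.

    Assumption: each observed score is the true score plus independent
    Gaussian noise, and each scale is cut at zero to assign a pole.
    """
    rng = random.Random(seed)
    same = 0
    for _ in range(trials):
        first = [t + rng.gauss(0, noise_sd) >= 0 for t in true_scores]
        second = [t + rng.gauss(0, noise_sd) >= 0 for t in true_scores]
        same += first == second  # True counts as 1 when all four poles match
    return same / trials

near_midpoint = [0.1, -0.1, 0.1, -0.1]  # hypothetical: close to every cut
far_from_cut = [1.5, -1.5, 1.5, -1.5]   # hypothetical: clear on every scale

print(retest_agreement(near_midpoint, noise_sd=0.5))
print(retest_agreement(far_from_cut, noise_sd=0.5))
```

The near-midpoint profile keeps its type on only a small fraction of retests, while the far-from-cut profile almost always keeps it, even though the noise is identical. The compounding is plain arithmetic: if the four scales were independent and each pole matched on retest with an assumed probability of 0.85, the whole type would match with probability 0.85^4, about 0.52.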

The typology itself has no empirical support. McCrae and Costa (1989, Journal of Personality) reinterpreted the MBTI against the Five-Factor Model in a sample of 468 adults and found “no support for the view that the MBTI measures truly dichotomous preferences or qualitatively distinct types.” Preference-score distributions are not bimodal - there is no empirical gap between “Thinking people” and “Feeling people,” just a continuous spread with most people near the middle - so cutting the scale into two kinds creates a boundary the data do not contain. What the same analysis did find is the rescue framing (2) above: the four MBTI indices correlate with four of the Big Five dimensions (E-I with extraversion, S-N with openness, T-F with agreeableness, J-P with conscientiousness). In other words, when MBTI measures anything real, it is measuring well-established trait dimensions less well than the instruments built for them.
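What "not bimodal" means for a cut point can be seen by comparing two synthetic populations. This is a sketch on made-up data, not a reanalysis of McCrae and Costa: the helper `share_near_cut`, the band width, and both distributions are assumptions chosen for illustration.

```python
import random

def share_near_cut(scores, cut=0.0, band=0.25):
    """Fraction of scores lying within `band` of the cut point."""
    return sum(abs(s - cut) <= band for s in scores) / len(scores)

rng = random.Random(1)
n = 50_000

# Assumed unimodal population: one normal distribution centred on the cut.
unimodal = [rng.gauss(0, 1) for _ in range(n)]

# What genuinely distinct types would look like: two well-separated clusters.
bimodal = [rng.gauss(rng.choice((-2, 2)), 0.5) for _ in range(n)]

print(share_near_cut(unimodal))
print(share_near_cut(bimodal))
```

In the unimodal case a sizeable share of the sample sits right at the boundary, so the cut separates near-identical people. In the bimodal case almost nobody sits there, which is the pattern a real typology would need and the pattern the preference-score data do not show.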

As a decision tool it lacks predictive validity. Druckman and Bjork (1991), the National Academy of Sciences committee report In the Mind’s Eye, reviewed the MBTI for applied use and concluded there was not sufficient, well-designed research to justify using it in career-counseling programs, noting that the validity claims outran the evidence and that the MBTI-sponsored research base showed inconsistent and incomplete data. Pittenger (1993, “Measuring the MBTI … and Coming Up Short,” Journal of Career Planning and Employment; and his 1993 review in Review of Educational Research) is the frequently-cited synthesis of the case: the type categories are not independent, the bimodality the model assumes is absent, and the instrument does not predict the job, team, or relationship outcomes it is marketed to predict.

The honest counterweight, stated and bounded. A recent synthesis, Erford et al. (2025, “A 25-Year Review and Psychometric Synthesis of the MBTI Form M,” Journal of Counseling and Development), reports that the modern Form M scales yield scores with strong internal consistency and respectable continuous test-retest reliability. This is real and is included so the grade is not a caricature. But it is reliability of the scales read continuously, not validity of the types; it is the same finding as Capraro and Capraro, updated. It does not touch the two facts that govern the grade: the dichotomous typology has no empirical basis (McCrae and Costa), and the instrument lacks the predictive validity needed to be a decision tool (Druckman and Bjork; Pittenger). A reliable scale that is then carved into categories the data do not contain, and that does not predict outcomes, grades X as a typology even when its underlying scales are internally consistent.

No transferred-evidence rescue, and no laundered statistics. There is no controlled evidence that “thinking by your MBTI type” improves any reasoning or decision outcome - the question this library actually cares about - so there is nothing to transfer even optimistically. The often-quoted “the MBTI is the most widely used personality instrument, taken by ~2 million people a year” figure is a popularity statistic, not an evidence statistic, and popularity is not validity; it is noted here only to be set aside. No effect size in this dossier is asserted without a named source.

Why it is / is not a skill here

Verdict: Reject (status: excl - excluded on the merits). This honors the registry’s preliminary reject and X, and resolves the open question (flag vs excl) toward excl with stated reasons.

It is not a thinking move. Everything this library ships operates on a problem - it reframes, stress-tests, decomposes, weighs options, surfaces risks - and emits a reusable artifact. MBTI operates on a person: it consumes questionnaire answers and emits a category label. A label is not a cognitive move and not a structured deliverable in the library’s sense; there is no “run MBTI on a decision” that produces analysis. Even setting evidence aside, the instrument does not match the shape of a skill here.

It fails the evidence gate on its central claim. The library’s identity is honest evidence grading, and the typology grades X: the categories are not reliable on retest, the dichotomies are not empirically real (McCrae and Costa), and the instrument does not predict applied outcomes (Druckman and Bjork; Pittenger). Shipping a famous instrument whose core claim the best evidence contradicts is precisely the failure this library exists to prevent. That a recent synthesis (Erford et al.) reports decent scale reliability does not change the verdict, because the reliable thing (continuous scales) is not the thing the product sells (discrete types).

Why excl and not flag. A flag (include only with caveats) would be right if the method carried a defensible standalone use that survives the caveats. MBTI does not: its only honest “helps” is social ice-breaking, which is not a thinking capability, and its epistemic uses are the ones the evidence contradicts. The defensible signal it gestures at - that people differ along stable dispositions worth taking a perspective on - is real, but it belongs to the generic trait dimensions, not to the sixteen types. That signal is already queued in this catalog as trait-lens-perspective (Big Five / HEXACO, a candidate lens), which takes a decision and views it through contrasting trait viewpoints without claiming discrete kinds of person. So there is nowhere defensible for MBTI to “include with caveats” that trait-lens-perspective does not cover better and more honestly.

Why excl and not fold. A fold requires a single shipped skill whose mechanism subsumes this one; foldInto must resolve to an entry whose status is shipped. The nearest generic counterpart, trait-lens-perspective, is itself only a cand (unbuilt) - there is no shipped target to fold into - and in any case MBTI’s typology is not a near-twin of a trait lens; it is a categorical instrument the trait lens deliberately rejects. With no shipped fold target and a move that fails the evidence gate on the merits, the correct landing is excl, the same landing the catalog gave the cognitive-bias-checklist and the sibling instruments (CliftonStrengths, DISC) it sits beside.
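The fold rule can be read as a simple check against the registry. This is a sketch only: the dictionary layout and the function name `fold_target_is_valid` are assumptions, and only the `foldInto` and status terms come from the text above.

```python
# Hypothetical registry shape: entry id -> metadata. The real registry
# format is not shown in this dossier; only the status values are.
registry = {
    "trait-lens-perspective": {"status": "cand"},
}

def fold_target_is_valid(registry, fold_into):
    """A foldInto value is valid only if it names an entry with status 'shipped'."""
    entry = registry.get(fold_into)
    return entry is not None and entry.get("status") == "shipped"

print(fold_target_is_valid(registry, "trait-lens-perspective"))  # False
```

Under this reading the check fails for trait-lens-perspective because it is a candidate rather than a shipped entry, which is the first of the two reasons a fold is unavailable.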

The learning value of this decision: MBTI is the most famous personality instrument in the world and one of the least defensible as a measurement. Documenting it as a deliberate exclusion - with the trademark named, the popularity acknowledged, and the evidence laid out - keeps the catalog honest, separates the real underlying signal (trait dimensions) from the brand, and points anyone reaching for “what type is this person” toward the graded perspective move (trait-lens-perspective) instead of a sixteen-box label.

Lineage and who to read

MBTI was developed by Katharine Cook Briggs and her daughter Isabel Briggs Myers, neither a trained psychologist, from a popular reading of Carl Jung’s Psychological Types (1921); Isabel Myers built the first questionnaire forms through the 1940s-1960s. “Myers-Briggs Type Indicator” and “MBTI” are trademarks of The Myers and Briggs Foundation, with the assessment published by The Myers-Briggs Company; this entry names and attributes the brand descriptively and does not use it as a product. Adjacent typologies that share the same categorical-typing move and the same evidence problems include the Keirsey Temperament Sorter, Socionics, and the free “16 Personalities” knockoff.

For the critical read, start with McCrae and Costa (1989) on the absence of true types and the mapping onto the Five-Factor Model, Druckman and Bjork (1991) for the National Academy of Sciences applied-use verdict, and Pittenger (1993) for the consolidated psychometric case. Pair those with Capraro and Capraro (2002) and Erford et al. (2025) so you can see the honest counterweight - the scales, read continuously, are reasonably reliable - and understand exactly why that does not rescue the typology. For the popular history of how a non-empirical instrument became ubiquitous, Merve Emre’s The Personality Brokers (2018) is the standard account.

Named sources

Robert R. McCrae & Paul T. Costa Jr., “Reinterpreting the Myers-Briggs Type Indicator From the Perspective of the Five-Factor Model of Personality,” Journal of Personality 57(1) (1989): 17-40. Sample of 468 adults; found no support for dichotomous preferences or qualitatively distinct types and no bimodality, but found the four indices map onto four Big Five dimensions. The core construct-validity finding. (Peer-reviewed; governs the grade)
Daniel Druckman & Robert A. Bjork (eds.), In the Mind’s Eye: Enhancing Human Performance, National Research Council / National Academy of Sciences (National Academy Press, 1991), MBTI chapter. Concluded there is not sufficient well-designed research to justify MBTI use in career counseling and that validity claims outran the evidence. (Expert committee review)
David J. Pittenger, “Measuring the MBTI … and Coming Up Short,” Journal of Career Planning and Employment 54(1) (1993): 48-53; and “The Utility of the Myers-Briggs Type Indicator,” Review of Educational Research 63(4) (1993): 467-488. The frequently-cited synthesis: non-independent categories, absent bimodality, weak predictive validity. (Peer-reviewed review)
Robert M. Capraro & Mary Margaret Capraro, “Myers-Briggs Type Indicator Score Reliability Across Studies: A Meta-Analytic Reliability Generalization Study,” Educational and Psychological Measurement 62(4) (2002): 590-602. The continuous scales show reasonable internal consistency and test-retest reliability (medians ~.69-.78), while the dichotomized type classification is far less stable. The honest counterweight, correctly scoped to scales not types. (Meta-analysis)
Bradley T. Erford et al., “A 25-Year Review and Psychometric Synthesis of the Myers-Briggs Type Indicator (MBTI) - Form M,” Journal of Counseling and Development (2025). Reports strong internal consistency and respectable continuous test-retest reliability for Form M scales; scoped to scale reliability, not type validity. (Recent psychometric synthesis)
Carl G. Jung, Psychological Types (1921). The Jungian source the typology is loosely derived from; read to see how far the instrument departs from its claimed origin. (Foundational source of the typology idea)
Merve Emre, The Personality Brokers: The Strange History of Myers-Briggs and the Birth of Personality Testing (Doubleday, 2018). The history of the instrument’s creation and rise; explains the gap between popularity and validation. (History)

Noted and set aside, not counted toward the grade: the widely-quoted “~2 million people take the MBTI each year” / “most widely used personality instrument” figures are popularity statistics, not validity evidence, and do not influence the X grade.

Thinking Framework Skills v0.8.0 · 56 frameworks