The 9-box grid: what it is good for, and where it misleads

The 9-box grid is the most widely used talent-review tool there is, and for good reason. It forces a calibration conversation and gives a room one shared language. Here is what it does well, where it quietly misleads, and how to use it without being used by it.

Ricardo Albertini · Co-Founder, CapabilityFX · 7 June 2026 · 9 min read


Somewhere this quarter, a room full of senior people will spend two hours moving names around a three-by-three grid. Someone will be a "star". Someone else will be "solid". One name will sit, a little awkwardly, in the bottom corner. By the time the meeting ends, those labels will feel like findings. They are not findings. They are the output of a conversation, and the conversation is worth far more than the box.

What the grid is, and why it endures

The 9-box grid plots people on two axes: current performance along one, future potential along the other. Each axis is cut into low, medium, and high, which gives the familiar three-by-three of nine cells. A person lands in one cell, and that cell carries a shorthand: high performance and high potential in one corner, low performance and low potential in the opposite corner, and the steady contributor in the recognisable middle.
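For readers who keep talent-review data in a spreadsheet or script, the structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the names `Level`, `cell`, and `ALL_CELLS` are hypothetical, and the orientation (potential on the vertical axis with high at the top, performance on the horizontal axis with high at the right) is an assumption that matches the layout used later in this piece.

```python
from enum import IntEnum


class Level(IntEnum):
    """The three bands each axis is cut into."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def cell(performance: Level, potential: Level) -> tuple[int, int]:
    """Return (row, column) for a placement.

    Assumed orientation: row 0 is the top row (high potential),
    column 0 is the left column (low performance).
    """
    row = int(Level.HIGH) - int(potential)
    column = int(performance) - int(Level.LOW)
    return (row, column)


# Two axes with three bands each give exactly nine distinct cells.
ALL_CELLS = {cell(perf, pot) for perf in Level for pot in Level}
```

Under these assumptions, `cell(Level.HIGH, Level.HIGH)` is the top-right corner and `cell(Level.LOW, Level.LOW)` is the bottom-left one. The function records only where a judgement lands. It says nothing about how sound that judgement is.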

It is associated with the talent-review work McKinsey developed for GE around the early 1970s, and it has endured for roughly half a century because it does something genuinely useful. It is worth being clear about what it does well before saying anything about where it misleads, because the criticism only makes sense once the value is on the table.

It forces a calibration conversation. The single most valuable thing the grid does has almost nothing to do with the grid itself. To place a name, a group of managers has to say out loud what they think a person is capable of, and then defend it to peers who may see that person differently. That conversation surfaces disagreement that would otherwise stay buried in separate heads. One manager rates someone a star, another has watched the same person struggle the moment a problem turned unfamiliar, and the gap between those two views is exactly the thing a talent review exists to examine. The grid is the excuse to have the argument. The argument is the point.

It gives a room one shared language. Before the grid, every manager carries a private and slightly different idea of what "high potential" means. The grid imposes a common frame, two axes everyone is looking at together, so that for one afternoon a diverse group is at least arguing about the same two questions rather than talking past each other. That shared language is not trivial. It is the difference between a structured review and nine people swapping anecdotes.

It is simple enough to actually use. A tool that needs a manual does not get used. The grid fits on a whiteboard, a new manager can grasp it in two minutes, and it scales from a team of six to a function of six hundred. Simplicity is a feature, not a compromise. Plenty of more sophisticated instruments gather dust because they are too heavy to run at the cadence a real talent review needs.

So the grid is a good convening device. Hold on to that, because the rest of this piece is about its limits, and the limits are easy to mistake for an attack on the tool. They are not. A scalpel is excellent and still cuts the careless hand that holds it.

Where it quietly misleads

The grid's weaknesses are not failures of the grid. They are failures of how the grid is read. The trouble starts the moment a placement stops being treated as the record of a conversation and starts being treated as a measurement.

Potential is judged subjectively, then displayed as if it were measured. Performance, the horizontal axis, has some real evidence behind it: targets, delivery, a track record. The vertical axis, potential, usually has nothing of the kind. It is a judgement, often formed in seconds, frequently a polite restatement of how much the person reminds the assessor of themselves at that stage. The grid then renders that soft judgement in exactly the same visual register as the harder one. Two axes, identical gridlines, equal weight. The format quietly promotes a guess to the status of a finding, and the eye believes the format. This is the deeper problem we set out in our piece on assessing potential versus performance: performance and potential are different constructs, and reading one off the other is the most common error in talent decisions. The grid does not cause that error, but its layout makes the error look rigorous.

Labels stick long after the evidence has moved. Once a person is "in the bottom row", the label tends to follow them. The next year's reviewers see last year's placement, anchor on it, and look for evidence that confirms it. Stretch assignments, sponsorship, and the benefit of the doubt flow to the top-corner names, which means the placement starts to produce the very outcomes it claimed only to describe. A label that should be a snapshot becomes a self-fulfilling forecast. The person rated low gets fewer chances to prove otherwise, and the thinness of their record next year is then read as confirmation rather than consequence.

A snapshot gets mistaken for a verdict. A grid placement is a reading taken on one day, by people with partial information. It is a photograph, not an X-ray. Treated as a verdict, it freezes a person at a single moment and ignores the thing that matters most for potential, which is direction of travel. Someone climbing fast from a low base and someone sliding from a high one can occupy the same cell. The grid cannot tell them apart, because it has no axis for momentum.

Recency and halo effects do the placing. Human judgement made quickly is shaped by what is most recent and most vivid. A strong final quarter lifts someone above a mediocre year. One impressive presentation halos across to "high potential" in domains the presenter has never been tested in. A single visible stumble drops a steady performer a row. None of this is in the data. All of it ends up on the grid, because the grid records the judgement, not the reality the judgement was supposed to capture.

Notice that every one of these is a reading error, not a design flaw. The grid is honest about what it is: a way to organise a conversation. It becomes misleading only when the picture it produces is treated as more certain and more permanent than the soft judgements that went into it.

How a good talent review uses it

The fix is not to throw the grid away. It is to keep the grid as the convening device it is good at being, and to put real instruments underneath the axis it cannot measure. That axis is potential, and potential is where placements go wrong.

This is where the underlying philosophy matters. Potential is not more performance. It is the capacity to grow into demands a person has not yet faced, and that capacity lives at the level of who someone is, not only what they can currently do. Lasting capability is built inside-out, which is the foundation of our DUAL model: Discover, Understand, Accept, Lead. A grid placement made without any read on a person's self-awareness, their relationship to hard feedback, or their fixed or growable self-image is a placement made blind to the very things that determine whether the potential rating is true.

So the practical move is to bring evidence to the vertical axis before the meeting, not to invent it during the meeting. For the developmental and identity-level read, the patterns that govern whether someone can actually grow, we use Ennea International's Five Lens Development Platform. It reads a person at the level of motivation rather than collapsing them into a single dot, and CapabilityFX is licensed to use it; the framework belongs to Ennea International. For the forward-looking read, whether a person is equipped for conditions they have not yet met, we work with the Tomorrows Compass future-readiness assessment, which Tomorrows Compass owns and which CapabilityFX uses as a licensed partner. Both sit behind the grid rather than replacing it. You can see how they fit together on our assessments page. The grid still convenes the conversation. The assessments give the potential axis something other than a hunch to stand on.

A few habits separate a review that uses the grid from one the grid uses:

Make people show their working. No placement without a sentence of evidence behind each axis. "High potential because" forces the room to notice when the only reason is a strong recent quarter or a personal resemblance.
Date the placement and review the drift. Treat every cell as a reading taken on a day, and explicitly compare it to last year's. Who moved, and what actually changed? Momentum is the information the static grid throws away.
Separate the two conversations. Discuss performance with performance evidence, then change gears and discuss potential with potential evidence. When the two are argued at once, the harder performance data quietly does the work for both, and potential is never really examined.
Treat the bottom rows as questions, not sentences. A low placement should trigger a development conversation, not a quiet write-off. Often the placement is telling you about a mismatch of role and strength, which is fixable, rather than a ceiling, which is not.
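If your placements live in a spreadsheet or an HR system, the first two habits above can be turned into simple checks. The sketch below is one possible shape, not a prescribed method: the `Placement` record, its field names, and the helpers `missing_evidence` and `drift` are all hypothetical, and the 1-to-3 scale for each axis is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Placement:
    """One dated reading for one person (hypothetical record layout)."""
    name: str
    reviewed_on: date
    performance: int  # assumed scale: 1 = low, 2 = medium, 3 = high
    potential: int    # same assumed scale
    performance_evidence: str = ""
    potential_evidence: str = ""


def missing_evidence(p: Placement) -> list[str]:
    """List the axes that have no sentence of evidence behind them."""
    missing = []
    if not p.performance_evidence.strip():
        missing.append("performance")
    if not p.potential_evidence.strip():
        missing.append("potential")
    return missing


def drift(previous: Placement, current: Placement) -> tuple[int, int]:
    """Change in (performance, potential) between two dated readings."""
    if current.reviewed_on <= previous.reviewed_on:
        raise ValueError("current reading must be dated after the previous one")
    return (current.performance - previous.performance,
            current.potential - previous.potential)


# Illustrative values only; these are not data from any real review.
last_year = Placement("Example Person", date(2025, 6, 1), 1, 1,
                      performance_evidence="Missed migration deadline.")
this_year = Placement("Example Person", date(2026, 6, 1), 2, 2,
                      performance_evidence="Ran two transitions cleanly.")

gaps = missing_evidence(this_year)     # axes still resting on a feeling
movement = drift(last_year, this_year)  # direction of travel since last reading
```

A check like this cannot judge whether the evidence is any good. It only makes it harder for a placement to reach the meeting with a blank where the reasoning should be, and it puts the year-on-year movement in front of the room instead of leaving it in last year's file.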

What it looks like in practice

The grid's failure mode and its proper use are both easiest to see in concrete cases.

The corner star who was a recency effect. A commercial manager in a packaging business closed an unusually large account in the final quarter before the annual talent review, and walked into the meeting a "high performance, high potential" star in the top corner. The placement felt obvious to everyone in the room, which is exactly why nobody interrogated it. What the grid hid was that the deal had been substantially inherited from a departing colleague, and that the manager's pattern over three years was steady rather than exceptional, with a marked tendency to avoid the harder conversations that leadership at the next level would demand. A developmental assessment surfaced the avoidance quickly; it was an identity-level pattern, not a skills gap, and no sales figure would ever have shown it. The grid had not lied. It had faithfully recorded a judgement made in the afterglow of one good quarter, and displayed it as if it were a measurement.

The bottom-row name who was climbing. In the same review, an operations team leader sat in the bottom row, placed there a year earlier after a rough patch managing a site through a botched system migration. The label had followed her. This year's reviewers anchored on last year's cell and were ready to confirm it, until one manager asked the dated-placement question: what has actually changed since? A great deal had. She had run the next two transitions cleanly, was now the person other site leads called for advice, and a forward-looking read showed unusually high learning agility, the capacity to make sense of unfamiliar situations and draw the right lesson fast. The grid had frozen her at her worst quarter. The momentum question unfroze her. Had the room read the cell as a verdict rather than a snapshot, she would have spent another year overlooked for the exposure she was plainly ready for.

One placement erred high and the other erred low, yet both failures trace back to the same root. The star was overrated because a vivid recent event haloed across an axis the grid could not actually measure. The bottom-row leader was underrated because a sticky label outlived the evidence behind it. In both, the tool did its job. It recorded the conversation. The error lived in treating the record as a finding. To see the same dynamic worked through in real organisational settings, browse our use cases.

The reader's next step

Before your next talent review, it is worth putting the meeting itself, not the grid, to a short test.

Does each placement come with evidence on both axes, or only one? If the potential rating has nothing behind it but a feeling, you are displaying a guess in the same frame as a measurement. Name what you actually know about the person's capacity for a role they have not yet held.
Are you comparing this year's cell to last year's? A placement with no date and no comparison is a snapshot pretending to be a verdict. The drift between readings is the most useful information in the room, and a static grid hides it.
Whose recent quarter or single presentation is doing the placing? Ask the room to separate what someone has demonstrated over time from what they did most recently and most vividly. Recency and halo are not data, but they end up on the grid all the same.
Are the bottom rows triggering development, or a quiet write-off? A low cell should open a conversation about fit and growth, not close one. The label sticks only if you let it.

If the honest answers reveal that your placements rest mostly on recent performance and gut feel about potential, that is the gap. It is fixable, and the fix is not a better grid. It is real evidence on the axis the grid cannot measure, fed in before the conversation rather than improvised during it. Our 4D method sets out how that evidence is built into how leaders are actually developed, rather than assumed at review time.

Use the grid, do not be used by it

The 9-box grid is a good tool that has earned its longevity. It convenes a calibration conversation that would otherwise never happen, and it gives a room one shared language for a hard set of judgements. None of that is in question. What is in question is the quiet promotion of a soft judgement to the status of a finding, the label that outlives its evidence, and the snapshot mistaken for a verdict. Keep the grid for what it is good at, convening the conversation, and put real assessment behind the axis it cannot measure. If you want to see what an evidence-led potential read looks like sitting underneath your own talent review, start a conversation. For the deeper argument on why performance and potential are different things in the first place, see our companion piece on assessing potential versus performance.

The leaders described here are representative composites drawn from patterns we observe in practice, not identifiable individuals.

Ricardo Albertini · Co-Founder, CapabilityFX

Ricardo Albertini is a co-founder of CapabilityFX. His career spans leadership consulting, EdTech, FinTech, and media across South Africa and internationally. He launched Africa's first multiplayer VR training tool and has designed development programmes for some of the country's largest financial and automotive organisations. He holds certifications in team performance and Enneagram-based coaching, and writes about what it takes to build capability that lasts.
