Bias in leadership assessment: where it hides and how to reduce it

Assessment is meant to make leadership decisions fairer. Yet bias enters through instrument design, rater judgement, similarity effects, and cultural assumptions. Here is where it hides in common practice, and the practical steps that reduce it.

Dr Eric Albertini · Co-Founder, CapabilityFX · 29 April 2026 · 8 min read


Assessment is sold as the antidote to bias. Swap a manager's gut feel for a validated instrument, the argument goes, and leadership decisions become fairer and more objective. The promise is real. The trouble is that bias does not disappear when you introduce a tool. It moves. It travels from the manager's intuition into the design of the instrument, the judgement of the rater, and the assumptions baked into what "good leadership" is taken to look like. The decision feels more objective because it now arrives with a score attached. Whether it is actually fairer is a separate question, and one that too few organisations stop to ask.

Why objectivity is not the same as fairness

A score is a number, and numbers carry an air of neutrality that human judgement does not. That air is exactly what makes assessment risky when it is treated as a verdict rather than as evidence. A biased process does not announce itself. It produces clean, confident, well-formatted output, and the people relying on it have no reason to doubt it because the messiness of human judgement has been hidden inside the method.

This matters more in some settings than others. In a South African leadership population, where teams routinely span several home languages, cultural frames, educational backgrounds, and lived experiences of the same workplace, an instrument's hidden assumptions get exercised hard. A tool that quietly equates leadership presence with a particular communication style, or readiness with a particular career path, will read a diverse group unevenly. It will not flag that it is doing so. It will simply return scores, and the unevenness will look like a finding about the people rather than a property of the tool.

The aim of this piece is not to argue against assessment. It is to be precise about where bias actually enters, because precision is what makes it reducible. Vague worry about fairness leads nowhere. Knowing that bias hides in four specific places lets you do something about each one.

The four places bias hides

Bias in leadership assessment is not a single failure. It enters at four distinct points, and each calls for a different response.

Instrument design. Every assessment encodes a theory of what good leadership is. That theory was built somewhere, by someone, against some reference population. A tool normed largely on one demographic or one management culture carries that population's patterns as its baseline, and anyone who differs from it reads as a deviation. The bias is upstream of any individual administration. It is in the construct itself, in which behaviours the tool decided to treat as evidence of capability and which it ignored.

Rater judgement. Wherever a human scores, observes, or interprets, the rater's own frame enters the data. This is most visible in 360-degree feedback and structured observation, where the rating reflects the observer as much as the observed. A rater who associates confidence with volume will under-rate a quietly competent leader. A rater having a difficult quarter will mark more harshly. None of this is malice. It is ordinary perception doing what perception does, and it lands in the score as if it were a property of the person being assessed.

Similarity effects. People rate people like themselves more favourably. The affinity is often unconscious and it is well documented in selection research: assessors give higher marks to candidates who share their background, communication style, or career path. In leadership assessment this is corrosive precisely because leadership populations are not yet diverse at the top, so the raters and the reference points skew toward an incumbent profile. The leader who resembles the people already in charge gets a quiet, invisible tailwind.

Cultural assumptions. Closely related, but worth separating. Many instruments treat culturally specific behaviour as if it were universal. Direct verbal assertiveness, comfort with self-promotion, a particular relationship to hierarchy: these vary across cultures, and an instrument built around one set of norms will systematically misread leaders raised in another. In a country as plural as South Africa, this is not an edge case. It is the daily condition of most diverse teams.

A capability lens helps, but does not exempt you

Part of what makes bias durable is shallow measurement. When an assessment reads only surface traits and stated preferences, it leans heavily on presentation, and presentation is exactly where cultural and similarity effects do their work. A leader who presents in the expected register scores well. One who does not gets under-read, and the tool cannot tell the difference between a genuine gap in capability and a mismatch with its own assumptions. We have written separately about how this plays out when assessments measure the wrong thing entirely.

Moving toward capability rather than disposition reduces some of this exposure. An assessment that examines what a leader actually does under load is harder to fool with self-presentation than one that scores a self-reported style. This is why the instruments we work with are built to read beneath the surface. Ennea International's Five Lens Development Platform, which we are licensed to facilitate, reads motivation and identity rather than collapsing a person into a presented type, which makes it less hostage to whether a leader performs the expected leadership manner. The future-readiness assessment developed by Tomorrows Compass, for which CapabilityFX is a licensed measurement partner, looks forward at readiness for conditions to come rather than rewarding fluency with the conditions an incumbent profile already mastered. You can see how we combine them on the assessments page.

A capability lens helps. It does not exempt anyone. Every instrument still encodes a construct, still depends on skilled interpretation, and can still be administered carelessly. The honest position is not that the right tool removes bias. It is that the right tool, used with discipline, reduces it, and that the discipline matters as much as the tool. Our DUAL model (Discover, Understand, Accept, Lead) applies as much to an assessment process as to a leader: the work begins with discovering and accepting the uncomfortable possibility that your own process is reading people unevenly.

What it looks like in the room

Bias is abstract until you watch it score a real person.

The articulate graduate and the quiet supervisor. Consider two candidates for the same first-line leadership role in a logistics operation. One is a recent graduate, fluent in the language of the assessment centre, quick to volunteer answers, comfortable with the self-promotion the exercises implicitly reward. The other is a shift supervisor who came up through the floor, speaks English as a third language, and is sparing with words. On a self-report and a lightly structured panel, the graduate scores higher across the board. Watch them at work and a different picture appears: the supervisor's shift has the lowest incident rate and the highest retention in the depot, built on a steady, low-drama way of holding people to account. The assessment did not measure that. It measured fluency in its own format, and mistook fluency for capability. The bias was not in any rater's intent. It was in a design that rewarded presentation and a panel that, sharing the graduate's frame, found him easy to read.

The 360 that measured the rater. Consider a divisional head in a financial services firm whose 360-degree feedback came back markedly lower than her peers'. The instrument was sound and widely used. When the results were unpacked, a pattern emerged in who had rated her. Her toughest scores came from a cluster of long-tenured managers whose own way of leading was assertive and visible, and who read her more consultative, deliberate style as a lack of decisiveness. Her quieter approach was not weaker. It was different, and it was being marked down by raters for whom "leadership" meant something that looked like themselves. A multi-rater tool had not removed individual bias. It had aggregated it, then presented the average as an objective reading of her. Separating the rating from the rater turned a damaging verdict back into useful, contextual evidence.

Across both, the same lesson holds. The number looked objective and was not. The fix in each case was not a better single instrument. It was a process built to catch the bias the instrument could not see on its own.

Four practices that reduce it

You cannot eliminate bias from assessment. You can build a process that systematically reduces it, and the practices that do so are unglamorous and well evidenced.

Structure the criteria before you assess

Decide what you are looking for, in observable terms, before anyone is scored, and apply the same criteria to everyone. Unstructured judgement is where similarity effects and cultural assumptions flourish, because an undefined standard quietly defaults to the assessor's own picture of a leader. Structured criteria, defined in advance and tied to behaviour rather than impression, are one of the most effective guards against rater drift. The discipline is in writing the standard down and holding to it, not in revising it once you have seen who is in the room.

Use multiple methods, never a single tool

No instrument is unbiased, so do not let any one of them stand alone. Combining methods that fail in different ways (a depth instrument with a forward-looking behavioural one, self-report with observed behaviour) lets the strengths of one offset the blind spots of another. Where two well-chosen methods disagree, you have found something worth examining rather than a number to act on. We set out how to choose and combine tools for this reason in our guide to choosing a leadership assessment.
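One way to treat disagreement between methods as a signal is to put both sets of scores on a common scale and list the people on whom they diverge most. The Python sketch below is a minimal illustration only, not part of any particular instrument: the names, the scores, the `disagreements` helper, and the threshold of one standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise scores so two methods with different scales can be compared."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        # Everyone scored the same, so the method carries no spread to compare.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]

def disagreements(names, method_a, method_b, threshold=1.0):
    """Return the people whose standardised scores differ by at least `threshold`."""
    za, zb = z_scores(method_a), z_scores(method_b)
    return [n for n, a, b in zip(names, za, zb) if abs(a - b) >= threshold]

# Hypothetical data: the same four people scored by two different methods.
names = ["P", "Q", "R", "S"]
self_report = [4.5, 3.0, 2.0, 3.5]   # assumed self-report scores
observed = [3.0, 3.0, 4.5, 3.5]      # assumed observed-behaviour scores

to_examine = disagreements(names, self_report, observed)
print(to_examine)  # ['P', 'R']
```

The output is a list of people to look at more closely, not a ruling on which method is right.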

Diversify and calibrate the raters

Wherever humans rate, the composition of the rater pool shapes the result. A panel or a 360 sample drawn from a single demographic or a single leadership culture will encode that group's assumptions as the standard. Widen the pool deliberately. Then calibrate: have raters score against the agreed criteria, compare where they diverge, and surface the pattern. Calibration does not make raters identical. It makes their biases visible, which is the point. A bias you can see is a bias you can correct for.
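The calibration step described above, comparing where raters diverge and surfacing the pattern, can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: the rater and candidate labels, the 1 to 5 scale, the `rater_divergence` and `flag_divergent` helpers, and the one-point threshold are all hypothetical.

```python
from statistics import mean

# Hypothetical calibration data: scores[rater][candidate] on one agreed criterion (1-5).
scores = {
    "rater_a": {"cand_1": 4, "cand_2": 2, "cand_3": 4},
    "rater_b": {"cand_1": 4, "cand_2": 4, "cand_3": 3},
    "rater_c": {"cand_1": 5, "cand_2": 1, "cand_3": 5},
}

def rater_divergence(scores):
    """Return each rater's signed deviation from the panel mean, per candidate."""
    candidates = {c for by_cand in scores.values() for c in by_cand}
    panel_mean = {
        c: mean(by_cand[c] for by_cand in scores.values() if c in by_cand)
        for c in candidates
    }
    return {
        rater: {c: by_cand[c] - panel_mean[c] for c in by_cand}
        for rater, by_cand in scores.items()
    }

def flag_divergent(divergence, threshold=1.0):
    """List (rater, candidate, deviation) where the rater sits far from the panel."""
    return sorted(
        (rater, cand, round(dev, 2))
        for rater, by_cand in divergence.items()
        for cand, dev in by_cand.items()
        if abs(dev) >= threshold
    )

for rater, cand, dev in flag_divergent(rater_divergence(scores)):
    print(f"{rater} scored {cand} {dev:+} against the panel mean")
```

A flagged pair is a prompt for a calibration conversation against the agreed criteria. It does not show which rater is wrong, only where the readings part ways.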

Validate in your own context

A tool's published validity was established somewhere, on someone. Ask whether that evidence applies to your sector, your role levels, and your cultural setting, because a tool normed elsewhere may not transfer cleanly to a diverse South African leadership population. Then build your own feedback loop. Track whether the leaders the assessment rated highly actually went on to lead well in your context, and whether anyone it under-rated outperformed the reading. Local validation is how you find out, in your own setting, whether the instrument is fair before you have trusted it with too many decisions.
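A local feedback loop of this kind can start very simply. The sketch below, in Python, checks two things: whether assessment scores moved in step with later outcomes, and who the assessment under-read. Everything in it is an assumption for illustration, including the record layout, the shared 1 to 5 scale, the `pearson` and `under_read` helpers, and the one-point gap. With samples this small, the result is a prompt for review rather than evidence.

```python
from statistics import mean

# Hypothetical records: assessment score at selection and a later outcome rating (both 1-5).
records = [
    {"leader": "A", "assessed": 4.5, "outcome": 3.0},
    {"leader": "B", "assessed": 4.0, "outcome": 4.0},
    {"leader": "C", "assessed": 2.5, "outcome": 4.5},
    {"leader": "D", "assessed": 3.0, "outcome": 3.0},
    {"leader": "E", "assessed": 2.0, "outcome": 3.5},
]

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no spread")
    return cov / (var_x * var_y) ** 0.5

def under_read(records, gap=1.0):
    """Leaders whose later outcome exceeded their assessment score by at least `gap`."""
    return [r["leader"] for r in records if r["outcome"] - r["assessed"] >= gap]

assessed = [r["assessed"] for r in records]
outcome = [r["outcome"] for r in records]

print(f"score vs outcome correlation: {pearson(assessed, outcome):.2f}")
print("under-read by the assessment:", under_read(records))
```

In this made-up data the correlation is negative and two leaders outperformed their reading, which is the kind of result that should send you back to the instrument and the raters before the next round of decisions.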

The questions worth asking first

Before you next run an assessment, put your own process to a short test.

Where does a human judgement enter, and whose? Trace every point where a person scores, observes, or interprets. Ask whether that pool reflects the diversity of the people being assessed, or an incumbent profile.
What does this tool treat as evidence of leadership, and who built that picture? If the construct rewards a particular communication style or career path, anyone who differs reads as a deviation from a baseline. That deviation is a property of the tool, not a finding about the person.
Would this process survive a diverse panel disagreeing? If a single score decides the outcome, you have nowhere to catch a biased reading. If two methods and a calibrated panel have to converge, you do.
Have we checked it against our own outcomes? If you cannot say whether the people this tool rated highly actually led well in your context, you are trusting borrowed validity.

These questions sit at the front of how we structure assessment within the CapabilityFX method. The work is not to find a perfect instrument. It is to build a process honest enough to catch its own errors.

Fairer is a practice, not a purchase

Bias in leadership assessment is not solved by buying a better tool, and it is not an argument for trusting instinct over evidence. Instinct is more biased, not less. The path to fairer leadership decisions runs through discipline: structured criteria set in advance, multiple methods that fail differently, a diverse and calibrated rater pool, and validation in your own context. None of it is dramatic. All of it is the difference between a process that looks objective and one that is actually fairer to the range of people it assesses. If you want to pressure-test your organisation's leadership assessment process for bias, start a conversation.

The leaders described here are representative composites drawn from patterns we observe in practice, not identifiable individuals. References to assessment validity describe established principles in selection and assessment research, applied to the CapabilityFX approach.

Dr Eric Albertini · Co-Founder, CapabilityFX

Originator of the DUAL model, developed through his doctoral research at the University of Johannesburg. Eric has spent his career building leadership capability inside executive teams.