Adverse impact and the validity-diversity tradeoff
Cognitive-ability tests carry the largest subgroup score gaps (Black-White d ≈ 1.0), while structured interviews and work samples show smaller gaps at comparable or better validity. Because the 2022 reanalysis lowered cognitive ability's operational validity from about .51 to about .31, dropping or de-weighting cognitive tests now costs far less validity than the old tradeoff implied.
“Adverse impact” means a selection method screens out a protected group at a substantially higher rate than others. The historic dilemma, which Ployhart & Holtz (2008) named the diversity-validity dilemma (also called the validity-diversity tradeoff), was that some of the most valid predictors also produced the largest subgroup score gaps. Reducing adverse impact therefore seemed to require sacrificing accuracy.
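One common way to make "substantially higher rate" concrete is the impact ratio: one group's selection rate divided by the reference group's rate. The sketch below is a minimal illustration. It assumes the US Uniform Guidelines' four-fifths (80%) rule of thumb as the threshold. That rule is a US enforcement heuristic, not an Ontario Human Rights Code test. The applicant counts are hypothetical.

```python
# Minimal sketch: adverse-impact (selection-rate) ratio between two applicant groups.
# Assumption: the 0.80 threshold is the US "four-fifths rule" heuristic, used here
# only for illustration; it is not an Ontario legal standard.

FOUR_FIFTHS = 0.8


def selection_rate(hired: int, applicants: int) -> float:
    """Share of a group's applicants who were selected."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return hired / applicants


def impact_ratio(focal_rate: float, reference_rate: float) -> float:
    """Focal group's selection rate divided by the reference (highest-rate) group's."""
    if reference_rate <= 0:
        raise ValueError("reference_rate must be positive")
    return focal_rate / reference_rate


if __name__ == "__main__":
    # Hypothetical applicant counts, not data from any cited study.
    reference = selection_rate(hired=30, applicants=100)
    focal = selection_rate(hired=12, applicants=60)
    ratio = impact_ratio(focal, reference)
    flag = "below" if ratio < FOUR_FIFTHS else "at or above"
    print(f"impact ratio = {ratio:.2f} ({flag} the four-fifths threshold)")
```

A ratio below 0.8 is a screening signal that warrants a closer look. It is not, by itself, a finding of discrimination.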
The size of the gaps. Cognitive-ability tests show the largest Black-White standardized mean difference. Roth, BeVier, Bobko, Switzer & Tyler (2001) put it at about d ≈ 1.0 for tests of general ability among job applicants in corporate settings. That is considerably larger than the Black-White gap in measured job performance itself. Structured interviews show much smaller gaps: Huffcutt & Roth (1998) report Black-White differences of roughly d ≈ 0.23 to 0.56, depending on how cognitively loaded the questions are. Work samples were long assumed to be low-impact, but Roth, Bobko, McFarland & Buster (2008) found the gap “markedly larger for samples of job applicants (d = .73)” than the long-quoted incumbent-based value of about d = .38.
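It helps to translate d into pass rates at a cutoff to see why a gap of d ≈ 1.0 matters so much more than one of d ≈ 0.23. The sketch below is rough and rests on several assumptions:

- Scores are normally distributed with equal SDs in both groups.
- The focal group's mean sits d SDs below the reference group's.
- A single cutoff passes a chosen share of the reference group.

Real applicant pools rarely meet these assumptions exactly. The d values also come from different samples, so the output is illustrative only.

```python
# Rough sketch: what a standardized gap d implies for pass rates at one cutoff.
# Assumptions (illustrative only): normal scores, equal SDs, focal mean d SDs lower,
# cutoff chosen so that `reference_pass_rate` of the reference group passes.
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def pass_rates(d: float, reference_pass_rate: float) -> tuple[float, float, float]:
    """Return (reference pass rate, focal pass rate, impact ratio) for gap d."""
    if not 0.0 < reference_pass_rate < 1.0:
        raise ValueError("reference_pass_rate must be strictly between 0 and 1")
    cutoff = STD_NORMAL.inv_cdf(1.0 - reference_pass_rate)  # cutoff as a z-score
    # The focal distribution is shifted down by d, so its tail above the cutoff is smaller.
    focal_rate = 1.0 - STD_NORMAL.cdf(cutoff + d)
    return reference_pass_rate, focal_rate, focal_rate / reference_pass_rate


if __name__ == "__main__":
    # Magnitudes echo the gaps quoted above; they are not directly comparable.
    examples = [
        ("cognitive test", 1.00),
        ("work sample (applicants)", 0.73),
        ("structured interview (low end)", 0.23),
    ]
    for label, d in examples:
        _, focal, ratio = pass_rates(d, reference_pass_rate=0.5)
        print(f"{label:32s} d={d:.2f}  focal pass rate={focal:.2f}  impact ratio={ratio:.2f}")
```

Consider a cutoff that passes half of the reference group. Under these assumptions, it passes only about 16% of a group whose mean is 1 SD lower, an impact ratio near 0.32. Lowering `reference_pass_rate` makes the cutoff more selective and pushes the ratio further down.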
What the 2022 recalibration changed. Sackett et al. (2022) revised the operational validity of general mental ability (GMA) down from about .51 to .31. Cognitive ability is therefore no longer the stand-out predictor, and the old “you must accept adverse impact to keep validity” framing weakens. Berry, Lievens, Zhang & Sackett (2024) reran the tradeoff with the updated meta-analytic matrix. They concluded that excluding cognitive-ability tests generally has little to no effect on overall validity but substantially reduces adverse impact. In other words, the tradeoff discussion should focus less on GMA and more on the other methods. The authors are careful to note that this does not make the dilemma vanish entirely, since some validity reduction may still be needed to reach a given impact ratio. Even so, the cost of de-weighting cognitive tests is far smaller than once believed.
The practical implication. A structured interview plus a work sample can reach validity comparable to a battery anchored on a cognitive test, with less adverse-impact exposure. For most SMBs that is the better-aligned choice on both accuracy and fairness.
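The claim that two lower-impact methods can match a cognitive-anchored battery can be checked with the standard formulas for a unit-weighted composite of standardized predictors. This sketch rests on loud assumptions:

- Predictors are equally weighted.
- Intercorrelations hold within each group.
- Inputs are the 2022 validities and the d values quoted in this note, using the upper end of the interview range to stay conservative.
- The predictor intercorrelations (0.30 and 0.40) are hypothetical placeholders, not published estimates.

```python
# Sketch: validity and subgroup d of a unit-weighted composite of standardized predictors.
#   validity = sum(r_i) / sqrt(sum of all entries of the intercorrelation matrix)
#   d_c      = sum(d_i) / sqrt(same sum)
# Assumptions: equal weights; intercorrelations hold within each group;
# the intercorrelation values used below are HYPOTHETICAL placeholders.
from math import sqrt


def composite(
    validities: list[float], ds: list[float], intercorr: list[list[float]]
) -> tuple[float, float]:
    """Return (composite validity, composite d) for an equally weighted sum."""
    k = len(validities)
    if len(ds) != k or len(intercorr) != k or any(len(row) != k for row in intercorr):
        raise ValueError("inputs must describe the same k predictors")
    # Variance of the unit-weighted sum = sum of every entry in the correlation matrix.
    sd = sqrt(sum(sum(row) for row in intercorr))
    validity = sum(validities) / sd  # correlation of the sum with job performance
    d_composite = sum(ds) / sd       # mean gap of the sum, in composite SD units
    return validity, d_composite


if __name__ == "__main__":
    # (validities, d values, hypothetical intercorrelation) for each two-predictor battery.
    batteries = {
        "structured interview + work sample": ([0.42, 0.33], [0.56, 0.73], 0.30),
        "cognitive test + work sample": ([0.31, 0.33], [1.00, 0.73], 0.40),
    }
    for name, (rs, ds, r12) in batteries.items():
        validity, d = composite(rs, ds, [[1.0, r12], [r12, 1.0]])
        print(f"{name:36s} validity={validity:.2f}  d={d:.2f}")
```

With these placeholder correlations, the interview-plus-work-sample pair comes out ahead on both validity and d. The result depends heavily on the intercorrelations you plug in. The formula also exposes a known pitfall. Because d_c = (d_1 + d_2) / sqrt(2 + 2r), combining predictors with similar gaps and a low intercorrelation can produce a composite d larger than either component's.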
For an Ontario SMB, this is not optional good practice. The duty not to discriminate in hiring is set out in the Ontario Human Rights Code. The duty to accommodate disability in the selection process (including testing) flows from the Code and the Accessibility for Ontarians with Disabilities Act (AODA). Those obligations, and how to run a defensible accommodation process, are covered in the Compliance cluster; this note addresses only the empirical size of subgroup differences across methods. Choosing lower-impact, equally valid methods is where the science and the legal duty point in the same direction. This is general information, not legal advice.
Source: Ployhart & Holtz, "The diversity-validity dilemma," Personnel Psychology, 2008
Last reviewed .
Confidence: Directional
Related notes
- What actually predicts job performance: the selection-method validity hierarchy (post-2022) — Structured interviews, job-knowledge tests, empirically-keyed biodata, and work samples now predict job performance at or above general mental ability, whose operational validity was revised down from ~.51 to ~.31 in 2022 — so any ranking that still puts cognitive ability on top is outdated.
- Work-sample, ability, and job-knowledge tests: how well they predict and when to use them — Job-knowledge tests (.40) and work samples (.33) are strong, job-specific predictors and now sit at or above cognitive ability (.31) — but work-sample validity was revised down sharply in 2022, work samples and knowledge tests only work for candidates who already have the skills, and cognitive tests carry the largest adverse-impact risk.
- Do structured interviews predict performance better than unstructured ones — and what makes an interview "structured"? — Structured interviews substantially out-predict unstructured ones — McDaniel et al. (1994) put corrected validity at about .44 vs .33, and Sackett et al. (2022) at .42 vs .19 — and "structure" means job-analysis-based questions asked identically of every candidate, scored on anchored rating scales by trained raters.
- Which hiring / selection methods actually predict job performance? — On the best current peer-reviewed evidence (Sackett, Zhang, Berry & Lievens, 2022), structured interviews are the single strongest predictor of job performance (operational validity ≈ .42), ahead of job-knowledge tests (.40), empirically-keyed biodata (.38), work samples (.33), and general mental ability/cognitive tests (.31) — a major reordering from Schmidt & Hunter's widely-cited 1998 ranking, which over-stated cognitive ability at .51 due to range-restriction overcorrection.