Work-sample, ability, and job-knowledge tests: how well they predict and when to use them
Job-knowledge tests (.40) and work samples (.33) are strong, job-specific predictors and now sit at or above cognitive ability (.31) — but work-sample validity was revised down sharply in 2022, work samples and knowledge tests only work for candidates who already have the skills, and cognitive tests carry the largest adverse-impact risk.
A work sample asks the candidate to do a representative slice of the actual job — write the code, draft the memo, fix the part, run the till. A job-knowledge test asks what they know about the work. A cognitive-ability (GMA) test measures general reasoning and learning speed. All three are useful, but the 2022 reanalysis changed how they rank.
The numbers (operational validity, i.e. the estimated correlation with job performance; Sackett et al., 2022).
- Job-knowledge tests — .40
- Work-sample tests — .33
- Cognitive-ability / GMA tests — .31
So a work sample and a job-knowledge test now predict performance at or above a cognitive test. The revision deserves an explicit flag: work samples were the headline casualty of the recalibration. Schmidt & Hunter (1998) reported .54, whereas Roth, Bobko & McFarland (2005), the meta-analysis Sackett et al. relied on, found a mean observed validity of .26, rising to .33 after correcting only for criterion unreliability, “approximately one third less than previously thought.” Anyone quoting .54 for work samples is using the old figure.
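The step from .26 to .33 is the standard correction for attenuation in the criterion: the observed correlation is divided by the square root of the reliability of the performance measure. The sketch below is illustrative only. The function names are hypothetical, and the reliability value is backed out from the two rounded figures quoted above rather than taken from the paper, so treat it as an approximation.

```python
import math


def correct_for_criterion_unreliability(r_observed: float, criterion_reliability: float) -> float:
    """Disattenuate an observed validity for unreliability in the criterion only."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion_reliability must be in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


def implied_criterion_reliability(r_observed: float, r_corrected: float) -> float:
    """Back out the criterion reliability implied by an observed/corrected pair."""
    return (r_observed / r_corrected) ** 2


# Assumption: .26 and .33 are the rounded figures quoted in the text above.
r_yy = implied_criterion_reliability(0.26, 0.33)
print(f"implied criterion reliability: {r_yy:.2f}")
print(f"corrected validity: {correct_for_criterion_unreliability(0.26, r_yy):.2f}")
```

Because the correction touches only the criterion, it involves no range-restriction adjustment, which is the adjustment the 2022 reanalysis treated as over-applied in the older estimates.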
When to use each.
- Work samples and job-knowledge tests only work for candidates who already have the skill. You cannot give a work sample to someone you intend to train from scratch — there is nothing yet to sample. They shine for experienced hires and skilled trades.
- Cognitive-ability tests work for inexperienced candidates because they predict how fast someone will learn the job, and their validity rises with job complexity.
The practical tradeoffs.
- Cost and fidelity. A high-fidelity work sample (realistic task, realistic conditions) is more predictive but expensive to build and score; a low-fidelity simulation is cheaper but weaker. Job-knowledge and cognitive tests are cheap to administer at scale.
- Adverse impact. This is the decisive practical difference. Cognitive-ability tests produce the largest subgroup score gaps: Roth, BeVier, Bobko, Switzer & Tyler (2001) put the Black-White standardized mean difference at d ≈ 1.0 for tests of general ability among job applicants in corporate settings, which translates into the greatest risk of disproportionately screening out protected groups. Work-sample gaps are usually smaller, but work samples are not the panacea the textbooks suggest: Roth, Bobko, McFarland & Buster (2008) found work-sample Black-White differences “markedly larger for samples of job applicants (d = .73)” than the long-quoted meta-analytic value of about d = .38 drawn from incumbents. The diversity tradeoff is taken up in the adverse-impact note.
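To see how a standardized mean difference turns into screening-out risk, the sketch below converts a d value into pass rates for two groups at a single cutoff. It is a simplified model, not a compliance calculation: it assumes normally distributed scores with equal standard deviations in both groups and one top-down cutoff. The function names are hypothetical, and the 0.80 benchmark is the four-fifths rule of thumb from the US Uniform Guidelines, used here only as a familiar reference point.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, SD 1


def selection_rates(d: float, higher_group_rate: float) -> tuple[float, float]:
    """Return (higher-group pass rate, lower-group pass rate) at a shared cutoff.

    Assumes normal scores, equal SDs, and a group mean gap of d SDs.
    """
    # Cutoff, in SD units above the higher-scoring group's mean.
    cutoff = _STD_NORMAL.inv_cdf(1 - higher_group_rate)
    # The same cutoff sits d SDs further above the lower-scoring group's mean.
    lower_group_rate = 1 - _STD_NORMAL.cdf(cutoff + d)
    return higher_group_rate, lower_group_rate


def impact_ratio(d: float, higher_group_rate: float) -> float:
    """Ratio of pass rates (lower-scoring group / higher-scoring group)."""
    higher, lower = selection_rates(d, higher_group_rate)
    return lower / higher


# d values quoted in the text above; the 50% pass rate is an illustrative assumption.
for label, d in [("cognitive ability", 1.0),
                 ("work sample, applicants", 0.73),
                 ("work sample, incumbents", 0.38)]:
    print(f"{label}: d = {d:.2f}, impact ratio = {impact_ratio(d, 0.5):.2f}")
```

Under these assumptions, with half of the higher-scoring group passing, all three gaps produce an impact ratio below 0.80, and the ratio worsens as the cutoff becomes more selective. Real applicant pools will depart from the normal, equal-variance model, so this is a way to build intuition rather than a substitute for checking actual pass rates.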
For a K-W SMB hiring into a skilled role, a structured job-relevant work sample is often the most persuasive and defensible single test you can run — candidates accept “show me you can do the work” more readily than an abstract aptitude test, and it ties directly to the job. Reserve cognitive testing for roles where you are hiring for learning potential, and get advice on adverse-impact and accommodation duties (Compliance cluster) before you deploy any standardized test.
Last reviewed .
Confidence: Verified
Related notes
- What actually predicts job performance: the selection-method validity hierarchy (post-2022) — Structured interviews, job-knowledge tests, empirically-keyed biodata, and work samples now predict job performance at or above general mental ability, whose operational validity was revised down from ~.51 to ~.31 in 2022 — so any ranking that still puts cognitive ability on top is outdated.
- Adverse impact and the validity-diversity tradeoff — Cognitive-ability tests carry the largest subgroup score gaps (Black-White d ≈ 1.0) while structured interviews and work samples show smaller gaps at comparable or better validity — and because the 2022 reanalysis lowered cognitive ability's validity, dropping or de-weighting it now costs far less validity than the old tradeoff implied.
- Why combine selection methods? Incremental validity and the cost of the interview-only hire — No single method predicts performance well enough on its own, but methods that tap different things add incremental validity — a structured interview plus a cognitive or work-sample measure pushed composite validity above .60 in the classic data — which is why hiring on one unstructured interview is the weakest defensible basis for a decision.
- Which hiring / selection methods actually predict job performance? — On the best current peer-reviewed evidence (Sackett, Zhang, Berry & Lievens, 2022), structured interviews are the single strongest predictor of job performance (operational validity ≈ .42), ahead of job-knowledge tests (.40), empirically-keyed biodata (.38), work samples (.33), and general mental ability/cognitive tests (.31) — a major reordering from Schmidt & Hunter's widely-cited 1998 ranking, which over-stated cognitive ability at .51 due to range-restriction overcorrection.