Yes. Structured interviews predict job performance markedly better than unstructured ones, and this is one of the most replicated findings in the hiring literature. McDaniel, Whetzel, Schmidt & Maurer (1994), a meta-analysis of 245 coefficients from 86,311 people, found a corrected validity of about .44 for structured interviews versus .33 for unstructured ones. By interview content, situational interviews were highest at about .50, followed by job-related at .39 and psychologically oriented at .29. The 2022 reanalysis by Sackett, Zhang, Berry & Lievens widened the gap further, estimating .42 for structured and .19 for unstructured interviews. Either way, the unstructured interview (the free-flowing “let’s just chat and see if I like them” conversation that most small employers default to) is among the weakest tools in common use.

Why structure works. An unstructured interview lets first-impression, halo, similarity, and confirmation biases drive the rating, and because each candidate is asked different questions, the resulting “data” is mostly noise. Structure raises inter-rater reliability, and reliability sets the ceiling on validity: a rating that two interviewers cannot agree on cannot predict performance well.
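
To make the ceiling concrete: under classical test theory, an observed validity coefficient cannot exceed the square root of the product of the predictor's reliability and the criterion's reliability. The sketch below is only an illustration in Python. The function name and the reliability values (.36 and .64) are assumptions chosen for round numbers, not figures from the studies cited above.

```python
import math


def validity_ceiling(predictor_reliability: float,
                     criterion_reliability: float = 1.0) -> float:
    """Upper bound on observed validity under classical test theory.

    The bound is sqrt(r_xx * r_yy), where r_xx is the reliability of the
    interview rating and r_yy is the reliability of the performance measure.
    """
    for r in (predictor_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(predictor_reliability * criterion_reliability)


# Hypothetical inter-rater reliabilities, for illustration only.
loosely_structured = validity_ceiling(0.36)
highly_structured = validity_ceiling(0.64)

print(f"ceiling with reliability .36: {loosely_structured:.2f}")
print(f"ceiling with reliability .64: {highly_structured:.2f}")
```

The point of the sketch is the direction of the effect, not the specific numbers: any gain in agreement between interviewers raises the best validity the interview could possibly reach.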

What actually makes an interview “structured.” Campion, Palmer & Campion (1997) catalogued 15 components; Levashina et al. (2014) distilled the evidence to a practical core. The components that matter most:

Base the questions on a job analysis — ask about things the role actually requires.
Ask every candidate the same questions in the same order.
Use better question types — situational (“what would you do if…”) or behavioural (“tell me about a time you…”) questions, not trivia or rapport chit-chat.
Score each answer as it is given, on behaviourally anchored rating scales where each point has a concrete example, rather than giving an overall gut-feel rating after the fact.
Use multiple trained interviewers who rate independently, then combine their ratings so that individual biases average out.
Take notes and document the basis for each rating.
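
The scoring components above (same questions for everyone, anchored scales, independent raters whose scores are combined) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed tool: the 1-to-5 scale, the anchor wording, the question IDs, and the choice of a simple mean are all hypothetical.

```python
from statistics import mean

# Hypothetical 1-5 scale; the anchors illustrate one behavioural question
# ("Tell me about a time you handled a customer complaint").
SCALE_MIN, SCALE_MAX = 1, 5
ANCHORS = {
    1: "Blames the customer; describes no attempt to resolve the complaint",
    3: "Resolves the complaint, but only after escalating to a manager",
    5: "Resolves the complaint personally and follows up to confirm the fix",
}


def combine_ratings(ratings: dict[str, dict[str, int]]) -> dict[str, float]:
    """Average independent ratings per question.

    ratings maps interviewer -> {question_id: score}. Every interviewer
    must have scored the same set of questions.
    """
    question_sets = [set(scores) for scores in ratings.values()]
    if not question_sets or any(qs != question_sets[0] for qs in question_sets):
        raise ValueError("every interviewer must score the same questions")
    for scores in ratings.values():
        for score in scores.values():
            if not SCALE_MIN <= score <= SCALE_MAX:
                raise ValueError(f"score {score} is outside the rating scale")
    return {
        q: mean(scores[q] for scores in ratings.values())
        for q in sorted(question_sets[0])
    }


def overall_score(per_question: dict[str, float]) -> float:
    """Unweighted mean across questions (an assumed, simple combination rule)."""
    return mean(per_question.values())


# Two interviewers rate one candidate independently, then scores are combined.
candidate = {
    "interviewer_a": {"q1": 4, "q2": 3},
    "interviewer_b": {"q1": 5, "q2": 3},
}
per_question = combine_ratings(candidate)
print(per_question, overall_score(per_question))
```

The check that every interviewer scored the same questions is the code-level version of "same questions in the same order": if the question sets differ, the candidates' totals are not comparable.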

Structure is a dial, not a switch: each increment of structure adds validity. You do not need software or a psychologist — you need a question set tied to the job, a scoring guide, and an hour of interviewer calibration.

One honest caveat. Structured-interview validity is the most variable among the top predictors: the .42 estimate carries an 80% credibility interval of roughly .18 to .66. A structured interview can still be built poorly, and the components above are what move it toward the high end.

For an Ontario SMB, this is the highest-leverage, lowest-cost change you can make: you are almost certainly already interviewing, so the marginal cost of structuring it is a few hours of prep. Keep the questions job-related and consistently applied — this is also what makes the interview more defensible under the human-rights and accommodation rules covered in the Compliance cluster. Pair it with a second method, as set out in the note on combining selection methods.

Related notes

What actually predicts job performance: the selection-method validity hierarchy (post-2022) — Structured interviews, job-knowledge tests, empirically-keyed biodata, and work samples now predict job performance at or above general mental ability, whose operational validity was revised down from ~.51 to ~.31 in 2022 — so any ranking that still puts cognitive ability on top is outdated.

Why combine selection methods? Incremental validity and the cost of the interview-only hire — No single method predicts performance well enough on its own, but methods that tap different things add incremental validity — a structured interview plus a cognitive or work-sample measure pushed composite validity above .60 in the classic data — which is why hiring on one unstructured interview is the weakest defensible basis for a decision.

Which hiring / selection methods actually predict job performance? — On the best current peer-reviewed evidence (Sackett, Zhang, Berry & Lievens, 2022), structured interviews are the single strongest predictor of job performance (operational validity ≈ .42), ahead of job-knowledge tests (.40), empirically-keyed biodata (.38), work samples (.33), and general mental ability/cognitive tests (.31) — a major reordering from Schmidt & Hunter's widely-cited 1998 ranking, which over-stated cognitive ability at .51 due to range-restriction overcorrection.