Eliminating Hiring Bias From The Recruitment Process Through Machine Learning

In recent years, machine learning has become widespread as a means to capitalize on data and enable the automation of countless tasks that are routine for people but hitherto intractable for computers. With the ever-accelerating deluge of data — the amount of it stored on our planet doubles every 18 months — the capacity for machine learning to facilitate social benefits, such as increased job satisfaction, more extensive leisure time, and medical breakthroughs, accelerates as well. However, this data deluge comes with risks — such as propagating biases against particular demographic groups — that must be eliminated to ensure an even spread of machine learning’s benefits.

At GQR, we develop machine learning algorithms for use in the field of human resources to automate repetitive tasks and enable insights that would otherwise be impossible or, at best, extremely time-consuming.

As an example, many global corporations receive millions of job applications each year for thousands of different roles. The ideal candidate for a given hard-to-fill role may have applied to a different job at the firm, one they’re less suited to. They may have been encountered at a university career fair, or they may have been identified as a strong general prospect via an online search. Using GQR’s machine learning models, this ideal candidate is automatically suggested to the relevant internal recruiter in a newsfeed-style interface we call Talent Vault.

Without this tool, the ideal candidate would in all likelihood have been ignored.

An independent evaluation by a Fortune 500 client determined that, relative to their previous best practice of using elaborate (and wearying to devise) keyword-based searches, the GQR model identifies 12 times as many strong candidates.

From a pool of 50,000 prospects, for example, our algorithm found 764 strong candidates for a niche role while the best-practice keyword-based search yielded merely 64.

A further advantage of GQR’s results is that each candidate considered for a given role receives a “match score” out of 100, so the 764 prospects can be sorted and the most promising contacted first. Meanwhile, the 64 strong prospects suggested by keyword search cannot be ranked and, even worse, were returned alongside 360 “false positives,” meaning that 85% of the keyword-search results were irrelevant.
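The figures quoted above can be checked with a few lines of arithmetic. The sketch below only reuses the counts reported in this article; the variable names are illustrative and are not part of any GQR tooling.

```python
# Counts reported in the article for a pool of 50,000 prospects.
model_strong = 764        # strong candidates found by the model
keyword_strong = 64       # strong candidates found by keyword search
keyword_false_pos = 360   # irrelevant results returned by keyword search

# How many times more strong candidates the model surfaced.
lift = model_strong / keyword_strong

# Share of keyword-search results that were relevant vs. irrelevant.
keyword_returned = keyword_strong + keyword_false_pos
keyword_precision = keyword_strong / keyword_returned
keyword_irrelevant_share = keyword_false_pos / keyword_returned

print(f"lift: {lift:.1f}x")                                      # lift: 11.9x
print(f"keyword precision: {keyword_precision:.1%}")             # keyword precision: 15.1%
print(f"keyword irrelevant share: {keyword_irrelevant_share:.1%}")  # keyword irrelevant share: 84.9%
```

Rounded, these are the "12 times" and "85%" figures cited in the text.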

To be so effective, machine learning models like our job-to-candidate matching algorithm must be trained on myriad real-world data points (in our case, hundreds of millions of candidate profiles and job descriptions). However, our clients share a justified concern that these data could contain unwanted biases against particular demographic groups, which our software could then propagate. To ensure that these unwanted biases are eliminated, we have devised a proprietary process for scrubbing them. There are three broad aspects to this process, which are detailed in turn below.

1. Cleaning the Data

Biases can be explicit, as would be the case if a hiring manager consciously excluded a candidate from a job search because of a demographic characteristic such as gender or ethnicity. Biases can also be implicit, as when unconscious prejudices result in a candidate being overlooked or appraised less highly. Before any model-building begins, the data are cleaned to strip out language that could be associated with explicit biases (e.g., personal pronouns) and implicit biases (e.g., writing style) alike.
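To make the pronoun example concrete, here is a minimal sketch of one possible cleaning step. It is not GQR’s proprietary process: the term list, the `scrub_pronouns` function, and the `[PRON]` placeholder are all assumptions made for illustration, and the sketch does not attempt the harder problem of neutralizing writing style.

```python
import re

# Hypothetical, deliberately small list of gendered pronouns to mask.
GENDERED_TERMS = ["he", "she", "him", "her", "hers", "his", "himself", "herself"]

# Word boundaries ensure that "he" inside "the" or "them" is left alone.
_PATTERN = re.compile(r"\b(" + "|".join(GENDERED_TERMS) + r")\b", flags=re.IGNORECASE)


def scrub_pronouns(text: str, placeholder: str = "[PRON]") -> str:
    """Replace each gendered pronoun in `text` with a neutral placeholder."""
    return _PATTERN.sub(placeholder, text)


print(scrub_pronouns("She led her team; he mentored them."))
```

Masking with a placeholder, rather than deleting the word, keeps sentence length and token positions stable for whatever model consumes the text afterwards.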

2. Specialized Modeling

At the heart of our proprietary process is our specialized modeling of the data. At a high level, we have devised a model-training procedure that selects positive examples (i.e., where a candidate is known to be well-suited for a role) and negative examples (i.e., where a candidate is known to be unqualified for a role) in a manner that minimizes the impact of any biases that may have furtively persisted through the data-cleaning steps.

3. Rigorous Testing

Despite our efforts to clean the data and devise models that eliminate unwanted biases, the only way to verify that they’ve been removed is through quantitative, statistical testing of the model’s outputs. To do this, we randomly sampled 400 candidates, 100 from each of the following ethnicities: Asian, Hispanic, non-Hispanic Black, and non-Hispanic White. Reassuringly, our model’s output scores do not differ significantly between any of these groups (as determined by a standard statistical evaluation called Student’s t-test). Likewise, randomly sampling by gender (samples of 100 women and 100 men) does not yield significantly different scores. To ensure there are no systematic changes to our data or models over time with respect to bias, we run these tests quarterly.
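A test of this kind can be sketched with the Python standard library alone. The example below is an illustration under stated assumptions, not GQR’s actual test harness: the scores are synthetic, the helper names `student_t` and `approx_two_sided_p` are hypothetical, and the p-value uses a normal approximation to the t distribution, which is reasonable only because each group has about 100 members.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist, mean, variance


def student_t(a, b):
    """Return (t statistic, degrees of freedom) for a pooled-variance two-sample t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, df


def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation (assumes large samples)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


random.seed(0)
groups = ["Asian", "Hispanic", "non-Hispanic Black", "non-Hispanic White"]
# Synthetic match scores drawn from one shared distribution, for illustration only.
scores = {g: [random.gauss(60, 15) for _ in range(100)] for g in groups}

# Four groups give six pairwise comparisons.
for g1, g2 in combinations(groups, 2):
    t, df = student_t(scores[g1], scores[g2])
    print(f"{g1} vs {g2}: t={t:.2f}, df={df}, p~{approx_two_sided_p(t):.3f}")
```

With real data, the synthetic `scores` dictionary would be replaced by the model’s match scores for each sampled group; the same two-sample comparison applies to the samples of 100 women and 100 men.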

As the processes above illustrate, devising algorithms that stamp out unwanted biases without skimping on accuracy or performance adds time and effort to the machine learning model-design process. When algorithms can have a considerable social impact, as ours do in the human-resources space, investing this time and effort is essential to ensuring equitable treatment of all people.

Dr. Jon Krohn is Chief Data Scientist at GQR and a member of the firm’s multi-disciplinary “ByeBias” working group, which formally aims to “achieve fair and equal employment opportunities through the elimination of conscious and unconscious bias.”

About the Author

Dr. Jon Krohn is GQR’s Chief Data Scientist, based out of New York.

As the Chief Data Scientist, he manages scientists and engineers in order to devise intuitive and efficient machine learning algorithms for embedding within products and processes. Dr. Krohn’s particular specialization is data modeling approaches that involve passing the natural language of billions of documents through deep neural networks.

The algorithms he has designed automate aspects of millions of job applications made worldwide each year. He accelerates hiring managers’ capacity to fill their vacancies and the speed with which recruitment consultants can identify roles that candidates are perfectly suited for.

Blue-chip corporates have run global searches across hundreds of vendors that automate recruitment, and Krohn’s models placed first. Third-party investigations of his models have found that they offer orders-of-magnitude accuracy improvements relative to the approaches previously in use. Large HR tech platforms rely on these algorithms behind their screens. Krohn has published his results and applied for a patent, with more patents to come.

Dr. Krohn’s first book, Deep Learning Illustrated, was published in 2019 and became an instant #1 bestseller that was translated into six languages. He’s renowned for his lectures at Columbia University, New York University, the NYC Data Science Academy, prestigious industry conferences, and a range of digital channels, including his reliably sold-out 600-person classes on the O’Reilly learning platform. He holds a Ph.D. in neuroscience from Oxford and has published on machine learning in leading academic journals since 2010; his papers have been cited over a thousand times.