When recruiting and selecting new employees, one of an HR analyst's top priorities is to minimize bias. In this post, we examine diversity-related analysis of recruitment and selection activities. For example, what are the ratios of men to women among the people who are shortlisted, interviewed, and ultimately offered a job? The analyst may also wish to run a similar analysis for ethnicity and other important demographic categories.

The following example uses a dataset compiled for inferential testing of the gender and BAME profiles of the applications received versus those shortlisted, interviewed, and ultimately offered a job.

The first thing we want to do is look at the general patterns of gender and BAME status in the application pool. To do this, we can run a frequency analysis on the following variables:

- Gender (1=male or 2=female)
- BAMEyn (Black, Asian, or Minority ethnic: 1=yes or 2=no)
- ShortlistedNY (0=rejected or 1=shortlisted)
- Interviewed (0=not interviewed or 1=interviewed)
- FemaleOnPanel (1=male only panel or 2=female member on panel)
- OfferNY (1=made an offer or 0=not offered)
- AcceptNY (1=accepted or 0=declined)
- JoinNY (1=joined or 0=not joined)

A key question we want to explore is whether the proportions of female and BAME applicants among those shortlisted for interview are representative of these groups in the application pool. For example, if the proportion of women among all applicants is the same as the proportion of women among those shortlisted, the data give no evidence of gender bias at this stage. Besides the frequency tables, we will also run Chi-square tests to check whether there is statistically significant evidence of any gender (or BAME) preference in the shortlisting process.
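The comparison described above can be sketched directly from the counts. This is a minimal illustration, not the post's original code: it rebuilds the gender-by-shortlisting counts reported in this post as a matrix (the object name `gen.short.tab` follows the post; the helper names `pool.share`, `shortlist.share`, and `shortlist.rate` are made up for this sketch) and compares each gender's share of the pool with its share of the shortlist.

```r
# Counts copied from the gender-by-shortlisting cross-tabulation in this post.
gen.short.tab <- matrix(
  c(40, 38,
    152, 50),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Male", "Female"),
                  Shortlisted = c("No", "Yes"))
)

# Share of each gender in the whole application pool.
pool.share <- rowSums(gen.short.tab) / sum(gen.short.tab)

# Share of each gender among shortlisted applicants only.
shortlist.share <- gen.short.tab[, "Yes"] / sum(gen.short.tab[, "Yes"])

# Proportion of each gender's applicants who were shortlisted.
shortlist.rate <- gen.short.tab[, "Yes"] / rowSums(gen.short.tab)

# Put the three views side by side for comparison.
round(rbind(pool.share, shortlist.share, shortlist.rate), 3)
```

If the shortlist were representative, `pool.share` and `shortlist.share` would be close to each other; the same steps apply to the BAME counts.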

           Shortlisted-No Shortlisted-Yes
    Male               40              38
    Female            152              50

    > chisq.test(gen.short.tab)

        Pearson's Chi-squared test with Yates' continuity correction

    data:  gen.short.tab
    X-squared = 13.905, df = 1, p-value = 0.0001923

             Shortlisted-No Shortlisted-Yes
    BAME-Yes            102              19
    BAME-No              90              69

    > chisq.test(bame.short.tab)

        Pearson's Chi-squared test with Yates' continuity correction

    data:  bame.short.tab
    X-squared = 23.184, df = 1, p-value = 0.000001472
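For readers without the data file, the two tests can be rerun from the published counts alone. The sketch below assumes only the cell counts shown above; in the original analysis the tables were presumably built from the `apps` data frame, which is not reproduced here. For a 2x2 table, `chisq.test()` applies Yates' continuity correction by default.

```r
# Rebuild both contingency tables from the counts reported in this post.
gen.short.tab <- matrix(
  c(40, 38, 152, 50), nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Male", "Female"),
                  Shortlisted = c("No", "Yes"))
)
bame.short.tab <- matrix(
  c(102, 19, 90, 69), nrow = 2, byrow = TRUE,
  dimnames = list(BAME = c("Yes", "No"),
                  Shortlisted = c("No", "Yes"))
)

# Pearson's Chi-squared test; Yates' correction is the default for 2x2 tables.
gen.test  <- chisq.test(gen.short.tab)
bame.test <- chisq.test(bame.short.tab)

# Expected counts under independence, useful for seeing where the gap lies.
gen.test$expected
bame.test$expected

# Test statistics and p-values side by side.
rbind(
  gender = c(X.squared = unname(gen.test$statistic),  p.value = gen.test$p.value),
  bame   = c(X.squared = unname(bame.test$statistic), p.value = bame.test$p.value)
)
```

Comparing the observed counts with `$expected` shows which cells drive the statistic: fewer women and fewer BAME applicants are shortlisted than independence would predict.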

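The results above give strong evidence that shortlisting is associated with both gender and BAME status: 38 of 78 male applicants (49%) were shortlisted compared with 50 of 202 female applicants (25%), and 69 of 159 non-BAME applicants (43%) compared with 19 of 121 BAME applicants (16%). The tests show an association in favour of male and non-BAME applicants; on their own, they do not establish its cause.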

Next, we take the gender and BAME variables into account at the same time to predict shortlisting. Because the response variable is binary, we use logistic regression.

The analysis below shows that both factors are statistically significant predictors of whether an applicant is shortlisted. The odds ratios tell us more about how each is related to the response variable, shortlisting.

    Call:
    glm(formula = ShortlistedNY ~ BAMEyn + Gender, family = binomial(link = "logit"),
        data = apps)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.4371  -0.9342  -0.4764   0.9382   2.1130

    Coefficients:
                Estimate Std. Error z value    Pr(>|z|)
    (Intercept)  -1.2432     0.6777  -1.834      0.0666 .
    BAMEyn        1.5157     0.3096   4.895 0.000000983 ***
    Gender       -1.1957     0.3006  -3.978 0.000069441 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

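The odds ratio for BAMEyn is 4.55. Because BAMEyn is coded 1=yes and 2=no, it compares non-BAME applicants with BAME applicants of the same gender. We can interpret the number in the following ways:

- The odds of a non-BAME applicant being shortlisted are 4.55 times the odds of a BAME applicant being shortlisted.
- The odds of a BAME applicant being shortlisted are 0.22 (1/4.55) times the odds of a non-BAME applicant being shortlisted.
- This is a ratio of odds, not of probabilities, so it does not mean that non-BAME applicants are 4.55 times more likely to be shortlisted.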

    > exp(logit.short$coefficients)
    (Intercept)      BAMEyn      Gender
      0.2884636   4.5525019   0.3025043
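Point estimates of odds ratios are easier to judge alongside an interval. As a rough sketch, approximate 95% Wald intervals can be computed from the estimates and standard errors printed in the coefficient table; the values below are typed in from that (rounded) output rather than taken from the fitted model, so the last digits may differ slightly from the full-precision results. With the fitted object available, `exp(confint.default(logit.short))` would be the more direct route.

```r
# Estimates and standard errors copied from the printed coefficient table.
coefs <- c(BAMEyn = 1.5157, Gender = -1.1957)
ses   <- c(BAMEyn = 0.3096, Gender = 0.3006)

# Normal quantile for a two-sided 95% interval (about 1.96).
z <- qnorm(0.975)

# Exponentiate the log-odds scale to obtain odds ratios and Wald limits.
or.table <- cbind(
  OR    = exp(coefs),
  lower = exp(coefs - z * ses),
  upper = exp(coefs + z * ses)
)

round(or.table, 2)
```

An interval that excludes 1 is consistent with the significant z-tests in the regression output.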

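To explain the odds ratio for the gender variable (coded 1=male and 2=female), we can say one of the following:

- The odds of a woman being shortlisted are 0.30 times the odds of a man with the same BAME status.
- The odds of a man being shortlisted are 3.31 (1/0.3025) times the odds of a woman with the same BAME status.
- These figures compare odds rather than probabilities, so they should not be read as men being 3.31 times more likely to be shortlisted.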
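To see how odds ratios translate into probabilities, the fitted coefficients can be turned into predicted shortlisting probabilities for each of the four applicant groups. This is an illustrative sketch that uses the rounded coefficients from the printed output and the variable codings listed earlier (BAMEyn: 1=yes, 2=no; Gender: 1=male, 2=female); the object names `b` and `grid` are made up for this example, and the model contains no interaction term.

```r
# Rounded coefficients copied from the printed regression output.
b <- c(intercept = -1.2432, BAMEyn = 1.5157, Gender = -1.1957)

# All four combinations of the two predictors, using the post's codings.
grid <- expand.grid(BAMEyn = c(1, 2), Gender = c(1, 2))

# Linear predictor on the log-odds scale.
grid$log.odds <- b[["intercept"]] +
  b[["BAMEyn"]] * grid$BAMEyn +
  b[["Gender"]] * grid$Gender

# Inverse logit gives the predicted probability of being shortlisted.
grid$prob <- plogis(grid$log.odds)

# Odds implied by each probability.
grid$odds <- grid$prob / (1 - grid$prob)

round(grid, 3)
```

Under these assumptions, the predicted probability for non-BAME men is about 0.64 against about 0.28 for BAME men: a probability ratio of roughly 2.3, even though the corresponding odds ratio is 4.55.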

The results above suggest that there may be some bias in the shortlisting process. It would be important to add further variables to the logistic regression to see whether something about the female and BAME applicants that is NOT linked to any kind of discrimination, such as educational background or work experience, might explain these preferences. Often, an interesting finding will lead to further investigation before the real picture emerges.

The complete data file and source code are available on GitHub.