In the previous post, we learned from a financial institution's data that the type of team (sales or professional service) has an impact on the prevalence of minority groups. The conclusion was that the proportion of staff from different ethnic backgrounds is significantly lower in sales than in professional service. In this post, we want to know whether we can build a model to predict employee diversity across teams.

There are many easily observed indicators of diversity. Primary characteristics include gender, ethnicity, age, and disability; secondary characteristics include social status, nationality, and educational background. Which of these we can model comes down to the data available. Our dataset includes “group size”, “number of team leads”, “number of female team leads”, and “London Or Not” (“Central or Greater London” or “Rest of UK”). We will apply linear regression to all of them, remove the predictors that have the least impact, and then build our predictive model.

**Linear Regression** can help us establish the relationship between a dependent variable and multiple possible predictors, or independent variables. In our example, the dependent variable is BAME (Black, Asian and minority ethnic). The independent variables are “group size”, “number of team leads”, “number of female team leads”, and “London Or Not”.

Linear Regression can also show us which of the predictors has the greatest impact on the dependent variable. It will give us the key information to identify where the problem areas are in the organization.
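As a rough sketch of this workflow in R: the data frame below is synthetic and exists only for illustration, because the real `emp.data` is not shown in this post. The column names `GroupSize` and `NumberTeamLeads`, the 0/1 coding of `LondonorNot`, and every generated value are assumptions. The sketch fits a model with all predictors, then drops one and compares the two fits.

```r
# Synthetic stand-in for the real data; every value here is made up.
set.seed(42)
n <- 200
emp.demo <- data.frame(
  GroupSize             = rpois(n, lambda = 12) + 1,   # hypothetical column name
  NumberTeamLeads       = rpois(n, lambda = 2),        # hypothetical column name
  NumberFeMaleTeamLeads = rpois(n, lambda = 1),
  LondonorNot           = rbinom(n, size = 1, prob = 0.5)  # assumed: 0 = London, 1 = Rest of UK
)
# Simulated response in which only location matters, plus noise.
emp.demo$BAME <- 0.25 - 0.08 * emp.demo$LondonorNot + rnorm(n, sd = 0.1)

# Full model with every available predictor.
full.fit <- lm(BAME ~ GroupSize + NumberTeamLeads + NumberFeMaleTeamLeads + LondonorNot,
               data = emp.demo)
print(summary(full.fit))   # the Pr(>|t|) column flags weak predictors

# Drop one predictor and compare the nested models with an F-test.
reduced.fit <- update(full.fit, . ~ . - GroupSize)
print(anova(reduced.fit, full.fit))
```

A large p-value in the `anova()` comparison suggests the dropped predictor added little, which is one common way to decide what to remove; it is not necessarily the procedure used for the model in this post.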

The final model looks like the following:

    Call:
    lm(formula = BAME ~ NumberFeMaleTeamLeads + Function + LondonorNot + PercentMale, data = emp.data)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.17939 -0.07939 -0.02577  0.06485  0.36470

    Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
    (Intercept)            0.1962486  0.0272103   7.212 1.39e-12 ***
    NumberFeMaleTeamLeads -0.0007551  0.0036094  -0.209    0.834
    Function               0.0375160  0.0095487   3.929 9.35e-05 ***
    LondonorNot           -0.0850269  0.0079674 -10.672  < 2e-16 ***
    PercentMale           -0.0002746  0.0002235  -1.228    0.220
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.1032 on 725 degrees of freedom
      (198 observations deleted due to missingness)
    Multiple R-squared:  0.1737,  Adjusted R-squared:  0.1692
    F-statistic: 38.11 on 4 and 725 DF,  p-value: < 2.2e-16
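The columns of this summary are linked by simple arithmetic, and checking it can make the output easier to read. The sketch below uses only the numbers printed above; the variable names (`est`, `se`, `ci`, and so on) are illustrative. It recomputes the t statistics, adds 95% confidence intervals, and recovers the adjusted R-squared and F statistic from the rounded R-squared.

```r
# Estimates and standard errors copied from the summary above.
est <- c("(Intercept)" = 0.1962486, NumberFeMaleTeamLeads = -0.0007551,
         Function = 0.0375160, LondonorNot = -0.0850269, PercentMale = -0.0002746)
se <- c(0.0272103, 0.0036094, 0.0095487, 0.0079674, 0.0002235)
df.resid <- 725   # residual degrees of freedom
k <- 4            # number of predictors

# Per-coefficient t statistics, two-sided p-values and 95% confidence intervals.
t.value <- est / se
p.value <- 2 * pt(abs(t.value), df.resid, lower.tail = FALSE)
t.crit <- qt(0.975, df.resid)
ci <- cbind(lower = est - t.crit * se, upper = est + t.crit * se)
print(round(cbind(t.value, p.value, ci), 4))

# Overall fit: adjusted R-squared and F statistic from the rounded R-squared.
r2 <- 0.1737
n <- df.resid + k + 1   # 730 observations used in the fit
adj.r2 <- 1 - (1 - r2) * (n - 1) / df.resid
f.stat <- (r2 / k) / ((1 - r2) / df.resid)
p.model <- pf(f.stat, k, df.resid, lower.tail = FALSE)
print(c(adj.r2 = adj.r2, f.stat = f.stat, p.model = p.model))
```

The intervals for NumberFeMaleTeamLeads and PercentMale include zero, while those for Function and LondonorNot do not. Because the R-squared used here is rounded, the recomputed adjusted R-squared and F statistic match the printed values only to about three significant figures. An R-squared of 0.1737 also means the model accounts for roughly 17% of the variance in BAME.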

R reports an overall p-value of < 2.2e-16 for the F-test, far below 0.001. If none of the predictors were related to BAME, a fit this good would be extremely unlikely to arise by chance alone, so the model as a whole is significant. The coefficient table also tells us that Function and LondonorNot have a significant effect on BAME, while NumberFeMaleTeamLeads (p = 0.834) and PercentMale (p = 0.220) do not. The standardized beta values of these variables (listed below) indicate that LondonorNot has a greater impact on BAME than the other variables, because its absolute value is the largest of all.

    NumberFeMaleTeamLeads     Function  LondonorNot  PercentMale
             -0.007130388  0.165380072 -0.360604079 -0.052002819
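A standardized beta rescales a raw coefficient by the ratio of the predictor's standard deviation to the response's standard deviation, which puts predictors measured in different units on a common scale. The sketch below illustrates that relationship on synthetic data with hypothetical columns `x1`, `x2` and `y`. It is one way to compute such values, not necessarily how the numbers above were produced.

```r
# Synthetic data for illustration only; x1, x2 and y are hypothetical.
set.seed(1)
n <- 300
demo <- data.frame(x1 = rnorm(n, mean = 50, sd = 10),
                   x2 = rbinom(n, size = 1, prob = 0.4))
demo$y <- 0.2 + 0.001 * demo$x1 - 0.08 * demo$x2 + rnorm(n, sd = 0.1)

fit <- lm(y ~ x1 + x2, data = demo)
b <- coef(fit)[-1]   # raw slopes, intercept dropped

# Standardized beta = raw slope * sd(predictor) / sd(response).
std.beta <- b * sapply(demo[names(b)], sd) / sd(demo$y)
print(std.beta)

# Cross-check: refitting on z-scored variables gives the same slopes.
fit.z <- lm(y ~ x1 + x2, data = as.data.frame(scale(demo)))
print(all.equal(unname(std.beta), unname(coef(fit.z)[-1])))
```

Comparing absolute values of `std.beta` is what ranks predictors by relative influence, which is how LondonorNot is singled out above.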

Thus, combined with the finding from the last post, we can conclude that the company has a significantly higher proportion of BAME staff in the professional service function than in the sales teams, even after accounting for the fact that diversity levels tend to be much higher in its London locations than in the rest of the UK. This is potentially useful information, given that London tends to draw more people from different ethnic backgrounds than other UK regions, according to a report from Business in the Community.