Predicting Individual Turnover

In this post we will identify the factors that affect individual employee turnover. Because our response variable is turnover status, which is binary in the dataset (1 = leave, 0 = stay), we will use logistic regression to investigate the data.
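As a rough sketch of the workflow, the snippet below fits the same kind of model on simulated data. The glm() call mirrors the one shown in this post; the simulated `emp` data frame, its two predictors and the effect sizes used to generate it are assumptions made purely for illustration and are not the real dataset.

```r
set.seed(1)

# Hypothetical stand-in for the real data: two predictors only
n <- 500
emp <- data.frame(
  Gender          = rbinom(n, 1, 0.5),               # 1 = men, 0 = women
  AppraisalRating = sample(1:5, n, replace = TRUE)
)

# Simulate leaver status with lower odds of leaving for Gender = 1
# and for higher ratings (made-up effect sizes)
log_odds <- -0.5 - 0.6 * emp$Gender - 0.25 * emp$AppraisalRating
emp$LeaverStatus <- rbinom(n, 1, plogis(log_odds))

# Same model specification as in the post: all other columns as predictors
logit.ind.tnvr <- glm(LeaverStatus ~ ., family = binomial(link = "logit"),
                      data = emp)

summary(logit.ind.tnvr)          # coefficients, z values, p-values
exp(coef(logit.ind.tnvr))        # odds ratios
```

With the real data, `emp` would simply be the data frame read from the data file.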

The analysis produces three types of information. The first is the Nagelkerke R-square statistic. It is a pseudo R-square: a rough indicator of how much of the variation in the dependent variable is accounted for by the predictor variables, analogous to (but not the same as) R-square in linear regression. Here it says that the six predictor variables account for about 5.6% of the variation in turnover across the organization. While that seems a small percentage, we need to keep in mind how many possible causes there are for people leaving an organization.

> PseudoR2(logit.ind.tnvr)
 McFadden      Adj.McFadden  Cox.Snell     Nagelkerke    McKelvey.Zavoina 
 3.971238e-02  1.419408e-02  2.973055e-02  5.584941e-02  7.417359e-02 
 Effron        Count         Adj.Count     AIC           Corrected.AIC 
 3.120676e-02  NA            NA            1.234203e+03  1.234497e+03
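To see where these figures come from, three of them can be reproduced by hand from the null and residual deviance that summary() reports for this model. This is a minimal sketch using the standard McFadden, Cox and Snell, and Nagelkerke formulas; it assumes a binary 0/1 response, in which case the deviance equals -2 times the log-likelihood. Because the deviances are rounded in the printed output, the results only agree with the figures above to about four decimal places.

```r
# Deviances and degrees of freedom as reported by summary() in this post
null_dev  <- 1254.0      # null deviance
resid_dev <- 1204.2      # residual deviance
n         <- 1649 + 1    # null degrees of freedom + 1 = observations used

# For a binary 0/1 response, deviance = -2 * log-likelihood
mcfadden   <- 1 - resid_dev / null_dev
cox_snell  <- 1 - exp(-(null_dev - resid_dev) / n)
nagelkerke <- cox_snell / (1 - exp(-null_dev / n))   # Cox-Snell rescaled to max 1

c(McFadden = mcfadden, Cox.Snell = cox_snell, Nagelkerke = nagelkerke)
```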

The second set of figures is the coefficients of the predictor variables. We can identify the significant variables from their p-values. In our case, “gender” and “appraisal rating” are the significant factors because their p-values are less than 0.05.

glm(formula = LeaverStatus ~ ., family = binomial(link = "logit"), 
 data = emp)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max 
-1.1041  -0.5538  -0.4593  -0.3760   2.3990

                  Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)       0.168371    0.656593    0.256  0.797618
BossGender       -0.183855    0.163275   -1.126  0.260147
Gender           -0.608303    0.161704   -3.762  0.000169 ***
Age              -0.001347    0.009392   -0.143  0.885962
LengthOfService  -0.017194    0.010900   -1.577  0.114683
AppraisalRating  -0.248002    0.097284   -2.549  0.010795 *
CountryBelgium   -0.249247    0.603715   -0.413  0.679712
CountrySweden     0.001673    0.593760    0.003  0.997752
CountryItaly     -0.058845    0.516308   -0.114  0.909259
CountryFrance    -0.455198    0.624276   -0.729  0.465903
CountryPoland    -0.734433    0.648954   -1.132  0.257753
CountryMexico    -0.415403    0.596774   -0.696  0.486379
CountrySpain     -0.571556    0.535051   -1.068  0.285418
CountryUK        -0.505547    0.470520   -1.074  0.282624
CountryUS        -0.698382    0.433822   -1.610  0.107433
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1254.0 on 1649 degrees of freedom
Residual deviance: 1204.2 on 1635 degrees of freedom
 (3 observations deleted due to missingness)
AIC: 1234.2

Number of Fisher Scoring iterations: 5
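The z values and p-values in the coefficient table follow directly from the estimates and their standard errors. The sketch below recomputes them for the two significant predictors, using the numbers copied from the table above; it assumes the usual Wald test that summary() reports for a binomial glm.

```r
# Estimates and standard errors copied from the coefficient table in this post
est <- c(Gender = -0.608303, AppraisalRating = -0.248002)
se  <- c(Gender =  0.161704, AppraisalRating =  0.097284)

z <- est / se              # Wald z statistic
p <- 2 * pnorm(-abs(z))    # two-sided p-value from the standard normal

round(z, 3)
signif(p, 3)
p < 0.05                   # which predictors pass the 0.05 threshold
```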

The third set of figures is the odds ratios (the output of the exp() call below). The odds of an event are the probability of it occurring divided by the probability of it not occurring. An odds ratio compares two such odds: in the turnover example, it is the factor by which the odds of leaver status = 1 are multiplied when a predictor variable increases by one unit, with the other predictors held constant.

For example, the odds ratio for “Gender” is 0.544. It means that the odds of leaving (probability of leaving / probability of staying) for gender = 1 (men) are 0.544 times the odds for women, other predictors being equal. Because the odds ratio is less than 1, men have lower odds of leaving than women; put the other way, women are more likely to leave. We could present this in a number of ways:

  • The odds of men leaving the organization are 0.544 times those of women.
  • The odds of men staying with the organization are 1.838 (1/0.544) times those of women.
  • The odds of women leaving the organization are 1.838 times those of men.

> exp(logit.ind.tnvr$coefficients)
 (Intercept)         BossGender    Gender        Age            LengthOfService 
 0.9223082           0.8320562     0.5442737     0.9986539      0.9829526 
 AppraisalRating     CountrySweden CountryItaly  CountryFrance  CountryPoland 
 0.7803582           1.2852077     1.2097357     0.8138732      0.6155830 
 CountryMexico       CountrySpain  CountryUK     CountryUS      CountryAustralia 
 0.8469146           0.7244742     0.7739099     0.6381798      1.2830591
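An odds ratio is a statement about odds, not about probabilities, and the two only coincide approximately when the event is rare. The sketch below turns the Gender odds ratio into probabilities. The 20% baseline leaving rate for women is a made-up number chosen for illustration, not a figure from the data.

```r
or_gender <- exp(-0.608303)   # odds ratio for Gender (men vs. women), from the post

# Hypothetical baseline, NOT from the data: assume 20% of women leave
p_women    <- 0.20
odds_women <- p_women / (1 - p_women)

# Apply the odds ratio, then convert the odds back to a probability
odds_men <- odds_women * or_gender
p_men    <- odds_men / (1 + odds_men)

c(odds_ratio = or_gender,
  p_women    = p_women,
  p_men      = p_men,
  risk_ratio = p_women / p_men)
```

Under this assumed baseline, the ratio of the two probabilities comes out smaller than 1/0.544, which is why the comparison is best phrased in terms of odds.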

The other significant predictor is “appraisal rating”, whose odds ratio is 0.78. The odds of leaving (probability of leaving / probability of staying) are multiplied by 0.78 for each one-point increase in appraisal rating. In other words, employees are more likely to stay if they receive higher ratings. We can rephrase it in the following ways:

  • Individuals who got a “3” performance rating have 1.282 (1/0.78) times the odds of staying of those who got a “2” performance rating.
  • Individuals who got a “4” performance rating have 28.2% higher odds of staying the following year than those who got a “3” performance rating.
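The arithmetic behind these statements, and its extension to a change of more than one rating point, can be sketched as follows. The coefficient is copied from the table in this post; the two-point comparison is an added illustration. The reciprocal differs slightly from 1.282 because the post rounds the odds ratio to 0.78 before inverting it.

```r
beta_rating <- -0.248002                # AppraisalRating coefficient from the post

or_leave_1pt <- exp(beta_rating)        # odds of leaving, per +1 rating point
or_stay_1pt  <- 1 / or_leave_1pt        # odds of staying, per +1 rating point
or_leave_2pt <- exp(2 * beta_rating)    # e.g. rating 2 -> 4: odds ratios multiply

round(c(leave_1pt = or_leave_1pt,
        stay_1pt  = or_stay_1pt,
        leave_2pt = or_leave_2pt), 3)
```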

The conclusion we can draw is that country differences do not come out as significant in accounting for turnover. However, the odds of leaving are almost twice as high (about 1.8 times) for women as for men. Also, a higher appraisal rating increases the chances of an employee staying. Thus, women who received a lower rating have a higher risk of leaving the company than other employees.

The complete data file and source code are available on GitHub.
