Machine Learning for Credit Scoring: Improving Logistic Regression with Non-Linear Decision-Tree Effects

Wednesday | 2018-02-08
Room B103 – 12:00

Sullivan HUE – Elena-Ivona DUMITRESCU – Christophe HURLIN – Sessi TOKPAVI

Decision trees and related ensemble methods such as random forests are state-of-the-art tools in the field of machine learning for predictive regression and classification. However, they lack interpretability and can be less relevant in credit scoring applications, where decision-makers and regulators need a transparent linear score function, which usually corresponds to the link function in logistic regression. In this paper, we propose to improve the logistic regression framework by using information from decision trees. Formally, rules extracted from various short-depth decision trees built with different sets of predictive variables (singletons and couples) are used as predictors in a penalized (i.e., regularized) logistic regression. By modeling such univariate and bivariate threshold effects, we achieve a significant improvement in model performance. Applications to simulated and real credit scoring data sets show that the new method outperforms traditional logistic regression. Moreover, it compares competitively with random forests while providing an interpretable scoring function.
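The procedure summarized above can be illustrated with a minimal, dependency-free Python sketch. It assumes binary default labels (1 = default) and numeric predictors. The simplified rule extraction (one Gini stump per variable; for each pair of variables, a root split followed by one split on the other variable in each child), the L1 penalty fitted by proximal gradient descent, and all names and parameter values (`extract_rules`, `fit_l1_logistic`, `lam`, `lr`) are illustrative assumptions, not the authors' implementation. The resulting score is linear in binary rule indicators, b + Σ_r w_r · 1{rule r holds}, so each nonzero coefficient reads directly as a threshold effect.

```python
import math
import random
from itertools import combinations

# Illustrative sketch only (hypothetical names, simplified rule extraction).
# A rule is a tuple of conditions (j, t, le): "x[j] <= t" if le else "x[j] > t".


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, y, j):
    """Best Gini threshold t for 'x[j] <= t'; returns (t, impurity) or (None, inf)."""
    values = sorted({r[j] for r in rows})
    best = (None, float("inf"))
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2.0  # candidate thresholds: midpoints between observed values
        left = [yi for r, yi in zip(rows, y) if r[j] <= t]
        right = [yi for r, yi in zip(rows, y) if r[j] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best


def holds(rule, row):
    """1.0 if the row satisfies every condition of the rule, else 0.0."""
    return 1.0 if all((row[j] <= t) == le for j, t, le in rule) else 0.0


def extract_rules(rows, y):
    """Depth-1 rules for each variable (singletons), depth-2 rules for each pair (couples)."""
    p = len(rows[0])
    stumps = [best_split(rows, y, j) for j in range(p)]
    # Singletons: keep one leaf per stump; its complement is collinear with the intercept.
    rules = [((j, t, True),) for j, (t, _) in enumerate(stumps) if t is not None]
    for j, k in combinations(range(p), 2):
        # Couples: root split on the better variable, then split each child on the other one.
        root, other = (j, k) if stumps[j][1] <= stumps[k][1] else (k, j)
        t_root = stumps[root][0]
        if t_root is None:
            continue
        for le in (True, False):
            child = [(r, yi) for r, yi in zip(rows, y) if (r[root] <= t_root) == le]
            t_other, _ = best_split([r for r, _ in child], [yi for _, yi in child], other)
            if t_other is not None:
                rules.append(((root, t_root, le), (other, t_other, True)))
    return rules


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, a):
    """Proximal operator of a*|v|: shrinks v toward zero by a (L1 penalty)."""
    return math.copysign(max(abs(v) - a, 0.0), v)


def fit_l1_logistic(X, y, lam=0.005, lr=0.2, n_iter=500):
    """L1-penalized logistic regression by proximal gradient descent (ISTA); intercept unpenalized."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return b, w


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [[rng.random() for _ in range(3)] for _ in range(200)]
    # Simulated defaults with one univariate and one bivariate threshold effect.
    y = [int(rng.random() < sigmoid(-1.5 + 2.5 * (r[0] <= 0.4) + 2.0 * (r[1] > 0.5 and r[2] > 0.5)))
         for r in rows]
    rules = extract_rules(rows, y)
    X = [[holds(rule, r) for rule in rules] for r in rows]
    b, w = fit_l1_logistic(X, y)
    for rule, wj in zip(rules, w):
        if wj != 0.0:  # rules kept by the L1 penalty form the interpretable score
            print(f"{wj:+.2f}", [(j, round(t, 2), "<=" if le else ">") for j, t, le in rule])
```

On real data, one would more likely build the short-depth trees with a library (for example scikit-learn's `DecisionTreeClassifier` with a small `max_depth`) and fit the penalized regression with a library logistic regression supporting an L1 penalty, choosing the penalty strength by cross-validation; the pure-Python version only keeps the mechanics visible.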