A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios
ArXiv ID: 2410.02846
Authors: Unknown
Abstract
We introduce a novel machine learning model for credit risk by combining tree-boosting with a latent spatio-temporal Gaussian process model accounting for frailty correlation. This allows for modeling non-linearities and interactions among predictor variables in a flexible data-driven manner and for accounting for spatio-temporal variation that is not explained by observable predictor variables. We also show how estimation and prediction can be done in a computationally efficient manner. In an application to a large U.S. mortgage credit risk data set, we find that both predictive default probabilities for individual loans and predictive loan portfolio loss distributions obtained with our novel approach are more accurate compared to conventional independent linear hazard models and also linear spatio-temporal models. Using interpretability tools for machine learning models, we find that the likely reasons for this outperformance are strong interaction and non-linear effects in the predictor variables and the presence of spatio-temporal frailty effects.
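To make the structure described in the abstract concrete, here is a minimal simulation sketch, not the authors' implementation. It assumes a logistic link, a separable exponential space-time covariance, unit exposures with full loss on default, and a hand-written non-linear function `F` standing in for the boosted-tree ensemble. All names, sizes, and parameter values are hypothetical. The frailty is drawn from the Gaussian process prior rather than from a fitted posterior. The sketch shows how a shared latent frailty b(s, t), added to F(x) inside the link, correlates defaults and widens the portfolio loss distribution relative to an independent model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): 5 regions, 8 periods, 200 loans per region.
n_regions, n_periods, loans_per_region = 5, 8, 200
coords = rng.uniform(0.0, 1.0, size=(n_regions, 2))  # region centroids
times = np.arange(n_periods, dtype=float)


def exp_cov(dist, variance, length_scale):
    """Exponential covariance function evaluated on a distance matrix."""
    return variance * np.exp(-dist / length_scale)


# Separable space-time covariance (an assumption made for simplicity).
d_space = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
d_time = np.abs(times[:, None] - times[None, :])
K = np.kron(exp_cov(d_space, 1.0, 0.5), exp_cov(d_time, 1.0, 3.0))


def sample_frailty(size):
    """Draw latent frailty fields b(s, t) from the GP prior.

    Returns an array of shape (size, n_regions, n_periods).
    """
    chol = np.linalg.cholesky(K + 1e-8 * np.eye(K.shape[0]))
    z = rng.standard_normal((size, K.shape[0]))
    return (z @ chol.T).reshape(size, n_regions, n_periods)


def F(x):
    """Stand-in for the tree ensemble: non-linear in LTV, with an LTV x DTI interaction."""
    ltv, dti = x[..., 0], x[..., 1]
    return -3.0 + 4.0 * np.maximum(ltv - 0.8, 0.0) + 3.0 * ltv * dti


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


# Hypothetical loan-level predictors: loan-to-value and debt-to-income ratios.
x = np.stack(
    [
        rng.uniform(0.5, 1.0, (n_regions, loans_per_region)),
        rng.uniform(0.1, 0.5, (n_regions, loans_per_region)),
    ],
    axis=-1,
)
f = F(x)  # shape (n_regions, loans_per_region)

# Simulate the portfolio in the last period under both models.
n_sims = 2000
b_last = sample_frailty(n_sims)[:, :, -1]                # (n_sims, n_regions)
p_frailty = sigmoid(f[None, :, :] + b_last[:, :, None])  # frailty shared within a region
p_indep = np.broadcast_to(sigmoid(f), p_frailty.shape)   # no frailty term

# Portfolio loss = number of defaults (unit exposure, 100% loss given default).
loss_frailty = (rng.random(p_frailty.shape) < p_frailty).sum(axis=(1, 2))
loss_indep = (rng.random(p_indep.shape) < p_indep).sum(axis=(1, 2))

print("99% loss quantile, with frailty:", np.quantile(loss_frailty, 0.99))
print("99% loss quantile, independent :", np.quantile(loss_indep, 0.99))
```

In this toy setting the frailty-driven loss distribution has a noticeably heavier upper tail, which is the kind of portfolio-level effect an independent hazard model cannot reproduce. In the paper's setting, F and the covariance parameters would be estimated from data rather than fixed by hand as they are here.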
Keywords: Credit Risk, Tree-Boosting, Gaussian Process, Mortgage Default Prediction, Spatio-temporal Modelling, Credit
Complexity vs Empirical Score
- Math Complexity: 9.0/10
- Empirical Rigor: 8.5/10
- Quadrant: Holy Grail
- Why: The paper integrates advanced spatio-temporal Gaussian processes with tree-boosting, involving significant latent variable modeling and non-linear function estimation, indicating high mathematical complexity. It also demonstrates empirical rigor through application to a large U.S. mortgage dataset with comparative backtesting against multiple models, including portfolio-level metrics.
flowchart TD
A["Research Goal<br>Mortgage Credit Risk Modeling"] --> B{"Methodology"}
B --> C["Data Inputs<br>Large U.S. Mortgage Dataset"]
C --> D["Novel ML Model<br>Tree-Boosting + Latent Spatio-Temporal GP"]
D --> E["Computational Process<br>Efficient Estimation & Prediction"]
E --> F["Key Findings"]
F --> G["Predictive Accuracy<br>Superior vs. Linear Models"]
F --> H["Interpretable Drivers<br>Non-linearities & Spatio-Temporal Frailty"]