
Time Series Feature Redundancy Paradox: An Empirical Study Based on Mortgage Default Prediction

ArXiv ID: 2501.00034 · Authors: Unknown

Abstract: With the widespread application of machine learning in financial risk management, conventional wisdom suggests that longer training periods and more feature variables contribute to improved model performance. This paper, focusing on mortgage default prediction, empirically discovers a phenomenon that contradicts traditional knowledge: in time series prediction, increased training data timespan and additional non-critical features actually lead to significant deterioration in prediction effectiveness. Using Fannie Mae’s mortgage data, the study compares predictive performance across different time window lengths (2012-2022) and feature combinations, revealing that shorter time windows (such as single-year periods) paired with carefully selected key features yield superior prediction results. The experimental results indicate that extended time spans may introduce noise from historical data and outdated market patterns, while excessive non-critical features interfere with the model’s learning of core default factors. This research not only challenges the traditional “more is better” approach in data modeling but also provides new insights and practical guidance for feature selection and time window optimization in financial risk prediction. ...
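The window-length and feature-subset comparison described in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the function names, the record layout, and the field names (`year`, `default`, `credit_score`, `dti`) are hypothetical and chosen only for illustration. It enumerates training windows of increasing length that end the year before a test year, then filters loan records to one window and one feature subset.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: one dict per loan, with a year, features, and a label.
Record = Dict[str, float]


def training_windows(first_year: int, test_year: int) -> List[Tuple[int, int]]:
    """List (start, end) training windows ending the year before test_year, shortest first."""
    end = test_year - 1
    if end < first_year:
        raise ValueError("test_year must be later than first_year")
    # Walk the start year backwards so each window is one year longer than the last.
    return [(start, end) for start in range(end, first_year - 1, -1)]


def select_training_rows(
    rows: Sequence[Record],
    window: Tuple[int, int],
    features: Sequence[str],
    year_key: str = "year",      # assumed field name
    label_key: str = "default",  # assumed field name
) -> List[Record]:
    """Keep rows whose year falls inside the window, restricted to the chosen features plus the label."""
    start, end = window
    return [
        {name: row[name] for name in (*features, label_key)}
        for row in rows
        if start <= row[year_key] <= end
    ]


if __name__ == "__main__":
    # Tiny made-up sample, only to show the call pattern.
    loans = [
        {"year": 2015, "credit_score": 700, "dti": 30, "default": 0},
        {"year": 2021, "credit_score": 640, "dti": 45, "default": 1},
    ]
    for window in training_windows(2012, 2022):
        subset = select_training_rows(loans, window, ["credit_score"])
        print(window, len(subset))
```

Each `(window, features)` pair would then be used to fit and score a model on the held-out test year, so that results can be compared across window lengths and feature sets.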

December 23, 2024 · 2 min · Research Team

A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios

ArXiv ID: 2410.02846 · Authors: Unknown

Abstract: We introduce a novel machine learning model for credit risk by combining tree-boosting with a latent spatio-temporal Gaussian process model accounting for frailty correlation. This allows for modeling non-linearities and interactions among predictor variables in a flexible data-driven manner and for accounting for spatio-temporal variation that is not explained by observable predictor variables. We also show how estimation and prediction can be done in a computationally efficient manner. In an application to a large U.S. mortgage credit risk data set, we find that both predictive default probabilities for individual loans and predictive loan portfolio loss distributions obtained with our novel approach are more accurate compared to conventional independent linear hazard models and also linear spatio-temporal models. Using interpretability tools for machine learning models, we find that the likely reasons for this outperformance are strong interaction and non-linear effects in the predictor variables and the presence of spatio-temporal frailty effects. ...
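One way to read the model structure described in the abstract is sketched below. The notation, the link function $g$, and the kernel $k$ are assumptions made for illustration and are not taken from the paper.

```latex
g(p_i) = F(x_i) + b(s_i, t_i),
\qquad
b \sim \mathcal{GP}\bigl(0,\; k\bigl((s,t),(s',t')\bigr)\bigr)
```

Here $p_i$ is the default probability of loan $i$, $F$ is a tree-boosted function of the observed predictors $x_i$, and $b$ is a latent Gaussian process over location $s_i$ and time $t_i$ that plays the role of the frailty term. Under this reading, the linear spatio-temporal baseline mentioned in the abstract would correspond to a linear $F$, and the independent linear hazard baseline to a linear $F$ with $b \equiv 0$.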

October 3, 2024 · 2 min · Research Team