Towards a data-driven debt collection strategy based on an advanced machine learning framework

ArXiv ID: 2311.06292

Authors: Unknown

Abstract

The European debt purchase market, measured by the total book value of purchased debt, approached 25bn euros in 2020 and was growing at double-digit rates. This illustrates how large the debt collection and debt purchase industry has become and how much it matters to the financial sector. To ensure an adequate return during the debt collection process, however, a good estimate of the propensity to pay and/or the expected cashflow is crucial. These estimates can be used, for instance, to design different strategies during amicable collection that maximize quality standards and revenues, and to prioritize the cases in which a legal process is necessary because debtors cannot be reached for an amicable negotiation. This work offers a solution for producing these estimates. Specifically, it presents a new machine learning modelling pipeline and shows that it outperforms the strategies currently employed in the sector. The solution comprises a pre-processing pipeline and a model selector based on the best model calibration. Performance is validated on real historical data from the debt industry.

Keywords: Debt collection, Propensity to pay, Machine learning pipeline, Credit risk, Cash flow prediction

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Street Traders
  • Why: The paper introduces advanced calibration metrics (ECCE-MAD, ECCE-R), but the math is mostly applied from prior work rather than derived in depth. Empirical rigor is strong, with validation on real historical data and a detailed ML pipeline, though no explicit code or backtest metrics are provided.
  flowchart TD
    A["Research Goal:<br>Improve Debt Collection ROI<br>via ML"] --> B["Data Source:<br>Real Historical Data<br>Debt Industry"]
    B --> C["Methodology:<br>ML Pipeline with<br>Advanced Pre-processing"]
    C --> D{"Model Selection<br>Optimized for Calibration"}
    D --> E["Output:<br>Predicted Propensity to Pay & Cashflow"]
    E --> F["Outcome:<br>Outperforms Current<br>Industry Strategies"]
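
The calibration-based model selection named in the flowchart can be illustrated with a minimal sketch. It assumes that ECCE-MAD and ECCE-R are the maximum absolute deviation and the range of the cumulative differences between observed outcomes and predicted probabilities, with cases ordered by predicted probability, as these metrics are commonly defined in the calibration literature. The paper's exact definitions and selection rule may differ. The function names `ecce_metrics` and `select_by_calibration`, the toy data, and the use of ECCE-MAD as the selection criterion are all hypothetical.

```python
from itertools import accumulate


def ecce_metrics(outcomes, scores):
    """Return (ECCE-MAD, ECCE-R) for binary outcomes and predicted probabilities.

    Assumed definitions: both metrics are computed from the cumulative
    differences between outcomes and scores, ordered by score.
    """
    if len(outcomes) != len(scores) or not outcomes:
        raise ValueError("outcomes and scores must be non-empty and equally long")
    n = len(scores)
    # Order cases by predicted probability (ties keep their input order).
    ordered = sorted(zip(scores, outcomes), key=lambda pair: pair[0])
    # Cumulative differences between observed outcomes and predictions,
    # starting from 0 so the range is measured from the origin.
    cumulative = [0.0] + [c / n for c in accumulate(y - p for p, y in ordered)]
    mad = max(abs(c) for c in cumulative)
    value_range = max(cumulative) - min(cumulative)
    return mad, value_range


def select_by_calibration(candidate_scores, outcomes):
    """Pick the candidate whose validation scores have the lowest ECCE-MAD."""
    metrics = {
        name: ecce_metrics(outcomes, scores)
        for name, scores in candidate_scores.items()
    }
    best = min(metrics, key=lambda name: metrics[name][0])
    return best, metrics


# Toy validation set (illustrative only): 1 = debtor paid, 0 = did not pay.
paid = [0, 0, 1, 0, 1, 1]
candidate_scores = {
    "model_a": [0.1, 0.2, 0.6, 0.4, 0.7, 0.9],  # roughly tracks the outcomes
    "model_b": [0.6, 0.7, 0.9, 0.8, 0.9, 0.9],  # systematically too optimistic
}
best_name, metrics = select_by_calibration(candidate_scores, paid)
print(best_name, metrics)
```

In practice the candidate scores would come from each fitted model's predictions on a held-out validation set. Keeping tied scores in input order is a simplification made for this sketch.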