Cash Flow Underwriting with Bank Transaction Data: Advancing MSME Financial Inclusion in Malaysia

ArXiv ID: 2510.16066

Authors: Chun Chet Ng, Wei Zeng Low, Jia Yu Lim, Yin Yin Boon

Abstract

Although Micro, Small, and Medium Enterprises (MSMEs) account for 96.1% of all businesses in Malaysia, access to financing remains one of their most persistent challenges. Newly established businesses are often excluded from formal credit markets because traditional underwriting approaches rely heavily on credit bureau data. This study investigates the potential of bank statement data as an alternative data source for credit assessment to promote financial inclusion in emerging markets. First, we propose a cash flow-based underwriting pipeline that uses bank statement data for end-to-end data extraction and machine learning credit scoring. Second, we introduce a novel dataset of 611 loan applicants from a Malaysian lending institution. Third, we develop and evaluate credit scoring models based on application information and bank transaction-derived features. Empirical results show that adding bank transaction-derived features boosts the performance of all models on our dataset, which can improve credit scoring for new-to-lending MSMEs. Finally, we will release the anonymised bank transaction dataset to facilitate further research on MSME financial inclusion within Malaysia’s emerging economy.

Keywords: Credit scoring, Bank statement data, Cash flow-based underwriting, Machine learning, Financial inclusion, Credit/Consumer Lending

Complexity vs Empirical Score

  • Math Complexity: 3.5/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Street Traders
  • Why: The paper employs standard machine learning models (e.g., Logistic Regression, Random Forest) and an agentic workflow without novel mathematical derivations or heavy quantitative modeling. However, it demonstrates strong empirical rigor by introducing a novel real-world dataset of 611 loan applicants, detailing an end-to-end implementation pipeline with specific modules, and providing evaluation results.

Research Flow

  • Research Goal: Assess bank transaction data for MSME credit scoring in Malaysia
  • Data Input: Novel dataset of 611 MSME loan applicants with bank statements and application info
  • Feature Engineering: Extract cash flow metrics and traditional application features
  • Computational Process: Develop and evaluate ML credit scoring models
  • Key Findings: Transaction data significantly improves model performance
  • Outcome: Enables credit scoring for new-to-lending MSMEs and release of the anonymised dataset
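
The feature engineering step above turns raw bank transactions into cash flow metrics. The sketch below is one minimal way this could look, using only the Python standard library. The `Transaction` schema, the `cash_flow_features` function, and the specific metrics (average monthly inflow and outflow, net cash flow volatility, months with negative net flow, minimum balance) are illustrative assumptions; the summary does not list the features the authors actually use.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean, pstdev


@dataclass(frozen=True)
class Transaction:
    """One bank statement line (hypothetical schema, not taken from the paper)."""
    posted: date
    amount: float   # positive = credit (inflow), negative = debit (outflow)
    balance: float  # running balance after the transaction


def cash_flow_features(transactions):
    """Aggregate statement lines into illustrative monthly cash flow features."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for t in transactions:
        month = (t.posted.year, t.posted.month)
        if t.amount >= 0:
            inflow[month] += t.amount
        else:
            outflow[month] += -t.amount

    months = sorted(set(inflow) | set(outflow))
    if not months:
        raise ValueError("no transactions supplied")

    # Net cash flow per observed month (inflow minus outflow).
    net = [inflow[m] - outflow[m] for m in months]
    avg_inflow = mean(inflow[m] for m in months)
    return {
        "months_observed": len(months),
        "avg_monthly_inflow": avg_inflow,
        "avg_monthly_outflow": mean(outflow[m] for m in months),
        "avg_monthly_net": mean(net),
        # Population std. dev. of net flow, scaled by average inflow.
        "net_volatility": pstdev(net) / avg_inflow if avg_inflow else 0.0,
        "negative_net_months": sum(1 for n in net if n < 0),
        "min_balance": min(t.balance for t in transactions),
    }


# Made-up sample statement, for illustration only.
sample = [
    Transaction(date(2024, 1, 5), 5000.0, 6000.0),
    Transaction(date(2024, 1, 20), -3000.0, 3000.0),
    Transaction(date(2024, 2, 3), 4000.0, 7000.0),
    Transaction(date(2024, 2, 25), -4500.0, 2500.0),
]
features = cash_flow_features(sample)
```

A feature dictionary like `features` could then be joined with application information and passed to a standard classifier such as Logistic Regression or Random Forest, the model families named in the summary.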