Quantitative Financial Modeling for Sri Lankan Markets: Approach Combining NLP, Clustering and Time-Series Forecasting

ArXiv ID: 2512.20216

Authors: Linuk Perera

Abstract

This research introduces a novel quantitative methodology tailored for quantitative finance applications, enabling banks, stockbrokers, and investors to predict economic regimes and market signals in emerging markets, specifically Sri Lankan stock indices (S&P SL20 and ASPI), by integrating Environmental, Social, and Governance (ESG) sentiment analysis with macroeconomic indicators and advanced time-series forecasting. Designed to leverage quantitative techniques for enhanced risk assessment, portfolio optimization, and trading strategies in volatile environments, the architecture employs FinBERT, a transformer-based NLP model, to extract sentiment from ESG texts, followed by unsupervised clustering (UMAP/HDBSCAN) to identify 5 latent ESG regimes, validated via PCA. These regimes are mapped to economic conditions using a dense neural network and gradient boosting classifier, achieving 84.04% training and 82.0% validation accuracy. Concurrently, time-series models (SRNN, MLP, LSTM, GRU) forecast daily closing prices, with GRU attaining an R-squared of 0.801 and LSTM delivering 52.78% directional accuracy on intraday data. A strong correlation between S&P SL20 and S&P 500, observed through moving average and volatility trend plots, further bolsters forecasting precision. A rule-based fusion logic merges ESG and time-series outputs for final market signals. By addressing literature gaps that overlook emerging markets and holistic integration, this quant-driven framework combines global correlations and local sentiment analysis to offer scalable, accurate tools for quantitative finance professionals navigating complex markets like Sri Lanka.
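The abstract reports two forecast metrics, R-squared and directional accuracy, without stating how they are computed. The sketch below uses common textbook definitions and may differ from the paper's exact procedure. The function names and the choice to skip flat days are assumptions made for illustration.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual series is constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


def directional_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of steps where the forecast move has the same sign as the realised move.

    Both moves are measured from the previous actual close.
    Steps with no realised move are skipped (an assumption, not the paper's rule).
    """
    if len(actual) != len(predicted):
        raise ValueError("series must have equal length")
    hits = 0
    total = 0
    for t in range(1, len(actual)):
        realised = actual[t] - actual[t - 1]
        forecast = predicted[t] - actual[t - 1]
        if realised == 0:
            continue
        total += 1
        if realised * forecast > 0:
            hits += 1
    if total == 0:
        raise ValueError("no non-flat steps to evaluate")
    return hits / total


# Toy closing prices, invented purely for illustration.
closes = [100.0, 102.0, 101.0, 105.0]
forecasts = [100.0, 101.0, 103.0, 104.0]
print(r_squared(closes, forecasts))
print(directional_accuracy(closes, forecasts))
```

Under these definitions, a directional accuracy near 50% (such as the reported 52.78%) is only slightly better than guessing the sign, while R-squared measures how well forecast levels track actual levels, so the two metrics can diverge on the same model.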

Keywords: ESG sentiment analysis, time-series forecasting, emerging markets, quantitative finance, portfolio optimization

Complexity vs Empirical Score

  • Math Complexity: 7.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced mathematics including transformer architectures (FinBERT), manifold learning (UMAP), density-based clustering (HDBSCAN), and validation via PCA, with several mathematical formulations provided, warranting a high complexity score. It presents strong empirical evidence with specific performance metrics (82% validation accuracy, R²=0.801), uses real-world datasets (Bloomberg, CBSL, Yahoo Finance), and includes a pipeline with data preprocessing, model training, and validation, indicating substantial implementation and backtesting readiness.

  flowchart TD
    A["Research Goal: Develop Quantitative Model<br>for Sri Lankan Financial Markets"] --> B["Data Collection & Input<br>ESG Texts, S&P SL20/ASPI, S&P 500, Macro Indicators"]
    B --> C1["NLP & Clustering<br>FinBERT + UMAP/HDBSCAN<br>Identifies 5 ESG Regimes"]
    B --> C2["Time-Series Forecasting<br>LSTM/GRU/MLP/SRNN<br>Predicts Prices & Direction"]
    C1 --> D["Data Fusion<br>Rule-based Logic combining<br>ESG Regimes & Forecast Signals"]
    C2 --> D
    D --> E["Final Market Signals<br>For Risk, Portfolio, & Trading"]
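
The final fusion step in the flowchart can be illustrated with a minimal sketch. The summary does not give the paper's actual rules, so everything here is hypothetical: the regime labels, the `fuse_signals` function, the 0.5% return threshold, and the BUY/SELL/HOLD outputs are assumptions chosen only to show how a rule-based combination of an ESG regime and a price forecast might look.

```python
# Hypothetical regime labels; the paper identifies 5 ESG regimes but this
# summary does not name them.
RISK_OFF_REGIMES = {"stress"}
RISK_ON_REGIMES = {"expansion"}


def fuse_signals(
    regime: str,
    last_close: float,
    forecast_close: float,
    threshold: float = 0.005,  # assumed minimum expected return to act on
) -> str:
    """Combine an ESG regime label with a price forecast into one signal."""
    if last_close <= 0:
        raise ValueError("last_close must be positive")
    expected_return = forecast_close / last_close - 1.0

    # Act on an upward forecast unless the regime flags stress.
    if expected_return > threshold and regime not in RISK_OFF_REGIMES:
        return "BUY"
    # Act on a downward forecast unless the regime flags expansion.
    if expected_return < -threshold and regime not in RISK_ON_REGIMES:
        return "SELL"
    # Conflicting or weak evidence: stay put.
    return "HOLD"


print(fuse_signals("expansion", 100.0, 101.0))  # forecast up, regime agrees
print(fuse_signals("stress", 100.0, 101.0))     # forecast up, regime disagrees
```

The design choice in this sketch is that the regime acts as a veto: a forecast only produces a trade signal when the ESG regime does not contradict it, and disagreement falls back to HOLD.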