Statistical Arbitrage in Polish Equities Market Using Deep Learning Techniques

ArXiv ID: 2512.02037

Authors: Marek Adamczyk, Michał Dąbrowski

Abstract

We study a systematic approach to a popular Statistical Arbitrage technique: Pairs Trading. Instead of relying on two highly correlated assets, we replace the second asset with a replication of the first using risk factor representations. These factors are obtained through Principal Components Analysis (PCA), exchange-traded funds (ETFs), and, as our main contribution, Long Short-Term Memory networks (LSTMs). Residuals between the main asset and its replication are examined for mean reversion properties, and trading signals are generated for sufficiently fast mean-reverting portfolios. Beyond introducing a deep-learning-based replication method, we adapt the framework of Avellaneda and Lee (2008) to the Polish market. Accordingly, components of WIG20, mWIG40, and selected sector indices replace the original S&P 500 universe, and market parameters such as the risk-free rate and transaction costs are updated to reflect local conditions. We outline the full strategy pipeline: risk factor construction, residual modeling via the Ornstein-Uhlenbeck process, and signal generation. Each replication technique is described together with its practical implementation. Strategy performance is evaluated over two periods: 2017-2019 and the recession year 2020. All methods yield profits in 2017-2019, with PCA achieving roughly 20 percent cumulative return and an annualized Sharpe ratio of up to 2.63. Despite multiple adaptations, our conclusions remain consistent with those of the original paper. During the COVID-19 recession, only the ETF-based approach remains profitable (about 5 percent annual return), while PCA and LSTM methods underperform. LSTM results, although negative, are promising and indicate potential for future optimization.
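
The pipeline named in the abstract (risk factor construction, residual modeling via the Ornstein-Uhlenbeck process, signal generation) can be illustrated with a minimal Python sketch of the PCA variant. This is not the authors' implementation. The function names are hypothetical, the 252-day year is an assumption, and the default thresholds are the values used by Avellaneda and Lee, which may differ from those chosen for the Polish market.

```python
import numpy as np

def pca_factor_returns(returns, n_factors):
    """Eigenportfolio returns from the correlation matrix of `returns` (T x N)."""
    sigma = returns.std(axis=0, ddof=1)
    z = (returns - returns.mean(axis=0)) / sigma
    corr = (z.T @ z) / (len(returns) - 1)
    _, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_factors]      # leading eigenvectors first
    weights = top / sigma[:, None]             # volatility-scaled weights
    return returns @ weights                   # shape (T, n_factors)

def replication_residuals(stock_returns, factor_returns):
    """Residual returns of one stock after regressing on the factor returns."""
    design = np.column_stack([np.ones(len(factor_returns)), factor_returns])
    beta, *_ = np.linalg.lstsq(design, stock_returns, rcond=None)
    return stock_returns - design @ beta

def fit_ou(residuals, dt=1.0 / 252.0):
    """Fit an OU process to cumulative residuals via an AR(1) regression.

    Assumption: daily data and 252 trading days per year.
    Returns None when the AR(1) slope does not indicate mean reversion.
    """
    x = np.cumsum(residuals)
    b, a = np.polyfit(x[:-1], x[1:], 1)        # x[t+1] = a + b * x[t] + noise
    if not 0.0 < b < 1.0:
        return None
    noise = x[1:] - (a + b * x[:-1])
    kappa = -np.log(b) / dt                    # annualized mean-reversion speed
    m = a / (1.0 - b)                          # equilibrium level
    sigma_eq = np.sqrt(noise.var(ddof=2) / (1.0 - b * b))
    return {"kappa": kappa, "m": m, "sigma_eq": sigma_eq,
            "s_score": (x[-1] - m) / sigma_eq}

def next_position(s_score, position, open_at=1.25,
                  close_short_at=0.75, close_long_at=-0.50):
    """Map an s-score to a position: +1 long, -1 short, 0 flat.

    Assumption: default thresholds follow Avellaneda and Lee.
    """
    if position == 0:
        if s_score < -open_at:
            return 1
        if s_score > open_at:
            return -1
    elif position == 1 and s_score > close_long_at:
        return 0
    elif position == -1 and s_score < close_short_at:
        return 0
    return position
```

In this sketch, the restriction to "sufficiently fast mean reverting" portfolios would be a filter on `kappa` before calling `next_position`; the cutoff is left to the user because the abstract does not state it.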

Keywords: Statistical Arbitrage, Pairs Trading, Principal Components Analysis (PCA), Long Short Term Memory (LSTM), Mean Reversion, Equities (Stocks)

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 8.5/10
  • Quadrant: Holy Grail
  • Why: The paper employs advanced statistical methods (PCA, Ornstein-Uhlenbeck process, LSTM networks) and derives trading signals via sophisticated modeling, but also demonstrates backtest-ready implementation with detailed performance metrics (Sharpe ratios, cumulative returns) across two distinct market periods, and addresses real-world constraints like transaction costs.
  flowchart TD
    A["Research Goal<br>Statistical Arbitrage in Polish Equities<br>Using Deep Learning Techniques"] --> B["Data Inputs<br>WIG20, mWIG40, Sector Indices<br>2017-2020"]
    B --> C["Methodology: Replication Techniques<br>PCA | ETFs | LSTM"]
    C --> D["Computational Process<br>Residual Modeling via Ornstein-Uhlenbeck<br>Signal Generation"]
    D --> E["Key Findings: 2017-2019<br>PCA: 20% Return, Sharpe 2.63<br>All methods profitable"]
    D --> F["Key Findings: 2020 Recession<br>ETFs: 5% Return<br>PCA & LSTM underperform<br>LSTM shows future potential"]
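
The performance figures quoted above (cumulative return, annualized Sharpe ratio, returns net of transaction costs) could be computed from a daily return series roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, a 252-day year is assumed, and the cost and risk-free parameters are placeholders rather than the values used in the paper.

```python
import numpy as np

def net_daily_returns(gross_returns, turnover, cost_per_unit_turnover):
    """Subtract proportional transaction costs from gross daily returns.

    Assumption: cost is linear in daily turnover (a placeholder cost model).
    """
    gross = np.asarray(gross_returns, dtype=float)
    return gross - cost_per_unit_turnover * np.asarray(turnover, dtype=float)

def cumulative_return(daily_returns):
    """Compounded return over the whole sample."""
    return float(np.prod(1.0 + np.asarray(daily_returns, dtype=float)) - 1.0)

def annualized_sharpe(daily_returns, risk_free_annual=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of daily returns in excess of the risk-free rate."""
    daily = np.asarray(daily_returns, dtype=float)
    excess = daily - risk_free_annual / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))
```

For example, `annualized_sharpe(net_daily_returns(gross, turnover, cost))` gives a cost-adjusted Sharpe ratio for whatever `gross`, `turnover`, and `cost` the user supplies.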