An Adaptive Dual-level Reinforcement Learning Approach for Optimal Trade Execution
ArXiv ID: 2307.10649
Authors: Unknown
Abstract
The purpose of this research is to devise a strategy that can closely track the daily cumulative volume-weighted average price (VWAP) using reinforcement learning. Previous studies often choose a relatively short trading horizon to implement their models, which makes it difficult to track the daily cumulative VWAP accurately, since variations in financial data are often insignificant within a short trading horizon. In this paper, we aim to develop a strategy that accurately tracks the daily cumulative VWAP while minimizing deviation from it. We propose a method that leverages the U-shaped pattern of intraday stock trading volume and uses Proximal Policy Optimization (PPO) as the learning algorithm. Our method follows a dual-level approach: a Transformer model captures the overall (global) U-shaped distribution of daily volume, while an LSTM model handles the distribution of orders within smaller (local) time intervals. Our experimental results suggest that this dual-level architecture improves the accuracy of approximating the cumulative VWAP compared to previous reinforcement learning-based models.
Keywords: reinforcement learning, VWAP (volume-weighted average price), Proximal Policy Optimization (PPO), Transformer model, LSTM, equities
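The dual-level architecture described in the abstract pairs a Transformer (the global daily volume curve) with an LSTM (local order placement within an interval). Below is a minimal PyTorch sketch of how such a pair could be wired together; all class names, dimensions, and the softmax-allocation output are illustrative assumptions, not the authors' published architecture.

```python
# A hedged sketch of a dual-level policy: a Transformer allocates the day's
# volume across intervals (global level), and an LSTM sizes orders inside
# each interval (local level). Details are assumptions for illustration.
import torch
import torch.nn as nn

class GlobalVolumeModel(nn.Module):
    """Transformer over per-interval features; outputs a (typically U-shaped)
    allocation of the daily volume across the N trading intervals."""
    def __init__(self, feat_dim: int, d_model: int = 64):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                      # x: (batch, n_intervals, feat_dim)
        h = self.encoder(self.proj(x))         # (batch, n_intervals, d_model)
        logits = self.head(h).squeeze(-1)      # (batch, n_intervals)
        return torch.softmax(logits, dim=-1)   # fraction of daily volume per interval

class LocalOrderModel(nn.Module):
    """LSTM over step-level features within one interval; outputs the order
    size placed at the current step (local level)."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, steps, feat_dim)
        h, _ = self.lstm(x)
        return self.head(h[:, -1])             # order-size action for this step
```

As a usage sketch, the global model would first turn one day of interval features, e.g. `torch.randn(1, 78, 16)` for 78 five-minute bars with 16 features (assumed numbers), into an allocation vector summing to 1; the local model then splits each interval's budget across individual order placements, with PPO training both levels end to end.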
Complexity vs Empirical Score
- Math Complexity: 8.5/10
- Empirical Rigor: 7.0/10
- Quadrant: Holy Grail
- Why: The paper employs advanced mathematics, including Markov decision processes, Transformer models, and LSTM networks, earning a high math score. Its empirical rigor is strong thanks to a clearly defined experimental setup, comparisons to previous RL-based models, and the use of real financial data, though it does not release code or identify the specific datasets used for backtesting.
```mermaid
flowchart TD
    A["Research Goal<br>Devise RL strategy to track daily VWAP<br>minimizing deviation over long horizon"] --> B["Methodology: Dual-Level Reinforcement Learning"]
    B --> C1["Global Level: Transformer<br>Models U-shaped daily volume distribution"]
    B --> C2["Local Level: LSTM<br>Models order distribution in small time intervals"]
    C1 & C2 --> D["Core Algorithm: Proximal Policy Optimization (PPO)"]
    D --> E["Inputs: Historical Equities Data"]
    E --> F["Simulation Environment<br>Calculates PnL & VWAP Deviation"]
    F -- Feedback --> D
    D --> G["Outcome: Adaptive Trade Execution Strategy"]
    G --> H["Key Finding: Dual-level architecture<br>outperforms baseline RL models<br>in tracking daily cumulative VWAP"]
```
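The flowchart's feedback edge (Simulation Environment → PPO) implies a reward built from the deviation between the agent's achieved execution price and the market VWAP. Below is a minimal sketch of such a signal; the slippage-in-basis-points form and all function names are assumptions for illustration, since this summary does not state the paper's exact reward.

```python
# A hedged sketch of the VWAP-deviation feedback the simulator could emit.
import numpy as np

def market_vwap(prices: np.ndarray, volumes: np.ndarray) -> float:
    """Daily cumulative VWAP: sum(p_i * v_i) / sum(v_i) over all market trades."""
    return float(np.dot(prices, volumes) / volumes.sum())

def execution_vwap(fill_prices: np.ndarray, fill_sizes: np.ndarray) -> float:
    """Volume-weighted average price actually achieved by the agent's fills."""
    return float(np.dot(fill_prices, fill_sizes) / fill_sizes.sum())

def vwap_deviation_reward(fill_prices, fill_sizes, prices, volumes) -> float:
    """Negative absolute slippage versus the market VWAP, in basis points,
    so the PPO agent is rewarded for tracking the daily VWAP closely."""
    mv = market_vwap(np.asarray(prices), np.asarray(volumes))
    ev = execution_vwap(np.asarray(fill_prices), np.asarray(fill_sizes))
    return -abs(ev - mv) / mv * 1e4
```

Under this shaping, a perfect VWAP tracker earns a reward of 0 and any deviation, in either direction, is penalized symmetrically, which matches the paper's stated goal of minimizing deviation from the daily cumulative VWAP.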