Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio

ArXiv ID: 2309.03202

Authors: Unknown

Abstract

This work seeks to answer key research questions regarding the viability of reinforcement learning over the S&P 500 index. The on-policy techniques of Value Iteration (VI) and State-action-reward-state-action (SARSA) are implemented along with the off-policy technique of Q-learning. The models are trained and tested on stock market data spanning 2000 to 2023, using two different time periods: one including the COVID-19 pandemic years and one excluding them. The results indicate that including market data from the COVID-19 period in the training dataset leads to superior performance compared to the baseline strategies. During testing, the on-policy approaches (VI and SARSA) outperform Q-learning, highlighting the influence of the bias-variance tradeoff and the generalization capabilities of simpler policies. However, the performance of Q-learning may vary depending on the stability of future market conditions. Suggested future work includes experiments that update the Q-learning policy during testing and that trade a diverse set of individual stocks, as well as exploring alternative economic indicators for training the models.
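The abstract contrasts SARSA, which it classes as on-policy, with off-policy Q-learning. In tabular form the difference comes down to the bootstrap target of the update, as the minimal Python sketch below illustrates. It is not the authors' implementation: the three-action space (`sell`, `hold`, `buy`), the string-labelled states, and all helper names are hypothetical, and the learning rate `alpha` and discount `gamma` are free parameters.

```python
import random

# Hypothetical discrete action set; the paper's actual action space is not stated here.
ACTIONS = ("sell", "hold", "buy")


def epsilon_greedy(q, state, epsilon, rng):
    """Behaviour policy: random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy TD update: bootstrap from the action actually chosen in s_next."""
    old = q.get((s, a), 0.0)
    target = r + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = old + alpha * (target - old)


def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy TD update: bootstrap from the greedy action in s_next,
    regardless of which action the behaviour policy takes next."""
    old = q.get((s, a), 0.0)
    best_next = max(q.get((s_next, b), 0.0) for b in ACTIONS)
    target = r + gamma * best_next
    q[(s, a)] = old + alpha * (target - old)
```

For example, starting from `q = {("up", "buy"): 1.0, ("down", "sell"): 0.5, ("down", "hold"): 2.0}` and the transition `("up", "buy", 0.1, "down")` with `alpha=0.5` and `gamma=0.9`, a SARSA step whose next action happens to be `sell` moves `q[("up", "buy")]` to about 0.775, while a Q-learning step moves it to about 1.45 because it bootstraps from the best next action (`hold`).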

Keywords: Reinforcement Learning, Q-Learning, SARSA, Value Iteration, Stock Market

Complexity vs Empirical Score

  • Math Complexity: 6.0/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper employs classical reinforcement learning algorithms (VI, SARSA, Q-learning) that require formal modeling of states, actions, and policies, while its multi-year stock market dataset (2000 to 2023), analyzed both with and without the pandemic period, demonstrates substantial empirical and backtesting rigor.
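Unlike SARSA and Q-learning, value iteration needs an explicit transition and reward model. The sketch below runs textbook value iteration (Bellman optimality backups with in-place sweeps) on a made-up two-regime market MDP. The `bull`/`bear` states, `long`/`flat` actions, probabilities, and rewards are all hypothetical and are not taken from the paper, whose state definition and model estimation are not described in this summary.

```python
def expected_return(transitions, v, s, a, gamma):
    """One-step lookahead: sum over outcomes of p * (reward + gamma * V(next))."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in transitions[(s, a)])


def value_iteration(states, actions, transitions, gamma=0.9, tol=1e-8):
    """transitions[(s, a)] is a list of (probability, next_state, reward) tuples."""
    v = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(expected_return(transitions, v, s, a, gamma) for a in actions)
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Extract the greedy policy from the converged value function.
    policy = {
        s: max(actions, key=lambda a: expected_return(transitions, v, s, a, gamma))
        for s in states
    }
    return v, policy


# Hypothetical toy model: being long earns +1 on an up move and -1 on a down move.
STATES = ("bull", "bear")
ACTIONS = ("long", "flat")
TRANSITIONS = {
    ("bull", "long"): [(0.8, "bull", 1.0), (0.2, "bear", -1.0)],
    ("bull", "flat"): [(0.8, "bull", 0.0), (0.2, "bear", 0.0)],
    ("bear", "long"): [(0.3, "bull", 1.0), (0.7, "bear", -1.0)],
    ("bear", "flat"): [(0.3, "bull", 0.0), (0.7, "bear", 0.0)],
}
```

Because the actions in this toy model do not change the regime transitions, the greedy policy returned by `value_iteration(STATES, ACTIONS, TRANSITIONS)` goes long in `bull` and stays flat in `bear`.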
  flowchart TD
    A["Research Goal:<br>RL Viability vs S&P 500"] --> B["Methodology:<br>VI, SARSA, Q-Learning"]
    B --> C["Data Input:<br>Stock Data 2000-2023"]
    C --> D{"Split Training Data?"}
    D -- With COVID Data --> E["Training: Policy Extraction"]
    D -- Without COVID Data --> F["Training: Policy Extraction"]
    E --> G["Testing & Evaluation"]
    F --> G
    G --> H["Key Findings:<br>On-policy > Q-Learning<br>COVID data boosts performance"]
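The two training branches in the flowchart amount to choosing different date windows over the same price history. The minimal standard-library sketch below assumes rows of `(date, close)` sorted by date. The window boundaries in `SCENARIOS` are hypothetical placeholders, since the exact cutoffs are not stated in this summary, and buy-and-hold is used only as an illustrative baseline; the paper's baseline strategies are not named here.

```python
from datetime import date


def in_window(d, start, end):
    """Inclusive date-window check."""
    return start <= d <= end


def split_by_period(rows, train_window, test_window):
    """Split date-sorted (date, close) rows into train/test lists by inclusive windows."""
    train = [r for r in rows if in_window(r[0], *train_window)]
    test = [r for r in rows if in_window(r[0], *test_window)]
    return train, test


def buy_and_hold_return(rows):
    """Illustrative baseline: buy at the first close, sell at the last close."""
    if len(rows) < 2:
        raise ValueError("need at least two prices")
    return rows[-1][1] / rows[0][1] - 1.0


# Hypothetical windows for illustration only; not the paper's actual cutoffs.
TEST_WINDOW = (date(2022, 1, 1), date(2023, 12, 31))
SCENARIOS = {
    "with_covid": ((date(2000, 1, 1), date(2021, 12, 31)), TEST_WINDOW),
    "without_covid": ((date(2000, 1, 1), date(2019, 12, 31)), TEST_WINDOW),
}
```

Each scenario yields its own training set while the test window stays fixed, so any difference in test performance can be attributed to the training period rather than to the evaluation data.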