Integration of LSTM Networks in Random Forest Algorithms for Stock Market Trading Predictions

ArXiv ID: 2512.02036

Authors: Juan C. King, Jose M. Amigo

Abstract

The aim of this paper is the analysis and selection of stock trading systems that combine different models with data of different nature, such as financial and microeconomic information. Specifically, based on previous work by the authors and applying advanced techniques of Machine Learning and Deep Learning, our objective is to formulate trading algorithms for the stock market with empirically tested statistical advantages, thus improving results published in the literature. Our approach integrates Long Short-Term Memory (LSTM) networks with algorithms based on decision trees, such as Random Forest and Gradient Boosting. While the former analyze price patterns of financial assets, the latter are fed with economic data of companies. Numerical simulations of algorithmic trading with data from international companies and 10-weekday predictions confirm that an approach based on both fundamental and technical variables can outperform the usual approaches, which do not combine those two types of variables. In doing so, Random Forest turned out to be the best performer among the decision trees. We also discuss how the prediction performance of such a hybrid approach can be boosted by selecting the technical variables.
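
To make the 10-weekday prediction target concrete, the following is a minimal sketch of how direction labels could be derived from a series of daily closing prices. The function name `forward_direction_labels`, the binary up/down target, and the toy prices are assumptions made for illustration; this summary does not state how the paper defines its target.

```python
def forward_direction_labels(closes, horizon=10):
    """Label each day 1 if the close `horizon` trading days later is higher, else 0.

    Assumption: a binary up/down target; the paper's own target may differ.
    The last `horizon` days get no label because their future close is unknown.
    """
    return [
        1 if closes[t + horizon] > closes[t] else 0
        for t in range(len(closes) - horizon)
    ]


# Toy series of 15 daily closes that rises by one unit per day (hypothetical data).
closes = [100 + day for day in range(15)]
labels = forward_direction_labels(closes, horizon=10)
print(labels)  # one label for each of the first 5 days
```

Only the first `len(closes) - horizon` days receive a label, which is worth keeping in mind when aligning labels with features so that no future information leaks into training.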

Keywords: Long Short-Term Memory (LSTM), Random Forest, Gradient Boosting, Algorithmic Trading, Hybrid Machine Learning Models, Equities (Stocks)

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 7.5/10
  • Quadrant: Street Traders
  • Why: The paper focuses on applied machine learning (LSTM, Random Forest) without heavy theoretical derivations, but reports detailed backtests with specific metrics, cross-validation, and trading simulations on financial data.

  flowchart TD
    A["Research Goal:<br>Formulate trading algorithms with<br>statistical advantages using hybrid ML"] --> B["Data Preparation:<br>Financial + Microeconomic Data"]
    B --> C["Model Architecture"]
    C --> D["LSTM Network<br>(Technical Analysis<br>Price Patterns)"]
    C --> E["Decision Tree Ensemble<br>(Fundamental Analysis<br>Economic Data)"]
    E --> F{"Algorithmic Trading Simulation"}
    D --> F
    F --> G["Comparative Analysis<br>(10-Weekday Prediction Window)"]
    G --> H{"Outcomes & Selection"}
    H --> I["Random Forest<br>Selected as best performer"]
    H --> J["Hybrid Approach<br>Outperforms single-type models"]
    H --> K["Feature Selection<br>Boosts prediction accuracy"]
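
The data flow in the flowchart can be sketched in a few lines of dependency-free Python. This is one plausible reading only: the summary does not say how the LSTM and the tree ensemble are combined, so the sketch assumes that a technical score (a stand-in for an LSTM output) is appended to the fundamental features before the ensemble is trained. Bagged decision stumps serve as a simplified stand-in for Random Forest, and every name and number below (`hybrid_row`, `fit_bagged_stumps`, the toy scores and ratios) is hypothetical.

```python
import random


def hybrid_row(technical_score, fundamentals):
    """Combine a technical score (stand-in for an LSTM output) with fundamentals."""
    return [technical_score] + list(fundamentals)


def fit_stump(X, y):
    """Find the single-feature threshold rule with the fewest training errors."""
    best = (len(y) + 1, 0, 0.0, 1)  # (errors, feature index, threshold, label above)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for above in (0, 1):
                errors = sum(
                    (above if row[j] > thr else 1 - above) != label
                    for row, label in zip(X, y)
                )
                if errors < best[0]:
                    best = (errors, j, thr, above)
    return best[1:]


def predict_stump(stump, row):
    j, thr, above = stump
    return above if row[j] > thr else 1 - above


def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample (simplified Random Forest stand-in)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = sum(predict_stump(stump, row) for stump in forest)
    return 1 if 2 * votes > len(forest) else 0


# Hypothetical toy data: technical scores, one fundamental ratio, up/down labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ratios = [12.0, 30.0, 8.0, 25.0, 11.0, 28.0, 9.0, 26.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

X = [hybrid_row(s, [r]) for s, r in zip(scores, ratios)]
forest = fit_bagged_stumps(X, labels)
print(predict_forest(forest, hybrid_row(0.95, [10.0])))
print(predict_forest(forest, hybrid_row(0.05, [10.0])))
```

In practice a library implementation of Random Forest or Gradient Boosting would replace the stump ensemble, and the technical score would come from a trained LSTM rather than a hand-written number.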