SARF: Enhancing Stock Market Prediction with Sentiment-Augmented Random Forest

ArXiv ID: 2410.07143

Authors: Unknown

Abstract

Stock trend forecasting, a challenging problem in the financial domain, involves extensive data and related indicators. Relying solely on empirical analysis often yields unsustainable and ineffective results. Machine learning researchers have demonstrated that the random forest algorithm can enhance predictions in this context, playing a crucial auxiliary role in forecasting stock trends. This study introduces a new approach to stock market prediction by integrating sentiment analysis using the FinGPT generative AI model with the traditional Random Forest model. The proposed technique aims to optimize the accuracy of stock price forecasts by leveraging the nuanced understanding of financial sentiments provided by FinGPT. We present a new methodology called “Sentiment-Augmented Random Forest” (SARF), which incorporates sentiment features into the Random Forest framework. Our experiments demonstrate that SARF outperforms conventional Random Forest and LSTM models with an average accuracy improvement of 9.23% and lower prediction errors in predicting stock market movements.

Keywords: Sentiment Analysis, FinGPT, Random Forest, Stock Trend Forecasting, Generative AI

Complexity vs Empirical Score

  • Math Complexity: 2.5/10
  • Empirical Rigor: 6.5/10
  • Quadrant: Street Traders
  • Why: The paper uses established machine learning methods (Random Forest, FinGPT) without novel mathematical derivations, keeping math complexity low, but includes detailed empirical work with specific datasets, APIs, and comparative metrics.

  flowchart TD
    Start(["Research Goal: Enhance Stock Trend Prediction"]) --> Inputs
    subgraph Inputs ["Data & Models"]
        direction LR
        I1["(Historical Market Data)"]
        I2["(Financial News via FinGPT)"]
    end
    Inputs --> Method
    subgraph Method ["Key Methodology: SARF"]
        direction TB
        M1["Sentiment Feature Extraction"] --> M2["Feature Fusion<br>Sentiment + Technical Data"]
        M2 --> M3["Random Forest Model Training"]
    end
    Method --> Outcome
    subgraph Outcome ["Findings & Outcomes"]
        direction LR
        O1["9.23% Accuracy Improvement"]
        O2["Lower Prediction Errors"]
        O3["Superior to LSTM/RF"]
    end
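
The "Feature Fusion" step in the flowchart above can be illustrated with a minimal sketch. This summary does not specify the paper's actual features, sentiment scale, or aggregation rule, so everything below is an assumption made for illustration: the field names (`return_1d`, `volume_change`), the sample values, the [-1, 1] sentiment range standing in for FinGPT output, the per-day mean, and the neutral fill value for days without news are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-day technical features (values are made up for illustration).
technical_rows = [
    {"date": "2024-01-02", "return_1d": 0.004, "volume_change": 0.10},
    {"date": "2024-01-03", "return_1d": -0.012, "volume_change": -0.05},
    {"date": "2024-01-04", "return_1d": 0.007, "volume_change": 0.02},
]

# Hypothetical headline-level sentiment scores in [-1, 1].
# In a SARF-style pipeline these would come from a sentiment model such as FinGPT.
headline_scores = [
    ("2024-01-02", 0.6),
    ("2024-01-02", 0.2),
    ("2024-01-04", -0.4),
]

# Assumption: days with no headlines are treated as neutral.
NEUTRAL_SENTIMENT = 0.0


def daily_sentiment(scores):
    """Average all headline scores that fall on the same date."""
    by_day = defaultdict(list)
    for day, score in scores:
        by_day[day].append(score)
    return {day: mean(values) for day, values in by_day.items()}


def fuse_features(rows, scores):
    """Attach a daily sentiment value to each technical-feature row."""
    sentiment = daily_sentiment(scores)
    fused = []
    for row in rows:
        merged = dict(row)  # copy so the input rows are left unchanged
        merged["sentiment"] = sentiment.get(row["date"], NEUTRAL_SENTIMENT)
        fused.append(merged)
    return fused


fused_rows = fuse_features(technical_rows, headline_scores)

# Numeric matrix (one row per day) in the shape a Random Forest trainer expects.
feature_matrix = [
    [r["return_1d"], r["volume_change"], r["sentiment"]] for r in fused_rows
]
```

A matrix like `feature_matrix`, paired with up/down labels, is the kind of input that could then be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`. Whether the paper fuses features at daily granularity, or fills missing days this way, is not stated here.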