News-Driven Stock Price Forecasting in Indian Markets: A Comparative Study of Advanced Deep Learning Models

ArXiv ID: 2411.05788

Authors: Unknown

Abstract

Forecasting stock market prices remains a complex challenge for traders, analysts, and engineers due to the multitude of factors that influence price movements. Recent advancements in artificial intelligence (AI) and natural language processing (NLP) have significantly enhanced stock price prediction capabilities. AI’s ability to process vast and intricate data sets has led to more sophisticated forecasts. However, achieving consistently high accuracy in stock price forecasting remains elusive. In this paper, we leverage 30 years of historical data from national banks in India, sourced from the National Stock Exchange, to forecast stock prices. Our approach utilizes state-of-the-art deep learning models, including multivariate multi-step Long Short-Term Memory (LSTM), Facebook Prophet with LightGBM optimized through Optuna, and Seasonal Auto-Regressive Integrated Moving Average (SARIMA). We further integrate sentiment analysis from tweets and reliable financial sources such as Business Standard and Reuters, acknowledging their crucial influence on stock price fluctuations.
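A "multivariate multi-step" LSTM reads a window of several input series and predicts several future values of one target series in a single pass. The summary does not describe the authors' preprocessing, so what follows is only a minimal sketch. It assumes the data is a NumPy array of shape (time_steps, n_features). The function name `make_windows`, the lookback of 4, and the horizon of 2 are illustrative choices, not the paper's settings. The sketch shows how such an array can be sliced into the 3-D input tensor that an LSTM layer typically expects.

```python
import numpy as np


def make_windows(data: np.ndarray, target_col: int, lookback: int, horizon: int):
    """Slice a (time_steps, n_features) array into supervised samples.

    X[i] holds `lookback` consecutive rows of every feature;
    y[i] holds the next `horizon` values of the target column.
    (Hypothetical helper for illustration, not the paper's code.)
    """
    if data.ndim != 2:
        raise ValueError("data must be 2-D: (time_steps, n_features)")
    n_samples = len(data) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this lookback/horizon")
    # Inputs: sliding windows over all features.
    X = np.stack([data[i : i + lookback] for i in range(n_samples)])
    # Targets: the following `horizon` steps of the target feature only.
    y = np.stack(
        [data[i + lookback : i + lookback + horizon, target_col] for i in range(n_samples)]
    )
    return X, y


# Hypothetical toy data: 10 days x 3 features (e.g. close, volume, sentiment).
series = np.arange(30, dtype=float).reshape(10, 3)
X, y = make_windows(series, target_col=0, lookback=4, horizon=2)
print(X.shape, y.shape)  # (5, 4, 3) (5, 2)
```

`X` comes out as (samples, lookback, features) and `y` as (samples, horizon). This is the batch-first layout used by Keras-style LSTM layers. The windows are built in time order, so any train/test split should also be chronological. Otherwise, future prices leak into training.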

Keywords: Long Short-Term Memory (LSTM), Prophet, Sentiment Analysis, Stock Price Forecasting

Complexity vs Empirical Score

  • Math Complexity: 7.0/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Holy Grail
  • Why: The paper combines advanced deep learning architectures (e.g., a multivariate multi-step LSTM) with NLP-based sentiment analysis, which makes the method algorithmically demanding. Its empirical rigor is high: it draws on 30 years of historical NSE price data for Indian banks together with news and tweet sentiment, and it evaluates performance with standard error metrics such as RMSE. This points to a data-heavy implementation that could be carried over to a backtest.
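The rationale above compares models with RMSE. This amounts to computing the root-mean-square error of each model's forecasts against the same held-out prices. Below is a minimal standard-library sketch. The model names `model_a` and `model_b` and all numbers are made up for illustration; they are not results from the paper.

```python
import math
from typing import Dict, List, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical held-out closing prices and forecasts from two unnamed models.
actual: List[float] = [100.0, 102.0, 101.0, 105.0]
forecasts: Dict[str, List[float]] = {
    "model_a": [101.0, 101.0, 102.0, 104.0],
    "model_b": [100.0, 104.0, 99.0, 107.0],
}

# Score every model on the same holdout window, then pick the lowest error.
scores = {name: rmse(actual, pred) for name, pred in forecasts.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")  # model_a: 1.000, then model_b: 1.732
print("lowest RMSE:", best)  # lowest RMSE: model_a
```

RMSE is expressed in the same units as the price and penalizes large misses more heavily than mean absolute error. Scores are only comparable when every model is evaluated on the same holdout window.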
  flowchart TD
    A["Research Goal:<br>Forecast Indian Stock Prices<br>using News & Historical Data"] --> B
    
    subgraph B ["Data Sources (Collection & Integration)"]
        B1["30 Years Historical Data<br>NSE National Banks"]
        B2["News/Sentiment Data<br>Tweets, Business Standard, Reuters"]
    end
    
    B --> C["Data Preprocessing<br>& Sentiment Analysis"]
    
    subgraph D ["Deep Learning Models"]
        direction LR
        D1["LSTM<br>Multivariate Multi-step"]
        D2["Facebook Prophet +<br>LightGBM w/ Optuna"]
        D3["SARIMA<br>Seasonal Time-series"]
    end
    
    C --> D
    
    D --> E{"Model Comparison<br>& Evaluation"}
    
    E --> F["Key Findings<br>State-of-the-art DL models<br>improve accuracy.<br>Sentiment integration<br>is crucial for volatility."]
    
    style A fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style B fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style C fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style D fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
    style E fill:#fff9c4,stroke:#fbc02d,stroke-width:2px
    style F fill:#ffebee,stroke:#b71c1c,stroke-width:2px
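The "Data Preprocessing & Sentiment Analysis" step in the diagram must reduce many scored news items to one sentiment value per trading day. Only then can sentiment sit next to prices as a model input. The source does not name the sentiment model or the aggregation rule, so this is a sketch under three stated assumptions:

- each tweet or headline already carries a score in [-1, 1];
- scores are averaged per calendar day;
- days with no news get a neutral 0.0.

The helper names `daily_sentiment` and `join_with_prices` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def daily_sentiment(scored_items: Iterable[Tuple[date, float]]) -> Dict[date, float]:
    """Average per-item sentiment scores (assumed to lie in [-1, 1]) by day."""
    by_day: Dict[date, List[float]] = defaultdict(list)
    for day, score in scored_items:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}


def join_with_prices(
    prices: Dict[date, float], sentiment: Dict[date, float], neutral: float = 0.0
) -> List[Tuple[date, float, float]]:
    """Attach a sentiment feature to each trading day, in date order.

    Trading days with no scored news receive `neutral` (an assumption).
    """
    return [(day, close, sentiment.get(day, neutral)) for day, close in sorted(prices.items())]


# Hypothetical toy inputs: closing prices and already-scored news items.
prices = {date(2024, 1, 2): 100.0, date(2024, 1, 3): 101.5, date(2024, 1, 4): 99.8}
items = [(date(2024, 1, 2), 0.5), (date(2024, 1, 2), -0.25), (date(2024, 1, 4), -0.5)]

rows = join_with_prices(prices, daily_sentiment(items))
for row in rows:
    print(row)
```

Joining on the calendar date assumes every item was published before that day's close. A stricter pipeline would shift after-hours news to the next trading session to avoid look-ahead bias.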