Regression and Forecasting of U.S. Stock Returns Based on LSTM

ArXiv ID: 2502.05210

Authors: Unknown

Abstract

This paper analyses the investment returns of three U.S. stock market sectors (Manuf, Hitec, and Other) using the Fama-French three-factor model, the Carhart four-factor model, and the Fama-French five-factor model, in order to test the validity of each model for these three sectors. An LSTM model is also used to explore additional factors affecting stock returns. The empirical results show that the Fama-French five-factor model has better validity for the three sectors under study, and that the LSTM model can capture factors affecting the returns of certain industries and can better regress and predict the stock returns of those industries.
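To make the regression setup concrete, here is a minimal sketch of a three-factor time-series regression of the usual Fama-French form, R_i - R_f = alpha + b*MKT + s*SMB + h*HML + e, fitted by ordinary least squares. The data are synthetic and the helper names (`solve_linear`, `ols`) are hypothetical; none of this is taken from the paper. The Carhart and five-factor variants would add a momentum column, or profitability (RMW) and investment (CMA) columns, to the factor rows in the same way.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(factors, y):
    """Fit y = alpha + sum_k beta_k * factor_k via the normal equations."""
    X = [[1.0] + list(row) for row in factors]  # prepend intercept column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)

# Synthetic monthly data (illustrative only, not from the paper).
rng = random.Random(0)
true_coefs = [0.002, 1.1, 0.3, -0.2]  # alpha, b (MKT), s (SMB), h (HML)
factors = [[rng.gauss(0.006, 0.04), rng.gauss(0.002, 0.03), rng.gauss(0.003, 0.03)]
           for _ in range(240)]
excess_returns = [true_coefs[0]
                  + sum(c * f for c, f in zip(true_coefs[1:], row))
                  + rng.gauss(0.0, 0.01)
                  for row in factors]

coefs = ols(factors, excess_returns)
print([round(c, 3) for c in coefs])  # estimates of alpha, b, s, h
```

The estimated intercept (alpha) is the part of the sector's average excess return that the factors do not explain, so an alpha close to zero is the usual sign that a factor model describes the sector well.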

Keywords: Fama-French Five-Factor Model, LSTM, Factor Models, Stock Returns, Carhart Model, Stocks

Complexity vs Empirical Score

  • Math Complexity: 6.5/10
  • Empirical Rigor: 7.0/10
  • Quadrant: Holy Grail
  • Why: The paper combines established finance factor models (Fama-French, Carhart) with an LSTM neural network, both of which involve substantial mathematical formulation. It also details its empirical procedure: data sourced from major U.S. exchanges, data cleaning, and standard statistical metrics (R-squared, RMSE, MAE) for validation.
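The three metrics named above have standard definitions, sketched here in plain Python so their meaning is unambiguous. The function names and the sample numbers are illustrative assumptions; the paper's own evaluation code is not shown in this summary.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average size of the error, in return units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up returns and predictions, for illustration only.
actual = [0.010, -0.020, 0.030, 0.000]
predicted = [0.012, -0.015, 0.025, 0.002]
print(r_squared(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

R-squared is unitless, while RMSE and MAE are in the same units as the returns, so the latter two are only comparable across models evaluated on the same series.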
  flowchart TD
    A["Research Goal<br/>Test Factor Models & Forecast Stock Returns"] --> B["Data Inputs<br/>(U.S. Stock Sectors: Manuf, Hitec, Other)"]
    B --> C["Methodology<br/>1. Factor Models (Fama-French 3/5, Carhart 4)<br/>2. LSTM Regression & Forecasting"]
    C --> D["Computational Process<br/>Regress sector returns vs. factors<br/>Train LSTM on historical data"]
    D --> E["Outcome 1<br/>Fama-French Five-Factor model<br/>has best validity for all sectors"]
    D --> F["Outcome 2<br/>LSTM captures industry-specific factors<br/>and predicts returns effectively"]
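This summary does not give the paper's network configuration, so the following is only a generic sketch of how an LSTM cell processes a window of past returns, using the standard gate equations with a scalar hidden state. The weights, the window length, and the helper names (`lstm_step`, `make_windows`) are arbitrary assumptions for illustration. A real model would learn its weights by training, typically in a deep learning library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell with scalar input and scalar state."""
    pre = {}
    for name in ("i", "f", "o", "g"):
        w, u, b = params[name]
        pre[name] = w * x + u * h_prev + b
    i = sigmoid(pre["i"])    # input gate: how much new information to admit
    f = sigmoid(pre["f"])    # forget gate: how much old cell state to keep
    o = sigmoid(pre["o"])    # output gate: how much cell state to expose
    g = math.tanh(pre["g"])  # candidate cell value
    c = f * c_prev + i * g   # updated cell state
    h = o * math.tanh(c)     # updated hidden state
    return h, c

def make_windows(series, window):
    """Pair each run of `window` past values with the next value as target."""
    return [(series[t:t + window], series[t + window])
            for t in range(len(series) - window)]

# Untrained, hand-picked weights (w, u, b) per gate: purely illustrative.
params = {"i": (0.5, 0.1, 0.0), "f": (0.4, 0.2, 1.0),
          "o": (0.3, 0.1, 0.0), "g": (1.0, 0.5, 0.0)}

# Made-up return series, not data from the paper.
returns = [0.010, -0.020, 0.030, 0.000, 0.015, -0.005]
samples = make_windows(returns, window=3)

for inputs, target in samples:
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
    print(inputs, "-> final hidden state", round(h, 5), "| next return", target)
```

In a forecasting setup of this kind, the final hidden state would be passed through a trained output layer to produce the predicted next-period return, and the prediction would then be scored against the target.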