Forecasting the Performance of US Stock Market Indices During COVID-19: RF vs LSTM
ArXiv ID: 2306.03620
Authors: Unknown
Abstract
The US stock market experienced instability following the 2007-2009 recession, and COVID-19 poses a further significant challenge to US stock traders and investors. To mitigate risk and improve profits, traders and investors need forecasting models that account for the effects of the pandemic. With the post-recession COVID-19 period in view, two machine learning models, Random Forest and LSTM, are used to forecast two major US stock market indices, the S&P500 and the Russell 2000. Historical price data from after the recession is used to develop the models and forecast index returns. Cross-validation is used to evaluate model performance during training. In addition, hyperparameter optimization, regularization (such as dropout and weight decay), and preprocessing improve the performance of the machine learning techniques. Using high-accuracy machine learning techniques, traders and investors can forecast stock market behavior, stay ahead of their competition, and improve profitability. Keywords: COVID-19, LSTM, S&P500, Random Forest, Russell 2000, Forecasting, Machine Learning, Time Series. JEL Codes: C6, C8, G4.
Keywords: Long Short-Term Memory (LSTM), Random Forest, Time Series Forecasting, Cross-Validation, Hyperparameter Optimization, Equities
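The abstract mentions forecasting index returns and evaluating models with cross-validation, but it does not say which return definition or which cross-validation scheme the authors used. The sketch below assumes daily log returns and an expanding-window (walk-forward) split, a common choice for time series because training data always precedes test data. The function names `log_returns` and `expanding_window_splits` and the price values are hypothetical, introduced only for illustration.

```python
import math


def log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1}) for consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def expanding_window_splits(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices) pairs.

    Each fold tests on the next `test_size` observations and trains on
    everything before them, so no future data leaks into training.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Hypothetical closing prices, for illustration only (not data from the paper).
prices = [100.0, 101.0, 99.5, 102.0, 103.5, 102.5, 104.0, 105.0, 103.0, 106.0]
returns = log_returns(prices)

for train_idx, test_idx in expanding_window_splits(len(returns), n_folds=3, test_size=2):
    print(f"train={train_idx[0]}..{train_idx[-1]}  test={test_idx}")
```

A shuffled k-fold split would mix future observations into the training folds, which is why a time-ordered split is assumed here; the paper itself may have used a different scheme.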
Complexity vs Empirical Score
- Math Complexity: 4.0/10
- Empirical Rigor: 6.0/10
- Quadrant: Street Traders
- Why: The paper applies established machine learning models (Random Forest, LSTM) to financial time series forecasting with moderate mathematical complexity, but it is highly empirical, focusing on data preprocessing, hyperparameter tuning, and backtest-ready methodology for trading signals.
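The preprocessing emphasis noted above can be illustrated with a small sketch. It assumes min-max normalization fitted on the training portion only, followed by fixed-length lookback windows as model inputs. The summary does not state which scaler or window length the paper used, so `fit_min_max`, `apply_min_max`, `make_windows`, the lookback of 3, and the sample series are all hypothetical.

```python
def fit_min_max(train_values):
    """Return (lo, hi) computed from training data only."""
    lo, hi = min(train_values), max(train_values)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return lo, hi


def apply_min_max(values, lo, hi):
    """Scale values with statistics learned from the training data."""
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(series, lookback):
    """Build (X, y): each row of X holds `lookback` past values, y is the next value."""
    X, y = [], []
    for t in range(lookback, len(series)):
        X.append(series[t - lookback:t])
        y.append(series[t])
    return X, y


# Hypothetical index levels, for illustration only (not data from the paper).
series = [10.0, 12.0, 11.0, 14.0, 13.0, 15.0, 16.0, 18.0]
split = 6
train, test = series[:split], series[split:]

lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)  # may fall outside [0, 1]

X_train, y_train = make_windows(train_scaled, lookback=3)
print(len(X_train), "training windows of length", len(X_train[0]))
```

Fitting the scaler on the training slice alone keeps test-period information out of the inputs; the scaled test values can then exceed 1.0 when the index reaches new highs.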
flowchart TD
A["Research Goal: Forecast US Stock Indices<br/>S&P500 & Russell 2000<br/>During COVID-19"] --> B["Data Collection<br/>Historical Prices Post-2008 Recession"]
B --> C["Preprocessing & Feature Engineering<br/>Normalization, Technical Indicators, Split Data"]
C --> D{"Model Training & Optimization"}
D --> D1["Random Forest (RF)"]
D --> D2["LSTM Neural Network<br/>Dropout & Weight Decay"]
D1 & D2 --> E["Model Evaluation<br/>Cross-Validation & Performance Metrics"]
E --> F["Key Findings<br/>LSTM outperforms RF<br/>Validated for Risk Mitigation & Profitability"]
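For readers unfamiliar with the LSTM branch of the flowchart, the standard LSTM cell and the two regularizers it names can be written out as follows. This is the textbook formulation, not necessarily the exact variant used in the paper. The symbols ($W$, $U$, $b$ for weights and biases, $\lambda$ for the weight-decay coefficient, $p$ for the dropout rate) and the squared-error loss are assumptions made for illustration.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}

% Inverted dropout on the hidden state during training, with drop rate p:
\tilde{h}_t = \frac{m_t \odot h_t}{1 - p}, \qquad m_t \sim \mathrm{Bernoulli}(1 - p)

% Weight decay as an L2 penalty added to an assumed squared-error loss:
\mathcal{L}(\theta) = \frac{1}{N} \sum_{t=1}^{N} \left( y_t - \hat{y}_t \right)^2 + \lambda \lVert \theta \rVert_2^2
```

Here $x_t$ is the input at time $t$ (for example, a window of scaled past returns), $\hat{y}_t$ is the forecast, and $\odot$ denotes element-wise multiplication. Larger $\lambda$ or $p$ means stronger regularization, which is typically tuned alongside the other hyperparameters.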