
Risk forecasting using Long Short-Term Memory Mixture Density Networks

Risk forecasting using Long Short-Term Memory Mixture Density Networks ArXiv ID: 2501.01278 “View on arXiv” Authors: Unknown Abstract This work aims to implement Long Short-Term Memory mixture density networks (LSTM-MDNs) for Value-at-Risk forecasting and compare their performance with established models (historical simulation, CMM, and GARCH) using a defined backtesting procedure. The focus was on the neural network’s ability to capture volatility clustering and its real-world applicability. Three architectures were tested: a 2-component mixture density network, a regularized 2-component model (Arimond et al., 2020), and a 3-component mixture model, the latter being tested for the first time in Value-at-Risk forecasting. Backtesting was performed on three stock indices (FTSE 100, S&P 500, EURO STOXX 50) over two distinct two-year periods (2017-2018 as a calm period, 2021-2022 as a turbulent one). Model performance was assessed through unconditional coverage and independence tests. The neural network’s ability to handle volatility clustering was validated via correlation analysis and graphical evaluation. Results show limited success for the neural network approach: LSTM-MDNs performed poorly in 2017/2018 but outperformed the benchmark models in 2021/2022. The LSTM mechanism allowed the neural network to capture volatility clustering similarly to GARCH models. However, several issues were identified: the need for proper model initialization and the reliance on large datasets for effective learning. The findings suggest that while LSTM-MDNs can provide adequate risk forecasts, further research and adjustments are necessary for stable performance. ...
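The MDN idea can be illustrated without any training code: once an LSTM-MDN has emitted mixture weights, means, and standard deviations for the next day's return distribution, Value-at-Risk is simply a low quantile of that Gaussian mixture. A minimal sketch follows; the mixture has no closed-form quantile, so bisection on the mixture CDF is used, and the parameter values are hypothetical illustrations, not taken from the paper:

```python
import math

def mixture_cdf(x, weights, means, sigmas):
    """CDF of a Gaussian mixture evaluated at x."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in zip(weights, means, sigmas))

def mixture_var(alpha, weights, means, sigmas, lo=-10.0, hi=10.0, tol=1e-8):
    """alpha-quantile of the mixture (e.g. alpha=0.01 for 1% VaR) via bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sigmas) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical 2-component output for one day's return distribution (decimal
# returns): a dominant "calm" component plus a wider "stress" component.
var_1pct = mixture_var(0.01, [0.9, 0.1], [0.0005, -0.01], [0.01, 0.04])
```

The same routine covers the 2- and 3-component cases; only the length of the parameter lists changes.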

January 2, 2025 · 2 min · Research Team

Enhanced Momentum with Momentum Transformers

Enhanced Momentum with Momentum Transformers ArXiv ID: 2412.12516 “View on arXiv” Authors: Unknown Abstract The primary objective of this research is to build a Momentum Transformer that is expected to outperform benchmark time-series momentum and mean-reversion trading strategies. We extend the ideas introduced in the paper Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture to equities, as the original paper primarily builds upon futures and equity indices. Unlike conventional Long Short-Term Memory (LSTM) models, which operate sequentially and are optimized for processing local patterns, an attention mechanism equips our architecture with direct access to all prior time steps in the training window. This hybrid design, combining attention with an LSTM, enables the model to capture long-term dependencies, enhance performance in scenarios accounting for transaction costs, and seamlessly adapt to evolving market conditions, such as those witnessed during the COVID-19 pandemic. We average 4.14% returns, which is similar to the original paper’s results. Our Sharpe ratio is lower, at an average of 1.12, due to much higher volatility, which may stem from stocks being inherently more volatile than futures and indices. ...
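The time-series momentum benchmark such architectures are measured against can be sketched in a few lines: the position is the sign of the trailing return, scaled to a volatility target. The lookback and target values below are conventional choices, not necessarily the paper's exact settings:

```python
import numpy as np

def tsmom_positions(prices, lookback=252, vol_window=60, target_vol=0.15):
    """Classic time-series momentum benchmark: long (short) when the trailing
    lookback-day return is positive (negative), scaled so each position
    targets a fixed annualized volatility."""
    prices = np.asarray(prices, dtype=float)
    rets = np.diff(prices) / prices[:-1]
    pos = np.zeros_like(rets)
    for t in range(lookback, len(rets)):
        signal = np.sign(prices[t] / prices[t - lookback] - 1.0)
        realized = rets[t - vol_window:t].std() * np.sqrt(252)  # annualize
        pos[t] = signal * target_vol / max(realized, 1e-8)
    return pos
```

A Momentum Transformer replaces the hard `sign` rule with a learned, attention-weighted position, but is evaluated against exactly this kind of baseline.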

December 17, 2024 · 2 min · Research Team

CatNet: Controlling the False Discovery Rate in LSTM with SHAP Feature Importance and Gaussian Mirrors

CatNet: Controlling the False Discovery Rate in LSTM with SHAP Feature Importance and Gaussian Mirrors ArXiv ID: 2411.16666 “View on arXiv” Authors: Unknown Abstract We introduce CatNet, an algorithm that effectively controls False Discovery Rate (FDR) and selects significant features in LSTM. CatNet employs the derivative of SHAP values to quantify the feature importance, and constructs a vector-formed mirror statistic for FDR control with the Gaussian Mirror algorithm. To avoid instability due to nonlinear or temporal correlations among features, we also propose a new kernel-based independence measure. CatNet performs robustly on different model settings with both simulated and real-world data, which reduces overfitting and improves interpretability of the model. Our framework that introduces SHAP for feature importance in FDR control algorithms and improves Gaussian Mirror can be naturally extended to other time-series or sequential deep learning models. ...
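The FDR-control step of a Gaussian-Mirror-style procedure is separable from how the mirror statistics are computed: signals produce large positive statistics, while nulls are roughly symmetric about zero, so the negative tail estimates the false discoveries in the positive tail. A minimal sketch of the data-driven threshold (the inputs are hypothetical, and this is not CatNet's exact SHAP-based statistic):

```python
import numpy as np

def mirror_select(mirror_stats, q=0.1):
    """Select features whose mirror statistic exceeds the smallest threshold t
    for which the estimated FDR  #{M_j <= -t} / max(#{M_j >= t}, 1)  is <= q."""
    m = np.asarray(mirror_stats, dtype=float)
    candidates = np.sort(np.abs(m[m != 0]))  # every |M_j| is a candidate threshold
    for t in candidates:
        fdr_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdr_hat <= q:
            return np.flatnonzero(m >= t)
    return np.array([], dtype=int)  # no threshold achieves the target FDR
```

With three strong positives and a few near-zero nulls, only the strong features survive the threshold.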

November 25, 2024 · 2 min · Research Team

A Deep Learning Approach to Predict the Fall of Price of Cryptocurrency Long Before its Actual Fall

A Deep Learning Approach to Predict the Fall of Price of Cryptocurrency Long Before its Actual Fall ArXiv ID: 2411.13615 “View on arXiv” Authors: Unknown Abstract In modern times, the cryptocurrency market is one of the world’s most rapidly rising financial markets. The cryptocurrency market is regarded as more volatile and illiquid than traditional markets such as equities, foreign exchange, and commodities. The risk of this market creates an uncertain condition among investors. The purpose of this research is to predict the magnitude of the risk factor of the cryptocurrency market; this risk factor is also called volatility. Our approach will assist people who invest in the cryptocurrency market by overcoming the problems and difficulties they experience. It starts with calculating the risk factor of the cryptocurrency market from the existing parameters. For twenty elements of the cryptocurrency market, the risk factor has been predicted using different machine learning algorithms such as CNN, LSTM, BiLSTM, and GRU, all applied to the calculated risk factor parameter, and a new model has been developed to predict better than the existing models. Our proposed model’s RMSE values range from 0.0089 at best to 1.3229 at worst, whereas the existing models’ RMSE values range from 0.02769 to 14.5092, so the proposed model performs much better and generalizes properly. Using our approach, it will be easier for investors to trade in complicated and challenging financial assets like Bitcoin, Ethereum, and Dogecoin. ...
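The "risk factor" (volatility) that such models are trained to predict can be computed from prices alone. A minimal sketch, assuming daily closes and a 30-day rolling window (the paper's exact construction may differ):

```python
import numpy as np

def risk_factor(prices, window=30):
    """Annualized rolling volatility of daily log returns -- the target a
    CNN/LSTM/BiLSTM/GRU model would then learn to forecast."""
    prices = np.asarray(prices, dtype=float)
    log_rets = np.diff(np.log(prices))
    # sqrt(365): crypto markets trade every calendar day
    vols = [log_rets[t - window:t].std() * np.sqrt(365)
            for t in range(window, len(log_rets) + 1)]
    return np.array(vols)
```

The resulting series becomes the supervised-learning target; each model is judged by the RMSE of its volatility forecasts.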

November 20, 2024 · 2 min · Research Team

Deep Learning in Long-Short Stock Portfolio Allocation: An Empirical Study

Deep Learning in Long-Short Stock Portfolio Allocation: An Empirical Study ArXiv ID: 2411.13555 “View on arXiv” Authors: Unknown Abstract This paper presents an empirical study exploring the application of deep learning algorithms - Multilayer Perceptron (MLP), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Transformer - in constructing long-short stock portfolios. Two datasets comprising randomly selected stocks from the S&P 500 and NASDAQ indices, each spanning a decade of daily data, are utilized. The models predict daily stock returns based on historical features such as past returns, Relative Strength Index (RSI), trading volume, and volatility. Portfolios are dynamically adjusted by longing stocks with positive predicted returns and shorting those with negative predictions, with equal asset weights. Performance is evaluated over a two-year testing period, focusing on return, Sharpe ratio, and maximum drawdown metrics. The results demonstrate the efficacy of deep learning models in enhancing long-short stock portfolio performance. ...
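The allocation rule described above is model-agnostic and fits in a few lines: long every stock with a positive predicted return, short every stock with a negative one, with equal weights on each side. A minimal sketch:

```python
import numpy as np

def long_short_weights(pred_returns):
    """Equal-weight long-short allocation: the long leg splits +1 of exposure
    across positive predictions, the short leg splits -1 across negatives."""
    pred = np.asarray(pred_returns, dtype=float)
    w = np.zeros_like(pred)
    longs, shorts = pred > 0, pred < 0
    if longs.any():
        w[longs] = 1.0 / longs.sum()
    if shorts.any():
        w[shorts] = -1.0 / shorts.sum()
    return w
```

Rebalancing daily with these weights and realized next-day returns yields the portfolio series on which return, Sharpe ratio, and maximum drawdown are computed.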

October 31, 2024 · 2 min · Research Team

Comparative Analysis of LSTM, GRU, and Transformer Models for Stock Price Prediction

Comparative Analysis of LSTM, GRU, and Transformer Models for Stock Price Prediction ArXiv ID: 2411.05790 “View on arXiv” Authors: Unknown Abstract In today’s fast-paced financial markets, investors constantly seek ways to gain an edge and make informed decisions. Although achieving perfect accuracy in stock price predictions remains elusive, advancements in artificial intelligence (AI) have significantly enhanced our ability to analyze historical data and identify potential trends. This paper takes AI-driven stock price trend prediction as its core research, builds a model-training dataset from Tesla stock data covering 2015 to 2024, and compares LSTM, GRU, and Transformer models. The experimental results show that the LSTM model is the most consistent for stock trend prediction, achieving an accuracy of 94%. These methods ultimately allow investors to make more informed decisions and gain clearer insight into market behaviors. ...
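Abstracts like this rarely say how "accuracy" is defined for a regression-style price model; one common reading for trend prediction is directional accuracy, the share of days on which the predicted move has the right sign. This sketch is offered as an assumption, not the paper's confirmed metric:

```python
import numpy as np

def directional_accuracy(actual, predicted):
    """Fraction of days on which the predicted price change has the same
    sign as the actual price change."""
    a = np.sign(np.diff(np.asarray(actual, dtype=float)))
    p = np.sign(np.diff(np.asarray(predicted, dtype=float)))
    return float(np.mean(a == p))
```

Comparing LSTM, GRU, and Transformer predictions with the same metric on the same test window keeps the comparison apples-to-apples.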

October 20, 2024 · 2 min · Research Team

Achilles, Neural Network to Predict the Gold Vs US Dollar Integration with Trading Bot for Automatic Trading

Achilles, Neural Network to Predict the Gold Vs US Dollar Integration with Trading Bot for Automatic Trading ArXiv ID: 2410.21291 “View on arXiv” Authors: Unknown Abstract Predicting the stock market is a big challenge for the machine learning world. It is well known how difficult it is to obtain accurate and consistent predictions with ML models. Some architectures are able to capture the movement of stocks but are almost never able to be launched into production. We present Achilles: built on a classical Long Short-Term Memory (LSTM) neural network architecture, this model is able to predict the Gold vs. USD commodity pair. Using the model’s minute-by-minute predictions, we implemented a trading bot that ran for 23 days of testing, excluding weekends. At the end of the testing period we had generated $1623.52 in profit with the methodology used. The results of our method demonstrate that machine learning can successfully be applied to predict the Gold vs. USD commodity pair. ...
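The prediction-to-bot step can be made concrete with a toy rule: hold one unit long for the next bar whenever the model's predicted price exceeds the current price, and stay flat otherwise. This is a deliberately simplified sketch (no position sizing, spreads, or fees), not the paper's actual execution logic:

```python
def backtest_signal(prices, predictions):
    """Toy minute-bar bot: long one unit when the model predicts the next
    price above the current one, flat otherwise; returns total P&L."""
    pnl = 0.0
    for t in range(len(prices) - 1):
        if predictions[t] > prices[t]:        # model expects the price to rise
            pnl += prices[t + 1] - prices[t]  # realize the one-bar move
    return pnl
```

In a real gold/USD deployment, spread and slippage per round trip would be subtracted from each realized move.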

October 13, 2024 · 2 min · Research Team

Stock Price Prediction and Traditional Models: An Approach to Achieve Short-, Medium- and Long-Term Goals

Stock Price Prediction and Traditional Models: An Approach to Achieve Short-, Medium- and Long-Term Goals ArXiv ID: 2410.07220 “View on arXiv” Authors: Unknown Abstract A comparative analysis of deep learning models and traditional statistical methods for stock price prediction is conducted using data from the Nigerian Stock Exchange. Historical data, including daily prices and trading volumes, are employed to implement models such as Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), Autoregressive Integrated Moving Average (ARIMA), and Autoregressive Moving Average (ARMA). These models are assessed over three time horizons: short-term (1 year), medium-term (2.5 years), and long-term (5 years), with performance measured by Mean Squared Error (MSE) and Mean Absolute Error (MAE). The stationarity of the time series is tested using the Augmented Dickey-Fuller (ADF) test. Results reveal that deep learning models, particularly LSTM, outperform traditional methods by capturing complex, nonlinear patterns in the data, resulting in more accurate predictions. However, these models require greater computational resources and offer less interpretability than traditional approaches. The findings highlight the potential of deep learning for improving financial forecasting and investment strategies. Future research could incorporate external factors such as social media sentiment and economic indicators, refine model architectures, and explore real-time applications to enhance prediction accuracy and scalability. ...
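The two error metrics used for every model-horizon pair are standard and worth pinning down, since MSE penalizes large misses quadratically while MAE weights all misses linearly. A minimal sketch:

```python
import numpy as np

def mse_mae(y_true, y_pred):
    """Mean Squared Error and Mean Absolute Error for one model on one horizon."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err)))
```

Computing both on identical test windows for LSTM, GRU, ARIMA, and ARMA is what makes the cross-horizon comparison meaningful.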

September 29, 2024 · 2 min · Research Team

Optimizing Time Series Forecasting: A Comparative Study of Adam and Nesterov Accelerated Gradient on LSTM and GRU networks Using Stock Market data

Optimizing Time Series Forecasting: A Comparative Study of Adam and Nesterov Accelerated Gradient on LSTM and GRU networks Using Stock Market data ArXiv ID: 2410.01843 “View on arXiv” Authors: Unknown Abstract Several studies have discussed the impact of different optimization techniques on time series forecasting across different neural network architectures. This paper examines the effectiveness of the Adam and Nesterov Accelerated Gradient (NAG) optimization techniques on LSTM and GRU neural networks for time series prediction, specifically stock market time series. We trained LSTM and GRU models with the two optimization techniques, then compared and evaluated their performance on Apple Inc.’s closing price data over the last decade. The GRU model optimized with Adam produced the lowest RMSE, outperforming the other model-optimizer combinations in both accuracy and convergence speed. The GRU models with both optimizers outperformed the LSTM models, whilst the Adam optimizer outperformed the NAG optimizer for both model architectures. The results suggest that GRU models optimized with Adam are well-suited for practitioners in time-series prediction, more specifically stock price prediction, producing accurate and computationally efficient models. The code for the experiments in this project can be found at https://github.com/AhmadMak/Time-Series-Optimization-Research Keywords: Time-series Forecasting, Neural Network, LSTM, GRU, Adam Optimizer, Nesterov Accelerated Gradient (NAG) Optimizer ...
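The difference between the two optimizers is easiest to see in their update rules: Adam rescales the gradient by bias-corrected moment estimates, while NAG evaluates the gradient at a momentum look-ahead point. A minimal scalar sketch (textbook formulas, independent of any framework implementation), with a toy demo minimizing f(x) = x²:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first/second moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def nag_step(theta, grad_fn, velocity, lr=1e-3, momentum=0.9):
    """One Nesterov step: the gradient is taken at the look-ahead point."""
    lookahead = theta + momentum * velocity
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    return theta + velocity, velocity

# Demo: both optimizers drive f(x) = x^2 (gradient 2x) toward the minimum at 0.
theta_a, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    theta_a, m, v = adam_step(theta_a, 2.0 * theta_a, m, v, t, lr=0.1)
theta_n, vel = 1.0, 0.0
for _ in range(200):
    theta_n, vel = nag_step(theta_n, lambda x: 2.0 * x, vel, lr=0.05)
```

Adam's per-coordinate step normalization is one common explanation for its faster convergence on noisy financial-series gradients.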

September 28, 2024 · 2 min · Research Team

Pricing American Options using Machine Learning Algorithms

Pricing American Options using Machine Learning Algorithms ArXiv ID: 2409.03204 “View on arXiv” Authors: Unknown Abstract This study investigates the application of machine learning algorithms to pricing American options using Monte Carlo simulations. Traditional models, such as the Black-Scholes-Merton framework, often fail to adequately address the complexities of American options, which include the ability for early exercise and non-linear payoff structures. Machine learning was applied within the Least Squares Method (LSM), in conjunction with Monte Carlo simulation, with the aim of improving the accuracy and efficiency of option pricing. The study evaluates several machine learning models, including neural networks and decision trees, highlighting their potential to outperform traditional approaches. The results from applying machine learning algorithms within LSM indicate that integrating machine learning with Monte Carlo simulations can enhance pricing accuracy and provide more robust predictions, offering significant insights into quantitative finance by merging classical financial theories with modern computational techniques. The dataset was split into features and a target variable representing bid prices, with an 80-20 train-validation split. LSTM and GRU models were constructed using TensorFlow’s Keras API, each with four hidden layers of 200 neurons and an output layer for bid price prediction, optimized with the Adam optimizer and MSE loss function. The GRU model outperformed the LSTM model across all evaluated metrics, demonstrating lower mean absolute error, mean squared error, and root mean squared error, along with greater stability and efficiency in training. ...
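The Least Squares Monte Carlo (Longstaff-Schwartz) backbone that such studies augment with ML regressors can be sketched with a plain quadratic polynomial regression standing in for the learned model. Parameter values below are illustrative defaults, not the paper's settings:

```python
import numpy as np

def lsm_american_put(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                     steps=50, paths=20000, seed=0):
    """Longstaff-Schwartz pricing of an American put under GBM: regress
    discounted continuation values on a quadratic in the stock price and
    exercise whenever the immediate payoff beats the fitted continuation."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    z = rng.standard_normal((paths, steps))
    # Simulated GBM paths at times dt, 2*dt, ..., T
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * z, axis=1))
    cash = np.maximum(K - S[:, -1], 0.0)       # payoff if held to maturity
    for t in range(steps - 2, -1, -1):
        cash *= np.exp(-r * dt)                # discount one step back
        itm = K - S[:, t] > 0                  # regress on in-the-money paths only
        if itm.sum() > 3:
            x = S[itm, t]
            coef = np.polyfit(x, cash[itm], 2)
            continuation = np.polyval(coef, x)
            exercise = (K - x) > continuation
            idx = np.flatnonzero(itm)[exercise]
            cash[idx] = K - S[idx, t]          # exercise now on these paths
    return float(np.exp(-r * dt) * cash.mean())
```

Swapping `np.polyfit` for a neural network or decision-tree regressor at the continuation-value step is exactly where the ML variants differ from classical LSM.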

September 5, 2024 · 2 min · Research Team