Enhanced Momentum with Momentum Transformers
ArXiv ID: 2412.12516
Authors: Unknown
Abstract
The primary objective of this research is to build a Momentum Transformer that is expected to outperform benchmark time-series momentum and mean-reversion trading strategies. We extend the ideas introduced in the paper Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture to individual equities, since the original paper builds primarily on futures and equity indices. Unlike conventional Long Short-Term Memory (LSTM) models, which operate sequentially and are optimized for processing local patterns, an attention mechanism gives our architecture direct access to all prior time steps in the training window. This hybrid design, combining attention with an LSTM, enables the model to capture long-term dependencies, maintain performance once transaction costs are accounted for, and adapt to evolving market conditions, such as those witnessed during the COVID-19 pandemic. We average 4.14% returns, which is similar to the original paper's results. Our Sharpe ratio is lower, averaging 1.12, because of much higher volatility, likely reflecting that individual stocks are inherently more volatile than futures and indices.
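To make the hybrid design concrete, below is a minimal PyTorch sketch. The layer sizes, the class name MomentumTransformerSketch, and the use of a single nn.MultiheadAttention layer over the LSTM states are all illustrative assumptions; the paper's actual Momentum Transformer also includes GRN and GLU components not shown here.

```python
import torch
import torch.nn as nn

class MomentumTransformerSketch(nn.Module):
    """Illustrative sketch: an LSTM whose hidden states are re-weighted by
    causal self-attention, so each step can see all prior time steps."""

    def __init__(self, n_features: int, d_hidden: int = 64, n_heads: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, d_hidden, batch_first=True)
        # Attention gives each step direct access to every earlier step,
        # unlike the purely sequential LSTM recurrence.
        self.attn = nn.MultiheadAttention(d_hidden, n_heads, batch_first=True)
        self.head = nn.Linear(d_hidden, 1)  # one position signal per step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features) of past return/volatility features
        h, _ = self.lstm(x)
        # Causal mask: True entries are blocked, so a step attends only
        # to itself and the past.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device),
                          diagonal=1)
        a, _ = self.attn(h, h, h, attn_mask=mask)
        return torch.tanh(self.head(a))  # position sizes in [-1, 1]

# Usage: MomentumTransformerSketch(8)(torch.randn(32, 63, 8)) -> (32, 63, 1)
```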
Keywords: Momentum Trading, Transformer Model, Time-Series Analysis, Attention Mechanism, LSTM
Complexity vs Empirical Score
- Math Complexity: 8.5/10
- Empirical Rigor: 7.0/10
- Quadrant: Holy Grail
- Why: The paper employs an advanced deep learning architecture (a Transformer with attention mechanisms, GRNs, and GLUs) and a sophisticated loss function (negative Sharpe ratio optimization, sketched below), indicating high mathematical complexity. It demonstrates strong empirical rigor with detailed backtesting on real financial data (CRSP/Compustat), explicit handling of data leakage and survivorship bias, and reporting of performance metrics such as Sharpe ratios and returns.
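The negative Sharpe ratio objective can be expressed as a small differentiable loss. A minimal sketch, assuming (as is standard in this literature, though the exact formulation here is ours) that the model outputs position sizes which are multiplied by realized next-period returns:

```python
import torch

def negative_sharpe_loss(positions: torch.Tensor,
                         next_returns: torch.Tensor,
                         eps: float = 1e-8) -> torch.Tensor:
    """Differentiable negative Sharpe ratio over a batch of strategy returns.

    positions:    (batch, time, 1) model outputs in [-1, 1]
    next_returns: (batch, time, 1) realized next-period asset returns
    """
    strat = (positions * next_returns).flatten()
    mean, std = strat.mean(), strat.std()
    # Maximizing the Sharpe ratio == minimizing its negative;
    # eps guards against division by zero for constant returns.
    return -mean / (std + eps)
```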
```mermaid
flowchart TD
    A["Research Goal<br>Build a Momentum Transformer to outperform<br>benchmark momentum and mean-reversion strategies"] --> B["Methodology<br>Hybrid Architecture: Transformer (Attention) + LSTM<br>Adapted for Equity Markets"]
    B --> C["Input Data<br>Equity Time-Series Data<br>(including COVID-19 pandemic period)"]
    C --> D["Computational Process<br>Training with Attention Mechanism<br>Capturing long-term dependencies & local patterns"]
    D --> E["Key Findings & Outcomes<br>• Avg Return: 4.14% (Comparable to Futures)<br>• Avg Sharpe: 1.12 (Lower due to stock volatility)<br>• Effective handling of transaction costs"]
```
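The headline metrics in the flowchart (average return and Sharpe ratio) follow from a standard annualization of per-period strategy returns. A minimal sketch with synthetic data for illustration only (the figures shown are not the paper's results, and the 252-day convention is an assumption):

```python
import numpy as np

def annualized_metrics(daily_returns: np.ndarray, periods: int = 252):
    """Annualized mean return and Sharpe ratio from daily strategy returns."""
    mean = daily_returns.mean() * periods
    vol = daily_returns.std() * np.sqrt(periods)
    return mean, mean / vol

# Synthetic example: ~10 years of random daily returns.
rng = np.random.default_rng(0)
ret, sharpe = annualized_metrics(rng.normal(2e-4, 1e-2, 2520))
print(f"annualized return {ret:.2%}, Sharpe {sharpe:.2f}")
```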