Risk-Adjusted Performance of Random Forest Models in High-Frequency Trading

ArXiv ID: 2412.15448

Authors: Unknown

Abstract

Because of the theoretical challenges that the Efficient Market Hypothesis poses to technical analysis, the effectiveness of technical indicators in high-frequency trading remains inadequately explored, particularly at the minute-level frequency, where market microstructure effects dominate. This study evaluates the integration of traditional technical indicators with random forest regression models using minute-level SPY data, analyzing 13 distinct model configurations. Our empirical results reveal a stark contrast between in-sample and out-of-sample performance, with $R^2$ values deteriorating from 0.749–0.812 during training to negative values in testing. A feature importance analysis demonstrates that primary price-based features dominate the model's predictions, accounting for over 60% of the importance, while established technical indicators, such as RSI and Bollinger Bands, account for only 14%–15%. Although the indicator-enhanced models achieved superior risk-adjusted metrics, with Rachev ratios between 0.919 and 0.961, they consistently underperformed a simple buy-and-hold strategy, generating returns ranging from -2.4% to -3.9%. These findings challenge conventional assumptions about the usefulness of technical indicators in algorithmic trading, suggesting that in high-frequency contexts they may be more relevant to risk management than to predicting returns. For practitioners and researchers, our findings indicate that successful high-frequency trading strategies should focus on adaptive feature selection and regime-specific modeling rather than relying on traditional technical indicators. They also underline the critical importance of robust out-of-sample testing in model development.
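
The abstract's negative out-of-sample $R^2$ can look surprising, so a small sketch may help. It assumes the standard definition $R^2 = 1 - SS_{res}/SS_{tot}$; the paper does not state which implementation it used, and the function name `r_squared` and the toy numbers below are illustrative assumptions, not data from the study. Under this definition, $R^2$ drops below zero whenever a model's squared errors exceed those of simply predicting the mean of the test targets.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the standard definition; the helper name is hypothetical.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy targets (NOT from the paper), chosen so the arithmetic is easy to check.
y_test = [1.0, 2.0, 3.0]

perfect = r_squared(y_test, [1.0, 2.0, 3.0])     # 1.0: no residual error
mean_only = r_squared(y_test, [2.0, 2.0, 2.0])   # 0.0: same as the mean baseline
reversed_ = r_squared(y_test, [3.0, 2.0, 1.0])   # -3.0: worse than the mean baseline

print(perfect, mean_only, reversed_)
```

Read this way, a negative test-set $R^2$ means the fitted model forecast worse than a constant prediction would have on unseen data, which is consistent with the overfitting the abstract describes.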

Keywords: High-Frequency Trading (HFT), Random Forest, Technical Indicators, Microstructure, Feature Importance

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 8.0/10
  • Quadrant: Street Traders
  • Why: The paper relies on standard statistical metrics and machine learning implementation details (e.g., random forest regression on minute-level data) rather than advanced mathematical derivations, yet it is heavily data-driven, featuring specific backtest results, out-of-sample testing, and risk-adjusted performance metrics like the Rachev ratio.
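
Since the Rachev ratio is less familiar than the Sharpe ratio, here is a minimal sketch of one common empirical form: the average of the best `alpha` fraction of returns divided by the average magnitude of the worst `beta` fraction. The function name `rachev_ratio`, the 5% default tail levels, the floor-based tail sizing, and the sample returns are all assumptions made for illustration; the paper's exact tail levels and estimator are not given in this summary.

```python
import math


def rachev_ratio(returns, alpha=0.05, beta=0.05):
    """Empirical Rachev ratio: expected tail gain / expected tail loss.

    alpha and beta are the upper- and lower-tail fractions. The 5% defaults
    are an assumption, not values taken from the paper.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)
    n = len(ordered)
    # Use at least one observation in each tail.
    k_gain = max(1, math.floor(alpha * n))
    k_loss = max(1, math.floor(beta * n))
    # Mean of the largest k_gain returns.
    expected_tail_gain = sum(ordered[-k_gain:]) / k_gain
    # Mean of the smallest k_loss returns, sign-flipped into a loss magnitude.
    expected_tail_loss = -sum(ordered[:k_loss]) / k_loss
    if expected_tail_loss <= 0:
        raise ValueError("lower tail contains no net loss; ratio undefined")
    return expected_tail_gain / expected_tail_loss


# Made-up per-period returns (NOT from the paper).
sample = [-0.04, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.03]
print(rachev_ratio(sample, alpha=0.25, beta=0.25))
```

Under this form, a value below 1 means the average extreme loss is larger than the average extreme gain, so the reported range of 0.919 to 0.961 would describe tails that are close to symmetric but slightly loss-heavy.
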

flowchart TD
    subgraph Goal
        A["Research Goal<br>Evaluate RF integration with technical indicators<br>for minute-level HFT performance"]
    end

    subgraph Methodology
        B["Data & Configuration<br>Minute-level SPY data<br>13 Random Forest model configurations"]
        C["Computational Process<br>In-Sample Training vs Out-of-Sample Testing<br>Feature Importance Analysis<br>Risk-Adjusted Metric Calculation"]
    end

    subgraph Outcomes
        D["Performance Results<br>R^2 drop 0.749-0.812 (Train) to Negative (Test)<br>Returns: -2.4% to -3.9%"]
        E["Feature Analysis<br>Price-based features: >60% importance<br>Technical indicators (RSI/Bollinger): 14-15%"]
        F["Key Findings<br>Technical indicators relate to risk management<br>Focus on adaptive feature selection & regime-specific modeling"]
    end

    A --> B
    B --> C
    C --> D
    C --> E
    D --> F
    E --> F