HARd to Beat: The Overlooked Impact of Rolling Windows in the Era of Machine Learning

ArXiv ID: 2406.08041

Authors: Unknown

Abstract

We investigate the predictive abilities of the heterogeneous autoregressive (HAR) model compared to machine learning (ML) techniques across an unprecedented dataset of 1,455 stocks. Our analysis focuses on the role of fitting schemes, particularly the training window and re-estimation frequency, in determining the HAR model’s performance. Despite extensive hyperparameter tuning, ML models fail to surpass the linear benchmark set by HAR when utilizing a refined fitting approach for the latter. Moreover, the simplicity of HAR allows for an interpretable model with drastically lower computational costs. We assess performance using QLIKE, MSE, and realized utility metrics, finding that HAR consistently outperforms its ML counterparts when both rely solely on realized volatility and VIX as predictors. Our results underscore the importance of a correctly specified fitting scheme. They suggest that properly fitted HAR models provide superior forecasting accuracy, establishing robust guidelines for their practical application and use as a benchmark. This study not only reaffirms the efficacy of the HAR model but also provides a critical perspective on the practical limitations of ML approaches in realized volatility forecasting.

Keywords: Heterogeneous Autoregressive (HAR) Model, Realized Volatility Forecasting, Machine Learning, Time Series Analysis, QLIKE, Equities
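
The fitting scheme the abstract refers to (training window length and re-estimation frequency) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the standard HAR specification with daily, weekly (5-day), and monthly (22-day) realized-volatility averages fitted by OLS, omits the VIX predictor, and uses the common QLIKE form `RV/F - log(RV/F) - 1`. The function names, the `window` and `refit_every` parameters, the positivity floor on forecasts, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def har_features(rv):
    """Build HAR regressors (intercept, daily, weekly, monthly RV averages).

    Row i describes day t = i + 21 and is paired with the target rv[t + 1].
    """
    rv = np.asarray(rv, dtype=float)
    t = np.arange(21, len(rv) - 1)  # days with 22 lags and a next-day target
    csum = np.concatenate(([0.0], np.cumsum(rv)))
    daily = rv[t]
    weekly = (csum[t + 1] - csum[t - 4]) / 5.0     # mean of rv[t-4 .. t]
    monthly = (csum[t + 1] - csum[t - 21]) / 22.0  # mean of rv[t-21 .. t]
    X = np.column_stack([np.ones(len(t)), daily, weekly, monthly])
    return X, rv[t + 1]


def rolling_har_forecast(rv, window=500, refit_every=1):
    """One-step-ahead HAR forecasts from a rolling OLS fit.

    window      : number of most recent observations used for estimation
    refit_every : re-estimate coefficients every this many days
    """
    X, y = har_features(rv)
    preds = np.full(len(y), np.nan)
    beta = None
    for i in range(window, len(y)):
        if beta is None or (i - window) % refit_every == 0:
            # Training targets end at y[i-1] = rv[t], known when forecasting rv[t+1].
            beta, *_ = np.linalg.lstsq(X[i - window:i], y[i - window:i], rcond=None)
        # Assumption (not from the source): floor forecasts at the smallest
        # RV in the training window so that QLIKE stays well defined.
        preds[i] = max(X[i] @ beta, y[i - window:i].min())
    return preds[window:], y[window:]


def qlike(actual, forecast):
    """QLIKE loss in the form RV/F - log(RV/F) - 1, averaged over days."""
    ratio = np.asarray(actual) / np.asarray(forecast)
    return float(np.mean(ratio - np.log(ratio) - 1.0))


def mse(actual, forecast):
    """Mean squared forecast error."""
    return float(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2))


# Synthetic, persistent, strictly positive series standing in for realized volatility.
rng = np.random.default_rng(0)
log_rv = np.zeros(1500)
for k in range(1, len(log_rv)):
    log_rv[k] = 0.95 * log_rv[k - 1] + 0.2 * rng.standard_normal()
rv_demo = np.exp(log_rv)

# Compare a few hypothetical fitting schemes on the same series.
for window, refit_every in [(250, 1), (500, 1), (500, 22)]:
    f, a = rolling_har_forecast(rv_demo, window=window, refit_every=refit_every)
    print(window, refit_every, round(qlike(a, f), 4), round(mse(a, f), 4))
```

Looping over `window` and `refit_every` pairs in this way is one simple means of comparing fitting schemes on a single series. Note that the out-of-sample periods differ across window lengths in this sketch, so a fair comparison would align the evaluation dates first.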

Complexity vs Empirical Score

Math Complexity: 4.0/10
Empirical Rigor: 8.5/10
Quadrant: Street Traders
Why: The paper relies on straightforward autoregressive and statistical models with minimal advanced mathematics, but demonstrates high empirical rigor through extensive backtesting on a large dataset (1,455 stocks), comprehensive hyperparameter tuning, and robust error metrics (MSE, QLIKE, realized utility).

  flowchart TD
    A["Research Goal<br>Assess HAR vs. ML Models<br>for Volatility Forecasting"] --> B["Dataset<br>1,455 Stocks<br>Realized Vol & VIX"]
    B --> C["Modeling Approaches"]
    C --> D["HAR Model<br>Linear Benchmark"]
    C --> E["Machine Learning<br>Models"]
    D --> F["Optimized Fitting<br>Window & Re-estimation"]
    E --> G["Extensive<br>Hyperparameter Tuning"]
    F --> H{"Performance Evaluation<br>Metrics: QLIKE, MSE, Utility"}
    G --> H
    H --> I["Key Findings<br>HAR Outperforms ML<br>Low Computational Cost<br>Correct Fitting is Crucial"]
