HARd to Beat: The Overlooked Impact of Rolling Windows in the Era of Machine Learning
ArXiv ID: 2406.08041
Authors: Unknown
Abstract
We investigate the predictive abilities of the heterogeneous autoregressive (HAR) model compared to machine learning (ML) techniques across an unprecedented dataset of 1,455 stocks. Our analysis focuses on the role of fitting schemes, particularly the training window and re-estimation frequency, in determining the HAR model’s performance. Despite extensive hyperparameter tuning, ML models fail to surpass the linear benchmark set by HAR when utilizing a refined fitting approach for the latter. Moreover, the simplicity of HAR allows for an interpretable model with drastically lower computational costs. We assess performance using QLIKE, MSE, and realized utility metrics, finding that HAR consistently outperforms its ML counterparts when both rely solely on realized volatility and VIX as predictors. Our results underscore the importance of a correctly specified fitting scheme. They suggest that properly fitted HAR models provide superior forecasting accuracy, establishing robust guidelines for their practical application and use as a benchmark. This study not only reaffirms the efficacy of the HAR model but also provides a critical perspective on the practical limitations of ML approaches in realized volatility forecasting.
Keywords: Heterogeneous Autoregressive (HAR) Model, Realized Volatility Forecasting, Machine Learning, Time Series Analysis, QLIKE, Equities
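The fitting scheme the abstract refers to (training window length and re-estimation frequency) can be illustrated with a minimal sketch. It assumes the standard HAR specification, in which next-day realized volatility is regressed on the latest daily value and its 5-day and 22-day averages, estimated by ordinary least squares. The function names `har_features` and `rolling_har_forecast`, and the `window` and `refit_every` defaults, are hypothetical placeholders and not the values or code used in the paper. The VIX predictor mentioned in the abstract is left out for brevity.

```python
import numpy as np


def har_features(rv):
    """Build HAR regressors (intercept, daily, weekly, monthly RV) and next-day targets."""
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(21, len(rv) - 1):
        daily = rv[t]
        weekly = rv[t - 4 : t + 1].mean()    # average over the last 5 days
        monthly = rv[t - 21 : t + 1].mean()  # average over the last 22 days
        rows.append([1.0, daily, weekly, monthly])
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)


def rolling_har_forecast(rv, window=600, refit_every=1):
    """One-step-ahead HAR forecasts from a rolling training window.

    window      : number of past observations used in each OLS fit (assumed value)
    refit_every : re-estimate coefficients every `refit_every` steps (assumed value)
    """
    X, y = har_features(rv)
    preds = np.full(len(y), np.nan)
    beta = None
    for i in range(window, len(y)):
        # Refit only on the chosen schedule; otherwise reuse the last coefficients.
        if beta is None or (i - window) % refit_every == 0:
            # Training rows end at i - 1, whose target is already observed at forecast time.
            beta, *_ = np.linalg.lstsq(X[i - window : i], y[i - window : i], rcond=None)
        preds[i] = X[i] @ beta
    return preds, y
```

Calling `rolling_har_forecast(rv, window=w, refit_every=k)` over a grid of `w` and `k` values is one way to explore how sensitive out-of-sample accuracy is to the fitting scheme, which is the comparison the paper emphasizes.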
Complexity vs Empirical Score
- Math Complexity: 4.0/10
- Empirical Rigor: 8.5/10
- Quadrant: Street Traders
- Why: The paper relies on straightforward autoregressive and statistical models with minimal advanced mathematics, but demonstrates high empirical rigor through extensive backtesting on a large dataset (1,455 stocks), comprehensive hyperparameter tuning, and robust error metrics (MSE, QLIKE, utility).
flowchart TD
A["Research Goal<br>Assess HAR vs. ML Models<br>for Volatility Forecasting"] --> B["Dataset<br>1,455 Stocks<br>Realized Vol & VIX"]
B --> C["Modeling Approaches"]
C --> D["HAR Model<br>Linear Benchmark"]
C --> E["Machine Learning<br>Models"]
D --> F["Optimized Fitting<br>Window & Re-estimation"]
E --> G["Extensive<br>Hyperparameter Tuning"]
F --> H{"Performance Evaluation<br>Metrics: QLIKE, MSE, Utility"}
G --> H
H --> I["Key Findings<br>HAR Outperforms ML<br>Low Computational Cost<br>Correct Fitting is Crucial"]
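Two of the evaluation metrics named above, MSE and QLIKE, can be sketched as follows. The QLIKE form used here (ratio minus log-ratio minus one) is one common definition from the volatility forecasting literature; the paper may use a different normalization, so treat this as an assumption. The realized utility metric is not shown. The function names `mse` and `qlike` are illustrative.

```python
import numpy as np


def mse(actual, forecast):
    """Mean squared error between realized and forecast volatility."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((actual - forecast) ** 2))


def qlike(actual, forecast):
    """QLIKE loss in the form mean(a/f - log(a/f) - 1); inputs must be strictly positive."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ratio = actual / forecast
    return float(np.mean(ratio - np.log(ratio) - 1.0))
```

Under this definition QLIKE is zero for a perfect forecast and penalizes under-prediction more heavily than over-prediction by the same factor, which is one reason it is often reported alongside the symmetric MSE.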