The bias of IID resampled backtests for rolling-window mean-variance portfolios
ArXiv ID: 2505.06383
Authors: Andrew Paskaramoorthy, Terence van Zyl, Tim Gebbie
Abstract
Backtests on historical data are the basis for practical evaluations of portfolio selection rules, but their reliability is often limited by reliance on a single sample path. This can lead to high estimation variance. Resampling techniques offer a potential solution by increasing the effective sample size, but they can disrupt the temporal ordering inherent in financial data and introduce significant bias. This paper investigates two questions: first, how large is this bias for Sharpe Ratio estimates, and second, what are its primary drivers? We focus on the canonical rolling-window mean-variance portfolio rule. Our contributions are identifying the bias mechanism and providing a practical heuristic for gauging bias severity. We show that the bias arises from the disruption of train-test dependence linked to the return auto-covariance structure, and we derive bounds for the bias that show a strong dependence on the observable first-lag autocorrelation. Simulations confirm these findings and reveal that the resulting Sharpe Ratio bias is often a fraction of a typical backtest's estimation noise, benefiting from partial offsetting of component biases. Empirical analysis further illustrates that differences between IID-resampled and standard backtests align qualitatively with these drivers. Surprisingly, our results suggest that, although IID resampling can disrupt temporal dependence, the resulting bias is often tolerable. Nevertheless, we highlight the need for structure-preserving resampling methods.
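The abstract ties bias severity to the observable first-lag autocorrelation. As a hedged illustration (not the paper's exact heuristic, which is not reproduced here), the sketch below estimates the sample lag-1 autocorrelation of a return series and shows that IID resampling with replacement drives it toward zero. The AR(1) generator `simulate_ar1` and its parameters (`phi`, `mu`, `sigma`, `seed`) are hypothetical choices made only for this example.

```python
import random
from statistics import fmean


def lag1_autocorr(x):
    """Sample first-lag autocorrelation of a return series."""
    m = fmean(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def simulate_ar1(n, phi, mu=0.0005, sigma=0.01, seed=0):
    """Hypothetical AR(1) returns: r_t = mu + phi * (r_{t-1} - mu) + eps_t."""
    rng = random.Random(seed)
    r = [mu]
    for _ in range(n - 1):
        r.append(mu + phi * (r[-1] - mu) + rng.gauss(0.0, sigma))
    return r


def iid_resample(x, seed=1):
    """Draw len(x) observations with replacement, discarding their time order."""
    rng = random.Random(seed)
    return rng.choices(x, k=len(x))


if __name__ == "__main__":
    r = simulate_ar1(5000, phi=0.3)
    print("original path lag-1 autocorrelation:", round(lag1_autocorr(r), 3))
    print("IID-resampled lag-1 autocorrelation:", round(lag1_autocorr(iid_resample(r)), 3))
```

On the original path the estimate should land near the chosen `phi = 0.3`, while on the resampled path it should be close to zero: this is the temporal dependence that IID resampling removes.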
Keywords: Backtesting, Resampling Techniques, Rolling-window Mean-Variance, Sharpe Ratio Bias, Auto-covariance Structure, Portfolio Management / General Financial Assets
Complexity vs Empirical Score
- Math Complexity: 7.5/10
- Empirical Rigor: 6.0/10
- Quadrant: Holy Grail
- Why: The paper employs advanced mathematical derivations to establish bias bounds and analyze the impact of temporal dependence on IID resampling, placing it at the higher end of math complexity. It also validates its findings with simulations and real-world data, but it does not provide extensive backtesting code or large-scale datasets, so empirical rigor is rated moderate.
flowchart TD
A["Research Goal:<br>Quantify Bias of IID Resampling<br>in Rolling-Window MV Backtests"] --> B["Key Methodology:<br>Theoretical Analysis + Simulations"]
B --> C["Inputs:<br>Historical Return Data<br>Auto-covariance Structure"]
C --> D["Computational Process:<br>1. Standard Rolling-Window MV<br>2. IID Resampled Rolling-Window MV<br>3. Compare Sharpe Ratio Estimates"]
D --> E["Key Findings:<br>Bias Mechanism: Disrupted<br>Train-Test Dependence<br>Bias Bounds Depend Strongly<br>on First-Lag Autocorrelation"]
E --> F["Outcome:<br>Resampling Bias is Often<br>Tolerable vs. Estimation Noise<br>Recommends Structure-Preserving<br>Methods"]
F --> G((End))
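The three computational steps in the flowchart can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic independent AR(1) series from a hypothetical `simulate` helper rather than historical returns. Returns are treated as excess returns. The mean-variance rule is taken to be the unconstrained direction w = Σ⁻¹μ estimated on a trailing window. IID resampling draws whole time-period rows with replacement. All function names and parameters (`window`, `n_boot`, `seed`) are illustrative. Under these assumptions, the training window in the standard backtest sits immediately before each test period, so autocorrelated returns link the estimated means to the next return; resampling breaks that train-test link.

```python
import random
from statistics import fmean, stdev


def simulate(T, mus, phis, sigma=0.01, seed=0):
    """Hypothetical multi-asset returns: each asset is an independent AR(1) around its mean."""
    rng = random.Random(seed)
    prev, out = list(mus), []
    for _ in range(T):
        row = [m + p * (x - m) + rng.gauss(0.0, sigma) for m, p, x in zip(mus, phis, prev)]
        out.append(row)
        prev = row
    return out


def mean_cov(rows):
    """Sample mean vector and (n-1)-normalised covariance matrix of a list of return rows."""
    n, k = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1) for j in range(k)]
           for i in range(k)]
    return mu, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rolling_mv_sharpe(returns, window):
    """Per-period out-of-sample Sharpe ratio of a rolling-window mean-variance rule.

    Weights w = Sigma^{-1} mu are estimated on the `window` rows before t and
    applied to row t. A constant risk-aversion factor would only rescale every
    portfolio return by the same positive number, leaving the Sharpe ratio unchanged.
    """
    pnl = []
    for t in range(window, len(returns)):
        mu, cov = mean_cov(returns[t - window:t])
        w = solve(cov, mu)
        pnl.append(sum(wi * ri for wi, ri in zip(w, returns[t])))
    return fmean(pnl) / stdev(pnl)


def iid_resampled_sharpe(returns, window, n_boot=20, seed=0):
    """Average Sharpe ratio over rolling backtests run on IID-resampled paths.

    Whole cross-sectional rows are drawn with replacement, so contemporaneous
    correlation is kept but the time ordering is lost.
    """
    rng = random.Random(seed)
    srs = [rolling_mv_sharpe(rng.choices(returns, k=len(returns)), window)
           for _ in range(n_boot)]
    return fmean(srs)


if __name__ == "__main__":
    data = simulate(400, mus=[0.0006, 0.0004, 0.0005], phis=[0.2, 0.1, 0.15])
    sr_path = rolling_mv_sharpe(data, window=60)
    sr_iid = iid_resampled_sharpe(data, window=60, n_boot=20)
    print(f"standard backtest SR:    {sr_path:.4f}")
    print(f"IID-resampled mean SR:   {sr_iid:.4f}")
    print(f"difference (IID - path): {sr_iid - sr_path:.4f}")
```

A single-path difference like this mixes any resampling bias with ordinary estimation noise. Because bias is a statement about expectations, a fairer check would average both quantities over many simulated paths from the same process. The window length of 60 and the AR coefficients are arbitrary choices for the sketch.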