ProteuS: A Generative Approach for Simulating Concept Drift in Financial Markets
ArXiv ID: 2509.11844
Authors: Andrés L. Suárez-Cetrulo, Alejandro Cervantes, David Quintana
Abstract
Financial markets are complex, non-stationary systems where the underlying data distributions can shift over time, a phenomenon known as regime change in finance and as concept drift in the machine learning literature. These shifts, often triggered by major economic events, pose a significant challenge for traditional statistical and machine learning models. A fundamental problem in developing and validating adaptive algorithms is the lack of a ground truth in real-world financial data, which makes it difficult to evaluate a model’s ability to detect and recover from these drifts. This paper addresses this challenge by introducing a novel framework, named ProteuS, for generating semi-synthetic financial time series with pre-defined structural breaks. Our methodology involves fitting ARMA-GARCH models to real-world ETF data to capture distinct market regimes, and then simulating realistic transitions between them, both gradual and abrupt. The resulting datasets, which include a comprehensive set of technical indicators, provide a controlled environment with a known ground truth of regime changes. An analysis of the generated data confirms the complexity of the task, revealing significant overlap between the different market states. We aim to provide the research community with a tool for the rigorous evaluation of concept drift detection and adaptation mechanisms, paving the way for more robust financial forecasting models.
Keywords: Regime Change Detection, Concept Drift, Time Series Simulation, ARMA-GARCH, Semi-Synthetic Data, ETFs
Complexity vs Empirical Score
- Math Complexity: 6.5/10
- Empirical Rigor: 8.0/10
- Quadrant: Holy Grail
- Why: The paper employs advanced statistical models (ARMA-GARCH) and a structured generative framework, demonstrating moderate-to-high mathematical complexity. It is highly empirical, with a concrete implementation (GitHub repo), simulated datasets for backtesting, and a clear methodology for evaluating drift detection mechanisms.
flowchart TD
A["Research Goal<br>Simulate concept drift with<br>known ground truth for<br>financial ML models"] --> B["Methodology<br>ARMA-GARCH fitting &<br>Transition simulation"]
B --> C["Inputs<br>Historical ETF Data"]
C --> D["Computations<br>1. Fit ARMA-GARCH models<br>to distinct regimes<br>2. Simulate abrupt/gradual<br>transitions"]
D --> E["Output<br>Semi-Synthetic Time Series<br>with pre-defined structural breaks"]
E --> F["Key Findings<br>1. Validated drift detection<br>2. Confirmed significant overlap<br>between market states<br>3. Provided controlled<br>benchmark dataset"]
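The two computation steps in the flowchart (regime-specific ARMA-GARCH models, then abrupt or gradual transitions between them) can be illustrated with a minimal NumPy sketch. This is not the ProteuS implementation: the model orders (ARMA(1,1)-GARCH(1,1)), the parameter values in `CALM` and `STRESSED`, the linear blending of the two simulated series, and all function names are assumptions made for illustration. In the paper the regime parameters come from fitting to real ETF data, which this sketch skips.

```python
import numpy as np


def simulate_arma_garch(n, mu, phi, theta, omega, alpha, beta, rng):
    """Simulate n returns from an ARMA(1,1)-GARCH(1,1) process with Gaussian shocks."""
    r = np.zeros(n)
    eps = np.zeros(n)
    # Start the conditional variance at its unconditional level (needs alpha + beta < 1).
    sigma2 = np.full(n, omega / (1.0 - alpha - beta))
    z = rng.standard_normal(n)
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    r[0] = mu + eps[0]
    for t in range(1, n):
        # GARCH(1,1) variance recursion.
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
        # ARMA(1,1) mean recursion around the long-run mean mu.
        r[t] = mu + phi * (r[t - 1] - mu) + eps[t] + theta * eps[t - 1]
    return r


def transition_weights(n, start, width):
    """Weight of the second regime per step: 0 before `start`, 1 from `start + width` on.

    width == 0 gives an abrupt break; width > 0 gives a linear ramp (gradual drift).
    """
    t = np.arange(n)
    if width == 0:
        return (t >= start).astype(float)
    return np.clip((t - start) / width, 0.0, 1.0)


def simulate_drift(n, start, width, regime_a, regime_b, seed=0):
    """Blend two regime simulations and return (returns, ground-truth labels, weights)."""
    rng = np.random.default_rng(seed)
    a = simulate_arma_garch(n, rng=rng, **regime_a)
    b = simulate_arma_garch(n, rng=rng, **regime_b)
    w = transition_weights(n, start, width)
    # Assumption: a convex combination of the two series models the transition.
    returns = (1.0 - w) * a + w * b
    labels = (w >= 0.5).astype(int)  # 0 = first regime, 1 = second regime
    return returns, labels, w


# Hypothetical regime parameters, not estimates from any real ETF.
CALM = dict(mu=0.0005, phi=0.05, theta=-0.02, omega=1e-6, alpha=0.05, beta=0.90)
STRESSED = dict(mu=-0.001, phi=-0.10, theta=0.05, omega=2e-5, alpha=0.12, beta=0.85)
```

Calling `simulate_drift(2000, 1000, 0, CALM, STRESSED)` produces an abrupt break at step 1000, while `simulate_drift(2000, 900, 200, CALM, STRESSED)` ramps between the regimes over 200 steps. In both cases the returned labels play the role of the known ground truth against which a drift detector could be scored.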