
History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis

ArXiv ID: 2601.10143
Authors: Haochong Xia, Yao Long Teng, Regan Tan, Molei Qin, Xinrun Wang, Bo An
Abstract: In quantitative finance, the gap between training and real-world performance, driven by concept drift and distributional non-stationarity, remains a critical obstacle to building reliable data-driven systems. Models trained on static historical data often overfit, generalizing poorly in dynamic markets. The mantra "History Is Not Enough" underscores the need for adaptive data generation that learns to evolve with the market rather than relying solely on past observations. We present a drift-aware dataflow system that integrates machine-learning-based adaptive control into the data curation process. The system couples a parameterized data manipulation module, comprising single-stock transformations, multi-stock mix-ups, and curation operations, with an adaptive planner-scheduler that employs gradient-based bi-level optimization to control the system. This design unifies data augmentation, curriculum learning, and data workflow management under a single differentiable framework, enabling provenance-aware replay and continuous data quality monitoring. Extensive experiments on forecasting and reinforcement learning trading tasks demonstrate that our framework enhances model robustness and improves risk-adjusted returns. The system provides a generalizable approach to adaptive data management and learning-guided workflow automation for financial data. ...
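The bi-level idea in this abstract, an outer loop that tunes augmentation parameters against validation loss while an inner loop fits the model on augmented data, can be illustrated with a minimal sketch. This is not the paper's differentiable planner-scheduler; it is a toy 1-D regression where the augmentation parameter `theta` (noise magnitude, an assumed stand-in) is updated by a finite-difference gradient of the validation loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: train/validation splits of a noisy 1-D regression task.
x_tr, x_va = rng.normal(size=64), rng.normal(size=64)
y_tr, y_va = 2.0 * x_tr + 0.1 * rng.normal(size=64), 2.0 * x_va

def inner_fit(theta):
    """Inner level: fit a closed-form least-squares slope on data
    augmented with Gaussian jitter of magnitude `theta`."""
    x_aug = x_tr + theta * rng.normal(size=x_tr.size)
    return (x_aug @ y_tr) / (x_aug @ x_aug)

def val_loss(theta):
    w = inner_fit(theta)
    return float(np.mean((w * x_va - y_va) ** 2))

# Outer level: adjust theta via a finite-difference gradient of the
# validation loss (a crude stand-in for gradient-based bi-level control).
theta, lr, eps = 0.5, 0.05, 1e-2
for _ in range(50):
    g = (val_loss(theta + eps) - val_loss(theta - eps)) / (2 * eps)
    theta = max(0.0, theta - lr * g)
```

The outer loop treats the data pipeline itself as the optimization variable, which is the core of the "adaptive data curation" framing above.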

January 15, 2026 · 2 min · Research Team

Robust Optimization in Causal Models and G-Causal Normalizing Flows

ArXiv ID: 2510.15458
Authors: Gabriele Visentin, Patrick Cheridito
Abstract: In this paper, we show that interventionally robust optimization problems in causal models are continuous under the G-causal Wasserstein distance but may be discontinuous under the standard Wasserstein distance. This highlights the importance of using generative models that respect the causal structure when augmenting data for such tasks. To this end, we propose a new normalizing flow architecture that satisfies a universal approximation property for structural causal models and can be efficiently trained to minimize the G-causal Wasserstein distance. Empirically, we demonstrate that our model outperforms standard (non-causal) generative models in data augmentation for causal regression and mean-variance portfolio optimization in causal factor models. ...
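The property this abstract emphasizes, generation that respects causal structure, amounts to sampling a structural causal model in topological order. A minimal sketch with a hypothetical 3-node chain X → Y → Z (coefficients and noise scales are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_scm(n):
    """Ancestral sampling of a toy SCM: each variable is a function of its
    parents plus exogenous noise, drawn in topological order so the joint
    distribution is consistent with the causal graph."""
    x = rng.normal(size=n)                  # exogenous root
    y = 0.8 * x + 0.2 * rng.normal(size=n)  # Y := f(X) + noise
    z = -0.5 * y + 0.1 * rng.normal(size=n) # Z := g(Y) + noise
    return np.stack([x, y, z], axis=1)

data = sample_scm(10_000)
```

A non-causal generator can match the observational joint while getting interventional behavior wrong; a causally structured sampler like the one above cannot, which is what the G-causal distance is designed to detect.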

October 17, 2025 · 2 min · Research Team

CTBench: Cryptocurrency Time Series Generation Benchmark

ArXiv ID: 2508.02758
Authors: Yihao Ang, Qiang Wang, Qiang Huang, Yifan Bao, Xinyu Xi, Anthony K. H. Tung, Chen Jin, Zhiyong Huang
Abstract: Synthetic time series are essential tools for data augmentation, stress testing, and algorithmic prototyping in quantitative finance. However, in cryptocurrency markets, characterized by 24/7 trading, extreme volatility, and rapid regime shifts, existing Time Series Generation (TSG) methods and benchmarks often fall short, jeopardizing practical utility. Most prior work (1) targets non-financial or traditional financial domains, (2) focuses narrowly on classification and forecasting while neglecting crypto-specific complexities, and (3) lacks critical financial evaluations, particularly for trading applications. To address these gaps, we introduce CTBench, the first comprehensive TSG benchmark tailored to the cryptocurrency domain. CTBench curates an open-source dataset of 452 tokens and evaluates TSG models across 13 metrics spanning 5 key dimensions: forecasting accuracy, rank fidelity, trading performance, risk assessment, and computational efficiency. A key innovation is a dual-task evaluation framework: (1) the Predictive Utility task measures how well synthetic data preserves temporal and cross-sectional patterns for forecasting, while (2) the Statistical Arbitrage task assesses whether reconstructed series support mean-reverting signals for trading. We benchmark eight representative models from five methodological families over four distinct market regimes, uncovering trade-offs between statistical fidelity and real-world profitability. Notably, CTBench offers model ranking analysis and actionable guidance for selecting and deploying TSG models in crypto analytics and strategy development. ...
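The Statistical Arbitrage task described above checks whether a generated series still supports mean-reverting signals. A common signal of that style (assumed here for illustration; CTBench's exact metrics may differ) is a rolling z-score on price:

```python
import numpy as np

rng = np.random.default_rng(2)

# Mean-reverting (AR(1)/OU-like) toy series, standing in for a
# synthetic crypto price path under evaluation.
p = np.zeros(500)
for t in range(1, 500):
    p[t] = 0.9 * p[t - 1] + rng.normal()

def zscore_signal(prices, window=20):
    """Rolling z-score: positive -> price above its recent mean (sell),
    negative -> below it (buy). A synthetic series 'supports' stat-arb
    if trading this signal on it behaves like it does on real data."""
    out = np.full(prices.size, np.nan)
    for t in range(window, prices.size):
        w = prices[t - window:t]
        sd = w.std()
        if sd > 0:
            out[t] = (prices[t] - w.mean()) / sd
    return out

sig = zscore_signal(p)
```

Evaluating generators on downstream trading signals like this, rather than on distributional fidelity alone, is exactly the trade-off the benchmark is built to expose.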

August 3, 2025 · 2 min · Research Team

Simulating Liquidity: Agent-Based Modeling of Illiquid Markets for Fractional Ownership

ArXiv ID: 2411.13381
Authors: Unknown
Abstract: This research investigates liquidity dynamics in fractional ownership markets, focusing on illiquid alternative investments traded on a FinTech platform. Leveraging empirical data and agent-based modeling (ABM), the study simulates trading behaviors in sell-offer-driven systems, providing a foundation for insights into how different market structures influence liquidity. The ABM-based simulation provides a data augmentation environment that allows exploration of diverse trading architectures and rules, offering an alternative to direct experimentation. This approach bridges academic theory and practical application, supported by collaboration with industry and Swiss federal funding. The paper lays the groundwork for planned extensions, including the identification of a liquidity-maximizing trading environment and the design of a market maker, by simulating the current functioning of the investment platform with an ABM specified from empirical data. ...
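A sell-offer-driven market of the kind this abstract simulates can be sketched in a few lines: sellers post asks, and each arriving buyer takes the cheapest ask at or below their reservation price. All agent counts, price ranges, and matching rules below are illustrative assumptions, not the paper's empirically calibrated model:

```python
import random

random.seed(3)

# Sellers post ask prices; buyers arrive with private reservation prices.
asks = sorted(random.uniform(90, 110) for _ in range(20))
buyers = [random.uniform(95, 115) for _ in range(20)]

trades = []
for reservation in buyers:
    # Each buyer accepts the best (lowest) open ask they can afford.
    if asks and asks[0] <= reservation:
        trades.append(asks.pop(0))

fill_rate = len(trades) / 20  # a simple liquidity measure
```

Re-running such a simulation under alternative matching rules or agent mixes is what lets an ABM act as the "data augmentation environment" the abstract describes, exploring market designs without live experimentation.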

November 20, 2024 · 2 min · Research Team

DiffsFormer: A Diffusion Transformer on Stock Factor Augmentation

ArXiv ID: 2402.06656
Authors: Unknown
Abstract: Machine learning models have demonstrated remarkable efficacy and efficiency in a wide range of stock forecasting tasks. However, the inherent challenges of data scarcity, including a low signal-to-noise ratio (SNR) and data homogeneity, pose significant obstacles to accurate forecasting. To address these issues, we propose a novel approach that uses artificial-intelligence-generated samples (AIGS) to enhance training. We introduce a Diffusion Model that generates stock factors with a Transformer architecture (DiffsFormer). DiffsFormer is first trained on a large-scale source domain with conditional guidance to capture the global joint distribution. When presented with a specific downstream task, we employ DiffsFormer to augment training by editing existing samples. The editing step lets us control the strength of the editing process, determining how far the generated data deviates from the target domain. To evaluate DiffsFormer-augmented training, we conduct experiments on the CSI300 and CSI800 datasets with eight commonly used machine learning models. The proposed method achieves relative improvements of 7.2% and 27.8% in annualized return ratio on the respective datasets. Furthermore, we perform extensive experiments to gain insight into DiffsFormer and its components, elucidating how they address data scarcity and enhance overall model performance. Our research demonstrates the efficacy of leveraging AIGS and the DiffsFormer architecture to mitigate data scarcity in stock forecasting tasks. ...
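The "editing" step described above, noising a real sample partway and then denoising it so the output deviates from the source by a controllable amount, can be sketched in the style of diffusion-based editing. The `edit` function and the trivial placeholder denoiser below are illustrative assumptions; the paper's denoiser is a trained diffusion Transformer:

```python
import numpy as np

rng = np.random.default_rng(4)

def edit(sample, strength, denoise):
    """Diffusion-editing sketch: blend a real sample with Gaussian noise
    up to `strength` in [0, 1], then denoise. Larger strength -> the
    output drifts further from the source sample."""
    noised = (np.sqrt(1 - strength) * sample
              + np.sqrt(strength) * rng.normal(size=sample.shape))
    return denoise(noised, strength)

# Placeholder denoiser: shrink toward an assumed data mean of zero.
identity_denoise = lambda x, s: (1 - s) * x

factor = rng.normal(size=32)          # a stand-in "stock factor" vector
mild = edit(factor, 0.1, identity_denoise)
strong = edit(factor, 0.9, identity_denoise)
```

The editing strength plays the role of a fidelity dial: low values yield augmented samples close to the target domain, high values trade fidelity for diversity.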

February 5, 2024 · 2 min · Research Team