Deep Generative Models for Synthetic Financial Data: Applications to Portfolio and Risk Modeling
ArXiv ID: 2512.21798
Authors: Christophe D. Hounwanou, Yae Ulrich Gaba
Abstract
Synthetic financial data provides a practical solution to the privacy, accessibility, and reproducibility challenges that often constrain empirical research in quantitative finance. This paper investigates the use of deep generative models, specifically Time-series Generative Adversarial Networks (TimeGAN) and Variational Autoencoders (VAEs), to generate realistic synthetic financial return series for portfolio construction and risk modeling applications. Using historical daily returns from the S&P 500 as a benchmark, we generate synthetic datasets under comparable market conditions and evaluate them using statistical similarity metrics, temporal structure tests, and downstream financial tasks. The study shows that TimeGAN produces synthetic data with distributional shapes, volatility patterns, and autocorrelation behaviour that are close to those observed in real returns. When applied to mean–variance portfolio optimization, the resulting synthetic datasets lead to portfolio weights, Sharpe ratios, and risk levels that remain close to those obtained from real data. The VAE provides more stable training but tends to smooth extreme market movements, which affects risk estimation. Finally, the analysis supports the use of synthetic datasets as substitutes for real financial data in portfolio analysis and risk simulation, particularly when models are able to capture temporal dynamics. Synthetic data therefore provides a privacy-preserving, cost-effective, and reproducible tool for financial experimentation and model development.
Keywords: synthetic data, generative models, portfolio construction, risk modeling, TimeGAN
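The downstream comparison described in the abstract (portfolio weights and Sharpe ratios from real versus synthetic returns) can be illustrated with a minimal sketch. This is not the authors' code. It assumes unconstrained tangency (maximum-Sharpe) weights, a zero risk-free rate, and 252 trading days per year. The toy Gaussian return matrices and the function names `tangency_weights` and `annualised_sharpe` are hypothetical stand-ins for the paper's S&P 500 data and generated samples.

```python
import numpy as np

def tangency_weights(returns, risk_free=0.0):
    """Unconstrained mean-variance (tangency) weights, normalised to sum to 1."""
    mu = returns.mean(axis=0) - risk_free      # sample excess mean per asset
    cov = np.cov(returns, rowvar=False)        # sample covariance matrix
    raw = np.linalg.solve(cov, mu)             # proportional to inv(cov) @ mu
    return raw / raw.sum()

def annualised_sharpe(returns, weights, risk_free=0.0, periods=252):
    """Annualised Sharpe ratio of a fixed-weight portfolio on daily returns."""
    port = returns @ weights
    return float(np.sqrt(periods) * (port.mean() - risk_free) / port.std(ddof=1))

# Toy stand-ins (assumption): two independent draws from the same Gaussian
# play the roles of "real" and "synthetic" daily returns for three assets.
rng = np.random.default_rng(0)
vols = np.array([0.010, 0.012, 0.015])
corr = np.full((3, 3), 0.3) + 0.7 * np.eye(3)  # 1 on the diagonal, 0.3 elsewhere
cov = np.outer(vols, vols) * corr
mean = np.array([0.0008, 0.0006, 0.0010])
real = rng.multivariate_normal(mean, cov, size=2000)
synthetic = rng.multivariate_normal(mean, cov, size=2000)

w_real = tangency_weights(real)
w_syn = tangency_weights(synthetic)

# Score both weight vectors on the real data so the Sharpe ratios are comparable.
sharpe_real = annualised_sharpe(real, w_real)
sharpe_syn_on_real = annualised_sharpe(real, w_syn)
weight_gap = float(np.abs(w_real - w_syn).sum())  # L1 distance between weights
```

A small `weight_gap` and a small difference between the two Sharpe ratios would indicate that the synthetic sample leads to a portfolio similar to the one built from real data. Scoring both weight vectors on the real returns is a design choice of this sketch and may differ from the paper's protocol.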
Complexity vs Empirical Score
- Math Complexity: 6.5/10
- Empirical Rigor: 6.0/10
- Quadrant: Holy Grail
- Why: The paper introduces formal mathematical frameworks for portfolio optimization and generative models, but focuses on empirical evaluation using real financial data (S&P 500) with statistical tests and downstream tasks, resulting in moderate scores in both dimensions.
flowchart TD
A["Research Goal:<br/>Generate realistic synthetic financial data<br/>for portfolio and risk modeling"] --> B["Data Used:<br/>Historical S&P 500 Daily Returns"]
B --> C{"Computational Process:<br/>Deep Generative Models"}
C --> D["Time-series GAN<br/>TimeGAN"]
C --> E["Variational Autoencoder<br/>VAE"]
D --> F["Downstream Evaluation"]
E --> F
F --> G["Key Findings:<br/>TimeGAN captures volatility & correlations<br/>VAE smooths extreme movements<br/>Synthetic data viable for financial tasks"]
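The statistical similarity and temporal structure checks mentioned in the abstract can be sketched in the same spirit. The three diagnostics below are common choices and are assumptions here, not necessarily the paper's metrics: a two-sample Kolmogorov-Smirnov statistic for distributional shape, excess kurtosis for tail weight, and lag-1 autocorrelation of squared returns for volatility clustering. The toy "real" series comes from a simple GARCH(1,1)-style recursion with made-up parameters, and the "synthetic" series is i.i.d. Gaussian noise, standing in for a generator that smooths extremes.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a Gaussian, positive for fat tails)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

def autocorr(x, lag):
    """Sample autocorrelation at a positive integer lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Toy "real" returns (assumption): GARCH(1,1)-style volatility clustering.
rng = np.random.default_rng(1)
n = 3000
real = np.empty(n)
var = 2e-5                                   # start at the unconditional variance
for t in range(n):
    real[t] = np.sqrt(var) * rng.standard_normal()
    var = 1e-6 + 0.10 * real[t] ** 2 + 0.85 * var

# Toy "synthetic" returns (assumption): i.i.d. Gaussian with matched scale.
synthetic = rng.normal(0.0, real.std(), size=n)

report = {
    "ks": ks_statistic(real, synthetic),
    "kurtosis_gap": excess_kurtosis(real) - excess_kurtosis(synthetic),
    "sq_acf_gap": autocorr(real ** 2, 1) - autocorr(synthetic ** 2, 1),
}
```

Values of `kurtosis_gap` and `sq_acf_gap` well above zero would suggest that a generator under-represents fat tails and volatility clustering, which is the kind of smoothing the abstract attributes to the VAE.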