Mean-Variance Portfolio Selection

Adaptive Partitioning and Learning for Stochastic Control of Diffusion Processes

ArXiv ID: 2512.14991
Authors: Hanqing Jin, Renyuan Xu, Yanzhao Yang
Abstract: We study reinforcement learning for controlled diffusion processes with unbounded continuous state spaces, bounded continuous actions, and polynomially growing rewards: settings that arise naturally in finance, economics, and operations research. To overcome the challenges of continuous and high-dimensional domains, we introduce a model-based algorithm that adaptively partitions the joint state-action space. The algorithm maintains estimators of drift, volatility, and rewards within each partition, refining the discretization whenever estimation bias exceeds statistical confidence. This adaptive scheme balances exploration and approximation, enabling efficient learning in unbounded domains. Our analysis establishes regret bounds that depend on the problem horizon, state dimension, reward growth order, and a newly defined notion of zooming dimension tailored to unbounded diffusion processes. The bounds recover existing results for bounded settings as a special case, while extending theoretical guarantees to a broader class of diffusion-type problems. Finally, we validate the effectiveness of our approach through numerical experiments, including applications to high-dimensional problems such as multi-asset mean-variance portfolio selection. ...
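The refinement rule described in this abstract (split a partition cell once its estimation bias exceeds its statistical confidence) can be illustrated with a toy one-dimensional sketch. This is not the authors' algorithm: the names `Cell` and `observe`, the bias proxy `lipschitz * diameter`, the confidence radius `scale / sqrt(count)`, and the bounded interval are all assumptions chosen for illustration, whereas the paper works with unbounded state spaces and estimates drift, volatility, and rewards.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    """One interval of a 1-D partition with a running sample mean (hypothetical)."""
    lo: float
    hi: float
    count: int = 0
    total: float = 0.0

    @property
    def diameter(self) -> float:
        return self.hi - self.lo

    @property
    def mean(self) -> float:
        # Sample mean of the observations that fell into this cell.
        return self.total / self.count if self.count else 0.0

    def confidence_radius(self, scale: float = 1.0) -> float:
        # Assumed form: shrinks like 1/sqrt(n); infinite before any data arrives.
        return math.inf if self.count == 0 else scale / math.sqrt(self.count)

    def should_split(self, lipschitz: float = 1.0, scale: float = 1.0) -> bool:
        # Assumed bias proxy: Lipschitz constant times the cell diameter.
        return lipschitz * self.diameter > self.confidence_radius(scale)


def observe(cells, x, y, lipschitz=1.0, scale=1.0):
    """Record sample (x, y) in its cell; halve the cell if bias exceeds confidence."""
    for i, cell in enumerate(cells):
        if cell.lo <= x <= cell.hi:
            cell.count += 1
            cell.total += y
            if cell.should_split(lipschitz, scale):
                mid = 0.5 * (cell.lo + cell.hi)
                # Children start empty in this sketch; they do not inherit data.
                cells[i:i + 1] = [Cell(cell.lo, mid), Cell(mid, cell.hi)]
            return cells
    raise ValueError("x lies outside the partitioned interval")
```

Feeding repeated samples at the same location refines the partition only around that location, while rarely visited regions stay coarse. That is the intuition behind a zooming-style analysis, under the simplifying assumptions stated above.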

Solving dynamic portfolio selection problems via score-based diffusion models

ArXiv ID: 2507.09916
Authors: Ahmad Aghapour, Erhan Bayraktar, Fengyi Yuan
Abstract: In this paper, we tackle the dynamic mean-variance portfolio selection problem in a model-free manner, based on (generative) diffusion models. We propose using data sampled from the real model $\mathbb{P}$ (which is unknown) with limited size to train a generative model $\mathbb{Q}$ (from which we can easily and adequately sample). With adaptive training and sampling methods that are tailor-made for time series data, we obtain quantification bounds between $\mathbb{P}$ and $\mathbb{Q}$ in terms of the adapted Wasserstein metric $\mathcal{AW}_2$. Importantly, the proposed adapted sampling method also facilitates conditional sampling. In the second part of this paper, we provide the stability of the mean-variance portfolio optimization problems in $\mathcal{AW}_2$. Then, combined with the error bounds and the stability result, we propose a policy gradient algorithm based on the generative environment, in which our innovative adapted sampling method provides approximate scenario generators. We illustrate the performance of our algorithm on both simulated and real data. For real data, the algorithm based on the generative environment produces portfolios that beat several important baselines, including the Markowitz portfolio, the equal weight (naive) portfolio, and S&P 500. ...

Dynamic Factor Model-Based Multiperiod Mean-Variance Portfolio Selection with Portfolio Constraints

ArXiv ID: 2502.17915
Authors: Unknown
Abstract: Motivated by practical applications, we explore the constrained multi-period mean-variance portfolio selection problem within a market characterized by a dynamic factor model. This model captures predictability in asset returns driven by state variables and incorporates cone-type portfolio constraints that are crucial in practice. The model is broad enough to encompass various dynamic factor frameworks, including practical considerations such as no-short-selling and cardinality constraints. We derive a semi-analytical optimal solution using dynamic programming, revealing it as a piecewise linear feedback policy to wealth, with all factors embedded within the allocation vectors. Additionally, we demonstrate that the portfolio policies are determined by two specific stochastic processes resulting from the stochastic optimizations, for which we provide detailed algorithms. These processes reflect the investor’s assessment of future investment opportunities and play a crucial role in characterizing the time consistency and efficiency of the optimal policy through the variance-optimal signed supermartingale measure of the market. We present numerical examples that illustrate the model’s application in various settings. Using real market data, we investigate how the factors influence portfolio policies and demonstrate that incorporating the factor structure may enhance out-of-sample performance. ...

Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study

ArXiv ID: 2412.16175
Authors: Unknown
Abstract: We study continuous-time mean–variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors that are also diffusion processes, yet the coefficients of these processes are unknown. Based on the recently developed reinforcement learning (RL) theory for diffusion processes, we present a general data-driven RL algorithm that learns the pre-committed investment strategy directly without attempting to learn or estimate the market coefficients. For multi-stock Black–Scholes markets without factors, we further devise a baseline algorithm and prove its performance guarantee by deriving a sublinear regret bound in terms of the Sharpe ratio. For performance enhancement and practical implementation, we modify the baseline algorithm and carry out an extensive empirical study to compare its performance, in terms of a host of common metrics, with a large number of widely employed portfolio allocation strategies on S&P 500 constituents. The results demonstrate that the proposed continuous-time RL strategy is consistently among the best, especially in a volatile bear market, and decisively outperforms the model-based continuous-time counterparts by significant margins. ...

Mean-variance portfolio selection in jump-diffusion model under no-shorting constraint: A viscosity solution approach

ArXiv ID: 2406.03709
Authors: Unknown
Abstract: This paper concerns a continuous-time mean-variance (MV) portfolio selection problem in a jump-diffusion financial model with a no-shorting trading constraint. The problem is reduced to two subproblems: solving a stochastic linear-quadratic (LQ) control problem under a control constraint, and finding a maximal point of a real function. Based on a two-dimensional fully coupled ordinary differential equation (ODE), we construct an explicit viscosity solution to the Hamilton-Jacobi-Bellman equation of the constrained LQ problem. Together with the Meyer-Itô formula and a verification procedure, we obtain the optimal feedback controls of the constrained LQ problem and the original MV problem, which corrects flawed results in some of the existing literature. In addition, the closed-form efficient portfolio and efficient frontier are derived. In the end, we present several examples where the two-dimensional ODE is decoupled. ...