Multi-Objective Bayesian Optimization of Deep Reinforcement Learning for Environmental, Social, and Governance (ESG) Financial Portfolio Management

ArXiv ID: 2512.14992
Authors: M. Coronado-Vaca
Abstract: DRL agents circumvent a limitation of classic models: they do not assume, for example, that financial returns are normally distributed, and they can handle any information, such as the ESG score, provided they are configured with a reward that improves the chosen objective. However, the performance of DRL agents is highly variable and very sensitive to the values of their hyperparameters. Bayesian optimization is a class of methods suited to optimizing black-box functions, that is, functions whose analytical expression is unknown and that are noisy and expensive to evaluate. The hyperparameter tuning problem of DRL algorithms fits this scenario well. Because training an agent for even one objective is very expensive, requiring millions of timesteps, we do not optimize a single objective that mixes a risk-performance metric with an ESG metric. Instead, we separate the objectives and solve the multi-objective problem to obtain a Pareto set of portfolios representing the best trade-off between the Sharpe ratio and the mean ESG score of the portfolio, leaving the choice of the final portfolio to the investor. We conducted our experiments using environments implemented in OpenAI Gym, adapted from the FinRL platform. The experiments are carried out in the Dow Jones Industrial Average (DJIA) and NASDAQ markets and evaluated in terms of the Sharpe ratio achieved by the agent and the mean ESG score of the portfolio. We compare the obtained Pareto sets in terms of hypervolume, illustrating how the portfolios represent the best trade-off between the Sharpe ratio and the mean ESG score. We also show the usefulness of the proposed methodology by comparing the obtained hypervolume with the one achieved by Random Search over the DRL hyperparameter space. ...
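The abstract compares Pareto sets by hypervolume. As a rough illustration only, the sketch below extracts the non-dominated points from a list of (Sharpe ratio, mean ESG score) pairs and measures the area they dominate relative to a reference point. The function names, the reference point, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (Sharpe ratio, mean ESG score), both maximized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by Sharpe ratio descending."""
    front: List[Point] = []
    best_esg = float("-inf")
    # Walk from highest to lowest Sharpe; a point survives only if its ESG
    # score beats every point that has an equal or higher Sharpe ratio.
    for p in sorted(set(points), key=lambda q: (-q[0], -q[1])):
        if p[1] > best_esg:
            front.append(p)
            best_esg = p[1]
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the points and bounded below by the reference point."""
    inside = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    area = 0.0
    prev_esg = ref[1]
    # Along the front Sharpe decreases while ESG increases, so each point
    # contributes one horizontal slab of the dominated region.
    for sharpe, esg in pareto_front(inside):
        area += (sharpe - ref[0]) * (esg - prev_esg)
        prev_esg = esg
    return area


# Hypothetical results, invented for illustration (not from the paper).
bayes_opt = [(1.0, 50.0), (0.8, 70.0), (0.5, 80.0), (0.7, 60.0)]
random_search = [(0.9, 40.0), (0.6, 65.0)]
reference = (0.0, 0.0)

hv_bo = hypervolume_2d(bayes_opt, reference)
hv_rs = hypervolume_2d(random_search, reference)
print(hv_bo, hv_rs)
```

A larger hypervolume means the set of portfolios pushes further toward high Sharpe ratio and high ESG score at once, which is the sense in which two search methods can be ranked with a single number.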

December 17, 2025 · 3 min · Research Team

Portfolio Optimization via Transfer Learning

ArXiv ID: 2511.21221
Authors: Kexin Wang, Xiaomeng Zhang, Xinyu Zhang
Abstract: Recognizing that asset markets generally exhibit shared informational characteristics, we develop a portfolio strategy based on transfer learning that leverages cross-market information to enhance the investment performance in the market of interest by forward validation. Our strategy asymptotically identifies and utilizes the informative datasets, selectively incorporating valid information while discarding the misleading information. This enables our strategy to achieve the maximum Sharpe ratio asymptotically. The promising performance is demonstrated by numerical studies and case studies of two portfolios: one consisting of stocks dual-listed in A-shares and H-shares, and another comprising equities from various industries of the United States. ...

November 26, 2025 · 2 min · Research Team

Reinforcement Learning for Portfolio Optimization with a Financial Goal and Defined Time Horizons

ArXiv ID: 2511.18076
Authors: Fermat Leukam, Rock Stephane Koffi, Prudence Djagba
Abstract: This research proposes an enhancement to the innovative portfolio optimization approach using the G-Learning algorithm, combined with parametric optimization via the GIRL algorithm (G-learning approach to the setting of Inverse Reinforcement Learning) as presented by. The goal is to maximize portfolio value by a target date while minimizing the investor’s periodic contributions. Our model operates in a highly volatile market with a well-diversified portfolio, ensuring a low risk level for the investor, and leverages reinforcement learning to dynamically adjust portfolio positions over time. Results show that we improved the Sharpe ratio from 0.42, as reported by recent studies using the same approach, to 0.483, a notable achievement in highly volatile markets with diversified portfolios. The comparison between G-Learning and GIRL reveals that while GIRL optimizes the reward function parameters (e.g., lambda = 0.0012 compared to 0.002), its impact on portfolio performance remains marginal. This suggests that reinforcement learning methods, like G-Learning, already enable robust optimization. This research contributes to the growing development of reinforcement learning applications in financial decision-making, demonstrating that probabilistic learning algorithms can effectively align portfolio management strategies with investor needs. ...

November 22, 2025 · 2 min · Research Team

Aligning Multilingual News for Stock Return Prediction

ArXiv ID: 2510.19203
Authors: Yuntao Wu, Lynn Tao, Ing-Haw Cheng, Charles Martineau, Yoshio Nozawa, John Hull, Andreas Veneris
Abstract: News spreads rapidly across languages and regions, but translations may lose subtle nuances. We propose a method to align sentences in multilingual news articles using optimal transport, identifying semantically similar content across languages. We apply this method to align more than 140,000 pairs of Bloomberg English and Japanese news articles covering around 3,500 stocks on the Tokyo exchange over 2012-2024. Aligned sentences are sparser, more interpretable, and exhibit higher semantic similarity. Return scores constructed from aligned sentences show stronger correlations with realized stock returns, and long-short trading strategies based on these alignments achieve 10% higher Sharpe ratios than analyzing the full text sample. ...

October 22, 2025 · 2 min · Research Team

Finance-Grounded Optimization For Algorithmic Trading

ArXiv ID: 2509.04541
Authors: Kasymkhan Khubiev, Mikhail Semenov, Irina Podlipnova
Abstract: Deep learning is evolving fast and is being integrated into various domains. Finance is a challenging field for deep learning, especially in the case of interpretable artificial intelligence (AI). Although classical approaches perform very well in natural language processing, computer vision, and forecasting, they are not perfect for the financial world, in which specialists use different metrics to evaluate model performance. We first introduce financially grounded loss functions derived from key quantitative finance metrics, including the Sharpe ratio, Profit-and-Loss (PnL), and Maximum Drawdown. Additionally, we propose turnover regularization, a method that inherently constrains the turnover of generated positions within predefined limits. Our findings demonstrate that the proposed loss functions, in conjunction with turnover regularization, outperform the traditional mean squared error loss for return prediction tasks when evaluated using algorithmic trading metrics. The study shows that financially grounded metrics enhance predictive performance in trading strategies and portfolio optimization. ...
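To make the idea of a finance-grounded loss concrete, here is a minimal sketch of a negative-Sharpe loss with a soft turnover penalty. It is an assumption-laden illustration, not the paper's formulation: the penalty form, the `max_turnover` and `penalty` parameters, and all function names are hypothetical, and plain Python is used instead of a differentiable framework.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def strategy_pnl(positions: Sequence[float], returns: Sequence[float]) -> List[float]:
    """Per-period PnL: position held over the period times the realised return."""
    return [p * r for p, r in zip(positions, returns)]


def turnover(positions: Sequence[float]) -> float:
    """Mean absolute change in position between consecutive periods."""
    changes = [abs(b - a) for a, b in zip(positions, positions[1:])]
    return mean(changes) if changes else 0.0


def sharpe_loss(
    positions: Sequence[float],
    returns: Sequence[float],
    max_turnover: float = 0.5,  # hypothetical turnover limit
    penalty: float = 10.0,      # hypothetical penalty weight
    eps: float = 1e-8,          # avoids division by zero for constant PnL
) -> float:
    """Negative per-period Sharpe ratio plus a penalty for excess turnover."""
    pnl = strategy_pnl(positions, returns)
    sharpe = mean(pnl) / (pstdev(pnl) + eps)
    excess = max(0.0, turnover(positions) - max_turnover)
    return -sharpe + penalty * excess


# Made-up data for illustration only.
returns = [0.01, -0.02, 0.015, 0.005]
steady = [1.0, 1.0, 1.0, 1.0]      # buy and hold, zero turnover
flipping = [1.0, -1.0, 1.0, -1.0]  # reverses the position every period

print(sharpe_loss(steady, returns), sharpe_loss(flipping, returns))
```

Minimizing this quantity rewards a high risk-adjusted PnL directly, whereas mean squared error only rewards accurate return forecasts. The penalty term discourages position paths that trade more than the chosen limit.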

September 4, 2025 · 2 min · Research Team

Robust Market Making: To Quote, or not To Quote

ArXiv ID: 2508.16588
Authors: Ziyi Wang, Carmine Ventre, Maria Polukarov
Abstract: Market making is a popular trading strategy, which aims to generate profit from the spread between the quotes posted at either side of the market. It has been shown that training market makers (MMs) with adversarial reinforcement learning allows them to overcome the risks due to changing market conditions and leads to robust performance. Prior work assumes, however, that MMs keep quoting throughout the trading process, but in practice this is not required, even for “registered” MMs (that only need to satisfy quoting ratios defined by the market rules). In this paper, we build on this line of work and enrich the strategy space of the MM by allowing it to occasionally not quote or to provide single-sided quotes. To this end, in addition to the MM agents that provide continuous bid-ask quotes, we have designed two new agents with increasingly richer action spaces. The first has the option to provide bid-ask quotes or refuse to quote. The second has the option to provide bid-ask quotes, refuse to quote, or only provide single-sided ask or bid quotes. We employ a model-driven approach to empirically compare the performance of the continuously quoting MM with the two agents above in various types of adversarial environments. We demonstrate how occasional refusal to provide bid-ask quotes improves returns and/or Sharpe ratios. The quoting ratios of well-trained MMs can meet essentially any market requirement, reaching up to 99.9% in some cases. ...

August 7, 2025 · 2 min · Research Team

Is Causality Necessary for Efficient Portfolios? A Computational Perspective on Predictive Validity and Model Misspecification

ArXiv ID: 2507.23138
Authors: Alejandro Rodriguez Dominguez
Abstract: A recent line of research has argued that causal factor models are necessary for portfolio optimization, claiming that structurally misspecified models inevitably produce inverted signals and nonviable frontiers. This paper challenges that view. We show, through theoretical analysis, simulation counterexamples, and empirical validation, that predictive models can remain operationally valid even when structurally incorrect. Our contributions are fourfold. First, we distinguish between directional agreement, ranking, and calibration, proving that sign alignment alone does not ensure efficiency when signals are mis-scaled. Second, we establish that structurally misspecified signals can still yield convex and viable efficient frontiers provided they maintain directional alignment with true returns. Third, we derive and empirically confirm a quantitative scaling law that shows how Sharpe ratios contract smoothly with declining alignment, thereby clarifying the role of calibration within the efficient set. Fourth, we validate these results on real financial data, demonstrating that predictive signals, despite structural imperfections, can support coherent frontiers. These findings refine the debate on causality in portfolio modeling. While causal inference remains valuable for interpretability and risk attribution, it is not a prerequisite for optimization efficiency. Ultimately, what matters is the directional fidelity and calibration of predictive signals in relation to their intended use in robust portfolio construction. ...

July 30, 2025 · 2 min · Research Team

Quantum Stochastic Walks for Portfolio Optimization: Theory and Implementation on Financial Networks

ArXiv ID: 2507.03963
Authors: Yen Jui Chang, Wei-Ting Wang, Yun-Yuan Wang, Chen-Yu Liu, Kuan-Cheng Chen, Ching-Ray Chang
Abstract: Financial markets are noisy yet contain a latent graph-theoretic structure that can be exploited for superior risk-adjusted returns. We propose a quantum stochastic walk (QSW) optimizer that embeds assets in a weighted graph: nodes represent securities while edges encode the return-covariance kernel. Portfolio weights are derived from the walk’s stationary distribution. Three empirical studies support the approach. (i) For the top 100 S&P 500 constituents over 2016-2024, six scenario portfolios calibrated on 1- and 2-year windows lift the out-of-sample Sharpe ratio by up to 27% while cutting annual turnover from 480% (mean-variance) to 2-90%. (ii) A $5^4=625$-point grid search identifies a robust sweet spot, $α,λ\lesssim0.5$ and $ω\in[0.2,0.4]$, that delivers Sharpe $\approx0.97$ at $\le 5\%$ turnover and Herfindahl-Hirschman index $\sim0.01$. (iii) Repeating the full grid on 50 random 100-stock subsets of the S&P 500 adds 31,350 back-tests: the best-per-draw QSW beats re-optimised mean-variance on Sharpe in 54% of cases and always wins on trading efficiency, with median turnover 36% versus 351%. Overall, QSW raises the annualized Sharpe ratio by 15% and cuts turnover by 90% relative to classical optimisation, all while respecting the UCITS 5/10/40 rule. These results show that hybrid quantum-classical dynamics can uncover non-linear dependencies overlooked by quadratic models and offer a practical, low-cost weighting engine for themed ETFs and other systematic mandates. ...
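The abstract reports concentration via the Herfindahl-Hirschman index (HHI) and trading cost via turnover. The sketch below shows how these two diagnostics are commonly computed from portfolio weights. It does not implement the QSW optimizer, and the one-way turnover convention and the function names are assumptions, since the paper's exact definitions are not given here.

```python
from typing import Sequence


def herfindahl_hirschman(weights: Sequence[float]) -> float:
    """Sum of squared weights: 1/N for N equal weights, 1.0 for a single asset."""
    return sum(w * w for w in weights)


def one_way_turnover(old: Sequence[float], new: Sequence[float]) -> float:
    """Half the sum of absolute weight changes at one rebalance.

    Assumed convention: some sources report the full (two-way) sum instead.
    """
    return 0.5 * sum(abs(n - o) for o, n in zip(old, new))


# An equal-weighted 100-stock portfolio has HHI = 100 * 0.01**2 = 0.01.
equal_weights = [0.01] * 100
print(herfindahl_hirschman(equal_weights))

# Moving a two-asset portfolio from 50/50 to 100/0 trades half the capital.
print(one_way_turnover([0.5, 0.5], [1.0, 0.0]))
```

Because 1/N is the minimum HHI for N assets, an HHI of about 0.01 on a 100-stock universe indicates concentration close to that of an equal-weighted portfolio.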

July 5, 2025 · 2 min · Research Team

Optimization Method of Multi-factor Investment Model Driven by Deep Learning for Risk Control

ArXiv ID: 2507.00332
Authors: Ruisi Li, Xinhui Gu
Abstract: We propose a deep-learning-driven method for optimizing a multi-factor investment model for risk control. By constructing a deep learning model based on Long Short-Term Memory (LSTM) and combining it with a multi-factor investment model, we optimize factor selection and weight determination to enhance the model’s adaptability and robustness to market changes. Empirical analysis shows that the LSTM model is significantly superior to the benchmark model in risk control indicators such as maximum drawdown, Sharpe ratio, and value at risk (VaR), and shows strong adaptability and robustness in different market environments. Furthermore, the model is applied to an actual portfolio to optimize the asset allocation, which significantly improves the performance of the portfolio, provides investors with a more scientific and accurate basis for investment decisions, and effectively balances returns and risks. ...
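Two of the risk-control indicators named in the abstract, maximum drawdown and value at risk, can be sketched in a few lines. This is a generic illustration under stated assumptions: VaR is computed by simple historical simulation (an empirical loss quantile), which may differ from the estimator used in the paper, and the function names and sample data are hypothetical.

```python
from typing import Sequence


def max_drawdown(values: Sequence[float]) -> float:
    """Largest peak-to-trough decline of a portfolio value series, as a fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                    # highest value seen so far
        worst = max(worst, (peak - v) / peak)  # decline from that peak
    return worst


def historical_var(returns: Sequence[float], level: float = 0.95) -> float:
    """Historical-simulation VaR: the loss at the given quantile of past losses."""
    losses = sorted(-r for r in returns)  # ascending, so large losses come last
    index = min(len(losses) - 1, int(level * len(losses)))
    return losses[index]


# Made-up series for illustration only.
portfolio_values = [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]
period_returns = [0.01, -0.03, 0.02, -0.01, 0.005, -0.05, 0.015, 0.0]

print(max_drawdown(portfolio_values))
print(historical_var(period_returns, level=0.75))
```

Lower values of both indicators mean less downside risk, which is the direction in which the abstract claims the LSTM model improves on the benchmark.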

July 1, 2025 · 2 min · Research Team

Mean–Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study

ArXiv ID: 2412.16175
Authors: Unknown
Abstract: We study continuous-time mean–variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors that are also diffusion processes, yet the coefficients of these processes are unknown. Based on the recently developed reinforcement learning (RL) theory for diffusion processes, we present a general data-driven RL algorithm that learns the pre-committed investment strategy directly without attempting to learn or estimate the market coefficients. For multi-stock Black–Scholes markets without factors, we further devise a baseline algorithm and prove its performance guarantee by deriving a sublinear regret bound in terms of the Sharpe ratio. For performance enhancement and practical implementation, we modify the baseline algorithm and carry out an extensive empirical study to compare its performance, in terms of a host of common metrics, with a large number of widely employed portfolio allocation strategies on S&P 500 constituents. The results demonstrate that the proposed continuous-time RL strategy is consistently among the best, especially in a volatile bear market, and decisively outperforms the model-based continuous-time counterparts by significant margins. ...

December 8, 2024 · 2 min · Research Team