DiffVolume: Diffusion Models for Volume Generation in Limit Order Books

ArXiv ID: 2508.08698
Authors: Zhuohan Wang, Carmine Ventre

Abstract: Modeling limit order book (LOB) dynamics is a fundamental problem in market microstructure research. In particular, generating high-dimensional volume snapshots with strong temporal and liquidity-dependent patterns remains a challenging task, despite recent work exploring the application of Generative Adversarial Networks to LOBs. In this work, we propose a conditional Diffusion model for the generation of future LOB Volume snapshots (DiffVolume). We evaluate our model across three axes: (1) Realism, where we show that DiffVolume, conditioned on past volume history and time of day, better reproduces statistical properties such as marginal distribution, spatial correlation, and autocorrelation decay; (2) Counterfactual generation, allowing for controllable generation under hypothetical liquidity scenarios by additionally conditioning on a target future liquidity profile; and (3) Downstream prediction, where we show that the synthetic counterfactual data from our model improves the performance of future liquidity forecasting models. Together, these results suggest that DiffVolume provides a powerful and flexible framework for realistic and controllable LOB volume generation. ...

August 12, 2025 · 2 min · Research Team

Identification of phase correlations in Financial Stock Market Turbulence

ArXiv ID: 2508.20105
Authors: Kiran Sharma, Abhijit Dutta, Rupak Mukherjee

Abstract: The basis of arbitrage methods depends on the circulation of information within the framework of the financial market. Following the work of Modigliani and Miller, it has become a vital part of discussions related to the study of financial networks and predictions. The emergence of the efficient market hypothesis by Fama, Fisher, Jensen and Roll in the early 1970s opened up the door for discussion of information affecting the price in the market and thereby creating asymmetries and price distortion. Whenever the micro and macroeconomic factors change, there is a high probability of information asymmetry in the market, and this asymmetry of information creates turbulence in the market. The analysis and interpretation of turbulence caused by the differences in information is crucial in understanding the nature of the stock market using price patterns and fluctuations. Even so, the traditional approaches are not capable of analyzing the cyclical price fluctuations outside the realm of wave structures of securities prices, and a proper and effective technique to assess the nature of the financial market is lacking. Consequently, the analysis of the price fluctuations by applying the theories and computational techniques of mathematical physics ensures that such cycles are broken down, and the outcome of decomposed cycles is elucidated to understand the impression of the information on the genesis and discovery of price and to assess the nature of stock market turbulence. In this regard, the paper will provide a framework of spectrum analysis that decomposes the pricing patterns and is capable of determining the pricing behavior, eventually assisting in examining the nature of turbulence in the National Stock Exchange of India. ...

August 12, 2025 · 3 min · Research Team

A Heterogeneous Spatiotemporal GARCH Model: A Predictive Framework for Volatility in Financial Networks

ArXiv ID: 2508.20101
Authors: Atika Aouri, Philipp Otto

Abstract: We introduce a heterogeneous spatiotemporal GARCH model for geostatistical data or processes on networks, e.g., for modelling and predicting financial return volatility across firms in a latent spatial framework. The model combines classical GARCH(p, q) dynamics with spatially correlated innovations and spatially varying parameters, estimated using local likelihood methods. Spatial dependence is introduced through a geostatistical covariance structure on the innovation process, capturing contemporaneous cross-sectional correlation. This dependence propagates into the volatility dynamics via the recursive GARCH structure, allowing the model to reflect spatial spillovers and contagion effects in a parsimonious and interpretable way. In addition, this modelling framework allows for spatial volatility predictions at unobserved locations. In an empirical application, we demonstrate how the model can be applied to financial stock networks. Unlike other spatial GARCH models, our framework does not rely on a fixed adjacency matrix; instead, spatial proximity is defined in a proxy space constructed from balance sheet characteristics. Using daily log returns of 50 publicly listed firms over a one-year period, we evaluate the model’s predictive performance in a cross-validation study. ...

August 11, 2025 · 2 min · Research Team
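
As a reading aid for the GARCH entry above, here is a minimal sketch of the "recursive GARCH structure" the abstract refers to, reduced to a single series with GARCH(1, 1) dynamics. It is not the authors' model: the spatially correlated innovations and spatially varying parameters are left out, and the function name `simulate_garch11` and the default values of `omega`, `alpha`, and `beta` are illustrative assumptions.

```python
import math
import random


def simulate_garch11(n, omega=0.05, alpha=0.1, beta=0.85, seed=0):
    """Simulate n returns from a plain univariate GARCH(1, 1) process.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    r[t]      = sqrt(sigma2[t]) * z[t],   z[t] ~ N(0, 1)

    Parameter values are illustrative assumptions, not estimates from the paper.
    """
    if alpha + beta >= 1:
        raise ValueError("need alpha + beta < 1 for a finite unconditional variance")
    rng = random.Random(seed)
    # Start the recursion at the unconditional variance omega / (1 - alpha - beta).
    sigma2 = omega / (1.0 - alpha - beta)
    returns, variances = [], []
    for _ in range(n):
        r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(sigma2)
        # Today's squared return feeds into tomorrow's conditional variance.
        sigma2 = omega + alpha * r * r + beta * sigma2
    return returns, variances


if __name__ == "__main__":
    rets, vars_ = simulate_garch11(250)
    print("largest conditional variance:", max(vars_))
```

In the paper's setting the innovations `z[t]` would be correlated across firms, which is how a shock to one firm's return can propagate into its neighbours' volatility through this same recursion.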

Optimal Fees for Liquidity Provision in Automated Market Makers

ArXiv ID: 2508.08152
Authors: Steven Campbell, Philippe Bergault, Jason Milionis, Marcel Nutz

Abstract: Passive liquidity providers (LPs) in automated market makers (AMMs) face losses due to adverse selection (LVR), which static trading fees often fail to offset in practice. We study the key determinants of LP profitability in a dynamic reduced-form model where an AMM operates in parallel with a centralized exchange (CEX), traders route their orders optimally to the venue offering the better price, and arbitrageurs exploit price discrepancies. Using large-scale simulations and real market data, we analyze how LP profits vary with market conditions such as volatility and trading volume, and characterize the optimal AMM fee as a function of these conditions. We highlight the mechanisms driving these relationships through extensive comparative statics, and confirm the model’s relevance through market data calibration. A key trade-off emerges: fees must be low enough to attract volume, yet high enough to earn sufficient revenues and mitigate arbitrage losses. We find that under normal market conditions, the optimal AMM fee is competitive with the trading cost on the CEX and remarkably stable, whereas in periods of very high volatility, a high fee protects passive LPs from severe losses. These findings suggest that a threshold-type dynamic fee schedule is robust to market conditions and improves LP outcomes. ...

August 11, 2025 · 2 min · Research Team

Unwitting Markowitz' Simplification of Portfolio Random Returns

ArXiv ID: 2508.08148
Authors: Victor Olkhov

Abstract: In his famous paper, Markowitz (1952) derived the dependence of portfolio random returns on the random returns of its securities. This result allowed Markowitz to obtain his famous expression for portfolio variance. We show that Markowitz’s equation for portfolio random returns and the expression for portfolio variance, which results from it, describe a simplified approximation of the real markets when the volumes of all consecutive trades with the securities are assumed to be constant during the averaging interval. To show this, we consider the investor who doesn’t trade shares of securities of his portfolio. The investor only observes the trades made in the market with his securities and derives the time series that model the trades with his portfolio as with a single security. These time series describe the portfolio return and variance in exactly the same way as the time series of trades with securities describe their returns and variances. The portfolio time series reveal the dependence of portfolio random returns on the random returns of securities and on the ratio of the random volumes of trades with the securities to the random volumes of trades with the portfolio. If we assume that all volumes of the consecutive trades with securities are constant, we obtain Markowitz’s equation for the portfolio’s random returns. The market-based variance of the portfolio accounts for the effects of random fluctuations of the volumes of the consecutive trades. The use of Markowitz variance may give significantly higher or lower estimates than market-based portfolio variance. ...

August 11, 2025 · 2 min · Research Team
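
For readers who want the baseline that the Markowitz entry above argues is a simplification, the sketch below computes the classical Markowitz portfolio return (a weighted sum of security returns) and portfolio variance (the quadratic form of the weights with the covariance matrix). The function names and the numbers in the usage section are illustrative assumptions; the paper's market-based variance, which also depends on random trade volumes, is not implemented here.

```python
def portfolio_return(weights, returns):
    """Classical Markowitz portfolio return: sum_i w_i * r_i."""
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    return sum(w * r for w, r in zip(weights, returns))


def portfolio_variance(weights, cov):
    """Classical Markowitz portfolio variance: sum_ij w_i * cov[i][j] * w_j.

    `cov` is a square covariance matrix given as a list of rows.
    """
    n = len(weights)
    if len(cov) != n or any(len(row) != n for row in cov):
        raise ValueError("cov must be an n x n matrix matching weights")
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Hypothetical two-asset example, chosen only for illustration.
    w = [0.5, 0.5]
    cov = [[0.04, 0.01],
           [0.01, 0.09]]
    print("portfolio variance:", portfolio_variance(w, cov))
```

Note that the weights here are fixed numbers; the abstract's point is that this form follows only when trade volumes are assumed constant over the averaging interval.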

AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining

ArXiv ID: 2508.13174
Authors: Hongjun Ding, Binqi Chen, Jinsheng Huang, Taian Guo, Zhengyang Mao, Guoyi Shao, Lutong Zou, Luchen Liu, Ming Zhang

Abstract: Formula alpha mining, which generates predictive signals from financial data, is critical for quantitative investment. Although various algorithmic approaches (such as genetic programming, reinforcement learning, and large language models) have significantly expanded the capacity for alpha discovery, systematic evaluation remains a key challenge. Existing evaluation metrics predominantly include backtesting and correlation-based measures. Backtesting is computationally intensive, inherently sequential, and sensitive to specific strategy parameters. Correlation-based metrics, though efficient, assess only predictive ability and overlook other crucial properties such as temporal stability, robustness, diversity, and interpretability. Additionally, the closed-source nature of most existing alpha mining models hinders reproducibility and slows progress in this field. To address these issues, we propose AlphaEval, a unified, parallelizable, and backtest-free evaluation framework for automated alpha mining models. AlphaEval assesses the overall quality of generated alphas along five complementary dimensions: predictive power, stability, robustness to market perturbations, financial logic, and diversity. Extensive experiments across representative alpha mining algorithms demonstrate that AlphaEval achieves evaluation consistency comparable to comprehensive backtesting, while providing more comprehensive insights and higher efficiency. Furthermore, AlphaEval effectively identifies superior alphas compared to traditional single-metric screening approaches. All implementations and evaluation tools are open-sourced to promote reproducibility and community engagement. ...

August 10, 2025 · 2 min · Research Team

American Option Pricing Under Time-Varying Rough Volatility: A Signature-Based Hybrid Framework

ArXiv ID: 2508.07151
Authors: Roshan Shah

Abstract: We introduce a modular framework that extends the signature method to handle American option pricing under evolving volatility roughness. Building on the signature-pricing framework of Bayer et al. (2025), we add three practical innovations. First, we train a gradient-boosted ensemble to estimate the time-varying Hurst parameter H(t) from rolling windows of recent volatility data. Second, we feed these forecasts into a regime switch that chooses either a rough Bergomi or a calibrated Heston simulator, depending on the predicted roughness. Third, we accelerate signature-kernel evaluations with Random Fourier Features (RFF), cutting computational cost while preserving accuracy. Empirical tests on S&P 500 equity-index options reveal that the assumption of persistent roughness is frequently violated, particularly during stable market regimes when H(t) approaches or exceeds 0.5. The proposed hybrid framework provides a flexible structure that adapts to changing volatility roughness, improving performance over fixed-roughness baselines and reducing duality gaps in some regimes. By integrating a dynamic Hurst parameter estimation pipeline with efficient kernel approximations, we aim to enable tractable, real-time pricing of American options in dynamic volatility environments. ...

August 10, 2025 · 2 min · Research Team

Can LLMs Identify Tax Abuse?

ArXiv ID: 2508.20097
Authors: Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme

Abstract: We investigate whether large language models can discover and analyze U.S. tax-minimization strategies. This real-world domain challenges even seasoned human experts, and progress can reduce tax revenue lost from well-advised, wealthy taxpayers. We evaluate the most advanced LLMs on their ability to (1) interpret and verify tax strategies, (2) fill in gaps in partially specified strategies, and (3) generate complete, end-to-end strategies from scratch. This domain should be of particular interest to the LLM reasoning community: unlike synthetic challenge problems or scientific reasoning tasks, U.S. tax law involves navigating hundreds of thousands of pages of statutes, case law, and administrative guidance, all updated regularly. Notably, LLM-based reasoning identified an entirely novel tax strategy, highlighting these models’ potential to revolutionize tax agencies’ fight against tax abuse. ...

August 10, 2025 · 2 min · Research Team

Deformation of semi-circle law for the correlated time series and Phase transition

ArXiv ID: 2508.07192
Authors: Masato Hisakado, Takuya Kaneko

Abstract: We study the eigenvalues of the Wigner random matrix, which is created from a time series with temporal correlation. We observe the deformation of the semi-circle law, which is similar to the eigenvalue distribution of the Wigner-Lévy matrix. The distribution has a longer tail and a higher peak than the semi-circle law. In the absence of correlation, the eigenvalue distribution of the Wigner random matrix is known as the semi-circle law in the large $N$ limit. When there is a temporal correlation, the eigenvalue distribution converges to the deformed semi-circle law, which has a longer tail and a higher peak than the semi-circle law. When we create the Wigner matrix from financial time series, we use it to test whether the series is i.i.d. normal. We observe the difference from the semi-circle law for FX time series. The difference from the semi-circle law is explained by the temporal correlation. Here, we discuss the moments of distribution and convergence to the deformed semi-circle law with a temporal correlation. We discuss the phase transition and compare to the Marchenko-Pastur distribution (MPD) case. ...

August 10, 2025 · 2 min · Research Team
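
For reference, the undeformed semi-circle law that the entry above uses as its baseline can be written out explicitly. The normalization is an assumption chosen for this note (a symmetric $N \times N$ matrix whose independent entries have mean zero and variance $\sigma^2 / N$); the paper may scale its matrices differently, and the deformed law for correlated series is not reproduced here.

```latex
\rho_{\mathrm{sc}}(\lambda)
  = \frac{1}{2\pi\sigma^{2}} \sqrt{4\sigma^{2} - \lambda^{2}},
  \qquad |\lambda| \le 2\sigma,
\qquad
\int_{-2\sigma}^{2\sigma} \lambda^{2k}\, \rho_{\mathrm{sc}}(\lambda)\, d\lambda
  = C_k\, \sigma^{2k},
  \quad C_k = \frac{1}{k+1} \binom{2k}{k}.
```

The even moments are Catalan numbers times powers of $\sigma^2$ and the odd moments vanish, so a "longer tail and a higher peak" should show up as higher-order even moments that exceed these values.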

Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading

ArXiv ID: 2508.07408
Authors: Yueyi Wang, Qiyao Wei

Abstract: In this study, we wish to showcase the unique utility of large language models (LLMs) in financial semantic annotation and alpha signal discovery. Leveraging a corpus of company-related tweets, we use an LLM to automatically assign multi-label event categories to high-sentiment-intensity tweets. We align these labeled sentiment signals with forward returns over 1-to-7-day horizons to evaluate their statistical efficacy and market tradability. Our experiments reveal that certain event labels consistently yield negative alpha, with Sharpe ratios as low as -0.38 and information coefficients exceeding 0.05, all statistically significant at the 95% confidence level. This study establishes the feasibility of transforming unstructured social media text into structured, multi-label event variables. A key contribution of this work is its commitment to transparency and reproducibility; all code and methodologies are made publicly available. Our results provide compelling evidence that social media sentiment is a valuable, albeit noisy, signal in financial forecasting and underscore the potential of open-source frameworks to democratize algorithmic trading research. ...

August 10, 2025 · 2 min · Research Team