
LiveTradeBench: Seeking Real-World Alpha with Large Language Models

ArXiv ID: 2511.03628 · Authors: Haofei Yu, Fenghai Li, Jiaxuan You · Abstract: Large language models (LLMs) achieve strong performance across benchmarks - from knowledge quizzes and math reasoning to web-agent tasks - but these tests occur in static settings, lacking real dynamics and uncertainty. Consequently, they evaluate isolated reasoning or problem-solving rather than decision-making under uncertainty. To address this, we introduce LiveTradeBench, a live trading environment for evaluating LLM agents in realistic and evolving markets. LiveTradeBench follows three design principles: (i) live data streaming of market prices and news, eliminating dependence on offline backtesting and preventing information leakage while capturing real-time uncertainty; (ii) a portfolio-management abstraction that extends control from single-asset actions to multi-asset allocation, integrating risk management and cross-asset reasoning; and (iii) multi-market evaluation across structurally distinct environments - U.S. stocks and Polymarket prediction markets - differing in volatility, liquidity, and information flow. At each step, an agent observes prices, news, and its portfolio, then outputs percentage allocations that balance risk and return. Using LiveTradeBench, we run 50-day live evaluations of 21 LLMs across families. Results show that (1) high LMArena scores do not imply superior trading outcomes; (2) models display distinct portfolio styles reflecting risk appetite and reasoning dynamics; and (3) some LLMs effectively leverage live signals to adapt decisions. These findings expose a gap between static evaluation and real-world competence, motivating benchmarks that test sequential decision making and consistency under live uncertainty. ...
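
The agent's per-step contract is simple: prices, news, and portfolio in; percentage allocations out. Below is a minimal sketch of the normalization such a loop might apply to a model's raw allocation output; the function name and output format are our own assumptions, not the benchmark's actual interface.

```python
def allocation_step(llm_output: dict[str, float]) -> dict[str, float]:
    """Clip negative weights and renormalize so allocations sum to 1.

    `llm_output` maps asset tickers to the raw percentage allocations
    an LLM agent proposed for the current step (hypothetical format;
    the benchmark's real interface may differ).
    """
    weights = {k: max(v, 0.0) for k, v in llm_output.items()}
    total = sum(weights.values())
    if total == 0:  # degenerate output: fall back to all-cash
        return {k: 0.0 for k in weights} | {"CASH": 1.0}
    return {k: v / total for k, v in weights.items()}

print(allocation_step({"AAPL": 40, "NVDA": 70, "CASH": -10}))
# {'AAPL': 0.3636..., 'NVDA': 0.6363..., 'CASH': 0.0}
```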

November 5, 2025 · 2 min · Research Team

When Reasoning Fails: Evaluating 'Thinking' LLMs for Stock Prediction

ArXiv ID: 2511.08608 · Authors: Rakeshkumar H Sodha · Abstract: Problem. "Thinking" LLMs (TLLMs) expose explicit or hidden reasoning traces and are widely believed to generalize better on complex tasks than direct LLMs. Whether this promise carries over to noisy, heavy-tailed, and regime-switching financial data remains unclear. Approach. Using Indian equities (NIFTY constituents), we run a rolling 48m/1m walk-forward evaluation at horizon k = 1 day and dial cross-sectional complexity via the universe size U ∈ {5, 11, 21, 36} while keeping the reasoning budget fixed (B = 512 tokens) for the TLLM. We compare a direct LLM (gpt-4o-mini), a TLLM (gpt-5), and classical learners (ridge, random forest) on cross-sectional ranking loss 1 - IC, MSE, and long/short backtests with realistic costs. Statistical confidence is measured with Diebold-Mariano, Pesaran-Timmermann, and SPA tests. Main findings. (i) As U grows under a fixed budget B, the TLLM's ranking quality deteriorates, whereas the direct LLM remains flat and classical baselines are stable. (ii) TLLM variance is higher, requiring ex-post calibration (winsorization and blending) for stability. (iii) Portfolio results under transaction costs do not support a net advantage for the TLLM. Hypotheses. Our results are consistent with the following testable hypotheses: H1 (Capacity-Complexity Mismatch): for fixed B, TLLM accuracy degrades superlinearly in cross-sectional complexity. H2 (Reasoning Variance): TLLM outputs exhibit higher dispersion date-by-date than direct LLMs, increasing error bars and turnover. H3 (Domain Misfit): next-token prediction objectives and token-budgeted inference are poorly aligned with heavy-tailed, weakly predictable stock returns. Implication. In our setting, "thinking" LLMs are not yet ready to replace classical or direct methods for short-horizon stock ranking; scaling the reasoning budget and/or re-aligning objectives appears necessary. ...
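
For readers unfamiliar with the 1 - IC metric, here is a minimal sketch of the per-date ranking loss, reading IC as the Spearman rank correlation between predicted and realized cross-sectional returns (a standard convention; the paper's exact definition may differ):

```python
from scipy.stats import spearmanr

def ranking_loss(predicted: list[float], realized: list[float]) -> float:
    """Cross-sectional ranking loss 1 - IC for one date, with IC taken
    as the Spearman rank correlation between predicted and realized
    returns (our reading of the metric, not the paper's code)."""
    ic, _ = spearmanr(predicted, realized)
    return 1.0 - ic

# Example: one date with a 5-stock universe (U = 5)
print(ranking_loss([0.02, -0.01, 0.00, 0.03, -0.02],
                   [0.01, -0.02, 0.01, 0.02, -0.01]))
```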

November 5, 2025 · 3 min · Research Team

High-Dimensional Spatial Arbitrage Pricing Theory with Heterogeneous Interactions

ArXiv ID: 2511.01271 · Authors: Zhaoxing Gao, Sihan Tu, Ruey S. Tsay · Abstract: This paper investigates estimation and inference of a Spatial Arbitrage Pricing Theory (SAPT) model that integrates spatial interactions with multi-factor analysis, accommodating both observable and latent factors. Building on the classical mean-variance analysis, we introduce a class of Spatial Capital Asset Pricing Models (SCAPM) that account for spatial effects in high-dimensional assets, where we define "spatial rho" as a counterpart to market beta in CAPM. We then extend SCAPM to a general SAPT framework under a "complete" market setting by incorporating multiple factors. For SAPT with observable factors, we propose a generalized shrinkage Yule-Walker (SYW) estimation method that integrates ridge regression to estimate spatial and factor coefficients. When factors are latent, we first apply an autocovariance-based eigenanalysis to extract factors, then employ the SYW method using the estimated factors. We establish asymptotic properties for these estimators under high-dimensional settings where both the dimension and sample size diverge. Finally, we use simulated and real data examples to demonstrate the efficacy and usefulness of the proposed model and method. ...
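
As a toy illustration of the shrinkage idea only (the paper's SYW estimator is autocovariance-based and jointly estimates factor coefficients, which this omits), one can ridge-estimate a single spatial rho in the model r_t ≈ ρ W r_t + noise:

```python
import numpy as np

def spatial_rho_ridge(R: np.ndarray, W: np.ndarray, lam: float = 1.0) -> float:
    """Estimate a scalar 'spatial rho' in r_t ≈ rho * W r_t + noise by
    ridge regression over stacked periods. R: T x N excess returns,
    W: N x N row-normalized spatial weight matrix. A toy sketch of the
    shrinkage idea, not the paper's SYW estimator."""
    y = R.reshape(-1)            # stack all periods' cross-sections
    x = (R @ W.T).reshape(-1)    # spatially lagged returns, stacked
    return float((x @ y) / (x @ x + lam))  # 1-D ridge closed form
```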

November 3, 2025 · 2 min · Research Team

The Omniscient, yet Lazy, Investor

ArXiv ID: 2510.24467 · Authors: Stanisław M. S. Halkiewicz · Abstract: We formalize the paradox of an omniscient yet lazy investor - a perfectly informed agent who trades infrequently due to execution or computational frictions. Starting from a deterministic geometric construction, we derive a closed-form expected profit function linking trading frequency, execution cost, and path roughness. We prove existence and uniqueness of the optimal trading frequency and show that this optimum can be interpreted through the fractal dimension of the price path. A stochastic extension under fractional Brownian motion provides analytical expressions for the optimal interval and comparative statics with respect to the Hurst exponent. Empirical illustrations on equity data confirm the theoretical scaling behavior. ...
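
The abstract's trade-off can be made concrete with a toy profit rate: suppose each trade captures roughly sigma·dt^H (the fBm increment scale at Hurst exponent H) and pays a fixed cost c, so profit per unit time is (sigma·dt^H - c)/dt. This functional form is our own illustration of the described trade-off, not the paper's closed form:

```python
import numpy as np

def profit_rate(dt, H=0.5, sigma=1.0, c=0.2):
    """Hypothetical profit per unit time for a trader acting every dt:
    each trade captures ~ sigma * dt**H and pays fixed cost c.
    Illustrative only; the paper derives its own closed form."""
    return (sigma * dt**H - c) / dt

grid = np.linspace(0.01, 5.0, 2000)
best = grid[np.argmax(profit_rate(grid))]
print(f"optimal trading interval ≈ {best:.3f}")
# analytically (c / (sigma * (1 - H)))**(1 / H) = 0.16 here
```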

October 28, 2025 · 2 min · Research Team

3S-Trader: A Multi-LLM Framework for Adaptive Stock Scoring, Strategy, and Selection in Portfolio Optimization

ArXiv ID: 2510.17393 · Authors: Kefan Chen, Hussain Ahmad, Diksha Goel, Claudia Szabo · Abstract: Large Language Models (LLMs) have recently gained popularity in stock trading for their ability to process multimodal financial data. However, most existing methods focus on single-stock trading and lack the capacity to reason over multiple candidates for portfolio construction. Moreover, they typically lack the flexibility to revise their strategies in response to market shifts, limiting their adaptability in real-world trading. To address these challenges, we propose 3S-Trader, a training-free framework that incorporates scoring, strategy, and selection modules for stock portfolio construction. The scoring module summarizes each stock's recent signals into a concise report covering multiple scoring dimensions, enabling efficient comparison across candidates. The strategy module analyzes historical strategies and overall market conditions to iteratively generate an optimized selection strategy. Based on this strategy, the selection module identifies and assembles a portfolio by choosing stocks with higher scores in relevant dimensions. We evaluate our framework across four distinct stock universes, including the Dow Jones Industrial Average (DJIA) constituents and three sector-specific stock sets. Compared with existing multi-LLM frameworks and time-series-based baselines, 3S-Trader achieves the highest accumulated return of 131.83% on DJIA constituents with a Sharpe ratio of 0.31 and Calmar ratio of 11.84, while also delivering consistently strong results across other sectors. ...
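
A skeleton of the scoring → strategy → selection loop as we read it from the abstract; `llm` stands in for any chat-completion call, and the prompts and data plumbing are placeholders rather than the authors' implementation:

```python
def run_3s_step(llm, stock_signals: dict[str, str], history: list[str]):
    """One iteration of a hypothetical 3S-style loop."""
    # 1) Scoring: condense each stock's recent signals into a short report
    reports = {tkr: llm(f"Summarize and score these signals:\n{sig}")
               for tkr, sig in stock_signals.items()}
    # 2) Strategy: revise the selection strategy from past strategies
    #    and current market conditions
    strategy = llm("Given recent strategies and market state, propose an "
                   f"updated selection strategy:\n{history[-3:]}")
    # 3) Selection: pick stocks whose reports score high on the
    #    dimensions the strategy emphasizes
    portfolio = llm(f"Strategy: {strategy}\nReports: {reports}\n"
                    "Select a portfolio of tickers.")
    history.append(strategy)
    return portfolio
```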

October 20, 2025 · 2 min · Research Team

Semi-analytical pricing of American options with hybrid dividends via integral equations and the GIT method

ArXiv ID: 2510.18159 · Authors: Andrey Itkin · Abstract: This paper introduces a semi-analytical method for pricing American options on assets (stocks, ETFs) that pay discrete and/or continuous dividends. The problem is notoriously complex because discrete dividends create abrupt price drops and affect the optimal exercise timing, making traditional continuous-dividend models unsuitable. Our approach utilizes the Generalized Integral Transform (GIT) method introduced by the author and his co-authors in a number of papers, which transforms the pricing problem from a complex partial differential equation with a free boundary into a Volterra integral equation of the second or first kind. In this paper we illustrate this approach by considering a popular GBM model that accounts for discrete cash and proportional dividends using Dirac delta functions. By reframing the problem as an integral equation, we can sequentially solve for the option price and the early exercise boundary, effectively handling the discontinuities caused by the dividends. Our methodology provides a powerful alternative to standard numerical techniques like binomial trees or finite difference methods, which can lose accuracy or performance when handling the jump conditions introduced by discrete dividends. Several examples demonstrate that the GIT method is highly accurate and computationally efficient, bypassing the need for extensive computational grids or complex backward induction steps. ...
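
The numerical core is a Volterra equation of the second kind, f(t) = g(t) + ∫₀ᵗ K(t,s) f(s) ds. Below is a generic trapezoidal solver for that form; the paper's specific pricing kernel and source term are not reproduced here, so pass any callables to experiment:

```python
import numpy as np

def solve_volterra2(g, K, T=1.0, n=200):
    """Trapezoidal scheme for f(t) = g(t) + int_0^t K(t,s) f(s) ds,
    the generic second-kind Volterra form the GIT method reduces to.
    Generic solver sketch, not the paper's kernel."""
    t = np.linspace(0.0, T, n + 1)
    h = T / n
    f = np.empty(n + 1)
    f[0] = g(t[0])
    for i in range(1, n + 1):
        s = h * (0.5 * K(t[i], t[0]) * f[0]
                 + sum(K(t[i], t[j]) * f[j] for j in range(1, i)))
        f[i] = (g(t[i]) + s) / (1.0 - 0.5 * h * K(t[i], t[i]))
    return t, f

# Sanity check: with K = 0 the scheme recovers f = g exactly
t, f = solve_volterra2(g=np.sin, K=lambda u, v: 0.0)
print(np.allclose(f, np.sin(t)))  # True
```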

October 20, 2025 · 2 min · Research Team

Trading with the Devil: Risk and Return in Foundation Model Strategies

ArXiv ID: 2510.17165 · Authors: Jinrui Zhang · Abstract: Foundation models - already transformative in domains such as natural language processing - are now starting to emerge for time-series tasks in finance. While these pretrained architectures promise versatile predictive signals, little is known about how they shape the risk profiles of the trading strategies built atop them, leaving practitioners reluctant to commit serious capital. In this paper, we propose an extension to the Capital Asset Pricing Model (CAPM) that disentangles the systematic risk introduced by a shared foundation model - potentially capable of generating alpha if the underlying model is genuinely predictive - from the idiosyncratic risk attributable to custom fine-tuning, which typically accrues no systematic premium. To enable a practical estimation of these separate risks, we align this decomposition with the concepts of uncertainty disentanglement, casting systematic risk as epistemic uncertainty (rooted in the pretrained model) and idiosyncratic risk as aleatory uncertainty (introduced during custom adaptations). Under the Aleatory Collapse Assumption, we illustrate how Monte Carlo dropout - among other methods in the uncertainty-quantification toolkit - can directly measure the epistemic risk, thereby mapping trading strategies to a more transparent risk-return plane. Our experiments show that isolating these distinct risk factors yields deeper insights into the performance limits of foundation-model-based strategies, their model degradation over time, and potential avenues for targeted refinements. Taken together, our results highlight both the promise and the pitfalls of deploying large pretrained models in competitive financial markets. ...
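
For reference, the standard Monte Carlo dropout recipe for the epistemic piece: keep dropout layers active at inference and read the variance across stochastic forward passes. The paper's risk decomposition sits on top of estimates like this; the network below is a stand-in, not theirs:

```python
import torch

def mc_dropout_epistemic(model: torch.nn.Module, x: torch.Tensor,
                         n_samples: int = 100) -> torch.Tensor:
    """Proxy for epistemic uncertainty of a forecast via MC dropout:
    dropout stays stochastic and we take the variance across passes."""
    model.train()  # keeps nn.Dropout active during forward passes
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.var(dim=0)  # per-output epistemic variance proxy

# Usage with any dropout-bearing forecaster (toy example):
net = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.ReLU(),
                          torch.nn.Dropout(0.2), torch.nn.Linear(32, 1))
print(mc_dropout_epistemic(net, torch.randn(16, 8)).shape)  # (16, 1)
```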

October 20, 2025 · 2 min · Research Team

A three-step machine learning approach to predict market bubbles with financial news

ArXiv ID: 2510.16636 · Authors: Abraham Atsiwo · Abstract: This study presents a three-step machine learning framework to predict bubbles in the S&P 500 stock market by combining financial news sentiment with macroeconomic indicators. Building on traditional econometric approaches, the proposed approach predicts bubble formation by integrating textual and quantitative data sources. In the first step, bubble periods in the S&P 500 index are identified using a right-tailed unit root test, a widely recognized real-time bubble detection method. The second step extracts sentiment features from large-scale financial news articles using natural language processing (NLP) techniques, which capture investors' expectations and behavioral patterns. In the final step, ensemble learning methods are applied to predict bubble occurrences based on sentiment-based and macroeconomic predictors. Model performance is evaluated through k-fold cross-validation and compared against benchmark machine learning algorithms. Empirical results indicate that the proposed three-step ensemble approach significantly improves predictive accuracy and robustness, providing valuable early warning insights for investors, regulators, and policymakers in mitigating systemic financial risks. ...
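
Step 1 hinges on a right-tailed unit-root test. A minimal supremum-ADF sketch over forward-expanding windows, in the spirit of real-time bubble detection (critical values, which require simulation, are omitted):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def sadf_stat(prices: np.ndarray, min_window: int = 40) -> float:
    """Supremum ADF statistic over forward-expanding windows, the core
    of right-tailed unit-root bubble detection. Large values signal
    explosive behavior; comparing against simulated critical values
    (skipped here) yields the bubble flags used as labels."""
    log_p = np.log(prices)
    stats = [adfuller(log_p[:end], maxlag=1, regression="c",
                      autolag=None)[0]
             for end in range(min_window, len(log_p) + 1)]
    return max(stats)
```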

October 18, 2025 · 2 min · Research Team

Sentiment and Volatility in Financial Markets: A Review of BERT and GARCH Applications during Geopolitical Crises

ArXiv ID: 2510.16503 · Authors: Domenica Mino, Cillian Williamson · Abstract: Artificial intelligence techniques have increasingly been applied to understand the complex relationship between public sentiment and financial market behaviour. This study explores the relationship between the sentiment of news related to the Russia-Ukraine war and the volatility of the stock market. A comprehensive dataset of news articles from major US platforms, published between January 1 and July 17, 2024, was analysed using a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model adapted for financial language. We extracted sentiment scores and applied a Generalised Autoregressive Conditional Heteroscedasticity (GARCH) model, enhanced with a Student-t distribution to capture the heavy-tailed nature of financial returns data. The results reveal a statistically significant negative relationship between negative news sentiment and market stability, suggesting that pessimistic war coverage is associated with increased volatility in the S&P 500 index. This research demonstrates how artificial intelligence and natural language processing can be integrated with econometric modelling to assess real-time market dynamics, offering valuable tools for financial risk analysis during geopolitical crises. ...
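
A sketch of the econometric half using the `arch` package: GARCH(1,1) with Student-t innovations and a sentiment score as an exogenous regressor in the mean equation. This mirrors the setup in spirit; the authors' exact specification may differ:

```python
import pandas as pd
from arch import arch_model

def fit_sentiment_garch(returns: pd.Series, sentiment: pd.Series):
    """Fit a GARCH(1,1) with Student-t innovations to daily returns,
    with a BERT-style sentiment score entering the mean equation as an
    exogenous regressor ('LS' mean). `returns` and `sentiment` are
    placeholders for your own aligned series (percent returns scale
    best for the optimizer)."""
    model = arch_model(returns, x=sentiment.to_frame(), mean="LS",
                       vol="GARCH", p=1, q=1, dist="t")
    return model.fit(disp="off")
```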

October 18, 2025 · 2 min · Research Team

Exploring the Synergy of Quantitative Factors and Newsflow Representations from Large Language Models for Stock Return Prediction

ArXiv ID: 2510.15691 · Authors: Tian Guo, Emmanuel Hauptmann · Abstract: In quantitative investing, return prediction supports various tasks, including stock selection, portfolio optimization, and risk management. Quantitative factors, such as valuation, quality, and growth, capture various characteristics of stocks. Unstructured data, like news and transcripts, has attracted growing attention, driven by recent advances in large language models (LLMs). This paper examines effective methods for leveraging multimodal factors and newsflow in return prediction and stock selection. First, we introduce a fusion learning framework to learn a unified representation from factors and newsflow representations generated by an LLM. Within this framework, we compare three methods of different architectural complexities: representation combination, representation summation, and attentive representations. Next, building on the limitations of fusion learning observed in our empirical comparison, we explore a mixture model that adaptively combines predictions made by single modalities and their fusion. To mitigate the training instability of the mixture model, we introduce a decoupled training approach with theoretical insights. Finally, our experiments on real investment universes yield several insights into effective multimodal modeling of factors and news for stock return prediction and selection. ...
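
A compact sketch of the three fusion variants over a factor embedding and an LLM newsflow embedding, reading "combination" as concatenation (the paper's actual architectures may differ):

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Three fusion variants over two modality embeddings: combination
    (concatenation), summation, and attentive fusion. Illustrative
    stand-in for the abstract's taxonomy, not the authors' code."""
    def __init__(self, d: int, mode: str = "attentive"):
        super().__init__()
        self.mode = mode
        in_dim = 2 * d if mode == "combination" else d
        self.attn = nn.MultiheadAttention(d, num_heads=1, batch_first=True)
        self.head = nn.Linear(in_dim, 1)  # next-period return forecast

    def forward(self, factor_emb: torch.Tensor, news_emb: torch.Tensor):
        if self.mode == "combination":
            z = torch.cat([factor_emb, news_emb], dim=-1)
        elif self.mode == "summation":
            z = factor_emb + news_emb
        else:  # attentive: let each modality attend to the other
            seq = torch.stack([factor_emb, news_emb], dim=1)  # (B, 2, d)
            z = self.attn(seq, seq, seq)[0].mean(dim=1)
        return self.head(z).squeeze(-1)

pred = FusionHead(d=64)(torch.randn(8, 64), torch.randn(8, 64))
print(pred.shape)  # torch.Size([8])
```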

October 17, 2025 · 2 min · Research Team