
From sectorial coarse graining to extreme coarse graining of S&P 500 correlation matrices

ArXiv ID: 2511.05463 · Authors: Manan Vyas, M. Mijaíl Martínez-Ramos, Parisa Majari, Thomas H. Seligman

Abstract: Starting from the Pearson correlation matrix of stock returns, and from the desire to obtain a reduced number of parameters relevant for the dynamics of a financial market, we propose to take the idea of a sectorial matrix, which would have a large number of parameters, to the extreme case of a real symmetric $2 \times 2$ matrix that still conserves the desirable feature that the average correlation can be one of the parameters. This is achieved by choosing two subsets of stocks for rows and columns and averaging the correlation matrix over each of the resulting blocks. Averaging over these blocks, we retain the average of the correlation matrix. We use a random selection for two equal block sizes as well as two specific, hopefully relevant, selections that do not produce equal block sizes. The results show that one of the non-random choices has somewhat different properties, whose meaning will have to be analyzed from an economic point of view. ...

November 7, 2025 · 2 min · Research Team
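The block averaging described in the coarse-graining abstract above can be illustrated with a minimal sketch. The function names, the toy matrix, and the choice to include diagonal entries in the averages are assumptions made for illustration, not the authors' procedure.

```python
def coarse_grain_2x2(corr, group_a):
    """Average a square matrix over the four blocks induced by a two-way split.

    `group_a` lists the indices of the first subset; all remaining indices
    form the second subset. Diagonal entries are included (an assumption).
    """
    n = len(corr)
    in_a = set(group_a)
    groups = [sorted(in_a), [i for i in range(n) if i not in in_a]]
    reduced = [[0.0, 0.0], [0.0, 0.0]]
    for p, rows in enumerate(groups):
        for q, cols in enumerate(groups):
            block_sum = sum(corr[i][j] for i in rows for j in cols)
            reduced[p][q] = block_sum / (len(rows) * len(cols))
    return reduced, [len(g) for g in groups]


def size_weighted_mean(reduced, sizes):
    """Mean of the 2x2 matrix with each block weighted by its number of entries."""
    n = sum(sizes)
    total = sum(reduced[p][q] * sizes[p] * sizes[q]
                for p in range(2) for q in range(2))
    return total / (n * n)


def full_mean(corr):
    """Plain average over all entries of the original matrix."""
    n = len(corr)
    return sum(sum(row) for row in corr) / (n * n)


# Hypothetical 4-stock correlation matrix, used only as a toy input.
toy_corr = [
    [1.0, 0.6, 0.2, 0.1],
    [0.6, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
reduced, sizes = coarse_grain_2x2(toy_corr, group_a=[0, 1])
print(reduced, size_weighted_mean(reduced, sizes), full_mean(toy_corr))
```

In this sketch the size-weighted mean of the four block averages equals the mean of the full matrix for any split. With two equal block sizes the weights are all 1/4, so the plain average of the four entries gives the same value.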

Multi-period Learning for Financial Time Series Forecasting

ArXiv ID: 2511.08622 · Authors: Xu Zhang, Zhengang Huang, Yunzhi Wu, Xun Lu, Erpeng Qi, Yunkai Chen, Zhongya Xue, Qitong Wang, Peng Wang, Wei Wang

Abstract: Time series forecasting is important in the finance domain. Financial time series (TS) patterns are influenced by both short-term public opinion and medium-/long-term policy and market trends. Hence, processing multi-period inputs becomes crucial for accurate financial time series forecasting (TSF). However, current TSF models either use only single-period input or lack customized designs for addressing multi-period characteristics. In this paper, we propose a Multi-period Learning Framework (MLF) to enhance financial TSF performance. MLF considers both the accuracy and the efficiency requirements of TSF. Specifically, we design three new modules to better integrate the multi-period inputs for improving accuracy: (i) Inter-period Redundancy Filtering (IRF), which removes the information redundancy between periods for accurate self-attention modeling; (ii) Learnable Weighted-average Integration (LWI), which effectively integrates multi-period forecasts; and (iii) Multi-period self-Adaptive Patching (MAP), which mitigates the bias towards certain periods by setting the same number of patches across all periods. Furthermore, we propose a Patch Squeeze module to reduce the number of patches in self-attention modeling for maximized efficiency. MLF incorporates multiple inputs with varying lengths (periods) to achieve better accuracy and reduces the cost of selecting input lengths during training. The code and datasets are available at https://github.com/Meteor-Stars/MLF. ...

November 7, 2025 · 2 min · Research Team
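Two ideas named in the MLF abstract above, using the same number of patches for every input period and combining per-period forecasts by a weighted average, can be sketched in a few lines. This is a rough illustration under stated assumptions and not the MLF implementation: non-overlapping patches, input lengths divisible by the patch count, softmax-normalized weights, and all function names are hypothetical.

```python
import math


def split_into_patches(series, num_patches):
    """Split a series into `num_patches` non-overlapping patches of equal length.

    Longer inputs get longer patches, so every period contributes the same
    number of patches (assumed reading of "same number of patches").
    """
    if len(series) % num_patches != 0:
        raise ValueError("series length must be divisible by num_patches")
    patch_len = len(series) // num_patches
    return [series[k * patch_len:(k + 1) * patch_len] for k in range(num_patches)]


def weighted_average_forecast(forecasts, raw_weights):
    """Combine per-period forecasts with softmax-normalized weights.

    In a trained model the raw weights would be learned; here they are
    plain numbers supplied by the caller.
    """
    peak = max(raw_weights)
    exps = [math.exp(w - peak) for w in raw_weights]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]


# Hypothetical inputs of two different lengths (periods).
short_input = list(range(8))
long_input = list(range(16))
short_patches = split_into_patches(short_input, num_patches=4)
long_patches = split_into_patches(long_input, num_patches=4)
print(len(short_patches), len(long_patches))

# Two hypothetical per-period forecasts combined with equal raw weights.
print(weighted_average_forecast([[1.0, 2.0], [3.0, 4.0]], raw_weights=[0.0, 0.0]))
```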

The LLM Pro Finance Suite: Multilingual Large Language Models for Financial Applications

ArXiv ID: 2511.08621 · Authors: Gaëtan Caillaut, Raheel Qader, Jingshu Liu, Mariam Nakhlé, Arezki Sadoune, Massinissa Ahmim, Jean-Gabriel Barthelemy

Abstract: The financial industry’s growing demand for advanced natural language processing (NLP) capabilities has highlighted the limitations of generalist large language models (LLMs) in handling domain-specific financial tasks. To address this gap, we introduce the LLM Pro Finance Suite, a collection of five instruction-tuned LLMs (ranging from 8B to 70B parameters) specifically designed for financial applications. Our approach focuses on enhancing generalist instruction-tuned models, leveraging their existing strengths in instruction following, reasoning, and toxicity control, while fine-tuning them on a curated, high-quality financial corpus comprising over 50% finance-related data in English, French, and German. We evaluate the LLM Pro Finance Suite on a comprehensive financial benchmark suite, demonstrating consistent improvement over state-of-the-art baselines in finance-oriented tasks and financial translation. Notably, our models maintain the strong general-domain capabilities of their base models, ensuring reliable performance across non-specialized tasks. This dual proficiency, enhanced financial expertise without compromise on general abilities, makes the LLM Pro Finance Suite an ideal drop-in replacement for existing LLMs in financial workflows, offering improved domain-specific performance while preserving overall versatility. We publicly release two 8B-parameter models to foster future research and development in financial NLP applications: https://huggingface.co/collections/DragonLLM/llm-open-finance. ...

November 7, 2025 · 2 min · Research Team

The Shape of Markets: Machine learning modeling and Prediction Using 2-Manifold Geometries

ArXiv ID: 2511.05030 · Authors: Panagiotis G. Papaioannou, Athanassios N. Yannacopoulos

Abstract: We introduce a Geometry Informed Model for financial forecasting by embedding high-dimensional market data onto constant-curvature 2-manifolds. Guided by the uniformization theorem, we model market dynamics as Brownian motion on spherical $S^2$, Euclidean $\mathbb{R}^2$, and hyperbolic $\mathbb{H}^2$ geometries. We further include the torus $T$, a compact, flat manifold admissible as a quotient space of the Euclidean plane, anticipating its relevance for capturing cyclical dynamics. Manifold learning techniques infer the latent curvature from financial data, revealing the torus as the best-performing geometry. We interpret this result through a macroeconomic lens: the torus's circular dimensions align with endogenous cycles in output, interest rates, and inflation described by IS-LM theory. Our findings demonstrate the value of integrating differential geometry with data-driven inference for financial modeling. ...

November 7, 2025 · 2 min · Research Team
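The abstract above describes the torus as a quotient of the Euclidean plane, which suggests a simple way to picture Brownian motion on it: simulate planar Brownian increments and wrap each coordinate around a circle. The sketch below does only that. The function name, step size, and the period of $2\pi$ per coordinate are illustrative assumptions, and nothing here reproduces the paper's model or its manifold-learning step.

```python
import math
import random


def torus_brownian_path(steps, dt, sigma=1.0, start=(0.0, 0.0), seed=0):
    """Simulate Brownian motion on a flat torus with angular coordinates.

    Increments are ordinary planar Gaussian steps; each coordinate is then
    wrapped modulo 2*pi, which is the quotient map from the plane to the torus.
    """
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    x, y = start
    path = [(x % two_pi, y % two_pi)]
    scale = sigma * math.sqrt(dt)
    for _ in range(steps):
        x += scale * rng.gauss(0.0, 1.0)
        y += scale * rng.gauss(0.0, 1.0)
        path.append((x % two_pi, y % two_pi))
    return path


path = torus_brownian_path(steps=1000, dt=0.01, seed=7)
print(path[-1])
```

Because the path never leaves the square of side $2\pi$, a coordinate that drifts past one edge re-enters from the opposite one, which is the cyclical behavior the abstract associates with the torus.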

Causal Regime Detection in Energy Markets With Augmented Time Series Structural Causal Models

ArXiv ID: 2511.04361 · Authors: Dennis Thumm

Abstract: Energy markets exhibit complex causal relationships between weather patterns, generation technologies, and price formation, with regime changes occurring continuously rather than at discrete break points. Current approaches model electricity prices without explicit causal interpretation or counterfactual reasoning capabilities. We introduce Augmented Time Series Causal Models (ATSCM) for energy markets, extending counterfactual reasoning frameworks to multivariate temporal data with learned causal structure. Our approach models energy systems through interpretable factors (weather, generation mix, demand patterns), rich grid dynamics, and observable market variables. We integrate neural causal discovery to learn time-varying causal graphs without requiring ground truth DAGs. Applied to real-world electricity price data, ATSCM enables novel counterfactual queries such as “What would prices be under different renewable generation scenarios?”. ...

November 6, 2025 · 2 min · Research Team

Insights into Tail-Based and Order Statistics

ArXiv ID: 2511.04784 · Authors: Hamidreza Maleki Almani

Abstract: Heavy-tailed phenomena appear across diverse domains (from wealth and firm sizes in economics to network traffic, biological systems, and physical processes) and are characterized by the disproportionate influence of extreme values. These distributions challenge classical statistical models, as their tails decay too slowly for conventional approximations to hold. Among their key descriptive measures are quantile contributions, which quantify the proportion of a total quantity (such as income, energy, or risk) attributed to observations above a given quantile threshold. This paper presents a theoretical study of the quantile contribution statistic and its relationship with order statistics. We derive a closed-form expression for the joint cumulative distribution function (CDF) of order statistics and, based on it, obtain an explicit CDF for quantile contributions applicable to small samples. We then investigate the asymptotic behavior of these contributions as the sample size increases, establishing the asymptotic normality of the numerator and characterizing the limiting distribution of the quantile contribution. Finally, simulation studies illustrate the convergence properties and empirical accuracy of the theoretical results, providing a foundation for applying quantile contributions in the analysis of heavy-tailed data. ...

November 6, 2025 · 2 min · Research Team
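The quantile contribution described in the abstract above, the share of a total attributed to observations above a quantile threshold, has a straightforward empirical counterpart built from order statistics. The sketch below is one possible convention, assumed here for illustration: it takes the top `round((1 - q) * n)` order statistics of a positive-valued sample. The function name and the Pareto shape parameter are hypothetical choices, not taken from the paper.

```python
import random


def quantile_contribution(sample, q):
    """Share of the sample total held by observations above the q-quantile.

    Uses the largest round((1 - q) * n) order statistics as the numerator.
    Assumes positive values so that the ratio lies in [0, 1].
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    ordered = sorted(sample)
    n = len(ordered)
    top_count = round((1.0 - q) * n)
    return sum(ordered[n - top_count:]) / sum(ordered)


# Toy sample: the top 20% (two observations) hold 12 of the total 20.
toy_sample = [1, 1, 1, 1, 1, 1, 1, 1, 2, 10]
print(quantile_contribution(toy_sample, q=0.8))

# Heavy-tailed example: Pareto draws with an assumed shape parameter of 1.5.
rng = random.Random(0)
pareto_sample = [rng.paretovariate(1.5) for _ in range(10_000)]
print(quantile_contribution(pareto_sample, q=0.99))
```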

Reasoning on Time-Series for Financial Technical Analysis

ArXiv ID: 2511.08616 · Authors: Kelvin J. L. Koa, Jan Chen, Yunshan Ma, Huanhuan Zheng, Tat-Seng Chua

Abstract: While Large Language Models have been used to produce interpretable stock forecasts, they mainly focus on analyzing textual reports rather than historical price data, the analysis of which is known as Technical Analysis. This task is challenging as it switches between domains: the stock price inputs and outputs lie in the time-series domain, while the reasoning step should be in natural language. In this work, we introduce Verbal Technical Analysis (VTA), a novel framework that combines verbal and latent reasoning to produce stock time-series forecasts that are both accurate and interpretable. To reason over time-series, we convert stock price data into textual annotations and optimize the reasoning trace using an inverse Mean Squared Error (MSE) reward objective. To produce time-series outputs from textual reasoning, we condition the outputs of a time-series backbone model on the reasoning-based attributes. Experiments on stock datasets across U.S., Chinese, and European markets show that VTA achieves state-of-the-art forecasting accuracy, while the reasoning traces also perform well on evaluation by industry experts. ...

November 6, 2025 · 2 min · Research Team

Towards Causal Market Simulators

ArXiv ID: 2511.04469 · Authors: Dennis Thumm, Luis Ontaneda Mijares

Abstract: Market generators using deep generative models have shown promise for synthetic financial data generation, but existing approaches lack causal reasoning capabilities essential for counterfactual analysis and risk assessment. We propose a Time-series Neural Causal Model VAE (TNCM-VAE) that combines variational autoencoders with structural causal models to generate counterfactual financial time series while preserving both temporal dependencies and causal relationships. Our approach enforces causal constraints through directed acyclic graphs in the decoder architecture and employs the causal Wasserstein distance for training. We validate our method on synthetic autoregressive models inspired by the Ornstein-Uhlenbeck process, demonstrating superior performance in counterfactual probability estimation with L1 distances as low as 0.03-0.10 compared to ground truth. The model enables financial stress testing, scenario analysis, and enhanced backtesting by generating plausible counterfactual market trajectories that respect underlying causal mechanisms. ...

November 6, 2025 · 2 min · Research Team
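To make the notion of a counterfactual trajectory in the abstract above concrete, here is a small sketch using an Euler discretization of the Ornstein-Uhlenbeck process, which is an AR(1) recursion. The counterfactual path reuses the factual noise draws and changes only the long-run mean. This is the textbook "same exogenous noise, intervened parameter" construction and not the TNCM-VAE itself; all parameter values and names are assumptions.

```python
import math
import random


def ou_ar1_path(noise, x0, theta, mu, sigma, dt):
    """Euler-discretized Ornstein-Uhlenbeck path driven by given noise draws.

    x[t+1] = x[t] + theta * (mu - x[t]) * dt + sigma * sqrt(dt) * noise[t]
    """
    path = [x0]
    for eps in noise:
        x = path[-1]
        path.append(x + theta * (mu - x) * dt + sigma * math.sqrt(dt) * eps)
    return path


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(250)]
dt = 1.0 / 250

# Factual world: long-run mean 1.0. Counterfactual: mean lowered to 0.5,
# with the exogenous noise held fixed so the two paths are comparable.
factual = ou_ar1_path(noise, x0=1.0, theta=2.0, mu=1.0, sigma=0.3, dt=dt)
counterfactual = ou_ar1_path(noise, x0=1.0, theta=2.0, mu=0.5, sigma=0.3, dt=dt)

print(factual[-1] - counterfactual[-1])
```

Because both paths share the same shocks, their difference is deterministic and reflects only the intervention on the mean.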

Data-driven Feynman-Kac Discovery with Applications to Prediction and Data Generation

ArXiv ID: 2511.08606 · Authors: Qi Feng, Guang Lin, Purav Matlia, Denny Serdarevic

Abstract: In this paper, we propose a novel data-driven framework for discovering probabilistic laws underlying the Feynman-Kac formula. Specifically, we introduce the first stochastic SINDy method formulated under the risk-neutral probability measure to recover the backward stochastic differential equation (BSDE) from a single pair of stock and option trajectories. Unlike existing approaches to identifying stochastic differential equations, which typically require ergodicity, our framework leverages the risk-neutral measure, thereby eliminating the ergodicity assumption and enabling BSDE recovery from limited financial time series data. Using this algorithm, we are able not only to make forward-looking predictions but also to generate new synthetic data paths consistent with the underlying probabilistic law. ...

November 5, 2025 · 2 min · Research Team

LiveTradeBench: Seeking Real-World Alpha with Large Language Models

ArXiv ID: 2511.03628 · Authors: Haofei Yu, Fenghai Li, Jiaxuan You

Abstract: Large language models (LLMs) achieve strong performance across benchmarks, from knowledge quizzes and math reasoning to web-agent tasks, but these tests occur in static settings, lacking real dynamics and uncertainty. Consequently, they evaluate isolated reasoning or problem-solving rather than decision-making under uncertainty. To address this, we introduce LiveTradeBench, a live trading environment for evaluating LLM agents in realistic and evolving markets. LiveTradeBench follows three design principles: (i) live data streaming of market prices and news, eliminating dependence on offline backtesting and preventing information leakage while capturing real-time uncertainty; (ii) a portfolio-management abstraction that extends control from single-asset actions to multi-asset allocation, integrating risk management and cross-asset reasoning; and (iii) multi-market evaluation across structurally distinct environments (U.S. stocks and Polymarket prediction markets) differing in volatility, liquidity, and information flow. At each step, an agent observes prices, news, and its portfolio, then outputs percentage allocations that balance risk and return. Using LiveTradeBench, we run 50-day live evaluations of 21 LLMs across families. Results show that (1) high LMArena scores do not imply superior trading outcomes; (2) models display distinct portfolio styles reflecting risk appetite and reasoning dynamics; and (3) some LLMs effectively leverage live signals to adapt decisions. These findings expose a gap between static evaluation and real-world competence, motivating benchmarks that test sequential decision making and consistency under live uncertainty. ...

November 5, 2025 · 2 min · Research Team
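The LiveTradeBench abstract above says that an agent outputs percentage allocations at each step. As a rough illustration of what such an output implies for portfolio value, the sketch below normalizes a set of raw percentages and applies one period of price changes. The asset names, the `CASH` key, and both function names are hypothetical and are not part of the LiveTradeBench interface.

```python
def normalize_allocations(raw_allocations):
    """Rescale non-negative raw percentages so the weights sum to one."""
    total = sum(raw_allocations.values())
    if total <= 0:
        raise ValueError("allocations must have a positive sum")
    return {asset: amount / total for asset, amount in raw_allocations.items()}


def step_portfolio_value(value, weights, prev_prices, new_prices, cash_key="CASH"):
    """Apply one period of price changes to a fully rebalanced portfolio.

    Each asset grows by its price ratio; the cash position is assumed to
    earn nothing over the step.
    """
    growth = 0.0
    for asset, weight in weights.items():
        if asset == cash_key:
            growth += weight
        else:
            growth += weight * new_prices[asset] / prev_prices[asset]
    return value * growth


# Hypothetical agent output and prices for a single step.
weights = normalize_allocations({"STOCK_A": 50, "STOCK_B": 30, "CASH": 20})
prev_prices = {"STOCK_A": 100.0, "STOCK_B": 200.0}
new_prices = {"STOCK_A": 110.0, "STOCK_B": 190.0}
print(step_portfolio_value(1000.0, weights, prev_prices, new_prices))
```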