
Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance

Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance ArXiv ID: 2601.13770 View on arXiv Authors: Mostapha Benhenda Abstract We introduce Look-Ahead-Bench, a standardized benchmark measuring look-ahead bias in Point-in-Time (PiT) Large Language Models (LLMs) within realistic and practical financial workflows. Unlike most existing approaches, which primarily test internal look-ahead knowledge via Q&A, our benchmark evaluates model behavior in practical scenarios. To distinguish genuine predictive capability from memorization-based performance, we analyze performance decay across temporally distinct market regimes, incorporating several quantitative baselines to establish performance thresholds. We evaluate prominent open-source LLMs – Llama 3.1 (8B and 70B) and DeepSeek 3.2 – against a family of Point-in-Time LLMs (Pitinf-Small, Pitinf-Medium, and the frontier-level model Pitinf-Large) from PiT-Inference. Results reveal significant look-ahead bias in standard LLMs, as measured by alpha decay, unlike the Pitinf models, which demonstrate improved generalization and reasoning abilities as they scale in size. This work establishes a foundation for the standardized evaluation of temporal bias in financial LLMs and provides a practical framework for identifying models suitable for real-world deployment. Code is available on GitHub: https://github.com/benstaf/lookaheadbench ...
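
The alpha-decay diagnostic at the heart of the benchmark is easy to sketch. The following minimal example (function and variable names are ours, not from the benchmark's code) correlates a model-derived trading signal with returns before and after the model's training cutoff; a sharp drop after the cutoff points to memorized future information rather than predictive skill.

```python
import numpy as np

def alpha_decay(signal, returns, cutoff):
    """Signal-return correlation before vs. after a knowledge cutoff.

    A large post-cutoff drop suggests the pre-cutoff 'alpha' came from
    memorized (look-ahead) information rather than genuine skill.
    """
    ic_pre = np.corrcoef(signal[:cutoff], returns[:cutoff])[0, 1]
    ic_post = np.corrcoef(signal[cutoff:], returns[cutoff:])[0, 1]
    return ic_pre, ic_post, 1.0 - ic_post / ic_pre  # decay fraction

# Toy check: a 'leaky' signal that equals realized returns before the cutoff
rng = np.random.default_rng(0)
rets = rng.normal(0.0, 0.01, 500)
leaky = np.where(np.arange(500) < 250, rets, rng.normal(0.0, 0.01, 500))
print(alpha_decay(leaky, rets, 250))  # pre ~ 1.0, post ~ 0.0, decay ~ 1.0
```

On this toy input the decay fraction is close to 1, the signature of pure memorization; a genuinely predictive signal would keep its correlation across the cutoff.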

January 20, 2026 · 2 min · Research Team

From rough to multifractal multidimensional volatility: A multidimensional Log S-fBM model

From rough to multifractal multidimensional volatility: A multidimensional Log S-fBM model ArXiv ID: 2601.10517 View on arXiv Authors: Othmane Zarhali, Emmanuel Bacry, Jean-François Muzy Abstract We introduce the multivariate Log S-fBM model (mLog S-fBM), extending the univariate framework proposed by Wu et al. to the multidimensional setting. We define the multidimensional Stationary fractional Brownian motion (mS-fBM), characterized by marginals following S-fBM dynamics and a specific cross-covariance structure. It is parametrized by a correlation scale $T$, marginal-specific intermittency parameters and Hurst exponents, as well as their multidimensional counterparts: the co-intermittency matrix and the co-Hurst matrix. The mLog S-fBM is constructed by modeling volatility components as exponentials of the mS-fBM, preserving the dependence structure of the Gaussian core. We demonstrate that the model is well-defined for any co-Hurst matrix with entries in $[0, \frac{1}{2}[$, supporting vanishing co-Hurst parameters to bridge rough volatility and multifractal regimes. We generalize the small intermittency approximation technique to the multivariate setting to develop an efficient Generalized Method of Moments calibration procedure, estimating cross-covariance parameters for pairs of marginals. We validate it on synthetic data and apply it to S&P 500 market data, modeling stock return fluctuations. Diagonal estimates of the stock Hurst matrix, corresponding to single-stock log-volatility Hurst exponents, are close to 0, indicating multifractal behavior, while co-Hurst off-diagonal entries are close to the Hurst exponent of the S&P 500 index ($H \approx 0.12$), and co-intermittency off-diagonal entries align with univariate intermittency estimates. ...
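
The paper's GMM calibration rests on the moment scaling of log-volatility increments. As a univariate illustration of that idea (the actual procedure estimates full co-Hurst and co-intermittency matrices under a small-intermittency approximation), a Hurst exponent can be read off the slope of log increment-variance against log lag:

```python
import numpy as np

def hurst_from_moments(x, lags=(1, 2, 4, 8, 16)):
    """Method-of-moments Hurst estimate: for an fBm-like series,
    Var[X(t+l) - X(t)] ~ c * l**(2H), so H is half the slope of
    log-variance regressed on log-lag."""
    lags = np.asarray(lags)
    v = [np.var(x[l:] - x[:-l]) for l in lags]
    return np.polyfit(np.log(lags), np.log(v), 1)[0] / 2.0

# Sanity check on ordinary Brownian motion, where H should be ~0.5
rng = np.random.default_rng(1)
bm = np.cumsum(rng.normal(size=100_000))
print(round(hurst_from_moments(bm), 2))
```

Applied to single-stock log-volatility, estimates near 0 (as the abstract reports for the diagonal of the Hurst matrix) would indicate the multifractal regime.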

January 15, 2026 · 2 min · Research Team

History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis

History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis ArXiv ID: 2601.10143 View on arXiv Authors: Haochong Xia, Yao Long Teng, Regan Tan, Molei Qin, Xinrun Wang, Bo An Abstract In quantitative finance, the gap between training and real-world performance, driven by concept drift and distributional non-stationarity, remains a critical obstacle for building reliable data-driven systems. Models trained on static historical data often overfit, resulting in poor generalization in dynamic markets. The mantra “History Is Not Enough” underscores the need for adaptive data generation that learns to evolve with the market rather than relying solely on past observations. We present a drift-aware dataflow system that integrates machine learning-based adaptive control into the data curation process. The system couples a parameterized data manipulation module, comprising single-stock transformations, multi-stock mix-ups, and curation operations, with an adaptive planner-scheduler that employs gradient-based bi-level optimization to control the system. This design unifies data augmentation, curriculum learning, and data workflow management under a single differentiable framework, enabling provenance-aware replay and continuous data quality monitoring. Extensive experiments on forecasting and reinforcement learning trading tasks demonstrate that our framework enhances model robustness and improves risk-adjusted returns. The system provides a generalizable approach to adaptive data management and learning-guided workflow automation for financial data. ...
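
To make the "parameterized data manipulation" concrete, here is a hedged sketch of one such operator, a multi-stock mix-up. In the paper's bi-level setup the mixing weight would be tuned by the planner-scheduler through gradients; here it is a fixed float, and all names are ours:

```python
import numpy as np

def mixup_windows(x_a, x_b, lam):
    """Convex combination of two stocks' return windows -- one of the
    'multi-stock mix-up' manipulations the abstract mentions. lam is the
    curation parameter a planner-scheduler could tune."""
    return lam * x_a + (1.0 - lam) * x_b

rng = np.random.default_rng(2)
calm = rng.normal(0.0, 0.005, 64)      # low-volatility stock window
volatile = rng.normal(0.0, 0.03, 64)   # high-volatility stock window
synthetic = mixup_windows(calm, volatile, lam=0.7)
print(round(synthetic.std(), 4))       # lies between the two source regimes
```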

January 15, 2026 · 2 min · Research Team

Instruction Finetuning LLaMA-3-8B Model Using LoRA for Financial Named Entity Recognition

Instruction Finetuning LLaMA-3-8B Model Using LoRA for Financial Named Entity Recognition ArXiv ID: 2601.10043 View on arXiv Authors: Zhiming Lian Abstract Financial named-entity recognition (NER) is one of the most important approaches for translating unstructured reports and news into structured knowledge graphs. However, freely available, easy-to-use large language models (LLMs) often misclassify organisations as people, or overlook monetary amounts entirely. This paper takes Meta’s Llama 3 8B and applies it to financial NER by combining instruction fine-tuning with Low-Rank Adaptation (LoRA). Each annotated sentence is converted into an instruction-input-output triple, enabling the model to learn task descriptions while fine-tuning small low-rank matrices instead of updating all weights. Using a corpus of 1,693 sentences, our method obtains a micro-F1 score of 0.894, evaluated against Qwen3-8B, Baichuan2-7B, T5, and BERT-Base. We present dataset statistics, describe training hyperparameters, and visualize entity density, learning curves, and evaluation metrics. Our results show that instruction tuning combined with parameter-efficient fine-tuning enables state-of-the-art performance on domain-sensitive NER. ...
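
A sketch of the two ingredients the abstract combines, using the Hugging Face peft library: converting an annotated sentence into an instruction-input-output triple, and a LoRA configuration that trains small low-rank matrices instead of the full weights. The prompt wording, entity labels, and hyperparameters below are illustrative guesses, not the paper's:

```python
from peft import LoraConfig, TaskType

def to_instruction_triple(sentence, entities):
    """Turn an annotated sentence into the instruction-input-output
    format described in the abstract (label set is hypothetical)."""
    return {
        "instruction": "Extract financial named entities (ORG, PER, MONEY) "
                       "from the sentence and list them with their types.",
        "input": sentence,
        "output": "; ".join(f"{text} [{tag}]" for text, tag in entities),
    }

# Low-rank adapters on attention projections; rank/alpha/dropout illustrative
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)

print(to_instruction_triple("Apple paid $3bn to settle.",
                            [("Apple", "ORG"), ("$3bn", "MONEY")]))
```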

January 15, 2026 · 2 min · Research Team

ProbFM: Probabilistic Time Series Foundation Model with Uncertainty Decomposition

ProbFM: Probabilistic Time Series Foundation Model with Uncertainty Decomposition ArXiv ID: 2601.10591 View on arXiv Authors: Arundeep Chinta, Lucas Vinh Tran, Jay Katukuri Abstract Time Series Foundation Models (TSFMs) have emerged as a promising approach for zero-shot financial forecasting, demonstrating strong transferability and data efficiency gains. However, their adoption in financial applications is hindered by fundamental limitations in uncertainty quantification: current approaches either rely on restrictive distributional assumptions, conflate different sources of uncertainty, or lack principled calibration mechanisms. While recent TSFMs employ sophisticated techniques such as mixture models, Student’s t-distributions, or conformal prediction, they fail to address the core challenge of providing theoretically grounded uncertainty decomposition. We present ProbFM (probabilistic foundation model), a novel transformer-based probabilistic framework that leverages Deep Evidential Regression (DER) to provide principled uncertainty quantification with explicit epistemic-aleatoric decomposition. Unlike existing approaches that pre-specify distributional forms or require sampling-based inference, ProbFM learns optimal uncertainty representations through higher-order evidence learning while maintaining single-pass computational efficiency. To rigorously evaluate the core DER uncertainty quantification approach independent of architectural complexity, we conduct an extensive controlled comparison study using a consistent LSTM architecture across five probabilistic methods: DER, Gaussian NLL, Student’s-t NLL, Quantile Loss, and Conformal Prediction. Evaluation on cryptocurrency return forecasting demonstrates that DER maintains competitive forecasting accuracy while providing explicit epistemic-aleatoric uncertainty decomposition. This work establishes both an extensible framework for principled uncertainty quantification in foundation models and empirical evidence for DER’s effectiveness in financial applications. ...
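
DER's epistemic-aleatoric split has a closed form. Under the Normal-Inverse-Gamma head of Deep Evidential Regression (Amini et al., 2020) the network emits four evidential parameters per forecast, and the two uncertainties fall out directly; the sketch below shows only that decomposition, not ProbFM's transformer architecture:

```python
def der_uncertainties(gamma, nu, alpha, beta):
    """Normal-Inverse-Gamma evidential outputs -> uncertainty split:
        prediction            = gamma            (the predicted mean)
        aleatoric  E[sigma^2] = beta / (alpha - 1)
        epistemic  Var[mu]    = beta / (nu * (alpha - 1))
    Requires alpha > 1; larger nu (more virtual observations) shrinks
    epistemic uncertainty as evidence accumulates."""
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return gamma, aleatoric, epistemic

# Illustrative evidential outputs for a one-step crypto return forecast
print(der_uncertainties(gamma=0.001, nu=5.0, alpha=2.5, beta=0.004))
```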

January 15, 2026 · 2 min · Research Team

Robo-Advising in Motion: A Model Predictive Control Approach

Robo-Advising in Motion: A Model Predictive Control Approach ArXiv ID: 2601.09127 View on arXiv Authors: Tomasz R. Bielecki, Igor Cialenco Abstract Robo-advisors (RAs) are automated portfolio management systems that complement traditional financial advisors by offering lower fees and smaller initial investment requirements. While most existing RAs rely on static, one-period allocation methods, we propose a dynamic, multi-period asset-allocation framework that leverages Model Predictive Control (MPC) to generate suboptimal but practically effective strategies. Our approach combines a Hidden Markov Model with Black-Litterman (BL) methodology to forecast asset returns and covariances, and incorporates practically important constraints, including turnover limits, transaction costs, and target portfolio allocations. We study two predominant optimality criteria in wealth management: dynamic mean-variance (MV) and dynamic risk-budgeting (MRB). Numerical experiments demonstrate that MPC-based strategies consistently outperform myopic approaches, with MV providing flexible and diversified portfolios, while MRB delivers smoother allocations less sensitive to key parameters. These findings highlight the trade-offs between adaptability and stability in practical robo-advising design. ...
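
The receding-horizon logic is the core of any MPC allocator: optimize over the forecast horizon, apply only the first allocation, then re-solve at the next rebalance. The sketch below is deliberately simplified (unconstrained mean-variance solutions averaged over the horizon, projected long-only, with a linear transaction-cost charge); the paper instead solves a constrained multi-period program fed by HMM and Black-Litterman forecasts:

```python
import numpy as np

def mpc_allocate(mu_path, Sigma, w_prev, gamma=5.0, tc=0.001):
    """One receding-horizon step (simplified stand-in for the paper's
    constrained multi-period program). mu_path holds the forecast mean
    returns for each step of the horizon."""
    w = np.mean([np.linalg.solve(gamma * Sigma, mu) for mu in mu_path], axis=0)
    w = np.clip(w, 0.0, None)                  # long-only projection
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    cost = tc * np.abs(w - w_prev).sum()       # turnover penalty of the move
    return w, cost

Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
mu_path = [np.array([0.05, 0.08]), np.array([0.04, 0.06])]  # 2-step forecast
print(mpc_allocate(mu_path, Sigma, w_prev=np.array([0.5, 0.5])))
```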

January 14, 2026 · 2 min · Research Team

The Fourier estimator of spot volatility: Unbounded coefficients and jumps in the price process

The Fourier estimator of spot volatility: Unbounded coefficients and jumps in the price process ArXiv ID: 2601.09074 View on arXiv Authors: L. J. Espinosa González, Erick Treviño Aguilar Abstract In this paper we study the Fourier estimator of Malliavin and Mancino for the spot volatility. We establish the convergence of the trigonometric polynomial to the volatility’s path in a setting that includes the following aspects. First, the volatility is required to satisfy a mild integrability condition, but is otherwise allowed to be unbounded. Second, the price process is assumed to have càdlàg paths, not necessarily continuous. We obtain convergence rates for the probability of a bad approximation in the estimated coefficients, at a speed that allows us to obtain almost sure convergence, not just convergence in probability, in the estimated reconstruction of the volatility’s path. This is a new result even in the setting of continuous paths. We also prove that a rescaled trigonometric polynomial approximates the quadratic jump process. ...
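
For readers unfamiliar with the estimator, here is a compact version of the classical Malliavin-Mancino construction on $[0, 2\pi]$: Fourier coefficients of the log-price increments are convolved into coefficients of $\sigma^2$, and a Fejér-weighted trigonometric polynomial reconstructs the spot path. This sketches the textbook estimator only, not the paper's extension to unbounded coefficients and jumps:

```python
import numpy as np

def fourier_spot_vol(logprice, t_grid, N=64, M=16):
    """Malliavin-Mancino spot volatility on [0, 2*pi] (requires M <= N):
    convolve Fourier coefficients of d(log price) into coefficients of
    sigma^2, then Fejer-smooth the trigonometric polynomial."""
    dp, t = np.diff(logprice), t_grid[:-1]
    c_dp = {k: np.sum(np.exp(-1j * k * t) * dp) / (2 * np.pi)
            for k in range(-2 * N, 2 * N + 1)}
    c_sig = {k: (2 * np.pi / (2 * N + 1)) *
                sum(c_dp[s] * c_dp[k - s] for s in range(-N, N + 1))
             for k in range(-M, M + 1)}
    vol = np.zeros_like(t_grid, dtype=complex)
    for k in range(-M, M + 1):
        vol += (1 - abs(k) / M) * c_sig[k] * np.exp(1j * k * t_grid)
    return vol.real

# Toy check: Brownian motion with sigma = 0.2, so sigma^2 = 0.04
rng = np.random.default_rng(3)
tt = np.linspace(0, 2 * np.pi, 2001)
X = np.cumsum(np.r_[0, 0.2 * rng.normal(size=2000) * np.sqrt(np.diff(tt))])
print(round(fourier_spot_vol(X, tt).mean(), 3))  # ~ 0.04
```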

January 14, 2026 · 2 min · Research Team

Feasibility-First Satellite Integration in Robust Portfolio Architectures

Feasibility-First Satellite Integration in Robust Portfolio Architectures ArXiv ID: 2601.08721 View on arXiv Authors: Roberto Garrone Abstract The integration of thematic satellite allocations into core-satellite portfolio architectures is commonly approached using factor exposures, discretionary convictions, or backtested performance, with feasibility assessed primarily through liquidity screens or market-impact considerations. While such approaches may be appropriate at institutional scale, they are ill-suited to small portfolios and robustness-oriented allocation frameworks, where dominant constraints arise not from return predictability or trading capacity, but from fixed costs, irreversibility risk, and governance complexity. This paper develops a feasibility-first, non-predictive framework for satellite integration that is explicitly scale-aware. We formalize four nested feasibility layers (physical, economic, structural, and epistemic) that jointly determine whether a satellite allocation is admissible. Physical feasibility ensures implementability under concave market-impact laws; economic feasibility suppresses noise-dominated reallocations via cost-dominance threshold constraints; structural feasibility bounds satellite size through an explicit optionality budget defined by tolerable loss under thesis failure; and epistemic feasibility limits satellite breadth and dispersion through an entropy-based complexity budget. Within this hierarchy, structural optionality is identified as the primary design principle for thematic satellites, with the remaining layers acting as robustness lenses rather than optimization criteria. The framework yields closed-form feasibility bounds on satellite size, turnover, and breadth without reliance on return forecasts, factor premia, or backtested performance, providing a disciplined basis for integrating thematic satellites into small, robustness-oriented portfolios. ...
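
As one toy reading of the structural layer (our simplification, not the paper's actual closed-form bounds): if thesis failure costs the satellite a known fraction of its value, the optionality budget caps the satellite weight directly.

```python
def max_satellite_weight(tolerable_loss, drawdown_on_failure):
    """Structural-feasibility cap, sketched: the portfolio-level loss
    w * drawdown_on_failure must stay within the optionality budget,
    so w <= tolerable_loss / drawdown_on_failure."""
    return min(1.0, tolerable_loss / drawdown_on_failure)

# Tolerate a 2% portfolio loss if a thematic bet draws down 80%
print(max_satellite_weight(0.02, 0.80))  # 0.025 -> a 2.5% weight cap
```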

January 13, 2026 · 2 min · Research Team

Regime Discovery and Intra-Regime Return Dynamics in Global Equity Markets

Regime Discovery and Intra-Regime Return Dynamics in Global Equity Markets ArXiv ID: 2601.08571 View on arXiv Authors: Salam Rabindrajit Luwang, Buddha Nath Sharma, Kundan Mukhia, Md. Nurujjaman, Anish Rai, Filippo Petroni, Luis E. C. Rocha Abstract Financial markets alternate between tranquil periods and episodes of stress, and return dynamics can change substantially across these regimes. We study regime-dependent dynamics in developed and developing equity indices using a data-driven Hilbert–Huang-based regime identification and profiling pipeline, followed by variable-length Markov modeling of categorized returns. Market regimes are identified using an Empirical Mode Decomposition-based Hilbert–Huang Transform, where instantaneous energy from the Hilbert spectrum separates Normal, High, and Extreme regimes. We then profile each regime using Holo–Hilbert Spectral Analysis, which jointly resolves carrier frequencies, amplitude-modulation frequencies, and amplitude-modulation energy (AME). AME, interpreted as volatility intensity, declines monotonically from Extreme to High to Normal regimes. This decline is markedly sharper in developed markets, while developing markets retain higher baseline volatility intensity even in Normal regimes. Building on these regime-specific volatility signatures, we discretize daily returns into five quintile states $\mathtt{R}_1$ to $\mathtt{R}_5$ and estimate Variable-Length Markov Chains via context trees within each regime. Unconditional state probabilities show tail states dominate in Extreme regimes and recede as regimes stabilize, alongside persistent downside asymmetry. Entropy peaks in High regimes, indicating maximum unpredictability during moderate-volatility periods. Conditional transition dynamics, evaluated over contexts of length up to three days from the context-tree estimates, indicate that developed markets normalize more effectively as stress subsides, whereas developing markets retain residual tail dependence and downside persistence even in Normal regimes, consistent with a coexistence of continuation and burst-like shifts. ...
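
A hedged sketch of the regime-labeling step: instantaneous energy from the analytic signal, thresholded by percentiles into Normal / High / Extreme. The paper computes energy from the Hilbert spectrum of EMD intrinsic mode functions; for brevity this applies the Hilbert transform to raw returns, and the percentile cutoffs are our choices:

```python
import numpy as np
from scipy.signal import hilbert

def label_regimes(returns, q_high=90, q_extreme=99):
    """Label each day by instantaneous energy: squared amplitude of the
    analytic signal, split at the q_high and q_extreme percentiles."""
    energy = np.abs(hilbert(returns)) ** 2
    hi, ex = np.percentile(energy, [q_high, q_extreme])
    return np.select([energy >= ex, energy >= hi],
                     ["Extreme", "High"], default="Normal")

rng = np.random.default_rng(4)
r = 0.01 * rng.standard_t(df=3, size=5000)   # heavy-tailed toy returns
labels = label_regimes(r)
print({k: int((labels == k).sum()) for k in ("Normal", "High", "Extreme")})
```

The downstream step, quintile-coding returns within each regime and fitting variable-length Markov chains, would then run separately on each labeled subsample.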

January 13, 2026 · 2 min · Research Team

Resisting Manipulative Bots in Memecoin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning

Resisting Manipulative Bots in Memecoin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning ArXiv ID: 2601.08641 View on arXiv Authors: Yichen Luo, Yebo Feng, Jiahua Xu, Yang Liu Abstract The launch of $Trump coin ignited a wave of meme coin investment. Copy trading, as a strategy-agnostic approach that eliminates the need for deep trading knowledge, quickly gained widespread popularity in the meme coin market. However, copy trading is not a guarantee of profitability due to the prevalence of manipulative bots, the uncertainty of the followed wallets’ future performance, and the lag in trade execution. Recently, large language models (LLMs) have shown promise in financial applications by effectively understanding multi-modal data and producing explainable decisions. However, a single LLM struggles with complex, multi-faceted tasks such as asset allocation. These challenges are even more pronounced in cryptocurrency markets, where LLMs often lack sufficient domain-specific knowledge in their training data. To address these challenges, we propose an explainable multi-agent system for meme coin copy trading. Inspired by the structure of an asset management team, our system decomposes the complex task into subtasks and coordinates specialized agents to solve them collaboratively. Employing few-shot chain-of-thought (CoT) prompting, each agent acquires professional meme coin trading knowledge, interprets multi-modal data, and generates explainable decisions. Using a dataset of 1,000 meme coin projects’ transaction data, our empirical evaluation shows that the proposed multi-agent system outperforms both traditional machine learning models and single LLMs, achieving 73% and 70% precision in identifying high-quality meme coin projects and key opinion leader (KOL) wallets, respectively. The selected KOLs collectively generated a total profit of $500,000 across these projects. ...
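
Structurally, the system is a pipeline of role-specialized agents whose outputs feed a final decision-maker. The skeleton below is our guess at that flow: `llm` stands for any chat-completion callable, and the roles, prompts, few-shot example, and ticker are hypothetical, not the paper's:

```python
# One illustrative few-shot exemplar; the paper uses curated domain examples.
FEW_SHOT = "Example: wallet W1 bought early and sold into strength -> KOL.\n"

def run_agent(llm, role, data):
    """One specialized agent: role-conditioned, few-shot CoT prompt."""
    prompt = (f"You are the {role} of a meme coin copy-trading desk.\n"
              f"{FEW_SHOT}"
              f"Think step by step, then give a verdict with reasons.\n"
              f"Data: {data}")
    return llm(prompt)

def copy_trade_decision(llm, project_data, wallet_data):
    """Decompose the task: bot screening and project quality run first,
    then a manager agent aggregates both verdicts."""
    bot_check = run_agent(llm, "bot-detection analyst", wallet_data)
    quality = run_agent(llm, "project quality analyst", project_data)
    return run_agent(llm, "portfolio manager",
                     {"bot_check": bot_check, "quality": quality})

# Stub LLM so the skeleton runs end-to-end without an API key
print(copy_trade_decision(lambda p: "stub verdict",
                          {"ticker": "$EXAMPLE"}, {"wallet": "0xabc"}))
```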

January 13, 2026 · 2 min · Research Team