false

Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance

Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance ArXiv ID: 2601.13770 “View on arXiv” Authors: Mostapha Benhenda Abstract We introduce Look-Ahead-Bench, a standardized benchmark measuring look-ahead bias in Point-in-Time (PiT) Large Language Models (LLMs) within realistic and practical financial workflows. Unlike most existing approaches that primarily test inner lookahead knowledge via Q\&A, our benchmark evaluates model behavior in practical scenarios. To distinguish genuine predictive capability from memorization-based performance, we analyze performance decay across temporally distinct market regimes, incorporating several quantitative baselines to establish performance thresholds. We evaluate prominent open-source LLMs – Llama 3.1 (8B and 70B) and DeepSeek 3.2 – against a family of Point-in-Time LLMs (Pitinf-Small, Pitinf-Medium, and frontier-level model Pitinf-Large) from PiT-Inference. Results reveal significant lookahead bias in standard LLMs, as measured with alpha decay, unlike Pitinf models, which demonstrate improved generalization and reasoning abilities as they scale in size. This work establishes a foundation for the standardized evaluation of temporal bias in financial LLMs and provides a practical framework for identifying models suitable for real-world deployment. Code is available on GitHub: https://github.com/benstaf/lookaheadbench ...

January 20, 2026 · 2 min · Research Team

Temporal Kolmogorov-Arnold Networks (T-KAN) for High-Frequency Limit Order Book Forecasting: Efficiency, Interpretability, and Alpha Decay

Temporal Kolmogorov-Arnold Networks (T-KAN) for High-Frequency Limit Order Book Forecasting: Efficiency, Interpretability, and Alpha Decay ArXiv ID: 2601.02310 “View on arXiv” Authors: Ahmad Makinde Abstract High-Frequency trading (HFT) environments are characterised by large volumes of limit order book (LOB) data, which is notoriously noisy and non-linear. Alpha decay represents a significant challenge, with traditional models such as DeepLOB losing predictive power as the time horizon (k) increases. In this paper, using data from the FI-2010 dataset, we introduce Temporal Kolmogorov-Arnold Networks (T-KAN) to replace the fixed, linear weights of standard LSTMs with learnable B-spline activation functions. This allows the model to learn the ‘shape’ of market signals as opposed to just their magnitude. This resulted in a 19.1% relative improvement in the F1-score at the k = 100 horizon. The efficacy of T-KAN networks cannot be understated, producing a 132.48% return compared to the -82.76% DeepLOB drawdown under 1.0 bps transaction costs. In addition to this, the T-KAN model proves quite interpretable, with the ‘dead-zones’ being clearly visible in the splines. The T-KAN architecture is also uniquely optimized for low-latency FPGA implementation via High level Synthesis (HLS). The code for the experiments in this project can be found at https://github.com/AhmadMak/Temporal-Kolmogorov-Arnold-Networks-T-KAN-for-High-Frequency-Limit-Order-Book-Forecasting. ...

January 5, 2026 · 2 min · Research Team

Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning

Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning ArXiv ID: 2512.23515 “View on arXiv” Authors: Zuoyou Jiang, Li Zhao, Rui Sun, Ruohan Sun, Zhongjian Li, Jing Li, Daxin Jiang, Zuo Bai, Cheng Hua Abstract Signal decay and regime shifts pose recurring challenges for data-driven investment strategies in non-stationary markets. Conventional time-series and machine learning approaches, which rely primarily on historical correlations, often struggle to generalize when the economic environment changes. While large language models (LLMs) offer strong capabilities for processing unstructured information, their potential to support quantitative factor screening through explicit economic reasoning remains underexplored. Existing factor-based methods typically reduce alphas to numerical time series, overlooking the semantic rationale that determines when a factor is economically relevant. We propose Alpha-R1, an 8B-parameter reasoning model trained via reinforcement learning for context-aware alpha screening. Alpha-R1 reasons over factor logic and real-time news to evaluate alpha relevance under changing market conditions, selectively activating or deactivating factors based on contextual consistency. Empirical results across multiple asset pools show that Alpha-R1 consistently outperforms benchmark strategies and exhibits improved robustness to alpha decay. The full implementation and resources are available at https://github.com/FinStep-AI/Alpha-R1. ...

December 29, 2025 · 2 min · Research Team

Not All Factors Crowd Equally: Modeling, Measuring, and Trading on Alpha Decay

Not All Factors Crowd Equally: Modeling, Measuring, and Trading on Alpha Decay ArXiv ID: 2512.11913 “View on arXiv” Authors: Chorok Lee Abstract We derive a specific functional form for factor alpha decay – hyperbolic decay alpha(t) = K/(1+lambda*t) – from a game-theoretic equilibrium model, and test it against linear and exponential alternatives. Using eight Fama-French factors (1963–2024), we find: (1) Hyperbolic decay fits mechanical factors. Momentum exhibits clear hyperbolic decay (R^2 = 0.65), outperforming linear (0.51) and exponential (0.61) baselines – validating the equilibrium foundation. (2) Not all factors crowd equally. Mechanical factors (momentum, reversal) fit the model; judgment-based factors (value, quality) do not – consistent with a signal-ambiguity taxonomy paralleling Hua and Sun’s “barriers to entry.” (3) Crowding accelerated post-2015. Out-of-sample, the model over-predicts remaining alpha (0.30 vs. 0.15), correlating with factor ETF growth (rho = -0.63). (4) Average returns are efficiently priced. Crowding-based factor selection fails to generate alpha (Sharpe: 0.22 vs. 0.39 factor momentum benchmark). (5) Crowding predicts tail risk. Out-of-sample (2001–2024), crowded reversal factors show 1.7–1.8x higher crash probability (bottom decile returns), while crowded momentum shows lower crash risk (0.38x, p = 0.006). Our findings extend equilibrium crowding models (DeMiguel et al.) to temporal dynamics and show that crowding predicts crashes, not means – useful for risk management, not alpha generation. ...

December 11, 2025 · 2 min · Research Team