
Look-Ahead-Bench: a Standardized Benchmark of Look-ahead Bias in Point-in-Time LLMs for Finance

ArXiv ID: 2601.13770 · Authors: Mostapha Benhenda

Abstract: We introduce Look-Ahead-Bench, a standardized benchmark measuring look-ahead bias in Point-in-Time (PiT) Large Language Models (LLMs) within realistic and practical financial workflows. Unlike most existing approaches that primarily test inner lookahead knowledge via Q&A, our benchmark evaluates model behavior in practical scenarios. To distinguish genuine predictive capability from memorization-based performance, we analyze performance decay across temporally distinct market regimes, incorporating several quantitative baselines to establish performance thresholds. We evaluate prominent open-source LLMs – Llama 3.1 (8B and 70B) and DeepSeek 3.2 – against a family of Point-in-Time LLMs (Pitinf-Small, Pitinf-Medium, and frontier-level model Pitinf-Large) from PiT-Inference. Results reveal significant lookahead bias in standard LLMs, as measured with alpha decay, unlike Pitinf models, which demonstrate improved generalization and reasoning abilities as they scale in size. This work establishes a foundation for the standardized evaluation of temporal bias in financial LLMs and provides a practical framework for identifying models suitable for real-world deployment. Code is available on GitHub: https://github.com/benstaf/lookaheadbench ...
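The abstract does not spell out how alpha decay is computed, so the following is only a minimal sketch under stated assumptions: alpha is taken as the mean per-period excess return over a benchmark, and decay as the difference between alpha in an earlier regime and alpha in a later one. The function names and the return figures are hypothetical and are not taken from the paper or its repository.

```python
from statistics import mean


def alpha(strategy_returns, benchmark_returns):
    """Mean per-period excess return of a strategy over a benchmark.

    Assumption: a simple excess-return definition; the paper may use a
    different (for example, risk-adjusted) notion of alpha.
    """
    if len(strategy_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    return mean(s - b for s, b in zip(strategy_returns, benchmark_returns))


def alpha_decay(earlier_regime, later_regime):
    """Alpha in the earlier regime minus alpha in the later regime.

    Each argument is a (strategy_returns, benchmark_returns) pair.
    A large positive value means performance faded in the later regime.
    """
    return alpha(*earlier_regime) - alpha(*later_regime)


# Made-up per-period returns, for illustration only.
earlier = ([0.03, 0.02, 0.04], [0.01, 0.01, 0.01])
later = ([0.01, 0.00, 0.02], [0.01, 0.01, 0.01])
decay = alpha_decay(earlier, later)
```

Under this reading, a model whose alpha holds up in the later regime would show a decay near zero, while a model that relied on memorized outcomes would show a clearly positive decay.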

January 20, 2026 · 2 min · Research Team

Alleviating Non-identifiability: a High-fidelity Calibration Objective for Financial Market Simulation with Multivariate Time Series Data

ArXiv ID: 2407.16566 · Authors: Unknown

Abstract: The non-identifiability issue has been frequently reported in social simulation works, where different parameters of an agent-based simulation model yield indistinguishable simulated time series data under certain discrepancy metrics. This issue largely undermines the simulation fidelity yet lacks dedicated investigations. This paper theoretically demonstrates that incorporating multiple time series data features during the model calibration phase can exponentially alleviate non-identifiability as the number of features increases. To implement this theoretical finding, a maximization-based aggregation function is proposed based on existing discrepancy metrics to form a new calibration objective function. For verification, the task of calibrating the Financial Market Simulation (FMS), a typical yet complex social simulation, is considered. Empirical studies confirm the significant improvements in alleviating the non-identifiability of calibration tasks. Furthermore, as a model-agnostic method, it achieves much higher simulation fidelity of the chosen FMS model on both synthetic and real market data. Moreover, it is both theoretically and empirically analyzed that as long as the features are selected and not linearly correlated, they can contribute to alleviation, which demonstrates the robustness of the proposed objective. Hence, this work is expected to provide not only a rigorous understanding of non-identifiability in social simulation but also an off-the-shelf high-fidelity calibration objective function for FMS. ...
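One way to picture a maximization-based aggregation is to score a candidate parameter set by its worst per-feature discrepancy. The sketch below assumes mean squared error as the underlying discrepancy metric and uses two made-up features ("price" and "volume"); the function names, the metric choice, and the data are illustrative assumptions, not the paper's exact objective.

```python
def mse(simulated, observed):
    """Mean squared error between two equal-length series.

    Stands in for any existing discrepancy metric (an assumption).
    """
    if len(simulated) != len(observed):
        raise ValueError("series must have the same length")
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)


def max_aggregated_objective(simulated_features, observed_features, discrepancy=mse):
    """Largest discrepancy across all features (lower is better).

    Both arguments map a feature name to its time series.
    """
    return max(
        discrepancy(simulated_features[name], observed_features[name])
        for name in observed_features
    )


# Made-up data: two candidates that a price-only metric cannot tell apart.
observed = {"price": [1.0, 2.0, 3.0], "volume": [10.0, 10.0, 10.0]}
candidate_a = {"price": [1.0, 2.0, 3.0], "volume": [10.0, 10.0, 10.0]}
candidate_b = {"price": [1.0, 2.0, 3.0], "volume": [12.0, 12.0, 12.0]}

score_a = max_aggregated_objective(candidate_a, observed)  # 0.0
score_b = max_aggregated_objective(candidate_b, observed)  # 4.0
```

Here both candidates match the observed price series exactly, so a single-feature objective ranks them equally. Taking the maximum over features separates them because the second candidate misses on volume.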

July 23, 2024 · 2 min · Research Team

Dynamic Time Warping for Lead-Lag Relationships in Lagged Multi-Factor Models

ArXiv ID: 2309.08800 · Authors: Unknown

Abstract: In multivariate time series systems, lead-lag relationships reveal dependencies between time series when they are shifted in time relative to each other. Uncovering such relationships is valuable in downstream tasks, such as control, forecasting, and clustering. By understanding the temporal dependencies between different time series, one can better comprehend the complex interactions and patterns within the system. We develop a cluster-driven methodology based on dynamic time warping for robust detection of lead-lag relationships in lagged multi-factor models. We establish connections to the multireference alignment problem for both the homogeneous and heterogeneous settings. Since multivariate time series are ubiquitous in a wide range of domains, we demonstrate that our algorithm is able to robustly detect lead-lag relationships in financial markets, which can be subsequently leveraged in trading strategies with significant economic benefits. ...
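As a rough illustration of the pairwise building block only, the sketch below runs textbook dynamic time warping on two series and reads a lag off the warping path as the median index offset. It does not reproduce the cluster-driven methodology or the multi-factor model described in the abstract; the lag estimator, the function names, and the toy series are assumptions made for this example.

```python
from statistics import median


def dtw_path(x, y):
    """Classic O(n*m) dynamic time warping with absolute-difference cost.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    aligning x[i] with y[j].
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last cell to the first along cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


def estimate_lag(x, y):
    """Median of (j - i) along the warping path; positive means x leads y."""
    _, path = dtw_path(x, y)
    return median(j - i for i, j in path)


# Toy series: follower repeats the leader's bump two steps later.
leader = [0, 0, 1, 2, 1, 0, 0, 0]
follower = [0, 0, 0, 0, 1, 2, 1, 0]
distance, _ = dtw_path(leader, follower)
lag = estimate_lag(leader, follower)
```

For these toy series the warping cost is zero and the estimated lag is two steps, with the first series leading. The median is used because the start and end of a warping path are pinned to the series boundaries and would otherwise pull a mean-based estimate toward zero.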

September 15, 2023 · 2 min · Research Team