
When Reasoning Fails: Evaluating 'Thinking' LLMs for Stock Prediction

ArXiv ID: 2511.08608
Authors: Rakeshkumar H Sodha

Abstract

Problem. “Thinking” LLMs (TLLMs) expose explicit or hidden reasoning traces and are widely believed to generalize better on complex tasks than direct LLMs. Whether this promise carries to noisy, heavy-tailed and regime-switching financial data remains unclear.

Approach. Using Indian equities (NIFTY constituents), we run a rolling 48m/1m walk-forward evaluation at horizon k = 1 day and dial cross-sectional complexity via the universe size U ∈ {5, 11, 21, 36} while keeping the reasoning budget fixed (B = 512 tokens) for the TLLM. We compare a direct LLM (gpt-4o-mini), a TLLM (gpt-5), and classical learners (ridge, random forest) on cross-sectional ranking loss 1 - IC, MSE, and long/short backtests with realistic costs. Statistical confidence is measured with Diebold-Mariano, Pesaran-Timmermann, and SPA tests.

Main findings. (i) As U grows under a fixed budget B, the TLLM’s ranking quality deteriorates, whereas the direct LLM remains flat and classical baselines are stable. (ii) TLLM variance is higher, requiring ex-post calibration (winsorization and blending) for stability. (iii) Portfolio results under transaction costs do not support a net advantage for the TLLM.

Hypotheses. Our results are consistent with the following testable hypotheses:
H1 (Capacity-Complexity Mismatch): for fixed B, TLLM accuracy degrades superlinearly in cross-sectional complexity.
H2 (Reasoning Variance): TLLM outputs exhibit higher dispersion date-by-date than direct LLMs, increasing error bars and turnover.
H3 (Domain Misfit): next-token prediction objectives and token-budgeted inference are poorly aligned with heavy-tailed, weakly predictable stock returns.

Implication. In our setting, “thinking” LLMs are not yet ready to replace classical or direct methods for short-horizon stock ranking; scaling the reasoning budget and/or re-aligning objectives appears necessary. ...
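The ranking loss 1 - IC mentioned in the abstract can be illustrated for a single date. The sketch below assumes IC is the Spearman rank correlation between model scores and realized next-day returns across the universe; the excerpt does not say whether the paper uses rank or Pearson IC, and the function names are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def ranking_loss(scores, realized_returns):
    """1 - IC for one date, with IC assumed to be Spearman rank correlation."""
    ic = pearson(ranks(scores), ranks(realized_returns))
    return 1.0 - ic


# Toy cross-section of U = 5 stocks (made-up numbers, not from the paper).
scores = [0.2, -0.1, 0.4, 0.0, 0.3]
realized = [0.01, -0.02, 0.03, 0.00, 0.02]
loss_perfect = ranking_loss(scores, realized)
loss_reversed = ranking_loss(scores, [-r for r in realized])
```

A perfectly ordered cross-section gives a loss of 0 and a perfectly reversed one gives 2, so an uninformative ranking sits near 1. In a walk-forward evaluation this quantity would be averaged over test dates.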

November 5, 2025 · 3 min · Research Team

Calculating Profits and Losses for Algorithmic Trading Strategies: A Short Guide

ArXiv ID: 2411.14068
Authors: Unknown

Abstract

We present a series of equations that track the total realized and unrealized profits and losses at any time, incorporating the spread. The resulting formalism is ideally suited to evaluate the performance of trading model algorithms.

Keywords: realized profit/loss, unrealized profit/loss, spread, trading algorithms, performance evaluation, Trading Strategies

Complexity vs Empirical Score
Math Complexity: 3.5/10
Empirical Rigor: 2.0/10
Quadrant: Philosophers
Why: The paper presents a series of algebraic equations to formalize profit and loss calculations, which is moderately math-intensive but lacks the deep stochastic calculus or advanced statistics often seen in quant finance research. Empirically, it is a theoretical guide with illustrative examples but no backtested performance, real-world datasets, or implementation code.

flowchart TD
  A["Research Goal: Develop<br>algorithms to track<br>realized & unrealized PnL"] --> B["Key Methodology: Mathematical Formalism"]
  B --> C["Data/Inputs: Trades, Prices, Spread"]
  C --> D["Computational Process:<br>Equations for PnL Calculation"]
  D --> E["Key Findings: Robust<br>Performance Evaluation"]
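To make the idea of tracking realized and unrealized PnL with a spread concrete, here is a minimal sketch. It is not the paper's formalism, whose equations are not reproduced in this summary. It assumes average-cost accounting, that buys execute at the ask and sells at the bid, and that open longs are marked at the bid and open shorts at the ask. The class name `PnLTracker` is hypothetical.

```python
class PnLTracker:
    """Average-cost PnL bookkeeping for a single instrument (illustrative)."""

    def __init__(self):
        self.position = 0.0   # signed quantity: > 0 long, < 0 short
        self.avg_price = 0.0  # average entry price of the open position
        self.realized = 0.0   # cumulative realized PnL

    def trade(self, qty, price):
        """Record a fill: qty > 0 is a buy, qty < 0 is a sell."""
        if qty == 0:
            return
        same_side = self.position == 0 or (self.position > 0) == (qty > 0)
        if same_side:
            # Opening or adding: update the quantity-weighted entry price.
            total = self.position + qty
            self.avg_price = (self.avg_price * self.position + price * qty) / total
            self.position = total
            return
        # Reducing, closing, or flipping: realize PnL on the closed quantity.
        closing = min(abs(qty), abs(self.position))
        side = 1.0 if self.position > 0 else -1.0
        self.realized += side * closing * (price - self.avg_price)
        leftover = abs(qty) - closing
        self.position += qty
        if leftover > 0:
            self.avg_price = price  # position flipped; new entry at this fill
        elif self.position == 0:
            self.avg_price = 0.0

    def unrealized(self, bid, ask):
        """Mark longs at the bid and shorts at the ask (exit-side prices)."""
        if self.position > 0:
            return self.position * (bid - self.avg_price)
        if self.position < 0:
            return self.position * (ask - self.avg_price)
        return 0.0


# Made-up fills and quotes for illustration only.
t = PnLTracker()
t.trade(10, 100.0)                     # buy 10 at the ask
u_open = t.unrealized(99.5, 100.0)     # spread cost shows up immediately
t.trade(-4, 101.0)                     # sell 4 at the bid
t.trade(-10, 102.0)                    # sell 10 more: closes 6, opens short 4
u_short = t.unrealized(101.5, 102.5)
```

Marking at the exit side of the quote means a freshly opened position starts with a negative unrealized PnL equal to the spread times the quantity, which is one simple way to charge the spread to the strategy.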

November 21, 2024 · 1 min · Research Team

Randomized Control in Performance Analysis and Empirical Asset Pricing

ArXiv ID: 2403.00009
Authors: Unknown

Abstract

The present article explores the application of randomized control techniques in empirical asset pricing and performance evaluation. It introduces geometric random walks, a class of Markov chain Monte Carlo methods, to construct flexible control groups in the form of random portfolios adhering to investor constraints. The sampling-based methods enable an exploration of the relationship between academically studied factor premia and performance in a practical setting. In an empirical application, the study assesses the potential to capture premia associated with size, value, quality, and momentum within a strongly constrained setup, exemplified by the investor guidelines of the MSCI Diversified Multifactor index. Additionally, the article highlights issues with the more traditional use case of random portfolios for drawing inferences in performance evaluation, showcasing challenges related to the intricacies of high-dimensional geometry. ...
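One well-known geometric random walk is hit-and-run, which can sample random portfolios from a constraint set. The sketch below is an illustration only: the excerpt does not say which walk the article uses, and the constraint set here (fully invested, long-only, with a per-asset cap) is an assumed toy example, far simpler than the MSCI guidelines. The function name and parameters are hypothetical.

```python
import math
import random


def hit_and_run_portfolios(n_assets, cap, n_samples, burn_in=100, seed=0):
    """Sample weights w with sum(w) = 1 and 0 <= w_i <= cap via hit-and-run."""
    if n_assets < 2 or not (1.0 / n_assets < cap <= 1.0):
        raise ValueError("need n_assets >= 2 and 1/n_assets < cap <= 1")
    rng = random.Random(seed)
    w = [1.0 / n_assets] * n_assets  # equal weights: a strictly interior start
    samples = []
    for step in range(burn_in + n_samples):
        # Random direction, demeaned so that moving along it keeps sum(w) = 1.
        d = [rng.gauss(0.0, 1.0) for _ in range(n_assets)]
        m = sum(d) / n_assets
        d = [x - m for x in d]
        # Find the feasible segment w + t * d inside the box [0, cap]^n.
        t_lo, t_hi = -math.inf, math.inf
        for wi, di in zip(w, d):
            if di > 0:
                t_lo = max(t_lo, -wi / di)
                t_hi = min(t_hi, (cap - wi) / di)
            elif di < 0:
                t_lo = max(t_lo, (cap - wi) / di)
                t_hi = min(t_hi, -wi / di)
        # Jump to a uniformly chosen point on that segment.
        t = rng.uniform(t_lo, t_hi)
        w = [wi + t * di for wi, di in zip(w, d)]
        if step >= burn_in:
            samples.append(w)
    return samples


portfolios = hit_and_run_portfolios(n_assets=5, cap=0.4, n_samples=50)
```

Each sampled portfolio can then be backtested, and the distribution of outcomes serves as a control group against which a managed portfolio under the same constraints is compared. Consecutive draws are correlated, so in practice one would thin the chain or run it longer.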

February 14, 2024 · 2 min · Research Team

Optimal Portfolio with Ratio Type Periodic Evaluation under Short-Selling Prohibition

ArXiv ID: 2311.12517
Authors: Unknown

Abstract

This paper studies some unconventional utility maximization problems when the ratio type relative portfolio performance is periodically evaluated over an infinite horizon. Meanwhile, the agent is prohibited from short-selling stocks. Our goal is to understand the impact of the periodic reward structure on the long-run constrained portfolio strategy. For power and logarithmic utilities, we can reformulate the original problem into an auxiliary one-period optimization problem. To cope with the auxiliary problem with no short-selling, the dual control problem is introduced and studied, which gives the characterization of the candidate optimal portfolio within one period. With the help of the results from the auxiliary problem, the value function and the optimal constrained portfolio for the original problem with periodic evaluation can be derived and verified, allowing us to discuss some financial implications under the new performance paradigm. ...

November 21, 2023 · 2 min · Research Team

Value at Risk Models in Finance

ArXiv ID: ssrn-356220
Authors: Unknown

Abstract

The main objective of this paper is to survey and evaluate the performance of the most popular univariate VaR methodologies, paying particular attention to thei

Keywords: Value at Risk (VaR), Univariate methodologies, Performance evaluation, Risk Management

Complexity vs Empirical Score
Math Complexity: 6.5/10
Empirical Rigor: 8.0/10
Quadrant: Holy Grail
Why: The paper involves advanced econometrics (CAViaR, GARCH, EVT) and Monte Carlo simulations, indicating high math complexity; its extensive simulation study with specific data-generating processes and performance comparisons provides strong empirical rigor.

flowchart TD
  A["Research Goal: Evaluate performance of popular univariate VaR models"] --> B["Data Input: Daily Financial Return Series"]
  B --> C["Methodology: VaR Model Application<br/>Parametric, Historical, Monte Carlo"]
  C --> D["Computational Process:<br/>Backtesting & Performance Metrics<br/>Kupiec Test, Traffic Lights, Loss Functions"]
  D --> E["Key Findings:<br/>Model Suitability & Accuracy Outcomes<br/>Performance Rankings"]
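The flowchart above names historical VaR and the Kupiec test among the backtesting tools. As a rough sketch of how the two fit together, the code below computes a historical-simulation VaR as an empirical loss quantile and the Kupiec proportion-of-failures likelihood-ratio statistic. The quantile convention and the function names are assumptions for illustration, not taken from the paper.

```python
import math


def historical_var(returns, alpha=0.99):
    """Historical VaR as a positive loss: the alpha-quantile of -returns."""
    losses = sorted(-r for r in returns)
    # Order-statistic convention (assumed): smallest loss with at least
    # a fraction alpha of observations at or below it.
    idx = min(len(losses) - 1, math.ceil(alpha * len(losses)) - 1)
    return losses[idx]


def kupiec_pof(n_obs, n_exceptions, alpha=0.99):
    """Kupiec proportion-of-failures LR statistic (chi-squared, 1 d.o.f.)."""
    p = 1.0 - alpha  # exception rate implied by the VaR level
    x, T = n_exceptions, n_obs

    def loglik(q):
        # Binomial log-likelihood, treating 0 * log(0) as 0.
        ll = 0.0
        if T - x > 0:
            ll += (T - x) * math.log(1.0 - q)
        if x > 0:
            ll += x * math.log(q)
        return ll

    return -2.0 * (loglik(p) - loglik(x / T))


# Made-up numbers for illustration only.
var_75 = historical_var([-0.04, 0.01, -0.02, 0.03], alpha=0.75)
lr_ok = kupiec_pof(n_obs=100, n_exceptions=1, alpha=0.99)
lr_bad = kupiec_pof(n_obs=250, n_exceptions=10, alpha=0.99)
reject_bad = lr_bad > 3.841  # 5% critical value of chi-squared(1)
```

With 10 exceptions in 250 days against an expected 2.5, the statistic is far above 3.841, so the 99% VaR model would be rejected at the 5% level. One exception in 100 days matches the expected rate and gives a statistic of essentially zero.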

February 25, 2003 · 1 min · Research Team