Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents

ArXiv ID: 2502.17967

Authors: Unknown

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks, yet their performance in dynamic, real-world financial environments remains underexplored. Existing approaches are limited to historical backtesting, where trading actions cannot influence market prices and agents train only on static data. To address this limitation, we present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading and directly impact price dynamics. By simulating realistic bid-ask interactions, our platform enables training in scenarios that closely mirror live markets, thereby narrowing the gap between training and evaluation. Experiments reveal that LLMs struggle with numerical reasoning when given plain-text data, often overfitting to local patterns and recent values. In contrast, chart-based visualizations significantly enhance both numerical reasoning and trading performance. Furthermore, incorporating a reflection module yields additional improvements, especially with visual inputs. Evaluations on NASDAQ and CSI datasets demonstrate the superiority of our method, particularly under high volatility. All code and data are available at https://github.com/wekjsdvnm/Agent-Trading-Arena.
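To make the "zero-sum market with price impact" idea concrete, here is a minimal sketch of how crossing bids and asks could move the last traded price while conserving total cash and shares. It is not the paper's implementation: the `Order` type, the `match_orders` function, and the midpoint pricing rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Order:
    agent: str    # hypothetical agent identifier
    side: str     # "buy" or "sell"
    price: float  # limit price
    qty: int      # number of shares


def match_orders(orders, cash, shares, last_price):
    """Cross the best bids against the best asks and return the new last price.

    Cash and shares only move between agents, so the totals are unchanged
    (zero-sum). Pricing each trade at the bid/ask midpoint is an assumption.
    """
    # Highest bid first, lowest ask first; sorted() is stable, so ties keep
    # their submission order.
    bids = sorted((o for o in orders if o.side == "buy"), key=lambda o: -o.price)
    asks = sorted((o for o in orders if o.side == "sell"), key=lambda o: o.price)
    bid_left = [o.qty for o in bids]
    ask_left = [o.qty for o in asks]
    i = j = 0
    while i < len(bids) and j < len(asks) and bids[i].price >= asks[j].price:
        qty = min(bid_left[i], ask_left[j])
        price = (bids[i].price + asks[j].price) / 2
        buyer, seller = bids[i].agent, asks[j].agent
        cash[buyer] -= price * qty
        cash[seller] += price * qty
        shares[buyer] += qty
        shares[seller] -= qty
        last_price = price  # the agents' own trade sets the new market price
        bid_left[i] -= qty
        ask_left[j] -= qty
        if bid_left[i] == 0:
            i += 1
        if ask_left[j] == 0:
            j += 1
    return last_price


cash = {"a": 1000.0, "b": 1000.0}
shares = {"a": 10, "b": 10}
orders = [Order("a", "buy", 10.5, 5), Order("b", "sell", 10.0, 3)]
new_price = match_orders(orders, cash, shares, last_price=10.0)
total_cash = sum(cash.values())
total_shares = sum(shares.values())
```

In this toy run the price moves from 10.0 to the trade price because of the agents' own orders, which is the property that historical backtesting lacks, while total cash and total shares stay constant.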

Keywords: Large Language Models (LLMs), Multi-agent trading, Agent Trading Arena, Visual reasoning, Reinforcement learning, Equities

Complexity vs Empirical Score

  • Math Complexity: 4.0/10
  • Empirical Rigor: 8.5/10
  • Quadrant: Street Traders
  • Why: The paper centers on an experimental framework (Agent Trading Arena), evaluates it extensively on real datasets (NASDAQ/CSI), releases its code, and reports performance metrics, all of which indicate high empirical rigor. The mathematics is comparatively simple, consisting mainly of market microstructure simulation logic and basic reasoning patterns rather than advanced derivations or dense formulas.

  flowchart TD
    A["Research Goal<br>Understand & improve numerical reasoning<br>of LLM-based trading agents"] --> B["Agent Trading Arena<br>Virtual zero-sum market<br>Agents impact price dynamics"]
    B --> C["Inputs & Setup<br>Dataset: NASDAQ & CSI stocks<br>Inputs: Plain-text vs. Chart-based data<br>Agents: Base LLM & + Reflection Module"]
    C --> D["Core Process<br>Competitive Multi-Agent Trading<br>Reinforcement Learning & Simulation"]
    D --> E["Key Findings & Outcomes"]
    E --> F["1. Visual Input Superiority<br>Charts > Plain-text for numerical reasoning"]
    E --> G["2. Reflection Enhances Gains<br>Reflection module improves performance, especially with charts"]
    E --> H["3. Robustness in Volatility<br>Method excels in high-volatility markets"]
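The plain-text versus chart-based comparison in the flowchart can be illustrated with a small sketch that prepares the same price series in both forms. This is only an assumption about what such inputs might look like: the function names are hypothetical, the summary above does not specify the paper's chart format, and a vision-capable model would normally receive a rasterized image rather than the raw SVG string produced here.

```python
def to_plain_text(prices):
    """Serialize prices as a comma-separated string (the plain-text input)."""
    return ", ".join(f"{p:.2f}" for p in prices)


def to_svg_chart(prices, width=200, height=100):
    """Render prices as a minimal SVG line chart (a stand-in for a chart input).

    Prices are scaled so the lowest sits at the bottom edge and the highest
    at the top edge of the drawing area.
    """
    lo, hi = min(prices), max(prices)
    span = (hi - lo) or 1.0                      # avoid division by zero on flat series
    step = width / max(len(prices) - 1, 1)       # horizontal spacing between points
    points = " ".join(
        f"{i * step:.1f},{height - (p - lo) / span * height:.1f}"
        for i, p in enumerate(prices)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{points}"/></svg>'
    )


prices = [10.0, 10.5, 9.8, 11.2]
text_input = to_plain_text(prices)
chart_input = to_svg_chart(prices)
```

The text form exposes every digit but leaves the overall shape implicit, whereas the chart form makes the trend and range visible at a glance, which is one plausible reading of why visual inputs helped in the reported experiments.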